University of Virginia Library

Electronic Computers and Elizabethan Texts
by
Ephim G. Fogel

Librarians, archivists, linguists, and students of literature are rapidly coming to realize that electronic computers, or, better, data-processing machines, can help to solve problems in fields ordinarily regarded as remote from the world of advanced technology.[1] Already there is a formidable bibliography of books and articles discussing automation in the library, automatic search, indexing, and abstracting of documents, automatic linguistic analysis, and automatic translation.[2] So far as I know, however, no papers have been published on the application of electronic aids to the solution of problems in Elizabethan scholarship. The chief purpose of the present essay is to stimulate wider discussion of such applications. I shall concentrate mainly on the possible uses of computer-prepared concordances to and magnetic tape files of Elizabethan texts.[3]

I

It has long been recognized that concordances are essential tools in the critical, historical, and philological analysis of literary texts. Until very recently, it was also apparent that anyone who agreed to compile a concordance had assumed an appalling task. "An exhaustive concordance to the Bible, such as that of James Strong," John W. Ellison estimates, "takes about a quarter of a century of careful, tedious work to guarantee accuracy."[4] When in February, 1911, Professor Lane Cooper of Cornell University, with the aid of sixty-seven workers, saw the Wordsworth concordance through the press only two years and three months after excerpting of the Hutchinson edition had begun, his achievement was quite properly regarded as remarkable. After tens of thousands of man-hours had been spent in excerpting, alphabetizing, and checking, a concordance-editor was usually compelled to search far and wide for a publisher (the Wordsworth was delayed about nine months until a suitable one could be found) and, often, a handsome subvention. As printing costs soared, large concordances became more and more rare. The only conventionally-produced large concordance to an English or American poet which has appeared since the end of 1941 seems to be Professor Eby's concordance to Whitman's Leaves of Grass and selected prose. This work of 980 pages lists at $25.

That concordance-makers should turn to electro-mechanical and electronic aids was only to be expected. After World War II, there were efforts to compile indexes by the use of punched-card systems. But the limited capacity and sorting speeds of electro-mechanical equipment made the automatic production of very large concordances impractical. In the last few years, therefore, researchers have turned to large-scale electronic data-processing machines such as those marketed by Remington Rand and IBM.[5] The year 1957 witnessed three independent developments: Paul Tasman, with the collaboration of Rev. Roberto Busa, S. J., worked out a program for indexing the words in the Dead Sea Scrolls on the IBM 705;[6] John W. Ellison brought out Nelson's Complete Concordance to the Revised Standard Version Bible, automatically indexed by the Remington Rand Univac I; and Cornell University launched a program for a computer-produced series of concordances, with Stephen M. Parrish as General Editor.



In the same year, the University of California published the late Guy Montgomery's concordance to Dryden's poetry. Since this cumbersome oddity has given many of its users an erroneous impression of what a machine-prepared concordance looks like, it deserves some mention here. One must emphasize that it is not at all an electronically produced work and indeed only in small part an electro-mechanically produced one. When Professor Montgomery died in 1951, he left 240,000 manually indexed cards based on Noyes's edition of Dryden's complete poetical works. Out of the decision to use accounting machines to help in checking these cards grew the decision to print by offset from IBM sheets a list of index-words with abbreviated references to the places where they occurred, but without any context whatsoever. A sample entry from page 1 of the resulting concordance will indicate the difficulties that confront the user:

ABIDE     HAP 1928
For each such entry under ABIDE, the reader must consult the prefatory list of full titles geared to the cryptic symbols. He will then learn that HAP stands for "The Hind and the Panther" and that the poem begins on page 218 of Noyes. He must next turn to that page and move forward until he locates line 1928, "No Martin there in winter shall abide," in column B of page 243. But his work is just beginning. In order to ascertain Dryden's various uses of ABIDE (eighteen instances), he must either write out each line as he locates it or else jot down all the page and line numbers of Noyes in which ABIDE occurs and riffle the pages back and forth as he tries to compare instances. One doesn't like to think of the agonies of a reader who wishes to locate and analyze occurrences of the fifty-seven words in "Dryden's major vocabulary" which, according to the preface, occur "from 400 to 1,100 times apiece."

Professor Parrish's Concordance to the Poems of Matthew Arnold (Ithaca, 1959) shows that a concordance compiled and printed by electronic data-processing machines (in this case the IBM 704) can give as complete a verse-context and an array of identifying data as the manually compiled type. The first three entries under ABIDE will indicate the advantages of the Arnold:

  • OTHERS ABIDE OUR QUESTION THOU ART FREE . . 2 SHAKESPEARE 1
  • HE ESCAPES THENCE BUT WE ABIDE . . . . . . . . . 58 RESIGNATION 213
  • THE LAW IS PLANTED TO ABIDE . . . . . . . . . . . . . 94 SICK KING BOKH 208
Here the concordance provides a full line of context for each instance of the index-word and prints the instances in the order of their occurrence in Tinker and Lowry's edition of Arnold's Poetical Works. The identifying information to the right gives the page of Tinker and Lowry on which the line appears, then the title of the poem in which it occurs, or a rather full and readily understood abbreviation, and lastly the line number. In most cases, the reader will probably be able to determine the different uses of the index-word from the entries themselves, but if he should wish to consult an even fuller context, he can immediately turn to the indicated page of Tinker and Lowry. An appendix gives a helpful index of words in order of their frequency in Arnold's text. The production of this volume of 965 pages required some two hundred hours of card-punching, tape-recording, data-processing, and listing.[7] The IBM sheets were then reproduced by offset and bound in an attractive volume which is priced at $10.
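The layout of the Arnold entries can be imitated in a few lines of present-day code. What follows is only a minimal sketch in Python, not the IBM 704 program, which the essay does not describe. The record layout, the `STOP_WORDS` set, and the names `build_concordance` and `format_entry` are assumptions made for illustration; the three sample lines are those quoted above.

```python
import re
from collections import defaultdict

# Records follow the sample entries: (page in base edition, title, line number, text).
LINES = [
    (2, "SHAKESPEARE", 1, "OTHERS ABIDE OUR QUESTION THOU ART FREE"),
    (58, "RESIGNATION", 213, "HE ESCAPES THENCE BUT WE ABIDE"),
    (94, "SICK KING BOKH", 208, "THE LAW IS PLANTED TO ABIDE"),
]

# Assumed exclusion list of common words; an editor would choose it deliberately.
STOP_WORDS = {"THE", "IS", "TO", "BUT", "WE", "HE", "OUR"}


def build_concordance(lines, stop_words=STOP_WORDS):
    """Map each index-word to the full lines containing it, in text order."""
    index = defaultdict(list)
    for page, title, number, text in lines:
        # dict.fromkeys drops repeats, so a line is listed once per index-word.
        for word in dict.fromkeys(re.findall(r"[A-Z]+", text.upper())):
            if word not in stop_words:
                index[word].append((text, page, title, number))
    return dict(sorted(index.items()))


def format_entry(text, page, title, number, width=48):
    """One printed row: context, dot leaders, then page, title, line number."""
    return f"{text} ".ljust(width, ".") + f" {page} {title} {number}"


concordance = build_concordance(LINES)
for entry in concordance["ABIDE"]:
    print(format_entry(*entry))

# Head-words ordered by the number of lines in which they occur.
by_frequency = sorted(concordance, key=lambda w: len(concordance[w]), reverse=True)
```

Because the exclusion list is a parameter, a word omitted from one printing can be restored in the next simply by passing a smaller set.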

Cornell concordances to follow the Arnold will incorporate refinements as rapidly as they are developed. Special print-wheels will provide a full array of punctuation marks and of characters such as the thorn and the ligatures (the Arnold has only the hyphen). Presently available techniques can instruct computers to discriminate between homographs and print them under separate headings, to cross-index hyphenated words, and, for earlier poets, to collect the old-spelling variants of a single word under their modern-spelling equivalent, as in Osgood's Spenser or Tatlock and Kennedy's Chaucer.
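The collection of old-spelling variants under a modern head-word can be pictured as a table lookup. The sketch below is a hedged illustration in Python, not the Cornell procedure. The table `VARIANTS` and the function names are hypothetical, and an actual table would be supplied by the editor. A bare lookup of this kind cannot discriminate between homographs, which would need context or manual tagging.

```python
from collections import defaultdict

# Hypothetical editor-supplied table: old-spelling form -> modern head-word.
VARIANTS = {
    "IELOSIE": "JEALOUSY",
    "IEALOUSIE": "JEALOUSY",
    "IOYND": "JOINED",
    "LOUE": "LOVE",
}


def head_word(form, variants=VARIANTS):
    """Return the modern head-word for a form, or the form itself if unlisted."""
    form = form.upper()
    return variants.get(form, form)


def group_by_head_word(forms):
    """Collect the attested spellings of each word under its head-word."""
    grouped = defaultdict(set)
    for form in forms:
        grouped[head_word(form)].add(form.upper())
    return {head: sorted(spellings) for head, spellings in sorted(grouped.items())}


grouped = group_by_head_word(["Ielosie", "iealousie", "ioynd", "loue", "love"])
print(grouped)
```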

New possibilities in concordance-making and in other kinds of literary data-processing will doubtless emerge as computers rapidly become more and more complex, swift, and powerful. "The latest [computers]," writes Ritchie Calder, "are a thousand times faster than those of three years ago and a million times faster than those of ten years ago," and he reports that in June, 1959, in Paris, at an International Conference on Information Processing, scientists seriously discussed "machines which would memorize all the knowledge in the world."[8] One's mind reels and retreats to somewhat less staggering fantasies in which the C. W. Wallaces and Leslie Hotsons of the twenty-first century, working in American repositories, ask computers to search magnetic tapes of British archives for all occurrences of names with, say, the components Sh, k, sp, r or M, r, l. A daydream high fantastical, perhaps; yet the photoduplication during World War II of a vast number of British documents, now available at the Library of Congress on microfilm, provides a notable precedent for the internationalization of archives.[9]
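The search for names built from the components Sh, k, sp, r or M, r, l corresponds closely to what is now called pattern matching. The following Python sketch shows one way such a request might be phrased; it assumes the components must occur in order within a single word, with any letters between them. The function names and the sample sentence are invented for the illustration and are not drawn from any archive.

```python
import re


def component_pattern(components):
    """Compile a pattern for one word containing the components in order."""
    body = r"\w*".join(re.escape(part) for part in components)
    return re.compile(rf"\b{body}\w*\b", re.IGNORECASE)


def find_names(text, components):
    """Return every word in the text that matches the component sequence."""
    return component_pattern(components).findall(text)


# Invented sample, not a transcription of any document.
SAMPLE = "Payd to Willm Shakspere and to Xpofer Marley; witnes John Shaxberd and Tho. Morley"

print(find_names(SAMPLE, ["Sh", "k", "sp", "r"]))
print(find_names(SAMPLE, ["M", "r", "l"]))
```

The form "Shaxberd" is not retrieved because it lacks the components k and sp, which suggests how much the usefulness of such a search would depend on the scholar's choice of components.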

But consideration of what can be done now is likely to be more fruitful than heady speculations about the future. Literary scholars should give earnest thought to making use of the machines available to them on their campuses: much can be done even with small-scale computers or punch-card and perforated-tape equipment.[10] Efforts should be coordinated in order to determine important needs in different specialties, to prevent duplication of work at different universities, and to disseminate information about new developments in the processing of literary texts. In this connection I am authorized to state that the Department of English at Cornell University will be glad to share its experience in preparing concordances by computer, and its knowledge of work being done at other centers, with those who may be ready to embark on projects of their own.

II

The scholar is the key person in the development of specific programs to process literary data. It is he who must define goals for research and arrive at the most rational procedures for achieving these goals. His indispensable colleague, the computer-engineer, cannot move forward until the scholar himself knows what he wants to do. On the other hand, the scholar must have some awareness of how a literary text is prepared for computer-processing. I shall limit myself here to a simplified, non-technical outline of the steps required to record a text on magnetic tape.

  • i. Having selected a base-text, the scholar edits it for punching.
  • ii. Working at a machine with a conventional typewriter keyboard, an operator punches the text on cards. Each card contains a line of poetry or a similar amount of prose; the punched text is automatically recorded in print at the top of the card.
  • iii. The cards are verified to insure accuracy of transcription. In this process, a second operator punches the same text on the already punched cards. A light flashes if there is any discrepancy between her punch and that of the first operator; she then pulls the card in question, checks the print for errors, and punches the line correctly.
  • iv. Identifying data (page of base-edition and line of poem) are automatically punched onto each card in the set; title cards separately punched for each poem are introduced into the set.
  • v. The information on each card is transferred seriatim onto a magnetic master-tape and can then be processed on a computer according to a previously designed program. When the computer-run is finished, the master-tape can be stored indefinitely, or processed again as required. Through the use of other tapes, the recorded data can be altered so that a fresh master-tape is produced.
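Steps iii through v above can be sketched in Python as a comparison of two independent keyings followed by the attachment of identifying data. This is an illustration of the principle only, not of the punched-card machinery; the function names, the record layout, and the deliberate keying error are assumptions.

```python
def verify(first_keying, second_keying):
    """Return (card number, first, second) for every card where the keyings differ."""
    if len(first_keying) != len(second_keying):
        raise ValueError("the two keyings must cover the same cards")
    return [
        (number, a, b)
        for number, (a, b) in enumerate(zip(first_keying, second_keying), start=1)
        if a != b
    ]


def to_master_file(cards, identifiers):
    """Attach (page, line) identifying data to each verified card, in order."""
    return [
        {"page": page, "line": line, "text": text}
        for (page, line), text in zip(identifiers, cards)
    ]


first = ["OTHERS ABIDE OUR QUESTION THOU ART FREE", "HE ESCAPES THENCE BUT WE ABIDE"]
second = ["OTHERS ABIDE OUR QUESTION THOU ART FREE", "HE ESCAPES THENCE BUT WE ABODE"]

discrepancies = verify(first, second)  # the "light flashes" for card 2
master = to_master_file(first, [(2, 1), (58, 213)])
print(discrepancies)
print(master[1])
```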

It will be apparent that the master-tape is in many ways more important than any single list of analyzed data which can be automatically printed from it. The tape is a compact, permanent record. It gives the editor of a computer-prepared concordance, for example, much greater flexibility than the editor of a manually prepared work can enjoy. If he decides at the last moment to include five "common" words that he had originally planned not to print, he need only instruct the computer to add those words to its processing list. And long after his concordance is published, he can quickly retrieve from the tape any verbal data excluded from the book: "the [IBM 704] computer can locate all occurrences of even a high-frequency word in about 20 minutes."[11]

So far as data-processing equipment is concerned, then, the tasks of Elizabethan scholarship in the coming years may be defined as the recording on master-tapes of the widest possible array of literary works in their most authoritative and most usable textual form; the duplication and depositing of such tapes in key centers of scholarship; the searching of the tapes on request to provide individual scholars with information that will increase the comprehensiveness and validity of their conclusions; and the selective publishing of machine-prints made from these tapes (concordances, lexicons, frequency lists, textual collations, etc.) so as to serve the needs of the profession as a whole. It will of course be necessary for appropriate groups of scholars to rationalize and allocate these labors.

For the rest of this paper, I should like to suggest the kinds of aid that philology, textual criticism, concordance-making, enumerative bibliography, and canonical studies can expect from data-processing machines. I am obviously taking on more than can be handled by any man, unless there exists somewhere a Hercules who is both a computer-engineer and a master of the immense domain of Elizabethan scholarship. It will be understood, then, that the following remarks, whether they assume an imperative or interrogative form, are provisional. They are meant rather to raise questions for discussion than to try to supply definitive answers.

Philology

The philological and linguistic applications of computers have been much discussed. Various classes of documents from different historical periods and linguistic communities can be recorded on tapes which computers can then process to produce indexes of graphic and graphic-semantic forms, with accompanying context. As a result, philological studies can be more comprehensive and exact than in the past. Anyone who has come upon instances of a usage earlier than those recorded by the monumental Oxford English Dictionary, and upon other usages that are not recorded at all, can testify to the need for a complete and accurate dictionary of Elizabethan English. Computers could speed the publication of such a dictionary.

Textual Criticism

An editor preparing a critical old-spelling edition of an Elizabethan poet or dramatist must process an enormous amount of literary data. Collation of early printed editions can be facilitated by machines, as Professor Charlton J. K. Hinman has demonstrated in his collation of dozens of copies of the First Folio. Future editors will also wish to explore the possibilities of computer collation. The more complicated the textual tradition, the more the scholar will appreciate electronic aid in reconstructing a stemma.[12] Again, every editor has to analyze such matters as characteristic locutions and linguistic preferences through all of his author's extant writings, as well as the spelling of any surviving holographs, so that he can decide to what extent a base-text which is not a holograph, and perhaps was not transcribed or printed from a holograph, represents his author's idioms and orthography. In the past, an editor has had to depend upon his memory, at best an incomplete and unreliable guide, or to compile by hand a private concordance, as it were, an index of his author's graphic-semantic patterns. Computers can relieve him of this labor, which adds far too much to his already heavy burdens. Accurate and complete counts of an author's particular word-sequences can also help to detect contaminations. On the basis of such analyses, computers can automatically fill in lacunae or offer conjectural emendations of corrupt or suspect passages;[13] these reconstructions may then serve as a check on the editor's conjectures, or they may stimulate further insights. It is not, of course, a question of a machine's replacing the judgments of a Greg or a Grierson, but of freeing future Gregs and Griersons from the mechanical drudgery that must precede final editorial judgment.
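Computer collation of two witnesses can be illustrated, in a very reduced form, with a word-by-word comparison. The Python sketch below uses the standard `difflib` module, which is a swapped-in present-day technique and not a method the essay names; the function `collate` and the two short readings are assumptions made for the example, not transcriptions of any quarto or folio.

```python
import difflib


def collate(base, witness):
    """List word-level differences between two witnesses of the same passage."""
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # report only the places where the witnesses diverge
    ]


# Invented pair of readings for illustration.
reading_a = "thou damnable box of envy"
reading_b = "thou damnable boy of envy"

print(collate(reading_a, reading_b))
```

A listing of this kind records only where witnesses differ; deciding which reading is authorial remains the editor's judgment.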

Since a concordance is valuable for textual criticism, we may expect that editors of Elizabethan dramatists and poets will also edit concordances to their authors. A preliminary tape that will help the editor to establish his text can be corrected to embody final editorial decisions and a concordance can then be published by offset from machine-prints.

Concordance-Making

Here I should like to pose a series of questions. Some of the answers given below have been worked out during preliminary preparations for a concordance to the poems of Ben Jonson, to be edited by Professor Parrish and myself. None of the answers, however, are necessarily final, and I should appreciate comments and suggestions.

  • 1. What kind of base-text should one use?

  • A concordance will have the greatest value for philologists, editors, and canonical scholars if it is based on a definitive edition, preferably in old spelling. The concordance can then refer to the page and line numbers of a readily available standard work and can include editorial emendations as well as authorial variants. If there is no definitive edition, it might be possible to compile a concordance from an early printed text, provided that an acceptable photoduplicate of that text has been published.[14] In that case, the concordance-maker can identify citations by referring to signature and line number of page or column, in the manner of Professor Hinman's references to lines in the Shakespeare First Folio. But reference to authorial variants in other early texts (one thinks of Daniel's and Drayton's frequent revisions) and to modern emendations will perhaps pose a problem.

    If an acceptable photoduplicate or a definitive edition is unavailable, the concordance-maker should probably pass on to another author. To provide the general reader with references to a virtually inaccessible text is of little use, and to base a concordance on an inadequate edition is unwise. If there were a Jonson concordance based on the Gifford text, it would of course be helpful, but it would have to be redone now that the Herford and Simpson text is available, even as Bartlett's Shakespeare will probably have to be redone when a critical old-spelling edition appears.

  • 2. To what extent should one normalize the text?

  • It seems to serve no useful purpose and is in some cases impossible to retain scribal abbreviations. On the other hand, to normalize i-j and u-v according to modern usage, if the base-edition has not done so, seems to require excessive intervention extending to many lines in every poem or passage of dialogue. The automatic collection of variant spellings under a single head-word will assure that such forms as IELOSIE and IOYND will be conveniently indexed under their modern equivalents.
  • 3. What about textual variants and emendations?

  • As has been indicated, one should include authorial variants. Both a textual crux and the emendation adopted by the editor of the base-text should be indexed. Variants and emendations should be labeled as such (in Professor Parrish's Arnold, a "V" for "variant" precedes the line number).

    The concordance-maker is not the editor of a critical edition, but he should correct obvious misprints in his base-text and include variants unavailable to or perhaps overlooked by the editor. Sometimes he may have to display the courage of an editor's convictions. The Herford and Simpson text, for example, reproduces in square brackets "a letter or word wrongly inserted in the original." There is no point in indexing such a word; in our Jonson concordance, we have substituted the reading which Herford and Simpson indicate as correct.

  • 4. Should stage directions be indexed?

  • By all means. Stage directions are important elements in plays, masques, pageants, and entertainments. But Ariel's making the banquet vanish with a quaint device and Jack Cade's striking his staff on London stone cannot be located in Bartlett's concordance, which omits all stage directions, as do Crawford's Kyd and Marlowe. Lists of dramatis personae in the early prints should also be included; that such lists call Shakespeare's Lucio "a fantastique" and his Apemantus "a Churlish Philosopher" is surely worthy of alphabetized record. Whether one should include in the same index with the dialogue and the stage directions the copious marginalia which edify the reader of Jonson's masques seems rather more doubtful.
  • 5. How comprehensive should the index be?

  • This is one of the most difficult questions confronting the concordance-maker. Every concordance leaves out all or most of the instances of many common words such as prepositions, articles, pronouns, and auxiliary verbs. In general, computer-prepared concordances will have to follow suit: common words make up more than half of the individual words of any text; a decision to index all of them may push some computers beyond their capacity, will in any case materially increase the running time of a very busy and very expensive machine, and will swell the printed version of, say, a Shakespeare or Jonson or Bible concordance to grotesque proportions. According to John W. Ellison, 131 common words "account for approximately 59% of the text of the Bible," and the large Nelson Bible Concordance would have been "two and a half times its present size" if these words had been indexed (Preface).

    Yet who is to say that even the commonest word is without poetic or dramatic significance? LIKE, THAN, AS, and SO can lead us directly to the poet's similes; I and related forms to his use of an autobiographical mask and his personifications ("I bring fresh showers for the thirsting flowers"); O and THOU to his apostrophes ("O wild West Wind, thou breath of Autumn's being"); ME, THEE, and HIM to striking inversions of word order ("Him the Almighty Power/ Hurled headlong flaming from th' ethereal sky"). Philological interests also press their claims. Tatlock and Kennedy index all instances of SHALL and WILL "owing to the importance of these words for the history of the future tense" (preface to the Chaucer, p. viii). The Elizabethan philologist may point out, further, that complete omission of the following common words will deprive him of an opportunity for rapid location of the special meanings indicated parenthetically: A (he), AN, AND (if), FROM (at variance with, alien to), ON (of), SHE (woman), WHETHER (which of the two). A canonical scholar may object that failure to list a dramatist's common contractions or his uses of YOU and YE will compel him to duplicate the arduous labors of Cyrus Hoy in compiling tables of linguistic preferences so as to discriminate between different authors in collaborate plays.[15]



    All true enough. But a concordance to a prolific dramatist will nevertheless have to exclude some of these words and list others only in part. Consider the ubiquitous I. In the 17,500 lines of Arnold's poetry there are more than a thousand I's, and their listing takes up almost twelve pages. In the more than 100,000 lines of Shakespeare's plays, there are probably many times that number. Will the reproduction of all these instances yield advantages proportional to the space required? The concordance-maker will have to answer many such painful questions. At least he can assure fellow scholars that omitted index-words can be retrieved from the master-tape at some later time.

    I should like to plead, however, for the routine printing in all drama concordances of such common words as ALL, ANY, NEVER, NONE. These terse counters can contribute greatly to dramatic magnitude and intensity. Moreover, in drama as in life, the extent to which a person makes categorical statements is an important clue to the quality of his mind. I have the impression, for example, that Hamlet makes more all-or-nothing assertions ("Thus conscience does make cowards of us all." "We are arrant knaves all; believe none of us.") than does any other Shakespearean character. But I cannot verify my impression in Bartlett, since it entirely omits ALL and gives only a partial listing of NONE.[16] It is doubtless a weakness on my part, but I have thus far been unwilling to make up the deficiencies in the available concordances by tracking all instances of these words through thirty-six plays.

  • 6. Should a writer's dramas and poems be indexed separately?

  • Bartlett's Shakespeare separates drama and poetry; Crawford's Marlowe combines them. Combination seems appropriate for a moderately productive writer. Separation seems desirable when a writer is prolific in one of the genres and almost inevitable when he is prolific in both (cf. Dryden). Separation facilitates study of the verbal artistry appropriate to each genre and enables the compiler to make separate decisions about comprehensiveness (e.g., to omit I from the drama but include it in the poetry concordance).
  • 7. To what extent should disputes about authorship determine the design of the concordance?

  • Crawford's Kyd and Marlowe concordances are both "designed to be helpful to students who wish to study" questions of authorship (Marlowe, p. vii). In the Kyd proper, Crawford includes Arden of Feversham, which he believes is Kyd's; in an appendix he indexes the first two quartos and the folio version of Hamlet in order to lighten "the labour of those who are interested in investigating the claim of Kyd to the Ur-Hamlet." In the Marlowe proper, he includes the three parts of Henry VI, Edward III, Selimus, and Locrine, which last, he believes, is certainly not Marlowe's but has borrowed heavily from his work.

    These procedures are indefensible. It is not the concordance-maker's office to argue for or against a disputed attribution. "That task," as Sister Eugenia Logan rightly observes in her Coleridge concordance (p. ix), "belongs in another field of scholarship." Where an attribution has in its favor evidence approaching certainty, the concordance-maker should include the attributed work. Where the evidence is weak, he should exclude the work. Where the evidence is highly probable but not certain, he should index the work and indicate its status, perhaps by an appropriate symbol. Apparently distinguishable portions of collaborate plays should be analyzed not in separate concordances, but in a single concordance bearing the names of the collaborators.[17] Anonymous plays and plays whose authorship is in serious dispute should be left for the last, and should be grouped in concordances of convenient size according to chronology of composition, as nearly as that can be determined.
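Two of the points raised under question 5 lend themselves to a small worked illustration: the share of a text taken up by common words, and the counting of words such as ALL and NONE speaker by speaker. The Python sketch below is offered tentatively; the function names, the tiny word lists, and the representation of speeches as (speaker, text) pairs are assumptions. The two speeches are the lines of Hamlet quoted above, so the counts prove nothing about the play as a whole.

```python
import re
from collections import Counter


def words_of(text):
    """Split a text into lower-case words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(text, common_words):
    """Fraction of the running words accounted for by the common words."""
    words = words_of(text)
    hits = sum(1 for word in words if word in common_words)
    return hits / len(words) if words else 0.0


def count_by_speaker(speeches, targets):
    """Count occurrences of the target words for each speaker."""
    counts = {}
    for speaker, text in speeches:
        found = (word for word in words_of(text) if word in targets)
        counts.setdefault(speaker, Counter()).update(found)
    return counts


SPEECHES = [
    ("HAMLET", "Thus conscience does make cowards of us all."),
    ("HAMLET", "We are arrant knaves all; believe none of us."),
]

print(coverage(SPEECHES[0][1], {"of", "us", "does"}))
print(count_by_speaker(SPEECHES, {"all", "any", "never", "none"}))
```

Run over a complete tape of the plays, a tally of the second kind would allow the impression about Hamlet's all-or-nothing assertions to be tested against every other character.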

Enumerative Bibliography

The concordance principle has been successfully applied to the production of fully analyzed enumerative bibliographies in the fields of chemistry and physics.[18] A new publication called Chemical Titles indexes the key words of titles of articles in 550 journals so that each key word appears in context in a concordance of key words. The bibliography in each issue consists of two parts: a list of articles alphabetized according to author, with full titles and publication data, and a concordance of key title-words in context, with an easily interpreted identifying code that provides a cross-reference to the first part. On the average, there are 5.3 concordance-entries per title. The indexing of key words is entirely automatic. As soon as the journals are received, the titles of articles are transcribed into machine-readable form by punch-card operators. When the file of punch-cards is complete, it is transferred to magnetic tape, which is then processed by an IBM 704. Omitting non-distinctive words such as OF, ON, AND, EFFECTS, ANALYSIS, CHEMISTRY by referring to a dictionary of excluded terms stored in its core memory, the computer edits the materials on the tape in 12 minutes; auxiliary equipment arranges and prints the bibliography in 18 to 20 hours; altogether no more than 21 days elapse between the time the journals are received and the time Chemical Titles is published.
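The procedure just described, in which each key word of a title is listed in context after a dictionary of excluded terms has been consulted, can be approximated as follows. This Python sketch is an assumption-laden illustration and not the Chemical Titles program: the titles, the identifying codes, the column width, and the function name `kwic` are invented, and only part of the exclusion list comes from the words named above.

```python
# OF, ON, AND, EFFECTS, ANALYSIS, CHEMISTRY are named in the text; THE and IN are added here.
EXCLUDED = {"OF", "ON", "AND", "EFFECTS", "ANALYSIS", "CHEMISTRY", "THE", "IN"}

# Hypothetical identifying codes and titles.
TITLES = {
    "A001": "Effects of Temperature on Crystal Growth",
    "B002": "Analysis of Crystal Structure in Quartz",
}


def kwic(titles, excluded=EXCLUDED, width=30):
    """Return (key word, printed line) pairs, alphabetized by key word."""
    rows = []
    for code, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            key = word.upper()
            if key in excluded:
                continue  # non-distinctive words receive no entry
            left = " ".join(words[:i])[-width:]
            right = " ".join(words[i + 1:])[:width]
            rows.append((key, f"{left:>{width}}  {key}  {right:<{width}}  {code}"))
    rows.sort(key=lambda row: row[0])  # stable sort keeps title order within a key word
    return rows


index = kwic(TITLES)
for _, line in index:
    print(line)
```

Each of the two sample titles yields three entries, and the code at the end of a line plays the part of the cross-reference to the author list.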

Leading scholarly organizations should seriously consider the production of bibliographies by computer. Apart from the rapidity of its appearance in print and its comprehensiveness, a computer-produced bibliography in the field of English literature will satisfy the criterion of full analysis more completely than any of the presently available bibliographies. Each item will appear under various subject-headings, so that a scholar interested in a subject covered only in part in a given book, chapter, or article will find a reference to that source under the subject-heading of his interest. A title such as "Hamlet, Antonio's Revenge, and the Ur-Hamlet"[19] ought rightly to appear not only under "Shakespeare" but also under "Marston" and under the key word "Revenge," so that the article will be brought to the attention of anyone interested in the theme of revenge in Elizabethan literature. In the latest PMLA bibliography, the title seems to appear only under "Shakespeare."[20] I am not, of course, singling out the PMLA bibliographies for special criticism. Within the limits of their chosen form, they are admirably comprehensive, and they offer many cross-references; the instance just mentioned is doubtless atypical. My point, rather, is that all of the present bibliographies are subject to human error which can be much reduced by computer techniques and that none of them meet the criterion of full analysis, which computers can easily satisfy.

A certain amount of processing will probably be necessary before one can be certain that a reference will appear under all appropriate rubrics. If "Shakespeare" and "Marston" were inserted in square brackets in the title cited above, the article would be automatically indexed under those names. For purposes of subject-analysis, it would help if scholars curbed their metaphorical propensities and made their titles as nearly descriptive as possible. But even such titles as The Unicorn and the Crocodile can be made to yield their literal contents, sometimes by reference to the descriptive subtitle (A Study of Allegorical Motifs in Medieval and Renaissance Painting and Literature), sometimes by a little effort on the part of the scholars who will review the titles before passing them on to punch-card operators.

May one take this opportunity to plead for a coordinated effort to satisfy still another criterion of enumerative bibliography, the criterion of efficiency or non-duplication? As printed materials in active fields of research increase exponentially, it becomes increasingly wasteful for independent groups of workers to prepare largely identical bibliographies. The devotion of scholars who spend long hours compiling indexes of current research is impressive, but it is disheartening to think of the extravagant repetition of routine tasks. Does it really serve the needs of our profession to produce half a dozen annual bibliographies of Shakespearean scholarship? Would not one bibliography — complete, fully analyzed, and swiftly produced by data-processing machines — suffice?

A bibliography becomes even more useful when it provides a brief summary of the contents of a work. Since the beginning of 1958, English Abstracts has been filling a serious gap in research resources. But this excellent publication may some day be confronted by grave problems. Its coverage has been growing constantly and gives every promise of continuing to do so. In January, 1958, the journal listed 32 abstractors on its cover; in December, 1960, it listed 122, almost a fourfold increase. To be sure, the number of items abstracted did not increase by so large a factor. But it did increase very considerably. The first three issues of 1958 printed 426 abstracts on 79 pages; the last three issues of 1960 printed 678 abstracts on 144 pages, an increase, in less than three years, of about 60 per cent in the number of items and 80 per cent in the number of pages. In the not so distant future, English Abstracts, like its kindred services in scientific fields, may be forced to seek machine aid to avoid being engulfed by a tidal wave of publications. Should such a need arise, there is a good chance that data-processing machines will be able to meet it. H. P. Luhn, the IBM engineer whose research made possible the concordance-index of Chemical Titles, has already conducted successful experiments in automatic abstracting.[21] Before very many years pass, text-reading machines may be scanning printed articles, encoding them on magnetic tape, and producing a rapid succession of automatic abstracts, in translation where necessary.

Studies in Attribution

Concordances and magnetic tape files will obviously facilitate the gathering of internal evidence for the solution of canonical problems. They will enable one to find parallels more rapidly and to make various special checks. Professor R. C. Bald, for example, observes that Hand D in The Booke of Sir Thomas More makes likely a "graphic confusion between x and y" and that such a confusion seems to have occurred in Troilus and Cressida v.i.16, where "the Quarto reads 'box' [and] the Folio corrects to 'boy.'"[22] J. Dover Wilson has collected similar examples of misreadings which could easily have resulted if a good quarto was printed from copy in a hand such as D's.[23] If all of the Shakespeare quartos and the First Folio were recorded on magnetic tape, one could ask a computer to sort out all words that ended in x and y and other easily confused characters. Or one could ask it to search tapes of other Elizabethan dramatists for complete lists of linguistic preferences such as Professor Hoy used to determine the shares of various collaborators in the Beaumont and Fletcher canon. Freed from the tedium of amassing examples, scholars could devote their higher energies to the interpretation of evidence retrieved and classified by machines.[24]
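By way of illustration, a check of this kind might be sketched in a present-day programming language (Python) roughly as follows. The sketch is not drawn from any existing program: the function names are hypothetical, the two sample lines are invented for the purpose and are not quotations from any quarto or folio, and only the letters x and y are taken from the discussion above. It assumes that the two texts have already been aligned line by line.

```python
import re
from collections import defaultdict

# Only x and y come from the essay; other confusable letters would be added here.
CONFUSABLE_ENDINGS = ("x", "y")


def words_by_ending(lines, endings=CONFUSABLE_ENDINGS):
    """Map each ending to a sorted list of (word, line_number) pairs."""
    found = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z]+", line.lower()):
            for ending in endings:
                if word.endswith(ending):
                    found[ending].append((word, number))
    return {ending: sorted(hits) for ending, hits in found.items()}


def variant_pairs(quarto_lines, folio_lines, endings=CONFUSABLE_ENDINGS):
    """List (line_number, quarto_word, folio_word) for words in the same
    position that differ only in a confusable final letter."""
    pairs = []
    for number, (q_line, f_line) in enumerate(zip(quarto_lines, folio_lines), start=1):
        q_words = re.findall(r"[a-z]+", q_line.lower())
        f_words = re.findall(r"[a-z]+", f_line.lower())
        for q, f in zip(q_words, f_words):
            same_stem = q[:-1] == f[:-1]
            if q != f and same_stem and q[-1] in endings and f[-1] in endings:
                pairs.append((number, q, f))
    return pairs


# Invented sample lines, standing in for aligned quarto and folio readings.
quarto = ["the box stood by the door"]
folio = ["the boy stood by the door"]
print(variant_pairs(quarto, folio))  # [(1, 'box', 'boy')]
```

A list of this sort only assembles candidates; whether a given pair reflects a misreading of copy remains a matter for the scholar's judgment.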

Again, the more concordances there are, the easier it will be to make negative checks, that is, to show that a seemingly unusual parallel occurs in many writers and is not therefore probative of a particular author's claim to an anonymous work. One hopes that the use of parallels, whether for purposes of proof or disproof, will cease to be fragmentary and unsystematic. If we had reliable information about the average frequencies of certain locutions in the vocabularies of educated men using certain forms of discourse at a certain time, the coincidence of many above- or below-average occurrences of even common phrases might become probative of authorship. Perhaps the accumulation of linguistic frequencies by computers will encourage mathematicians with an interest in literature or literary scholars with a flair for mathematics to push onward in the directions indicated by G. Udny Yule's The Statistical Study of Literary Vocabulary.[25]
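The comparison of observed with average frequencies can be made concrete in a small sketch. What follows is an assumption-laden illustration in Python, not a statistical method proposed by Yule or by anyone else cited here: the function names, the rate per thousand words, the sample sentence, and the baseline figures are all invented, and a serious study would need real baseline counts and a proper test of significance.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def rate_per_thousand(text, phrase):
    """Occurrences of `phrase` (one or more words) per 1,000 words of `text`."""
    words = tokenize(text)
    target = tokenize(phrase)
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 1000.0 * hits / len(words)


def deviations(text, baseline_rates):
    """For each phrase return (observed_rate, baseline_rate, direction),
    where direction is 'above', 'below', or 'equal'."""
    report = {}
    for phrase, baseline in baseline_rates.items():
        observed = rate_per_thousand(text, phrase)
        if observed > baseline:
            direction = "above"
        elif observed < baseline:
            direction = "below"
        else:
            direction = "equal"
        report[phrase] = (observed, baseline, direction)
    return report


# Invented text and invented baseline rates, for illustration only.
sample = "i do beseech you hear me i do beseech you"
baseline = {"i do": 50.0, "you": 300.0}
print(deviations(sample, baseline))
# {'i do': (200.0, 50.0, 'above'), 'you': (200.0, 300.0, 'below')}
```

The interest of such a table lies in the coincidence of many deviations pointing the same way, not in any single figure.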

Unfortunately, however, electronic computers and their printed products will probably fail to discourage some scholars from playing the game of parallels badly. Those who in the past have been intent on parading insignificant agreements between two texts as strong arguments for common authorship have seldom taken the trouble to make negative checks in available concordances. Will a special pleader in a hurry pause to reflect merely because aids to reflection are more abundant? "It is the peculiar and perpetual error of the human intellect," Francis Bacon warns us (Novum Organum, I, xlvi), "to be more moved and excited by affirmatives than by negatives; whereas it ought properly to hold itself indifferently disposed towards both alike." "Indeed," he adds, "in the establishment of any true axiom, the negative instance is the more forcible of the two." Computers will not do away with the Idols of the Tribe; to guard against such illusions is the province of education in the spirit of scholarly and scientific argument. With the spread of that spirit, one may hope with Bacon (I, cxxx) "that the art of discovery may advance as discoveries advance."

Meanwhile, one trusts that more and more scholars will find ways to advance Elizabethan studies by enlisting the aid of electronic data-processing machines. The chief barrier to such an effort is likely to be a lingering suspicion that these machines are somehow baleful, that they somehow constitute a threat to the humanist's distinctive values. But such fears are groundless; they can only be damaging to the progress of Elizabethan and indeed of all humane studies. It is surely inhumane to scorn mechanical aids which by releasing from soul-killing drudgery that most remarkable of all instruments, the brain, free it for its proper function: the enlargement of man's intellectual and spiritual realms through the use of creative intelligence. It may be appropriate to conclude with a striking Elizabethan example of humanist initiative and persistence in making available a novel means for the achievement of a noble end.[26] On March 16, 1542/3, the musician John Marbeck, organist at St. George's Chapel, Windsor, was arrested for possessing heretical writings, among which were materials for a concordance to the English Bible. On July 26, 1544, Marbeck was found guilty of heresy and was sentenced to die at the stake the following day. Fortunately for music and scholarship, however, he was pardoned by Henry VIII and was released from prison. When the accession of Edward VI created a friendlier climate for innovation, Marbeck again took up his suppressed project. In July of 1550 he at long last published A Concordāce, that is to saie a worke wherein by the ordre of the letters of the A. B. C. ye maie redely finde any word conteigned in the whole Bible, so often as it is there expressed or mencioned. Elizabethan scholars may well take inspiration from this Elizabethan precedent.

Notes

 
[1]

The present essay is a revised version of a paper written for the Report of M.L.A. Conference 20 (Opportunities for Research in Renaissance Drama) and delivered before the Conference on Dec. 28, 1960. I am indebted to my colleague Professor Stephen M. Parrish, who introduced me to the mysteries of computers and who has answered my queries with invariable kindness and lucidity.

[2]

See the classified bibliography in B. Quemada, "La Mécanisation dans les Recherches Lexicologiques," in the Univ. of Besançon Cahiers de Lexicologie, I (1959), 41-46. See also Proceedings of the International Conference on Scientific Information, 2 vols. (1959); Martha Boaz, ed., Modern Trends in Documentation (1959); M. E. Maron, "Handling of Non-Numerical Information," Chap. 11 in Vol. 2 of Handbook of Automation, Computation, and Control, ed. Eugene M. Grabbe et al. (1959); and for a non-technical discussion of information-retrieval systems, Francis Bello, "How to Cope with Information," Fortune, LII (Sept., 1960), 162-167, 180-192.

[3]

I use "Elizabethan" as a convenient term referring to the period from Wyatt to Milton.

[4]

Preface to Nelson's Complete Concordance to the Revised Standard Version Bible (1957).

[5]

In "Literary Data Processing," IBM Journal of Research and Development, I (1957), 256, Paul Tasman gives comparative figures for compiling a lexicon file index and concordance to the Summa Theologica of St. Thomas Aquinas (a work of about 2,000 pages and almost 1,600,000 words): manual method (3 persons, 20,000 hours); punched-card method (3 persons, 1,000 hours); large-scale data-processing method (1 person, 60 hours, "exclusive of the presentation and programming time"). ("Programming" means devising a sequence of operations so that a computer can perform a particular job of data-processing.) For discussions of techniques involving small-scale equipment, see n. 10, below.

[6]

See Tasman (n. 5, above), and Busa, "The Index of all Non-Biblical Dead Sea Scrolls Published up to December, 1957," Revue de Qumran, I (1958), 187-198.

[7]

For a breakdown of the time, which is, again, exclusive of editorial work and of programming, see the preface to the Arnold, pp. vii-viii; for an account of the programming, see James A. Painter, "Computer Preparation of a Poetry Concordance," Communications of the A[ssociation for] C[omputing] M[achinery], III (1960), 91-95.

[8]

The Unesco Courier, Jan., 1960, pp. 26-27.

[9]

See British Manuscripts Project (1955); this insufficiently known checklist of the microfilms, compiled by Lester K. Born, is sold by the Photoduplication Service of the Library of Congress.

[10]

See the extended discussion by Quemada, "La Mécanisation," pp. 9-33. Cf. also the useful mimeographed Reports of the Groth Institute, founded by Professor Ray Pepinsky at the Pennsylvania State University. With the aim of preparing a revised edition of Paul von Groth's encyclopedia of crystallography "in perhaps a hundred volumes," Professor Pepinsky and his co-workers have developed effective techniques of indexing and information-retrieval using inexpensive electro-mechanical equipment: see esp. Reports Nos. 40, 41, 44-48, 53.

[11]

Parrish, preface to the Arnold, p. xiii, n. 1.

[12]

John W. Ellison, for example, is planning to collate electronically 800 versions of the Greek text of the Bible.

[13]

Tasman, p. 256, mentions reconstructions of lacunae in the Dead Sea Scrolls. "Up to five consecutive words," he reports, "have been 're-written' by the data processing machine in experimental tests where the words were intentionally left out of the text and blank spots indicated."

[14]

I am indebted to Professor Fredson Bowers for this suggestion.

[15]

See Hoy's "The Shares of Fletcher and his Collaborators in the Beaumont and Fletcher Canon," SB, VIII (1956), 129-146, IX (1957), 143-162, XI (1958), 85-106, XII (1959), 91-116, and XIII (1960), 77-108.

[16]

Neither word is indexed in Mrs. Cowden Clarke's concordance; except for a few inadvertent omissions, all occurrences in Hamlet only are listed in the Appendix to Crawford's Kyd concordance.

[17]

Here as elsewhere, procedure will have to be flexible. It would probably be advisable to include all the plays in the "Beaumont and Fletcher" canon in one concordance. Again, if a successor to Bartlett believes that hands other than Shakespeare's are present in Henry VIII and Pericles, he should nevertheless include these two plays in his concordance and content himself with stating his views in the preface.

[18]

See "Chemical Literature Gets a Quicker Index," Chemical and Engineering News, XXXVIII (April 4, 1960), 27-28.

[19]

John Harrington Smith et al., SQ, IX (1958), 493-498.

[20]

LXXV, No. 2 (May, 1960), 203.

[21]

See Luhn's "Auto-Encoding of Documents for Information Retrieval Systems," in Boaz, Trends in Documentation, pp. 45-58.

[22]

"The Booke of Sir Thomas More and its Problems," Shakespeare Survey, II (1949), 58.

[23]

"Bibliographical Links Between the Three Pages and the Good Quartos," in A. W. Pollard et al., Shakespeare's Hand in "Sir Thomas More" (1923), pp. 113-141.

[24]

The kinds of evidence bearing on attribution are more various than can be discussed here. By counting distances between punctuation marks, a computer can gather statistics about sentence-length and sentence-segmentation. By collecting all words with certain medial or terminal letters, it can help one to establish authorial or compositorial spellings. By retrieving blank-verse lines with two- or three-letter final words, it can provide information about weak endings. The scholar armed with knowledge of a computer's capabilities will readily think of ways of exploiting them.
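Two of these counts may be sketched in Python as a rough illustration. The function names, the choice of full stop, question mark, and exclamation mark as sentence-final marks, and the sample lines are assumptions invented for the purpose; the two- and three-letter limits follow the description above.

```python
import re


def sentence_lengths(text):
    """Word counts of the stretches between sentence-final marks (. ? !)."""
    pieces = re.split(r"[.?!]+", text)
    return [len(piece.split()) for piece in pieces if piece.split()]


def short_final_word_lines(lines, min_letters=2, max_letters=3):
    """Return (line_number, final_word) for lines whose last word is short,
    as a first rough sieve for weak endings."""
    hits = []
    for number, line in enumerate(lines, start=1):
        words = re.findall(r"[A-Za-z']+", line)
        if words and min_letters <= len(words[-1]) <= max_letters:
            hits.append((number, words[-1].lower()))
    return hits


# Invented lines, not quotations from any play.
verse = [
    "I know not what it is that I should be",
    "And therefore will I speak to him tomorrow",
    "Nor can I tell the meaning of",
]
print(sentence_lengths("Come hither. What news? None, my lord!"))  # [2, 2, 3]
print(short_final_word_lines(verse))  # [(1, 'be'), (3, 'of')]
```

A short final word is not necessarily a weak ending, so the retrieved lines would still have to be read and classified by the scholar.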

[25]

Since I wrote these remarks, I have learned that Professors Frederick Mosteller and David L. Wallace, using computers, have applied statistical methods to the determination of the authorship of the disputed Federalist papers. A report on their work will appear shortly in a book on the Harvard Computer Symposium.

[26]

I am indebted to Professor J. B. Bessinger, Jr., editor of the forthcoming Cornell Concordance to Old English poetry, for calling my attention to Marbeck's concordance. The biographical information that follows is taken from the DNB.

