Electronic Computers and Elizabethan Texts
by
Ephim G. Fogel
Librarians, archivists, linguists, and students of literature are rapidly coming to realize that electronic computers, or, better, data-processing machines, can help to solve problems in fields ordinarily regarded as remote from the world of advanced technology.[1] Already there is a formidable bibliography of books and articles discussing automation in the library, automatic search, indexing, and abstracting of documents, automatic linguistic analysis, and automatic translation.[2] So far as I know, however, no papers have been published on the application of electronic aids to the solution of problems in Elizabethan scholarship. The chief purpose of the present essay is to stimulate wider discussion of such applications. I shall concentrate mainly on the possible uses of computer-prepared concordances to and magnetic tape files of Elizabethan texts.[3]
I
It has long been recognized that concordances are essential tools in the critical, historical, and philological analysis of literary texts. Until very recently, it was also apparent that anyone who agreed to compile a concordance had assumed an appalling task. "An exhaustive concordance
That concordance-makers should turn to electro-mechanical and electronic aids was only to be expected. After World War II, there were efforts to compile indexes by the use of punched-card systems. But the limited capacity and sorting speeds of electro-mechanical equipment made the automatic production of very large concordances impractical. In the last few years, therefore, researchers have turned to large-scale electronic data-processing machines such as those marketed by Remington Rand and IBM.[5] The year 1957 witnessed three independent developments: Paul Tasman, with the collaboration of Rev. Roberto Busa, S. J., worked out a program for indexing the words in the Dead Sea Scrolls on the IBM 705;[6] John W. Ellison brought out Nelson's Complete Concordance to the Revised Standard Version Bible, automatically indexed by the Remington Rand Univac I; and Cornell University launched a program for a computer-produced series of concordances, with Stephen M. Parrish as General Editor.
In the same year, the University of California published the late Guy Montgomery's concordance to Dryden's poetry. Since this cumbersome oddity has given many of its users an erroneous impression of what a machine-prepared concordance looks like, it deserves some mention here. One must emphasize that it is not at all an electronically produced work and indeed only in small part an electro-mechanically produced one. When Professor Montgomery died in 1951, he left 240,000 manually indexed cards based on Noyes's edition of Dryden's complete poetical works. Out of the decision to use accounting machines to help in checking these cards grew the decision to print by offset from IBM sheets a list of index-words with abbreviated references to the places where they occurred, but without any context whatsoever. A sample entry from page 1 of the resulting concordance will indicate the difficulties that confront the user:
Professor Parrish's Concordance to the Poems of Matthew Arnold (Ithaca, 1959) shows that a concordance compiled and printed by electronic data-processing machines (in this case the IBM 704) can give as complete a verse-context and an array of identifying data as the manually compiled type. The first three entries under ABIDE will indicate the advantages of the Arnold:
- OTHERS ABIDE OUR QUESTION THOU ART FREE . . 2 SHAKESPEARE 1
- HE ESCAPES THENCE BUT WE ABIDE . . . . . . . . . 58 RESIGNATION 213
- THE LAW IS PLANTED TO ABIDE . . . . . . . . . . . . . 94 SICK KING BOKH 208
Cornell concordances to follow the Arnold will incorporate refinements as rapidly as they are developed. Special print-wheels will provide a full array of punctuation marks and of characters such as the thorn and the ligatures (the Arnold has only the hyphen). Presently available techniques can instruct computers to discriminate between homographs and print them under separate headings, to cross-index hyphenated words, and, for earlier poets, to collect the old-spelling variants of a single word under their modern-spelling equivalent, as in Osgood's Spenser or Tatlock and Kennedy's Chaucer.
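The indexing principle behind such entries can be illustrated in modern terms. What follows is only a minimal sketch, not the program actually run on the IBM 704: the record layout, the function names, and the table of variant spellings (including the modern equivalents chosen for IELOSIE and IOYND) are assumptions made for illustration.

```python
from collections import defaultdict
import re

# Sample records taken from the three ABIDE entries quoted above:
# (page, title, line number, text of the verse line).
LINES = [
    (2, "SHAKESPEARE", 1, "OTHERS ABIDE OUR QUESTION THOU ART FREE"),
    (58, "RESIGNATION", 213, "HE ESCAPES THENCE BUT WE ABIDE"),
    (94, "SICK KING BOKH", 208, "THE LAW IS PLANTED TO ABIDE"),
]

# Hypothetical table mapping old-spelling variants to a modern head-word.
VARIANTS = {"IELOSIE": "JEALOUSY", "IOYND": "JOINED"}


def build_concordance(lines, head_words=None, omit=frozenset()):
    """Index every word of every line under its head-word.

    A word that occurs twice in one line yields two entries for that line.
    """
    head_words = head_words or {}
    index = defaultdict(list)
    for page, title, line_no, text in lines:
        for word in re.findall(r"[A-Z]+", text.upper()):
            head = head_words.get(word, word)
            if head in omit:
                continue  # a "common" word the editor chose not to print
            index[head].append((text.upper(), page, title, line_no))
    return index


def format_entry(entry):
    """Lay out one entry as verse-context followed by page, title, line."""
    text, page, title, line_no = entry
    return f"{text} . . {page} {title} {line_no}"


if __name__ == "__main__":
    concordance = build_concordance(LINES, head_words=VARIANTS, omit={"THE"})
    for entry in concordance["ABIDE"]:
        print(format_entry(entry))
```

Keeping the variant table and the omission list as separate arguments mirrors the editorial situation: both are decisions of the scholar, not properties of the indexing routine.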
New possibilities in concordance-making and in other kinds of literary data-processing will doubtless emerge as computers rapidly become more and more complex, swift, and powerful. "The latest [computers]," writes Ritchie Calder, "are a thousand times faster than those of three years ago and a million times faster than those of ten years ago," and he reports that in June, 1959, in Paris, at an International Conference on Information Processing, scientists seriously discussed "machines which would memorize all the knowledge in the world."[8] One's mind reels and retreats to somewhat less staggering fantasies in which the C. W. Wallaces and Leslie Hotsons of the twenty-first century, working in American repositories, ask computers to search magnetic tapes of British archives for all occurrences of names with, say, the components Sh, k, sp, r or M, r, l. A daydream high fantastical, perhaps; yet the photoduplication during World War II of a vast number of British
But consideration of what can be done now is likely to be more fruitful than heady speculations about the future. Literary scholars should give earnest thought to making use of the machines available to them on their campuses: much can be done even with small-scale computers or punch-card and perforated-tape equipment.[10] Efforts should be coordinated in order to determine important needs in different specialties, to prevent duplication of work at different universities, and to disseminate information about new developments in the processing of literary texts. In this connection I am authorized to state that the Department of English at Cornell University will be glad to share its experience in preparing concordances by computer, and its knowledge of work being done at other centers, with those who may be ready to embark on projects of their own.
II
The scholar is the key person in the development of specific programs to process literary data. It is he who must define goals for research and arrive at the most rational procedures for achieving these goals. His indispensable colleague, the computer-engineer, cannot move forward until the scholar himself knows what he wants to do. On the other hand, the scholar must have some awareness of how a literary text is prepared for computer-processing. I shall limit myself here to a simplified, non-technical outline of the steps required to record a text on magnetic tape.
- i. Having selected a base-text, the scholar edits it for punching.
- ii. Working at a machine with a conventional typewriter keyboard, an operator punches the text on cards. Each card contains a line of poetry or a similar amount of prose; the punched text is automatically recorded in print at the top of the card.
- iii. The cards are verified to insure accuracy of transcription. In this process, a second operator punches the same text on the already punched cards. A light flashes if there is any discrepancy between her punch and that of the first operator; she then pulls the card in question, checks the print for errors, and punches the line correctly.
- iv. Identifying data (page of base-edition and line of poem) are automatically punched onto each card in the set; title cards separately punched for each poem are introduced into the set.
- v. The information on each card is transferred seriatim onto a magnetic master-tape and can then be processed on a computer according to a previously designed program. When the computer-run is finished, the master-tape can be stored indefinitely, or processed again as required. Through the use of other tapes, the recorded data can be altered so that a fresh master-tape is produced.
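The steps above can be pictured as a simple data structure. The sketch below is illustrative only: the `Card` record, the two function names, and the simplifying assumption that all lines of a poem fall on one page are mine, and nothing here reproduces the actual card or tape formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One punched card: a line of text plus its identifying data."""
    title: str
    page: int
    line: int
    text: str


def verify(first_keying, second_keying):
    """Step iii: return the positions at which two keyings disagree.

    Assumes both operators keyed the same number of lines.
    """
    return [
        i
        for i, (a, b) in enumerate(zip(first_keying, second_keying))
        if a != b
    ]


def make_master_tape(title, page, first_line, texts):
    """Steps iv and v: attach identifying data and store the cards serially.

    Simplification: every line is assumed to fall on the same page.
    """
    return [
        Card(title, page, first_line + offset, text)
        for offset, text in enumerate(texts)
    ]


if __name__ == "__main__":
    first = ["OTHERS ABIDE OUR QUESTION", "THOU ART FREE"]
    second = ["OTHERS ABIDE OUR QUESTION", "THOU ART FREF"]  # a mis-punch
    print(verify(first, second))  # positions needing re-punching
    for card in make_master_tape("SHAKESPEARE", 2, 1, first):
        print(card)
```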
It will be apparent that the master-tape is in many ways more important than any single list of analyzed data which can be automatically printed from it. The tape is a compact, permanent record. It gives the editor of a computer-prepared concordance, for example, much greater flexibility than the editor of a manually prepared work can enjoy. If he decides at the last moment to include five "common" words that he had originally planned not to print, he need only instruct the computer to add those words to its processing list. And long after his concordance is published, he can quickly retrieve from the tape any verbal data excluded from the book: "the [IBM 704] computer can locate all occurrences of even a high-frequency word in about 20 minutes."[11]
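Retrieval of a word excluded from the printed book amounts to one serial pass over the recorded text. A minimal sketch follows, assuming the tape can be modeled as a list of (reference, text) pairs; the function name and the reference strings are hypothetical.

```python
import re


def find_occurrences(tape, word):
    """Scan the tape serially and return every record containing `word`.

    `tape` is assumed to be a list of (reference, text) pairs. The word is
    matched as a whole word, without regard to case.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [(ref, text) for ref, text in tape if pattern.search(text)]


if __name__ == "__main__":
    tape = [
        ("line 1", "I bring fresh showers for the thirsting flowers"),
        ("line 2", "O wild West Wind, thou breath of Autumn's being"),
    ]
    # A high-frequency word left out of the printed concordance can still
    # be recovered from the stored text on request.
    print(find_occurrences(tape, "I"))
```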
So far as data-processing equipment is concerned, then, the tasks of Elizabethan scholarship in the coming years may be defined as the recording on master-tapes of the widest possible array of literary works in their most authoritative and most usable textual form; the duplication and depositing of such tapes in key centers of scholarship; the searching of the tapes on request to provide individual scholars with information that will increase the comprehensiveness and validity of their conclusions; and the selective publishing of machine-prints made from these tapes (concordances, lexicons, frequency lists, textual collations, etc.) so as to serve the needs of the profession as a whole. It will of course be necessary for appropriate groups of scholars to rationalize and allocate these labors.
For the rest of this paper, I should like to suggest the kinds of aid that philology, textual criticism, concordance-making, enumerative bibliography, and canonical studies can expect from data-processing machines. I am obviously taking on more than can be handled by any man, unless there exists somewhere a Hercules who is both a computer-engineer
Philology
The philological and linguistic applications of computers have been much discussed. Various classes of documents from different historical periods and linguistic communities can be recorded on tapes which computers can then process to produce indexes of graphic and graphic-semantic forms, with accompanying context. As a result, philological studies can be more comprehensive and exact than in the past. Anyone who has come upon instances of a usage earlier than those recorded by the monumental Oxford English Dictionary, and upon other usages that are not recorded at all, can testify to the need for a complete and accurate dictionary of Elizabethan English. Computers could speed the publication of such a dictionary.
Textual Criticism
An editor preparing a critical old-spelling edition of an Elizabethan poet or dramatist must process an enormous amount of literary data. Collation of early printed editions can be facilitated by machines, as Professor Charlton J. K. Hinman has demonstrated in his collation of dozens of copies of the First Folio. Future editors will also wish to explore the possibilities of computer collation. The more complicated the textual tradition, the more the scholar will appreciate electronic aid in reconstructing a stemma.[12] Again, every editor has to analyze such matters as characteristic locutions and linguistic preferences through all of his author's extant writings, as well as the spelling of any surviving holographs, so that he can decide to what extent a base-text which is not a holograph, and perhaps was not transcribed or printed from a holograph, represents his author's idioms and orthography. In the past, an editor has had to depend upon his memory, at best an incomplete and unreliable guide, or to compile by hand a private concordance, as it were, an index of his author's graphic-semantic patterns. Computers can relieve him of this labor, which adds far too much to his already heavy burdens. Accurate and complete counts of an author's particular word-sequences can also help to detect contaminations. On the basis of such analyses, computers can automatically fill in lacunae or offer conjectural
Since a concordance is valuable for textual criticism, we may expect that editors of Elizabethan dramatists and poets will also edit concordances to their authors. A preliminary tape that will help the editor to establish his text can be corrected to embody final editorial decisions and a concordance can then be published by offset from machine-prints.
Concordance-Making
Here I should like to pose a series of questions. Some of the answers given below have been worked out during preliminary preparations for a concordance to the poems of Ben Jonson, to be edited by Professor Parrish and myself. None of the answers, however, are necessarily final, and I should appreciate comments and suggestions.
- A concordance will have the greatest value for philologists, editors, and canonical scholars if it is based on a definitive edition, preferably in old spelling. The concordance can then refer to the page and line numbers of a readily available standard work and can include editorial emendations as well as authorial variants. If there is no definitive edition, it might be possible to compile a concordance from an early printed text, provided that an acceptable photoduplicate of that text has been published.[14] In that case, the concordance-maker can identify citations by referring to signature and line number of page or column, in the manner of Professor Hinman's references to lines in the Shakespeare First Folio. But reference to authorial variants in other early texts (one thinks of Daniel's and Drayton's frequent revisions) and to modern emendations will perhaps pose a problem.
If an acceptable photoduplicate or a definitive edition is unavailable, the concordance-maker should probably pass on to another author. To provide the general reader with references to a virtually inaccessible text is of little use, and to base a concordance on an inadequate edition is unwise. If there were a Jonson concordance based on the Gifford text, it would of course be helpful, but it would have to be redone now that the Herford and Simpson text is available, even as Bartlett's Shakespeare will probably have to be redone when a critical old-spelling edition appears.
- It seems to serve no useful purpose and is in some cases impossible to retain scribal abbreviations. On the other hand, to normalize i-j and u-v according to modern usage, if the base-edition has not done so, seems to require excessive intervention extending to many lines in every poem or passage of dialogue. The automatic collection of variant spellings under a single head-word will assure that such forms as IELOSIE and IOYND will be conveniently indexed under their modern equivalents.
- As has been indicated, one should include authorial variants. Both a textual crux and the emendation adopted by the editor of the base-text should be indexed. Variants and emendations should be labeled as such (in Professor Parrish's Arnold, a "V" for "variant" precedes the line number).
The concordance-maker is not the editor of a critical edition, but he should correct obvious misprints in his base-text and include variants unavailable to or perhaps overlooked by the editor. Sometimes he may have to display the courage of an editor's convictions. The Herford and Simpson text, for example, reproduces in square brackets "a letter or word wrongly inserted in the original." There is no point in indexing such a word; in our Jonson concordance, we have substituted the reading which Herford and Simpson indicate as correct.
- By all means. Stage directions are important elements in plays, masques, pageants, and entertainments. But Ariel's making the banquet vanish with a quaint device and Jack Cade's striking his staff on London stone cannot be located in Bartlett's concordance, which omits all stage directions, as do Crawford's Kyd and Marlowe. Lists of dramatis personae in the early prints should also be included; that such lists call Shakespeare's Lucio "a fantastique" and his Apemantus "a Churlish Philosopher" is surely worthy of alphabetized record. Whether one should include in the same index with the dialogue and the stage directions the copious marginalia which edify the reader of Jonson's masques seems rather more doubtful.
- This is one of the most difficult questions confronting the concordance-maker. Every concordance leaves out all or most of the instances of many common words such as prepositions, articles, pronouns, and auxiliary verbs. In general, computer-prepared concordances will have to follow suit: common words make up more than half of the individual words of any text; a decision to index all of them may push some computers beyond their capacity, will in any case materially increase the running time of a very busy and very expensive machine, and will swell the printed version of, say, a Shakespeare or Jonson or Bible concordance to grotesque proportions. According to John W. Ellison, 131 common words "account for approximately 59% of the text of the Bible," and the large Nelson Bible Concordance would have been "two and a half times its present size" if these words had been indexed (Preface).
Yet who is to say that even the commonest word is without poetic or dramatic significance? LIKE, THAN, AS, and SO can lead us directly to the poet's similes; I and related forms to his use of an autobiographical mask and his personifications ("I bring fresh showers for the thirsting flowers"); O and THOU to his apostrophes ("O wild West Wind, thou breath of Autumn's being"); ME, THEE, and HIM to striking inversions of word order ("Him the Almighty Power/ Hurled headlong flaming from th' ethereal sky"). Philological interests also press their claims. Tatlock and Kennedy index all instances of SHALL and WILL "owing to the importance of these words for the history of the future tense" (preface to the Chaucer, p. viii). The Elizabethan philologist may point out, further, that complete omission of the following common words will deprive him of an opportunity for rapid location of the special meanings indicated parenthetically: A (he), AN, AND (if), FROM (at variance with, alien to), ON (of), SHE (woman), WHETHER (which of the two). A canonical scholar may object that failure to list a dramatist's common contractions or his uses of YOU and YE will compel him to duplicate the arduous labors of Cyrus Hoy in compiling tables of linguistic preferences so as to discriminate between different authors in collaborate plays.[15]
All true enough. But a concordance to a prolific dramatist will nevertheless have to exclude some of these words and list others only in part. Consider the ubiquitous I. In the 17,500 lines of Arnold's poetry there are more than a thousand I's, and their listing takes up almost twelve pages. In the more than 100,000 lines of Shakespeare's plays, there are probably many times that number. Will the reproduction of all these instances yield advantages proportional to the space required? The concordance-maker will have to answer many such painful questions. At least he can assure fellow scholars that omitted index-words can be retrieved from the master-tape at some later time.
I should like to plead, however, for the routine printing in all drama concordances of such common words as ALL, ANY, NEVER, NONE. These terse counters can contribute greatly to dramatic magnitude and intensity. Moreover, in drama as in life, the extent to which a person makes categorical statements is an important clue to the quality of his mind. I have the impression, for example, that Hamlet makes more all-or-nothing assertions ("Thus conscience does make cowards of us all." "We are arrant knaves all; believe none of us.") than does any other Shakespearean character. But I cannot verify my impression in Bartlett, since it entirely omits ALL and gives only a partial listing of NONE.[16] It is doubtless a weakness on my part, but I have thus far been unwilling to make up the deficiencies in the available concordances by tracking all instances of these words through thirty-six plays.
- Bartlett's Shakespeare separates drama and poetry; Crawford's Marlowe combines them. Combination seems appropriate for a moderately productive writer. Separation seems desirable when a writer is prolific in one of the genres and almost inevitable when he is prolific in both (cf. Dryden). Separation facilitates study of the verbal artistry appropriate to each genre and enables the compiler to make separate decisions about comprehensiveness (e.g., to omit I from the drama but include it in the poetry concordance).
- Crawford's Kyd and Marlowe concordances are both "designed to be helpful to students who wish to study" questions of authorship (Marlowe, p. vii). In the Kyd proper, Crawford includes Arden of Feversham, which he believes is Kyd's; in an appendix he indexes the first two quartos and the folio version of Hamlet in order to lighten "the labour of those who are interested in investigating the claim of Kyd to the Ur-Hamlet." In the Marlowe proper, he includes the three parts of Henry VI, Edward III, Selimus, and Locrine, which last, he believes, is certainly not Marlowe's but has borrowed heavily from his work.
These procedures are indefensible. It is not the concordance-maker's office to argue for or against a disputed attribution. "That task," as Sister Eugenia Logan rightly observes in her Coleridge concordance (p. ix), "belongs in another field of scholarship." Where an attribution has in its favor evidence approaching certainty, the concordance-maker should include the attributed work. Where the evidence is weak, he should exclude the work. Where the evidence is highly probable but not certain, he should index the work and indicate its status, perhaps by an appropriate symbol. Apparently distinguishable portions of collaborate plays should be analyzed not in separate concordances, but in a single concordance bearing the names of the collaborators.[17] Anonymous plays and plays whose authorship is in serious dispute should be left for the last, and should be grouped in concordances of convenient size according to chronology of composition, as nearly as that can be determined.
1. What kind of base-text should one use?
2. To what extent should one normalize the text?
3. What about textual variants and emendations?
4. Should stage directions be indexed?
5. How comprehensive should the index be?
6. Should a writer's dramas and poems be indexed separately?
7. To what extent should disputes about authorship determine the design of the concordance?
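The question of comprehensiveness turns in part on a measurable quantity: the share of a text's running words that a proposed list of common words accounts for (Ellison's figure for the Bible is of this kind). A minimal sketch of that count follows; the function name, the tokenizing rule, and the short word list are assumptions for illustration, not any editor's actual exclusion list.

```python
from collections import Counter
import re


def common_word_share(text, common_words):
    """Return the fraction of running words that belong to `common_words`.

    Words are taken to be runs of letters and apostrophes, compared in
    lower case; `common_words` is assumed to be a set of lower-case words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    covered = sum(counts[w] for w in common_words)
    return covered / len(words)


if __name__ == "__main__":
    line = "We are arrant knaves all; believe none of us."
    # Hypothetical exclusion list that spares ALL and NONE.
    excluded = {"we", "are", "of", "us"}
    print(f"{common_word_share(line, excluded):.0%} of the words would be omitted")
```

Run against a whole master-tape, a count of this kind would let the editor see what each candidate for omission costs or saves before the concordance is printed.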
Enumerative Bibliography
The concordance principle has been successfully applied to the production of fully analyzed enumerative bibliographies in the fields of chemistry and physics.[18] A new publication called Chemical Titles indexes the key words of titles of articles in 550 journals so that each key word appears in context in a concordance of key words. The bibliography in each issue consists of two parts: a list of articles alphabetized according to author, with full titles and publication data, and a concordance of key title-words in context, with an easily interpreted identifying code that provides a cross-reference to the first part. On the
Leading scholarly organizations should seriously consider the production of bibliographies by computer. Apart from the rapidity of its appearance in print and its comprehensiveness, a computer-produced bibliography in the field of English literature will satisfy the criterion of full analysis more completely than any of the presently available bibliographies. Each item will appear under various subject-headings, so that a scholar interested in a subject covered only in part in a given book, chapter, or article will find a reference to that source under the subject-heading of his interest. A title such as "Hamlet, Antonio's Revenge, and the Ur-Hamlet"[19] ought rightly to appear not only under "Shakespeare" but also under "Marston" and under the key word "Revenge," so that the article will be brought to the attention of anyone interested in the theme of revenge in Elizabethan literature. In the latest PMLA bibliography, the title seems to appear only under "Shakespeare."[20] I am not, of course, singling out the PMLA bibliographies for special criticism. Within the limits of their chosen form, they are admirably comprehensive, and they offer many cross-references; the instance just mentioned is doubtless atypical. My point, rather, is that all of the present bibliographies are subject to human error which can be much reduced by computer techniques and that none of them meet the criterion of full analysis, which computers can easily satisfy.
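The key-word-in-context principle can be shown on the title just cited. The sketch below is a simplified illustration, not Luhn's procedure: the list of non-significant words, the rotation format with "/" marking the end of the title, and the identifying code "FOGEL-19" are all hypothetical.

```python
# Hypothetical list of words too common to serve as key words.
STOP = {"and", "the", "of", "in", "a", "an", "to"}


def kwic_entries(title, code, stop=STOP):
    """Return one (key word, rotated title, code) entry per significant word.

    The title is rotated so that each key word stands first, with "/"
    marking the point where the original title ended.
    """
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        key = word.strip(",.;:\"").upper()
        if not key or key.lower() in stop:
            continue
        context = " ".join(words[i:] + ["/"] + words[:i])
        entries.append((key, context, code))
    return sorted(entries)


if __name__ == "__main__":
    title = "Hamlet, Antonio's Revenge, and the Ur-Hamlet"
    for key, context, code in kwic_entries(title, "FOGEL-19"):
        print(f"{key:<12} {context}   {code}")
```

An entry under REVENGE is thus produced mechanically from the title alone; an entry under an author's name that the title does not contain would still require editorial insertion before processing.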
A certain amount of processing will probably be necessary before one can be certain that a reference will appear under all appropriate rubrics. If "Shakespeare" and "Marston" were inserted in square brackets in the title cited above, the article would be automatically indexed under those names. For purposes of subject-analysis, it would help if scholars curbed their metaphorical propensities and made their
May one take this opportunity to plead for a coordinated effort to satisfy still another criterion of enumerative bibliography, the criterion of efficiency or non-duplication? As printed materials in active fields of research increase exponentially, it becomes increasingly wasteful for independent groups of workers to prepare largely identical bibliographies. The devotion of scholars who spend long hours compiling indexes of current research is impressive, but it is disheartening to think of the extravagant repetition of routine tasks. Does it really serve the needs of our profession to produce half a dozen annual bibliographies of Shakespearean scholarship? Would not one bibliography — complete, fully analyzed, and swiftly produced by data-processing machines — suffice?
A bibliography becomes even more useful when it provides a brief summary of the contents of a work. Since the beginning of 1958, English Abstracts has been filling a serious gap in research resources. But this excellent publication may some day be confronted by grave problems. Its coverage has been growing constantly and gives every promise of continuing to do so. In January, 1958, the journal listed 32 abstractors on its cover; in December, 1960, it listed 122, almost four times as many. To be sure, the number of items abstracted did not increase by so large a factor. But it did increase very considerably. The first three issues of 1958 printed 426 abstracts on 79 pages; the last three issues of 1960 printed 678 abstracts on 144 pages, an increase, in less than three years, of about 60 per cent in the number of items and 80 per cent in the number of pages. In the not so distant future, English Abstracts, like its kindred services in scientific fields, may be forced to seek machine aid to avoid being engulfed by a tidal wave of publications. Should such a need arise, there is a good chance that data-processing machines will be able to meet it. H. P. Luhn, the IBM engineer whose research made possible the concordance-index of Chemical Titles, has already conducted successful experiments in automatic abstracting.[21] Before very many years pass, text-reading machines may
Studies in Attribution
Concordances and magnetic tape files will obviously facilitate the gathering of internal evidence for the solution of canonical problems. They will enable one to find parallels more rapidly and to make various special checks. Professor R. C. Bald, for example, observes that Hand D in The Booke of Sir Thomas More makes likely a "graphic confusion between x and y" and that such a confusion seems to have occurred in Troilus and Cressida v.i.16, where "the Quarto reads 'box' [and] the Folio corrects to 'boy.'"[22] J. Dover Wilson has collected similar examples of misreadings which could easily have resulted if a good quarto was printed from copy in a hand such as D's.[23] If all of the Shakespeare quartos and the First Folio were recorded on magnetic tape, one could ask a computer to sort out all words that ended in x and y and other easily confused characters. Or one could ask it to search tapes of other Elizabethan dramatists for complete lists of linguistic preferences such as Professor Hoy used to determine the shares of various collaborators in the Beaumont and Fletcher canon. Freed from the tedium of amassing examples, scholars could devote their higher energies to the interpretation of evidence retrieved and classified by machines.[24]
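One form such a check might take is a comparison of corresponding readings in two texts, flagging pairs that differ only in a final, easily confused letter. The sketch below is a narrow illustration under strong assumptions: the two word lists are taken to be already aligned word for word, and the set of confusable letter pairs contains only x and y.

```python
# Pairs of letters assumed to be easily confused in the copyist's hand.
CONFUSABLE = {frozenset("xy")}


def confusable_variants(quarto_words, folio_words, confusable=CONFUSABLE):
    """Return (quarto, folio) pairs differing only in a final confusable letter.

    Assumes the two lists are aligned, so that words at the same position
    correspond to each other.
    """
    hits = []
    for q, f in zip(quarto_words, folio_words):
        if (
            len(q) == len(f)
            and q != f
            and q[:-1] == f[:-1]
            and frozenset((q[-1], f[-1])) in confusable
        ):
            hits.append((q, f))
    return hits


if __name__ == "__main__":
    quarto = ["thou", "box", "come"]
    folio = ["thou", "boy", "come"]
    print(confusable_variants(quarto, folio))
```

The machine only assembles the candidates; whether a given pair reflects a misreading of the copy remains a matter for the scholar's judgment.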
Again, the more concordances there are, the easier it will be to make negative checks — to show that a seemingly unusual parallel occurs in many writers and is not therefore probative of a particular author's claim to an anonymous work. One hopes that the use of parallels, whether for purposes of proof or disproof, will cease to be fragmentary and unsystematic. If we had reliable information about the average frequencies of certain locutions in the vocabularies of educated men using certain forms of discourse at a certain time, the coincidence of many above- or below-average occurrences of even common phrases
Unfortunately, however, electronic computers and their printed products will probably fail to discourage some scholars from playing the game of parallels badly. Those who in the past have been intent on parading insignificant agreements between two texts as strong arguments for common authorship have seldom taken the trouble to make negative checks in available concordances. Will a special pleader in a hurry pause to reflect merely because aids to reflection are more abundant? "It is the peculiar and perpetual error of the human intellect," Francis Bacon warns us (Novum Organum, I, xlvi), "to be more moved and excited by affirmatives than by negatives; whereas it ought properly to hold itself indifferently disposed towards both alike." "Indeed," he adds, "in the establishment of any true axiom, the negative instance is the more forcible of the two." Computers will not do away with the Idols of the Tribe; to guard against such illusions is the province of education in the spirit of scholarly and scientific argument. With the spread of that spirit, one may hope with Bacon (I, cxxx) "that the art of discovery may advance as discoveries advance."
Meanwhile, one trusts that more and more scholars will find ways to advance Elizabethan studies by enlisting the aid of electronic data-processing machines. The chief barrier to such an effort is likely to be a lingering suspicion that these machines are somehow baleful, that they somehow constitute a threat to the humanist's distinctive values. But such fears are groundless; they can only be damaging to the progress of Elizabethan and indeed of all humane studies. It is surely inhumane to scorn mechanical aids which by releasing from soul-killing drudgery that most remarkable of all instruments, the brain, free it for its proper function: the enlargement of man's intellectual and spiritual realms through the use of creative intelligence. It may be appropriate to conclude with a striking Elizabethan example of humanist initiative and persistence in making available a novel means for the
Notes
The present essay is a revised version of a paper written for the Report of M.L.A. Conference 20 (Opportunities for Research in Renaissance Drama) and delivered before the Conference on Dec. 28, 1960. I am indebted to my colleague Professor Stephen M. Parrish, who introduced me to the mysteries of computers and who has answered my queries with invariable kindness and lucidity.
See the classified bibliography in B. Quemada, "La Mécanisation dans les Recherches Lexicologiques," in the Univ. of Besançon Cahiers de Lexicologie, I (1959), 41-46. See also Proceedings of the International Conference on Scientific Information, 2 vols. (1959); Martha Boaz, ed., Modern Trends in Documentation (1959); M. E. Maron, "Handling of Non-Numerical Information," Chap. 11 in Vol. 2 of Handbook of Automation, Computation, and Control, ed. Eugene M. Grabbe et al. (1959); and for a non-technical discussion of information-retrieval systems, Francis Bello, "How to Cope with Information," Fortune, LII (Sept., 1960), 162-167, 180-192.
In "Literary Data Processing," IBM Journal of Research and Development, I (1957), 256, Paul Tasman gives comparative figures for compiling a lexicon file index and concordance to the Summa Theologica of St. Thomas Aquinas (a work of about 2,000 pages and almost 1,600,000 words): manual method, 3 persons, 20,000 hours; punched-card method, 3 persons, 1,000 hours; large-scale data-processing method, 1 person, 60 hours, "exclusive of the presentation and programming time." ("Programming" means devising a sequence of operations so that a computer can perform a particular job of data-processing.) For discussions of techniques involving small-scale equipment, see n. 10, below.
See Tasman (n. 5, above), and Busa, "The Index of all Non-Biblical Dead Sea Scrolls Published up to December, 1957," Revue de Qumran, I (1958), 187-198.
For a breakdown of the time, which is, again, exclusive of editorial work and of programming, see the preface to the Arnold, pp. vii-viii; for an account of the programming, see James A. Painter, "Computer Preparation of a Poetry Concordance," Communications of the A[ssociation for] C[omputing] M[achinery], III (1960), 91-95.
See British Manuscripts Project (1955); this insufficiently known checklist of the microfilms, compiled by Lester K. Born, is sold by the Photoduplication Service of the Library of Congress.
See the extended discussion by Quemada, "La Mécanisation," pp. 9-33. Cf. also the useful mimeographed Reports of the Groth Institute, founded by Professor Ray Pepinsky at the Pennsylvania State University. With the aim of preparing a revised edition of Paul von Groth's encyclopedia of crystallography "in perhaps a hundred volumes," Professor Pepinsky and his co-workers have developed effective techniques of indexing and information-retrieval using inexpensive electro-mechanical equipment: see esp. Reports Nos. 40, 41, 44-48, 53.
John W. Ellison, for example, is planning to collate electronically 800 versions of the Greek text of the Bible.
Tasman, p. 256, mentions reconstructions of lacunae in the Dead Sea Scrolls. "Up to five consecutive words," he reports, "have been 're-written' by the data processing machine in experimental tests where the words were intentionally left out of the text and blank spots indicated."
See Hoy's "The Shares of Fletcher and his Collaborators in the Beaumont and Fletcher Canon," SB, VIII (1956), 129-146, IX (1957), 143-162, XI (1958), 85-106, XII (1959), 91-116, and XIII (1960), 77-108.
Neither word is indexed in Mrs. Cowden Clarke's concordance; except for a few inadvertent omissions, all occurrences in Hamlet only are listed in the Appendix to Crawford's Kyd concordance.
Here as elsewhere, procedure will have to be flexible. It would probably be advisable to include all the plays in the "Beaumont and Fletcher" canon in one concordance. Again, if a successor to Bartlett believes that hands other than Shakespeare's are present in Henry VIII and Pericles, he should nevertheless include these two plays in his concordance and content himself with stating his views in the preface.
See "Chemical Literature Gets a Quicker Index," Chemical and Engineering News, XXXVIII (April 4, 1960), 27-28.
See Luhn's "Auto-Encoding of Documents for Information Retrieval Systems," in Boaz, Trends in Documentation, pp. 45-58.
"Bibliographical Links Between the Three Pages and the Good Quartos," in A. W. Pollard et al., Shakespeare's Hand in "Sir Thomas More" (1923), pp. 113-141.
The kinds of evidence bearing on attribution are more various than can be discussed here. By counting distances between punctuation marks, a computer can gather statistics about sentence-length and sentence-segmentation. By collecting all words with certain medial or terminal letters, it can help one to establish authorial or compositorial spellings. By retrieving blank-verse lines with two- or three-letter final words, it can provide information about weak endings. The scholar armed with knowledge of a computer's capabilities will readily think of ways of exploiting them.
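The three kinds of count just mentioned can be sketched in Python. This is a minimal illustration, not an account of any program actually used: the function names are hypothetical, "distance" is assumed to mean the number of words between successive punctuation marks, and the crude letters-and-apostrophes tokenizer is an assumption that would need adjusting for old-spelling texts.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")

def segment_lengths(text):
    """Number of words between successive punctuation marks."""
    segments = re.split(r"[.,;:!?]", text)
    return [len(WORD.findall(seg)) for seg in segments if WORD.findall(seg)]

def words_ending_with(text, ending):
    """Frequency of each word form with the given terminal letters."""
    words = WORD.findall(text.lower())
    return Counter(w for w in words if w.endswith(ending))

def short_final_word_lines(lines, min_len=2, max_len=3):
    """Verse lines whose last word has two or three letters
    (candidates for inspection as weak endings)."""
    hits = []
    for line in lines:
        words = WORD.findall(line)
        if words and min_len <= len(words[-1]) <= max_len:
            hits.append(line)
    return hits

lengths = segment_lengths("To be, or not to be: that is the question.")
```

Such counts are only raw material. Whether a two- or three-letter final word really constitutes a weak ending, or whether a spelling is authorial or compositorial, remains a matter for the scholar's judgment.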
Since I wrote these remarks, I have learned that Professors Frederick Mosteller and David L. Wallace, using computers, have applied statistical methods to the determination of the authorship of the disputed Federalist papers. A report on their work will appear shortly in a book on the Harvard Computer Symposium.
I am indebted to Professor J. B. Bessinger, Jr., editor of the forthcoming Cornell Concordance to Old English poetry, for calling my attention to Marbeck's concordance. The biographical information that follows is taken from the DNB.