University of Virginia Library

I

It has long been recognized that concordances are essential tools in the critical, historical, and philological analysis of literary texts. Until very recently, it was also apparent that anyone who agreed to compile a concordance had assumed an appalling task. "An exhaustive concordance
to the Bible, such as that of James Strong," John W. Ellison estimates, "takes about a quarter of a century of careful, tedious work to guarantee accuracy."[4] When in February, 1911, Professor Lane Cooper of Cornell University, with the aid of sixty-seven workers, saw the Wordsworth concordance through the press only two years and three months after excerpting of the Hutchinson edition had begun, his achievement was quite properly regarded as remarkable. After tens of thousands of man-hours had been spent in excerpting, alphabetizing, and checking, a concordance-editor was usually compelled to search far and wide for a publisher (the Wordsworth was delayed about nine months until a suitable one could be found) and, often, a handsome subvention. As printing costs soared, large concordances became more and more rare. The only conventionally-produced large concordance to an English or American poet which has appeared since the end of 1941 seems to be Professor Eby's concordance to Whitman's Leaves of Grass and selected prose. This work of 980 pages lists at $25.

That concordance-makers should turn to electro-mechanical and electronic aids was only to be expected. After World War II, there were efforts to compile indexes by the use of punched-card systems. But the limited capacity and sorting speeds of electro-mechanical equipment made the automatic production of very large concordances impractical. In the last few years, therefore, researchers have turned to large-scale electronic data-processing machines such as those marketed by Remington Rand and IBM.[5] The year 1957 witnessed three independent developments: Paul Tasman, with the collaboration of Rev. Roberto Busa, S. J., worked out a program for indexing the words in the Dead Sea Scrolls on the IBM 705;[6] John W. Ellison brought out Nelson's Complete Concordance to the Revised Standard Version Bible, automatically indexed by the Remington Rand Univac I; and Cornell University launched a program for a computer-produced series of concordances, with Stephen M. Parrish as General Editor.



In the same year, the University of California published the late Guy Montgomery's concordance to Dryden's poetry. Since this cumbersome oddity has given many of its users an erroneous impression of what a machine-prepared concordance looks like, it deserves some mention here. One must emphasize that it is not at all an electronically produced work and indeed only in small part an electro-mechanically produced one. When Professor Montgomery died in 1951, he left 240,000 manually indexed cards based on Noyes's edition of Dryden's complete poetical works. Out of the decision to use accounting machines to help in checking these cards grew the decision to print by offset from IBM sheets a list of index-words with abbreviated references to the places where they occurred, but without any context whatsoever. A sample entry from page 1 of the resulting concordance will indicate the difficulties that confront the user:

ABIDE     HAP 1928
For each such entry under ABIDE, the reader must consult the prefatory list of full titles geared to the cryptic symbols. He will then learn that HAP stands for "The Hind and the Panther" and that the poem begins on page 218 of Noyes. He must next turn to that page and move forward until he locates line 1928, "No Martin there in winter shall abide," in column B of page 243. But his work is just beginning. In order to ascertain Dryden's various uses of ABIDE (eighteen instances), he must either write out each line as he locates it or else jot down all the page and line numbers of Noyes in which ABIDE occurs and riffle the pages back and forth as he tries to compare instances. One doesn't like to think of the agonies of a reader who wishes to locate and analyze occurrences of the fifty-seven words in "Dryden's major vocabulary" which, according to the preface, occur "from 400 to 1,100 times apiece."

Professor Parrish's Concordance to the Poems of Matthew Arnold (Ithaca, 1959) shows that a concordance compiled and printed by electronic data-processing machines (in this case the IBM 704) can give as complete a verse-context and as full an array of identifying data as the manually compiled type. The first three entries under ABIDE will indicate the advantages of the Arnold:

  • OTHERS ABIDE OUR QUESTION THOU ART FREE . . 2 SHAKESPEARE 1
  • HE ESCAPES THENCE BUT WE ABIDE . . . . . . . . . 58 RESIGNATION 213
  • THE LAW IS PLANTED TO ABIDE . . . . . . . . . . . . . 94 SICK KING BOKH 208
Here the concordance provides a full line of context for each instance of the index-word and prints the instances in the order of their occurrence
in Tinker and Lowry's edition of Arnold's Poetical Works. The identifying information to the right gives the page of Tinker and Lowry on which the line appears, then the title of the poem in which it occurs, or a rather full and readily understood abbreviation, and lastly the line number. In most cases, the reader will probably be able to determine the different uses of the index-word from the entries themselves, but if he should wish to consult an even fuller context, he can immediately turn to the indicated page of Tinker and Lowry. An appendix gives a helpful index of words in order of their frequency in Arnold's text. The production of this volume of 965 pages required some two hundred hours of card-punching, tape-recording, data-processing, and listing.[7] The IBM sheets were then reproduced by offset and bound in an attractive volume which is priced at $10.
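The layout just described (a full line of context, followed by page, title, and line number, with instances kept in order of occurrence, and a separate index of words by frequency) can be illustrated with a short modern sketch. It is offered only as an illustration and makes no claim to reproduce the Cornell program: the names RECORDS, build_concordance, and format_entry are hypothetical, the input is assumed to be a list of (page, title, line number, text) records already in edition order, and the three records are simply the sample entries quoted above.

```python
import re
from collections import Counter, defaultdict

# Assumed input: one record per verse line, already in edition order.
# The three records are the sample ABIDE entries quoted in the text.
RECORDS = [
    (2, "SHAKESPEARE", 1, "OTHERS ABIDE OUR QUESTION THOU ART FREE"),
    (58, "RESIGNATION", 213, "HE ESCAPES THENCE BUT WE ABIDE"),
    (94, "SICK KING BOKH", 208, "THE LAW IS PLANTED TO ABIDE"),
]

# A word is a run of letters, optionally joined by internal hyphens.
WORD = re.compile(r"[A-Z]+(?:-[A-Z]+)*")


def build_concordance(records):
    """Return (entries, freq): context lines per index-word, and word counts."""
    entries = defaultdict(list)
    freq = Counter()
    for page, title, line_no, text in records:
        context = text.upper()
        words = WORD.findall(context)
        freq.update(words)  # every occurrence counts toward frequency
        # List a line only once under a word, even if the word repeats in it.
        for word in dict.fromkeys(words):
            entries[word].append((context, page, title, line_no))
    return entries, freq


def format_entry(context, page, title, line_no, width=48):
    """Pad the context with leader dots, then append page, title, line."""
    return f"{context.ljust(width, '.')} {page} {title} {line_no}"


entries, freq = build_concordance(RECORDS)

# Print the ABIDE entries in order of occurrence.
for entry in entries["ABIDE"]:
    print(format_entry(*entry))

# A frequency index, most frequent words first.
for word, count in freq.most_common(3):
    print(word, count)
```

Because the records are read in edition order and each entry is appended as it is met, no separate sort is needed to keep the instances in order of occurrence; only the headings themselves would have to be alphabetized for printing.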

Cornell concordances to follow the Arnold will incorporate refinements as rapidly as they are developed. Special print-wheels will provide a full array of punctuation marks and of characters such as the thorn and the ligatures (the Arnold has only the hyphen). Presently available techniques can instruct computers to discriminate between homographs and print them under separate headings, to cross-index hyphenated words, and, for earlier poets, to collect the old-spelling variants of a single word under their modern-spelling equivalent, as in Osgood's Spenser or Tatlock and Kennedy's Chaucer.
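Two of these refinements, the cross-indexing of hyphenated words and the collection of old-spelling variants under a modern heading, can be suggested by a brief sketch. It rests on assumptions not found in the sources cited here: the table MODERN stands for a hypothetical editor-supplied list of variant spellings (the spellings shown are illustrative only), and the function names are invented for the example. The discrimination of homographs is left out, since it ordinarily depends on editorial marking of the text rather than on spelling alone.

```python
from collections import defaultdict

# Hypothetical editor-supplied table: old spelling -> modern headword.
# The spellings are illustrative, not taken from any particular edition.
MODERN = {"LOUE": "LOVE", "KNYGHT": "KNIGHT"}


def headwords(token):
    """Return every heading under which a token should be indexed."""
    token = token.upper()
    heads = [MODERN.get(token, token)]
    if "-" in token:
        # Cross-index each component of a hyphenated compound.
        heads += [MODERN.get(part, part) for part in token.split("-") if part]
    return heads


def index_tokens(tokens):
    """Map each heading to the (position, original spelling) pairs under it."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        for head in headwords(token):
            index[head].append((position, token))
    return index


index = index_tokens(["loue", "love", "sea-shore"])
print(index["LOVE"])   # both spellings gathered under the modern heading
print(index["SHORE"])  # the compound cross-indexed under its second part
```

Keeping the original spelling in each entry lets the printed concordance show the reader which variant actually stands in the text, while the heading remains the modern form.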

New possibilities in concordance-making and in other kinds of literary data-processing will doubtless emerge as computers rapidly become more and more complex, swift, and powerful. "The latest [computers]," writes Ritchie Calder, "are a thousand times faster than those of three years ago and a million times faster than those of ten years ago," and he reports that in June, 1959, in Paris, at an International Conference on Information Processing, scientists seriously discussed "machines which would memorize all the knowledge in the world."[8] One's mind reels and retreats to somewhat less staggering fantasies in which the C. W. Wallaces and Leslie Hotsons of the twenty-first century, working in American repositories, ask computers to search magnetic tapes of British archives for all occurrences of names with, say, the components Sh, k, sp, r or M, r, l. A daydream high fantastical, perhaps; yet the photoduplication during World War II of a vast number of British documents, now available at the Library of Congress on microfilm, provides a notable precedent for the internationalization of archives.[9]
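The search by components imagined above can be sketched, for the curious, as a pattern that requires the components in order with any run of letters between them. This is one possible reading of the idea, not a description of any existing system; the function names are hypothetical, and the sample sentence is invented purely for illustration and quotes no real document.

```python
import re


def component_pattern(components):
    """Build a pattern matching a name that begins with the first component
    and contains the rest in order, with any letters in between."""
    body = r"[a-z]*".join(re.escape(part) for part in components)
    return re.compile(r"\b" + body + r"[a-z]*\b", re.IGNORECASE)


def find_names(text, components):
    """Return every word in the text that fits the ordered components."""
    return [m.group(0) for m in component_pattern(components).finditer(text)]


# Invented sample text, for illustration only.
SAMPLE = "Wm Shakspere and Chr Marley of Stratford"

print(find_names(SAMPLE, ["Sh", "k", "sp", "r"]))
print(find_names(SAMPLE, ["M", "r", "l"]))
```

Such a pattern is deliberately loose: it will admit unrelated names that happen to contain the same letters in order, and it will miss spellings that lack a component altogether, so its results would still need a scholar's eye.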

But consideration of what can be done now is likely to be more fruitful than heady speculations about the future. Literary scholars should give earnest thought to making use of the machines available to them on their campuses: much can be done even with small-scale computers or punch-card and perforated-tape equipment.[10] Efforts should be coordinated in order to determine important needs in different specialties, to prevent duplication of work at different universities, and to disseminate information about new developments in the processing of literary texts. In this connection I am authorized to state that the Department of English at Cornell University will be glad to share its experience in preparing concordances by computer, and its knowledge of work being done at other centers, with those who may be ready to embark on projects of their own.