University of Virginia Library

Alan Markman's pioneering essay on the creation of "A Computer Concordance to a Middle English Text"[1] was one of the earliest attempts to explain how to make such a concordance. However, he concentrated primarily on the problems he and Barnett Kottler faced in the production of their concordance to the works of the Pearl Poet, rather than on a general method of compiling a concordance to a Middle English text. I am now nearly finished compiling a concordance to Laȝamon's Brut, done with an ingenious programmer, George Rompot, and the University of Iowa's IBM 360-65 computer. The results are promising, I think, and some of the methods deserve to be set forth as a practical means of making a concordance. I have formed some of my ideas on this principle: that the person compiling a concordance should keep in mind the many uses to which his labors will be put. If this means "extra" work in categorizing and cross-referencing words, separating homographs, or gathering variant forms of one word, this extra work is not only worthwhile but necessary to make a useful concordance.

The first step in preparing a concordance is selecting a copy text. Most Middle English texts worth making a concordance for exist in more than one edition, and if there is no "standard" edition of the work, one should select a single text and stick to it, noting in footnotes or a separate table the variants in the other major editions, if they are worth noting. The quality of the edition and its ancillary material should help to determine which one will be used. I emphasize this because it is necessary for the person making the concordance to adopt a "neutral" stance toward his material, and he can do this most easily if he does not have to worry about editorial matters in the text, matters which the editor of the text should be relied upon to decide. By "neutral" I mean that he should rely on his edition for decisions such as the reading of a carelessly spelled word in the manuscript or the meaning of a word on which other editors disagree. For example, if an editor whose edition is being used glosses a word which is spelled "gode" as "good," the concordancer[2] should list this word under this meaning, despite another editor's decision to interpret this word as "god." Warnings that this is the concordancer's policy must appear in the concordance; for if he adopts one reading from one editor, a second from another, and yet a third from his own reading of the text, he is no longer a mere concordance-compiler, but an editor, and his concordance will be of little use to anyone trying to find a word. To be useful, a concordance must enable its user to locate quickly the word he wants to find; if the text used for the concordance is edition A, but the concordance lists words under interpretations given in edition B or C, the user will not be able to find the word he seeks.[3]

Markman says that the next major problem faced by the concordancer is to put the text into what he calls a "machine readable" format "so that the computer system can digest it and perform operations with it" (p. 55). By now many people in the humanities have worked enough with computers to consider this a fairly trivial problem; but one important thing must be kept in mind before this phase of the computer project is begun: what will the final output look like, and therefore, what sort of information is it necessary to feed into the computer to arrive at this design? For example, before I actually began the work on my concordance, Professor Stephen Parrish of Cornell University Press suggested that to make my final output more readable and attractive than some of the earlier computer-assisted concordances, I should try to design the final output to be printed in upper and lower case letters, rather than the usual upper case computer printing. So when I put the text onto IBM cards, I coded in a special character which preceded all upper case letters. Eventually the special symbol was deleted, the letter following it was printed in upper case, and the line was back-spaced to fill up the space formerly taken by the special character.
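
The case-marking scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker character `$` and the function name `decode_case` are hypothetical (the account above does not say which character was punched), and the sample string is made up for the illustration rather than taken from the actual card deck.

```python
MARKER = "$"  # hypothetical marker; the actual special character is not named


def decode_case(card_text: str, marker: str = MARKER) -> str:
    """Turn upper-case card text into mixed case.

    Every letter is lowered, except a letter that directly follows the
    marker, which is kept in upper case. The marker itself is dropped,
    so the line closes up over the space it occupied.
    """
    out = []
    upper_next = False
    for ch in card_text:
        if ch == marker:
            upper_next = True  # remember to capitalize the next character
            continue           # the marker is never copied to the output
        out.append(ch.upper() if upper_next else ch.lower())
        upper_next = False
    return "".join(out)


print(decode_case("$AN PREOST WES ON LEODEN"))  # An preost wes on leoden
```

Because the marker is simply never copied, no separate back-spacing step is needed in this sketch; on a line printer the closing-up had to be done explicitly.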

Concordancers who will be separating their texts into grammatical categories for, say, nouns, verbs, and adjectives, may want to indicate in the initial input of data which words fall into which categories. If homographs can be separated and variant forms gathered before or while the material is fed into the computer, codes can perhaps be inserted during input so that these forms are sorted automatically later.[4] The term now commonly used for this activity is "pre-editing," much of which is necessary to help one anticipate what he must tell the computer when his text is being put onto cards, tape, or disc.
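
One way to picture such inserted codes is as a short tag attached to a word at input time. The sketch below is an assumption, not the coding actually used for the Brut: the `=` separator, the function names `parse_line` and `build_index`, and the sample lines (invented strings, not quotations from any edition) are all hypothetical.

```python
from collections import defaultdict

TAG = "="  # hypothetical separator between a spelling and its assigned headword


def parse_line(line_no, text):
    """Yield (headword, spelling, line_no) for each word of a pre-edited line.

    A word written as 'gode=good' is filed under 'good'; an untagged word
    is filed under its own spelling.
    """
    for token in text.split():
        token = token.strip(".,;:")
        if not token:
            continue
        spelling, _, head = token.partition(TAG)
        yield (head or spelling).lower(), spelling.lower(), line_no


def build_index(lines):
    """Group every occurrence under its headword, keeping the spelling found."""
    index = defaultdict(list)
    for line_no, text in lines:
        for head, spelling, n in parse_line(line_no, text):
            index[head].append((spelling, n))
    return dict(index)


# Invented sample input: the same spelling 'gode' is tagged two ways,
# and the variant 'goden' is gathered under the same headword as 'gode'.
sample = [
    (1, "gode=good men"),
    (2, "gode=god almihten"),
    (3, "goden=good cnihtes"),
]
index = build_index(sample)
print(index["good"])  # [('gode', 1), ('goden', 3)]
print(index["god"])   # [('gode', 2)]
```

The same tag does both jobs named above: it separates homographs (one spelling, two headwords) and gathers variant forms (two spellings, one headword).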

It was my experience that working with IBM cards was the easiest and least expensive way of handling the material. I found it more convenient to have the cards in order to proofread and make corrections in my own study rather than at a computer terminal. Therefore, I put the entire text into card form, had the cards listed (i.e. printed out by the computer exactly as they had been punched on the cards), and did my proofreading of the text from these lists. If time and finances permit, the proofreading should be done with someone reading the lists aloud, while another person checks this oral reading against the edition used for the basis of the cards. Because of the great variations in spelling which most Middle English texts present, it is advisable to proofread by spelling out most words rather than by trying to pronounce them. It may slow the work considerably, but it will result in a vastly more accurate text, and is thus worth the effort.

While the text is being proofread, some of the corrections can be made and inserted into the deck of cards; it is of course practical to overlap these two tasks if time allows. Again, by dealing directly with the cards, the concordancer remains closer to his text than if he were working at the much more costly computer terminal. Ultimately, using cards is more accurate.

As I said earlier, I used a method of "pre-editing" which I found to be most helpful and efficient. While the text is being punched, the key-punch operator is required to do little thinking, and also has little room on the machine for extra papers. The proofreading stage, however, leaves the mind free to "think" or "interpret," and much of the sorting of lexical items which will eventually need to be done can be accomplished then. The proofreading goes slowly enough to allow the concordancer and his assistant (assuming his assistant knows something about the language being read) to think; therefore, during the reading, homographs can be separated, variant spellings gathered,[5] typographical and key-punch errors located, and so on. These simple tasks also help to break the monotony of the proofreading.

For example, I made charts separating 'good' from 'god,' 'idon' ('did') from 'idon' ('excellent, noble'), 'ræd' ('advice, counsel, situation') from 'ræd' ('to advise, tell') and from 'ræd' ('ready' or the past tense of 'ride' or 'read'), and so on. It is important to anticipate all homographs in the text, so that later there will be little trouble separating them.[6]
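
A chart of this kind can be thought of as a lookup table kept apart from the text itself. The following sketch rests on assumptions throughout: the chart layout, the line numbers, the function names, and the sample lines are invented for illustration and do not reproduce the charts or program described here.

```python
from collections import defaultdict

# Hypothetical homograph chart: (spelling, line number) -> concordance heading.
HOMOGRAPH_CHART = {
    ("ræd", 10): "ræd (advice, counsel)",
    ("ræd", 25): "ræd (to advise, tell)",
    ("idon", 10): "idon (did)",
}


def build_concordance(lines, chart):
    """File each word under its charted heading, or under its bare spelling."""
    entries = defaultdict(list)
    for line_no, text in lines:
        for word in text.lower().split():
            heading = chart.get((word, line_no), word)
            entries[heading].append(line_no)
    return dict(entries)


def unresolved(lines, chart):
    """List occurrences of charted spellings that the chart does not cover."""
    charted = {spelling for spelling, _ in chart}
    return [
        (word, line_no)
        for line_no, text in lines
        for word in text.lower().split()
        if word in charted and (word, line_no) not in chart
    ]


# Invented word strings, not quotations from the text.
sample_lines = [(10, "ræd idon"), (25, "ræd"), (40, "ræd")]
concordance = build_concordance(sample_lines, HOMOGRAPH_CHART)
print(concordance["ræd (advice, counsel)"])       # [10]
print(unresolved(sample_lines, HOMOGRAPH_CHART))  # [('ræd', 40)]
```

The `unresolved` check reflects the point about anticipating every homograph: any occurrence the chart missed is reported instead of being filed silently under the bare spelling. Keying the chart by line number is a simplification; a line containing the same spelling twice in different senses would need a finer key.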

Also during this proofreading phase, the concordancer can list for his own benefit any irregularities he might encounter which deserve to be glossed in special places in the concordance, which ought to be given special cross-references, or which deserve mention in an introduction.[7] Following the editorial convention of his day, Madden had the printers set his edition of the Brut with all the abbreviations and peculiarities of the manuscript, so I was able to locate all instances of such things as dotted y's [ẏ], unusual abbreviations, interchanging of u's, v's, f's and w's for one another, etc., during the proofreading. My charts were eventually helpful in the commentary, and in the compiling of the final concordance.