University of Virginia Library


A Computer Concordance to Middle English Texts
Alan Markman

The production of a computer concordance to Middle English texts for which no standard, or received, edition exists, but which exist in several non-uniform editions, requires two distinct operations.[1] Our preparation of A Computer Concordance to Five Middle English Poems: Pearl, Patience, Cleanness, Sir Gawain and the Green Knight, and St. Erkenwald, soon to be printed by the University of Pittsburgh Press, illustrates very nicely the exact nature of the operations.[2] The first is textual. A satisfactory reading of the MSS. must be obtained; i.e., one has to have or prepare a "best" copy edition or text, taking "best" in such a case to mean that which might best accord with the capabilities of the available computer system as well as best represent the MSS. The second operation is technical. A method has to be devised to translate the text into a "machine readable" format so that the computer system can digest it and perform operations with it; i.e., listing words, comparing them, sorting them, collecting them with an appropriate context, and printing at last the completed concordance. These operations are different, to be sure, yet we soon found out that we were unable to proceed with the textual problems until we had clearly understood the technical requirements. We had, for example, to resolve certain orthographical distinctive features of our text in favor of the printing limitations of our computer system; not only did we eliminate all punctuation
points but also all occurrences of the grapheme þ, to cite one instance, had to be changed to upper case TH, and, for another instance, the entire corpus had to be printed out in upper case letters.[3]

The entire textual problem is perhaps best described by Professor C. O. Chapman. In response to our inquiry, Mr. Chapman reminded us that ". . . there is no uniform text of the five poems such as we have in Skeat's Chaucer, and the variations of spelling within each poem, as well as the presence of four or five editors, each transcribing a poem according to his own method, have resulted in great confusion. It has long seemed to me that such a uniform text for these poems, as Skeat's for Chaucer, is an indispensible prerequisite to the making of a concordance."[4] The wisdom of that conclusion is evident to all who know these poems. The occurrence of BOT in "Cleanness," 473 is a case in point. The form BOT occurs 324 times in these texts. It may be a noun, a verb, an adverb, a preposition, or a conjunction. When it equates generally with Modern English "but," it is not concorded but is to be found in a listing of common words not concorded. If, however, it equates with Modern English "remedy," "help," "announce," or "proclaim," it is concorded. Since the form BOTE also occurs 11 times in these texts, with the meaning of either "boat" or "boot" (i.e., remedy), one would expect that BOT ought not to occur where elsewhere BOTE is used. Now "Cleanness," 473 reads "Bryng bod-worde to bot blysse to us all." Sir Israel Gollancz reads a noun, "remedy," taking BOT from Old English bōt; R. J. Menner reads a verb, "announce," taking BOT from Old English bodian. This difference means, of course, that the two men take different attitudes towards
BLYSSE. If BOT in this instance means remedy it has to be found in our concordance under Modern English boot, whereas if it means announce, we should have to list it under Modern English bode, where immediately it would look like one of the more than a dozen forms of BIDE. As a saving grace, BOOT, meaning any sort of covering for the foot, does not, thank goodness, occur. In the same poem, does FLEE3, 1476, mean fleece or flies, and does GENTYLE, 1432, mean gentle or gentile? Editors do not agree. Or in "Sir Gawain and the Green Knight," 1634, does HERE mean hear or praise? These few instances are but a fraction of the occurrence of forms which called for editorial decisions. One word, two words, or compound; spelling variations; homographs and homonyms; these are a few of the editorial problems we faced. We obviously needed the "uniform text" Mr. Chapman spoke of. There was nothing for us to do but to make one. We did it in this fashion.

Our first step, of course, was to assemble all the printed critical editions of the five poems and, lacking access to the MSS., facsimile reproductions of MS. Cott. Nero A.x. + 4 and Brit. Mus. MS. Harl. 2250. We were able, after examination, to exclude several of the printed editions, so that as we faced up to the tedious chore of collation we found it necessary to include but one "variant" edition with each of our "base" editions for four of the poems, but three "variant" editions of "Pearl" were required along with the "base" edition. In order to provide a three space identification symbol for the five poems, a symbol the computer could easily manage, we chose the standard abbreviations of the titles of these poems. The following designations, wherein the three letter symbol alone signifies our base edition, and the symbol followed by V or a number signifies a variant edition, were assigned:

  • GGK The 1940 Gollancz edition of Sir Gawain and the Green Knight.
  • GGK V The 1952 reprint of the 1925 Tolkien and Gordon edition.
  • CLN The 1921 Gollancz edition of Cleanness.
  • CLN V The 1920 Menner edition.
  • PRL The 1953 Gordon edition of Pearl.
  • PRL 1 The 1906 Osgood edition.
  • PRL 2 The 1921 Gollancz edition.
  • PRL 3 The 1933 Bowdoin College edition.
  • PAT The 1924 Gollancz edition of Patience.
  • PAT V The 1918 Bateson edition.
  • ERK The 1920 Gollancz edition of St. Erkenwald.
  • ERK V The 1926 Savage edition.


We knew at this stage of our work that eventually an IBM card punch operator should have to have in front of her an accurate copy text of these poems so that she could punch on one card one line of the text followed by an identification symbol and the line number of that line in its text. The single object of our collation was to provide that copy text. Because Sir Gawain and the Green Knight is the longest of these five poems we chose to work first with it.

Our decision to use the 1940 Gollancz edition of Sir Gawain and the Green Knight as base text committed us to certain procedures. For example, we retain -3 as a grapheme, we retain the U/V practice of both Gollancz and Tolkien and Gordon, and we retain the I/Y practice of both editions. Since no punctuation whatsoever is retained, we were obliged to disregard punctuation variants.[5] All occurrences of þ we had to change to TH. All brackets in the text were eliminated, & was changed to AND, and we simply treated all italicized printings as if they were printed regularly.
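
The mechanical part of these rules can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the function name normalize_line and the exact inventory of punctuation characters are hypothetical, the input is assumed already to use 3 for yogh, and the original copy text was prepared by hand rather than by any such routine. Hyphens are deliberately left untouched, since they called for case-by-case editorial decisions.

```python
import re

# Characters dropped outright: punctuation and editorial brackets.
# The exact inventory is an assumption; the article says only that no
# punctuation is retained and that all brackets were eliminated.
_DROP = re.compile(r"""[.,;:!?'"()\[\]]""")


def normalize_line(line: str) -> str:
    """Apply the mechanical copy-text rules to one line of verse."""
    line = line.replace("þ", "TH").replace("Þ", "TH")  # thorn becomes TH
    line = line.replace("&", " AND ")                   # ampersand becomes AND
    line = _DROP.sub("", line)                          # strip punctuation, brackets
    line = line.upper()                                 # upper case only
    return " ".join(line.split())                       # collapse runs of spaces


# The line quoted earlier from "Cleanness," 473; the hyphen survives.
print(normalize_line("Bryng bod-worde to bot blysse to us all."))
```

Italic type needs no rule here because plain text carries no italics; the hyphen is omitted from the pattern on purpose.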

The hyphen caused the greatest difficulty. In general, we eliminated the hyphen when, for example, it was used in the base text to separate morphemes which ordinarily are not separated. Thus to cite but a few examples which occur early in the poem, we changed VP-ON to VPON (GGK, 47), DE-BATED to DEBATED (GGK, 68), and IN-NOGHE to INNOGHE (GGK, 77). Here, of course, we followed the practice of Tolkien and Gordon. Similarly, we print FORSOTHE as one word in each instance, whereas, in all editions of all five poems, that form may occur as one word, as two words (GGK, 415), or as a hyphenated word. A curious instance is the occurrence of GOD + MON. This combination may occur as two words, i.e., a good man, but it also occurs as one word, i.e., head of a household. Thus GGK, 1029, reads GOD-MON, GGK V, 1029, reads GOD MON, but we print one word, GODMON. Even more disturbing is GGK, 157, where HEME-WEL HALED occurs. In GGK V, 157, that form is printed as HEME WEL-HALED. The facsimile MS. indicates three words.[6] Since this line shows about as well as any other what an editor of a Middle English
text faces, it will be worth a moment to look at it closely. The line is printed as follows:
The MS. reading has to be taken as HEME WEL HALED. Now both Gollancz and Tolkien and Gordon derive Middle English HAL(L)E from Old French haler, and suggest "rise, depart, rush, draw, lift, come, go, pass, and loose from a bow" as Modern English equivalents. Tolkien and Gordon regard HEME as an adjective, derived from Old English gehœme, meaning "neat" and WEL-HALED as another adjective, simply a combination of WEL + HAL(L)E, meaning "pulled up properly" or "drawn tight." Gollancz considers HEME-WEL to be an adjective meaning "well fitting" which is closely allied to HEMELY, "closely," which he derives from Old Norse heimolliga, "privately," and for which he suggests Old English cognates hām and hæm-. One should therefore have to translate GGK, 157, as "well fitting drawn (up) stockings of that same green color" and GGK V, 157, as "neat properly pulled up stockings of that same green color." Admittedly the concorder is not a lexicographer, and ordinarily one turns to a concordance not to find a discrimination of lexical meanings but instead a listing of forms, of words. And it probably is of little importance, in this instance, to suggest that "well fitting drawn up stockings" are not very much different, if different at all, from "neat properly pulled up stockings," or that HEME-WEL and HEME might be synonyms after all. Yet it is a matter of importance to the counter of words to know how many times HEME, HEME-WEL, HALED, and WEL-HALED occur in these texts.

Here we were obliged to face up to the entire matter of compounds versus two words or hyphenated words. After a great deal of thought, we decided to eliminate the hyphen wherever possible, and to choose one word rather than two words wherever it clearly seemed to us that the resulting one form represented a unified lexical entity which readers of Middle English would recognize. Thus we print HEME-WEL for GGK, 157, and WELHALED for GGK V, 157.[7] Some further
idea of the complications we struggled with can be got from observing that HAL and HAL(L)E, the word we started with in this demonstration, also occur as nouns, meaning "castle" or "hall." Nevertheless, our decision to eliminate the hyphen wherever possible provides, we think, a clearer reading of the texts. It also precluded several technical problems, since an abundance of hyphenated forms could cause a certain amount of machine confusion. Of course, we did retain the hyphen in PRL, 195, FLOR-DE-LYS, where we felt FLORDELYS was an impossible form. Treated as separate words, moreover, we would have created an unnecessary homonym with FLOR meaning "floor" and have inserted, rather falsely, two French forms in our listings.[8]

Two other editorial decisions were made. First, we retained all proper names in their Middle English forms. This seemingly innocuous procedure caused an amusing problem, for the form AGRAUYN A LA DURE MAYN presents four French words. We quickly eliminated A and LA from the list of words to be concorded, but both DURE and MAYN occur elsewhere as Middle English words, meaning endure and main, and these occurrences of those forms naturally turned up with the others.[9] But under the Modern English headwords HARD and HAND, DURE and MAYN do not appear. In justice to Agravain's too sordid late reputation, we suppose they ought to appear. Second, a certain amount of regularization seemed desirable. We therefore follow the orthographical practice of GGK throughout; i.e., we did not consistently normalize spellings, but retained the C-K, the I-Y, the I-J, the -ES and -E3, and the U-V or U-W distinctions. We did, however, change all occurrences of QUOD to QUOTH, all occurrences of VUS to VS. Our principle here is a simple one. A concordance is used with
the available printed texts. There is no standard, received edition of these poems, so our copy text actually is the closest yet to the "uniform text" Mr. Chapman spoke of. Having chosen the Gollancz edition of Sir Gawain and the Green Knight as the base text for the longest of our poems, it seemed only reasonable to us to make all other printed editions conform to its orthography. In view of the presently available printed editions of these poems, the editions one would have at hand while using our concordance, little good would have resulted had we expended further energy in replacing initial 3 with Y and medial or final 3 with GH or W. Finally, to be done with these matters, we changed BERTILAK, GGK, 2445, to the better supported form BERCILAK, we did not use at all the spurious line, GGK, 2445*, we changed TON and DON, ERK, 5-6, to TOUN and DOUN, and we omitted altogether all occurrences of AMEN as well as the motto HONY SOYT QUI MAL PENCE.

The results of our collation were interesting but not startling. The principal achievement, after all, was the production, following the procedures just described, of a uniform text of the five poems. We did not make and adopt a single new emendation. We did not suggest, even, a single new reading of the MS. But we did record, and therefore preserve for consideration, a rather large number of variants. All are truly lexical variants. That is a finding of some significance. Differences of spelling or punctuation do not alter the word stock of a MS., but some differences of MS. readings do. Let us be specific for a moment. The editors of all our printed editions examined the same MS. If, as they do, one editor prints for GGK, 77, Toulouse and the other prints tolouse, and one prints [&] where the other prints of, we need not be concerned. The difference between an upper case and lower case printing is not lexically a significant difference, especially for a computer concordance which is printed entirely in upper case letters. Neither is the difference between [&] (or AND) and of significant, because both words are omitted, not concorded at all, since they meet the requirements of a classification found in all concordances, a list of "common words not concorded." Actually, as the one editor admits, the MS. clearly reads of. We found many variants like these two occurrences, and we simply ignored them. Many variants, however, could not be ignored. When in CLN, 745, the line is printed as THEN THE BURNE OBECHED HYM AND BO3SOMLY HIM THONKKE3 while in CLN V, 745, the same line is rendered THEN ABRAHAM OBECHED HYM AND HY3LY HIM THONKKE3, and the MS. clearly reads ABRAHAM and not THE BURNE, and appears to read
LO3LY where these editors show BO3SOMLY and HY3LY, we found a problem on our hands. We saw no reason to retain THE BURNE, but we saw that it was reasonable to retain BO3SOMLY and HY3LY, since both forms represent reasonable decisions of editors who examined the very same spot in the MS. Therefore this one line in the MS. grew into two lines in printed editions and two lines for us also. Our concordance will show both lines, for we concluded that an honest lexical difference existed, and that support for both BO3SOMLY and HY3LY could be adduced, and that we should count these forms as actual occurrences.

In all we found a considerable number of significant variant lines. In examining the poems, we noted 72 lines of the total of 2530 lines in GGK where significant differences occurred, 64 lines of 1812 lines of CLN, 112 lines of 1212 lines of PRL, only 9 lines of 531 lines of PAT, and 18 lines of 352 lines of ERK. Of the 6437 total lines, then, we discovered 275 variant lines. That means, of course, that we not only added 275 lines of verse to our corpus (almost another poem as long as ERK) but that we also considerably affected both the total word stock and the number of occurrences of a good many individual forms. For example, if a line showed but one difference, say MUCH as against MUTH (PAT and PAT V, 54), both forms, to be sure, were concorded, but also every other word in what was originally one MS. line had to be tallied twice, for each word in both lines of the printed editions was spotted under its headword, and a single word, say CHEKES, which both editors show for this same line, would be cited twice under its headword. Our frequency tally shows that CHEKES occurs three times in these poems, but it really occurs only twice. Since all variant printings are marked with a V, or a number in the case of PRL, a reader of our concordance will have to subtract one from the total occurrences of any given form for each occurrence which is so marked. To make this perfectly clear, suppose a reader wishes to know how many times the word "date" occurs. Our frequency list will show 15 occurrences, but a glance at the fifteen lines containing this word under the headword DATE will reveal that three of the lines are marked as variant lines, and that therefore DATE actually occurs only 12 times. 
It is not that DATE is in any way suspect, but that in ERK, 205, it is A LAPPID DATE while ERK V, 205, has it A LEWID DATE, and that in PRL, 528, it is WYL DAY WAT3 PASSED DATE while both PRL 1 and PRL 3 have it WYLDAY WAT3 PASSED DATE; this is our one word or two word puzzle again, and in this instance it is not to be resolved. It will be seen that variants are a bit of a bother to the concorder.
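
The arithmetic of the preceding paragraph can be checked with a few lines of Python. The figures are those reported above; the variable and function names are illustrative only. The sum of text lines and variant lines, 6712, agrees with the number of cards reported as punched for the initial deck.

```python
# (total lines, lines with significant variants) for each poem, as reported.
poems = {
    "GGK": (2530, 72),
    "CLN": (1812, 64),
    "PRL": (1212, 112),
    "PAT": (531, 9),
    "ERK": (352, 18),
}

total_lines = sum(lines for lines, _ in poems.values())
variant_lines = sum(variants for _, variants in poems.values())
corpus_lines = total_lines + variant_lines  # lines actually concorded


def true_occurrences(listed: int, marked_variant: int) -> int:
    """Subtract one for each citation marked as a variant line."""
    return listed - marked_variant


print(total_lines, variant_lines, corpus_lines)  # 6437 275 6712
print(true_occurrences(15, 3))                   # DATE: 12
print(true_occurrences(3, 1))                    # CHEKES: 2
```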


This is not the place to display all the variants we discovered; let it suffice that our collation was necessary, that it turned up some interesting significant differences, and that it is one more problem a maker of a concordance of a Middle English text must reckon with.[10]

Our uniform text in hand, we turned to our second operation. Any sense of relief we might have experienced at having "solved" the textual problems, being able at last to release our text to an efficient machine, was short-lived. Initially, things did indeed run smoothly. A very capable operator, loaned to us for the purpose by the University of Pittsburgh's Health Law Center, punched our text and produced, in approximately 40 hours, 6712 IBM cards, each card punched with one line of text, its proper title abbreviation, and its proper line number in its own text. The Health Law Center also printed out that initial deck of cards. In another week the print-out was proofread, and all errors noted were quickly corrected by simply punching a complete new card for each error discovered. Our initial deck of cards, now corrected and carrying in its own punched format our uniform text, was now ready for the IBM 7070.
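
A card of this kind can be pictured as a fixed-width record. The sketch below, in Python, is an assumption-laden illustration: the article states what each card carried (one line of text, the title abbreviation, the line number) but not the column layout, so the field widths and the names Card, punch_card, and read_card are hypothetical.

```python
from typing import NamedTuple


class Card(NamedTuple):
    text: str     # one line of verse from the copy text
    poem: str     # title abbreviation, e.g. "GGK" or "PRL 1"
    line_no: int  # line number within its own poem


# Hypothetical field widths chosen to fill an 80-column card;
# the actual column layout is not given in the article.
TEXT_W, POEM_W, LINE_W = 70, 5, 5


def punch_card(card: Card) -> str:
    """Return an 80-column card image for one line of text."""
    if len(card.text) > TEXT_W:
        raise ValueError("line too long for the text field")
    return f"{card.text:<{TEXT_W}}{card.poem:<{POEM_W}}{card.line_no:>{LINE_W}}"


def read_card(image: str) -> Card:
    """Recover the three fields from a card image."""
    return Card(
        image[:TEXT_W].rstrip(),
        image[TEXT_W:TEXT_W + POEM_W].rstrip(),
        int(image[TEXT_W + POEM_W:]),
    )


image = punch_card(Card("BRYNG BOD-WORDE TO BOT BLYSSE TO US ALL", "CLN", 473))
print(len(image), read_card(image))
```

Because every card carries its own identification, the deck can be read in any order, which is the property the first program relies on.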

Once again we were fortunate. Mr. Charles Bacon, a Systems Analyst in our Computation and Data Processing Center who had become interested in our project, agreed to prepare the program for
the concordance.[11] Largely in Mr. Bacon's words, this is the program he prepared.

The initial deck of cards was punched (as we have just described) so that each card was completely identified without regard to the order in which the deck was stacked. The first computer program was written to read these cards and perform the following functions. First, it loaded into the computer a common word list, 146 words not to be concorded, which, eliminated, lightened the load on the rest of the system. Second, the program read each text card and preserved its content as a "card image." That "card image," at this stage, was scanned by a sub-program designed to find each separate word and move it to a twenty-letter area, making sure that the first letter of each word was always placed in the first column of that twenty-letter area. Third, the program wrote out that area, immediately followed by the complete "card image," as a connected tape record. When all the words from a given card had either been written on to tape, the "card image" being repeated as many times as non-common words appeared on the card, or eliminated as common words, the next card was read, and the process repeated.
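
The logic of this first program can be sketched in Python. This is not Mr. Bacon's code, which is not reproduced in the article; the names Record and explode, the text_width parameter, and the four-word stand-in for the 146-word common list are all assumptions made for illustration.

```python
from typing import Iterator, NamedTuple

WORD_FIELD = 20  # the twenty-letter area described above


class Record(NamedTuple):
    word: str        # the word, left-justified in the twenty-letter area
    card_image: str  # the complete card the word was found on


def explode(card_image: str, text_width: int, common: set[str]) -> Iterator[Record]:
    """Yield one tape record per non-common word found on a card."""
    for word in card_image[:text_width].split():
        if word in common:
            continue  # common words are eliminated, not written to tape
        yield Record(word[:WORD_FIELD].ljust(WORD_FIELD), card_image)


# Stand-in common-word list; the real list held 146 words.
common = {"THEN", "HYM", "AND", "HIM"}
card = "THEN ABRAHAM OBECHED HYM AND HY3LY HIM THONKKE3".ljust(70) + "CLN V" + "  745"
for record in explode(card, 70, common):
    print(record.word.rstrip())
```

The card image is repeated once per surviving word, so every record carries its own context and identification into the sort.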

The second computer program was designed to sort the connected tape record. The record was sorted, alphabetically, first according to the individual words and second according to the poem title abbreviation. A third sorting according to line number completed this program. The entire sorting process was performed by making use of a standard program written for general sorting requirements.
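
A three-level sort of this kind is a single call in Python. The sketch below is an illustration, not the standard sorting program used on the IBM 7070. One assumption deserves notice: the collate key places letters before the digit 3, an ordering inferred from the sample print-out in this article (DEUYSED precedes DEUYSE3, and DEWYNE precedes DE3EN) rather than stated in it.

```python
def collate(word: str) -> list[tuple[bool, str]]:
    """Sort key placing letters before digits, so that 3 (yogh) sorts last."""
    return [(ch.isdigit(), ch) for ch in word]


def sort_records(records):
    """Sort (word, poem, line_no) records by word, then poem, then line number."""
    return sorted(records, key=lambda r: (collate(r[0]), r[1], r[2]))


records = [
    ("DE3EN", "GGK", 1),   # line numbers here are placeholders
    ("DEWYNE", "PRL", 1),
    ("DATE", "PRL", 528),
    ("DATE", "ERK V", 205),
    ("DATE", "ERK", 205),
]
for record in sort_records(records):
    print(record)
```

With Python's default string ordering the digit 3 would sort before every letter, which is why the explicit key is needed to reproduce the order seen in the sample.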

The third, and last, computer program was designed to transfer the sorted tape record on to cards in a format suited to the printing requirements of the concordance. At this stage a second common word list was introduced. (Personal pronouns and all forms of the verbs have and be comprised this list. Our reasons for desiring this list will be explained later on.) Words contained in this list are to be printed in a separate listing, identified by poem title abbreviation and line number only, instead of being displayed in an entire line of text. The sorted tape record, therefore, was punched out in such a manner that three card formats were produced: first, the individual word by itself (i.e., a headword), second, the original line of verse taken from the copy text, and, third, the display of numerical references. Thus 36,831 cards were produced and stacked in an order which would produce, as they were run through a printer, the final concordance.
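
The three output formats can be sketched as follows in Python. The function name layout, the card labels, and the choice of one reference per card are assumptions; the article does not say how the numerical references were packed on to cards.

```python
from itertools import groupby


def layout(sorted_records, reference_only):
    """Turn sorted (word, poem, line_no, text) records into numbered output cards.

    Words in `reference_only` (the second common word list) are displayed by
    poem and line number alone; all other words show the whole line of verse.
    """
    cards = []
    for word, group in groupby(sorted_records, key=lambda r: r[0]):
        cards.append(("HEADWORD", word))
        for _, poem, line_no, text in group:
            if word in reference_only:
                cards.append(("REFERENCE", f"{poem} {line_no}"))
            else:
                cards.append(("LINE", f"{text} {poem} {line_no}"))
    # Number the cards consecutively, like the concordance line numbers.
    return list(enumerate(cards, start=1))


records = [
    ("DEUYS", "GGK", 617, "OF DIAMAUNTE3 A DEUYS"),
    ("HYM", "CLN", 745, "THEN THE BURNE OBECHED HYM AND BO3SOMLY HIM THONKKE3"),
]
for number, card in layout(records, reference_only={"HYM"}):
    print(number, card)
```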


After Mr. Bacon had devised that program he supervised the actual run of the entire process, from the first step of loading our initial deck to the last step of stacking the 36,831 cards for the printer. The IBM 7070 is an extremely rapid machine. The entire operation took but a very short time. The first phase, loading our initial deck of 6712 cards and transferring our text to magnetic tape, required but 25 minutes. The second phase, the entire sorting process, required 45 minutes. The last phase, the punch-out of the sorted tape record, required 4 hours. The actual machine time, then, from initial loading to stacking of the punched output in proper order for printing, was just 5 hours and 10 minutes. The printer was a bit slower; it required 8 hours to print the whole concordance. The entire operation thus required 13 hours and 10 minutes. We very carefully gathered up the continuous sheet which bore our project, separated the roll into sheets, stacked and bound them, and carried them off, hopefully, after a rapid proofreading, to send the concordance, with some slight introduction prepared conventionally, to be reproduced in an offset process by the university press.
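
The timings just given can be totalled mechanically; the short Python check below uses only the figures reported above, with illustrative variable names. The punching rate on the last line is derived from those figures and is not stated in the article.

```python
from datetime import timedelta

load = timedelta(minutes=25)   # cards to magnetic tape
sort = timedelta(minutes=45)   # the entire sorting process
punch = timedelta(hours=4)     # punch-out of the sorted tape record
printing = timedelta(hours=8)  # printing the whole concordance

machine = load + sort + punch
total = machine + printing
print(machine, total)  # 5:10:00 13:10:00

# Average punching rate for the 36,831 output cards (derived figure).
cards_per_minute = 36831 / (punch.total_seconds() / 60)
print(round(cards_per_minute))
```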

At this happy time we encountered our largest disappointment. Had our poems been written in twentieth century English our first printout, without doubt, could have been reproduced, just as Mr. Parrish described it for the Arnold concordance. But ours are fourteenth century poems, and they were not written in twentieth century English. It is almost, looking back at it now, as if some huge irony were on purpose lodged within our texts, as if those old poets were now able to confound us, to dare us to reduce their art to a modern machine manageable format. Mr. Parrish and his associates were dismayed, he tells us, to see AAR and AARAU as the first two items in their concordance. As we looked over our product we were shocked. The more carefully we searched, the greater was our concern. We saw chaos.

It seems best to us to produce for examination here a sample of our first print out. In this fashion we can describe the true nature of the technological problems we now faced. Also, most of our principles, our feeling of what this concordance should be, can be induced from the very data we worked with. Picked quite at random, here is one sheet of our first print out.

DEUOYDE  7743
DEUYNE  7750
DEUYS  7752
OF DIAMAUNTE3 A DEUYS ...GGK  617  7753
DEUYSE  7754
DEUYSED  7759
DEUYSE3  7764
DEUYSIT  7767
DEVAYE  7769
DEVISED  7771
DEVOYDE  7773
DEVYSE  7777
DEVYSED  7779
DEW  7782
DEWE  7784
DEWOYDE  7788
DEWYNE  7790
DE3E  7793
DE3EN  7795
DE3TER  7797

Notice, in the first place, the run of numbers at the right margin. These numbers, 7741-7802, identify the position of the 62 cards of our total deck of 36,831 cards, placed in the order shown, which carried that portion of the first print out reproduced here. They are concordance line numbers. In the final offset printing, these numbers, suppressed, will not appear. Their utility to us at this stage of our work, as will soon be evident, was enormous.

Actually, we faced the problem of orthographical variants as well as that of Middle English versus Modern English headwords long before we had our first print out of the entire concordance. We at the start obtained a print out of the complete lexicon, and, working with it and our copy text, attempted to devise a system of cross referencing which would allow us to retain all headwords in their Middle English form. But cross referencing became so complicated that we were forced to give it up; users of the concordance so arranged would find it unnecessarily complex. We decided then that we should have to use some Modern English headwords. As we shall point out shortly, what
we had to do was no less than to print out a concordance (our original complete print out) in order to make our final concordance. We explored the possibility of producing a program which would permit the computer to gather forms together and shift them from their original position to some other position in the final ordering of all forms, but were unable to produce such a program. We were obliged to proceed along other lines.

Suppose, now, that this 62 line sample concordance were the final version of this portion of the whole concordance. And suppose, further, that a reader entered the concordance at the headword DEWYNE. How would he know that he should find, 421 lines further along, roughly eight pages in the printed book, a variant form, DOWYNE? What instructions ought we provide? Or, earlier in our work, ought we to have spotted the DEWYNE — DOWYNE variation and regularized all three occurrences to DEWYNE? (It would have been a simple matter, actually, because the excellent glossary in Gordon's Pearl does, on p. 127, link these forms.) Such a procedure, carried out consistently in our five poems, however, would have been an impossible task; thousands of forms would have had to be changed and our texts, as well, would have been seriously violated. Moreover, we would have produced a concordance to a text unknown to anyone. Hence, we did not regularize spelling. Ought we to have chosen a Modern English headword and printed both forms under it? The Middle English form is derived from Old English dwīnan, meaning "languish, pine away." We do not use Modern English headwords unless the Modern English development of the Middle English form closely resembles the Middle English form. It would not do, in this case, to use LANGUISH as a headword for these forms (the poets did not know that word), and there is no Modern English form *DEVINE. We chose to retain DEWYNE as headword, and to print all three occurrences of it here, these two occurrences, concordance line numbers 7791 and 7792, and concordance line number 8212, which contains the form DOWYNE. Where DOWYNE falls out in the concordance as a headword, at present as concordance line number 8211, we have replaced that card with a new card which carries this format: DOWYNE (V. DEWYNE). 
We therefore make it certain for the reader that he indeed finds the three occurrences of the Middle English forms which equate with Modern English "languish."[12]


It might appear, at this juncture, that our simplest solution would have been the elaborate cross reference key we attempted earlier to devise. We could have retained all Middle English headwords, and directed attention to other places in the concordance where variant forms of semantically similar items occurred. Very carefully we started through our first print out and marked vide references after those headwords for which variant forms occurred. The very first entry on the print out was ABATAYLMENT. It was marked to read: ABATAYLMENT (V. BATELMENT). That procedure was designed to carry a reader from the first line to concordance line number 1975, where he would see BATELMENT. BATELMENT, of course, had to be changed to read: BATELMENT (V. ABATAYLMENT). After some time we gave it up. Not only did it turn out that we should have to mark well over half of our total stock of headwords, but it turned out also, in some cases, that we had to cite as many as fourteen vide references for a single headword, and thus to mark each of those fourteen co-occurring variant forms. Moreover, it proved impossible always to indicate which form constituted the principal entry. Had we persisted, the result would have been an inefficient concordance, if not an inadequate one. Users would have been forced to turn back and forth so frequently that they well might have concluded that such searching was not worth the effort.


We therefore concluded that the concordance had to be rearranged so that it would present the sorted word stock of our five poems in the most convenient and useful form for the type of reader we imagined would most want to consult it. We decided, then, that forms which were phonemically and lexically alike but graphemically different had to be drawn together under a common headword. For example, we have grouped under the Modern English headword CHRIST all occurrences of CRIST, CRISTE, CRYST, KRYST, and KRYSTE. Forms which are lexically alike but are morphemically or morphophonemically different are distinguished. We count DEUEL and DEUELE3 as two words, and we count GODE (or GOOD, GOODE, GOUD, GOUDE), BETTER, BEST (or BESTE) as three words. In short, all lexical variants are distinguished (the chief aim of a concordance) and, at a second level of distinction, within the major form classes all significant structural differences are distinguished. Thus: Nouns — singular and plural forms are listed separately, but case is ignored, especially, lacking an apostrophe, since genitive singulars and plurals cannot be distinguished from other case singulars and plurals;[13] Verbs — the five paradigmatic forms are listed separately, i.e., infinitive, 3d sing. pres. indic., pret., p. partic., and pres. partic., with no distinction made between pret. and p. partic. forms unless a distinctive difference exists, as, in Modern English, say, between RODE and RIDDEN; Adjectives — the stem, comparative degree, and superlative degree are listed separately; Adverbs and all other Form Classes — these presented no especial problems.
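
The grouping principle can be illustrated with a small Python sketch. The table below holds only the examples named in this paragraph; the choice of GOOD as the headword for GODE and its variants is an assumption, the function name group is hypothetical, and the poem labels and line numbers in the usage lines are placeholders, not real citations.

```python
# Spelling variants drawn together under one headword (examples from the text).
HEADWORD = {
    "CRIST": "CHRIST", "CRISTE": "CHRIST", "CRYST": "CHRIST",
    "KRYST": "CHRIST", "KRYSTE": "CHRIST",
    "GODE": "GOOD", "GOOD": "GOOD", "GOODE": "GOOD",
    "GOUD": "GOOD", "GOUDE": "GOOD",   # headword GOOD is an assumption
    "BEST": "BEST", "BESTE": "BEST",
}


def group(citations):
    """Group (form, poem, line_no) citations under their headwords.

    Forms absent from the table stand as their own headword, so DEUEL
    and DEUELE3 (singular and plural) remain two separate entries.
    """
    grouped = {}
    for form, poem, line_no in citations:
        grouped.setdefault(HEADWORD.get(form, form), []).append((form, poem, line_no))
    return grouped


# Placeholder references, for illustration only.
sample = [("CRIST", "AAA", 1), ("KRYSTE", "BBB", 2), ("DEUEL", "AAA", 3), ("DEUELE3", "AAA", 4)]
for headword, cites in group(sample).items():
    print(headword, cites)
```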

We have, here and there, already said a few things about the choice of headwords. In general, we find Parrish's description of the matter quite sound, and his solution for a text like ours reasonable. We have, however, made one significant change. Parrish suggests that the ". . . optimum concordance to an early text is of a third type, in which the lines of verse are given in their original form but index words are
modernized . . . ." (p. 10.) Our lines of verse are in their original form, but we did not modernize all headwords (index words). Usefulness to the reader was again in our minds, and consequently we modernized a headword only when the Middle English form closely resembles the Modern English form. We retained the Middle English form when no closely similar Modern English form exists. Where modernized, the Middle English form will not occur at all as a headword; e.g., from the sample page we have reproduced here, the headword DEWOUTLY, concordance line 7787, occurs, but in our revised concordance that form will not appear as a headword, and, instead, under the Modern English form DEVOUTLY, which will be inserted in place of the present concordance line 7741, concordance lines 7742 and 7787 will be listed. From the list of words which appear on this same sample page, only three — DEVAYE, DEW, and DEWYNE — are retained in their Middle English forms; DEW also happens to be the Modern English form, but "deny" does not at all look like DEVAYE, and DEWYNE had to be treated as we have already described it. There is a third choice. Some Middle English words do not closely resemble their Modern English counterparts but do tend to suggest them. Words of this class are retained as headword entries, but citations are not printed beneath them. Instead, such a headword will carry a vide reference after it, that reference being a Modern English form, and under the Modern English form the citations will be situated. From this sample page, two such entries are indicated: DE3E will read DE3E (V. DIE) and DE3TER will read DE3TER (V. DAUGHTER), and concordance lines 7794 and 7796 (since DE3EN is an unlevelled form of DE3E) will be located under DIE, which will follow DID, line 7833, while the five DE3TER citations will be spotted under DAUGHTER, to be relocated some 650 lines back, immediately after concordance line 7059, a line containing the form DAUBE. 
In short, we derived three types of headwords: the Middle English form, the Modern English form, and the "in between" Middle English vide Modern English form. With that in mind, it is easy to see how we established, to finish with the forms on this sample page, the paradigms DEVOID—DEVOIDS—DEVOIDED—DEVOIDING and DEVISE—DEVISES—DEVISED. The sources of our headwords DEVOUTLY, DIVINE, and DEVICE are likewise clear. And all the forms on this sample will be resituated as we have explained it.
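For a reader who thinks more readily in present-day programming terms, the three types of headword can be sketched as follows. This is only an illustration in Python, not the procedure we followed (the work described here was done by hand), and the names `Entry`, `build_entries`, `MODERNIZED`, and `VIDE` are hypothetical. The pairing of DE3E with line 7794 and DE3EN with line 7796 is assumed for the sake of the example, as is the treatment of DE3EN as a form filed silently under DIE; the line number given for DEW is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    headword: str
    vide: Optional[str] = None  # Modern English form the reader is sent to
    lines: List[int] = field(default_factory=list)  # concordance line numbers


def build_entries(
    citations: List[Tuple[str, int]],
    modernized: Dict[str, str],
    vide: Dict[str, str],
) -> Dict[str, Entry]:
    """File each (Middle English form, line number) under its headword."""
    entries: Dict[str, Entry] = {}
    for form, line in citations:
        if form in modernized:
            # Type 2: the Middle English form disappears as a headword.
            target = modernized[form]
        elif form in vide:
            # Type 3: the form stays as a bare headword with a vide reference.
            target = vide[form]
            entries.setdefault(form, Entry(form, vide=target))
        else:
            # Type 1: the Middle English form is itself the headword.
            target = form
        entries.setdefault(target, Entry(target)).lines.append(line)
    return entries


# Assumed assignments, for illustration only.
MODERNIZED = {"DEWOUTLY": "DEVOUTLY", "DE3EN": "DIE"}
VIDE = {"DE3E": "DIE", "DE3TER": "DAUGHTER"}
SAMPLE = [("DEWOUTLY", 7787), ("DE3E", 7794), ("DE3EN", 7796), ("DEW", 0)]

entries = build_entries(SAMPLE, MODERNIZED, VIDE)
for name in sorted(entries):
    entry = entries[name]
    note = f" (V. {entry.vide})" if entry.vide else ""
    print(f"{entry.headword}{note}: {entry.lines}")
```

The point of the sketch is only that a vide headword carries no citations of its own, while a modernized form leaves no headword behind at all.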

All this activity, with some decision called for at every entry in the concordance, with uncommon persistence required to uncover all the alternates and variant spellings (even making sure, to give another


Page 72
example, to search out aphetic forms—thus we chose to take concordance line 2030, BAYST, and relocate it as line 1, as Modern English ABASHED), with a gnawing frustration but ever growing knowledge of our texts, and respect for them, attending it, all this constituted our "post-machine" editing. From the original print out, then, we made the necessary changes in headwords, and using the line numbers assigned to the entries, we rearranged the concordance. As we have said, we discussed the possibility of writing a program which would enable the computer to rearrange the concordance, but were unable to produce that program. In any case, it seems clear that we should have had to produce the changes in headwords and the instructions for rearrangement by hand, as we did, before a complex program could be written. It took two people 200 hours to do that work by hand—a staggering number of hours compared with what the computer was able to do in less than 14 hours. At this moment we see no alternative. It is another problem for all concorders of Old English or Middle English texts.

Relocating the output deck of 36,831 cards has proved to be an equally arduous task. Since thousands of operations are involved, it was apparent that we could not count on the IBM 7070 to perform the work. To switch card and concordance line number 2030 to line number 1, for example, would require a distinct program, simple to be sure, but costly. To sort out the line numbers 16575, 16576, 16463 . . . 5013, 16582, 16583 . . . 16592-98 . . . 508, 16465 (40 occurrences, in all, of what we list under Modern English CAST) and move this entire newly arranged unit so that it is spotted after what is at present line number 5011 would require another distinct program, a more complicated one, also costly. With these two instances expanded to thousands, it should be clear why the computer is ill-suited for that assignment. Perhaps in the future an extremely sophisticated program might be devised to perform that kind of operation efficiently. At the present time it is not possible. We are completing this arduous work manually.

The utility of the concordance line numbers will now be apparent. A simple program was devised to print the matching concordance line number on each of our 36,831 output cards. That entire deck now is so marked; i.e., each punched card now carries the same number on it as the line number it represents. Thus an individual card, run through the printer some time ago, and which generated a line of our original print out, can now be picked out from all the other 36,830 output cards. All our "post-machine" editing was performed right on the original print out sheets. The instructions are quite clear. They


Page 73
go along like this: place card 2030 in position 1; next place card 1952, then 2028, then 1, followed by 2, 3, 6, 9, 7, and so on. Nimble fingers and a patient eye have taken over the work. It is not yet completed, but the end is in sight. When new headword cards have been punched and inserted at the proper places (i.e., those instances where we did not retain the Middle English headword) the freshly ordered deck of cards will be ready for machine printing. The print out of that deck should constitute our final concordance, and it, after proofreading, will be reproduced in an offset printing.
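The manual reordering just described amounts to applying a list of card numbers to a numbered deck. A minimal sketch in Python follows, offered as an illustration of the idea and not as the program we were unable to write; `reorder_deck` is a hypothetical name, the card texts are placeholders, and the nine-card deck is a toy built from the instruction sequence quoted above.

```python
from typing import Dict, List, Tuple


def reorder_deck(deck: Dict[int, str], order: List[int]) -> List[Tuple[int, str]]:
    """Return the cards in the sequence given by `order`.

    Each card must be placed exactly once, so a slip in the handwritten
    instructions (a repeated, unknown, or forgotten number) is reported.
    """
    placed = set()
    result: List[Tuple[int, str]] = []
    for number in order:
        if number not in deck:
            raise KeyError(f"no card numbered {number}")
        if number in placed:
            raise ValueError(f"card {number} placed twice")
        placed.add(number)
        result.append((number, deck[number]))
    missing = set(deck) - placed
    if missing:
        raise ValueError(f"cards never placed: {sorted(missing)}")
    return result


# The opening of the instruction sequence quoted in the text.
ORDER = [2030, 1952, 2028, 1, 2, 3, 6, 9, 7]
# Placeholder card contents; the real cards carry lines of verse.
DECK = {number: f"card {number}" for number in sorted(ORDER)}

new_deck = reorder_deck(DECK, ORDER)
for position, (number, text) in enumerate(new_deck, start=1):
    print(position, number, text)
```

The checks for repeated or forgotten cards stand in for the "patient eye" of the text; they are a design choice of the sketch, not a feature of our procedure.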

A few minor problems occurred. We had, for example, to examine each of the 2937 occurrences of THE to pick out the 146 instances where the form equates with the pronoun "thee." The other 2791 occurrences were not concorded. So, too, did we examine the 687 occurrences of ON to pick out the 34 instances where it equates with "one." We wanted to get a separate listing of all pronouns, and therefore all occurrences of THE and ON had to be examined.

Readers of Middle English will, we think, find one of the appendices to our concordance especially useful. All occurrences of pronouns, sometimes missing from concordances or scattered alphabetically throughout the concordance, and all forms of BE and HAVE (as well as BE and HAVE plus a negative particle) are collected in one listing. Here the reader will find these forms concorded by poem title and poem line number, but not with line of verse citations. The structure and syntactical behavior of these forms are particularly interesting in Middle English. We have made it possible for a reader to locate very quickly every occurrence of these forms in our five poems. Two other appendices should also be useful. We have provided a list of headwords (here entirely in Middle English) in order of the frequency of their occurrence. Parrish's statement of this feature of the Cornell Concordances (p. 12) receives our wholehearted support. It is good to know, very quickly, what words a poet chooses to use more frequently than other words. It is more than merely 'good to know' that; here is an avenue to insight. To use the Arnold concordance itself, for example, it was something of a surprise to discover that Arnold used the word day more frequently than the word night, for, influenced no doubt by admiration of "Dover Beach," it somehow seemed that it should have turned out the other way around. The frequency list of our concordance will no doubt be put to good use. We also thought it would be helpful (our last appendix) to list all the variant lines we uncovered in our collation of the texts we used. Read against each base text we used, this list, sorted according to poem titles, will disclose where


Page 74
variations exist in the other editions we employed. The 9 variant lines of PAT, for example, are, by line number only, disclosed at a glance.
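The frequency appendix mentioned above, a list of headwords in order of the frequency of their occurrence, can be sketched in a few lines of Python. The sketch assumes that each citation has already been reduced to its headword; `frequency_list` is a hypothetical name, the alphabetical ordering of ties is our assumption, and the toy input is illustrative and does not reproduce counts from our concordance.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequency_list(headwords: Iterable[str]) -> List[Tuple[str, int]]:
    """Headwords by descending frequency; ties are listed alphabetically."""
    counts = Counter(headwords)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative input only: one headword per citation.
TOKENS = ["DEW", "DOLE", "DEW", "DIE", "DOLE", "DEW"]

for headword, count in frequency_list(TOKENS):
    print(f"{headword:<10}{count}")
```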

There is no more. I want now to return to my own voice. Mr. Kottler likes to think of our work as a computer concordance shaped by man. It is hard to imagine a better way to put it. Surely, as this record is reviewed, it will occur to anyone to ask why in the world did they use the 7070 in the first place. We used it, after our "pre-machine" textual work was completed, work to be done in any case no matter what technique might afterwards be employed to evolve the concordance, because the IBM 7070 gave us the original print out in a day. It might have taken us a year to do it manually. And, most important of all, we hardly realized what we had to contend with until we had that print out. Unless a corps of co-workers is constituted to prepare a concordance, as Parrish describes the work of the Dante Society of America (p. 1), the machine is indispensable. As I hear of new techniques, particularly those which eliminate the necessity for using punched cards to transfer a text to magnetic tape, or those which skillfully have made use of extremely sophisticated programs, programs designed to handle enormous numbers of variables, and some of these programs aimed squarely at human discourse, oral or written, and not just numbers or other symbols more readily amenable than phonemic or graphemic structures to the operation of binary arithmetic, I feel sure that in the future a concordance such as ours could be produced much more efficiently than we have managed it.

If we have not been efficient, we have been reliable. I have been reading Lane Cooper's delightful account of the evolution of his Concordance of Wordsworth, "The Making and the Use of a Verbal Concordance."[14] Naturally, I wondered how he might have described what we have been doing with our concordance. I can also recall the day when I should have looked elsewhere for entertaining reading— "An undergraduate student once assured me that the word God was rare in the writings of Wordsworth; he had heard so in a lecture. It occurs 274 times in the poems of that author. . . ." (p. 21) Indeed. I doubt that this account of our concordance is that entertaining. I also am not sure that it will prove to be "the gift of Hermes to Apollo" (p. 19). But I am sure that we have produced a reliable concordance to five important Middle English poems. I am sure, too, that students of Middle English poetry will find it extremely useful. I am sure, again, that two mediævalists who scarcely knew one another before they started the work have learned a great deal about these poems, about


Page 75
Middle English, and about the behavior of a sophisticated, at many times a very delicate, language. I am sure, at last, that Mr. Kottler is anxious to return to Boethius, to later mediæval commentaries on Boethius, to Chaucer, and, I suspect, to the ordinary human race. I too find myself looking away, to Beowulf, to Gawain, and to Sir Thomas Malory, Kt., who, at this very instant, prods me with his usual good words—"here is the ende."



It will be helpful to keep in mind S. M. Parrish's "Problems in the Making of Computer Concordances," SB, XV (1962), 1-14; see too Paul L. Garvin, "Computer Participation in Linguistic Research," Language, XXXVIII (Oct-Dec 1962), 385-389. A very useful general discussion of Information Retrieval techniques is E. Herbert's "Finding What's Known," International Science and Technology (Jan 1962), pp. 14-23.


From the very beginning Prof. Barnet Kottler, Department of English, Purdue University, and I have shared this work. He has seen this article in typescript. It seems fair to me, then, from this point on, to speak as "we, us, or our."


We were unable to acquire the "special set of print wheels" described by Parrish, p. 7. Throughout we were able to make use of an IBM 7070 Data Processing System which was, early in the fall of 1960, installed at the University of Pittsburgh's Computation and Data Processing Center. It occurs to us that all future computer concordances might well be prepared in such a fashion that the facilities of the Cornell Concordances Center could be employed; e.g., cards cut and sorted elsewhere could be sent to Ithaca to be printed. Whether or not such an arrangement would be acceptable to the Cornell staff is unknown to us at the moment, but it would appear that this sort of scholarly cooperation and coordination is desirable. The printing requirements for Old English and Middle English texts are considerable. Beyond the need for concordances, other types of analyses beg attention; see W. N. Francis's excellent description, "Graphemic Analysis of Late Middle English Manuscripts," Speculum, XXXVII (Jan 1962), 32-47, esp. 46.


Letter to the author from Professor C. O. Chapman of the University of Puget Sound, Tacoma 6, Washington, 19 December, 1960. We have, of course, consulted his Index of Names in Pearl, Purity, Patience, and Gawain, Cornell Studies in English, Vol. XXXVIII (1951) as well as his "Concordance of the Works of the Pearl-Poet," unpublished dissertation (Cornell University, 1927).


It was a choice between a restricted number of punctuation points, the period and comma only, or none at all. We chose none at all as the less confusing practice. Absence of the apostrophe, however, throws genitive singular and any plural forms together; e.g., ABRAHAMS equals Abraham's. This, of course, reflects the MSS. more accurately than the printed editions do. Cf. n. 2, supra.


Although we frequently examined the facsimile MSS., as we did in this instance, such examinations, in general, were of little help to us. Word divisions in fourteenth century MSS. are often capricious, and it would be risky indeed to base decisions on word boundaries just on the MS. evidence alone. As will be seen, we were obliged to consider other evidence and other requirements before reaching our decisions.


Since we chose to use some Middle English words and some Modern English words as headwords, readers of our concordance will find an entry, HEMEWEL, and one occurrence of it, this line, GGK, 157. HEMEWEL is thus for us a Middle English word. But WELHALED, which also occurs this one time only, and is for us a Middle English word, will be found under the Modern English headword WELL-HALED, / -heyld / (as in "haled into court"), where, ironically, we were forced to use the hyphenated form, because WELLHALED is an unrecognizable Modern English form. As it is, only a reader who realizes that "hale" is an older form of "haul" will recognize WELL-HALED at first sight. The whole matter of choice of headwords is described later on.


We chose the hyphen, too, to retain a delightful Middle English preposition, and printed in GGK, 1200, TO-HIR-WARDE. Our practice, to be done with this puzzle of hyphenated or single forms, would cause us to print, were our poems in Modern English, such forms as COWCATCHER, STREETCAR, and perhaps STREETWALKER. It did cause us to print NW3ER and NW3ERES as single words under the Modern English headwords NEW YEAR and NEW YEARS, as well as to print ASTIT for either AS TIT or AS-TIT. But if one keeps the relationship which exists in English between phonemes and graphemes in mind, some sanity will be seen to occur in our procedure. We can state this in spite of the fact that if, say, one were to ask us how many times a distinct form, such as WEL, occurs we should have to give an evasive answer. For a succinct, sensible statement about the use of the hyphen in English spelling, see R. A. Hall, Jr., Sound and Spelling in English (Philadelphia, Chilton Co., 1961), pp. 16-22.


In GGK, 1281, incidentally, A occurs as a variant of HO (Modern English "she"), and will be noted in a separate listing of pronouns, although elsewhere, as an article, it is not concorded. Similarly, we eliminated DE and LA from MADOR DE LA PORT and DODDINAL DE SAUAGE.


At the risk of being exceedingly tedious, let us explore this matter a bit further. A fundamental question is involved here. To what extent must the concorder of a Middle English text be an editor? And what kind of an editor? Our aim was to produce an accurate uniform copy edition (text) for a key punch operator. We did produce a uniform text, but although we made as reasonable a text as we could, although we constantly sought a defensible rationale for the sort of editorial decisions we made, and have described here, it is still in some of its parts an arbitrary text. We see no other solution. Prof. Chapman, for example, might well object that we have made no significant advance in textual considerations beyond his work of thirty-six years ago. In some particulars we have. To have gone all the way, however, would have meant that we should have had to produce a fresh critical edition, to have checked every word against the MS. and all printed editions. In short, we would have had to produce a variorum edition. There was no time for that. Moreover, the concordance will be used with the available editions of these poems, and for that purpose it is accurate. Even if a fresh, final edition of these poems does appear later on, the usefulness of the concordance will not seriously be impaired. Nevertheless, the 275 variant lines do indeed suggest that we do not have a perfect concordance. All this further suggests that we never will, with certainty, be able to state what it is the poet actually wrote, and that, consequently, a perfect concordance is not possible. Almost six centuries have elapsed since these poems were composed. All one can hope to do is to come as close to that composition as his data and knowledge will permit. We trust we are not too far removed from the poet. 
And, to be done with this, we will point out to anyone who prepares a computer concordance of any other Old English or Middle English text that the distance from his composition will undoubtedly be one of his largest concerns.


Here again it will be helpful to look to Parrish's description, pp. 4-6. Our program is comparable, although perhaps a bit more sophisticated. It is not, however, as sophisticated as that which Parrish later describes in his paragraph beginning at the bottom of p. 10.


It will be of some interest to point out here that the Middle English Dictionary, Part D. 5, Ed. Hans Kurath and Sherman Kuhn (Ann Arbor, University of Michigan Press, 1962), p. 1370 cites these forms, and these very same lines, under Dwĩnen. None of the infinitive forms there cited occurs in our texts, and it is only these three occurrences of the 1st sing. pres. indic. which occur; therefore we did not treat this form as, say, we treated DEVOID-DEVOIDS-DEVOIDED. Note, too, that the MED prints LUF-DAUN-GERE where we print one word. The occurrence of DOL in PRL, 326, our concordance line 8212, containing the form DOWYNE, will permit us, also, to point out here that we and the MED agree in bringing similar forms together. Under the Modern English headword DOLE we print all the variant forms listed in the MED, Part D. 4, pp. 1208-1211, which occur in our texts, but we do not, as the MED does, separate the forms as homonyms; i.e., under DOLE we show 18 citations, but the form may vary lexically, in Modern English, from "grief" to "a portion." We were unable, in short, to achieve the resolution of the "homograph problem" which Parrish (p. 9) suggests. We would like to agree with Parrish (p. 8) that in a Middle English concordance ". . . users may expect to find homographs discriminated." But the cost and the machine time to effect that discrimination, as Parrish describes it, are prohibitive. Furthermore, it is open to question, we think, if such discrimination actually is necessary. A glance at the citations under ROSE in the Tatlock and Kennedy Chaucer concordance (p. 753), for example, reveals that the context very quickly shows when ROSE is a noun and when a verb. Similarly, although the cases are not precisely alike, a glance at the citations under our DOLE reveals that the context indicates when "grief" is appropriate and when "a portion."
Moreover, since DELE in PRL, 51, also occurs as "grief" where elsewhere it occurs as "devil" or "deal out," it seemed wiser to us to bring this occurrence of DELE under our DOLE. Two secondary points to this wandering note are, first, that we had to effect such changes manually and, second, that we assume a user of our concordance will know how, generally, to read Middle English.


Thinking of the concordance of the Old English poetic remains (Parrish, p. 8), the question of the significance of case of nouns arises. Were we producing that concordance, we do not, at the moment, know exactly what we would do with an irregular feminine noun, say burg. Would the genitive and dative singular form byrig be given a separate entry, or would both be listed under CITY? BURG would not, we should think, suffice because of current connotations attached to it. And how should the nominative and accusative plural form byrig be distinguished from its singular counterpart? Perhaps it would fall under CITIES (BURGS) along with burga and burgum, the genitive and dative plurals. Is this one or four words? Are CITY and CITIES identical? Perhaps only the context of the line in question can be counted on. That problem, thank the good Lord, did not arise in our work. If a sigh of exasperation is detected, it is intentional; it will be, we feel, a frequent response as man and machine wrestle with obsolete language.


In Evolution and Repentance (1935), pp. 18-53.


Page 76