The production of a computer concordance to Middle English texts for which no standard, or received, edition exists, but which exist in several non-uniform editions, requires two distinct operations.[1] Our preparation of A Computer Concordance to Five Middle English Poems: Pearl, Patience, Cleanness, Sir Gawain and the Green Knight, and St. Erkenwald, soon to be printed by the University of Pittsburgh Press, illustrates very nicely the exact nature of the operations.[2] The first is textual. A satisfactory reading of the MSS. must be obtained; i.e., one has to have or prepare a "best" copy edition or text, taking "best" in such a case to mean that which might best accord with the capabilities of the available computer system as well as best represent the MSS. The second operation is technical. A method has to be devised to translate the text into a "machine readable" format so that the computer system can digest it and perform operations with it; i.e., listing words, comparing them, sorting them, collecting them with an appropriate context, and printing at last the completed concordance. These operations are different, to be sure, yet we soon found out that we were unable to proceed with the textual problems until we had clearly understood the technical requirements. We had, for example, to resolve certain orthographical distinctive features of our text in favor of the printing limitations of our computer system; not only did we eliminate all punctuation
The entire textual problem is perhaps best described by Professor C. O. Chapman. In response to our inquiry, Mr. Chapman reminded us that ". . . there is no uniform text of the five poems such as we have in Skeat's Chaucer, and the variations of spelling within each poem, as well as the presence of four or five editors, each transcribing a poem according to his own method, have resulted in great confusion. It has long seemed to me that such a uniform text for these poems, as Skeat's for Chaucer, is an indispensible prerequisite to the making of a concordance."[4] The wisdom of that conclusion is evident to all who know these poems. The occurrence of BOT in "Cleanness," 473 is a case in point. The form BOT occurs 324 times in these texts. It may be a noun, a verb, or an adverb, or also a preposition or a conjunction. When it equates generally with Modern English but it is not to be concorded, but is to be found in a listing of common words not concorded. If, however, it equates with Modern English remedy, help, announce, or proclaim it is concorded. Since the form BOTE also occurs 11 times in these texts, either with the meaning of boat or boot (i.e., remedy), one would expect that BOT ought not to occur where elsewhere BOTE is used. Now "Cleanness," 473 reads "Bryng bod-worde to bot blysse to us all." Sir Israel Gollancz reads a noun, remedy, taking BOT from Old English bōt; R. J. Menner reads a verb, announce, taking BOT from Old English bodian. This difference means, of course, that both men take a different attitude towards
Our first step, of course, was to assemble all the printed critical editions of the five poems and, lacking access to the MSS., facsimile reproductions of MS. Cott. Nero A.x. + 4 and Brit. Mus. MS. Harl. 2250. We were able, after examination, to exclude several of the printed editions, so that as we faced up to the tedious chore of collation we found it necessary to include but one "variant" edition with each of our "base" editions for four of the poems, but three "variant" editions of "Pearl" were required along with the "base" edition. In order to provide a three space identification symbol for the five poems, a symbol the computer could easily manage, we chose the standard abbreviations of the titles of these poems. The following designations, wherein the three letter symbol alone signifies our base edition, and the symbol followed by V or a number signifies a variant edition, were assigned:
- GGK The 1940 Gollancz edition of Sir Gawain and the Green Knight.
- GGK V The 1952 reprint of the 1925 Tolkien and Gordon edition.
- CLN The 1921 Gollancz edition of Cleanness.
- CLN V The 1920 Menner edition.
- PRL The 1953 Gordon edition of Pearl.
- PRL 1 The 1906 Osgood edition.
- PRL 2 The 1921 Gollancz edition.
- PRL 3 The 1933 Bowdoin College edition.
- PAT The 1924 Gollancz edition of Patience.
- PAT V The 1918 Bateson edition.
- ERK The 1920 Gollancz edition of St. Erkenwald.
- ERK V The 1926 Savage edition.
Our decision to use the 1940 Gollancz edition of Sir Gawain and the Green Knight as base text committed us to certain procedures. For example, we retain -3 as a grapheme, we retain the U/V practice of both Gollancz and Tolkien and Gordon, and we retain the I/Y practice of both editions. Since no punctuation whatsoever is retained, we were obliged to disregard punctuation variants.[5] All occurrences of þ we had to change to TH. All brackets in the text were eliminated, & was changed to AND, and we simply treated all italicized printings as if they were printed regularly.
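The transcription rules just listed can be sketched in a few lines of present-day Python. This is only an illustration under stated assumptions, not the procedure followed when the cards were punched: the function name `normalize_line` is hypothetical, the mapping of a printed yogh to the digit 3 is an assumption based on the 3 that appears in the print-out, and hyphens are deliberately left alone because they were resolved case by case.

```python
import re


def normalize_line(line: str) -> str:
    """Apply the stated transcription rules to one line of verse (a sketch)."""
    text = line.replace("þ", "TH").replace("Þ", "TH")   # thorn becomes TH
    text = text.replace("ȝ", "3").replace("Ȝ", "3")     # assumption: yogh is keyed as 3
    text = text.replace("&", " AND ")                    # ampersand becomes AND
    text = re.sub(r"[\[\]]", "", text)                   # editorial brackets are dropped
    text = text.upper()                                  # the print-out is all upper case
    # No punctuation is retained; hyphens are kept here because the
    # editors settled each hyphenated form individually.
    text = re.sub(r"[^A-Z0-9\- ]", "", text)
    return " ".join(text.split())
```

Applied to an invented line such as "Þe kyng [&] his knyȝtes, vp-on þat day!", the sketch yields THE KYNG AND HIS KNY3TES VP-ON THAT DAY; the hyphen in VP-ON would still await an editorial decision.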
The hyphen caused the greatest difficulty. In general, we eliminated the hyphen when, for example, it was used in the base text to separate morphemes which ordinarily are not separated. Thus to cite but a few examples which occur early in the poem, we changed VP-ON to VPON (GGK, 47), DE-BATED to DEBATED (GGK, 68), and IN-NOGHE to INNOGHE (GGK, 77). Here, of course, we followed the practice of Tolkien and Gordon. Similarly, we print FORSOTHE as one word in each instance, whereas, in all editions of all five poems, that form may occur as one word, as two words (GGK, 415), or as a hyphenated word. A curious instance is the occurrence of GOD + MON. This combination may occur as two words, i.e., a good man, but it also occurs as one word, i.e., head of a household. Thus GGK, 1029, reads GOD-MON, GGK V, 1029, reads GOD MON, but we print one word, GODMON. Even more disturbing is GGK, 157, where HEME-WEL HALED occurs. In GGK V, 157, that form is printed as HEME WEL-HALED. The facsimile MS. indicates three words.[6] Since this line shows about as well as any other what an editor of a Middle English
- HEME-WEL HALED HOSE OF THAT SAME GRENE GGK 157
- HEME WEL-HALED HOSE OF THAT SAME GRENE GGK V 157
Here we were obliged to face up to the entire matter of compounds versus two words or hyphenated words. After a great deal of thought, we decided to eliminate the hyphen wherever possible, and to choose one word rather than two words wherever it clearly seemed to us that the resulting one form represented a unified lexical entity which readers of Middle English would recognize. Thus we print HEME-WEL for GGK, 157, and WELHALED for GGK V, 157.[7] Some further
Two other editorial decisions were made. First, we retained all proper names in their Middle English forms. This seemingly innocuous procedure caused an amusing problem, for the form AGRAUYN A LA DURE MAYN presents four French words. We quickly eliminated A and LA from the list of words to be concorded, but both DURE and MAYN occur elsewhere as Middle English words, meaning endure and main, and these occurrences of those forms naturally turned up with the others.[9] But under the Modern English headwords HARD and HAND, DURE and MAYN do not appear. In justice to Agravain's too sordid late reputation, we suppose they ought to appear. Second, a certain amount of regularization seemed desirable. We therefore follow the orthographical practice of GGK throughout; i.e., we did not consistently normalize spellings, but retained the C-K, the I-Y, the I-J, the -ES and -E3, and the U-V or U-W distinctions. We did, however, change all occurrences of QUOD to QUOTH, all occurrences of VUS to VS. Our principle here is a simple one. A concordance is used with
The results of our collation were interesting but not startling. The principal achievement, after all, was the production, following the procedures just described, of a uniform text of the five poems. We did not make and adopt a single new emendation. We did not suggest, even, a single new reading of the MS. But we did record, and therefore preserve for consideration, a rather large number of variants. All are truly lexical variants. That is a finding of some significance. Differences of spelling or punctuation do not alter the word stock of a MS., but some differences of MS. readings do. Let us be specific for a moment. The editors of all our printed editions examined the same MS. If, as they do, one editor prints for GGK, 77, Toulouse and the other prints tolouse, and one prints [&] where the other prints of, we need not be concerned. The difference between an upper case and lower case printing is not lexically a significant difference, especially for a computer concordance which is printed entirely in upper case letters. Neither is the difference between [&] (or AND) and of significant, because both words are omitted, not concorded at all, since they meet the requirements of a classification found in all concordances, a list of "common words not concorded." Actually, as the one editor admits, the MS. clearly reads of. We found many variants like these two occurrences, and we simply ignored them. Many variants, however, could not be ignored. When in CLN, 745, the line is printed as THEN THE BURNE OBECHED HYM AND BO3SOMLY HIM THONKKE3 while in CLN V, 745, the same line is rendered THEN ABRAHAM OBECHED HYM AND HY3LY HIM THONKKE3, and the MS. clearly reads ABRAHAM and not THE BURNE, and appears to read
In all we found a considerable number of significant variant lines. In examining the poems, we noted 72 lines of the total of 2530 lines in GGK where significant differences occurred, 64 lines of 1812 lines of CLN, 112 lines of 1212 lines of PRL, only 9 lines of 531 lines of PAT, and 18 lines of 352 lines of ERK. Of the 6437 total lines, then, we discovered 275 variant lines. That means, of course, that we not only added 275 lines of verse to our corpus (almost another poem as long as ERK) but that we also considerably affected both the total word stock and the number of occurrences of a good many individual forms. For example, if a line showed but one difference, say MUCH as against MUTH (PAT and PAT V, 54), both forms, to be sure, were concorded, but also every other word in what was originally one MS. line had to be tallied twice, for each word in both lines of the printed editions was spotted under its headword, and a single word, say CHEKES, which both editors show for this same line, would be cited twice under its headword. Our frequency tally shows that CHEKES occurs three times in these poems, but it really occurs only twice. Since all variant printings are marked with a V, or a number in the case of PRL, a reader of our concordance will have to subtract one from the total occurrences of any given form for each occurrence which is so marked. To make this perfectly clear, suppose a reader wishes to know how many times the word "date" occurs. Our frequency list will show 15 occurrences, but a glance at the fifteen lines containing this word under the headword DATE will reveal that three of the lines are marked as variant lines, and that therefore DATE actually occurs only 12 times. 
It is not that DATE is in any way suspect, but that in ERK, 205, it is A LAPPID DATE while ERK V, 205, has it A LEWID DATE, and that in PRL, 528, it is WYL DAY WAT3 PASSED DATE while both PRL 1 and PRL 3 have it WYLDAY WAT3 PASSED DATE, our one word or two word puzzle again, and in this instance not to be resolved. It will be seen that variants are a bit of a bother to the concorder.
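The arithmetic of the preceding paragraph can be checked mechanically. The sketch below uses only the figures reported above; the names `is_variant` and `adjusted_count` are hypothetical, and the tag pattern assumes that a variant edition is always marked by the three-letter symbol followed by a space and either V or a single digit.

```python
import re

# Line counts and significant variant lines per poem, as reported above.
POEM_LINES = {"GGK": 2530, "CLN": 1812, "PRL": 1212, "PAT": 531, "ERK": 352}
VARIANT_LINES = {"GGK": 72, "CLN": 64, "PRL": 112, "PAT": 9, "ERK": 18}

TOTAL_LINES = sum(POEM_LINES.values())         # lines in the base texts
TOTAL_VARIANTS = sum(VARIANT_LINES.values())   # extra lines added by collation
TOTAL_CORPUS = TOTAL_LINES + TOTAL_VARIANTS    # lines actually concorded


def is_variant(tag: str) -> bool:
    """True for tags such as 'ERK V' or 'PRL 3'; False for a base text such as 'ERK'."""
    return re.fullmatch(r"[A-Z]{3} (V|\d)", tag) is not None


def adjusted_count(listed: int, variant_marked: int) -> int:
    """Occurrences in the base texts: the listed total minus the variant-marked lines."""
    return listed - variant_marked


# The five DATE lines identified above; three of them carry a variant mark.
DATE_TAGS = ["ERK", "ERK V", "PRL", "PRL 1", "PRL 3"]
DATE_VARIANTS = sum(is_variant(tag) for tag in DATE_TAGS)
```

The totals come to 6437 base lines and 275 variant lines, 6712 lines in all, and `adjusted_count(15, DATE_VARIANTS)` gives the 12 occurrences of DATE.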
Our uniform text in hand, we turned to our second operation. Any sense of relief we might have experienced at having "solved" the textual problems, being able at last to release our text to an efficient machine, was short-lived. Initially, things did indeed run smoothly. A very capable operator, loaned to us for the purpose by the University of Pittsburgh's Health Law Center, punched our text and produced, in approximately 40 hours, 6712 IBM cards, each card punched with one line of text, its proper title abbreviation, and its proper line number in its own text. The Health Law Center also printed out that initial deck of cards. In another week the print-out was proofread, and all errors noted were quickly corrected by simply punching a complete new card for each error discovered. Our initial deck of cards, now corrected and carrying in its own punched format our uniform text, was now ready for the IBM 7070.
Once again we were fortunate. Mr. Charles Bacon, a Systems Analyst in our Computation and Data Processing Center who had become interested in our project, agreed to prepare the program for
The initial deck of cards was punched (as we have just described) so that each card was completely identified without regard to the order in which the deck was stacked. The first computer program was written to read these cards and perform the following functions. First, it loaded into the computer a common word list, 146 words not to be concorded, which, eliminated, lightened the load on the rest of the system. Second, the program read each text card and preserved its content as a "card image." That "card image," at this stage, was scanned by a sub-program designed to find each separate word and move it to a twenty-letter area, making sure that the first letter of each word was always placed in the first column of that twenty-letter area. Third, the program wrote out that area, immediately followed by the complete "card image," as a connected tape record. When all the words from a given card had either been written on to tape, the "card image" being repeated as many times as non-common words appeared on the card, or eliminated as common words, the next card was read, and the process repeated.
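For readers who find code easier to follow than a description of tape records, the first program can be approximated as below. It is a modern sketch and not Mr. Bacon's 7070 program: `Card`, `records_for_card`, and the four-word `ILLUSTRATIVE_COMMON` set are hypothetical names, the real list held 146 words, and truncating a word longer than twenty letters is an assumption the account does not address.

```python
from typing import Iterator, NamedTuple, Set, Tuple

WORD_FIELD = 20  # width of the "twenty-letter area"


class Card(NamedTuple):
    """One punched card: a line of verse with its identification."""
    text: str      # the line of verse
    poem: str      # title abbreviation, e.g. "GGK" or "PRL 2"
    line_no: int   # line number within the poem


# Four of the words the text names as not concorded; the real list held 146.
ILLUSTRATIVE_COMMON: Set[str] = {"A", "AND", "LA", "OF"}


def records_for_card(card: Card, common: Set[str]) -> Iterator[Tuple[str, Card]]:
    """Yield one (word field, card image) record per non-common word."""
    for word in card.text.split():
        if word in common:
            continue  # common words are eliminated at this stage
        # Left-justify so the first letter sits in column one of the field.
        field = word.ljust(WORD_FIELD)[:WORD_FIELD]
        yield field, card
```

A card carrying OF DIAMAUNTE3 A DEUYS (GGK, 617) thus produces two records, one keyed on DIAMAUNTE3 and one on DEUYS, each followed by the same complete card image.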
The second computer program was designed to sort the connected tape record. The record was sorted, alphabetically, first according to the individual words and second according to the poem title abbreviation. A third sorting according to line number completed this program. The entire sorting process was performed by making use of a standard program written for general sorting requirements.
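The three-level sort can be sketched as a single composite key. One detail is an inference from the sample print-out and not a statement in the account: headwords containing 3, such as DE3EN and DE3TER, fall after DEWYNE, so the machine's collating sequence evidently ranked letters before digits, the reverse of present-day character codes. The names `collating_key` and `sort_records` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str, int]  # (word, poem tag, poem line number)


def collating_key(word: str) -> Tuple[Tuple[int, str], ...]:
    """Rank letters before digits, so that 3 sorts after Z as in the sample."""
    return tuple((1, ch) if ch.isdigit() else (0, ch) for ch in word)


def sort_records(records: List[Record]) -> List[Record]:
    """Sort by word, then by poem title abbreviation, then by line number."""
    return sorted(records, key=lambda r: (collating_key(r[0]), r[1], r[2]))
```

With this key the DEUYSE lines come out as ERK 225, PRL 99, PRL 139, PRL 199, and a variant tag such as PRL 2 follows its base text PRL, which matches the order of the sample sheet. A plain character-code sort would instead put DE3EN ahead of DEUYSE.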
The third, and last, computer program was designed to transfer the sorted tape record on to cards in a format suited to the printing requirements of the concordance. At this stage a second common word list was introduced. (Personal pronouns and all forms of the verbs have and be comprised this list. Our reasons for desiring this list will be explained later on.) Words contained in this list are to be printed in a separate listing, identified by poem title abbreviation and line number only, instead of being displayed in an entire line of text. The sorted tape record, therefore, was punched out in such a manner that three card formats were produced: first, the individual word by itself (i.e., a headword), second, the original line of verse taken from the copy text, and, third, the display of numerical references. Thus 36,831 cards were produced and stacked in an order which would produce, as they were run through a printer, the final concordance.
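The three card formats can be pictured with the following sketch, which turns sorted records into a main listing (headword cards and line-of-verse cards) and a separate reference-only listing. It assumes the records are already sorted; `layout_cards` is a hypothetical name, the output strings only imitate the layout of the print-out, and treating WAT3 as a form of be that belongs on the second list is an assumption made for the example.

```python
from itertools import groupby
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str, int, str]  # (word, poem tag, poem line number, line of verse)


def layout_cards(
    sorted_records: Iterable[Record], reference_only: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (main listing, reference-only listing) as lists of card texts."""
    main: List[str] = []
    appendix: List[str] = []
    for word, group in groupby(sorted_records, key=lambda r: r[0]):
        target = appendix if word in reference_only else main
        target.append(word)  # format 1: the headword by itself
        for _, poem, line_no, verse in group:
            if word in reference_only:
                target.append(f"{poem} {line_no}")             # format 3: numerical reference
            else:
                target.append(f"{verse} ...{poem} {line_no}")  # format 2: line of verse
    return main, appendix
```

Given the records for DEUYS (GGK, 617) and WAT3 (PRL, 15), the main listing holds the headword DEUYS followed by its line of verse, while the second listing holds WAT3 followed only by PRL 15.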
After Mr. Bacon had devised that program he supervised the actual run of the entire process, from the first step of loading our initial deck to the last step of stacking the 36,831 cards for the printer. The IBM 7070 is an extremely rapid machine. The entire operation took but a very short time. The first phase, loading our initial deck of 6712 cards and transferring our text to magnetic tape, required but 25 minutes. The second phase, the entire sorting process, required 45 minutes. The last phase, the punch-out of the sorted tape record, required 4 hours. The actual machine time, then, from initial loading to stacking of the punched output in proper order for printing, was just 5 hours and 10 minutes. The printer was a bit slower; it required 8 hours to print the whole concordance. The entire operation thus required 13 hours and 10 minutes. We very carefully gathered up the continuous sheet which bore our project, separated the roll into sheets, stacked and bound them, and carried them off, hopefully, after a rapid proofreading, to send the concordance, with some slight introduction prepared conventionally, to be reproduced in an offset process by the university press.
At this happy time we encountered our largest disappointment. Had our poems been written in twentieth century English our first printout, without doubt, could have been reproduced, just as Mr. Parrish described it for the Arnold concordance. But ours are fourteenth century poems, and they were not written in twentieth century English. It is almost, looking back at it now, as if some huge irony were on purpose lodged within our texts, as if those old poets were now able to confound us, to dare us to reduce their art to a modern machine manageable format. Mr. Parrish and his associates were dismayed, he tells us, to see AAR and AARAU as the first two items in their concordance. As we looked over our product we were shocked. The more carefully we searched, the greater was our concern. We saw chaos.
It seems best to us to produce for examination here a sample of our first print out. In this fashion we can describe the true nature of the technological problems we now faced. Also, most of our principles, our feeling of what this concordance should be, can be induced from the very data we worked with. Picked quite at random, here is one sheet of our first print out.
DEUOUTLY | 7741 | |
HIS TWO DERE DO3TERE3 DEUOUTLY HEM HAYLSED ...CLN | 814 | 7742 |
DEUOYDE | 7743 | |
THAT WONT WAT3 WHYLE DEUOYDE MY WRANGE ...PRL | 15 | 7744 |
DEUOYDES | 7745 | |
DEUOYDES VCHE A VAYNEGLORIE THAT VAYLES SO LITELLE ...ERK | 348 | 7746 |
DEUOYDIT | 7747 | |
AND DEUOYDIT FROM THE DOUTHE AND DITTE THE DURRE AFTER ...ERK | 116 | 7748 |
AND DEUOYDIT FROM THE DEDE AND DITTE THE DURRE AFTER ...ERK V | 116 | 7749 |
DEUYNE | 7750 | |
AND SO DO WE NOW OURE DEDE DEUYNE WE NO FYRRE ...ERK | 169 | 7751 |
DEUYS | 7752 | |
OF DIAMAUNTE3 A DEUYS ...GGK | 617 | 7753 |
DEUYSE | 7754 | |
DERE SER QUOTH THE DEDE BODY DEUYSE THE I THENKE ...ERK | 225 | 7755 |
THE DERTHE THEROF FOR TO DEUYSE ...PRL | 99 | 7756 |
I HOPED THE WATER WERE A DEUYSE ...PRL | 139 | 7757 |
WYTH THE MYRYESTE MARGARYS AT MY DEUYSE ...PRL | 199 | 7758 |
DEUYSED | 7759 | |
ER DALT WERE THAT ILK DOME THAT DANYEL DEUYSED ...CLN | 1756 | 7760 |
AS JOHN DEUYSED 3ET SA3 I THARE ...PRL | 1021 | 7761 |
DEUYSEMENT | 7762 | |
I KNEW HIT BY HIS DEUYSEMENT ...PRL | 1019 | 7763 |
DEUYSE3 | 7764 | |
AS DEUYSE3 HIT THE APOSTEL JHON ...PRL | 984 | 7765 |
AS DERELY DEUYSE3 THIS ILK TOUN ...PRL | 995 | 7766 |
DEUYSIT | 7767 | |
THE DENE OF THE DERE PLACE DEUYSIT AL ON FYRST ...ERK | 144 | 7768 |
DEVAYE | 7769 | |
3IF ANY WERE SO VILANOUS THAT YOW DEVAYE WOLDE ...GGK | 1497 | 7770 |
DEVISED | 7771 | |
THER PRYUELY IN PARADYS HIS PLACE WAT3 DEVISED ...CLN | 238 | 7772 |
DEVOYDE | 7773 | |
WYTH ALLE THISE WY3E3 SO WYKKE WY3TLY DEVOYDE ...CLN | 908 | 7774 |
DEVOYDYNGE | 7775 | |
IN DEVOYDYNGE THE VYLANYE THAT VENKQUYST HIS THEWE3 ...CLN | 544 | 7776 |
DEVYSE | 7777 | |
WEL CLANNER THEN ANY CRAFTE COWTHE DEVYSE ...CLN | 1100 | 7778 |
DEVYSED | 7779 | |
DANYEL IN HIS DIALOKE3 DEVYSED SUMTYME ...CLN | 1117 | 7780 |
HE DEVYSED HIS DREMES TO THE DERE TRAWTHE ...CLN | 1604 | 7781 |
DEW | 7782 | |
THAT ALLE WAT3 DUBBED AND DY3T IN THE DEW OF HEUEN ...CLN | 1688 | 7783 |
DEWE | 7784 | |
WHEN THE DONKANDE DEWE DROPE3 OF THE LEUE3 ...GGK | 519 | 7785 |
DEWOUTLY | 7786 | |
BOT I DEWOUTLY AWOWE THAT VERRAY BET3 HALDEN ...PAT | 333 | 7787 |
DEWOYDE | 7788 | |
DEWOYDE NOW THY VENGAUNCE THUR3 VERTU OF RAUTHE ...PAT | 284 | 7789 |
DEWYNE | 7790 | |
I DEWYNE FORDOLKED OF LUFDAUNGERE ...PRL | 11 | 7791 |
I DEWYNE FORDOKKED OF LUFDAUNGERE ...PRL 2 | 11 | 7792 |
DE3E | 7793 | |
THAT DRY3TYN FOR OURE DESTYNE TO DE3E WAT3 BORNE ...GGK | 996 | 7794 |
DE3EN | 7795 | |
WHAT THAY BRAYEN AND BLEDEN BI BONKKE3 THAT DE3EN ...GGK | 1163 | 7796 |
DE3TER | 7797 | |
HOW THE DE3TER OF THE DOUTHE WERE DERELYCH FAYRE ...CLN | 270 | 7798 |
I HAF A TRESSOR IN MY TELDE OF TWO MY FAYRE DE3TER ...CLN | 866 | 7799 |
THO WERN LOTH AND HIS LEF HIS LUFLYCHE DE3TER ...CLN | 939 | 7800 |
LOTH AND THO LULYWHIT HIS LEEFLY TWO DE3TER ...CLN | 977 | 7801 |
THE THRE LEDE3LENT THERIN LOTH AND DE3TER ...CLN | 933 | 7802 |
Notice, in the first place, the run of numbers at the right margin. These numbers, 7741-7802, identify the position of the 62 cards of our total deck of 36,831 cards, placed in the order shown, which carried that portion of the first print out reproduced here. They are concordance line numbers. In the final offset printing, these numbers, suppressed, will not appear. Their utility to us at this stage of our work, as will soon be evident, was enormous.
Actually, we faced the problem of orthographical variants as well as that of Middle English versus Modern English headwords long before we had our first print out of the entire concordance. At the start we obtained a print out of the complete lexicon, and, working with it and our copy text, attempted to devise a system of cross referencing which would allow us to retain all headwords in their Middle English form. But cross referencing became so complicated that we were forced to give it up; users of the concordance so arranged would find it unnecessarily complex. We decided then that we should have to use some Modern English headwords. As we shall point out shortly, what
Suppose, now, that this 62 line sample concordance were the final version of this portion of the whole concordance. And suppose, further, that a reader entered the concordance at the headword DEWYNE. How would he know that he should find, 421 lines further along, roughly eight pages in the printed book, a variant form, DOWYNE? What instructions ought we provide? Or, earlier in our work, ought we to have spotted the DEWYNE — DOWYNE variation and regularized all three occurrences to DEWYNE? (It would have been a simple matter, actually, because the excellent glossary in Gordon's Pearl does, on p. 127, link these forms.) Such a procedure, carried out consistently in our five poems, however, would have been an impossible task; thousands of forms would have had to be changed and our texts, as well, would have been seriously violated. Moreover, we would have produced a concordance to a text unknown to anyone. Hence, we did not regularize spelling. Ought we to have chosen a Modern English headword and printed both forms under it? The Middle English form is derived from Old English dwīnan, meaning "languish, pine away." We do not use Modern English headwords unless the Modern English development of the Middle English form closely resembles the Middle English form. It would not do, in this case, to use LANGUISH as a headword for these forms (the poets did not know that word), and there is no Modern English form *DEVINE. We chose to retain DEWYNE as headword, and to print all three occurrences of it here, these two occurrences, concordance line numbers 7791 and 7792, and concordance line number 8212, which contains the form DOWYNE. Where DOWYNE falls out in the concordance as a headword, at present as concordance line number 8211, we have replaced that card with a new card which carries this format: DOWYNE (V. DEWYNE). 
We therefore make it certain for the reader that he indeed finds the three occurrences of the Middle English forms which equate with Modern English "languish."[12]
It might appear, at this juncture, that our simplest solution would have been the elaborate cross reference key we attempted earlier to devise. We could have retained all Middle English headwords, and directed attention to other places in the concordance where variant forms of semantically similar items occurred. Very carefully we started through our first print out and marked vide references after those headwords for which variant forms occurred. The very first entry on the print out was ABATAYLMENT. It was marked to read: ABATAYLMENT (V. BATELMENT). That procedure was designed to carry a reader from the first line to concordance line number 1975, where he would see BATELMENT. BATELMENT, of course, had to be changed to read: BATELMENT (V. ABATAYLMENT). After some time we gave it up. Not only did it turn out that we should have to mark well over half of our total stock of headwords, but it turned out also, in some cases, that we had to cite as many as fourteen vide references for a single headword, and thus to mark each of those fourteen co-occurring variant forms. Moreover, it proved impossible always to indicate which form constituted the principal entry. Had we persisted, the result would have been an inefficient concordance, if not an inadequate one. Users would have been forced to turn back and forth so frequently that they well might have concluded that such searching was not worth the effort.
We therefore concluded that the concordance had to be rearranged so that it would present the sorted word stock of our five poems in the most convenient and useful form for the type of reader we imagined would most want to consult it. We decided, then, that forms which were phonemically and lexically alike but graphemically different had to be drawn together under a common headword. For example, we have grouped under the Modern English headword CHRIST all occurrences of CRIST, CRISTE, CRYST, KRYST, and KRYSTE. Forms which are lexically alike but are morphemically or morphophonemically different are distinguished. We count DEUEL and DEUELE3 as two words, and we count GODE (or GOOD, GOODE, GOUD, GOUDE), BETTER, BEST (or BESTE) as three words. In short, all lexical variants are distinguished (the chief aim of a concordance) and, at a second level of distinction, within the major form classes all significant structural differences are distinguished. Thus: Nouns — singular and plural forms are listed separately, but case is ignored, especially, lacking an apostrophe, since genitive singulars and plurals cannot be distinguished from other case singulars and plurals;[13] Verbs — the five paradigmatic forms are listed separately, i.e., infinitive, 3d sing. pres. indic., pret., p. partic., and pres. partic., with no distinction made between pret. and p. partic. forms unless a distinctive difference exists, as, in Modern English, say, between RODE and RIDDEN; Adjectives — the stem, comparative degree, and superlative degree are listed separately; Adverbs and all other Form Classes — these presented no especial problems.
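The grouping principle can be illustrated with a small lookup table. The sketch uses only forms named in this account (the CHRIST group, the DEWYNE and DOWYNE pair, and DEUEL against DEUELE3); the names `VARIANT_TO_HEADWORD`, `headword_for`, `group_forms`, and `vide_card` are hypothetical, and in practice every assignment was an editorial decision made by hand, not a table applied mechanically.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Graphemic variants mapped to the headword under which they are collected.
VARIANT_TO_HEADWORD: Dict[str, str] = {
    "CRIST": "CHRIST",
    "CRISTE": "CHRIST",
    "CRYST": "CHRIST",
    "KRYST": "CHRIST",
    "KRYSTE": "CHRIST",
    "DOWYNE": "DEWYNE",
}


def headword_for(form: str) -> str:
    """A form with no entry in the table stands as its own headword."""
    return VARIANT_TO_HEADWORD.get(form, form)


def group_forms(forms: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the forms that share a headword, keeping distinct words apart."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for form in forms:
        grouped[headword_for(form)].add(form)
    return {head: sorted(members) for head, members in grouped.items()}


def vide_card(form: str) -> str:
    """Text of the replacement card left where a variant form would fall."""
    return f"{form} (V. {headword_for(form)})"
```

Under this table KRYST and CRIST are gathered under CHRIST, DEUEL and DEUELE3 remain two words because they differ morphemically, and `vide_card("DOWYNE")` produces the card text DOWYNE (V. DEWYNE).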
We have, here and there, already said a few things about the choice of headwords. In general, we find Parrish's description of the matter quite sound, and his solution for a text like ours reasonable. We have, however, made one significant change. Parrish suggests that the ". . . optimum concordance to an early text is of a third type, in which the lines of verse are given in their original form but index words are
All this activity, with some decision called for at every entry in the concordance, with uncommon persistence required to uncover all the alternates and variant spellings (even making sure, to give another
Relocating the output deck of 36,831 cards has proved to be an equally arduous task. Since thousands of operations are involved, it was apparent that we could not count on the IBM 7070 to perform the work. To switch card and concordance line number 2030 to line number 1, for example, would require a distinct program, simple to be sure, but costly. To sort out the line numbers 16575, 16576, 16463 . . . 5013, 16582, 16583 . . . 16592-98 . . . 508, 16465 (40 occurrences, in all, of what we list under Modern English CAST) and move this entire newly arranged unit so that it is spotted after what is at present line number 5011 would require another distinct program, a more complicated one, also costly. With these two instances expanded to thousands, it should be clear why the computer is ill suited for that assignment. Perhaps in the future an extremely sophisticated program might be devised to perform that kind of operation efficiently. At the present time it is not possible. We are completing this arduous work manually.
The utility of the concordance line numbers will now be apparent. A simple program was devised to print the matching concordance line number on each of our 36,831 output cards. That entire deck now is so marked; i.e., each punched card now carries the same number on it as the line number it represents. Thus an individual card, run through the printer some time ago, and which generated a line of our original print out, can now be picked out from all the other 36,830 output cards. All our "post machine" editing was performed right on the original print out sheets. The instructions are quite clear. They
A few minor problems occurred. We had, for example, to examine each of the 2937 occurrences of THE to pick out the 146 instances where the form equates with the pronoun "thee." The other 2791 occurrences were not concorded. So, too, did we examine the 687 occurrences of ON to pick out the 34 instances where it equates with "one." We wanted to get a separate listing of all pronouns, and therefore all occurrences of THE and ON had to be examined.
Readers of Middle English will, we think, find one of the appendices to our concordance especially useful. All occurrences of pronouns, sometimes missing from concordances or scattered alphabetically throughout the concordance, and all forms of BE and HAVE (as well as BE and HAVE plus a negative particle) are collected in one listing. Here the reader will find these forms concorded by poem title and poem line number, but not with line of verse citations. The structure and syntactical behavior of these forms are particularly interesting in Middle English. We have made it possible for a reader to locate very quickly every occurrence of these forms in our five poems. Two other appendices should also be useful. We have provided a list of headwords (here entirely in Middle English) in order of the frequency of their occurrence. Parrish's statement of this feature of the Cornell Concordances (p. 12) receives our wholehearted support. It is good to know, very quickly, what words a poet chooses to use more frequently than other words. It is more than merely 'good to know' that; here is an avenue to insight. To use the Arnold concordance itself, for example, it was something of a surprise to discover that Arnold used the word day more frequently than the word night, for, influenced no doubt by admiration of "Dover Beach," it somehow seemed that it should have turned out the other way around. The frequency list of our concordance will no doubt be put to good use. We also thought it would be helpful (our last appendix) to list all the variant lines we uncovered in our collation of the texts we used. Read against each base text we used, this list, sorted according to poem titles, will disclose where
There is no more. I want now to return to my own voice. Mr. Kottler likes to think of our work as a computer concordance shaped by man. It is hard to imagine a better way to put it. Surely, as this record is reviewed, it will occur to anyone to ask why in the world did they use the 7070 in the first place. We used it, after our "pre-machine" textual work was completed, work to be done in any case no matter what technique might afterwards be employed to evolve the concordance, because the IBM 7070 gave us the original print out in a day. It might have taken us a year to do it manually. And, most important of all, we hardly realized what we had to contend with until we had that print out. Unless a corps of co-workers is constituted to prepare a concordance, as Parrish describes the work of the Dante Society of America (p. 1), the machine is indispensable. As I hear of new techniques, particularly those which eliminate the necessity for using punched cards to transfer a text to magnetic tape, or those which skillfully have made use of extremely sophisticated programs, programs designed to handle enormous numbers of variables, and some of these programs aimed squarely at human discourse, oral or written, and not just numbers or other symbols more readily amenable than phonemic or graphemic structures to the operation of binary arithmetic, I feel sure that in the future a concordance such as ours could be produced much more efficiently than we have managed it.
If we have not been efficient, we have been reliable. I have been reading Lane Cooper's delightful account of the evolution of his Concordance of Wordsworth, "The Making and the Use of a Verbal Concordance."[14] Naturally, I wondered how he might have described what we have been doing with our concordance. I can also recall the day when I should have looked elsewhere for entertaining reading— "An undergraduate student once assured me that the word God was rare in the writings of Wordsworth; he had heard so in a lecture. It occurs 274 times in the poems of that author. . . ." (p. 21) Indeed. I doubt that this account of our concordance is that entertaining. I also am not sure that it will prove to be "the gift of Hermes to Apollo" (p. 19). But I am sure that we have produced a reliable concordance to five important Middle English poems. I am sure, too, that students of Middle English poetry will find it extremely useful. I am sure, again, that two mediævalists who scarcely knew one another before they started the work have learned a great deal about these poems, about