A Computer Concordance to Middle English Texts
by Alan Markman
The production of a computer concordance to Middle English texts
for which no standard, or received, edition exists, but which exist in several
non-uniform editions, requires two distinct operations.[1] Our preparation of A
Computer Concordance to Five Middle English Poems: Pearl, Patience, Cleanness,
Sir Gawain and the Green Knight, and St. Erkenwald, soon to be printed by the
University of Pittsburgh Press, illustrates very nicely the exact nature of the
operations.[2] The first is textual. A
satisfactory reading of the MSS. must be obtained; i.e., one has to have or
prepare a "best" copy edition or text, taking "best" in such a case to mean
that which might best accord with the capabilities of the available computer
system as well as best represent the MSS. The second operation is
technical. A method has to be devised to translate the text into a "machine
readable" format so that the computer system can digest it and perform
operations with it; i.e.,
listing words, comparing them, sorting them, collecting them with an
appropriate context, and printing at last the completed concordance. These
operations are different, to be sure, yet we soon found out that we were
unable to proceed with the textual problems until we had clearly understood
the technical requirements. We had, for example, to resolve certain
distinctive orthographical features of our text in favor of the printing
limitations of our computer system; not only did we eliminate all
punctuation points, but all occurrences of the grapheme þ, to cite one instance,
had to be changed to upper case TH, and, for another instance, the entire
corpus had to be printed out in upper case letters.[3]
The entire textual problem is perhaps best described by Professor C.
O. Chapman. In response to our inquiry, Mr. Chapman reminded us that
". . . there is no uniform text of the five poems such as we have in Skeat's
Chaucer, and the variations of spelling within each poem, as
well as the presence of four or five editors, each transcribing a poem
according to his own method, have resulted in great confusion. It has long
seemed to me that such a uniform text for these poems, as Skeat's for
Chaucer, is an indispensible prerequisite to the making of a
concordance."[4] The wisdom of that
conclusion is evident to all who know these poems. The occurrence of BOT
in "Cleanness," 473 is a case in point. The form BOT occurs 324 times in
these texts. It may be a noun, a verb, or an adverb, or also a preposition
or a conjunction. When it equates generally with Modern English
"but" it is not to be concorded, but is to be found in a listing of common
words not concorded. If, however, it equates with Modern English
"remedy," "help," "announce," or "proclaim" it is concorded. Since the form
BOTE also occurs 11 times in these texts, either with the meaning of "boat" or
"boot" (i.e., "remedy"), one would expect that BOT
ought not to occur where elsewhere BOTE is used. Now "Cleanness," 473
reads "Bryng bod-worde to bot blysse to us all." Sir Israel Gollancz reads
a noun, "remedy," taking BOT from Old English bōt; R. J. Menner reads a verb,
"announce," taking BOT from Old English bodian. This difference means,
of course, that the two men take different attitudes towards
BLYSSE. If BOT in this instance means "remedy" it has to be
found in our concordance under Modern English boot, whereas
if it means "announce," we should have to list it under Modern
English bode, where immediately it would look like one of the
more than a dozen forms of BIDE. As a saving grace, BOOT, meaning any
sort of covering for the foot, does not, thank goodness, occur. In the same
poem, does FLEE3, 1476, mean "fleece" or "flies," and
does GENTYLE, 1432, mean "gentle" or "gentile"?
Editors do not agree. Or in "Sir Gawain and the Green Knight," 1634, does
HERE mean "hear" or "praise"? These few instances are
but a fraction of the occurrences of forms which called for editorial
decisions. One word, two words, or compound; spelling variations;
homographs and homonyms; these are a few of the editorial problems we
faced. We obviously needed the "uniform text" Mr. Chapman spoke of.
There was nothing for us to do but to make one.
We did it in this fashion.
Our first step, of course, was to assemble all the printed critical
editions of the five poems and, lacking access to the MSS., facsimile
reproductions of MS. Cott. Nero A.x. + 4 and Brit. Mus. MS. Harl. 2250.
We were able, after examination, to exclude several of the printed editions,
so that as we faced up to the tedious chore of collation we found it
necessary to include but one "variant" edition with each of our "base"
editions for four of the poems, but three "variant" editions of "Pearl" were
required along with the "base" edition. In order to provide a three space
identification symbol for the five poems, a symbol the computer could
easily manage, we chose the standard abbreviations of the titles of these
poems. The following designations, wherein the three letter symbol alone
signifies our base edition, and the symbol followed by V or a number
signifies a variant edition, were assigned:
- GGK The 1940 Gollancz edition of Sir Gawain and the Green Knight.
- GGK V The 1952 reprint of the 1925 Tolkien and Gordon edition.
- CLN The 1921 Gollancz edition of Cleanness.
- CLN V The 1920 Menner edition.
- PRL The 1953 Gordon edition of Pearl.
- PRL 1 The 1906 Osgood edition.
- PRL 2 The 1921 Gollancz edition.
- PRL 3 The 1933 Bowdoin College edition.
- PAT The 1924 Gollancz edition of Patience.
- PAT V The 1918 Bateson edition.
- ERK The 1920 Gollancz edition of St. Erkenwald.
- ERK V The 1926 Savage edition.
We knew at this stage of our work that eventually an IBM card punch
operator would have to have in front of her an accurate copy text of these
poems so that she could punch on one card one line of the text followed by
an identification symbol and the line number of that line in its text. The
single object of our collation was to provide that copy text. Because
Sir Gawain and the Green Knight is the longest of these five
poems we chose to work first with it.
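The card described here (one line of text, an identification symbol, and a line number) can be pictured as a fixed-width record. The following Python sketch is only an illustration in present-day terms: the field widths, the class name `Card`, and the helper `read_card` are assumptions, since the article does not give the column layout actually punched. The sample line is one quoted later in the article.

```python
from dataclasses import dataclass

# Field widths are assumptions for illustration; the article states only that
# each card held one line of text, an identification symbol, and a line number.
TEXT_WIDTH, SYMBOL_WIDTH, NUMBER_WIDTH = 70, 5, 5   # 80 columns in all


@dataclass(frozen=True)
class Card:
    text: str      # one line of verse from the copy text
    symbol: str    # e.g. "GGK", "GGK V", "PRL 2"
    number: int    # line number within its own poem

    def punch(self) -> str:
        """Return the 80-column image of this card."""
        if len(self.text) > TEXT_WIDTH:
            raise ValueError("line too long for the assumed text field")
        return (self.text.ljust(TEXT_WIDTH)
                + self.symbol.ljust(SYMBOL_WIDTH)
                + str(self.number).rjust(NUMBER_WIDTH))


def read_card(image: str) -> Card:
    """Recover the three fields from an 80-column card image."""
    text = image[:TEXT_WIDTH].rstrip()
    symbol = image[TEXT_WIDTH:TEXT_WIDTH + SYMBOL_WIDTH].rstrip()
    number = int(image[TEXT_WIDTH + SYMBOL_WIDTH:])
    return Card(text, symbol, number)


card = Card("OF DIAMAUNTE3 A DEUYS", "GGK", 617)
image = card.punch()
```

Because every card carries its own symbol and line number, the deck can be read back in any order, which is the property the later processing relies on.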
Our decision to use the 1940 Gollancz edition of Sir Gawain and
the Green Knight as base text committed us to certain procedures.
For example, we retain -3 as a grapheme, we retain the U/V practice of
both Gollancz and Tolkien and Gordon, and we retain the I/Y practice of
both editions. Since no punctuation whatsoever is retained, we were obliged
to disregard punctuation variants.[5]
All occurrences of þ we had to change to TH. All brackets in the text
were eliminated, & was changed to AND, and we simply treated all
italicized printings as if they were printed regularly.
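In present-day terms, the rules just listed amount to a small normalization routine. The Python sketch below is an illustration only, not the procedure actually followed (the copy text was prepared by hand before punching). The function name `normalize_line`, the exact set of punctuation marks, and the sample line are assumptions made for the example; hyphens are deliberately left alone, since they were resolved editorially.

```python
# Hypothetical punctuation set; the article does not enumerate the marks removed.
PUNCTUATION = ".,;:!?'\"()[]"
_DROP = str.maketrans("", "", PUNCTUATION)


def normalize_line(line: str) -> str:
    """Apply the stated transcription rules to one printed line."""
    line = line.replace("þ", "TH").replace("Þ", "TH")  # thorn becomes TH
    line = line.replace("&", " AND ")                   # ampersand becomes AND
    line = line.translate(_DROP)                        # brackets and punctuation dropped
    line = line.upper()                                 # the corpus is upper case only
    return " ".join(line.split())                       # collapse any doubled spaces


# An invented line, used only to exercise the rules.
print(normalize_line("Þe kyng [&] his knyghtes, þay ryde."))
# prints: THE KYNG AND HIS KNYGHTES THAY RYDE
```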
The hyphen caused the greatest difficulty. In general, we eliminated
the hyphen when, for example, it was used in the base text to separate
morphemes which ordinarily are not separated. Thus to cite but a few
examples which occur early in the poem, we changed VP-ON to VPON
(GGK, 47), DE-BATED to DEBATED (GGK, 68), and IN-NOGHE to
INNOGHE (GGK, 77). Here, of course, we followed the practice of
Tolkien and Gordon. Similarly, we print FORSOTHE as one word in each
instance, whereas, in all editions of all five poems, that form may occur as
one word, as two words (GGK, 415), or as a hyphenated word. A curious
instance is the occurrence of GOD + MON. This combination may occur
as two words, i.e., a good man, but it also occurs as one word,
i.e., head of a household. Thus GGK, 1029, reads GOD-MON,
GGK V, 1029, reads GOD MON, but we print one word, GODMON. Even
more disturbing is GGK, 157, where HEME-WEL HALED occurs. In
GGK V, 157, that form is printed as HEME WEL-HALED.
The facsimile MS. indicates three words.[6] Since this line shows about as well
as any other what an editor of a Middle English
text faces, it will be worth a moment to look at it closely. The line is
printed as follows:
- HEME-WEL HALED HOSE OF THAT SAME GRENE GGK 157
- HEME WEL-HALED HOSE OF THAT SAME GRENE GGK V 157
The MS. reading has to be taken as HEME WEL HALED. Now both
Gollancz and Tolkien and Gordon derive Middle English HAL(L)E from
Old French haler, and suggest "rise, depart, rush, draw, lift,
come, go, pass, and loose from a bow" as Modern English equivalents.
Tolkien and Gordon regard HEME as an adjective, derived from Old
English gehœme, meaning "neat," and WEL-HALED as
another adjective, simply a combination of WEL + HAL(L)E, meaning
"pulled up properly" or "drawn tight." Gollancz considers HEME-WEL to
be an adjective meaning "well fitting" which is closely allied to HEMELY,
"closely," which he derives from Old Norse heimolliga,
"privately," and for which he suggests Old English cognates
hām and hæm-. One should therefore have
to translate GGK, 157, as "well fitting drawn (up) stockings of that same
green color" and GGK V, 157, as "neat properly pulled up stockings of that
same green color." Admittedly the concorder is not a
lexicographer, and ordinarily one turns to a concordance not to find a
discrimination of lexical meanings but instead a listing of forms, of words.
And it probably is of little importance, in this instance, to suggest that "well
fitting drawn up stockings" are not very much different, if different at all,
from "neat properly pulled up stockings," or that HEME-WEL and HEME
might be synonyms after all. Yet it is a matter of importance to the counter
of words to know how many times HEME, HEME-WEL, HALED, and
WEL-HALED occur in these texts.
Here we were obliged to face up to the entire matter of compounds
versus two words or hyphenated words. After a great deal of thought, we
decided to eliminate the hyphen wherever possible, and to choose one word
rather than two words wherever it clearly seemed to us that the resulting
one form represented a unified lexical entity which readers of Middle
English would recognize. Thus we print HEME-WEL for GGK, 157, and
WELHALED for GGK V, 157.[7] Some further
idea of the complications we struggled with can be got from observing that
HAL and HAL(L)E, the word we started with in this demonstration, also
occur as nouns, meaning "castle" or "hall." Nevertheless, our decision to
eliminate the hyphen wherever possible provides, we think, a clearer
reading of the texts. It also precluded several technical problems, since an
abundance of hyphenated forms could cause a certain amount of machine
confusion. Of course, we did retain the hyphen in PRL, 195,
FLOR-DE-LYS, where we felt FLORDELYS was an impossible form.
Had we treated its parts as separate words, moreover, we would have created an
unnecessary homonym with FLOR meaning "floor" and have inserted,
rather falsely, two French forms in our listings.[8]
Two other editorial decisions were made. First, we retained all proper
names in their Middle English forms. This seemingly innocuous procedure
caused an amusing problem, for the form AGRAUYN A LA DURE
MAYN presents four French words. We quickly eliminated A and LA from
the list of words to be concorded, but both DURE and MAYN occur
elsewhere as Middle English words, meaning endure and
main, and these occurrences of those forms naturally turned up
with the others.[9] But under the
Modern English headwords HARD and HAND, DURE and MAYN do not
appear. In justice to Agravain's too sordid late reputation, we suppose they
ought to appear. Second, a certain amount of regularization seemed
desirable. We therefore follow the orthographical practice of GGK
throughout; i.e., we did not consistently normalize spellings, but retained
the C-K, the I-Y, the I-J, the -ES and -E3, and the U-V or U-W
distinctions. We did, however, change all occurrences
of QUOD to QUOTH, all occurrences of VUS to VS. Our principle here
is a simple one. A concordance is used with
the available printed texts. There is no standard, received edition of these
poems, so our copy text actually is the closest yet to the "uniform text" Mr.
Chapman spoke of. Having chosen the Gollancz edition of Sir Gawain
and the Green Knight as the base text for the longest of our poems,
it seemed only reasonable to us to make all other printed editions conform
to its orthography. In view of the presently available printed editions of
these poems, the editions one would have at hand while using our
concordance, little good would have resulted had we expended further
energy in replacing initial 3 with Y and medial or final 3 with GH or W.
Finally, to be done with these matters, we changed BERTILAK, GGK,
2445, to the better supported form BERCILAK, we did not use at all the
spurious line, GGK, 2445*, we changed TON and DON, ERK, 5-6, to
TOUN and DOUN, and we omitted altogether all occurrences of AMEN
as well as the motto HONY SOYT QUI MAL PENCE.
The results of our collation were interesting but not startling. The
principal achievement, after all, was the production, following the
procedures just described, of a uniform text of the five poems. We did not
make and adopt a single new emendation. We did not suggest, even, a
single new reading of the MS. But we did record, and therefore preserve
for consideration, a rather large number of variants. All are truly lexical
variants. That is a finding of some significance. Differences of spelling or
punctuation do not alter the word stock of a MS., but some differences of
MS. readings do. Let us be specific for a moment. The editors of all our
printed editions examined the same MS. If, as they do, one editor prints for
GGK, 77, "Toulouse" and the other prints "tolouse," and
one prints [&] where the other prints "of," we need
not be concerned. The difference between an upper case and lower case
printing is not lexically a significant difference, especially
for a computer concordance which is printed entirely in upper case letters.
Neither is the difference between [&] (or AND) and "of"
significant, because both words are omitted, not concorded
at all, since they meet the requirements of a classification found in all
concordances, a list of "common words not concorded." Actually, as the
one editor admits, the MS. clearly reads "of." We found many
variants like these two occurrences, and we simply ignored them. Many
variants, however, could not be ignored. When in CLN, 745, the line is
printed as THEN THE BURNE OBECHED HYM AND BO3SOMLY HIM
THONKKE3 while in CLN V, 745, the same line is rendered THEN
ABRAHAM OBECHED HYM AND HY3LY HIM THONKKE3, and the
MS. clearly reads ABRAHAM and not THE BURNE, and appears to read
LO3LY where these editors show BO3SOMLY and HY3LY, we found a
problem on our hands. We saw no reason to retain THE BURNE, but we
saw that it was reasonable to retain BO3SOMLY and HY3LY, since both
forms represent reasonable decisions of editors who examined the very
same spot in the MS. Therefore this one line in the MS. grew into two lines
in printed editions and two lines for us also. Our concordance will show
both lines, for we concluded that an honest lexical difference existed, and
that support for both BO3SOMLY and HY3LY could be adduced, and that
we should count these forms as actual occurrences.
In all we found a considerable number of significant variant lines. In
examining the poems, we noted 72 lines of the total of 2530 lines in GGK
where significant differences occurred, 64 lines of 1812 lines of CLN, 112
lines of 1212 lines of PRL, only 9 lines of 531 lines of PAT, and 18 lines
of 352 lines of ERK. Of the 6437 total lines, then, we discovered 275
variant lines. That means, of course, that we not only added 275 lines of
verse to our corpus (almost another poem as long as ERK) but that we also
considerably affected both the total word stock and the number of
occurrences of a good many individual forms. For example, if a line
showed but one difference, say MUCH as against MUTH (PAT and PAT
V, 54), both forms, to be sure, were concorded, but also every other word
in what was originally one MS. line had to be tallied twice, for each word
in both lines of the printed editions was spotted under its headword, and a
single word, say CHEKES, which both editors show for
this same line, would be cited twice under its headword. Our frequency
tally shows that CHEKES occurs three times in these poems, but it really
occurs only twice. Since all variant printings are marked with a V, or a
number in the case of PRL, a reader of our concordance will have to
subtract one from the total occurrences of any given form for each
occurrence which is so marked. To make this perfectly clear, suppose a
reader wishes to know how many times the word "date" occurs. Our
frequency list will show 15 occurrences, but a glance at the fifteen lines
containing this word under the headword DATE will reveal that three of the
lines are marked as variant lines, and that therefore DATE actually occurs
only 12 times. It is not that DATE is in any way suspect, but that in ERK, 205,
it is A LAPPID DATE while ERK V, 205, has it A LEWID DATE, and
that in PRL, 528, it is WYL DAY WAT3 PASSED DATE while both PRL
1 and PRL 3 have it WYLDAY WAT3 PASSED DATE: our one word or two word
puzzle again, and in this instance not to be resolved. It will be seen that
variants are a bit of a bother to the concorder.
This is not the place to display all the variants we discovered; let it suffice
that our collation was necessary, that it turned up some interesting
significant differences, and that it is one more problem a maker of a
concordance of a Middle English text must reckon with.[10]
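The arithmetic of this section, and the subtraction a reader is asked to perform, can be checked with a short Python sketch. The figures are those reported above; the helper names `is_variant` and `true_count` are hypothetical, and the poems assigned to ten of the fifteen DATE lines are placeholders, because the article does not say where they fall.

```python
# (total lines, variant lines) for each poem, as reported in the text.
poems = {
    "GGK": (2530, 72),
    "CLN": (1812, 64),
    "PRL": (1212, 112),
    "PAT": (531, 9),
    "ERK": (352, 18),
}
total_lines = sum(lines for lines, _ in poems.values())
variant_lines = sum(variants for _, variants in poems.values())
cards = total_lines + variant_lines   # one card per base or variant line


def is_variant(symbol: str) -> bool:
    """A symbol such as 'ERK V' or 'PRL 2' marks a variant printing."""
    return len(symbol.split()) == 2


def true_count(symbols: list[str]) -> int:
    """Raw tally minus one for every line marked as a variant."""
    return sum(1 for s in symbols if not is_variant(s))


# DATE: the five symbols named in the text, plus ten placeholder base lines.
date_symbols = ["ERK", "ERK V", "PRL", "PRL 1", "PRL 3"] + ["PRL"] * 10
print(total_lines, variant_lines, cards, len(date_symbols), true_count(date_symbols))
# prints: 6437 275 6712 15 12
```

The sum of base and variant lines, 6712, agrees with the number of cards reported for the initial deck. Note that the subtraction rule, applied mechanically, gives zero for a form that is found only in a variant printing.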
Our uniform text in hand, we turned to our second operation. Any
sense of relief we might have experienced at having "solved" the textual
problems, being able at last to release our text to an efficient machine, was
short-lived. Initially, things did indeed run smoothly. A very capable
operator, loaned to us for the purpose by the University of Pittsburgh's
Health Law Center, punched our text and produced, in approximately 40
hours, 6712 IBM cards, each card punched with one line of text, its proper
title abbreviation, and its proper line number in its own text. The Health
Law Center also printed out that initial deck of cards. In another week the
print-out was proofread, and all errors noted were quickly corrected by
simply punching a complete new card for each error discovered. Our initial
deck of cards, now corrected and carrying in its own punched format our
uniform text, was now ready for the IBM 7070.
Once again we were fortunate. Mr. Charles Bacon, a Systems Analyst
in our Computation and Data Processing Center who had become interested
in our project, agreed to prepare the program for
the concordance.[11] Largely in Mr.
Bacon's words, this is the program he prepared.
The initial deck of cards was punched (as we have just described) so
that each card was completely identified without regard to the order in
which the deck was stacked. The first computer program was written to
read these cards and perform the following functions. First, it loaded into
the computer a common word list, 146 words not to be concorded, which,
eliminated, lightened the load on the rest of the system. Second, the
program read each text card and preserved its content as a "card image."
That "card image," at this stage, was scanned by a sub-program designed
to find each separate word and move it to a twenty-letter area, making sure
that the first letter of each word was always placed in the first column of
that twenty-letter area. Third, the program wrote out that area, immediately
followed by the complete "card image," as a connected tape record. When
all the words from a given card had either been written on to tape, the
"card image" being repeated as many times as
non-common words appeared on the card, or eliminated as common words,
the next card was read, and the process repeated.
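The first program can be restated as a short Python sketch. This is a modern paraphrase under stated assumptions, not Mr. Bacon's code for the IBM 7070: the names `Record` and `make_records` are hypothetical, and the two-word common list stands in for the 146-word list, which is not reproduced in the article.

```python
from typing import Iterable, Iterator, NamedTuple

WORD_AREA = 20  # the twenty-letter area described in the text


class Record(NamedTuple):
    word: str    # left-justified in the twenty-letter area
    text: str    # the complete line, i.e. the "card image"
    symbol: str  # poem title abbreviation
    number: int  # line number within the poem


def make_records(cards: Iterable[tuple[str, str, int]],
                 common_words: set[str]) -> Iterator[Record]:
    """Yield one record per non-common word found on each card."""
    for text, symbol, number in cards:
        for word in text.split():
            if word in common_words:
                continue  # common words are eliminated, not written out
            # ljust pads to twenty letters; it does not truncate longer words.
            yield Record(word.ljust(WORD_AREA), text, symbol, number)


# Stand-in common word list (an assumption for this example).
common = {"OF", "A"}
records = list(make_records([("OF DIAMAUNTE3 A DEUYS", "GGK", 617)], common))
print([r.word.strip() for r in records])
# prints: ['DIAMAUNTE3', 'DEUYS']
```

Repeating the full line with every word is what makes the later sort sufficient by itself: each record already carries its own context.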
The second computer program was designed to sort the connected
tape record. The record was sorted, alphabetically, first according to the
individual words and second according to the poem title abbreviation. A
third sorting according to line number completed this program. The entire
sorting process was performed by making use of a standard program written
for general sorting requirements.
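The three-level sort can be sketched in Python as a single composite key. One detail is inferred rather than stated: in the sample sheet reproduced later, DEUYSE3 follows DEUYSEMENT and DE3EN follows DEWYNE, which suggests a collating sequence of blank, then letters, then digits. Python's default string order puts digits before letters, so the sketch supplies its own key; the function names `collate` and `sort_key` are hypothetical.

```python
def collate(s: str) -> tuple[int, ...]:
    """Rank characters as blank < letters < digits (inferred from the sample)."""
    def rank(ch: str) -> int:
        if ch == " ":
            return 0
        if ch.isalpha():
            return 1 + ord(ch) - ord("A")   # A=1 ... Z=26 for upper-case text
        return 100 + ord(ch)                # digits, including 3, after Z
    return tuple(rank(ch) for ch in s)


def sort_key(record: tuple[str, str, int]):
    """Sort by word, then poem title abbreviation, then line number."""
    word, symbol, number = record
    return (collate(word.strip()), collate(symbol), number)


records = [
    ("DE3EN", "GGK", 1163), ("DEWYNE", "PRL 2", 11), ("DEWYNE", "PRL", 11),
    ("DEUYSE", "PRL", 139), ("DEUYSE", "PRL", 99), ("DEUYSE", "ERK", 225),
    ("DEUYSE3", "PRL", 984), ("DEUYSEMENT", "PRL", 1019),
]
ordered = sorted(records, key=sort_key)
for record in ordered:
    print(record)
```

With this key the records come out in the order the sample sheet shows: the three DEUYSE lines (ERK 225, PRL 99, PRL 139), then DEUYSEMENT, DEUYSE3, the two DEWYNE lines with the base text ahead of its variant, and DE3EN last.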
The third, and last, computer program was designed to transfer the
sorted tape record on to cards in a format suited to the printing
requirements of the concordance. At this stage a second common word list
was introduced. (Personal pronouns and all forms of the verbs
"have" and "be" comprised this list. Our reasons for
desiring this list will be explained later on.) Words contained in this list are
to be printed in a separate listing, identified by poem title abbreviation and
line number only, instead of being displayed in an entire line of text. The
sorted tape record, therefore, was punched out in such a manner that three
card formats were produced: first, the individual word by itself (i.e., a
headword), second, the original line of verse taken from the copy text, and,
third, the display of numerical references. Thus 36,831 cards were
produced and stacked in an order which would produce, as they were run
through a printer, the final concordance.
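The three output formats can be illustrated with a Python sketch that walks the sorted records and emits a headword, then either full lines of verse or bare references. It simplifies the description above in one respect: reference-only words are shown in place under their headword rather than in a separate listing. The function name `layout`, the exact spacing of each format, and the choice of HE as a member of the second common word list are assumptions.

```python
from itertools import groupby


def layout(sorted_records, reference_only):
    """Turn sorted (word, text, symbol, number) records into output lines."""
    out = []
    for word, group in groupby(sorted_records, key=lambda r: r[0]):
        out.append(word)                                    # format 1: headword
        for _, text, symbol, number in group:
            if word in reference_only:
                out.append(f"    {symbol} {number}")        # format 3: reference only
            else:
                out.append(f"{text} ...{symbol} {number}")  # format 2: line of verse
    return out


records = [
    ("DEUYS", "OF DIAMAUNTE3 A DEUYS", "GGK", 617),
    ("HE", "HE DEVYSED HIS DREMES TO THE DERE TRAWTHE", "CLN", 1604),
]
for line in layout(records, reference_only={"HE"}):
    print(line)
```

`groupby` depends on the records already being sorted by word, which is exactly the state the second program leaves them in.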
After Mr. Bacon had devised that program he supervised the actual
run of the entire process, from the first step of loading our initial deck to
the last step of stacking the 36,831 cards for the printer. The IBM 7070 is
an extremely rapid machine. The entire operation took but a very short
time. The first phase, loading our initial deck of 6712 cards and
transferring our text to magnetic tape, required but 25 minutes. The second
phase, the entire sorting process, required 45 minutes. The last phase, the
punch-out of the sorted tape record, required 4 hours. The actual machine
time, then, from initial loading to stacking of the punched output in proper
order for printing, was just 5 hours and 10 minutes. The printer was a bit
slower; it required 8 hours to print the whole concordance. The entire
operation thus required 13 hours and 10 minutes. We very carefully
gathered up the continuous sheet which bore our project, separated the roll
into sheets, stacked and bound them, and
carried them off, hopefully, after a rapid proofreading, to send the
concordance, with some slight introduction prepared conventionally, to be
reproduced in an offset process by the university press.
At this happy time we encountered our largest disappointment. Had
our poems been written in twentieth century English our first printout,
without doubt, could have been reproduced, just as Mr. Parrish described
it for the Arnold concordance. But ours are fourteenth century poems, and
they were not written in twentieth century English. It is almost, looking
back at it now, as if some huge irony were on purpose lodged within our
texts, as if those old poets were now able to confound us, to dare us to
reduce their art to a modern machine manageable format. Mr. Parrish and
his associates were dismayed, he tells us, to see AAR and AARAU as the
first two items in their concordance. As we looked over our product we
were shocked. The more carefully we searched, the greater was our
concern. We saw chaos.
It seems best to us to produce for examination here a sample of our
first print out. In this fashion we can describe the true nature of the
technological problems we now faced. Also, most of our principles, our
feeling of what this concordance should be, can be induced from the very
data we worked with. Picked quite at random, here is one sheet of our first
print out.
DEUOUTLY | | 7741 |
HIS TWO DERE DO3TERE3 DEUOUTLY HEM HAYLSED ...CLN | 814 | 7742 |
DEUOYDE | | 7743 |
THAT WONT WAT3 WHYLE DEUOYDE MY WRANGE ...PRL | 15 | 7744 |
DEUOYDES | | 7745 |
DEUOYDES VCHE A VAYNEGLORIE THAT VAYLES SO LITELLE ...ERK | 348 | 7746 |
DEUOYDIT | | 7747 |
AND DEUOYDIT FROM THE DOUTHE AND DITTE THE DURRE AFTER ...ERK | 116 | 7748 |
AND DEUOYDIT FROM THE DEDE AND DITTE THE DURRE AFTER ...ERK V | 116 | 7749 |
DEUYNE | | 7750 |
AND SO DO WE NOW OURE DEDE DEUYNE WE NO FYRRE ...ERK | 169 | 7751 |
DEUYS | | 7752 |
OF DIAMAUNTE3 A DEUYS ...GGK | 617 | 7753 |
DEUYSE | | 7754 |
DERE SER QUOTH THE DEDE BODY DEUYSE THE I THENKE ...ERK | 225 | 7755 |
THE DERTHE THEROF FOR TO DEUYSE ...PRL | 99 | 7756 |
I HOPED THE WATER WERE A DEUYSE ...PRL | 139 | 7757 |
WYTH THE MYRYESTE MARGARYS AT MY DEUYSE ...PRL | 199 | 7758 |
DEUYSED | | 7759 |
ER DALT WERE THAT ILK DOME THAT DANYEL DEUYSED ...CLN | 1756 | 7760 |
AS JOHN DEUYSED 3ET SA3 I THARE ...PRL | 1021 | 7761 |
DEUYSEMENT | | 7762 |
I KNEW HIT BY HIS DEUYSEMENT ...PRL | 1019 | 7763 |
DEUYSE3 | | 7764 |
AS DEUYSE3 HIT THE APOSTEL JHON ...PRL | 984 | 7765 |
AS DERELY DEUYSE3 THIS ILK TOUN ...PRL | 995 | 7766 |
DEUYSIT | | 7767 |
THE DENE OF THE DERE PLACE DEUYSIT AL ON FYRST ...ERK | 144 | 7768 |
DEVAYE | | 7769 |
3IF ANY WERE SO VILANOUS THAT YOW DEVAYE WOLDE ...GGK | 1497 | 7770 |
DEVISED | | 7771 |
THER PRYUELY IN PARADYS HIS PLACE WAT3 DEVISED ...CLN | 238 | 7772 |
DEVOYDE | | 7773 |
WYTH ALLE THISE WY3E3 SO WYKKE WY3TLY DEVOYDE ...CLN | 908 | 7774 |
DEVOYDYNGE | | 7775 |
IN DEVOYDYNGE THE VYLANYE THAT VENKQUYST HIS THEWE3 ...CLN | 544 | 7776 |
DEVYSE | | 7777 |
WEL CLANNER THEN ANY CRAFTE COWTHE DEVYSE ...CLN | 1100 | 7778 |
DEVYSED | | 7779 |
DANYEL IN HIS DIALOKE3 DEVYSED SUMTYME ...CLN | 1117 | 7780 |
HE DEVYSED HIS DREMES TO THE DERE TRAWTHE ...CLN | 1604 | 7781 |
DEW | | 7782 |
THAT ALLE WAT3 DUBBED AND DY3T IN THE DEW OF HEUEN ...CLN | 1688 | 7783 |
DEWE | | 7784 |
WHEN THE DONKANDE DEWE DROPE3 OF THE LEUE3 ...GGK | 519 | 7785 |
DEWOUTLY | | 7786 |
BOT I DEWOUTLY AWOWE THAT VERRAY BET3 HALDEN ...PAT | 333 | 7787 |
DEWOYDE | | 7788 |
DEWOYDE NOW THY VENGAUNCE THUR3 VERTU OF RAUTHE ...PAT | 284 | 7789 |
DEWYNE | | 7790 |
I DEWYNE FORDOLKED OF LUFDAUNGERE ...PRL | 11 | 7791 |
I DEWYNE FORDOKKED OF LUFDAUNGERE ...PRL 2 | 11 | 7792 |
DE3E | | 7793 |
THAT DRY3TYN FOR OURE DESTYNE TO DE3E WAT3 BORNE ...GGK | 996 | 7794 |
DE3EN | | 7795 |
WHAT THAY BRAYEN AND BLEDEN BI BONKKE3 THAT DE3EN ...GGK | 1163 | 7796 |
DE3TER | | 7797 |
HOW THE DE3TER OF THE DOUTHE WERE DERELYCH FAYRE ...CLN | 270 | 7798 |
I HAF A TRESSOR IN MY TELDE OF TWO MY FAYRE DE3TER ...CLN | 866 | 7799 |
THO WERN LOTH AND HIS LEF HIS LUFLYCHE DE3TER ...CLN | 939 | 7800 |
LOTH AND THO LULYWHIT HIS LEEFLY TWO DE3TER ...CLN | 977 | 7801 |
THE THRE LEDE3LENT THERIN LOTH AND DE3TER ...CLN | 933 | 7802 |
Notice, in the first place, the run of numbers at the right margin.
These numbers, 7741-7802, identify the position of the 62 cards of our total
deck of 36,831 cards, placed in the order shown, which carried that portion
of the first print out reproduced here. They are concordance line numbers.
In the final offset printing, these numbers, suppressed, will not appear.
Their utility to us at this stage of our work, as will soon be evident, was
enormous.
Actually, we faced the problem of orthographical variants as well as
that of Middle English versus Modern English headwords long
before we had our first print out of the entire concordance. We at the start
obtained a print out of the complete lexicon, and, working with it and our
copy text, attempted to devise a system of cross referencing which would
allow us to retain all headwords in their Middle English form. But cross
referencing became so complicated that we were forced to give it up; users
of the concordance so arranged would find it unnecessarily complex. We
decided then that we should have to use some Modern English headwords.
As we shall point out shortly, what
we had to do was no less than to print out a concordance (our original
complete print out) in order to make our final concordance. We explored
the possibility of producing a program which would permit the computer to
gather forms together and shift them from their original position to some
other position in the final ordering of all forms, but were unable to produce
such a program. We were obliged to proceed along other lines.
Suppose, now, that this 62 line sample concordance were the final
version of this portion of the whole concordance. And suppose, further, that
a reader entered the concordance at the headword DEWYNE. How would
he know that he should find, 421 lines further along, roughly eight pages
in the printed book, a variant form, DOWYNE? What instructions ought we
provide? Or, earlier in our work, ought we to have spotted the DEWYNE/DOWYNE
variation and regularized all three occurrences to
DEWYNE? (It would have been a simple matter, actually, because the
excellent glossary in Gordon's Pearl does, on p. 127, link these
forms.) Such a procedure, carried out consistently in our five poems,
however, would have been an impossible task; thousands of forms would
have had to be changed and our texts, as well, would have been seriously
violated. Moreover, we would have produced a concordance to a text
unknown to anyone. Hence, we did not regularize spelling. Ought we to
have chosen a Modern English headword and printed both forms under it? The Middle
English form is derived from Old English dwīnan, meaning
"languish, pine away." We do not use Modern English headwords unless
the Modern English development of the Middle English form closely
resembles the Middle English form. It would not do, in this case, to use
LANGUISH as a headword for these forms (the poets did not know that
word), and there is no Modern English form *DEVINE. We chose to retain
DEWYNE as headword, and to print all three occurrences of it here, these
two occurrences, concordance line numbers 7791 and 7792, and
concordance line number 8212, which contains the form DOWYNE. Where
DOWYNE falls out in the concordance as a headword, at present as
concordance line number 8211, we have replaced that card with a new card
which carries this format: DOWYNE (V. DEWYNE). We therefore make
it certain for the reader that he indeed finds the three occurrences of the
Middle English forms which equate
with Modern English "languish."[12]
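The treatment of DOWYNE can be expressed as a small regrouping step. In the project this was done by hand, by replacing cards; the Python sketch below is only a restatement of the outcome. The function name `regroup` is hypothetical, and the text of the DOWYNE line is a placeholder, since the article does not quote it.

```python
def regroup(entries: dict[str, list[str]],
            variant_of: dict[str, str]) -> dict[str, list[str]]:
    """Move each variant's lines under its principal headword and leave a
    'V.' (vide) pointer where the variant headword used to stand."""
    result = {head: list(lines) for head, lines in entries.items()}
    for variant, principal in variant_of.items():
        moved = result.pop(variant, [])
        result.setdefault(principal, []).extend(moved)
        result[f"{variant} (V. {principal})"] = []   # pointer card carries no lines
    return result


entries = {
    "DEWYNE": [
        "I DEWYNE FORDOLKED OF LUFDAUNGERE ...PRL 11",
        "I DEWYNE FORDOKKED OF LUFDAUNGERE ...PRL 2 11",
    ],
    # Placeholder: the line containing DOWYNE is not quoted in the article.
    "DOWYNE": ["(line containing DOWYNE)"],
}
result = regroup(entries, {"DOWYNE": "DEWYNE"})
print(len(result["DEWYNE"]), "DOWYNE (V. DEWYNE)" in result)
# prints: 3 True
```

A full listing would then be put back into alphabetical order, so that the pointer falls where DOWYNE originally stood.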
It might appear, at this juncture, that our simplest solution would
have been the elaborate cross reference key we attempted earlier to devise.
We could have retained all Middle English headwords, and directed
attention to other places in the concordance where variant forms of
semantically similar items occurred. Very carefully we started through our
first print out and marked vide references after those headwords
for which variant forms occurred. The very first entry on the print out was
ABATAYLMENT. It was marked to read: ABATAYLMENT (V.
BATELMENT). That procedure was designed to carry a reader from the
first line to concordance line number 1975, where he would see
BATELMENT. BATELMENT, of course, had to be changed to read:
BATELMENT (V. ABATAYLMENT). After some time we gave it up. Not
only did it turn out that we should have to mark well over half of our total
stock of headwords, but it turned out also, in some cases, that we had to
cite as many as fourteen vide
references for a single headword, and thus to mark each of those fourteen
co-occurring variant forms. Moreover, it proved impossible always to
indicate which form constituted the principal entry. Had we persisted, the
result would have been an inefficient concordance, if not an inadequate one.
Users would have been forced to turn back and forth so frequently that they
well might have concluded that such searching was not worth the
effort.
We therefore concluded that the concordance had to be rearranged so
that it would present the sorted word stock of our five poems in the most
convenient and useful form for the type of reader we imagined would most
want to consult it. We decided, then, that forms which were phonemically
and lexically alike but graphemically different had to be drawn together
under a common headword. For example, we have grouped under the
Modern English headword CHRIST all occurrences of CRIST, CRISTE,
CRYST, KRYST, and KRYSTE. Forms which are lexically alike but are
morphemically or morphophonemically different are distinguished. We
count DEUEL and DEUELE3 as two words, and we count GODE (or
GOOD, GOODE, GOUD, GOUDE), BETTER, BEST (or BESTE) as three
words. In short, all lexical variants are distinguished (the chief aim of a
concordance) and, at a second level of distinction, within the major form
classes all significant structural differences are distinguished. Thus: Nouns
— singular and plural
forms are listed separately, but case is ignored, especially since, lacking an
apostrophe, genitive singulars and plurals cannot be distinguished
from other case singulars and plurals;[13] Verbs — the five
paradigmatic forms
are listed separately, i.e., infinitive, 3d sing. pres. indic., pret., p. partic.,
and pres. partic., with no distinction made between pret. and p. partic.
forms unless a distinctive difference exists, as, in Modern English, say,
between RODE and RIDDEN; Adjectives — the stem, comparative
degree, and superlative degree are listed separately; Adverbs and all other
Form Classes — these presented no especial problems.
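The grouping rule just described can be illustrated with a minimal modern sketch in Python. It is not the procedure we followed, which was carried out by hand on the print out. The lookup table, the function name `group_by_headword`, and the poem line numbers in the usage note are hypothetical; only the spellings themselves are taken from the examples above.

```python
from collections import defaultdict

# Hypothetical lookup table: graphemic variant -> common headword.
# The CHRIST spellings are the ones cited in the text; holding them in a
# dictionary is an assumption made for this illustration only.
VARIANT_TO_HEADWORD = {
    "CRIST": "CHRIST",
    "CRISTE": "CHRIST",
    "CRYST": "CHRIST",
    "KRYST": "CHRIST",
    "KRYSTE": "CHRIST",
}


def group_by_headword(citations):
    """Collect (form, poem, line) citations under a common headword.

    A form missing from the table stands as its own headword, so forms
    that differ morphemically (DEUEL, DEUELE3) remain separate entries.
    """
    grouped = defaultdict(list)
    for form, poem, line in citations:
        headword = VARIANT_TO_HEADWORD.get(form, form)
        grouped[headword].append((form, poem, line))
    return dict(grouped)
```

Given citations for CRIST and KRYSTE (with placeholder line numbers, not real ones), both fall under CHRIST, while DEUEL and DEUELE3 each keep an entry of their own.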
We have, here and there, already said a few things about the choice
of headwords. In general, we find Parrish's description of the matter quite
sound, and his solution for a text like ours reasonable. We have, however,
made one significant change. Parrish suggests that the ". . . optimum
concordance to an early text is of a third type, in which the lines of verse
are given in their original form but index words are
modernized . . . ." (p. 10.) Our lines of verse are in their original form,
but we did not modernize all headwords (index words). Usefulness to the
reader was again in our minds, and consequently we modernized a
headword only when the Middle English form closely resembles the
Modern English form. We retained the Middle English form when no
closely similar Modern English form exists. Where modernized, the Middle
English form will not occur at all as a headword; e.g., from the sample
page we have reproduced here, the headword DEWOUTLY, concordance
line 7787, occurs, but in our revised concordance that form will not appear
as a headword, and, instead, under the Modern English form DEVOUTLY,
which will be inserted in place of the present concordance line 7741,
concordance lines 7742 and 7787 will be listed. From the list of words
which appear on this same sample page, only three — DEVAYE,
DEW,
and DEWYNE — are retained in their Middle English forms; DEW
also
happens to be the
Modern English form, but "deny" does not at all look like DEVAYE, and
DEWYNE had to be treated as we have already described it. There is a
third choice. Some Middle English words do not closely resemble their
Modern English counterparts but do tend to suggest them. Words of this
class are retained as headword entries, but citations are not printed beneath
them. Instead, such a headword will carry a vide reference after
it, that reference being a Modern English form, and under the Modern
English form the citations will be situated. From this sample page, two such
entries are indicated: DE3E will read DE3E (V. DIE) and DE3TER will
read DE3TER (V. DAUGHTER), and concordance lines 7794 and 7796
(since DE3EN is an unlevelled form of DE3E) will be located under DIE,
which will follow DID, line 7833, while the five DE3TER citations will be
spotted under DAUGHTER, to be relocated some 650 lines back,
immediately after concordance line 7059, a line containing the form
DAUBE. In
short, we derived three types of headwords: the Middle English form, the
Modern English form, and the "in between" Middle English
vide Modern English form. With that in mind, it is easy to
see
how we established, to finish with the forms on this sample page, the
paradigms
DEVOID—DEVOIDS—DEVOIDED—DEVOIDING
and
DEVISE—DEVISES—DEVISED. The sources of our
headwords
DEVOUTLY, DIVINE, and DEVICE are likewise clear. And all the forms
on this sample will be resituated as we have explained it.
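The three types of headword can be pictured as a small data structure. The following Python sketch is an illustration under stated assumptions, not a record of any program we ran: the class `Headword`, its field names, and the functions `render` and `citation_home` are hypothetical, while the sample forms (DEVAYE, DEWOUTLY, DE3E, DE3TER) are those discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Headword:
    middle_english: str                    # form as it stands in the text
    modern_english: Optional[str] = None   # None: no close Modern English form
    vide: bool = False                     # True: keep the ME entry as a pointer


def render(hw: Headword) -> str:
    """Return the headword as it would be printed."""
    if hw.modern_english is None:
        return hw.middle_english                          # type 1: ME retained
    if hw.vide:
        return f"{hw.middle_english} (V. {hw.modern_english})"  # type 3
    return hw.modern_english                              # type 2: modernized


def citation_home(hw: Headword) -> str:
    """Return the headword under which the citations are printed."""
    return hw.modern_english or hw.middle_english
```

Thus `render(Headword("DE3E", "DIE", vide=True))` gives `DE3E (V. DIE)`, and `citation_home` for the same entry gives `DIE`, where the citations are to be found.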
All this activity, with some decision called for at every entry in the
concordance, with uncommon persistence required to uncover all the
alternates and variant spellings (even making sure, to give another
example, to search out aphetic forms; thus we chose to take
concordance line 2030, BAYST, and relocate it as line 1, as Modern
English ABASHED), with a gnawing frustration but ever growing
knowledge of our texts, and respect for them, attending it, all this
constituted our "post-machine" editing. From the original print out, then,
we made the necessary changes in headwords, and using the line numbers
assigned to the entries, we rearranged the concordance. As we have said,
we discussed the possibility of writing a program which would enable the
computer to rearrange the concordance, but were unable to produce that
program. In any case, it seems clear that we should have had to produce the
changes in headwords and the instructions for rearrangement by hand, as
we did, before a complex program could be written. It took two people 200
hours to do that work by hand—a staggering number of hours
compared
with what the computer was able to do in less than 14 hours. At this
moment we
see no alternative. It is another problem for all concorders of Old
English or Middle English texts.
Relocating the out put deck of 36,831 cards has proved to be an
equally arduous task. Since thousands of operations are involved, it was
apparent that we could not count on the IBM 7070 to perform the work. To
switch card and concordance line number 2030 to line number 1, for
example, would require a distinct program, simple to be sure, but costly.
To sort out the line numbers 16575, 16576, 16463 . . . 5013, 16582, 16583
. . . 16592-98 . . . 508, 16465 (40 occurrences, in all, of what we list
under Modern English CAST) and move this entire newly arranged unit so
that it is spotted after what is at present line number 5011 would require
another distinct program, a more complicated one, also costly. With these
two instances expanded to thousands, it should be clear why the computer
is ill suited for that assignment. Perhaps in the future an extremely
sophisticated program might be devised to perform that kind of operation
efficiently. At the present time it is not possible. We are
completing this arduous work manually.
The utility of the concordance line numbers will now be apparent. A
simple program was devised to print the matching concordance line number
on each of our 36,831 out put cards. That entire deck now is so marked;
i.e., each punched card now carries the same number on it as the line
number it represents. Thus an individual card, run through the printer some
time ago, and which generated a line of our original print out, can now be
picked out from all the other 36,830 out put cards. All our "post-machine"
editing was performed right on the original print out sheets. The
instructions are quite clear. They
go along like this: place card 2030 in position 1; next place card 1952, then
2028, then 1, followed by 2, 3, 6, 9, 7, and so on. Nimble fingers and a
patient eye have taken over the work. It is not yet completed, but the end
is in sight. When new headword cards have been punched and inserted at
the proper places (i.e., those instances where we did not retain the Middle
English headword) the freshly ordered deck of cards will be ready for
machine printing. The print out of that deck should constitute our final
concordance, and it, after proofreading, will be reproduced in an offset
printing.
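For readers who think in programs, the manual relocation amounts to reordering a numbered deck according to a written list of card numbers. The Python sketch below is a modern illustration only; no such program was run on the IBM 7070, and the function name `reorder_deck` and the card contents in the usage note are hypothetical. The card numbers in the usage note are the ones quoted in the instructions above.

```python
def reorder_deck(deck, order):
    """Rearrange numbered cards according to a list of card numbers.

    deck  -- dict mapping original concordance line number -> card text
    order -- original line numbers in the sequence the cards should take
    Returns a list of (new_position, original_number, card_text) tuples,
    with positions counted from 1.
    """
    seen = set()
    result = []
    for position, number in enumerate(order, start=1):
        if number not in deck:
            raise KeyError(f"no card numbered {number}")
        if number in seen:
            raise ValueError(f"card {number} placed twice")
        seen.add(number)
        result.append((position, number, deck[number]))
    return result
```

With a deck holding cards 1, 2, 3, 6, 7, 9, 1952, 2028, and 2030 and the order 2030, 1952, 2028, 1, 2, 3, 6, 9, 7, card 2030 comes out in position 1 and card 7 in position 9. The checks for missing and twice-placed cards stand in for the patient eye the hand work requires.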
A few minor problems occurred. We had, for example, to examine
each of the 2937 occurrences of THE to pick out the 146 instances where
the form equates with the pronoun "thee." The other 2791 occurrences were
not concorded. So, too, did we examine the 687 occurrences of ON to pick
out the 34 instances where it equates with "one." We wanted to get a
separate listing of all pronouns, and therefore all occurrences of THE and
ON had to be examined.
Readers of Middle English will, we think, find one of the appendices
to our concordance especially useful. All occurrences of pronouns,
sometimes missing from concordances or scattered alphabetically throughout
the concordance, and all forms of BE and HAVE (as well as BE and HAVE
plus a negative particle) are collected in one listing. Here the reader will
find these forms concorded by poem title and poem line number, but not
with line of verse citations. The structure and syntactical behavior of these
forms are particularly interesting in Middle English. We have made it
possible for a reader to locate very quickly every occurrence of these forms
in our five poems. Two other appendices should also be useful. We have
provided a list of headwords (here entirely in Middle English) in order of
the frequency of their occurrence. Parrish's statement of this feature of the
Cornell Concordances (p. 12) receives our wholehearted support. It is good
to know, very quickly, what words a
poet chooses to use more frequently than other words. It is more than
merely 'good to know' that; here is an avenue to insight. To use the Arnold
concordance itself, for example, it was something of a surprise to discover
that Arnold used the word day more frequently than the word
night, for, influenced no doubt by admiration of "Dover
Beach," it somehow seemed that it should have turned out the other way
around. The frequency list of our concordance will no doubt be put to good
use. We also thought it would be helpful (our last appendix) to list all the
variant lines we uncovered in our collation of the texts we used. Read
against each base text we used, this list, sorted according to poem titles,
will disclose where
variations exist in the other editions we employed. The 9 variant lines of
PAT, for example, are, by line number only, disclosed at a glance.
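A frequency list of the kind described is simple to state in modern terms. The following Python sketch is an assumption-laden illustration, not the method by which our appendix was prepared: the function name `frequency_list` is hypothetical, and the alphabetical ordering of ties is a choice made here, not one stated above.

```python
from collections import Counter


def frequency_list(headwords):
    """Return (headword, count) pairs, most frequent first.

    headwords -- one headword per occurrence in the texts
    Ties are broken alphabetically (an assumption of this sketch).
    """
    counts = Counter(headwords)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Fed one headword for each occurrence, the function yields the headwords in descending order of frequency, which is the arrangement a reader of such an appendix would consult.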
There is no more. I want now to return to my own voice. Mr. Kottler
likes to think of our work as a computer concordance shaped by man. It is
hard to imagine a better way to put it. Surely, as this record is reviewed,
it will occur to anyone to ask why in the world did they use the 7070 in the
first place. We used it, after our "pre-machine" textual work was
completed, work to be done in any case no matter what technique might
afterwards be employed to evolve the concordance, because the IBM 7070
gave us the original print out in a day. It might have taken us a year to do
it manually. And, most important of all, we hardly realized what we had
to contend with until we had that print out. Unless a corps of co-workers
is constituted to prepare a concordance, as Parrish describes the work of the
Dante Society of America (p. 1), the machine is indispensable. As I hear of
new techniques, particularly those which eliminate the necessity for using
punched cards to transfer a text to
magnetic tape, or those which skillfully have made use of extremely
sophisticated programs, programs designed to handle enormous numbers of
variables, and some of these programs aimed squarely at human discourse,
oral or written, and not just numbers or other symbols more readily
amenable than phonemic or graphemic structures to the operation of binary
arithmetic, I feel sure that in the future a concordance such as ours could
be produced much more efficiently than we have managed it.
If we have not been efficient, we have been reliable. I have been
reading Lane Cooper's delightful account of the evolution of his
Concordance of Wordsworth, "The Making and the Use of a Verbal
Concordance."[14] Naturally, I
wondered how he might have described what we have been doing with our
concordance. I can also recall the day when I should have looked elsewhere
for entertaining reading— "An undergraduate student once assured
me
that the word God was rare in the writings of Wordsworth;
he
had heard so in a lecture. It occurs 274 times in the poems of that author.
. . ." (p. 21) Indeed. I doubt that this account of our concordance is that
entertaining. I also am not sure that it will prove to be "the gift of Hermes
to Apollo" (p. 19). But I am sure that we have produced a reliable
concordance to five important Middle English poems. I am sure, too, that
students of Middle English poetry will find it extremely useful. I am sure,
again,
that two mediævalists who scarcely knew one another before they started
the work have learned a great deal about these poems, about
Middle English, and about the behavior of a sophisticated, at many times
a very delicate, language. I am sure, at last, that Mr. Kottler is anxious to
return to Boethius, to later mediæval commentaries on Boethius, to
Chaucer, and, I suspect, to the ordinary human race. I too find myself
looking away, to Beowulf, to Gawain, and to Sir Thomas Malory, Kt.,
who, at this very instant, prods me with his usual good words—"here
is
the ende."
Notes