Normalisation Model

Explanation: This is not a model illustrating the descending significance or the statistical incidence of actual forms, but rather a logical arrangement of "levels" of normalisation, based upon the three major sources for producing the accidentals of the edited text—the HOCCLEX concordance (including a reverse lexicon, on which see below), the NORMLEX "special" dictionary of normalised forms not extant in HOCCLEX, and the copy-text, BL MS. Arundel 38. Furthermore, it is primarily a model for the likely editorial treatment of individual word-forms (or, more properly, word-form types), not for a system of accidence—either Hoccleve's or that of the scribe of Arundel. It is possible that such a system could be constructed from the evidence lying behind the model, but our unit of comparative data is always, in the first instance, the specific form as it occurs in a specific place or places in the text (i.e., only later, usually through NORMLEX or the reverse lexicon, is a formal morphological pattern extending beyond the immediate evidence of HOCCLEX transferred to the edited text). Similarly, the model—as it reflects our editorial method—is not a lexical study per se; that is, it is concerned with the morphology of the word as it occurs in the text, and not with its lexical identity or history. This is an important consideration in interpreting the model, for while a particular form of a word might not exist in one column, the word itself (in a related form) could very well be extant.
This explains the apparent anomaly of level 12, where there is no entry in either the HOCCLEX or the COPY-TEXT columns, and yet there is a 100% entry in the NORMLEX column: i.e., the particular form (say, a third-person singular of a regular verb) happens not to exist in the texts from which HOCCLEX is derived, and the copy-text is either very ambiguous in its preferred forms for this verb in this specific inflected form, or uses an inflection which does not appear in HOCCLEX for this verb, or perhaps not for any verb. Nonetheless, it would be possible to construct the appropriate preferred Hocclevean inflection from HOCCLEX
and to read that, without ambiguity, into NORMLEX—in an example of the occasional employment of the general principles of accidence as a secondary editorial activity, based on the evidence of HOCCLEX and NORMLEX together. Anomalies such as this notwithstanding, the model still functions primarily as a record of the specific form in the specific word, not of the putative degree of consistency in the idiolect as a whole. In fact, as is readily seen, an entry appears in the NORMLEX column only when there is no entry in the HOCCLEX column; that is, recourse to NORMLEX is taken only when a highly conservative use of HOCCLEX will not produce a well-attested form for the specific inflection (or root) needed. The 100% HOCCLEX forms would be automatically read into NORMLEX but would not need to be cited in editorial work, hence their not recurring in the NORMLEX column. This simply confirms the relatively greater authority of HOCCLEX over NORMLEX (even though, in this case, they carry identical data, so that no choice has to be made between them). The entire procedure is, of course, merely another (post-classical) occurrence of the basic principle of "analogy" as defined by the Alexandrian librarians, editors, and grammarians.

One final caveat: since we have not yet created a complete concordance of all copy-text forms to parallel HOCCLEX for the holographs, any statistics cited in the COPY-TEXT column are inevitably less firm (but also less significant for a critical as opposed to a diplomatic edition) than those for HOCCLEX. Frankly, we are not convinced that such a concordance would be of any great value editorially (although it could be of use to palaeographers, philologists, and dialectologists). For although Arundel happens to obey Gregian requirements for copy-text as regards its accidentals, it is essentially being used as a vehicle to present comparative data for the recognition and, where necessary, the construction, of auctorial intentions.
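The secondary, "analogical" construction described here (building an unattested inflected form from an attested stem plus the ending the poet demonstrably prefers for that inflection elsewhere) can be sketched in modern terms. The following Python fragment is a hypothetical illustration only; the function name, labels, and endings table are invented, not part of the editors' apparatus:

```python
# Hypothetical sketch of construction by analogy: combine an attested
# stem with the ending preferred (per the concordance evidence) for a
# given inflection. All names and data here are illustrative.
def extend_by_analogy(stem, inflection, preferred_endings):
    """preferred_endings maps an inflection label to the best-attested
    ending, e.g. {"3sg_pres": "-ith"}; return the constructed form."""
    ending = preferred_endings[inflection]
    return stem + ending.lstrip("-")
```

Thus an unattested third-person singular could be generated from an attested stem and read, without ambiguity, into a NORMLEX-style wordlist.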
Level   HOCCLEX          NORMLEX         COPY-TEXT       Source of edited form
 1.     100% usage                       100% usage      HOCCLEX[a] & COPY-TEXT
 2.     100% usage                       high usage      HOCCLEX[b]
 3.     100% usage                       indifferent     HOCCLEX[c]
 4.     100% usage                       low usage       HOCCLEX[d]
 5.     100% usage                                       HOCCLEX[e]
 6.     90%-99% usage                    100% usage      HOCCLEX[f]
 7.     1%-90% usage                     100% usage      HOCCLEX[g] or COPY-TEXT
 8.                      100% usage      100% usage      NORMLEX[h]
 9.                      100% usage      high usage      NORMLEX[i]
10.                      100% usage      indifferent     NORMLEX[j]
11.                      100% usage      low usage       NORMLEX[k]
12.                      100% usage                      NORMLEX[l]
13.                                      100% usage      COPY-TEXT[m]
14.     indifferent      indifferent     indifferent     COPY-TEXT[n] or HOCCLEX or NORMLEX
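The ranking implicit in these fourteen levels can be reduced to a small decision procedure. The Python sketch below is purely illustrative (the function name, arguments, and the handling of percentage combinations not listed in the model are mine, not the editors'); it encodes only the broad logic: HOCCLEX outranks NORMLEX, NORMLEX is consulted only when HOCCLEX is silent, and otherwise the copy-text form stands.

```python
# Illustrative sketch only: map the usage percentages of the three
# columns to the source of the edited form. None means "no entry in
# that column"; names and defaults are hypothetical.
def preferred_source(hocclex_pct=None, normlex_pct=None, copytext_pct=None):
    if hocclex_pct is not None:
        if hocclex_pct == 100 and copytext_pct == 100:
            return "HOCCLEX & COPY-TEXT"        # level 1: identical data
        if hocclex_pct == 100:
            return "HOCCLEX"                    # levels 2-5
        if hocclex_pct >= 90 and copytext_pct == 100:
            return "HOCCLEX"                    # level 6
        return "HOCCLEX or COPY-TEXT"           # level 7: editorial choice
    if normlex_pct == 100:
        return "NORMLEX"                        # levels 8-12
    if copytext_pct == 100:
        return "COPY-TEXT"                      # level 13: unique copy-text form
    return "COPY-TEXT or HOCCLEX or NORMLEX"    # level 14: indifferent throughout
```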


Within the theoretical parameters described by this ideal model for normalisation, the Hoccleve editors could now deal with the lexicon itself. Working with Gary Tobey, Charles Wilcox, and John Southard of the Computer Science Department of Adelphi University, and using a Prime 800 computer and the FORTRAN language, Peter Farley entered the accidentals of the holograph manuscripts to produce the raw data from which the editorial implementation of normalisation theory could proceed. Initially, we had thought that John Fisher's computer-based analysis of Chancery English (which he very kindly made available to us on tape) might be used for morphological extension, in those circumstances where no paradigm could be discovered in the holographs; for Fisher's material (based on a selection of 90,000 words from public documents) was very much wider in scope than the slim corpus of the Hoccleve manuscripts (6143 word entries in the main Hoccleve concordance).

The Fisher concordance is presented in so-called KWIC (keyword-in-context) format, where the main entry occurs as the central word in a complete line of the print-out. This format is usually regarded as most suitable for prose and therefore is appropriate to Fisher's work on the Chancery documents. The KWIC format can be useful in verse also, especially where enjambment or the repetition of poetic formulae across the line-system is very common. But since Hoccleve generally prefers end-stopped lines, the line itself forms a natural unit, and Farley therefore used the KWOC format (keyword-out-of-context), whereby the main entry is keyed to a separate print-out of the text.[16]
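The difference between the two formats can be made concrete. In the Python sketch below (my own construction, not a rendering of Farley's actual program), a KWIC build prints each keyword inside a window of surrounding words, while a KWOC build simply keys each headword to line numbers in a separate print-out of the text, treating the end-stopped line as the natural unit:

```python
# Illustrative sketch of the two concordance formats; not the editors' code.
def kwic(lines, width=3):
    """Keyword-in-context: each occurrence shown with a window of
    neighbouring words, suited to prose or heavily enjambed verse."""
    out = []
    for ln in lines:
        words = ln.split()
        for i, w in enumerate(words):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            out.append((w.lower(), f"{left} [{w}] {right}".strip()))
    return sorted(out)

def kwoc(lines):
    """Keyword-out-of-context: headword keyed to line numbers, the line
    itself being looked up in a separate print-out of the text."""
    index = {}
    for n, ln in enumerate(lines, start=1):
        for w in ln.split():
            index.setdefault(w.lower(), []).append(n)
    return index
```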

While there are some obvious limitations to the HOCCLEX program, we believe these are comparatively minor, considering our editorial purposes, and we are confident that the evidence we now have will indeed produce the first authoritative computer-assisted text of the accidentals of a Middle English author. Our major innovation (at least in Middle English) is to store the evidence which can be used for the construction of an inflectional system or a paradigm of auctorially-preferred suffixes.[17] This is done very simply, by creating a reverse lexicon as well as the positive one. We did not use a syllabic break-down (perhaps one of our oversights), but we can achieve the same results from being able, say,
to list all uses of particular letter-clusters. This can be very significant in confirming readings of, e.g., a preferred double "ee" over a single "e", where otherwise the statistical evidence would be limited to the particular word in question, which nonetheless remains the basic unit of the evaluation. We did not use syntactic codings, although we had developed approximately seventy-five codes for all the major Middle English patterns. The degree of conservatism in the Regement copying makes it unlikely that preferred syntactic or word-order choices would occur that might run counter to the evidence of copy-text. Of the "surface features", we did record punctuation as a separate element (although it is not introduced into the sample normalisation below, and in practice turns out to be largely restricted to the use of the mid-line virgule for caesura), but not capitalisation, which Farley had already analysed independently. Finally, using the dictionary program THE WORD PLUS on a KayPro II Plus 88 microcomputer, a special sub-lexicon (i.e., NORMLEX) has been written by the general editor, listing all forms which can be created by analogy and morphological extension based on HOCCLEX usage, but which do not actually appear in the HOCCLEX lexicon.
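A reverse lexicon of this kind is easy to picture: each form is stored spelled backwards, so that an ordinary alphabetical sort groups words by ending (suffix) rather than beginning, and a letter-cluster query counts attestations across the whole lexicon. The Python fragment below is a hypothetical illustration, not a transcription of the HOCCLEX program:

```python
# Hypothetical sketch: a reverse lexicon and a letter-cluster count.
def reverse_lexicon(forms):
    """Store each form backwards so that sorting groups words by their
    endings, exposing preferred suffixes."""
    return sorted(f[::-1] for f in set(forms))

def cluster_count(forms, cluster):
    """Count distinct forms containing a given letter-cluster, e.g.
    'ee' versus 'e', across the whole lexicon rather than one word."""
    return sum(1 for f in set(forms) if cluster in f)
```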
Furthermore, by first reading the actual morphology of the HOCCLEX text (but not the incidence) into the "Main Dictionary" on THE WORD PLUS (and, of course, erasing the 45,000-word Modern English dictionary on the program and renaming NORMLEX to this Main Dictionary), it is then possible to run any part of the edited (and normalised) text through the microcomputer dictionary programs separately and therefore to establish immediately whether the final text is compatible with either part of the normalisation and, if not, whether this apparent incompatibility is merely the mark of a retained indifferent or unique form from the copy-text (e.g., levels 13 or 14 in the normalisation model) or is a genuine error of judgement by an editor. I should emphasise here that this technical checking of the normalisation results is not intended to suggest that the entire normalisation process is one merely manufactured by a coven of computers. The main-frame computer simply provides the raw data upon which the editorial decisions are based (decisions obviously supported by statistical information), and the microcomputer makes some of this information more readily accessible to the other editors; the dictionary programs do not create the normalisation—they merely record it and check the final text against it.
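The checking step described above amounts to running the normalised text against the combined wordlists and flagging any form found in neither; each flagged form is then either a legitimately retained copy-text form (levels 13 or 14) or an editorial slip. A minimal sketch, assuming simple one-word-per-entry lists (the function and argument names are hypothetical):

```python
# Minimal sketch of the compatibility check; names are hypothetical.
def check_text(words, hocclex, normlex):
    """Return the forms of the edited text attested in neither wordlist,
    for an editor to review by hand."""
    lexicon = {w.lower() for w in hocclex} | {w.lower() for w in normlex}
    return [w for w in words if w.lower() not in lexicon]
```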

This is what we found. There is in the Hoccleve holographs a quite remarkable degree of consistency in accidentals, much more so than in Fisher's Chancery English (thereby rendering that concordance, despite its much wider data-base, of less value to us than we had hoped) and
perhaps more so than in any other English author before the eighteenth century. Hoccleve was obviously a good bureaucrat.

The first trial sections of normalisation suggested that more than two-thirds of the copy-text forms would remain undisturbed, thereby confirming the status of Arundel as a reliable copy-text. As editors, we could now distinguish at this practical level of normalisation three lexical types of relationship between copy-text and the holographs, types whose representation of the actual conditions of those documents could usually be linked, reassuringly, to one or other of the theoretical levels delineated in the Normalisation Model.