Normalisation Model
Explanation: This is not a model illustrating the descending
significance
or the statistical incidence of actual forms, but rather a logical arrangement
of "levels" of normalisation, based upon the three major sources for
producing the accidentals of the edited text—the HOCCLEX
concordance
(including a reverse lexicon, on which see below), the NORMLEX
"special"
dictionary of normalised forms not extant in HOCCLEX, and the copy-text,
BL MS. Arundel 38. Furthermore, it is primarily a model for the likely
editorial treatment of individual word-forms (or more properly word-form
types), not for a system of accidence—either Hoccleve's or that of
the
scribe of Arundel. It is possible that such a system could be constructed
from
the evidence lying behind the model, but our unit of comparative data is
first always the specific form as it occurs in a specific place
or places
in the text (i.e., only later, usually through NORMLEX or the reverse
lexicon, is a formal morphological
pattern extending beyond the immediate evidence of HOCCLEX transferred
to the edited text). Similarly, the model—as it reflects our editorial
method—is not a lexical study per se; that is, it is
concerned with
the morphology of the word as it occurs in the text, and not with its lexical
identity or history. This is an important consideration in interpreting the
model, for while a particular form of a word might not exist
in one
column, the word itself (in a related form) could very well be extant. This
explains the apparent anomaly of level 12, where there is no entry in either
the HOCCLEX or the COPY-TEXT columns, and yet there is a 100%
entry
in the NORMLEX column: i.e., the particular form (say, a third-person
singular of a regular verb) happens not to exist in the texts from which
HOCCLEX is derived, and the copy-text is either very ambiguous in its
preferred forms for this verb in this specific inflected form, or uses an
inflection which does not appear in
HOCCLEX for this verb, or perhaps not for any verb.
Nonetheless, it
would be possible to construct the appropriate preferred Hocclevean
inflection from HOCCLEX
and to read that, without ambiguity, into NORMLEX—in an example
of
the occasional employment of the general principles of accidence as a
secondary editorial activity, based on the evidence of HOCCLEX and
NORMLEX together. Anomalies such as this notwithstanding, the model
still functions primarily as a record of the specific form in the specific
word,
not of the putative degree of consistency in the idiolect as a whole. In fact,
as is readily seen, an entry appears in the NORMLEX column
only
when there is no entry in the HOCCLEX column; that is, recourse to
NORMLEX is taken only when a highly conservative use of HOCCLEX
will
not produce a well-attested form for the specific inflection (or root) needed.
The 100% HOCCLEX forms would be automatically read into NORMLEX
but would not need to be cited in editorial work, and hence do not recur in
the NORMLEX column. This simply confirms the relatively greater
authority
of HOCCLEX over NORMLEX (even though, in this case, they
carry identical data, so that no choice has to be made between them). The
entire procedure is, of course, merely another (post-classical) occurrence
of
the basic principle of "analogy" as defined by the Alexandrian librarians,
editors, and grammarians. One final
caveat: since we have
not yet
created a complete concordance of all copy-text forms to parallel
HOCCLEX for the holographs, any statistics cited in the COPY-TEXT
column are inevitably less firm (but also less significant for a critical as
opposed to a diplomatic edition) than those for HOCCLEX. Frankly, we are
not convinced that such a concordance would be of any great value
editorially (although it could be of use to palaeographers, philologists, and
dialectologists). For although Arundel happens to obey Gregian requirements
for copy-text as regards its accidentals, it is essentially being used as a
vehicle to present comparative data for the recognition and, where
necessary, the construction, of auctorial
intentions.
      HOCCLEX         NORMLEX       COPY-TEXT       EDITED TEXT
 1.   100% usage      -             = 100% usage    HOCCLEX[a] & COPY-TEXT
 2.   100% usage      -             = high usage    HOCCLEX[b]
 3.   100% usage      -             = indifferent   HOCCLEX[c]
 4.   100% usage      -             = low usage     HOCCLEX[d]
 5.   100% usage      -             -               HOCCLEX[e]
 6.   90%-99% usage   -             = 100% usage    HOCCLEX[f]
 7.   1%-90% usage    -             = 100% usage    HOCCLEX[g] or COPY-TEXT
 8.   -               100% usage    = 100% usage    NORMLEX[h]
 9.   -               100% usage    = high usage    NORMLEX[i]
10.   -               100% usage    = indifferent   NORMLEX[j]
11.   -               100% usage    = low usage     NORMLEX[k]
12.   -               100% usage    -               NORMLEX[l]
13.   -               -             = 100% usage    COPY-TEXT[m]
14.   indifferent     indifferent   indifferent     COPY-TEXT[n] or HOCCLEX or NORMLEX
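Read as a decision procedure, the fourteen levels can be sketched roughly as follows. The function and its category labels are illustrative inventions of ours, in modern programmatic terms; they are not part of the HOCCLEX or NORMLEX programs, and are intended only to make the logic of the model explicit.

```python
# Hypothetical sketch of the Normalisation Model as a decision procedure.
# The category labels ("100%", "high", etc.) and this function are our own
# illustration, not the editors' actual software.

def edited_source(hocclex, normlex, copy_text):
    """Return which source(s) supply the edited form.

    Each argument describes the usage of the candidate form in that
    column: "100%", "90-99%", "1-90%", "high", "indifferent", "low",
    or None (no entry).
    """
    if hocclex == "100%":
        # Levels 1-5: unanimous holograph usage always prevails.
        return "HOCCLEX & COPY-TEXT" if copy_text == "100%" else "HOCCLEX"
    if hocclex == "90-99%" and copy_text == "100%":
        return "HOCCLEX"                        # level 6
    if hocclex == "1-90%" and copy_text == "100%":
        return "HOCCLEX or COPY-TEXT"           # level 7: editorial choice
    if hocclex is None and normlex == "100%":
        return "NORMLEX"                        # levels 8-12
    if hocclex is None and normlex is None and copy_text == "100%":
        return "COPY-TEXT"                      # level 13
    # Level 14: all three columns indifferent.
    return "COPY-TEXT or HOCCLEX or NORMLEX"
```

The sketch makes visible the hierarchy described above: NORMLEX is consulted only when HOCCLEX yields no entry, and the copy-text stands alone only when both lexica are silent.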
Within the theoretical parameters described by this ideal model for
normalisation, the Hoccleve editors could now deal with the lexicon itself.
Working with Gary Tobey, Charles Wilcox, and John Southard of the
Computer Science department of Adelphi University, and using a Prime 800
computer and the FORTRAN language, Peter Farley entered the accidentals
of
the holograph manuscripts to produce the raw data from which the editorial
implementation of normalisation theory could proceed. Initially, we had
thought that John Fisher's computer-based analysis of Chancery English
(which he very kindly made available to us on tape) might be used for
morphological extension, in those circumstances where no paradigm could
be discovered in the holographs; for Fisher's material (based on a selection
of
90,000 words from public documents) was very much wider in scope than
the slim corpus of the Hoccleve manuscripts (6143 word entries in the main
Hoccleve concordance).
The Fisher concordance is presented in so-called KWIC
(keyword-in-context) format, where the main entry occurs as the
central word in a complete line of the print-out. This format
is
usually regarded as most suitable for prose and therefore is appropriate to
Fisher's work on the Chancery documents. The KWIC format can be useful
in verse also, especially where enjambment or the repetition of poetic
formulae across the line-system is very common. But since
Hoccleve
generally prefers end-stopped lines, the line itself forms a natural unit, and
Farley therefore used the KWOC format (keyword-out-of-context), whereby
the main entry is keyed to a separate print-out of the text.[16]
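The contrast between the two formats can be shown in a small illustrative sketch; the sample lines and function names below are invented for the purpose and do not reproduce Farley's FORTRAN program.

```python
# Illustrative contrast of KWIC and KWOC concordance formats.
# Sample text and functions are invented for this example.

def kwic(lines, keyword, width=20):
    """Keyword-in-context: each occurrence appears inside a window
    of surrounding text from its line."""
    entries = []
    for line in lines:
        words = line.split()
        for i, w in enumerate(words):
            if w.lower() == keyword:
                left = " ".join(words[:i])[-width:]
                right = " ".join(words[i + 1:])[:width]
                entries.append(f"{left} [{w}] {right}")
    return entries

def kwoc(lines, keyword):
    """Keyword-out-of-context: the entry is keyed to line numbers
    in a separately printed text."""
    return [n for n, line in enumerate(lines, start=1)
            if keyword in (w.lower() for w in line.split())]

# Invented sample lines (not Hoccleve's text).
sample = ["Of thoughtful herte and of pensifnesse",
          "My herte is ful of wo"]
```

Since Hoccleve's end-stopped line is itself the natural unit of context, the line-number references of the KWOC format suffice, and the running context of KWIC adds little.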
While there are some obvious limitations to the HOCCLEX program,
we believe these are comparatively minor, considering our editorial
purposes, and we are confident that the evidence we now have will indeed
produce the first authoritative computer-assisted text of the accidentals of
a
Middle English author. Our major innovation (at least in Middle English)
is
to store the evidence which can be used for the construction of an
inflectional system or a paradigm of auctorially-preferred suffixes.[17] This is done very simply, by
creating a reverse
lexicon as well as the positive one. We did not use a syllabic break-down
(perhaps one of our oversights), but we can achieve the same results by
being able, say,
to list all uses of particular letter-clusters. This can be very significant in
confirming readings of, e.g., a preferred double "ee" over a single "e", where
otherwise the statistical evidence would be limited to the particular word in
question, which nonetheless remains the basic unit of the evaluation. We
did
not use syntactic codings, although we had developed approximately
seventy-five codes for all the major Middle English patterns. The degree of
conservatism in the
Regement copying makes it unlikely that
preferred syntactic or word-order choices would occur that might run
counter to the evidence of copy-text. Of the "surface features", we did
record punctuation as a separate element (although it is not introduced into
the sample normalisation below, and in practice turns out to be largely
restricted to the use of the mid-line virgule for caesura), but not
capitalisation, which Farley had already analysed independently. Finally,
using the dictionary program THE WORD PLUS
on a KayPro II Plus 88 microcomputer, a special sub-lexicon (i.e.,
NORMLEX) has been written by the general editor, listing all forms which
can be created by analogy and morphological extension based on
HOCCLEX usage, but which do not actually appear in the HOCCLEX
lexicon. Furthermore, by first reading the actual morphology of the
HOCCLEX text (but not the incidence) into the "Main Dictionary" on THE
WORD PLUS (and, of course, erasing the 45,000-word Modern English
dictionary on the program and renaming NORMLEX to this Main
Dictionary), it is then possible to run any part of the edited (and
normalised)
text through the microcomputer dictionary programs separately and
therefore to establish immediately whether the final text is compatible with
either part of the normalisation and, if not, whether this apparent
incompatibility is merely the mark of a retained indifferent or unique form
from the copy-text (e.g., levels 13 or 14 in the normalisation model) or is
a
genuine error of
judgement by an editor. I should emphasise here that this technical checking
of the normalisation results is not intended to suggest that the entire
normalisation process is one merely manufactured by a coven of computers.
The main-frame computer simply provides the raw data upon which the
editorial decisions are based (decisions obviously supported by statistical
information), and the microcomputer makes some of this information more
readily accessible to the other editors; the dictionary programs do not create
the normalisation—they merely record it and check the final text
against
it.
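The reverse lexicon and the letter-cluster listing described above can be sketched in a few lines; the word-forms below are invented examples, not HOCCLEX data, and the functions are our own illustration of the principle rather than the programs actually used.

```python
# Minimal sketch of a reverse lexicon and a letter-cluster listing.
# The forms are invented illustrations, not data from HOCCLEX.

def reverse_lexicon(forms):
    """Sort forms by their reversed spelling, so that shared suffixes
    (e.g. -eth, -ith) fall together for paradigm construction."""
    return sorted(forms, key=lambda w: w[::-1])

def with_cluster(forms, cluster):
    """List every form containing a given letter-cluster, e.g. "ee",
    to weigh a preferred double "ee" against a single "e"."""
    return [w for w in forms if cluster in w]

# Invented sample forms.
forms = ["seeth", "herith", "speketh", "see", "se"]
```

Sorted on reversed spellings, the "-eth" forms ("seeth", "speketh") fall adjacent to one another, ahead of the "-ith" form, which is exactly the grouping needed when constructing an inflectional paradigm from scattered attestations.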
This is what we found. There is in the Hoccleve holographs a quite
remarkable degree of consistency in accidentals, much more so than in
Fisher's Chancery English (thereby rendering that concordance, despite its
much wider data-base, of less value to us than we had hoped) and
perhaps more so than in any other English author before the eighteenth
century. Hoccleve was obviously a good bureaucrat.
The first trial sections of normalisation suggested that more than
two-thirds of the copy-text forms would remain undisturbed, thereby
confirming the status of Arundel as a reliable copy-text. As editors, we
could
now distinguish at this practical level of normalisation three lexical types
of
relationship between copy-text and the holographs, types whose
representation of the actual conditions of those documents could usually be
linked reassuringly to one or other of the theoretical levels delineated in the
Normalisation Model.