University of Virginia Library

The Cusum Technique[14]

This would be a short-lived joy since cusum analysis could actually be
carried out—if more slowly—using an abacus, or pencil and paper, instead
of a computer. Nevertheless, as one who is literary-critical by both training
and inclination, the present writer has a large measure of sympathy with the
dilemma of literary sceptics towards numerical studies, although that sympathy
may be qualified by the observation that the computer is only a mechanical
aid, and that there can be no objection to counting as such.[15] Consider
discussion of the "number" of questions in Macbeth or the "preponderance"
of disease imagery in Hamlet.

Cusum analysis has been used since 1990, both for studies in literary attribution
and also in a forensic setting. It is the invention of A. Q. Morton,
Fellow of the Royal Society of Edinburgh and a retired Minister in the
Church of Scotland, whose life-time research has been devoted to developing
an objective, scientific method of authorship attribution, one capable of
being independently verified.[16]

The virtue of the method is that it is an attributive measure applicable
to utterance of all kinds irrespective of date of composition or genre, whether
in speech or writing, and has been widely tested. Attribution studies have
been made on a short story newly found and attributed (initially) to D. H.
Lawrence; on new essays by Henry Fielding; on the disputed Famine Diary
by Gerald Keegan, purported to have been written on board a famine ship
in 1845; and most recently on a newly found poem, "The Barberry Tree", the
conclusion of joint authorship by Coleridge and Wordsworth being precisely
identical to that arrived at by literary scholarship. It has been used on the
utterance of children, including a study of Helen Keller from her first year of
language-use at the age of eight.[17]

It works for me and will work for you too (unless you prove to be the
very first exception to its application). As a forensic tool, cusum analysis has
been used in cases in England and Ireland, including the highest courts—the
Appeal Courts in London and Dublin, the Central Criminal Court, Dublin,
and the Old Bailey.[18] So how does it work? Basic to the method is that a sophisticated
analysis of language for attribution purposes must be based on
regular and recurrent usage which is very frequent and of which the user is
unconscious. What has always been needed is a method simple in principle
and reliable in results, and, in cusum analysis, such a method has become
available. The identification of authorship has been found to lie mainly in
the small function words, usually of two, three or four letters, with which
sentences are structured. Obviously, spelling is conventional and has differed
over time, but the syntactic features analysed remain remarkably stable.
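
The passage does not spell out the calculation itself. Purely as an illustration, the sketch below computes a generic cumulative sum (a running total of deviations from the mean) over two per-sentence series: the number of words, and the number of two-, three- and four-letter words. The choice of these two series, the crude sentence splitter, the sample text and all function names are assumptions made for this example; it is not presented as Morton's published procedure.

```python
import re
from itertools import accumulate

def sentences(text):
    """Split text on ., ! or ? (a crude rule, assumed for illustration)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def cusum(values):
    """Running total of each value's deviation from the mean of the series."""
    if not values:
        return []
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

def sentence_counts(text, lengths=(2, 3, 4)):
    """Per sentence: (total words, words whose letter count is in `lengths`)."""
    rows = []
    for s in sentences(text):
        words = re.findall(r"[A-Za-z']+", s)
        rows.append((len(words), sum(1 for w in words if len(w) in lengths)))
    return rows

# Hypothetical sample text, not taken from any study cited here.
sample = ("The cat sat on the mat. "
          "It was a very old and rather grumpy animal. "
          "Nobody minded.")
rows = sentence_counts(sample)
length_cusum = cusum([n for n, _ in rows])  # series 1: sentence length
short_cusum = cusum([k for _, k in rows])   # series 2: short-word count
print(rows)
print(length_cusum)
print(short_cusum)
```

Because each series is measured against its own mean, every cusum ends at (approximately) zero; what matters for comparison is the shape of the path taken to get there.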

In trying to understand why cusum analysis "works" with function words,
it is reasonable to think about an individual's total vocabulary. As any child's
reading scheme will confirm, this may be divided into sections for frequency
of usage. Twenty-five per cent of the time, language use in English consists
of the repetition of very few words. One common scheme gives a mere twelve
words (a, and, he, I, in, is, it, of, that, the, to, was), followed by another 20
words, which together account for up to 35 per cent of normal utterance.
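
The proportion claimed for the twelve words can be checked against any text. The sketch below is a minimal illustration, assuming a simple lower-cased tokeniser; the function names and the sample sentence are hypothetical, and a single sentence is of course far too short to test the 25 per cent figure.

```python
import re

# The twelve words quoted in the text above.
TWELVE_WORDS = {"a", "and", "he", "i", "in", "is", "it", "of",
                "that", "the", "to", "was"}

def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(text, words=TWELVE_WORDS):
    """Fraction of the tokens in `text` that belong to `words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in words) / len(tokens)

# Hypothetical sample; a real check would use a long stretch of text.
sample = "It was the best of times, it was the worst of times."
print(f"{coverage(sample):.0%} of tokens are among the twelve words")
```

Passing a different set as `words` (for instance the twelve words plus the next twenty in a given scheme) gives the corresponding cumulative proportion.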

It may be asked how reliable such lists of "most frequent" words may be.
Concordances of authors usually show that these twelve words appear at, or
near, the top of the most-frequent-words lists. A comparison of the lists for
Henry Fielding's novel Joseph Andrews (written in 1741) and for Dylan
Thomas's Collected Poems (1950) yields a remarkable similarity. These two
authors were writing very different kinds of work, novel and modern poetry, separated
by two hundred and fifty years of English language usage;
yet both revealed high-frequency lists of near-identical words: the, and, of, in,
I, a, to, you, my, is, that, he were among their top twelve. Compare the twelve
most frequently used words in Shakespeare's total corpus: the, and, I, to, of,
a, you, my, that, in, is, not. The overlap is obvious.
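
The extent of the overlap can be counted directly from the two lists as quoted. In the sketch below the lists are copied from the passage (the first being the single list given for Fielding and Thomas together); the function name is hypothetical and rank order is ignored.

```python
# Top-twelve lists exactly as quoted in the passage above.
FIELDING_THOMAS = ["the", "and", "of", "in", "I", "a",
                   "to", "you", "my", "is", "that", "he"]
SHAKESPEARE = ["the", "and", "I", "to", "of", "a",
               "you", "my", "that", "in", "is", "not"]

def overlap(list_a, list_b):
    """Return (shared words, only in list_a, only in list_b), ignoring rank."""
    a, b = set(list_a), set(list_b)
    return sorted(a & b, key=str.lower), sorted(a - b), sorted(b - a)

shared, only_first, only_second = overlap(FIELDING_THOMAS, SHAKESPEARE)
print(len(shared), "shared;", only_first, "vs", only_second)
# -> 11 shared; ['he'] vs ['not']
```

Eleven of the twelve words are common to both lists; only "he" and "not" differ.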

This surely confirms the usefulness of using these vocabulary items for
recognising authorship. Half the time we speak, we are using over and over
again the same function words; two-thirds of the time we speak, we are using
a total of about two hundred and fifty words, which constitute only a tiny
proportion of our total vocabulary. From the remaining thousands of words,
we select those we need to convey content and
meaning, or semantics.

 
[14]

See Jill M. Farringdon et al., Analysing for Authorship (Cardiff: Univ. of Wales
Press, 1996), and a brief introduction at <http://members.aol.com.qsums>.

[15]

Jillian Farringdon and Michael Farringdon, "Literature and Computers", Poetry
Wales, 17.1 (Summer, 1981), 53-60.

[16]

Morton's purpose was to investigate the authorship of the New Testament, a task
which he has now completed (see The Making of Mark [Lewiston, NY: Mellen Press, 1996]
and The Gathering of the Gospels [Lewiston, NY: Mellen Press, 1997]).

[17]

See John Worthen, The Gang (New Haven: Yale Univ. Press, 2001), for attribution
of "The Barberry Tree"; and Farringdon et al., Analysing, for the other studies mentioned
here.

[18]

See Farringdon et al., Analysing, Chapters 8 and 9.