University of Virginia Library

The Cusum Technique [14]

This would be a short-lived joy since cusum analysis could actually be carried out—if more slowly—using an abacus, or pencil and paper, instead of a computer. Nevertheless, as one who is literary-critical by both training and inclination, the present writer has a large measure of sympathy with the dilemma of literary sceptics towards numerical studies, although that sympathy may be qualified by the observation that the computer is only a mechanical aid, and that there can be no objection to counting as such. [15] Consider discussion of the "number" of questions in Macbeth or the "preponderance" of disease imagery in Hamlet.

Cusum analysis has been used since 1990, both for studies in literary attribution and also in a forensic setting. It is the invention of A. Q. Morton, Fellow of the Royal Society of Edinburgh and a retired Minister in the Church of Scotland, whose life-time research has been devoted to developing an objective, scientific method of authorship attribution, one capable of being independently verified.[16]

The virtue of the method is that it is an attributive measure applicable to utterance of all kinds, whether in speech or writing, irrespective of date of composition or genre, and that it has been widely tested. Attribution studies have been made on a newly found short story attributed (initially) to D. H. Lawrence; on new essays by Henry Fielding; on the disputed Famine Diary by Gerald Keegan, purported to have been written on board a famine ship in 1845; and most recently on a newly found poem, "The Barberry Tree", where the conclusion of joint authorship by Coleridge and Wordsworth was precisely identical to that arrived at by literary scholarship. It has been used on the utterance of children, including a study of Helen Keller from her first year of language-use at the age of eight.[17] It has also been used in the courts, including the highest courts: the Appeal Courts in London and Dublin, the Central Criminal Court, Dublin, and the Old Bailey.[18]

So how does it work? Basic to the method is that a sophisticated analysis of language for attribution purposes must be based on regular and recurrent usage which is very frequent while also being unconscious to the user. What has always been needed is a method simple in principle and reliable in results, and, in cusum analysis, such a method has become available. The identification of authorship has been found to lie mainly in the small function words, usually of two, three or four letters, with which sentences are structured. Obviously, spelling is conventional and has differed over time, but the syntactic features analysed remain remarkably stable.

In trying to understand why cusum analysis "works" with function words, it is reasonable to think about an individual's total vocabulary. As any child's reading scheme will confirm, this may be divided into sections by frequency of usage. Twenty-five per cent of the time, language use in English consists of the repetition of very few words. One common scheme gives a mere twelve words (a, and, he, I, in, is, it, of, that, the, to, was) followed by another twenty words for up to 35 per cent of normal utterance.
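The proportion described above can be checked on any text by counting how many of its word tokens fall within the twelve-word list. The following is a minimal sketch, not part of the cusum method itself: the function names `tokenize` and `coverage` are hypothetical, the tokenising rule (lower-cased alphabetic words, with an optional internal apostrophe) is an assumption, and the sample sentence is invented for illustration rather than drawn from any study.

```python
import re

# The twelve words quoted from the reading scheme in the text (lower-cased).
TWELVE_WORDS = {"a", "and", "he", "i", "in", "is", "it", "of", "that", "the", "to", "was"}


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens.

    Assumption: a word is a run of letters, optionally followed by an
    apostrophe and more letters (so "cat's" stays one token).
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(text, word_list=TWELVE_WORDS):
    """Return the fraction of tokens in `text` that belong to `word_list`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


# Invented sample sentence, for illustration only.
sample = "It was the best of times and he said that it is in the nature of a man to hope"
print(f"{coverage(sample):.0%} of the tokens are among the twelve words")
```

A single short sentence will not reproduce the twenty-five per cent figure; the proportion only becomes meaningful over a long stretch of running text.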

It may be asked how reliable such lists of "most frequent" words may be. Concordances of authors usually show that these twelve words appear at or near the top of the most-frequent-words lists. A comparison of the lists for Henry Fielding's novel Joseph Andrews (written in 1741) and for Dylan Thomas's Collected Poems (1950) yields a remarkable similarity. These two authors were writing very different kinds of work (novel and modern poetry), separated by more than two hundred years of English language usage; yet both revealed high-frequency lists of near identical words: the, and, of, in, I, a, to, you, my, is, that, he were among their top twelve. Compare the twelve most frequently used words in Shakespeare's total corpus: the, and, I, to, of, a, you, my, that, in, is, not. The overlap is obvious.
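The overlap between the two lists quoted above can be made explicit with a simple set comparison. This is a minimal sketch using only the words given in the text, lower-cased for comparison; the function name `overlap` and the variable names are hypothetical.

```python
# Top-twelve lists exactly as quoted in the text (lower-cased for comparison).
FIELDING_THOMAS = ["the", "and", "of", "in", "i", "a", "to", "you", "my", "is", "that", "he"]
SHAKESPEARE = ["the", "and", "i", "to", "of", "a", "you", "my", "that", "in", "is", "not"]


def overlap(list_a, list_b):
    """Return (shared, only_in_a, only_in_b) as alphabetically sorted lists."""
    set_a, set_b = set(list_a), set(list_b)
    shared = sorted(set_a & set_b)
    only_a = sorted(set_a - set_b)
    only_b = sorted(set_b - set_a)
    return shared, only_a, only_b


shared, only_a, only_b = overlap(FIELDING_THOMAS, SHAKESPEARE)
print(len(shared), "shared words:", ", ".join(shared))
print("only in the Fielding and Thomas list:", ", ".join(only_a))
print("only in the Shakespeare list:", ", ".join(only_b))
```

On the lists as quoted, eleven of the twelve words are shared; only "he" and "not" differ.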

This surely confirms the usefulness of these vocabulary items for recognising authorship. Half the time we speak, we are using over and over again the same function words; two-thirds of the time we speak, we are using a total of about two hundred and fifty words, which constitute only a tiny proportion of our total vocabulary. From the remaining thousands of words, we select those we need to convey content and meaning, or semantics.