University of Virginia Library

Some Modest Proposals

As I have already suggested, any quantitative attribution method that purports
to be valid must define terms precisely and use statistical concepts in an appropri-
ate manner. It should also offer a theoretical justification. But such conditions are
not sufficient. Proponents of quantitative methods must also rigorously attempt
to falsify their theories. At the hypothetical stage of testing, the question should
not be "does it work?" but rather "does it resist all reasonable attempts to make
it fail?" And so I raise these final points not in relation to QSUM—which can-
not be rescued—but in regard to other quantitative approaches that have been
offered and will likely continue to be offered.[15]

One simple falsification test would work as follows. Take the original textual
data set of homogeneous writing, omit a portion from that data set (perhaps an
entire work), then test the omitted portion against the remaining data set. Repeat
this test using a different omitted portion and repeat again and again until every
portion of the data set has been tested. Since this description might be confusing, I
offer this concrete example. Start with the complete plays of Shakespeare, exclud-
ing those plays believed to have been written in collaboration. Process these plays
as necessary, and then remove Romeo and Juliet from this data set. Now test Romeo
and Juliet against the data set of fully Shakespearean plays (excluding Romeo and
Juliet). Does the method correctly identify Romeo and Juliet as Shakespeare's? Re-insert
Romeo and Juliet into the data set, and remove a different play to test. Does
the method work for this play? If, after frequent attempts, the method never fails,
then one has a plausible hypothesis worth examining. This falsification test has the
added benefit of determining whether or not a particular method can fail to distin-
guish between works by the same author that employ radically different styles.
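
The hold-one-out procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name leave_one_out_failures, the toy_method stand-in (which compares the rate of the word "the"), the 0.25 threshold, and the three miniature "works" are all invented for the example and do not correspond to any published attribution method or to real texts.

```python
from typing import Callable, Dict, List

# Assumed interface: an attribution method receives the held-out text and the
# remaining reference texts, and returns True if it judges them same-author.
AttributionMethod = Callable[[str, List[str]], bool]


def leave_one_out_failures(works: Dict[str, str],
                           method: AttributionMethod) -> List[str]:
    """Return the titles of held-out works the method fails to attribute."""
    failures = []
    for title, text in works.items():
        # Omit one work; every other work forms the reference data set.
        reference = [t for other, t in works.items() if other != title]
        if not method(text, reference):
            failures.append(title)
    return failures


def toy_method(text: str, reference: List[str]) -> bool:
    """Hypothetical stand-in for a real method: compare the rate of 'the'."""
    def rate(s: str) -> float:
        words = s.lower().split()
        return words.count("the") / len(words)

    pooled = " ".join(reference)
    # The 0.25 threshold is arbitrary and chosen only for this illustration.
    return abs(rate(text) - rate(pooled)) < 0.25


# Invented miniature "works" by a single notional author.
works = {
    "work_a": "the cat and the dog saw the bird",
    "work_b": "the king and the queen met the fool",
    "work_c": "love is not love which alters when it alteration finds",
}

failures = leave_one_out_failures(works, toy_method)
print(failures)
```

Any title in the returned list is a case in which the method failed on a work already accepted as the author's, which is exactly the kind of failure the test is designed to expose. An empty list after every work has been held out is the minimum required before the method deserves further examination.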



But before going further, one should allow others to verify these falsification
attempts and possibly run further tests. The best way to do this, it seems to me,
would be to make available the raw texts as well as the data and formulas on a
publicly accessible website. Then others could download the tests and examine
the information for themselves.

Proponents of these kinds of methods should also provide their history of
failed attempts. There is no shame in conceiving a method and then finding out
on one's own that it does not work. A description of the process of trial and er-
ror should actually increase the reader's confidence in the author's drive to find
a reliable, objective method.

All methods have limitations, and proponents of attribution methods should
acknowledge and describe those limitations. For example: "I have successfully
tested this method on journalistic prose during the first half of the eighteenth
century, but have found it to be less reliable for poetry or verse drama." Genre
and time period are obvious limitations, but there are undoubtedly others. Pro-
ponents should go further by describing the size of the samples tested. More
importantly, what are the measurements of reliability? Is this a claim of 99% or
95% certainty? One benefit of the chi-squared test I mentioned earlier is that it
can calculate its degree of certainty with precision.
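
As a rough illustration of how such a figure can be reported, the sketch below computes a Pearson chi-squared statistic and its p-value for a 2x2 table of word counts. It is a generic textbook form of the test, not necessarily the version discussed earlier in this essay; the function name chi_squared_2x2 and the counts are invented for the example.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]].

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-squared distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: occurrences of one marker word versus all other words
# in two 1,000-word samples.
stat, p_value = chi_squared_2x2(30, 970, 60, 940)
print(f"chi-squared = {stat:.3f}, p = {p_value:.5f}")
```

A p-value below 0.05 or 0.01 corresponds to the conventional 95% or 99% thresholds. It measures how surprising the observed difference would be if both samples drew on the same underlying rate of use, so a proponent should state which threshold was chosen and why.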

Each method should include a statement of standards for treatment of orthog-
raphy (were texts modernized in any way? were spellings standardized?) as well as
attention to bibliographical matters (which editions were used and why?).[16]

Proponents of attribution methods must also distinguish between those that
reliably disprove authorship and those that assert it. One can easily imagine a
method that can show that a particular author did not write a particular work,
but cannot show with any degree of certainty that a different author did write
the work.

Finally, I would make a plea that proponents separate their arguments in
support of the method from its application in particular cases. That is, propo-
nents should first independently publish an account that describes the method
and its limitations as well as how it responds to known and accepted cases of
authorship. This step should give the scholarly community an opportunity to
examine the method and respond. Only after a reasonable time to allow for this
exchange should proponents begin to offer attributions (or de-attributions) of
particular examples. I hope that this delay would decrease irresponsible claims
of authorship.


[15] For a useful survey of other quantitative approaches and a judicious assessment of the
potential pitfalls and rewards of this burgeoning subfield, see Harold Love, Attributing Author-


[16] For a thorough discussion of these and related concerns, see Joseph Rudman, "Un-
editing, De-Editing, and Editing in Non-traditional Authorship Attribution Studies: With an
Emphasis on the Canon of Daniel Defoe," Papers of the Bibliographical Society of America 99 (2005):