
Archival Standards in Documentary Editing
by
Ronald J. Zboray

Documentary editors seldom take note of archival practice, though they have much in common with archivists. Both must confront the formidable task of controlling, in one way or another, thousands upon thousands of documents. Authorship and dates of documents must be ascertained, documents have to be filed in some sort of chronological order, and indexes (on cards or in computer data bases) must be constructed to provide access to the "collection."

One would think then that editors and archivists would hold in common some general approaches and principles, but this has not happened. For example, although Mary Jo Kline in her A Guide to Documentary Editing advises in her chapter on document control: "if the editor is not an archivist or rare books librarian by training, he must learn the rudiments of these professions," she fails to mention such commonplace terms in the archival world as Anglo-American Cataloguing Rules, MARC AMC, or "authority work." But then again, Dr. Kline, through her methodology of finding out how various documentary projects operate, merely reflects the standard practice of editors.[1]

The different goals of editors and archivists to some extent account for the distance separating their practices. Editors must control individual documents rarely numbering beyond the thousands, while most archivists must deal with collections that often consist of millions of documents. The few archivists who control individual documents are sometimes deemed by the profession, not entirely charitably, "antiquarians." Editors, as their name implies, have always been guided by the need to produce an edition and not so directly by the archivist's goal of preserving manuscripts and rare printed materials from the vicissitudes of time. Editors aim to disseminate their work to a much wider audience than the relatively small number of hardy researchers who besiege archivists in person each year.

Ironically, archivists themselves for a long time followed few common standards. Archivists generally observed the dictum to "let the nature of the collection determine the principles used to organize it." Though the resulting, often surreal bevy of organizational strategies frustrates many researchers today, these strategies have by and large worked well enough and have often graced the task of archival research with a certain quaint charm. On the other hand, the archivist's cherished principle of respect des fonds has often preserved the original order in which documents were generated, saving them from idiosyncratic schemes of organization imposed upon collections by well-meaning but often misguided collection managers.

While the arrival of the age of electronic information storage and retrieval has not changed the way archivists organize their collections, it has dramatically altered how they store information about these holdings. The format of such information must be standardized in order to permit easy retrieval and, eventually, storage on-line in a national data base of manuscript holdings. Since the early 1980's such standardization has increasingly drawn the attention of archivists, swelling the numbers at Society of American Archivists sessions devoted to computerization.[2]

Standardization problems in automation have come slightly later and with less force to documentary editing. Most projects have used the computer for word processing rather than for controlling documents. A study funded by the Office of Scholarly Communication and Technology found that as late as 1986 only 17 of 53 humanities projects funded by the National Endowment for the Humanities used computers to create data bases. Yet documentary editors have been publicly speaking out for the "possibility of pooling information from editorial projects in data bases" since as early as 1981. In time, as scholarly data bases become more widely available and CD-ROM storage gains acceptance, centralized data bases for documentary editions will become feasible. But only those documentary editing projects which have taken the time to address the issue of archival standardization will be able to contribute to such data bases. Ironically, projects without computers which have adopted standard cataloging rules will be more likely to go on-line (thanks to the rising generation of hand-held scanners) than will many of the projects with computerized data bases in incompatible formats.[3]

In short, documentary editors should begin now to evaluate adopting archival standards to assure that their work will be able to meet tomorrow's demand for computerized access to information. As a first step in this analysis, four central topics of archival standardization will be discussed here: cataloging rules, the MARC AMC format, authority work, and subject indexing. The applicability of these standards to the peculiar needs of documentary editing will be addressed, based upon the experiences of the Emma Goldman Papers, a project which has striven to adhere to at least some of these standard practices.[4]

Cataloging Rules

Archivists and other librarians now almost universally catalogue materials according to the second edition of the Anglo-American Cataloguing Rules (AACR2). These rules grew out of a cooperative venture on the part of the American Library Association and the British Library Association to create a standard format, an effort which produced the first edition in 1967. In keeping with the library orientation of its parent bodies, AACR1 gave short shrift to nonbook materials, and the British and American editions differed. To make matters worse, two years after AACR1, the International Meeting of Cataloguing Experts met in Copenhagen to create a new international standard: the International Standard Bibliographic Description (ISBD). After many meetings and negotiations, and with the addition of the Canadian Committee on Cataloguing, the Library Association, and the Library of Congress as sponsors, the greatly revised and expanded second edition appeared in identical British and American editions in 1978.[5]

AACR2 requires a great deal of study before a prospective cataloguer may implement the rules. The rules' difficult organization contributes to the steep learning curve. The first chapter of the book sets out in forty extremely dense pages "General Rules for Description." With these rules in mind, the cataloguer must then consult one of twelve other chapters for specific material-based formats ranging from the expected "Books, Pamphlets, and Printed Sheets" and "Manuscripts" to "Machine-Readable Data Files" and "Three Dimensional Artefacts and Realia." The formats comprise the first part of the book; "Headings, Uniform Titles, and References" concludes it. This final section must be consulted for proper forms of a wide variety of personal, geographic, and corporate names. Cataloguers unfamiliar with the formats find themselves constantly leaping about from the general rules to the specific and thence to the prescriptions for name forms.[6]

AACR2 confronts the user with an unfamiliar, sometimes exotic technical vocabulary. For example, instead of the familiar term "author," AACR2 uses the unwieldy but perhaps more precise "statement of responsibility." For articles or other materials appearing in larger printed formats, the cataloguer must consult a section entitled "'In' Analytics" under a chapter entitled "Analysis." If two authors are responsible for the work, the first author is indexed under the category of "Main Entry" and the other under "Added Entry."

The formats themselves may confuse first-time users. For example, an entry for a piece of correspondence contains, in order, the type of document (e.g., letter, telegram), the date composed (in year-month-day format), the place of composition, the name of the recipient, the recipient's location, a slash, the author's name, a dash, the number of pages, a semi-colon, the size, another dash, and then the physical description (e.g., typescript or holograph). A typical entry might read, with editorial interpolations (brackets): "[Letter, 1931?] Dec. 20 [Paris, to] Emma [Goldman, St. Tropez?] / Peggy [Guggenheim]. — 2 p. ; 30 x 21 cm. — Typescript signed (photocopy)." For those unfamiliar with the format, it takes a while to hunt down the author and to sort out the place of writing from the destination of the correspondence. The elimination of back-to-back bracketing may easily lead the reader to suspect that "Goldman" is a sub-unit of the municipality of St. Tropez. And it takes a long time to become accustomed to leaving spaces before colons and semi-colons or to using the period, space, dash, space (i.e., ". — ") as a subsection delimiter.
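
To make the element order concrete, here is a minimal sketch, in Python, of how such an entry might be assembled from separately stored fields. The function and field names are hypothetical, not part of AACR2 or of any project's actual software, and an ASCII "--" stands in for the printed dash.

```python
# Hypothetical illustration of the AACR2 correspondence entry order:
# type and date, place of writing, "to" plus recipient and location,
# a slash before the author, then ". -- " between the remaining areas.

def letter_entry(date, place, recipient, rec_place, author,
                 pages, size, description):
    return (f"Letter, {date} {place}, to {recipient}, {rec_place} "
            f"/ {author}. -- {pages} p. ; {size}. -- {description}")

print(letter_entry("1931 Dec. 20", "Paris", "Emma Goldman", "St. Tropez",
                   "Peggy Guggenheim", 2, "30 x 21 cm", "Typescript signed"))
# Letter, 1931 Dec. 20 Paris, to Emma Goldman, St. Tropez
# / Peggy Guggenheim. -- 2 p. ; 30 x 21 cm. -- Typescript signed
```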

Formats differ from one another in the order in which data elements appear. For example, in the case of correspondence, the date of composition follows the first element in the entry, typically: "Letter, 1899 March 30. . . ." However, for articles, the date appears at the very end, as in the case of: "The White Slave Traffic / Emma Goldman.—p. 344-351; 19.5 cm. In Mother Earth. — Vol. 4, no. 11 (Jan. 1910)." Note, too, that "p." here stands for the inclusive pages in the parent imprint and not for the number of pages as was the case for correspondence. And only the height of the document is given, whereas the manuscript format prescribes height and width.

Data element order differs even within the same format, as in the case of manuscript speeches. A typical entry would read: "[Speech, before A Mass Meeting to Promote Local, State, and Public Works for the Unemployed, Union Square, New York] / Emma Goldman. — [1893 Aug. 21]." Here, within the very manuscript format used for correspondence, the date comes after the author rather than before it. Great differences exist within the manuscript format between manuscript volumes, correspondence, speeches and sermons, legal documents, collections of manuscripts, and miscellaneous single manuscripts.

The labyrinthine rules of AACR2 certainly would seem to discourage editors attempting to adopt them, not to mention their processing staffs, if they have them. The Goldman Papers, for example, at first tried to encourage everyone to become familiar with the AACR2 manual; with time it became clear that the manual daunted even the most adventurous of the staff. The staff, instead, streamed into the microfilm editor's office, presumably thinking that he had the final word on the manual. It soon became clear that a usable interpretation of AACR2 was necessary. Some of the more common AACR2 formats were written directly into the project's data entry program. The microfilm editor wrote others of the more exotic rules into the project's data processing manual and finally became resigned to acting as the final authority for interpreting them.

In short, because of the complexity of AACR2—much greater than MLA or Chicago style manuals—documentary editors will probably only be able to adopt the rules if they can boil them down for their own purposes. Because editions usually have so many documents in the same format, this should not present very many problems. But, of course, nearly every project must deal with stray documents in highly unusual formats. To meet this eventuality, someone in the office will have to specialize in knowing the AACR2 manual, or at least be familiar enough with it to solve expeditiously any problems that may come up.

One may well ask after this discussion: is it worth the time and effort to use AACR2? Certainly operating under the rules takes much more thought and care than working with other style manuals; editors will have to spend at least twenty-five per cent more time entering data or filling out cards. Those projects preparing printed editions will also face the formidable task of getting AACR2 material into either the MLA or Chicago footnote and bibliography forms. Nevertheless, AACR2 promises to give projects the chance to put their material on-line in some future national data base. The point of most editions has been to create a definitive work that will last for years to come. AACR2 ensures that not only the edition but also the formidable work that goes into collecting manuscripts from a wide variety of sources will be preserved.[7]

MARC AMC

The rationale for using AACR2 depends upon the ultimate ability to put records on-line in a computerized data base. The MARC AMC format provides the bridge between AACR2 (conceived of as a manual format), the archivist's or editing project's own computer data base, and nationwide networks of data bases.

The acronym "MARC AMC" stands for Machine-Readable Cataloging, Archival and Manuscripts Control format. The AMC format represents only one of many formats that grew out of the Library of Congress's attempt to automate its substantial holdings. When the Library of Congress published its first format for manuscripts in 1973, archivists complained that it worked best only for item-by-item description and, in general, refused to adopt the format. Archivists experimented with several other systems during the 1970's, but their implementations differed so much as to make it impossible to create a network for sharing data. To address this problem, the Society of American Archivists (SAA) brought together in 1977 a National Information Systems Task Force, which recommended that a greatly modified form of MARC, the AMC format, be put forth as the standard for the archival community. To ensure that librarians alone would not determine the standard, the SAA's Committee on Archival Information Exchange (CAIE) would help maintain it with the American Library Association's Committee on the Representation in Machine-Readable Form of Bibliographic Information.

Editors and archivists generally need several works in order to understand and use the MARC AMC format. The Library of Congress's own MARC Formats for Bibliographic Data Update 10 (Library of Congress, 1984) formally presents the AMC format, although the manual contains several errors, corrected in Update 11 (1985). A more useful exposition of the format can be found in Nancy Sahli's MARC for Archives and Manuscripts: The AMC Format (Chicago: Society of American Archivists, 1985). Neither work can be truly comprehended without consulting the examples given by Max J. Evans and Lisa Weber in MARC for Archives and Manuscripts: A Compendium of Practice (The State Historical Society of Wisconsin, 1985). Other works which should be consulted include Steven L. Hensen's Archives, Personal Papers, and Manuscripts (Library of Congress, 1983), the Research Libraries Group's AMC Field Guide (Stanford, Cal.: Research Libraries Group, 1983), and Walt Crawford's MARC for Library Use: Understanding the USMARC Formats (White Plains, N.Y. and London: Knowledge Industry Publications, 1984).[8]

That so many works are necessary for using the format suggests its complexity. The basic unit of the format is, as anyone familiar with data base management programs will recognize, the "field." Computers recognize fields as containers for specific types of descriptive information (e.g., title, author, or date). Fields can be "fixed," meaning that they will accept a pre-determined number of characters, or they can be "variable." In MARC AMC, fields can also be divided into subfields, which are indicated within the field by the use of a dollar sign '$' and a lower-case alphabetical character whose meaning changes depending upon the field. These characters are called "subfield delimiters." MARC prescribes that each field have a descriptive title and a numbered "tag" to identify the field to the computer. All fields taken together for each document described constitute a "record."
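
A minimal sketch may clarify the terminology. The example below renders one variable field as a tag plus subfield delimiters; the 100 (personal name) and 245 (title) tags follow common MARC usage, but the layout is an illustration, not the full format.

```python
# Render a MARC-style variable field: a numeric tag identifying the field,
# followed by subfields, each introduced by "$" and a one-letter code.

def serialize_field(tag, subfields):
    return f"{tag} " + "".join(f"${code}{value}" for code, value in subfields)

# Personal name field: $a carries the name, $d the dates.
print(serialize_field("100", [("a", "Goldman, Emma,"), ("d", "1869-1940.")]))
# 100 $aGoldman, Emma,$d1869-1940.

# A "record" is simply all such fields for one document taken together.
record = [serialize_field("100", [("a", "Goldman, Emma,")]),
          serialize_field("245", [("a", "Living my life")])]
```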

Each record in MARC AMC has 77 variable data fields which in total provide the most in-depth description that archivists think they will need for years to come. Whereas AACR2 is best conceived of as a form for generating library index cards, MARC AMC has been developed with an eye to eventual computer manipulation of data. A good deal of duplication of data results from this goal. For example, the date and place of publication of a document must be recorded in separate fixed fields and together within the variable-length title field. The need for data manipulation and searching also means that throughout the format extensive codes are used to represent often complex descriptions. In the case of the physical description fixed field (Tag 007), "hdrbfb016baca" translates into:

  • h = microform
  • d = microfilm reel
  • r = reproduction
  • b = negative
  • f = 35 mm. microfilm
  • b = normal reduction
  • 016 = 16:1 reduction ratio
  • b = monochrome
  • a = silver halide
  • c = service copy
  • a = safety base
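
Because the meaning of each character depends purely on its position, such a code can be unpacked mechanically. The sketch below is a hypothetical illustration (with the value tables abbreviated to the codes in this one example) of decoding the string by position:

```python
# Positions in the microform physical description field, with the width
# of each element; the reduction ratio occupies three characters.
POSITIONS = [
    ("category of material", 1),       # h = microform
    ("specific material", 1),          # d = microfilm reel
    ("original vs. reproduction", 1),  # r = reproduction
    ("polarity", 1),                   # b = negative
    ("dimensions", 1),                 # f = 35 mm. microfilm
    ("reduction range", 1),            # b = normal reduction
    ("reduction ratio", 3),            # 016 = 16:1
    ("color", 1),                      # b = monochrome
    ("emulsion", 1),                   # a = silver halide
    ("generation", 1),                 # c = service copy
    ("base of film", 1),               # a = safety base
]

def decode_007(value):
    decoded, i = {}, 0
    for label, width in POSITIONS:
        decoded[label] = value[i:i + width]
        i += width
    return decoded

print(decode_007("hdrbfb016baca"))
```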

Needless to say, the MARC AMC format confronts first-time users with a seemingly insurmountable maze of codes, fields, and subfields. The archival profession itself has generally trembled with trepidation at adopting the format since its release, and the ability to deal with MARC AMC has become a much-coveted skill for job applicants in the field.[9] Many major research libraries have effected only a very minimal interpretation of the format, due to the considerable cost and skill necessary to create full-blown MARC records.

Editors attempting to implement MARC AMC on microcomputers face additional problems. MARC records with their 77 fields are lengthy, and editors catalog on a document-by-document basis, meaning that full implementation of the format requires a great deal of hard disk storage. Versions of DOS before 4.0, the most common operating systems for microcomputers, cannot address a single data base file on a disk partition greater than 32 megabytes. Most projects intending to adopt MARC would probably need to jump over the 32-megabyte limit by using DOS 4.0 or some other specialized software.
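
A back-of-the-envelope calculation (all figures hypothetical) shows how quickly document-by-document MARC records outgrow that ceiling:

```python
documents = 20_000          # a mid-sized documentary edition
bytes_per_record = 2_048    # assume roughly 2K per full MARC AMC record
total_mb = documents * bytes_per_record / (1024 * 1024)
print(f"{total_mb:.0f} MB needed, against a 32 MB DOS limit")  # 39 MB
```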

Software for getting information stored into a MARC format also will present problems for editors. Most libraries committed to MARC use the programs offered by bibliographic utilities such as OCLC and RLIN for creating MARC records. Commercial data base packages can be adapted to the needs of MARC AMC only with great difficulty and considerable programming. The biggest stumbling blocks for these programs grow out of MARC's need for numerous variable-length fields. Most of the major commercial data management programs for microcomputers, with the exception of Revelation, operate on a fixed-field-length basis, meaning that the vast majority of disk space in any implementation of MARC will be wasted by unused portions of fixed fields.[10] A new program tailored specifically by archivists at Michigan State University for MARC AMC users has recently come on the market and promises to address some of these problems. Editors may also look into "turnkey" systems—combining hardware, software, and instructional and service support—such as those of Geac and OCLC's LS2000, although these systems may prove too expensive for the meager budgets of most projects.[11]

Once again, the question must be asked concerning MARC AMC, as it was for AACR2: is it worth it for editors? Until expert-type software (i.e., software in which the formatting rules are embedded within the program) becomes available, probably not. The format in its present form is simply too difficult and costly to implement and maintain, especially on a document-by-document basis. Editors with enough funds could hire a full-time cataloger with MARC AMC experience to oversee data entry. Nevertheless, the sheer volume of information each MARC AMC record contains dramatically increases the chances for errors and the time needed for proofreading. And quality control becomes hard to attain unless every processor is well versed in the rules of the format. In short, MARC AMC threatens to distract editors from their primary work, which is, after all, producing an edition, not creating a data base.

One of the advantages of electronic information storage, however, is that data can later be manipulated into different formats. At some future date when users may be able to access information about manuscripts through an on-line, national data base, documentary editors may be able to transfer their data bases to the MARC AMC format, if they take the proper steps now to insure compatibility.

The issue of compatibility should be addressed at the conceptual stage of data base design. Those who have designed data bases will immediately realize that MARC AMC, like AACR2, represents only a goal of information output and may not be the best way to store and retrieve information locally. For example, a local data base may need only one date field to feed, via programming, into the three or more fields in MARC AMC that may require date information. In particular, the AACR2 title field must be broken down into component parts; otherwise, for correspondence, programs will index all letters under the initial part of the field (i.e., "letter to" instead of the recipient's name). Even if the author's name is removed to a separate field, AACR2 demands a first-then-last name order, while MARC AMC and most data base management programs demand the reverse.
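
The design principle is to store each datum once, in its most manipulable form, and to derive the various output forms by program. A minimal sketch, with hypothetical field names:

```python
# One locally stored date and one pair of name parts feed several outputs.
from datetime import date

record = {"date": date(1899, 3, 30),
          "author_last": "Goldman", "author_first": "Emma"}

sort_key   = record["date"].isoformat()              # 1899-03-30, sortable
title_date = record["date"].strftime("%Y %b. %d")    # 1899 Mar. 30, display
index_name = f'{record["author_last"]}, {record["author_first"]}'  # inverted
aacr2_name = f'{record["author_first"]} {record["author_last"]}'   # direct

print(sort_key, "|", title_date, "|", index_name, "|", aacr2_name)
```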

Editors should not think that manipulating information electronically will easily solve the problems of transferring data between different formats. The previous case, for example, of changing the order of first and last names, requires a herculean amount of programming effort to account for all the idiosyncrasies of usage. At first glance, such manipulation would seem to depend merely upon the location of the comma in an inverted name form. But what about abbreviations, preceded by a comma, such as "Ltd."? Or names of business firms like the publishers "Little, Brown"? The cases multiply to the point where a program has to take into account literally hundreds of rules and thousands of exceptions. Some projects even enter the author's or recipient's name in inverted order with the last name in all capital letters, or, perhaps worse, titles in all capitals. The distribution of lower- and upper-case letters in a name or a title is something that, at this point, only human intelligence can determine, largely because humans decide, in not the most logical fashion, what their names and the titles of works will be. For example, how would a program know to capitalize the first "V" in "W. S. Van Valkenburgh" and not the same letter in "Henry van Dyke"? In short, editors should study AACR2 and the MARC AMC format very closely to make sure that their data bases will be able to output information in either form.
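
A small sketch illustrates how quickly the comma rule breaks down. The exception lists here are toy stand-ins for the hundreds of rules and thousands of exceptions just mentioned:

```python
FIRM_NAMES = {"Little, Brown"}             # commas that do not invert
SUFFIXES = {"Ltd.", "Inc.", "Jr.", "Sr."}  # abbreviations, not first names

def uninvert(name):
    """Turn 'Goldman, Emma' into 'Emma Goldman' -- when it is safe to."""
    if name in FIRM_NAMES or "," not in name:
        return name
    last, rest = (part.strip() for part in name.split(",", 1))
    if rest in SUFFIXES:
        return name                        # e.g., "Smith & Sons, Ltd."
    return f"{rest} {last}"

for n in ("Goldman, Emma", "Little, Brown", "Smith & Sons, Ltd.",
          "Van Valkenburgh, W. S.", "van Dyke, Henry"):
    print(uninvert(n))   # capitalization survives only if stored correctly
```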

The Goldman Papers' most recent revision of its data entry program, called with heavenward hopes "Ultimate Emma," has the ability to address both formats. Document processors enter data directly at one of six microcomputer work stations networked together and sharing distributed hard disk resources. Information such as date, place of publication or writing, destination (if correspondence), and recipient or title is entered in separate fields via a menu-driven system that records it in a form which allows for electronic manipulation and indexing. After this, the program assembles the various fields into a large, AACR2-compatible title field. The document processors are then given the option to enter editorial attributions (called in AACR2 parlance "interpolations") such as brackets and question marks—the things which, if they appeared in fields to be indexed, would destroy any semblance of order. For example, a bracketed last name beginning with "Z" would come before one beginning with "A." Through the design of the data base and through the manipulations of the computer program, the Emma Goldman Papers has thus shown that it is possible for editorial projects to enter and retrieve their data efficiently and in a form which can easily go on-line in a national MARC AMC-based data base.
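
The crucial trick is keeping two forms of each entry: a display form carrying the interpolations and a clean form for sorting and indexing. A minimal sketch of the stripping step (not the project's actual code):

```python
import re

def index_form(display_form):
    """Drop brackets and question marks so '[Goldman?]' sorts as 'Goldman'."""
    return re.sub(r"[\[\]?]", "", display_form).strip()

display = "[Letter, 1931?] Dec. 20 [Paris, to] Emma [Goldman, St. Tropez?]"
print(index_form(display))
# Letter, 1931 Dec. 20 Paris, to Emma Goldman, St. Tropez
```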

Authority Work

All editors already do in-office authority work when confronted with documents bearing variants of an author's name. Emma Goldman, for example, wrote letters under various pseudonyms such as E. G. Colton, E. G. Smith, Emma Clausen, and E. G. Brady. Obviously in all these cases, "Emma Goldman" would be the authorized name form under which the document would be indexed. But what of another famous case, her lover and lecture tour manager, Ben L. Reitman? Should his name appear in an index as "Benjamin Lewis Reitman", "Benjamin L. Reitman", "Ben Lewis Reitman", "Ben Reitman", or "Ben L. Reitman"? If the name is taken directly from documents, Reitman would appear as five different people in an index, something which would have amused him as a dedicated prankster and a sometime anarchist, but which would only irk researchers seeking him out.

Authority work within the archival world expands this type of authorized control of name forms and subject headings beyond the confines of the office to archives across the country. Only with knowledge of authorized names and subjects can users search and retrieve information from a national on-line data base. But to make the system workable, all the archives must use the same common name form as well, or else users will have to somehow know all the variants and search them out, too—a time-consuming proposition.

As with MARC AMC, the Library of Congress has, by virtue of its size and commitment to being the national library, taken the lead in developing authority control for national computer networks. In 1975 the National Commission on Libraries and Information Science funded a study by Lawrence Buckland, The Role of the Library of Congress in the Evolving National Network, which led to the establishment of the Network Development Office (NDO) at the Library of Congress. NDO faced the task of computerizing what had been a manual process. The Library of Congress had for years maintained, in card file form, a name authority system of which the most publicly visible portion was the name forms in the National Union Catalog. In 1974, the Library published some of this information in Name Headings With References. A batch automated system followed in 1977, which generated such things for distribution to libraries as authority cards and cumulative microfiche. Finally, in the early 1980's, through the efforts of NDO and the MARC Editorial Division, the authorities went on-line in their own MARC format.[12]

MARC Authorities do not hold much promise for documentary editors seeking proper name forms for their editions. The Library of Congress enters authority name forms only for material currently being processed; retrospective authority records are updated only as they are used. Thus a new work with a historical person's name as author or as a subject will have to be published and catalogued by the Library for the name form to be updated. Since so many of the names that come up in documentary editions belong to relatively obscure people, the chances are that those forms would not be contained in the MARC Authorities data base, and if they were, they would probably have an incorrect pre-AACR2 form.

Nevertheless, editors should attempt to do as much authority work using the national system as possible. This could be as simple as checking name forms against the National Union Catalog—though it must be remembered that these entries will probably be in pre-AACR2 form. The best approach would be for editors associated with a university to arrange for the campus library to allow them access to one of the bibliographic utilities that carry MARC Authorities, such as OCLC.

No matter in what way editors may undertake this authority work, they will find that the authorized name sometimes is not the best form, according to their much more informed editorial judgment. Catalogers at whatever institution creates the name forms very seldom have anything approaching the specialized knowledge of editors. For on-line searching, however, the name form itself matters less than the maintenance of consistency throughout the system. Of course, this means that editors may need to employ a two-track system for authority work: one track which looks outward to the national data base and another for use within the office and, ultimately, within the edition.

Obviously, authority work adds a good deal of time to cataloging. For example, descriptive catalogers at the Library of Congress spend, according to one estimate, half their time on name authority work.[13] Since editors deal with a much more finite universe of names, efforts at achieving proper name forms would probably add at most one-quarter more processing staff time. As with most cataloging, the error rate runs very high; according to one study, if names occur more than ten times in a data base, at least one error should be expected.[14]

The Goldman Papers approaches the issue of authority work in the following manner. All authors, recipients, and titles receive four-letter mnemonic codes. Names generally take, in order, the first two letters of the last name and the first two letters of the first (e.g., "goem" for Emma Goldman). Titles are coded using the first two letters of the first word and the first letters of the next two important words. Living My Life, for example, takes the code "liml." When entering a document for the first time, processors, instead of entering the full name, input the four-letter code. The program seeks the code in the project's authority data base. If the program finds the code, it retrieves the associated name and asks the processor to verify it; if the name is not the one intended, the program prompts for another code, and if it is, the program writes the authorized name form directly to the record. If the program does not find the code in the authority files, it gives the processor the opportunity to create a new entry in the name authorities file. Periodically, the microfilm editor reviews new entries for proper form.
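
In outline (with the authority "data base" reduced to a dictionary, and all names illustrative rather than the project's actual files), the loop looks something like this:

```python
authorities = {"goem": "Goldman, Emma", "rebe": "Reitman, Ben L."}

def name_code(last, first):
    """First two letters of the last name plus first two of the first."""
    return (last[:2] + first[:2]).lower()

def resolve(code):
    # Found: the processor verifies the name and it is written to the record.
    # Not found: the processor may create a new authority entry, which the
    # microfilm editor later reviews for proper form.
    return authorities.get(code)

assert name_code("Goldman", "Emma") == "goem"
print(resolve("goem"))    # Goldman, Emma
print(resolve("brxx"))    # None -> prompt to create a new entry
```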

The Goldman Papers system has several advantages for doing authority work. Since one entry creates all names in the main data base, variant forms do not occur. Eventually, the four-letter codes that appear in each record can be linked to any altered entries in the authority data base; hence if the forms the Goldman Papers uses differ from those of MARC Authorities, the MARC form can be entered in the authority data base to permit output in a form which allows for on-line searching.

Subject Assigning

Subject assigning constitutes, in actuality, a subset of authority work. The same need to maintain consistency over the forms of access points makes highly controlled subject coding crucial in an on-line environment.

Subject indexing differs from name authority work, however. Subject assigning must be, by its very nature, less exact than name authority work. A part of the lore of many editing projects concerns the time when editors and processors passed around a single document and no two of them assigned the same subjects to it. The story is probably apocryphal; editors grow so accustomed to working with an eye towards objectivity that they can easily forget that subject indexing represents a highly informed interpretation of the material. Indexers differ in their style almost as much as authors, which is not to say that poor indexing can be excused by an indexer's right to unbridled self-expression. Indexing can be an act of concentrated creativity and perspicacity or it can be a futile effort that ultimately turns away readers.[15]

Although the style of an index may vary, its quality depends in large part upon internal consistency. If indexing something with great relevance to the social history of the period, such as Samuel Sewall's diaries, one cannot create an entry for "beer" on one page and index "beer" under "alcoholic beverages" on another (if this is the only such beverage appearing on the page).[16] Readers should feel confident that when they look up an entry, they will find there all the relevant portions of the text, unless cross-referenced to another subject heading.

Electronic information storage and retrieval demands not only internal consistency within an index, but that the subject headings used conform in their form, meaning, and application to widely accepted standards. Thesauri of subject terms present editors and archivists with a means of achieving this external level of consistency.[17]

The Library of Congress's Subject Headings (LCSH) have become for the library and archival world the de facto thesaurus, though in reality those two large red books hardly deserve the term. A true thesaurus would allow users to follow, through cross-references, natural-language forms of an expression to the most technical. For example, someone who tries to use "street-walkers" will not be referred to "prostitution." In other words, one must in general know the proper authorized term before using LCSH. The form of LCSH also differs from that of a true thesaurus. A thesaurus like that of the Educational Resources Information Center gives broader, narrower, and related terms as well as extensive scope notes; LCSH renders only "see also" references and has very little advice on the way the terms should be used.[18]

Tomes can be written about the inadequacies of LCSH for the age of electronic information storage and retrieval. LCSH began as (and to a large extent remains) a manual thesaurus; as of mid-1987 it had still not gone on-line. One must thumb through thousands of pages in two unwieldy volumes to use the work rather than being able to pose simple queries to a data base. To the horror of experienced data base managers, the LCSH has been issued since 1908 in ten editions; the variations between editions—and they have been considerable—have not been cross-referenced or even formally tracked by the Library of Congress. To make matters worse, the headings used by LCSH bewilder specialists in several fields. Why would film historians ever look under LCSH's "moving-pictures" when they have probably never used the term in their entire scholarly careers? And LCSH manages to offend, in various ways, several special interest groups, especially minorities. The Subject Headings have been so insensitive to women's issues that "counter-thesauri" have been posed as correctives. Finally, LCSH does not provide the descriptive depth required by the type of document-by-document indexing that editors undertake; several editors have tried to use the subject headings and have given up in exasperation.[19]

Given all LCSH's problems, should editors avoid it completely? Probably not, since the Subject Headings remains the closest thing to a true, national thesaurus. If the federal government ever does realize the importance of such a thesaurus for the electronic information age and supplies funds for bringing together specialists in various fields, lexicographers, and librarians to develop a definitive thesaurus, undoubtedly the basis will be the system developed by the Library of Congress.

Editors can make use of LCSH if they use it only as a basis for subject assigning. For example, the Goldman Papers addressed the limitations of the Library of Congress system in two ways: by reshaping the original scope of the subject headings to include women's experiences and by creating new headings which are then cross-referenced back to the nearest Library of Congress term. The project has avoided working with the large volumes by pre-selecting the terms likely to be used and entering them into a thesaurus data base. Subject terms have been grouped beneath ten major headings that range from "Arts and Letters" to "Economics" and into over a hundred subgroups. For example, "Booktrade" appears beneath "Arts and Letters." Subject headings, displayed under their assigned group and subgroup, are frequently printed out to be used by staff members who process documents. Subject assigning follows a system similar to that for name authorities, in which four-letter mnemonic codes stand for headings (e.g., "aupu" for "Authors and Publishers"). The data entry program ensures that codes will not be duplicated and allows for the entry of new terms in the thesaurus data base with proper group and subgroup headings.[20]
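
A sketch of such a thesaurus data base, with group, subgroup, and a cross-reference to the nearest Library of Congress term (the entries beyond "aupu" and the duplicate check are illustrative assumptions, not the project's actual files):

```python
thesaurus = {
    "aupu": {"term": "Authors and Publishers", "group": "Arts and Letters",
             "subgroup": "Booktrade", "lc_nearest": "Authors and publishers"},
}

def add_term(code, term, group, subgroup, lc_nearest):
    if code in thesaurus:                  # the data entry program refuses
        raise ValueError(f"code {code!r} already assigned")
    thesaurus[code] = {"term": term, "group": group,
                       "subgroup": subgroup, "lc_nearest": lc_nearest}

add_term("wome", "Women's Experiences", "Social Conditions", "Women",
         "Women -- Social conditions")
print(sorted(thesaurus))                   # ['aupu', 'wome']
```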

With the new generation of microcomputers based upon the Intel 80386 microprocessor chip and the new operating system being jointly developed by Microsoft and IBM (OS/2) it will become feasible for projects to consult a subject thesaurus directly on the CRT screen in a manner similar to the various thesauri currently available for word processing programs. A pop-up menu would present various authorized options to a natural language subject term which can be entered electronically by merely hitting a number key. However, on today's IBM-compatible microcomputers based upon the 8088 and 80286 chips, such thesaurus authority work would be difficult to manage and would be prohibitively slow given the random access memory limits of these computers.

Conclusion

The above discussion has attempted to point out that documentary editors may be coming to an important juncture in their work. As on-line national library and archival data bases become a reality, editors will have to assess the role, if any, they will play in these networks. If editors and the agencies and foundations that fund them decide that part of the goal of making editions accessible to the public will be the ability to do on-line searching of documents published in documentary editions, then editorial projects will have to start thinking now about using the same standards archivists employ. Certainly it seems a waste of energy for any project controlling its documents with a data base management system to ignore the archival standards which could open that data base to a wide audience.

Yet documentary editors must be prepared to put a great deal of effort into adopting archival standards. AACR2 and the MARC AMC format must be mastered and incorporated into the project at the cost of increased complexity and, consequently, more time required to finish the project. That cost may not presently be worth it for editors, at least until expert programs have been written to ease the use of AACR2 and MARC AMC. Authority work for names and subjects poses further hindrances. Projects would have to add the search for authorized name forms onto their already burdened work load. For subjects, editors will have no choice but to use the Library of Congress's Subject Headings and to stumble along with often inappropriate headings. Nevertheless, editors can take comfort in that archivists will face many of the same problems. Out of those problems will grow new solutions in microcomputer software and on-line data base utilities. An argument thus can be made for editors waiting and letting archivists shoulder the burden of developing approaches to on-line access.

Waiting may be a mistake, however, since archivists and editors, though having much in common, have differing goals. Any national data base of manuscript holdings will probably not include the type of document-by-document description editions offer. Such description simply takes up too much information storage space to be kept permanently on-line.

If editors can manage to get onto a national data base or some other bibliographic utility, they would essentially have a new form of publication (the data base) and, hopefully, a new source of revenue. Libraries, for example, are often paid or given credits for the MARC records they create. Apart from monetary considerations, going on-line with indexed editions could conceivably attract far more users than books or microfilm. A user would need only a microcomputer with a modem to gain access to the data bases of editions. In a single search for a name the user would have contact with the work of several documentary editing projects. If documentary editors have done their authority work well, that search could yield records for hundreds of documents in a matter of seconds; if done manually, through library visits or interlibrary loans or printed catalog searching, the same search would take weeks if not months.

Before editors rush into the new millennium of "connectivity" (i.e., networked information storage and retrieval), they should realize at least one danger that lurks in the future. Documentary editing projects' creation of data bases with on-line capability may be seen by some funding agencies and foundations as a low-cost alternative to both microfilm and book publishing, in the same manner that microfilm has been argued as appropriate for editions of lesser figures in American history. Editors who decide to pursue connectivity should make it clear to all grantors that the data bases being constructed are meant only to increase access to microfilm and book editions—else these editors may become more like archivists than they ever intended.

With this caveat in mind, editors have one other reason, one less tangible than the ones presented above, for becoming familiar with archival standards. Learning the standards gives often-isolated editors a sense of community. The archivists and librarians who developed the standards faced many of the same problems as editors do in their day-to-day tasks. Seeing how the standards deal with problems gives an editor a shock of recognition, even if editors reject the standards themselves. The standards took many minds and much discussion to construct. Editors who familiarize themselves with the standards thus engage in a dialogue with this on-going process of definition within the archival profession. This dialogue can enrich the decision making of editors and ultimately contribute to their sense of professionalization.

Notes

 
[1]

Mary Jo Kline, A Guide to Documentary Editing (The Johns Hopkins University Press, 1987), 48.

[2]

For an argument for archival standardization see Lydia Lucas, "Efficient Finding Aids: Developing a System for Control of Archives and Manuscripts," American Archivist 44 (1981):24-25. The movement for a national data base of manuscript holdings began with the formation in 1977 of the National Information Systems Task Force. See Richard H. Lytle, "A National Information System for Archives and Manuscript Collections," American Archivist 43 (1980):423-426 and idem, "An Analysis of the Work of the National Information System Task Force," American Archivist 47 (Fall 1984):357-365.

[3]

Anne Jamieson Price, The Use of Computers in Humanities Research (Washington, D.C.: The Office of Scholarly Communication and Technology, 1986), 19-20; for the early discussion of common data bases for editorial projects, see the account of the general discussion following the final session of "Modern Technology and Historical Editing: National Historical Publications and Records Commission Word Processing Conference" (held in May 1981 in Philadelphia) presented in Kathleen Waldenfels, "'Goodbye Gutenberg,'" Newsletter of the Association for Documentary Editing 3 (1981):1-2. The DEST Handset Scanner, for example, can read documents letter by letter into different fields of a data base, thus permitting selective entry of information on the page instead of reading a whole page into a text file.

[4]

The Emma Goldman Papers, at the Institute for the Study of Social Change at the University of California at Berkeley, has been in existence since 1980 under the directorship of Dr. Candace Falk. With grants from the National Endowment for the Humanities and the National Historical Publications and Records Commission and several private foundations, the Goldman Papers will publish with Chadwyck-Healey, Inc. a comprehensive microfilm edition and four-volume index and guide of Goldman's writings and correspondence as well as government documents pertaining to her.

[5]

Michael Gorman and Paul W. Winkler, eds., Anglo-American Cataloguing Rules, 2nd ed. (American Library Association, 1978); Seymour Lubetzky and C. Sumner Spalding, Anglo-American Cataloging Rules (American Library Association, 1967).

[6]

For a wide range of examples of AACR2 formatting, see Florence A. Salinger and Eileen Zagon, Notes for Catalogers: A Sourcebook for Use with AACR2 (White Plains, N.Y.: Knowledge Industry Publications, Inc., 1985). A useful, systematic approach to formatting can be found in Malcolm Shaw and others, Using AACR2: A Diagrammatic Approach (Phoenix: The Oryx Press, 1981).

[7]

For an excellent overview of the problems faced by many institutions which attempted to adopt AACR2, see Judith Hopkins and John A. Edens, eds., Research Libraries and Their Implementation of AACR2 (Greenwich, Conn.: JAI Press, 1986).

[8]

Three articles appeared in American Archivist 49 (1986) which serve as an excellent introduction to MARC AMC: Nancy Sahli, "Interpretation and Application of the AMC Format" (9-20); Katharine D. Morton, "The Marc Formats: An Overview" (21-30); and Steven L. Hensen, "The Use of Standards in the Application of the AMC Format" (31-40).

[9]

According to a survey of 261 repositories conducted by the Society of American Archivists in 1987, a full 37% of archivists say that they do not plan to use MARC AMC. See Lisa B. Weber, "Automation Survey Results," SAA Newsletter (September 1987):4-5.

[10]

Ronald J. Zboray, "dBASE III Plus and the MARC AMC Format: Problems and Possibilities," American Archivist 50 (1987):210-226.

[11]

MicroMARC:amc (Michigan State University Archives and Historical Collections).

[12]

Library of Congress, Authorities: A MARC Format (Library of Congress, Processing Services, 1981); Update no. 1, June 1983 and Update no. 2, June 1986. For important background reading on authority work, see: Robert H. Burger, Authority Work (Littleton, Colo.: Libraries Unlimited, 1985); James R. Dwyer, "The Road to Access and the Road to Entropy," Library Journal 112:14 (September 1, 1987):131-136; Mary W. Ghikas, ed., Authority Control: The Key to Tomorrow's Catalog: Proceedings of the 1979 Library and Information Technology Association Institutes (Phoenix, Az.: Oryx Press, 1982); Lorene E. Ludy and Sally A. Rogers, "Authority Control in the Online Environment," Information Technology and Libraries 3 (September 1984):262-266; Dan Miller, "Authority Control in the Retrospective Conversion Process," Information Technology and Libraries 3 (September 1984):298-292; Arlene G. Taylor, "Authority Files in Online Catalogs: An Investigation of their Value," Cataloging and Classification Quarterly 4 (Spring 1984):1-17; Catherine M. Thomas, "Authority Control in Manual versus Online Catalogs: An Examination of 'See' References," Information Technology and Libraries 3 (December 1984):393-398; Mark R. Watson and Arlene G. Taylor, "Implications of Current Reference Structures for Authority Work in Online Environments," Information Technology and Libraries 6 (March 1987):10-19.

[13]

Lucia J. Rather, "Authority Systems at the Library of Congress," in Ghikas, Authority Control, 158; Lawrence F. Buckland, The Role of the Library of Congress in the Evolving National Network: A Study Commissioned by the Library of Congress Network (Library of Congress, 1978).

[14]

Sally McCallum, "Evolution of Authority Control for a National Network," in Ghikas, Authority Control, 57-58.

[15]

The guide most documentary editors consult for indexing has long been Sina Spiker, Indexing Your Book: A Practical Guide for Authors (University of Wisconsin Press, 1954). Though it offers sound general advice, it hardly serves as a how-to manual for subject assigning in an age of electronic information retrieval. For an overview of how subtle the field of indexing has become, see the journal The Indexer; sample articles can be found in Leonard Montague Harrod's excellent Indexers on Indexing: A Selection of Articles Published in The Indexer (R. R. Bowker Co., 1978). Other starting points for indexing include: G. Norman Knight, The Art of Indexing: A Guide to the Indexing of Books and Periodicals (London; Boston: Allen & Unwin, 1979); Jennifer E. Rowley, Abstracting and Indexing (London: Clive Bingley, 1982); Donald B. Cleveland and Ana D. Cleveland, Introduction to Indexing and Abstracting (Littleton, Colo.: Libraries Unlimited, 1983); Jessica L. Milstead, Subject Access Systems: Alternatives in Design (Orlando: Academic Press, 1984); K. G. B. Bakewell, Classification and Indexing Practice (London: C. Bingley; Hamden, Conn.: Linnet Books, 1978); Harold Borko and Charles L. Bernier, Indexing Concepts and Methods (New York: Academic Press, c1978); Timothy C. Craven, String Indexing (Orlando: Academic Press, 1986); Everett H. Brenner and Tefko Saracevic, Indexing and Searching in Perspective (Philadelphia: National Federation of Abstracting and Information Services, c1985).

[16]

For example, the index of Samuel Sewall, The Diary of Samuel Sewall, 1674-1729, M. Halsey Thomas, ed. (New York: Farrar, Straus and Giroux, 1973) does not list all occurrences of the word "beer" in the text.

[17]

For understanding the process of constructing thesauri, see: Helen M. Townley and Ralph D. Gee, Thesaurus-making: Grow Your Own Word-Stock (London: Deutsch; Boulder, Co.: distributed by Westview Press, 1980); Alan Gilchrist, The Thesaurus in Retrieval (London: Aslib, 1971); Maxine MacCafferty, Thesauri and Thesauri Construction (London: Aslib, c1977); and F. W. Lancaster, Vocabulary Control for Information Retrieval, 2nd ed. (Arlington, Va.: Information Resources Press, 1986).

[18]

Educational Resources Information Center, Thesaurus of ERIC descriptors, 10th ed. (Phoenix, Az: Oryx Press, 1984). The newest edition of the Library of Congress's Subject Headings, now in three enormous volumes, does attempt to incorporate headings in a format more like a traditional thesaurus, although it still has a great many problems with "natural" language references.

[19]

See Joan K. Marshall, On Equal Terms: A Thesaurus for Non-Sexist Indexing and Cataloging (New York: Neal-Schuman, 1977) and Mary Ellen S. Capek, A Women's Thesaurus: An Index of Language Used to Describe and Locate Information By and About Women (Harper and Row, 1987). For the vast literature on LCSH's problems, see Pauline Atherton Cochrane, Critical Views of LCSH—the Library of Congress Subject Headings: A Bibliographic and Bibliometric Essay; An Analysis of Vocabulary (Syracuse: ERIC Clearinghouse on Information Resources, Syracuse University, 1981).

[20]

For a further discussion of the Goldman Papers indexing system (based upon an earlier data entry program), see Ronald J. Zboray, "Microfilm Editions of Personal Papers and Microcomputers: Indexing the Emma Goldman Papers," International Journal of Micrographics and Video Technology 5 (1986):213-221. Editors wishing to draw upon thesauri other than LCSH should consult: Carol A. Mandel, Multiple Thesauri in Online Library Bibliographic Systems: A Report Prepared for Library of Congress Processing Services (Library of Congress, 1987).