University of Virginia Library

The rationale for using AACR2 depends upon the ultimate ability to put records on line in a computerized data base. The MARC AMC format provides the bridge between AACR2 (conceived of as a manual format), the archivist's or editing project's own computer data base, and nationwide networks of data bases.

The acronym "MARC AMC" stands for Machine Readable Cataloging, Archival and Manuscripts Control format. The AMC format is only one of many formats that grew out of the Library of Congress's effort to automate its substantial holdings. When the Library of Congress published its first AMC format in 1973, archivists complained that it worked well only for item-by-item description and, in general, refused to adopt it. Archivists experimented with several other systems during the 1970s, but their implementations differed so much as to make it impossible to create a network for sharing data. In 1977, to address this problem, the Society of American Archivists (SAA) convened a National Information Systems Task Force, which recommended that a greatly modified form of MARC AMC be put forth as the standard for the archival community. To ensure that librarians alone would not determine the standard, the SAA's Committee on Archival Information and Exchange (CAIE) would help maintain it together with the American Library Association's Committee on the Representation in Machine-Readable Form of Bibliographic Information.

Editors and archivists generally need several works in order to understand and use the MARC AMC format. The Library of Congress's own MARC Formats for Bibliographic Data, Update 10 (Library of Congress, 1984), formally presents the AMC format, although the manual contains several errors that were corrected in Update 11 (1985). A more useful exposition of the format can be found in Nancy Sahli's MARC for Archives and Manuscripts: The AMC Format (Chicago: Society of American Archivists, 1985). Neither work can be fully understood without consulting the examples given by Max J. Evans and Lisa Weber in MARC for Archives and Manuscripts: A Compendium of Practice (The State Historical Society of Wisconsin, 1985). Other works that should be consulted include Steven L. Hensen's Archives, Personal Papers and Manuscripts (Library of Congress, 1983), the Research Libraries Group's AMC Field Guide (Stanford, Cal.: Research Libraries Group, 1983), and Walt Crawford's MARC for Library Use: Understanding the USMARC Formats (White Plains, N.Y. and London: Knowledge Industry Publications, 1984).[8]

That so many works are necessary for using the format suggests its complexity. The basic unit of the format is, as anyone familiar with data base management programs will recognize, the "field." Computers recognize fields as containers for specific types of descriptive information (e.g., title, author, or date). Fields can be "fixed," meaning that they will accept a predetermined number of characters, or they can be "variable." In MARC AMC, fields can also be divided into subfields, which are marked within the field by a dollar sign '$' followed by a lowercase letter whose meaning changes depending upon the field. The dollar sign is the "subfield delimiter," and the letter that follows it is the subfield code. MARC prescribes that each field have a descriptive title and a numbered "tag" to identify the field to the computer. All the fields for each document described, taken together, constitute a "record."
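To make this vocabulary concrete, here is a minimal Python sketch of the structure just described: a record holds fields, each field carries a numbered tag and a descriptive title, and a variable field may be split into coded subfields. The class names are hypothetical, indicators and fixed fields are left out, and the record contents are invented for illustration; tags 100 (personal-name main entry) and 245 (title statement) are standard MARC tags. In actual exchange files the subfield delimiter is a non-printing control character that is merely displayed as '$'.

```python
from dataclasses import dataclass, field


@dataclass
class Subfield:
    code: str   # a lowercase letter whose meaning depends on the field's tag
    value: str


@dataclass
class Field:
    tag: str    # the numbered tag that identifies the field to the computer
    name: str   # the field's descriptive title
    subfields: list[Subfield] = field(default_factory=list)

    def display(self) -> str:
        # Show each subfield as "$" + code + value, the usual printed convention.
        body = "".join(f"${sf.code}{sf.value}" for sf in self.subfields)
        return f"{self.tag} {body}"


@dataclass
class Record:
    # All the fields describing one document, taken together.
    fields: list[Field] = field(default_factory=list)

    def get(self, tag: str) -> list[Field]:
        return [f for f in self.fields if f.tag == tag]


# Illustrative record; the field contents are invented for this sketch.
record = Record([
    Field("100", "Main entry, personal name", [Subfield("a", "Goldman, Emma.")]),
    Field("245", "Title statement", [Subfield("a", "Letter to an unidentified correspondent")]),
])

for f in record.fields:
    print(f.display())
```

Keeping the subfield code separate from its value is what lets a program search or index one subfield without parsing the whole field.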

The MARC AMC format defines 77 variable data fields, which together provide the most in-depth description that archivists expect to need for years to come. Whereas AACR2 is best conceived of as a form for generating library catalog cards, MARC AMC has been developed with an eye to eventual computer manipulation of data. This goal results in a good deal of duplicated data. For example, the date and place of publication of a document must be recorded both in separate fixed fields and together within the variable-length title field. The need for data manipulation and searching also means that codes are used extensively throughout the format to represent often complex descriptions. In the case of the physical description fixed field (Tag 007), "hdrbfb016baca" translates into:

h = microform
d = microfilm reel
r = reproduction
b = negative
f = 35 mm. microfilm
b = normal reduction
016 = 16:1 reduction ratio
b = monochrome
a = silver halide
c = service copy
a = safety base
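Because each position in the 007 string carries its own meaning, decoding it is a matter of slicing the string by position. The Python sketch below reproduces the example above. Its code tables contain only the values listed here, not the full set the format allows, and the position labels reflect our understanding of the USMARC microform layout rather than a quotation from the format documentation.

```python
# (label, start, end, codes) for each position of the example 007 string.
# Code tables hold only the values from the example; labels are assumptions.
POSITIONS = [
    ("category of material", 0, 1, {"h": "microform"}),
    ("specific material designation", 1, 2, {"d": "microfilm reel"}),
    ("original vs. reproduction", 2, 3, {"r": "reproduction"}),
    ("polarity", 3, 4, {"b": "negative"}),
    ("dimensions", 4, 5, {"f": "35 mm. microfilm"}),
    ("reduction ratio range", 5, 6, {"b": "normal reduction"}),
    ("reduction ratio", 6, 9, None),  # three digits, e.g. "016" means 16:1
    ("color", 9, 10, {"b": "monochrome"}),
    ("emulsion", 10, 11, {"a": "silver halide"}),
    ("generation", 11, 12, {"c": "service copy"}),
    ("base of film", 12, 13, {"a": "safety base"}),
]


def decode_007(value: str) -> list[tuple[str, str, str]]:
    """Return (code, position label, meaning) for each position."""
    if len(value) != 13:
        raise ValueError(f"expected 13 characters, got {len(value)}")
    decoded = []
    for label, start, end, codes in POSITIONS:
        code = value[start:end]
        if codes is None:
            # The reduction ratio is numeric; reject letters such as "o16".
            if not code.isdigit():
                raise ValueError(f"reduction ratio must be digits, got {code!r}")
            meaning = f"{int(code)}:1 reduction ratio"
        else:
            meaning = codes.get(code, "(code not in this sketch's table)")
        decoded.append((code, label, meaning))
    return decoded


for code, label, meaning in decode_007("hdrbfb016baca"):
    print(f"{code:>3} = {meaning}  [{label}]")
```

The digit check matters in practice: a letter "o" keyed in place of a zero produces a string of the right length that no program can interpret.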

Needless to say, the MARC AMC format confronts first-time users with a seemingly insurmountable maze of codes, fields, and subfields. The archival profession itself has generally approached the format with trepidation since its release, and the ability to work with MARC AMC has become a much-coveted skill for job applicants in the field.[9] Many major research libraries have implemented only a minimal subset of the format because of the considerable cost and skill required to create full MARC records.

Editors attempting to implement MARC AMC on microcomputers face additional problems. MARC records, with their 77 fields, are lengthy, and editors catalog on a document-by-document basis, so full implementation of the format requires a great deal of hard disk storage. Versions of DOS before 4.0, the most common operating system for microcomputers, cannot address a disk partition larger than 32 megabytes, which caps the size of any single data base file. Most projects that intend to adopt MARC would probably need to get past the 32-megabyte limit by using DOS 4.0 or some other specialized software.
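The storage problem is simple multiplication, but working it through shows how quickly document-by-document cataloging reaches the ceiling. The 2,000-byte record size and the document counts below are hypothetical placeholders, not measurements of any project's data; a project would substitute its own average record length.

```python
# The 32-megabyte ceiling cited in the text, in bytes.
DOS_LIMIT_BYTES = 32 * 1024 * 1024


def database_bytes(documents: int, avg_record_bytes: int) -> int:
    """Total size of a data base file holding one record per document."""
    return documents * avg_record_bytes


def fits_under_limit(documents: int, avg_record_bytes: int) -> bool:
    return database_bytes(documents, avg_record_bytes) <= DOS_LIMIT_BYTES


def max_documents(avg_record_bytes: int) -> int:
    """Largest number of whole records that fit under the limit."""
    return DOS_LIMIT_BYTES // avg_record_bytes


# Hypothetical figures: full records averaging 2,000 bytes each.
for docs in (10_000, 20_000):
    print(docs, database_bytes(docs, 2_000), fits_under_limit(docs, 2_000))
print(max_documents(2_000))
```

Under these assumed figures, 20,000 documents (40,000,000 bytes) already exceed the 33,554,432-byte limit, and no more than 16,777 records fit in one file.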

Software for getting information into a MARC format will also present problems for editors. Most libraries committed to MARC create their records with the programs offered by bibliographic utilities such as OCLC and RLIN. Commercial data base packages can be adapted to the needs of MARC AMC only with great difficulty and considerable programming. The biggest stumbling block for these programs is MARC's need for numerous variable-length fields. Most of the major commercial data management programs for microcomputers, with the exception of Revelation, operate on a fixed-field-length basis, which means that the vast majority of disk space in any implementation of MARC will be wasted on unused portions of fixed fields.[10] A new program tailored specifically for MARC AMC users by archivists at Michigan State University has recently come on the market and promises to address some of these problems. Editors may also look into "turnkey" systems (which combine hardware, software, and instructional and service support), such as those of Geac and OCLC's LS2000, although these systems may prove too expensive for the meager budgets of most projects.[11]
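The waste described above can be seen in a toy comparison. The sketch assumes a hypothetical fixed-length layout with three fields of arbitrary widths and compares it with storing only the characters actually entered, plus a one-byte terminator per field. It ignores the indexes and other overhead that real programs add, so it illustrates the principle rather than predicting any program's file sizes.

```python
# Hypothetical field widths for a fixed-length design: every record reserves
# the full width of every field, whether or not the field is used.
FIXED_WIDTHS = {"title": 200, "summary": 1000, "notes": 500}


def fixed_length_bytes() -> int:
    # Reserved space is identical for every record, regardless of content.
    return sum(FIXED_WIDTHS.values())


def variable_length_bytes(record: dict[str, str], terminator_bytes: int = 1) -> int:
    # Store each non-empty field at its actual length plus a field terminator.
    return sum(len(value) + terminator_bytes for value in record.values() if value)


# An invented record in which one field is empty and another is very short.
record = {
    "title": "Letter to an unidentified correspondent",
    "summary": "",
    "notes": "Draft.",
}
fixed = fixed_length_bytes()
variable = variable_length_bytes(record)
print(fixed, variable, f"{1 - variable / fixed:.0%} of the fixed layout unused")
```

The wider the fixed fields are made to accommodate the occasional long entry, the larger the unused share becomes for typical records.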

Once again, the question must be asked of MARC AMC, as it was of AACR2: is it worth it for editors? Until expert software (that is, software in which the formatting rules are embedded within the program) becomes available, probably not. The format in its present form is simply too difficult and costly to implement and maintain, especially on a document-by-document basis. Editors with enough funds could hire a full-time cataloger with MARC AMC experience to oversee data entry. Even so, the sheer volume of information each MARC AMC record contains dramatically increases the chances for error and the time needed for proofreading, and quality control becomes hard to attain unless every processor is well versed in the rules of the format. In short, MARC AMC threatens to distract editors from their primary work, which is, after all, producing an edition, not creating a data base.

One of the advantages of electronic information storage, however, is that data can later be manipulated into different formats. At some future date, when users may be able to access information about manuscripts through an on-line national data base, documentary editors may be able to transfer their data bases to the MARC AMC format, provided they take the proper steps now to ensure compatibility.

The issue of compatibility should be addressed at the conceptual stage of data base design. Those who have designed data bases will immediately realize that MARC AMC, like AACR2, represents only a goal for information output and may not be the best way to store and retrieve information locally. For example, a local data base may need only one date field, which can feed, via programming, into the three or more MARC AMC fields that may require date information. In particular, the AACR2 title field must be broken down into its component parts; otherwise, for correspondence, programs will index all letters under the initial part of the field (i.e., "letter to" instead of the recipient's name). Even if the author's name is moved to a separate field, AACR2 demands first-name-first order, while MARC AMC and most data base management programs demand the reverse.
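The principle of storing data once and deriving each output from it can be sketched in a few lines of Python. The function names, the year-month-day date form, and the sample record are hypothetical; the sketch only shows how one stored date and a name kept in separate parts can feed a year-only fixed field, a natural-order title, and an inverted index heading.

```python
from datetime import date

# Month names spelled out here so the output does not depend on the locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def fixed_field_year(d: date) -> str:
    """A four-digit year, as a fixed field limited to the year might want it."""
    return f"{d.year:04d}"


def title_date(d: date) -> str:
    """Year, month, day order for use inside a title string (assumed form)."""
    return f"{d.year} {MONTHS[d.month - 1]} {d.day}"


def title_field(kind: str, first: str, last: str, d: date) -> str:
    # Natural (first-last) name order for the assembled title.
    return f"{kind} to {first} {last}, {title_date(d)}"


def index_heading(first: str, last: str) -> str:
    # Inverted (last-first) order, so letters index under the recipient.
    return f"{last}, {first}"


# One hypothetical local record: a single date, the name kept in two parts.
sent = date(1917, 6, 15)
print(fixed_field_year(sent))
print(title_field("Letter", "Anna", "Abbott", sent))
print(index_heading("Anna", "Abbott"))
```

Because the name is never stored as a single string, neither output order has to be recovered by parsing, which avoids the problems discussed next.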

Editors should not think that manipulating information electronically will easily solve the problems of transferring data between different formats.

The previous case of changing the order of first and last names, for example, requires a herculean programming effort to account for all the idiosyncrasies of usage. At first glance, such manipulation would seem to depend merely upon the location of the comma in an inverted name form. But what about abbreviations preceded by a comma, such as "Ltd."? Or the names of business firms, like the publishers "Little, Brown"? The cases multiply to the point where a program has to take into account literally hundreds of rules and thousands of exceptions. Some projects even enter the author's or recipient's name in inverted order with the last name in all capital letters or, perhaps worse, enter titles in all capitals. Restoring the distribution of upper- and lowercase letters in a name or a title is something that, at this point, only human intelligence can do, largely because humans decide, not always logically, what their names and the titles of their works will be. For example, how would a program know to capitalize the first "V" in "W. S. Van Valkenburgh" but not the same letter in "Henry van Dyke"? In short, editors should study AACR2 and the MARC AMC format very closely to make sure that their data bases will be able to output information in either form.
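The comma problem is easy to reproduce. In the Python sketch below, a naive rule that splits at the first comma handles "Goldman, Emma" but turns "Little, Brown" into "Brown Little" and moves "Ltd." to the front of a firm's name. A revised version patches those cases with two small, hypothetical exception tables (and an invented firm, "Acme Press"), which is exactly the kind of list that grows without end. The last lines show that a mechanical capitalization rule, here Python's built-in `str.title()`, cannot tell "Van Valkenburgh" from "van Dyke".

```python
def naive_uninvert(name: str) -> str:
    """Turn 'Last, First' into 'First Last' by splitting at the first comma."""
    if "," not in name:
        return name
    last, first = name.split(",", 1)
    return f"{first.strip()} {last.strip()}"


# Hypothetical exception tables; a real project would need far larger ones.
CORPORATE_NAMES = {"Little, Brown"}
TRAILING_ABBREVIATIONS = {"Ltd.", "Inc.", "Jr.", "Sr."}


def uninvert(name: str) -> str:
    """Comma rule plus two small exception tables."""
    if name in CORPORATE_NAMES:
        return name
    parts = [p.strip() for p in name.split(",")]
    suffix = ""
    if len(parts) > 1 and parts[-1] in TRAILING_ABBREVIATIONS:
        suffix = ", " + parts.pop()
    if len(parts) == 2:
        last, first = parts
        return f"{first} {last}{suffix}"
    return ", ".join(parts) + suffix


for n in ("Goldman, Emma", "Little, Brown", "Acme Press, Ltd.", "Van Valkenburgh, W. S."):
    print(f"{n!r}: naive={naive_uninvert(n)!r} revised={uninvert(n)!r}")

# Capitalization cannot be restored mechanically: str.title() treats both
# all-caps entries alike, which is right for one name and wrong for the other.
print("VAN VALKENBURGH, W. S.".title())   # Van Valkenburgh, W. S.
print("VAN DYKE, HENRY".title())          # Van Dyke, Henry (should be "van Dyke")
```

Every new firm name or suffix requires another table entry, which is why the text concludes that entering names in a form that never needs such parsing is the safer design.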

The Goldman Papers' most recent revision of its data entry program, called with heavenward hopes "Ultimate Emma," can address both formats. Document processors enter data directly at one of six microcomputer work stations that are networked together and share distributed hard disk resources. Information such as date, place of publication or writing, destination (if correspondence), and recipient or title is entered in separate fields via a menu-driven system that records it in a form that allows for electronic manipulation and indexing. After this, the program assembles the various fields into a single, AACR2-compatible title field. Document processors are then given the option to enter editorial attributions (called "interpolations" in AACR2 parlance), such as brackets and question marks: the very things that, if they appeared in fields to be indexed, would destroy any semblance of order. For example, a bracketed last name beginning with "Z" would come before one beginning with "A". Through the design of its data base and the manipulations of its computer program, the Emma Goldman Papers has thus shown that it is possible for editorial projects to enter and retrieve their data efficiently and in a form that can easily go on-line in a national MARC AMC-based data base.
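The design described here, with clean components stored for indexing and interpolations added only to the assembled title, can be sketched as follows. This is not the Ultimate Emma program itself: the class, its field names, the flags, and the title layout are hypothetical, and the two entries are invented. Sorting uses only the unbracketed name fields, so the editor's brackets and question marks never affect the order.

```python
from dataclasses import dataclass


@dataclass
class DocumentEntry:
    # Each component lives in its own field; names and layout are hypothetical.
    kind: str
    recipient_first: str
    recipient_last: str
    place: str
    year: str
    recipient_supplied: bool = False   # editor supplied the recipient's name
    year_uncertain: bool = False       # editor inferred the date

    def sort_key(self) -> tuple[str, str]:
        # Index on the clean fields only; no brackets or question marks here.
        return (self.recipient_last.lower(), self.recipient_first.lower())

    def display_title(self) -> str:
        # Interpolations are added only when the title is assembled for output.
        name = f"{self.recipient_first} {self.recipient_last}"
        if self.recipient_supplied:
            name = f"[{name}]"
        year = f"[{self.year}?]" if self.year_uncertain else self.year
        return f"{self.kind} to {name}, {self.place}, {year}"


entries = [
    DocumentEntry("Letter", "Karl", "Zimmer", "Chicago", "1915",
                  recipient_supplied=True, year_uncertain=True),
    DocumentEntry("Letter", "Anna", "Abbott", "New York", "1917"),
]

for entry in sorted(entries, key=DocumentEntry.sort_key):
    print(entry.display_title())
```

Keeping the interpolation as a flag rather than as characters in the field means the same record can produce both a clean index entry and an AACR2-style title with editorial brackets.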