This section offers a general model for projecting the direct costs of a digitization project by dividing tasks into five major categories:

(1) Selecting and Preparing Materials for Digitization

(2) Imaging and Encoding Requirements

(3) Digitization Costs

(4) Metadata Requirements

(5) Post-Digitization Processing and Other Local Costs

Examples are given within each category of the kind of work that may be involved, depending on the nature of the source material and the context of the project. Because no two projects are alike, there can be no prescriptions that meet all circumstances.

 
(1) Selecting and Preparing Materials for Digitization
Review the physical media of the source material in light of your decision about which strategic criteria most strongly support the project. All projects must establish their own priorities, and each will differ, but consider two examples that illustrate the need for thorough planning that examines the source material in light of its condition, the ideal digitization process, and the intended audience:

(1) A collection of books printed on acidic paper is deteriorating. They are frequently needed by students, but their fragility makes it impossible to allow them to circulate. Although these are not especially rare items, it is highly probable that copies in other collections are in similar condition. There is a strong case for dismantling the books, making images of their pages on a desktop scanner, and providing page images from the master digital files as PDF files made with Acrobat Distiller.

(2) A uniquely comprehensive collection of 78 rpm recordings constitutes an archive that is valuable for the study of an ethnic culture. The collection has potential use for both historians and ethnomusicologists. The recordings are not in fragile condition because the material of which they are made is very stable, but each successive playing will cause them to deteriorate. They must therefore be digitized so that they can be used, while the originals should be placed in secure storage in order to guarantee their survival. The costs of digitizing the originals are high, and the resulting files will be very large, but the master files will not need to be used frequently, and need not be stored on-line. The cost of storage will be relatively low, but decisions will have to be made about what kind of audio file to offer the user, what kind of server software to use, and how to compress the original file so that it can be accessed easily through a database of metadata.

Both projects present a strong case for digitization, and both match strategic criteria for building digital collections, but each will pursue its own direction, requiring different degrees of skilled intervention to choose and add appropriate technical, structural, and descriptive metadata standards; different kinds of handling and conservation practices; and a different choice of digitization method and workflow.

Nonetheless, both projects can be planned by using the same fundamental questions as analytical tools:

Why are we proposing to digitize this body of material?

Do we have the right to digitize it, and if there is a copyright issue, can we resolve it?

How are we going to digitize it and how much will that process cost, given the need to use an appropriate technical standard?

What metadata standards should be used to ensure preservation and access, and how much work will be required to add values to the elements?

How much work is required in the post-digitization stage to produce the digital collection for its audience?

 
(2) Imaging and Encoding Requirements
Imaging and encoding are radically different processes, and should be costed separately on the worksheet, but can be considered together as alternative or complementary options at early stages of many projects.

In example (1) above, for instance, the sheer quantity of text that must be captured suggests that an incremental process of scanning images of pages is suitable because it is inexpensive and can be performed using temporary, semi-skilled workers. Encoding individual texts is probably not an option, but a strategy for encoding the metadata may strengthen possibilities for access and discovery in the long term. The question of how this digital collection will be cataloged requires careful consideration of the use of metadata in various contexts. Should there be a finding aid, a MARC record, or both? What value might XML encoding add to the project?

Some projects comprise a variety of source media (images, text, sound, video), and different approaches will be suitable for each type. SGML encoding of texts using the Text Encoding Initiative standard will ensure excellent structural and descriptive metadata for navigation and discovery. If the text already exists in a digital file, it may be possible to add encoding, using a software tool, without great difficulty. If there is no digital file, will it have to be created manually, or can we capture it reliably using an OCR process?

Imaging

When all options have been considered, and you have chosen an appropriate digitization process for each type of source material, analyze the cost of your imaging choice(s) as follows:

Source Type | Quantity | Process | Standard | Cost per item

When using this procedure, be careful to account for variations in cost within each type that are caused by differences in dimensions. When resolution is a constant (e.g. 600 dpi) the cost of scanning any particular document or image will vary according to its size. Frequently, a digital copy of an 8-by-10-inch, or even 12-by-17-inch source, can be made inexpensively on in-house equipment; when the material exceeds those dimensions it may have to go to a vendor, and the increase in cost will be dramatic, especially because the fees will include handling and shipping, and you may also have to devote staff time to custodial responsibilities.
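The worksheet columns and the effect of size on cost can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ImagingRow` fields simply mirror the column headings above, and the in-house size limit and both rates are hypothetical placeholders, not real prices.

```python
from dataclasses import dataclass

# Hypothetical placeholders: substitute your own equipment limit and quoted rates.
IN_HOUSE_MAX_INCHES = (12.0, 17.0)  # largest source the in-house equipment accepts
IN_HOUSE_RATE = 0.50                # assumed cost per item scanned in house
VENDOR_RATE = 6.00                  # assumed cost per item, incl. handling and shipping


@dataclass
class ImagingRow:
    source_type: str
    quantity: int
    width_in: float
    height_in: float
    standard: str  # e.g. "600 dpi"


def fits_in_house(row):
    """True if the item fits the in-house equipment in either orientation."""
    short, long_ = sorted((row.width_in, row.height_in))
    max_short, max_long = sorted(IN_HOUSE_MAX_INCHES)
    return short <= max_short and long_ <= max_long


def cost_per_item(row):
    """Pick the assumed in-house or vendor rate according to item size."""
    return IN_HOUSE_RATE if fits_in_house(row) else VENDOR_RATE


def row_cost(row):
    """Quantity multiplied by cost per item, as on the worksheet."""
    return row.quantity * cost_per_item(row)


rows = [
    ImagingRow("Book pages", 4000, 8.0, 10.0, "600 dpi"),
    ImagingRow("Maps", 25, 24.0, 36.0, "600 dpi"),
]

for row in rows:
    process = "in-house" if fits_in_house(row) else "vendor"
    print(f"{row.source_type}: {row.quantity} x {cost_per_item(row):.2f} "
          f"({process}) = {row_cost(row):.2f}")
```

Splitting each source type into separate rows by size class keeps the jump from in-house to vendor pricing visible instead of hiding it in an average.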

After reckoning the costs of digitizing the selected source material on a per-item basis, consider the costs of post-digitization processes and calculate these in the last section of the worksheet. When a vendor is involved, you may need to separate these costs out of the invoice: items such as CD-ROMs or FTP transfer may be billed with the work, but are not truly part of the digitization process itself.

Encoding

All projects will require encoding in some sense, because decisions about metadata will require an understanding of techniques for making collections visible and searchable on the Web, in specialized catalogs, or both. When considering the practical details of how to digitize source material, however, "encoding" essentially means SGML markup according to the Text Encoding Initiative (TEI) standard. Some documents in your project may benefit from the application of TEI encoding. Whether this will be a difficult or costly process depends on two key factors:

Do you already have a digital file and, if not, how difficult will it be to make one?

What degree or intensity of encoding is required, given the needs of the audience?

When a digital file already exists, or can be made at no great expense using Optical Character Recognition software, it may be worth consulting an LIS staff member on the use of Adobe Acrobat for document management. Documents can be indexed by this means, but not "marked-up" in the structural, analytical sense allowed by TEI.
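To illustrate what structural markup adds beyond indexing, here is a minimal Python sketch. The fragment is hypothetical: it uses TEI-style element names (`text`, `body`, `div`, `head`, `p`) written as XML rather than SGML, and it is not a complete or valid TEI document, since the required header is omitted.

```python
import xml.etree.ElementTree as ET

# A tiny, invented fragment with TEI-style structural elements.
# Assumption: XML syntax is used here for convenience; the header is omitted.
SAMPLE = """
<text>
  <body>
    <div type="chapter" n="1">
      <head>Arrival</head>
      <p>We reached the harbor at dawn.</p>
    </div>
    <div type="chapter" n="2">
      <head>Departure</head>
      <p>The ship left without us.</p>
    </div>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE.strip())

# Because each part of the text is labeled by its role, software can
# address the parts directly, for example to list every chapter heading.
chapter_titles = [div.findtext("head") for div in root.iter("div")]
print(chapter_titles)
```

The more roles that are marked (chapters, names, dates, quotations), the more a user can search and navigate, and the more labor the encoding requires.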

Encoding costs can be estimated in the same tabular format as above:

Document Type | # of documents | Process (e.g. keying, OCR - editing) | Standard | Cost per item
 
(3) Digitization Costs

Once preliminary decisions about how to digitize the collection have been made, it will be useful to total all costs into an overall projection that includes post-digitization processes such as checking data integrity and the custodial time required to protect, re-assemble, and store originals. If this projection exceeds your wildest expectations of support, this would be a good time to consider Plan B: an exit strategy that could produce a prototype to support a grant application, or a small test project designed to explore the validity of your idea.
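The overall projection is simple arithmetic, sketched below in Python. All line items, quantities, rates, and the support figure are invented for illustration; `project_total` is a hypothetical helper, not part of any worksheet tool.

```python
def project_total(line_items, available_support):
    """Sum quantity x unit cost over all items and compare with expected support.

    line_items: list of (description, quantity, unit_cost) tuples.
    Returns (total, affordable), where affordable is True when the total
    does not exceed available_support.
    """
    total = sum(quantity * unit_cost for _, quantity, unit_cost in line_items)
    return total, total <= available_support


# Hypothetical figures only.
line_items = [
    ("Imaging: book pages", 4000, 0.50),
    ("Encoding: OCR and editing (documents)", 200, 3.00),
    ("Post-digitization: integrity checks (hours)", 40, 20.00),
    ("Custodial time for originals (hours)", 30, 20.00),
]

total, affordable = project_total(line_items, available_support=3000.00)
print(f"Projected total: {total:.2f}")
if not affordable:
    print("Projection exceeds expected support: consider a prototype or test project.")
```

Keeping staff time in the same list as imaging and encoding charges makes it harder to overlook the local costs that vendors do not invoice.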

 
(4) Metadata Requirements

Given the nature of the selected material and its intended audience, what would be the optimal metadata standard? A relatively small number of standards have proved their utility in libraries and archival collections and it may be possible to make an "ideal" choice that is driven by the nature of the material. However, the choice is seldom so easy because a variety of factors will probably conspire to make these issues debatable:

A substantial amount of collection-level metadata already exists and the work of modifying or adding to it is daunting.

No metadata exists, and limited resources require that the metadata standard be as minimal as possible.

You have planned that the project will be visible within the CDL gateway, and a high standard of metadata is therefore required.

The existing metadata for the material is ideal for one segment of the audience, but will have to be modified for use in another.

It is always a good idea to consult members of the relevant interest group or user community to determine what standards prevail in that context. Your decision may also be influenced by your perception of how your audience will need to discover and use this resource. Useful metadata standards require a tacit collaboration between the material and its users: in what context are they searching; what will they want to search, and does the proposed standard allow them to do it?

It is very likely that your project will be seen from multiple perspectives, and that each will require the application of a different standard. For example, one might make a digital collection of images from rare books. The books are already cataloged in MARC and it may be feasible to extract certain portions of the MARC records to occupy elements in the Dublin Core, which will be used to identify the images as a separate collection on the web. The two standards will not compete, but complement one another by providing different perspectives on the same objects. By the same principle, one might need to create an EAD finding aid so that a digitized oral history collection can be found both in the On-Line Archive of California and as a web site on a UCLA host.
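The idea of extracting portions of MARC records into Dublin Core elements can be sketched as follows. The mapping is a simplified illustration, not an authoritative crosswalk, and the record is represented as a plain dictionary of MARC tags to strings, which is an assumption made to keep the example self-contained; real MARC records have subfields and indicators that a production conversion must handle.

```python
# Illustrative mapping only: MARC tag -> Dublin Core element.
# Consult a published MARC-to-Dublin Core crosswalk before relying on it.
MARC_TO_DC = {
    "245": "title",      # title statement
    "100": "creator",    # main entry, personal name
    "260": "publisher",  # publication information
    "650": "subject",    # topical subject heading
}


def marc_to_dublin_core(marc_fields):
    """Copy mapped MARC fields into Dublin Core elements.

    marc_fields: dict mapping a MARC tag to a list of string values
    (a simplified, hypothetical representation of a record).
    Unmapped tags are ignored; repeated fields become repeated values.
    """
    dc = {}
    for tag, element in MARC_TO_DC.items():
        values = marc_fields.get(tag, [])
        if values:
            dc.setdefault(element, []).extend(values)
    return dc


# Invented sample record.
record = {
    "100": ["Example, Author"],
    "245": ["An example title"],
    "300": ["xii, 200 p."],  # physical description: not mapped here
    "650": ["Example subject one", "Example subject two"],
}

dc = marc_to_dublin_core(record)
print(dc)
```

The MARC record remains the fuller description; the Dublin Core view carries only what is needed to identify the images as a collection on the web.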

 
(5) Post-Digitization Processing and Other Local Costs

It is easy to underestimate the amount of work required to get a collection up and running so that it performs exactly as intended. The following areas of work should be foreseen:

Creating smaller, lower-resolution images from the master image files to serve as user displays. This work will be done by LIS staff after discussion with the project manager about the project specifications and requirements.

Checking data integrity. This work is usually done twice: once before the data is delivered to LIS, and once more to discover why some things don't work as they should.

Re-scanning or re-photographing to rectify problems that arose in production.

Editing and adding to data to improve the quality of the project after testing.
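For the data integrity checks listed above, one common approach is to record a checksum for every file before delivery and compare it after transfer. The source does not prescribe a method, so the choice of SHA-256 and the helper names below are assumptions; this is a minimal sketch using only the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def compare_manifests(before, after):
    """Return (missing, changed): files absent from, or altered in, the later manifest."""
    missing = sorted(set(before) - set(after))
    changed = sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )
    return missing, changed
```

Building one manifest before the data is delivered and another afterwards, then passing both to `compare_manifests`, shows which files were lost or altered in transit and so narrows the search when something does not work as it should.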