This section offers a general model for projecting the direct costs of a digitization project by dividing tasks into five major categories:
Examples are given within each category of the kind of work that may be involved, depending on the nature of the source material and the context of the project. Because no two projects are alike, there can be no prescriptions that meet all circumstances.
|(1) Selecting and Preparing Materials for Digitization|
the physical media of the source material in light of your decision about
which strategic criteria most strongly support the project. All projects
must establish their own priorities, and each will differ, but consider
two examples that illustrate the need for thorough planning that examines
the source material in light of its condition, the ideal digitization process,
and the intended audience:
Both projects present a strong case for digitization, and both match strategic criteria for building digital collections, but each will pursue its own direction, requiring different degrees of skilled intervention to choose and add appropriate technical, structural, and descriptive metadata standards; different kinds of handling and conservation practices; and a different choice of digitization method and workflow.
Nonetheless, both projects can be planned by using the same fundamental questions as analytical tools:
|(2) Imaging and Encoding Requirements|
Imaging and encoding are radically different processes and should be costed
separately on the worksheet, but they can be considered together as alternative
or complementary options at the early stages of many projects.
Some projects comprise a variety of source media (images, text, sound, video), and different approaches will be suitable for each type. SGML encoding of texts using the Text Encoding Initiative standard will ensure excellent structural and descriptive metadata for navigation and discovery. If the text already exists in a digital file, it may be possible to add encoding with a software tool without great difficulty. If there is no digital file, will it have to be created manually, or can it be captured reliably using an OCR process?
When all options have been considered, and you have chosen an appropriate digitization process for each type of source material, analyze the cost of your imaging choice(s) as follows:
When using this procedure, be careful to account for variations in cost within each type that are caused by differences in dimensions. When resolution is a constant (e.g. 600 dpi) the cost of scanning any particular document or image will vary according to its size. Frequently, a digital copy of an 8-by-10-inch, or even 12-by-17-inch source, can be made inexpensively on in-house equipment; when the material exceeds those dimensions it may have to go to a vendor, and the increase in cost will be dramatic, especially because the fees will include handling and shipping, and you may also have to devote staff time to custodial responsibilities.
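The per-item reckoning described above can be sketched in a few lines of Python. This is a minimal illustration, not a costing tool: the rates, the 12-by-17-inch in-house limit, the batch names, and the counts are all hypothetical placeholders to be replaced with real quotes and real inventory figures.

```python
from dataclasses import dataclass

# All figures below are assumptions for illustration only.
IN_HOUSE_MAX_INCHES = (12, 17)  # largest source the in-house scanner is assumed to handle
IN_HOUSE_RATE = 0.50            # assumed cost per item scanned in house
VENDOR_RATE = 6.00              # assumed vendor fee per oversize item
VENDOR_OVERHEAD = 2.50          # assumed handling, shipping, and custodial time per item


@dataclass
class Batch:
    label: str
    width_in: float
    height_in: float
    count: int


def fits_in_house(batch):
    """True if the source fits the in-house scanner in either orientation."""
    short, long_ = sorted((batch.width_in, batch.height_in))
    max_short, max_long = sorted(IN_HOUSE_MAX_INCHES)
    return short <= max_short and long_ <= max_long


def unit_cost(batch):
    """Cost per item: in-house rate if it fits, otherwise vendor fee plus overhead."""
    if fits_in_house(batch):
        return IN_HOUSE_RATE
    return VENDOR_RATE + VENDOR_OVERHEAD


def batch_cost(batch):
    return unit_cost(batch) * batch.count


batches = [
    Batch("letters", 8, 10, 400),
    Batch("ledgers", 12, 17, 50),
    Batch("maps", 24, 36, 20),
]

for b in batches:
    print(f"{b.label}: {b.count} items x {unit_cost(b):.2f} = {batch_cost(b):.2f}")
print(f"total: {sum(batch_cost(b) for b in batches):.2f}")
```

Grouping items by dimension before multiplying makes the jump in cost for oversize material visible as its own line rather than hiding it in an average.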
After reckoning the costs of digitizing the selected source material on a per-item basis, consider the costs of post-digitization processes and calculate these in the last section of the worksheet. When a vendor is involved, you may need to separate these costs out of the invoice, which may bundle in items such as CD-ROMs or FTP transfer that are not truly part of the digitization process itself.
All projects will require encoding in some sense, because decisions about metadata will require an understanding of techniques for making collections visible and searchable on the Web or in specialized catalogs, or both. When considering the practical details of how to digitize source material, however, "encoding" essentially means SGML markup according to the Text Encoding Initiative (TEI) standard. Some documents in your project may benefit from the application of TEI encoding. Whether this will be a difficult or costly process depends on two key factors:
When a digital file already exists, or can be made at no great expense using Optical Character Recognition software, it may be worth consulting an LIS staff member on the use of Adobe Acrobat for document management. Documents can be indexed by this means, but not "marked up" in the structural, analytical sense allowed by TEI.
Encoding costs can be estimated in the same tabular format as above:
|(3) Digitization Costs|
Once preliminary decisions about how to digitize the collection have been made, it will be useful to total all costs into an overall projection that includes post-digitization processes such as checking data integrity or the custodial time required to protect, re-assemble, and store originals. If this projection seems to exceed your wildest expectations of support, this would be a good time to consider Plan B: the exit strategy that could produce a prototype to support a grant application, or the small test project designed to explore the validity of your idea.
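As a rough sketch of that overall projection, the snippet below totals a set of line items and compares the result with the support you expect to receive. The line-item names, the amounts, and the `expected_support` figure are invented for illustration; only the arithmetic is the point.

```python
def project_total(line_items):
    """Sum every cost line into a single projection."""
    return sum(line_items.values())


def exceeds_support(line_items, expected_support):
    """True if the projection is larger than the support you expect."""
    return project_total(line_items) > expected_support


# Hypothetical worksheet totals, including post-digitization work.
costs = {
    "selection and preparation": 1200.0,
    "imaging": 395.0,
    "encoding": 800.0,
    "data integrity checks": 150.0,
    "custodial time for originals": 300.0,
}
expected_support = 2500.0  # assumed figure

total = project_total(costs)
print(f"projected total: {total:.2f}")
if exceeds_support(costs, expected_support):
    shortfall = total - expected_support
    print(f"exceeds expected support by {shortfall:.2f}; consider a prototype or small test project")
```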
|(4) Metadata Requirements|
Given the nature of the selected material and its intended audience, what would be the optimal metadata standard? A relatively small number of standards have proved their utility in libraries and archival collections and it may be possible to make an "ideal" choice that is driven by the nature of the material. However, the choice is seldom so easy because a variety of factors will probably conspire to make these issues debatable:
It is always a good idea to consult members of the relevant interest group or user community to determine what standards prevail in that context. Your decision may also be influenced by your perception of how your audience will need to discover and use this resource. Useful metadata standards require a tacit collaboration between the material and its users: in what context are they searching, what will they want to search, and does the proposed standard allow them to do it?
It is very likely that your project will be seen from multiple perspectives, and that each will require the application of a different standard. For example, one might make a digital collection of images from rare books. The books are already cataloged in MARC and it may be feasible to extract certain portions of the MARC records to occupy elements in the Dublin Core, which will be used to identify the images as a separate collection on the web. The two standards will not compete, but complement one another by providing different perspectives on the same objects. By the same principle, one might need to create an EAD finding aid so that a digitized oral history collection can be found both in the On-Line Archive of California and as a web site on a UCLA host.
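The idea of extracting portions of a MARC record to populate Dublin Core elements can be sketched as a simple crosswalk. The example below is a simplified illustration under stated assumptions: the MARC record is represented as plain `(tag, subfield, value)` tuples rather than a real MARC file, the sample values are invented, and the mapping covers only a small subset of fields that should be checked against the crosswalk your institution actually uses.

```python
# A small, assumed subset of a MARC-to-Dublin-Core crosswalk.
CROSSWALK = {
    ("100", "a"): "creator",    # main entry, personal name
    ("245", "a"): "title",      # title statement
    ("260", "b"): "publisher",  # name of publisher
    ("260", "c"): "date",       # date of publication
    ("650", "a"): "subject",    # topical subject heading
}


def marc_to_dc(marc_fields):
    """Copy mapped MARC subfields into repeatable Dublin Core elements."""
    dc = {}
    for tag, subfield, value in marc_fields:
        element = CROSSWALK.get((tag, subfield))
        if element is not None:  # unmapped fields are simply skipped
            dc.setdefault(element, []).append(value)
    return dc


# Hypothetical record for a rare book whose images are being digitized.
record = [
    ("100", "a", "Example, Author"),
    ("245", "a", "An example rare book"),
    ("260", "b", "Example Press"),
    ("260", "c", "1850"),
    ("300", "a", "212 p."),  # physical description: not in the subset above
    ("650", "a", "Botany"),
    ("650", "a", "Illustration of books"),
]

dc_record = marc_to_dc(record)
for element, values in dc_record.items():
    print(f"{element}: {'; '.join(values)}")
```

Because the MARC record remains the catalog of record, the Dublin Core output here is a derived view for the image collection, which is what lets the two standards complement rather than compete with each other.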
|(5) Post-Digitization Processing and Other Local Costs|
It is easy to underestimate the amount of work required to get a collection up and running so that it performs exactly as intended. The following areas of work should be foreseen: