Creating Professional Archival Quality Digital Image Collections

The following information may be of use to those who are thinking of archiving a large number of photographic images.
It should give you an idea of the measures professional archivists take to preserve important photos and similar material.


This document addresses the standards for archival-quality digital image collections for the California Digital Library. In conjunction with the companion document, California Digital Library Digital Object Collection Standards (July 9, 2001), these standards describe image quality considerations, file formats, and storage and access standards for digital images created by or incorporated into the CDL as part of its permanent collections.

They attempt to balance adherence to industry standards, reproduction quality, access, potential longevity and cost. 

Adherence to these standards is required for all CDL contributors and may also serve University of California staff as guidelines for digital image creation and presentation.

At a minimum, digital image collections incorporated into the CDL must have the following components.

A set of archival quality digital image files as defined in this document.
The required metadata defined in this document and encoded in an XML DTD.

These standards are not intended to address all of the administrative and technical issues surrounding the
creation of digital image collections and they do not describe operational procedures for digitization. “Best
Practices for Image Capture,” a companion to these standards, describes best practices for libraries, archives, and museums; it does not define standards, but instead provides an overview of the issues that need to be addressed when initiating a digital imaging project.

Definition of a Digital Image

A digital image is defined for the purposes of this document as a raster based, 2-dimensional, rectangular array of static data elements called pixels, intended for display on a computer monitor or for transformation into another format, such as a printed page.

Digital Masters

Digital master files are created as the direct result of image capture. The digital master should represent as
accurately as possible the visual information in the original object. Digital images should be created through the direct scanning or imaging of the original object. However, if the original object cannot be digitized directly due to its size or other attributes, it may be necessary to use a photographic intermediary.

Care should be taken that the photographic intermediary is well documented and represents the original object as accurately as possible.

The primary function of digital master files is to serve as a long-term archival record and as a source for
derivative files. A digital master file may serve as a surrogate for the original, may completely replace originals or may be used as security against possible loss of originals due to disaster, theft and/or deterioration. Derivative files are created from digital master images for editing or enhancement, conversion of the master to different formats, and presentation and transmission over networks.

Long-term preservation of digital master files requires a strategy of identification, storage, and migration to new media, as well as policies about image use and access. It is essential that master files remain unaltered over time. Lossy compression, such as JPEG, and color-reducing formats, such as 8-bit GIF, should not be applied to master files, and migration procedures should include quality control steps to ensure that the integrity of the files is maintained throughout the entire process.
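As an illustration of the quality control step mentioned above, the following is a minimal sketch of a checksum comparison between a master file and its copy on new media. The standards do not prescribe a particular method; the use of SHA-256 and the function names file_digest and verify_migration are assumptions made for this example. The check applies only to copies that should be byte-for-byte identical, not to conversions between file formats.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks so large masters fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_migration(source_path, migrated_path):
    """Return True only if the migrated copy has the same digest as the source."""
    return file_digest(source_path) == file_digest(migrated_path)
```

In practice the digest of each master could be recorded when the file is created and compared again after every migration, so that any alteration is detected even if the earlier copy is no longer available.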

The specifications for derivative files used for image presentation may change over time; digital masters can
serve an archival purpose, and can be processed by different presentation methods to create necessary derivative files without the expense of digitizing the original object again. Because the process of image capture is so labor intensive, the goal should be to create a master that has a useful life of at least fifty years. Therefore, collection managers should anticipate a wide variety of future uses, and capture at a quality high enough to satisfy these uses. In general, decisions about image capture should err toward the highest quality.

Digital Image Storage Formats

Digital image collections intended for long term storage and presentation should store from three to four images for each original item: an archival image, derivatives for viewing and a thumbnail for browsing. The master or archival image should capture as much information as possible to preserve the investment in the capture process. Masters should use color rather than grayscale when color is an integral part of the information conveyed by the original object, and any compression applied to the file should be lossless.

Viewing files can be created at any time from the archival image and should be created to provide reasonable access by standard viewers. It is recommended that at least two viewing files be created: a preview or thumbnail file for the fastest access during the initial search and retrieval process, and a service or reference image for more detailed viewing.
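As a rough sketch of how the pixel dimensions of viewing files might be derived from a master, the following scales an image to fit a maximum longest side while keeping its aspect ratio. The function name fit_within and the sizes of 150 and 1000 pixels are hypothetical values chosen for illustration; this document does not fix the dimensions of thumbnail or reference images.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) so the longest side equals max_side, keeping aspect ratio.

    Images whose longest side is already within max_side are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical derivative sizes (longest side, in pixels); not taken from the standards.
DERIVATIVE_MAX_SIDE = {"thumbnail": 150, "reference": 1000}

# An 8.5" x 11" original captured at 600 ppi gives a 5100 x 6600 pixel master.
master = (5100, 6600)
derivatives = {
    name: fit_within(master[0], master[1], side)
    for name, side in DERIVATIVE_MAX_SIDE.items()
}
```

Because derivatives are always computed from the master, their sizes can be changed later without digitizing the original again.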

The following image formats are supported by the CDL, some for archival storage and some for presentation
purposes. See the following table for the appropriate use of each format.

TIFF ITU-T.6 – A 24-bit storage format commonly used by Adobe Photoshop and other bitmap
editors, this TIFF format may be used to store color images. This format is also suited for bitonal text
documents; it provides a high level of detail (up to 600 dpi, or 5,100 x 6,600 pixels for a letter-sized
page). TIFF ITU-T.6 format should be used for archival files. With lossless compression, the data that
results from compressing and then uncompressing the image is exactly the same as the original,
uncompressed file. CCITT Group 4 compression is lossless and, therefore, permissible for masters and

While compression is allowed for archival files, it is discouraged, as it adds complexity to the format
migration issues of long-term preservation. When compression is used, it must be lossless and not

JPEG – A 24-bit, lossy (some information lost) compression format which is well-suited for screen and
print presentation. JPEG is supported by all major computer platforms and by Internet web browsers.
With lossy compression, the picture quality of the compressed file is reduced when compared to the
original file, and cannot be restored except by going back to the original. The advantage is that the file
sizes are much smaller, and image quality is acceptable in most cases. It is not acceptable as an archival
file format.

JFIF (JPEG File Interchange Format) – A specific implementation of the JPEG standard, commonly
used by bitmap editing programs, viewers, and Web browsers.

GIF – An 8-bit, lossless compression format which is well-suited for low resolution screen display of
images. GIF is often used for image thumbnails and screen versions of text documents, and is supported
by all major computer platforms and Internet web browsers.

PNG – The Portable Network Graphics format is expected to provide a higher-quality replacement for
GIF, particularly for images delivered to World Wide Web browsers.

PDF – Adobe Acrobat Portable Document Format provides a convenient way to view and print images
at high resolution, and may also be used to group several files into chapters and books. PDF can provide
additional navigational tools such as hyperlinks among pages within a document, and from one PDF
document to another. Although this is a proprietary Adobe format, both the file specifications and the
viewer software are freely distributed. Plug-ins are available for major web browsers to enable them to
view PDF files without launching an external viewer.
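
The distinction between lossless and lossy compression used in the format descriptions above can be illustrated with a small sketch. It uses the zlib module (DEFLATE) from the Python standard library as a stand-in for a lossless method; it is not CCITT Group 4. The lossy step is a crude rounding of pixel values chosen only for illustration and is not the JPEG algorithm.

```python
import zlib

# A tiny stand-in for image data: one row of 8-bit grayscale pixel values.
pixels = bytes([12, 13, 13, 14, 200, 201, 203, 90, 91, 91, 92, 12])

# Lossless: after compressing and decompressing, every byte comes back unchanged.
restored = zlib.decompress(zlib.compress(pixels))
assert restored == pixels

# Lossy (illustration only): round each value down to a multiple of 16.
# Fine tonal detail is discarded and cannot be recovered from the result.
quantized = bytes((value // 16) * 16 for value in pixels)
lost_detail = quantized != pixels
```

Here restored is identical to the input, while quantized has merged neighboring tones such as 12, 13 and 14 into a single value, which is why only the first kind of compression is acceptable for masters.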

Specific Minimum Resolution and File Formats

The intent of the following table is to offer guidelines for scanning various types and sizes of original
documents, so that the digital master files as captured will record all of the significant visual features in the
original item. Capture resolutions in the table are based upon the assumption that a scanning resolution of 600 ppi will be sufficient to meet this requirement for most originals in most collections, apart from negatives and transparencies. Digital master files which fail to capture some of the visual information present in the original will presumably become obsolete as image capture techniques improve over time.

The reflective formats, such as photographic prints and illustrations, are based on 8.5″ x 11″ originals scanned at 600ppi. The 35mm format has a resolution standard of 4200 pixels in the longest dimension, as this is about as much data as most 35mm films can capture. Scanning the 35mm format, which is 1.5″ on the longest side, at 2800ppi will result in compliance with the 4200 pixel standard. Note that if you plan to create a film intermediary of the object and then digitize the 35mm intermediary, you must consider the size of the original. Filming an original larger than 5″ x 7″ with the 35mm process will not capture all of the original’s detail. For example, 4200 pixels spread along the 7″ side yields 600ppi (4200 pixels / 7″). If the original were 12″ long, the image would be only 350ppi (4200 pixels / 12″), which is not archival quality.

Other transmissive formats, such as negatives and slides, have a standard of 6000 pixels on the longest side, based on an 8.5″ x 11″ original, which yields an image just under 600ppi (6000 pixels / 11″, or about 545ppi). Note again that if you plan to create a film intermediary and then digitize it, you must consider the size of the original. For example, scanning a 4″ x 5″ negative of a 10″ long original at 1200ppi meets the 6000 pixel standard (5″ x 1200ppi) and yields a 600ppi image (6000 pixels / 10″). Creating an image 6000 pixels on the longest side for a 12″ long original would be digitizing at 500ppi and would therefore lose detail from the original image.
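
The arithmetic in the two preceding paragraphs can be sketched as follows. The function names effective_ppi and largest_original_in are hypothetical and introduced only for this example; the 600ppi target and the 4200 and 6000 pixel figures come from the text.

```python
def effective_ppi(pixels_long_side, original_long_side_in):
    """Pixels per inch of the ORIGINAL object captured by the digitized intermediary."""
    return pixels_long_side / original_long_side_in


def largest_original_in(pixels_long_side, target_ppi=600):
    """Longest original, in inches, that still reaches target_ppi."""
    return pixels_long_side / target_ppi


print(effective_ppi(4200, 7))     # 600.0 (35mm intermediary, 7" original)
print(effective_ppi(4200, 12))    # 350.0 (35mm intermediary, 12" original)
print(effective_ppi(6000, 10))    # 600.0 (4" x 5" negative, 10" original)
print(effective_ppi(6000, 12))    # 500.0 (4" x 5" negative, 12" original)
print(largest_original_in(4200))  # 7.0
print(largest_original_in(6000))  # 10.0
```

The last two lines show why originals longer than 7″ (for 35mm film) or 10″ (for the 6000 pixel standard) fall below 600ppi when digitized through an intermediary.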

Oversize originals such as posters and maps can be especially difficult to scan at the recommended resolution of 600 ppi. Few libraries own flatbed scanners capable of scanning originals larger than 11″ x 17″, and even if they do, the problems of handling image files larger than about 120MB are daunting. These problems may lead to the use of a lower standard of capture resolution, such as 300 ppi or 3000 pixels in the longest dimension (the “alternative minimum”), with the understanding that the useful life of the files may be limited and digital image capture for these objects will need to be repeated in the future.
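
To give a sense of the file sizes involved, the following sketch estimates the size of an uncompressed image from the dimensions of the original and the capture resolution. It assumes 24-bit color, ignores file header overhead, and counts one MB as 1,000,000 bytes; the function name uncompressed_size_bytes is hypothetical.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Approximate size of an uncompressed raster image, excluding any file header."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


for label, size in [
    ('8.5" x 11" at 600 ppi', uncompressed_size_bytes(8.5, 11, 600)),
    ('11" x 17" at 600 ppi', uncompressed_size_bytes(11, 17, 600)),
    ('11" x 17" at 300 ppi', uncompressed_size_bytes(11, 17, 300)),
]:
    print(f"{label}: {size / 1e6:.1f} MB")
```

Under these assumptions an 11″ x 17″ original at 600 ppi comes to roughly 202 MB, while the same original at 300 ppi is about a quarter of that, since file size grows with the square of the resolution.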

Specific Resolutions and File Formats

Note: These guidelines are an attempt to guide one through general digitization projects. In all cases, the
over-riding factor should be the need to capture all relevant elements that users will need.

E.g., when a 35mm negative is shot of a 4×5 photograph (i.e., the original object), the resulting negative should be scanned at 2000 PPI. Since the 35mm negative is 1-1/2″ long, this will provide 3000 pixels along the 1.5″ (longest) side of the negative.

Therefore, when an image is created from the negative at the size of the original photograph, it will be 600 PPI. That is, we will have 3000 pixels spread over the 5″ of the new image (3000 / 5″ = 600 PPI).

The formula to determine the correct scanning PPI is: 

Scanning PPI = (Longest side of the original * 600 PPI) / Longest side of the negative
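
The formula can be checked against the 4″ x 5″ example above with a short sketch. The function name scanning_ppi and its parameter names are hypothetical; the 600 PPI target is the one used throughout this document.

```python
def scanning_ppi(original_long_side_in, negative_long_side_in, target_ppi=600):
    """PPI at which to scan a film intermediary so the original is captured at target_ppi."""
    return original_long_side_in * target_ppi / negative_long_side_in


ppi = scanning_ppi(5, 1.5)   # 5" original photographed onto a 1.5" (35mm) negative
pixels_long_side = ppi * 1.5  # pixels captured along the longest side of the negative
print(ppi, pixels_long_side)  # 2000.0 3000.0
```

The result matches the worked example: scanning at 2000 PPI gives 3000 pixels, which is 600 PPI when spread over the 5″ side of the original.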


California Digital Library
Digital Image Format Standards