For my internship, I was tasked with developing a workflow for the ingest of image files into the Smithsonian’s relatively new Digital Asset Management System (DAMS). My guide is meant to serve as a model for future SIL old media and legacy data ingest projects. It describes the steps needed to bring previously scanned and photographed items together for retrieval, storage, and preservation. Long-term, the plan is for each Smithsonian digital image to be ‘backed up’ within the DAMS rather than on separate computers, hard drives, CDs, etc. Additionally, the DAMS will serve as a central repository for digital images, searchable across the institution – facilitating inter-departmental image discovery and use in research, exhibitions, projects, and enterprise.
In order to provide this service, each image in the DAMS must have associated metadata – data about the data object, in this case information about the image. Because the DAMS searches this metadata, it is what allows relevant images to be retrieved. The metadata already exists in various content management systems used at SIL – we just need to pull that information out of the databases and into the DAMS.
To create the workflow for this process, I have been working with one of SIL’s many digital image collections: the Galaxy of Images, a high-use collection of digital photographs and scans.
[Side note: You may recall past intern Simon Underwood’s post on adding metadata to a subset of this particular collection. One of the best things about working at the Smithsonian is the knowledge that you are building on the great work of those who came before you, and that your work, in turn, will provide the foundation for the next project!]
To date, the Galaxy of Images contains over 15,000 images. To more easily work with this many files, various sub-collections within the Galaxy of Images were identified and prioritized based on usage statistics and number of files. SIL’s Seed Catalog image collection was the first batch of images to go through testing.
Before uploading images into the DAMS, we want to ensure that the metadata associated with each image is part of the image, rather than in a separate file (sometimes called a ‘side car’). In this way, the metadata will exist within the image file structure itself, rather than relying on an additional file or database. This means that no matter where the file is being used, information such as title, date, author, and copyright is accessible to the user.
In order to ‘embed’ this metadata within the file, we must extract this information from the database, associate it with the correct files, export the metadata in a usable format, and run it through a program that can embed the data in batches, using fields defined by a metadata standard called IPTC. Once the metadata has been embedded, the images can be ingested into the DAMS without the need to re-type all of that corresponding information. The files are searchable within the DAMS, and their metadata will always travel with them no matter where they are downloaded or copied!
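For readers who like to see the steps spelled out, here is a rough Python sketch of the "associate and export" part of that process. It is an illustration only, not the workflow documented in my guide: the column names (`filename`, `title`, `creator`, `date`, `rights`), the sample record, and the function names are all made up for the example. The output follows the CSV layout that the ExifTool command-line tool accepts for batch import (a `SourceFile` column plus one column per tag), but ExifTool is my assumption here, since any batch embedding program that understands IPTC fields would fill the same role.

```python
import csv
import io

# Hypothetical mapping from database export columns to IPTC fields.
# The column names on the left are invented for this example; the tag
# names on the right follow ExifTool's naming for IPTC fields.
FIELD_MAP = {
    "title": "IPTC:ObjectName",
    "creator": "IPTC:By-line",
    "date": "IPTC:DateCreated",
    "rights": "IPTC:CopyrightNotice",
}


def build_embed_rows(records, image_names):
    """Match exported database records to image files.

    Returns (rows, unmatched): rows are ready for export, and unmatched
    lists filenames from the database that have no image on disk.
    """
    available = set(image_names)
    rows, unmatched = [], []
    for rec in records:
        name = (rec.get("filename") or "").strip()
        if name not in available:
            unmatched.append(name)
            continue
        row = {"SourceFile": name}
        for column, tag in FIELD_MAP.items():
            row[tag] = (rec.get(column) or "").strip()
        rows.append(row)
    return rows, unmatched


def write_embed_csv(rows, out):
    """Write the rows as a CSV with SourceFile first, then one column per tag."""
    writer = csv.DictWriter(out, fieldnames=["SourceFile", *FIELD_MAP.values()])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data standing in for a real database export.
    sample = [
        {"filename": "seed001.jpg", "title": "Example seed catalog cover",
         "creator": "Unknown", "date": "1900", "rights": "Example rights statement"},
        {"filename": "not_on_disk.jpg", "title": "Record without an image"},
    ]
    rows, unmatched = build_embed_rows(sample, ["seed001.jpg"])
    buffer = io.StringIO()
    write_embed_csv(rows, buffer)
    print(buffer.getvalue())
    print("No image found for:", unmatched)
    # Assuming ExifTool, the saved CSV could then be applied to a folder with:
    #   exiftool -csv=metadata.csv <image_folder>
```

Keeping a list of unmatched filenames is a deliberate choice: with thousands of legacy files, records that point at missing images are worth reviewing by hand before anything is embedded.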
As my time at the Smithsonian comes to a close, I am pleased to leave behind a comprehensive document detailing the process for working with legacy data. This complements OCIO’s DAMS guidebooks by providing specific information for working with SIL’s digital image files and their associated metadata. Using this workflow, nearly 3,000 files from the Galaxy of Images now have embedded metadata and are in the DAMS waiting to be summoned – or discovered – by staff Smithsonian-wide!
Jacqueline Chapman is a Smithsonian Institution Libraries Professional Development Intern for the Digital Asset Management System Workflow Project. In December, Jacqueline will graduate from the School of Information and Library Science at the University of North Carolina – Chapel Hill with a Master of Science in Library Science (Concentration in Archives and Records Management), along with a Certificate in Nonprofit Leadership.