Routine processing of library books frequently means using shelves and other spaces as staging areas for incoming and in-process items. As gifts and purchased books are acquired, cataloged and labeled, librarians typically work on them in batches, sorting onto separate shelves those which have not yet been searched in the catalog, those which represent additional copies for the collection, those which require a certain level of cataloging, and so on. As the books make their way into the library and, ultimately, to the reader, they are moved from place to place in the back rooms of library work areas.
Managing digital content for the Smithsonian Digital Repository involves a similar set of steps. Instead of books staged on shelves or carts, however, the incoming items are computer files, which are temporarily stored in various folders and sub-folders and moved from one to another after certain procedures are completed on each batch.
This presents a new set of challenges and a slightly different workflow from the one librarians are used to. One of the big changes in working with digital texts is that it is not always clear what a file contains simply by looking at the filename; just as you can’t judge a book by its cover, you can’t judge a digital book by its filename. When we deal with books, most of us can tell generally what book we have by reading the spine or the cover (although complete and accurate bibliographic information still requires an examination of the title page or other cataloging data inside the item). But with computer files, we often have a large batch of similarly named items (for example: Am_zool_2011_pinniped_ecology.pdf). Filenames can contain descriptive information, but when processing items for digital library collections it is necessary to open and view each file to be certain that the matching publication record has been identified.
An easy-to-overlook part of handling digital (versus physical) content is avoiding duplication of material. Backups are of course essential when working with digital material, but because copying and replication are so easy, those who handle digital materials in libraries and archives have to make sure that they don’t create extra copies (and therefore extra work for themselves) during processing. For instance, once a batch of electronic reprints has been checked against the Repository to ensure that the items are not already there, the files are moved to a different folder for the next workflow step. It is important to move them and not simply copy them, since copying leaves the content in the original folder, where it will be searched a second time unnecessarily. Moving books from shelf to shelf does not pose this risk, but inadvertently duplicating files rather than moving them makes it easy to duplicate effort as well.
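For readers who script this kind of step, the difference between moving and copying a batch can be sketched in a few lines of Python. This is only an illustration, not the Repository's actual tooling: the function name, the folder arguments and the `*.pdf` pattern are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def move_batch(source_dir, dest_dir, pattern="*.pdf"):
    """Move every file matching pattern from source_dir to dest_dir.

    Moving (rather than copying) removes the processed files from the
    source folder, so they cannot be searched a second time.
    The folder names and default pattern are illustrative assumptions.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in sorted(source.glob(pattern)):
        target = dest / path.name
        if target.exists():
            # Never overwrite: a same-named file may be a duplicate
            # that needs a person to review it.
            continue
        shutil.move(str(path), str(target))
        moved.append(target.name)
    return moved
```

Calling `move_batch("checked", "ready_for_upload")` (both folder names hypothetical) would return the names of the files that were moved and leave behind only those whose names already exist in the destination, which is a useful signal that a possible duplicate needs a closer look.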
The Smithsonian Libraries matches incoming digital reprints with their corresponding publications in the Smithsonian Research Online (SRO) database. During 2012, the SIL Discovery Services Division staff typically matched 75-100 items per month, which were then uploaded to the Repository. But they reviewed a far greater number, since some items have to be removed from the normal workflow for various reasons. These include items that have no corresponding publication in the SRO database, items that are poorly scanned, and items that are not easily identifiable from the first page of the document. It is these “problems” that have to be moved to a separate folder for further work, just as physical books are moved to a separate shelf or book cart for further processing.
Processing digital information is a fairly new workflow for library staff, and although it mirrors the management of printed books in many ways, it presents a set of unique challenges that the SIL has (so far) successfully met. This is evident not only from personal observation of the process but also from the fact that over 16,000 digital articles have been added to the Repository in under six years.