Bringing Books* to the Web

For the past couple of months, our web development work has focused on building the foundation for presenting digitized book-like things on the Smithsonian Libraries website. It has been an interesting time, creating a home for the history, art, and culture portion of our scanned collections.

As you probably know, most or all of our natural history digital content already has a home in the form of the Biodiversity Heritage Library. The history, art, and culture portion of our scanned collections, however, doesn’t yet have a home on the web. You can certainly visit the Internet Archive page for the Smithsonian Libraries collections, but that site isn’t very user-friendly, and its search leaves something to be desired. Hence, we need a place to present these scanned materials in a way that can be linked to now and forever.**

It sounds easy on the surface: just import some information, link to the item at the Internet Archive, wave a magic wand, speak the incantation, and voilà! Instant digital library! It is not quite that simple, though; we faced a number of challenges in creating the system.

First, we needed to wrangle the data into a more web-friendly form, which meant translating and condensing MARC data into something simpler and more consumable by the average user. Next, because a full page of text-only search results isn’t the most interesting thing to read, even when browsing a list of books, we had to devise a reliable method of extracting cover images from the Internet Archive. (In reality, covers are often quite boring, so we end up using the title page, which offers a surprising variety of images.) Finally, subject headings, the last piece of the data puzzle, need to be handled in an elegant yet simple manner, and organizing and structuring them for the web continues to present us with some challenges.
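To make the MARC condensation step more concrete, here is a minimal Python sketch. It assumes records have already been parsed into simple (tag, subfields) tuples; the `SimpleRecord` fields, the `condense` and `subject_terms` helpers, and the sample record are all hypothetical, not the module's actual code. It keeps a handful of MARC 21 fields (245 title, 100 author, 260/264 publication, 650 topical subjects) and flattens each 650 heading into its main term plus subdivisions, which is one possible starting point for structuring subjects on the web.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical in-memory shape for a MARC record: (tag, [(subfield_code, value), ...]).
# This is NOT the data model of pymarc or any other MARC library.
Subfields = List[Tuple[str, str]]
MarcRecord = List[Tuple[str, Subfields]]

# Trailing ISBD-style punctuation often left at the end of MARC subfields.
TRAILING_PUNCT = " /:;,."
# 650 subfields kept: $a main term, $v form, $x general, $y chronological, $z geographic.
SUBJECT_CODES = {"a", "v", "x", "y", "z"}


@dataclass
class SimpleRecord:
    """A flattened, web-friendly view of a catalog record."""
    title: str = ""
    author: str = ""
    publisher: str = ""
    date: str = ""
    subjects: List[str] = field(default_factory=list)


def join_subfields(subfields: Subfields, codes: Set[str]) -> str:
    """Join the requested subfields in record order and trim trailing punctuation."""
    parts = [value.strip() for code, value in subfields if code in codes]
    return " ".join(parts).rstrip(TRAILING_PUNCT)


def subject_terms(subfields: Subfields) -> List[str]:
    """Main heading and its subdivisions, in the order they appear in the field."""
    return [value.strip().rstrip(".") for code, value in subfields if code in SUBJECT_CODES]


def condense(record: MarcRecord) -> SimpleRecord:
    """Map a few MARC 21 fields onto SimpleRecord; indicators are ignored here."""
    simple = SimpleRecord()
    for tag, subfields in record:
        if tag == "245":                      # title statement: $a title, $b remainder
            simple.title = join_subfields(subfields, {"a", "b"})
        elif tag == "100":                    # main entry, personal name
            simple.author = join_subfields(subfields, {"a"})
        elif tag in ("260", "264"):           # publication: $b publisher, $c date
            simple.publisher = join_subfields(subfields, {"b"})
            simple.date = join_subfields(subfields, {"c"})
        elif tag == "650":                    # topical subject heading
            simple.subjects.append(" -- ".join(subject_terms(subfields)))
    return simple


# Hypothetical sample record, for illustration only.
sample: MarcRecord = [
    ("100", [("a", "Doe, Jane,"), ("d", "1800-1870.")]),
    ("245", [("a", "A sample title :"), ("b", "with a subtitle /"), ("c", "by Jane Doe.")]),
    ("260", [("a", "Washington :"), ("b", "Sample Press,"), ("c", "1855.")]),
    ("650", [("a", "Botany"), ("z", "Virginia"), ("v", "Early works to 1800.")]),
]

print(condense(sample))
```

The punctuation trimming is deliberately crude (it would also drop the period from an abbreviation such as "Co."), and keeping `subject_terms` as a list before joining leaves room to build a browsable hierarchy keyed on the main heading.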

When it came to the actual implementation, we could have used a straight import (which in the past has been unreliable for large data sets). But since we are also incorporating the Internet Archive Book Reader and want a more custom page layout than we normally use on our site, we decided to collect all of this functionality into a Drupal module.

But wait! As it turns out, a Drupal module has an installation procedure, which means we can use it to import the 3000+ items we want to display on the site. As a bonus, when we upgrade the module, its database update procedure lets us add to or improve the data, too. A very elegant solution, I must say. (Does this sound familiar? It may, because we’re using a similar technique to move the Taxonomic Literature II project to Drupal, which is still in the works. I gave a talk about it at the SLA Annual Meeting in July.)
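The install-then-update idea can be sketched in a few lines of Python. Drupal itself does this in PHP through a module's `hook_install()` and numbered `hook_update_N()` functions; the code below does not use Drupal at all and only mimics the pattern, with a dict standing in for the database. The batch size, the update number 7001, the `add_sort_title` update, and the generated items are hypothetical. The initial load runs in batches, since a single straight import has been unreliable for large data sets, and each later release ships numbered updates that run only if they are newer than the recorded version.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Database = Dict[str, Record]          # stand-in for the site's tables, keyed by item id
Update = Callable[[Database], None]   # one numbered change to the stored data


def run_updates(db: Database, installed: int, updates: Dict[int, Update]) -> int:
    """Apply, in ascending order, every update newer than the installed version."""
    for number in sorted(updates):
        if number > installed:
            updates[number](db)
            installed = number  # record progress after each completed step
    return installed


def install(db: Database, records: List[Record], updates: Dict[int, Update],
            batch_size: int = 500) -> int:
    """Initial import in fixed-size batches, then replay all known updates."""
    for start in range(0, len(records), batch_size):
        for rec in records[start:start + batch_size]:
            db[rec["id"]] = dict(rec)
        # A real importer might commit or log progress here, once per batch.
    # Replaying the updates leaves a fresh install in the same state as an upgraded site.
    return run_updates(db, 0, updates)


def add_sort_title(db: Database) -> None:
    """Hypothetical data improvement: derive a lowercase sort key from each title."""
    for rec in db.values():
        rec["sort_title"] = rec["title"].lower()


# Hypothetical module releases: the first has no updates, the second adds one.
release_1: Dict[int, Update] = {}
release_2: Dict[int, Update] = {7001: add_sort_title}

items = [{"id": f"item{i}", "title": f"Title {i}"} for i in range(3000)]
db: Database = {}
version = install(db, items, release_1)          # first release: import only
version = run_updates(db, version, release_2)    # later release: improve the data
print(version, len(db), db["item0"])
```

Because each update is numbered and applied at most once, the same release can be deployed to a fresh site or an existing one, and both end up with the same data.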

Once all of the hard work is done, we install this module into the site. With a wave of our magic wand (or a click of the mouse) we create thousands of inter-linked records of data that provide a rich user interface to our digitized collections.

Are you wondering why we spend weeks and weeks of development on this one piece of the site? Well, a lot of it is foundational, and we are creating things that didn’t exist before. But it also means that we now have a method of wrangling the data, so the next time we import data into the site, it will take only a few hours of work to add another few hundred or even a few thousand records. There may even come a day when we can automate this process so that no human is involved and new items appear on the site automatically.
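Repeat imports, and eventually unattended ones, are easiest when importing the same batch twice is harmless. Here is a minimal sketch of that idea in Python, assuming each item is keyed by its Internet Archive identifier (an assumption for illustration; the `import_batch` function and the sample records are hypothetical). New identifiers are added, changed records are updated in place, and unchanged ones are skipped.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def import_batch(catalog: Dict[str, Record], incoming: List[Record]) -> Tuple[int, int, int]:
    """Upsert records keyed by 'identifier'; return (added, updated, unchanged) counts."""
    added = updated = unchanged = 0
    for rec in incoming:
        key = rec["identifier"]
        existing = catalog.get(key)
        if existing is None:
            catalog[key] = dict(rec)          # first time we have seen this item
            added += 1
        elif any(existing.get(k) != v for k, v in rec.items()):
            existing.update(rec)              # refresh changed fields, keep local extras
            updated += 1
        else:
            unchanged += 1                    # identical data: nothing to do
    return added, updated, unchanged


# Hypothetical identifiers and titles, for illustration only.
first_load = [
    {"identifier": "book-a", "title": "Book A"},
    {"identifier": "book-b", "title": "Book B"},
]
next_load = [
    {"identifier": "book-a", "title": "Book A"},
    {"identifier": "book-b", "title": "Book B, corrected"},
    {"identifier": "book-c", "title": "Book C"},
]

catalog: Dict[str, Record] = {}
print(import_batch(catalog, first_load))  # (2, 0, 0)
print(import_batch(catalog, next_load))   # (1, 1, 1)
print(import_batch(catalog, next_load))   # (0, 0, 3): re-running is harmless
```

Keying on a stable identifier could also support the "forever" promise in the footnote: the same item always maps to the same record, so its URL need not change between imports.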

Keep an eye on our website and on this blog. We expect to announce the launch of our new book module very soon.

* Books or book-like things

** Forever means committing never to change the URL of an item or, if the item does move, promising that visitors will not get “lost.”
