Previously, we shared one stream of the Smithsonian Libraries and Archives’ linked data experiments: Wikidata, which is built on a wiki platform. In this post, we share another linked data experiment. This one focuses on transforming traditional bibliographic data (the library catalog) through the Shared Virtual Discovery Environment (SVDE) BIBFRAME project.
Linked open data is often considered synonymous with the semantic web, in which structured data is interlinked across the web and can be queried with tools such as SPARQL. The data can then be represented in different visualizations: data graphs, charts, bar charts, maps, timelines, and more. Cultural heritage organizations such as the Smithsonian Institution, and specifically the Smithsonian Libraries and Archives, have been working diligently to adapt their internal data accordingly. To reap the benefits of linked open data, we’ve moved toward a complete conversion of our existing library catalog data to this format, and we have come away with an exciting initial result.
The international library community has been setting policies and encouraging best practices for bibliographic description. These practices move away from self-contained “document style” records and toward descriptive data in which statements about a resource (an object, a thing) are assembled more flexibly. How data gets assembled and reassembled will be based on user preferences and needs, and new connections can easily be made among resources: Author A published Title A; Title A has Illustrator A; Illustrator A published Title B; and so on.
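To make the “Author A / Title A” chain concrete, here is a minimal Python sketch (not SVDE code). It stores each statement as a separate (subject, predicate, object) triple and then follows the links from one resource to every resource connected to it. The resource names and predicate labels are hypothetical placeholders taken from the pattern above.

```python
from collections import defaultdict

# Hypothetical statements mirroring the "Author A / Title A" pattern.
# Each statement is a (subject, predicate, object) triple.
STATEMENTS = [
    ("Author A", "published", "Title A"),
    ("Title A", "hasIllustrator", "Illustrator A"),
    ("Illustrator A", "published", "Title B"),
]


def build_graph(statements):
    """Index triples so each resource lists its neighbors in both directions."""
    graph = defaultdict(set)
    for subject, _predicate, obj in statements:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def connected_resources(graph, start):
    """Return every resource reachable from `start` by following statements."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    seen.discard(start)
    return seen


graph = build_graph(STATEMENTS)
print(sorted(connected_resources(graph, "Author A")))
```

Because each statement is stored on its own, adding one new triple (say, a second title by Illustrator A) creates new paths immediately, without rewriting any existing record.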
The international Shared Virtual Discovery Environment (SVDE) project gave the Libraries and Archives the opportunity to model our library catalog records to realize these goals. The hallmark of this entity-relationship data model is that data is accompanied by URIs (identifiers that begin with an HTTP prefix). In this post, we share how the SVDE platform aligned Smithsonian Libraries and Archives catalog records with structured data standards, allowing them to be exposed directly on the open web.
Enriching Horizon Bibliographic Data
The Libraries and Archives’ current bibliographic data hosting service is Horizon, an older integrated library system that is no longer in active research and development. Its infrastructure barely keeps pace with the library’s basic operations, and its ability to handle innovative approaches to data representation and visualization is no longer adequate. As a result, our library data cannot connect natively with external sources. Library staff have been investigating a replacement for several years.
Recognizing that the academic, research, and national library communities are moving toward a linked open data environment for library data, a group of Libraries and Archives staff embarked on a series of linked data modeling experiments. Encouraged by progress at other libraries, the Smithsonian team began projects in 2019 to enhance the library’s catalog of bibliographic data and transform it to the BIBFRAME model.
The first task was to expand names (corporate bodies, persons, families) and LCSH (Library of Congress Subject Headings) into a form that supports identifiers, in addition to publishing library data on the web. The next step was incorporating FAST (Faceted Application of Subject Terminology) headings to enable grouping by topic, chronology, geographic name, and so on. These preparatory activities paved the way for the library’s linked open data journey. Months into the project, only a small number of catalog records had received minimal enhancement; in all, approximately ten thousand titles were treated over a two-year period. With the support of an internal fund, the library’s bibliographic linked open data project was then scaled up to nearly 2 million records. With help from the SVDE initiative, library data were moved out of silos and exposed to the open web. This allowed us to leverage the technology and expertise of the well-established SVDE team, along with the collections and structured data expertise of scores of member libraries, to move the Smithsonian Libraries and Archives toward interoperability. This helps us achieve the strategic goals set forth by Secretary Lonnie Bunch.
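FAST takes a precoordinated LCSH heading apart into independent facets. The sketch below illustrates that idea in simplified form. It assumes MARC 650 subfield codes as input ($a topical term, $x general subdivision, $y chronological, $z geographic, $v form). The sample heading is hypothetical, and the actual FAST conversion rules are more detailed than this mapping.

```python
# Simplified mapping from MARC 650 subfield codes to FAST-style facets.
# Illustrative only: real FAST conversion rules are more detailed.
FACET_BY_SUBFIELD = {
    "a": "topical",        # main topical term
    "x": "topical",        # general subdivision stays with the topic
    "y": "chronological",  # chronological subdivision
    "z": "geographic",     # geographic subdivision
    "v": "form/genre",     # form subdivision
}


def facet_heading(subfields):
    """Split one precoordinated heading into a dict of facet -> terms."""
    facets = {}
    topical_parts = []
    for code, value in subfields:
        facet = FACET_BY_SUBFIELD.get(code)
        if facet is None:
            continue  # skip subfields this sketch does not model
        if facet == "topical":
            topical_parts.append(value)
        else:
            facets.setdefault(facet, []).append(value)
    if topical_parts:
        # Rejoin $a and $x with the LCSH-style "--" separator.
        facets["topical"] = ["--".join(topical_parts)]
    return facets


# Hypothetical heading: Wild flowers--North America--Pictorial works
heading = [("a", "Wild flowers"), ("z", "North America"), ("v", "Pictorial works")]
print(facet_heading(heading))
```

Once the heading is split, each facet can carry its own identifier, so records can be grouped by place or by form independently of topic.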
Share-VDE (SVDE) Initiative
In 2016, the research and development team of Casalini Libri (an international bibliographic and authority data provider), @Cult (a library systems provider), Stanford University Library, and a group of academic, research, and national libraries from North America and Europe came together to realize a linked open data vision. The resulting SVDE is a library-driven initiative to create “the cleanest, most reconciled pool of collective data shared across collections” and to shape the future of bibliographic linked open data development.
Smithsonian Libraries and Archives’ bibliographic and authority records were “dropped off” for SVDE teams to transform into linked open data using the BIBFRAME 2.0 standard. This included extracting heading strings and resolving them to web identifiers (URIs with an HTTP prefix), then running live queries against open access resources with SVDE API tools. This journey through the 24 processes for library bibliographic data successfully moved document-type records to an entity-relationship data structure. It fulfills the SVDE initiative’s promises: to increase the discoverability of library collections by enriching library data with URIs, to allow wider and more direct interaction with linked data in the SVDE cluster knowledge database (known as Sapientia), and to keep pace with semantic web applications and developments.
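Resolving heading strings to URIs is essentially a reconciliation step: normalize the text, then look it up in an authority index. The sketch below is a hedged illustration of that idea, not the SVDE API or its 24 processes. The local `AUTHORITY_INDEX` dictionary and its example.org URIs are hypothetical stand-ins for the live authority services that SVDE queries.

```python
import re
import unicodedata

# Hypothetical local authority index. The URIs are placeholders,
# not real authority identifiers.
AUTHORITY_INDEX = {
    "walcott, mary vaux": "https://example.org/authorities/names/0001",
    "wild flowers": "https://example.org/authorities/subjects/0001",
}


def normalize_heading(heading):
    """Normalize a heading so trivially different forms compare equal."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", heading)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse runs of whitespace and ignore case.
    text = re.sub(r"\s+", " ", text).strip().lower()
    # Drop trailing cataloging punctuation such as a final period.
    return text.rstrip(" .,;:")


def reconcile(heading, index=AUTHORITY_INDEX):
    """Return the URI for a heading string, or None if there is no match."""
    return index.get(normalize_heading(heading))


for heading in ["Walcott, Mary Vaux.", "Wild  flowers", "Unknown heading"]:
    print(heading, "->", reconcile(heading))
```

Exact matching after normalization is the simplest possible strategy. Unmatched headings (the `None` case) are where a real workflow would fall back to live queries or human review.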
The SVDE project greatly extended the earlier improvements that Smithsonian Libraries and Archives staff had made to library bibliographic data. Over a two-year period, staff were able to apply linked open data processes to only a little over ten thousand bibliographic records. The SVDE team processed close to 2 million bibliographic records in a short period of time. In addition, our library data is now in the SVDE Entity Discovery Portal and Linked Data Management System.
So what do we mean by transitioning document-type catalog records to entity-relationship structured data? Let’s look at an example: the celebrated work North American wild flowers, by Mary Vaux Walcott, wife of our fourth Secretary, Charles Doolittle Walcott.
Entity-relationship descriptions parse each field into separate components. When a URI is available for a name (of a person, place, topic, etc.), SVDE links the text string to its corresponding URI, as seen in the table below.
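Here is a rough sketch of parsing one field into components and attaching a URI where one is known. The field is written in common MARC display notation ($a name, $d dates) and is illustrative rather than copied from the Smithsonian record. `NAME_URIS` and its placeholder URI are hypothetical.

```python
# Hypothetical lookup from name strings to URIs (placeholder URI).
NAME_URIS = {
    "Walcott, Mary Vaux": "https://example.org/authorities/names/0001",
}

# Human-readable labels for the subfield codes this sketch handles.
SUBFIELD_LABELS = {"a": "name", "d": "dates", "e": "relator term"}


def parse_subfields(field):
    """Split '$aName,$d1860-1940.' into (code, value) pairs.

    Assumes '$' only ever marks a subfield boundary.
    """
    pairs = []
    for chunk in field.split("$")[1:]:
        if chunk:
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


def to_components(field):
    """Turn one text field into labeled components, linking names to URIs."""
    components = []
    for code, value in parse_subfields(field):
        clean = value.rstrip(" ,.")  # drop trailing cataloging punctuation
        component = {"label": SUBFIELD_LABELS.get(code, code), "value": clean}
        uri = NAME_URIS.get(clean) if code == "a" else None
        if uri:
            component["uri"] = uri
        components.append(component)
    return components


for component in to_components("$aWalcott, Mary Vaux,$d1860-1940."):
    print(component)
```

The name component gains a `uri` key, while the dates remain a plain value. This mirrors the mix of linked and unlinked strings in an entity-relationship description.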
The SVDE Entity Discovery Portal user interface (UI) offers a clean front-end design with simple and advanced search options and seamless viewing of entity pages and their embedded links, including data from external services (such as images from Wikimedia Commons, descriptions from Wikidata, articles from Wikipedia, and others). Pages also offer a language preference for navigation.
Compare the results of a search for “Mary Vaux Walcott.” The library’s current catalog presents a document-style description with no links to external resources. In the SVDE Entity Discovery Portal, by contrast, the entity-relationship data model embeds external information from Wiki services: a Wikidata description and a Wikipedia article about the author, Mary Vaux Walcott, appear seamlessly as part of the search results. Additional links to external sources, such as identifiers for an entity, contributor, or publisher, are clearly represented.
The related subject links for the book North American wild flowers cover diverse types, including the genre/form of the publication. Opening up the Library of Congress Subject Headings in this way gives users greater opportunity to connect to similar materials or topics.
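One way to see how linked subjects connect similar materials is an inverted index keyed by subject URI: any two titles that share a URI become related. In this minimal sketch, the titles and example.org subject URIs are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each title carries the URIs of its subject headings.
RECORDS = {
    "Title A": {"https://example.org/subjects/wild-flowers",
                "https://example.org/subjects/pictorial-works"},
    "Title B": {"https://example.org/subjects/wild-flowers"},
    "Title C": {"https://example.org/subjects/geology"},
}


def index_by_subject(records):
    """Map each subject URI to the set of titles that use it."""
    index = defaultdict(set)
    for title, subjects in records.items():
        for uri in subjects:
            index[uri].add(title)
    return index


def related_titles(title, records, index):
    """Return the titles that share at least one subject URI with `title`."""
    related = set()
    for uri in records[title]:
        related |= index[uri]
    related.discard(title)
    return related


index = index_by_subject(RECORDS)
print(sorted(related_titles("Title A", RECORDS, index)))
```

Matching on URIs rather than on heading text means two records stay connected even if their display strings differ in spelling or punctuation.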
To maintain data quality and assurance, SVDE is designing an entity editor for library staff to manage and curate information. SVDE’s back-end technology prepared the Smithsonian Libraries and Archives data for indexing, clustering, searching, and representation. This work helps data providers like the Smithsonian reap the benefits of linked data and connect library collections on the web in a user-friendly and informative manner.
In the next few months, the Libraries and Archives expects to receive the enhanced bibliographic records with URIs in our traditional MARC format, as well as in the transformed BIBFRAME RDF format. We will then have opportunities to further investigate how the new data formats can help realize the potential of descriptive data for visualization, fulfill our goals, and connect library collections with users and data consumers. We are excitedly edging forward into the semantic web world of publishing and research.
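For a sense of what BIBFRAME RDF output looks like, the sketch below writes a minimal Work/Instance pair as N-Triples. It uses terms from the BIBFRAME 2.0 vocabulary (`bf:Work`, `bf:Instance`, `bf:instanceOf`, `bf:title`/`bf:mainTitle`, `bf:contribution`/`bf:agent`). This is a hand-rolled illustration, not the SVDE conversion. Real records carry many more statements, and the example.org URIs are placeholders.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap a URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def work_to_ntriples(work_uri, instance_uri, main_title, agent_uri):
    """Emit a minimal BIBFRAME-style Work/Instance description as N-Triples."""
    triples = [
        (iri(work_uri), iri(RDF_TYPE), iri(BF + "Work")),
        # The title is a blank node of type bf:Title holding bf:mainTitle.
        (iri(work_uri), iri(BF + "title"), "_:title"),
        ("_:title", iri(RDF_TYPE), iri(BF + "Title")),
        ("_:title", iri(BF + "mainTitle"), literal(main_title)),
        # The contribution is a blank node linking to the agent's URI.
        (iri(work_uri), iri(BF + "contribution"), "_:contrib"),
        ("_:contrib", iri(RDF_TYPE), iri(BF + "Contribution")),
        ("_:contrib", iri(BF + "agent"), iri(agent_uri)),
        # The Instance (a published manifestation) points back to its Work.
        (iri(instance_uri), iri(RDF_TYPE), iri(BF + "Instance")),
        (iri(instance_uri), iri(BF + "instanceOf"), iri(work_uri)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(work_to_ntriples(
    "https://example.org/work/1",
    "https://example.org/instance/1",
    "North American wild flowers",
    "https://example.org/agent/1",
))
```

In a real record, the agent URI would come from the reconciliation step rather than from a placeholder. That URI is what lets the same person link across many Works.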