Smithsonian Libraries and Archives & Wikidata: Using Linked Open Data to Connect Smithsonian Information

This post is part of our Smithsonian Libraries and Archives & Wikidata series.

Libraries have created and curated metadata describing their collections for a very long time. It is the very essence of the cataloging and metadata profession. This past year, because of the pandemic, the Smithsonian Libraries and Archives initiated a unit-wide pilot project to explore whether, and how, a linked open data platform offered by the Wikimedia Foundation could help transition authority control to identity management.

Propelled by the basic principles prescribed by Tim Berners-Lee, library staff laid the groundwork in 2019 to transition from a text-centric to a data-centric orientation. This involves changing bibliographic description to structured data based on a linked open data standard, and preparing the Libraries and Archives’ MARC data for transformation to RDF triples (MARC is the current standard for machine-readable cataloging records). RDF, or Resource Description Framework, uses URIs (Uniform Resource Identifiers) to identify objects and properties in a structured way. This allows for the creation of rich networks of meaningful data and takes us from the flat world of the textual into a new world of possibilities with linked data.
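To make the idea of an RDF triple concrete, here is a minimal sketch in Python. It is not the Libraries and Archives' actual conversion workflow. The record layout, the `record_to_triples` and `to_ntriples` helpers, and the `https://example.org/agents/` base URI are all assumptions made for illustration; only the RDF, RDFS, and FOAF vocabulary URIs are real.

```python
from typing import NamedTuple

# Real, widely used vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


class Triple(NamedTuple):
    subject: str                  # URI of the thing being described
    predicate: str                # URI of the property
    obj: str                      # URI, or literal text if obj_is_literal
    obj_is_literal: bool = False


def record_to_triples(record: dict, base_uri: str) -> list[Triple]:
    """Turn a simplified, hypothetical name record into two triples."""
    subject = base_uri + record["id"]
    return [
        Triple(subject, RDF_TYPE, FOAF_PERSON),
        Triple(subject, RDFS_LABEL, record["name"], obj_is_literal=True),
    ]


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a line of N-Triples."""
    if triple.obj_is_literal:
        # Escape backslashes first, then double quotes.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


if __name__ == "__main__":
    # Hypothetical record; not taken from any real authority file.
    record = {"id": "agent-0001", "name": "Doe, Jane"}
    for t in record_to_triples(record, "https://example.org/agents/"):
        print(to_ntriples(t))
```

Each printed line is one statement: a subject URI, a property URI, and either another URI or a text value. Because the subject is a URI rather than a text string, other datasets can point at the same entity without having to match on the spelling of the name.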

When news surfaced about the wikifying of the German National Library’s (DNB) GND and the French National Library’s (BnF) FNE authority data, we began investigating Wiki projects as another option for a library linked open data project for name authority data. DNB and BnF have both moved their authority workflow out of their respective integrated library systems and into an open system, by means of a Wiki platform named Wikibase, a powerful MediaWiki software extension. The DNB and BnF Wikibase models showed potential as open, global knowledge repositories similar to Wikidata. Their process seemed replicable in the Smithsonian environment.

Could our library authority data in MARC 21, a standard with roots in the 1960s, transition to an open platform that could stand the test of time, such as Wikidata? Our authority data in Horizon (our integrated library system) is well curated and maintained. However, many of the obstacles to name authority creation for Smithsonian persons in the MARC 21 environment still hinge on the system infrastructure and authority training requirements from the Library of Congress. In addition, authority data is siloed in Horizon and not easily shared, even within the Institution itself.

Many of the authorized names in Horizon represent entities present in collections maintained by other Smithsonian units, namely the databases of the Institution’s various archives, museums, and galleries. Each of these units manages its own name datasets for its carefully curated collections. Each has its own conventions on how names are constructed, based on the standards of the communities that the datasets serve. At present, the Smithsonian has no central database for CPF (corporate bodies, persons, and families) agents. This makes it difficult to reconcile data without human intervention. Too often, when human assistance gets involved, inefficiency sets in, and the quantity of work overwhelms the quality of the database.

Screenshot of a data graph showing network of organizations that are part of the Smithsonian or its constituent parts, from Wikidata.

What if library staff could facilitate the reconciliation of names in all of these Smithsonian databases? With the support of the Discovery Services and Libraries and Archives’ leadership, the telework environment gave library staff an opportunity to embark on an open data project to test this assumption. The project would be similar to those of the DNB and BnF, through the development of a central Wikibase in which units can retain their own preferred name forms tied to a single record for each entity. Although the Smithsonian has a discovery system for collection metadata, it currently has no central hosting system for agents/names, which makes connecting names with collections across units much harder. The model presented by Wikidata, which boasts a great number of volunteers (over 12K registered editors worldwide) and an extremely active API developer community generating numerous powerful applications, might serve as the ideal platform to experiment with new approaches in descriptive content for SI collections.
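The idea of one record per entity, with each unit keeping its own preferred name form, can be sketched as a small data structure. This is only an illustration under stated assumptions, not Wikibase's data model or API. The `AgentRecord` and `AgentRegistry` classes, the unit names, the entity identifier, and the sample name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRecord:
    """One central record for an entity (hypothetical structure)."""
    entity_id: str
    # Maps a unit name to that unit's preferred form of the name.
    preferred_forms: dict[str, str] = field(default_factory=dict)


class AgentRegistry:
    """Toy central registry: many name forms, one record per entity."""

    def __init__(self) -> None:
        self._records: dict[str, AgentRecord] = {}
        self._index: dict[str, str] = {}  # normalized name form -> entity_id

    @staticmethod
    def _normalize(name: str) -> str:
        # Ignore case and collapse runs of whitespace.
        return " ".join(name.casefold().split())

    def add_form(self, entity_id: str, unit: str, name_form: str) -> None:
        record = self._records.setdefault(entity_id, AgentRecord(entity_id))
        record.preferred_forms[unit] = name_form
        # Note: an identical form for a different entity would overwrite
        # this entry; real reconciliation needs disambiguation.
        self._index[self._normalize(name_form)] = entity_id

    def reconcile(self, name_form: str) -> Optional[AgentRecord]:
        """Return the central record that a unit's name form points to."""
        entity_id = self._index.get(self._normalize(name_form))
        return self._records.get(entity_id) if entity_id else None


if __name__ == "__main__":
    registry = AgentRegistry()
    # Hypothetical units and name forms for the same person.
    registry.add_form("EXAMPLE-1", "libraries", "Doe, Jane, 1900-1980")
    registry.add_form("EXAMPLE-1", "archives", "Jane Doe")
    print(registry.reconcile("jane doe"))
```

Both name forms resolve to the same record, while each unit's preferred form stays intact. Exact matching on normalized strings is the simplest possible lookup; names that differ by more than case and spacing would still need human review or fuzzier matching.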

Tune into the second part of this series where we’ll share an overview of the Wikidata projects with the PCC Wikidata Pilot that the Libraries Wikidata team worked on.

Further Reading:

Barbara Fischer. Authority Control meets Wikibase (2019) https://wiki.dnb.de/display/GND/Authority+Control+meets+Wikibase.
French National Entities file (FNE): project overview (2019) https://www.transition-bibliographique.fr/fne/french-national-entities-file/
Wikibase site showcases variety of implementations. https://wikiba.se/showcase/

Libraries Wikidata Team presentations:

Discovery Services Forum on June 23: http://bit.ly/DISC2021Forum0623
LD4 2021 Conference: https://sched.co/joA8
Smithsonian Collections Information Management Committee: http://bit.ly/CIMC202109LibWD

One Comment

Ayman Osman

Dear All,

My name is Ayman. I worked for the Library of Congress for 21 years, and I participated, and still participate, in the BIBFRAME project. I would like to help with applying linked data at the Smithsonian. If available, please let me know and I can send you my resume as well. Thanks

Ayman

March 29, 2022