Over the past two years, Smithsonian Libraries and Archives has embarked on a linked data journey along with many other libraries in the Program for Cooperative Cataloging (PCC) Wikidata pilot project. From October 2020 to August 2021, the Libraries Wikidata team experimented with creating and maintaining name authority in a completely new way, including plans to install a decentralized Wikidata instance (Wikibase) that would meet Smithsonian policies and best practices. This is the second part in the Wikidata blog post series; be sure to read our previous post for additional information.
Smithsonian Libraries and Archives’ Wiki initiation commenced with a Wikidata workshop held in November 2019 with Andrew Lih, currently the Wikimedian-In-Residence for the American Women’s History Initiative. The outgrowth of the workshop was a name reconciling project using carefully curated name data. During the pandemic, this process was expanded to include additional staff and two name datasets: 1) the Art and Artist Files database and 2) a portion of Smithsonian American Art Museum’s artist names from its database.
For the last two years, the Libraries’ Wiki project has mainly focused on Wikidata and Wikibase, and briefly experimented with Wikimedia Commons for images as part of the Smithsonian’s PCC Wikidata Pilot Projects (Oct 2020-Sep 2021).
Wikidata, launched in 2012, is a global and open knowledge repository of structured data that serves as a hub for linking resources. This linked open data information cloud attracts and integrates authority data from many libraries. Wikidata quickly became the authority knowledge base of choice in libraries and commercial institutions for names of people, places, etc. Its structured data gives developers a way to create tools that query and present findings on trending topics, such as the resources which impact or are impacted by the COVID-19 pandemic (http://coviwd.org).
Wikidata has become a high-demand library authority identifier clearing house. The PCC Policy Committee recognized the platform could play a role in its effort to transform authority control into identity management. In September 2020, it called for a pilot among the PCC member libraries. The Smithsonian Libraries and Archives assembled a team in October 2020 to participate, with the projects carried out as telework during the pandemic.
The Wikidata team prioritized the following goals in order to create cohesive processes for names (identity) management for the Smithsonian’s collections:
- The creation and curation of names for CPF (corporate bodies, persons and families), collections, and publications for the Institution.
- Adopting replicable workflows that SI units could use beyond the Libraries and Archives’ cataloging or metadata professionals.
- Increasing professional curiosity toward descriptive data and what it could offer to users as a service.
- Transitioning to a localized deployment of a Wikidata model (in Wikibase) that meets Smithsonian policy and best practices guidelines.
- Encouraging ingenious API tools development to feature Smithsonian collections.
- Forming collaborative efforts with colleagues within and beyond the SI walls.
The five projects from the Libraries and Archives’ Wikidata team for the PCC Wikidata Pilot Project are as follows:
1) African ethnic groups
Reconcile, edit and/or add African ethnic group names (ca. 250) currently used in local subject headings by the Warren M. Robbins Library of the National Museum of African Art.
2) Artist files
Reconcile, edit and/or add the artists’ descriptive data matched in two SI artists databases (the Libraries and Archives’ Art and Artists Files and Smithsonian American Art Museum’s artists databases), which amounted to 3,797 artists.
3) Chinese ancestors portraits (primarily royal family members of the Qing Dynasty of Manchu ethnic group)
Review and augment for accuracy Wikidata statements for 90 names matched to the Freer and Sackler Galleries collection of Chinese Ancestors portraits.
4) Dibner scientists portraits
Reconcile, edit and/or add the scientists and artists featured in the Dibner Library of the History of Science and Technology’s collection of portraits included on the Scientific Identity website.
5) Smithsonian researchers and their publications from the Smithsonian Research Online (SRO)
Reconcile, enhance and/or add names for the notable curatorial and research staff from the Smithsonian Profiles website and review the representation of Smithsonian in Wikidata.
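Most of the projects above begin by reconciling a local name against existing Wikidata items. As a rough illustration only, and not the team's actual workflow, the sketch below asks Wikidata's public `wbsearchentities` API for candidate matches to a name string. The function names and the User-Agent value are hypothetical; a person would still need to review each candidate before linking or editing anything.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=5):
    """Build a wbsearchentities request URL for one name string."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def extract_candidates(response):
    """Reduce an API response to (QID, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def find_candidates(name):
    """Fetch candidate matches for a name (requires network access)."""
    request = urllib.request.Request(
        build_search_url(name),
        # Hypothetical User-Agent; replace with one identifying your project.
        headers={"User-Agent": "name-reconciliation-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_candidates(json.load(reply))
```

Calling `find_candidates` with an artist's name returns a short list of candidate items, each with its identifier, label, and description, which a cataloger can compare against the local record.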
Each project identified its specific aspects, focus, and workflows, which are documented on the Smithsonian PCC Wikidata Pilot Page. The project team met weekly; subgroups met bi-weekly or as needed. The team also ran a few sample queries to showcase contributions to the overall Smithsonian collections in the Wikidata landscape. These queries were gathered into a dashboard highlighting various characteristics of each individual project in tables, maps, and graphs.
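To give a sense of what a dashboard query of this kind might look like, here is a minimal sketch that builds a SPARQL query for the Wikidata Query Service and tallies items carrying a given identifier property by country of citizenship (P27). It is not one of the team's queries. The helper names are hypothetical, and the example property P1795 is assumed here to be the Smithsonian American Art Museum person/institution ID; check the property on Wikidata before relying on it.

```python
import json
import re
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(property_id, limit=10):
    """Count items that have `property_id`, grouped by citizenship (P27)."""
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError("expected a property ID such as 'P1795'")
    return (
        "SELECT ?countryLabel (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item wdt:{property_id} ?identifier ;\n"
        "        wdt:P27 ?country .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        "GROUP BY ?countryLabel\n"
        "ORDER BY DESC(?count)\n"
        f"LIMIT {int(limit)}"
    )


def rows_from_results(results):
    """Turn SPARQL JSON results into (country label, count) tuples."""
    return [
        (binding["countryLabel"]["value"], int(binding["count"]["value"]))
        for binding in results["results"]["bindings"]
    ]


def run_count_query(property_id):
    """Send the query to the public endpoint (requires network access)."""
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": build_count_query(property_id), "format": "json"}
    )
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent; replace with one identifying your project.
        headers={"User-Agent": "dashboard-query-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        return rows_from_results(json.load(reply))
```

The rows returned by `run_count_query("P1795")` could then feed a table or chart; swapping P27 for another property would produce a different breakdown.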
Wikibase, an extension of MediaWiki, is the software that powers Wikidata. It offers a suite of open source software for creating a collaborative knowledge base. It also allows for localized configuration and an option to federate with other Wikibase installations and Wikidata at-large.
Several institutions have established substantive workflows for deploying a decentralized Wikidata utilizing Wikibase software. Examples include Rhizome’s Artbase, the Digital archive of artists’ publishing (DAAP), Enslaved.org, Linked Jazz, DNB’s GND, FactGrid, Luxembourg’s Shared Authority File, and Europeana Eagle.
Members of the Libraries and Archives’ team were exposed to the richness and potential of structured data describing the collections that they are passionate about. In addition, participating staff gained new technological skills and new approaches to information organization in a linked and open repository like Wikidata. And they are excited about the potential for deployment of a local Wikibase instance that enables us to better address Smithsonian internal policies and formulate best practices and guidelines for our workflow.
At the time of writing, the Smithsonian Libraries and Archives has been piloting a Wikibase instance and investigating a limited set of its functionalities. To date, the team has created over 200 properties and close to 1,590 items, currently only accessible to Smithsonian staff.
The substantial work of the Libraries and Archives’ Wikidata team has been recognized by colleagues around the world. Our team members are part of a larger community helping to shape developments and features of Wikidata and Wikibase. For instance, Smithsonian Libraries and Archives was one of the first organizations that the WikiLibrary Manifesto, spearheaded by the German National Library, invited to sign the initiative. Throughout Fiscal Year 2021, the Wikidata team conducted several presentations illustrating the potential of Wikidata as a viable tool for our collections.
In an effort to further unveil our collections through digital solutions, the Wikidata team is devising future plans to continue wikifying entities for the collections in a localized Wikibase to accommodate Smithsonian policy and best practices, without sacrificing discovery and reuse of Smithsonian data!
- Barbara Fischer. Authority Control meets Wikibase (2019): https://wiki.dnb.de/display/GND/Authority+Control+meets+Wikibase.
- French National Entities file (FNE): project overview (2019): https://www.transition-bibliographique.fr/fne/french-national-entities-file/
- Wikibase site showcases variety of implementations: https://wikiba.se/showcase/
Smithsonian Libraries and Archives’ Wikidata Team presentations:
- Discovery Services Forum on June 23: http://bit.ly/DISC2021Forum0623
- LD4 2021 Conference: https://sched.co/joA8
- Smithsonian Collections Information Management Committee: http://bit.ly/CIMC202109LibWD