Supporting Access to Zoological Literature: Article Definition in the Biodiversity Heritage Library

This post was written by Katerina Ozment, part of the Smithsonian Libraries’ 50th Anniversary 2019 Intern Class, funded by the Secretary of the Smithsonian and the Smithsonian National Board. At that time she was an undergraduate at the University of Oklahoma, majoring in History and Biology. Katerina is now a graduate student at the University of Tennessee, College of Communication and Information, School of Information Sciences. The internship program is now the Smithsonian Libraries and Archives’ 50th Anniversary program.

For zookeepers to most effectively care for their animals, they need access to zoological research, as well as a way to communicate with other zookeepers. One way for zookeepers to do this is through participation in professional organizations such as the American Association of Zoo Keepers (AAZK) and its publication, Animal Keepers’ Forum (AKF). AKF contains current research, husbandry techniques, animal enrichment activities, conservation news, and other topics.  

Due to AKF’s role in facilitating this kind of communication, Smithsonian Libraries (now Smithsonian Libraries and Archives) requested permission from AAZK to digitize the Libraries’ copies of AKF and make them available through the Biodiversity Heritage Library (BHL). BHL is an open access digital library for biodiversity works. Smithsonian Libraries and Archives is one of the few BHL member libraries that supports an active zoo and therefore has a unique commitment to providing for this user community in BHL.

Although the publication was already available online, searching for specific articles remained difficult. This is because AKF was uploaded as whole issues as opposed to individual articles. It was uploaded this way because the metadata (data about the work) associated with the Libraries’ record applies to each issue, not each article. Descriptive metadata includes information such as the title, volume, issue number, or date of a work. This metadata ensures that BHL is searchable and that specific works can be located.  

However, researchers are used to having article-level metadata and often search for a specific article or article topics. Currently, if a researcher searched for a specific article author in the name field, it would not bring up the articles written by that author for AKF. Similarly, if an article’s title was searched for in the title field, it would not be found. Without article-level metadata, such as article titles or article authors, these resources are much harder to find.

Despite this, it is sometimes possible to search for an article’s title or author(s) using the full text of the issue, which is made available via OCR (Optical Character Recognition). However, the OCR that the full text search relies on frequently contains mistakes and cannot be manually corrected at this scale. If there are mistakes in the OCR, the search terms won’t be found. This is especially true when an article has graphic design elements, or text overlaid on a picture, as both contribute to poor OCR. 
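To make the OCR problem concrete, here is a minimal sketch in Python. It assumes full text search behaves like a case-insensitive exact substring match, which is a simplification and not a description of how BHL's search is actually implemented. The function name and both OCR strings are invented for illustration; the noisy string is not the real OCR text of any AKF issue.

```python
def full_text_search(ocr_text: str, query: str) -> bool:
    """Case-insensitive exact substring match (a simplified stand-in for full text search)."""
    return query.casefold() in ocr_text.casefold()


# Invented sample strings for illustration only; not actual BHL OCR output.
clean_ocr = "Giraffe: Forgotten Megafauna"
noisy_ocr = "Gira ffe: Forgotlen Megafauna"  # a split word and a misread letter

query = "Forgotten Megafauna"
found_in_clean = full_text_search(clean_ocr, query)  # the title is found
found_in_noisy = full_text_search(noisy_ocr, query)  # the same title is missed
```

A single misread character is enough to hide the title from an exact match, which is why article-level metadata entered by a person is more dependable than uncorrected OCR.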

This article, “Giraffe: Forgotten Megafauna”, left, has both an acrostic title and is overlaid on a photo. As a result, the OCR, right, did not pick up the title, and the OCR text of the article has a number of errors.

To address this, I worked to add article-level metadata and access to issues of AKF. This is possible due to the recently added batch upload tool for BHL. The batch upload tool allows the metadata for an entire set of articles to be added to BHL at once, following a template.

Because AKF is more of a newsletter or magazine than a formal journal, determining what counted as an article was not always obvious. For each AKF issue, I needed to determine what should be defined as an article (for the purposes of this project) and then add the relevant metadata (including author, title, page numbers, etc.) to the template. Letters from the editor, interviews, and news summaries were some of the parts of AKF that raised the most questions. I often consulted with my supervisors, Jacqueline Chapman (Head, Digital Library & Digitization) and Stephen Cox (Branch Librarian for the Smithsonian’s National Zoo and Conservation Biology Institute), to determine what should be defined as an article and what shouldn’t. In the end, focusing on what was most important to animal care research was a key aspect of this definition. The articles I added to BHL included research done by zookeepers, interviews with conservationists, and columns summarizing zoo-related news. However, there are still many parts of AKF that I did not articlize but that could be added later. These include the letter from the editor, “About the Cover,” and AAZK chapter news.
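As a rough illustration of what filling in a template for batch upload can look like, the Python sketch below writes one row of article metadata to CSV. The column names, the `ArticleRecord` class, and the `to_template_csv` function are hypothetical and are not taken from BHL's actual template. The author, volume, issue, year, and page values are placeholders, not real metadata for this article.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ArticleRecord:
    """One article's descriptive metadata (hypothetical columns, not BHL's template)."""
    title: str
    authors: str      # multiple authors could be joined with ";"
    volume: str
    issue: str
    year: str
    start_page: int
    end_page: int


def to_template_csv(records: list[ArticleRecord]) -> str:
    """Serialize the records as CSV text, one row per article, with a header row."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(ArticleRecord)]
    writer = csv.DictWriter(buffer, fieldnames=column_names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values for illustration; only the article title comes from this post.
records = [
    ArticleRecord(
        title="Giraffe: Forgotten Megafauna",
        authors="Placeholder Author",
        volume="00",
        issue="0",
        year="0000",
        start_page=1,
        end_page=4,
    )
]
csv_text = to_template_csv(records)
```

Keeping one row per article makes it straightforward to review a whole issue's worth of decisions before anything is uploaded.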

The same article, “Giraffe: Forgotten Megafauna”, left, now with article metadata manually applied, right. The title is now searchable within BHL and for other services that use BHL’s data, despite the complicated layout and underlying uncorrected OCR.

Aside from defining articles, another challenge was ensuring consistency for author names across issues. If an author uses a nickname in one article and their full name in another, or if their name has changed over time, only one version of the name should be recorded as the author. This is necessary so that each individual only receives one BHL “creator ID” and so that searching their name will bring up all their work, regardless of the name variation used. This required that the preferred version of the name be used on all articles.  
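The idea of recording a single preferred name can be sketched as a lookup table from each printed variant to the preferred form. This is only an illustration in Python, assuming the variants have already been identified by a person. The names, the `PREFERRED_NAMES` table, and the `preferred_name` function are all invented for this example and do not describe how BHL assigns creator IDs.

```python
# Hypothetical variants of one invented author's name, mapped to a preferred form.
PREFERRED_NAMES = {
    "Kate Example": "Example, Katherine",
    "Katherine Example": "Example, Katherine",
    "K. Example-Smith": "Example, Katherine",
}


def preferred_name(name_as_printed: str) -> str:
    """Return the preferred form of a name, or the cleaned-up input if no mapping exists."""
    # Collapse repeated whitespace so stray spacing does not defeat the lookup.
    cleaned = " ".join(name_as_printed.split())
    return PREFERRED_NAMES.get(cleaned, cleaned)


bylines = ["Kate Example", "Katherine  Example", "K. Example-Smith", "Another Person"]
normalized = [preferred_name(name) for name in bylines]
distinct_creators = sorted(set(normalized))
```

Here four bylines reduce to two distinct creators, so a search on the preferred name would bring up all three of the first author's articles.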

For this part of the process, I needed to learn about Name Authority work, an aspect of library science focused on solving this problem. Name Authorities establish IDs for individual authors, such as those provided by the Library of Congress, VIAF, and ORCID. VIAF (Virtual International Authority File) is a compilation of name authority records from libraries across the world, including the Library of Congress. An ORCID is an ID that the individual author creates for themselves. Its use is increasingly common among researchers across many fields and helps ensure that an author’s works are all connected even if the author changes their name.  In each case, the name associated with the BHL “creator ID” should be the one used in one of these Name Authorities. 

I was able to find and associate these IDs with their authors in AKF using a tool called OpenRefine. Earlier, in July 2019, I attended a Data Carpentries workshop at the Smithsonian. In addition to receiving an introduction to several data management and manipulation tools, including Python and SQL, I also learned how to use OpenRefine. I was able to use this tool to more easily determine if there was an author ID associated with any of the AKF authors. OpenRefine allows the user to upload a list of authors and use the reconcile tool to find possible matches in either ORCID or VIAF. From there, I could determine if the ID was the author from AKF, or someone else with the same name. While these IDs were very useful for determining preferred names, there were occasional mistakes. In one instance, the VIAF record for the author included works by two different people with the same name. I submitted a correction to VIAF, with the help of Lesley Parilla (former Cataloging and Bibliographic Librarian).  

I was able to complete AKF issues from 2010-2016 during my internship.  I also worked to document the process I used and decisions my team made, ensuring a future intern or staff member will be able to take up the project from where I left off.  

AKF varies widely in how the articles are formatted, how the authors’ names are listed, and how accurate the OCR is; therefore, a manual process worked best for this project. It is also possible to create article-level metadata by writing a script to collect the necessary information, if the title is consistent in layout. I worked with Taylor Smith (Summer 2019 Kathryn Turner Diversity and Technology Internship) and her supervisor, Joel Richard (Head, Web Services & IT), who created metadata for the journal Avicultural Magazine in this way. We were able to share the results of both of our projects as a poster, Supporting Access to Zoological Literature: Article Definition in the Biodiversity Heritage Library, at the Association of Zoos & Aquariums’ (AZA) conference in September 2019.

Cover, Animal Keepers’ Forum, V. 43: No. 12 (2016).

Whereas my work on AKF in BHL was focused on providing article-level access to one title, my other project allowed me to take a broader view of zoological literature. I was able to help Stephen with his work on a bibliography for the AZA’s Orangutan Species Survival Plan (SSP). An SSP is a holistic approach developed by conservationists to help support captive breeding programs for endangered species. Stephen functions as a curator of relevant peer-reviewed literature for several SSPs, maintaining comprehensive bibliographies.  This specific SSP bibliography serves as a resource for zookeepers caring for orangutans and contains citations for articles and books about a variety of husbandry topics. I located each listed source online and added it to Zotero, a reference management tool, ensuring the accuracy and completeness of each citation. This tool will be shared amongst primate keepers around the world. 

As I worked on my two projects, both focused on how the Libraries and Archives supports zookeepers, I was able to appreciate the many ways in which librarians and zookeepers work together.  I have spent a lot of time in zoos and libraries; however, I was unaware of how connected the two are. 

Over the course of the summer of 2019, I learned so much about how libraries work, the different careers within a library, and how libraries provide resources for their users. I was constantly amazed to see the careful thought and work that goes into tools that I had always taken for granted as a library user.  Aside from my projects, I’ve been able to talk to so many people who work here and each one has given me more insight into what it means to work in a library and the variety of types of librarians. I know that everything I’ve learned will be invaluable as I pursue my career in the library and information sciences. 
