As noted elsewhere in this blog, the publication record of Smithsonian scholars includes a growing portion of open access (OA) articles. During 2012, nearly 14% of scientific papers authored by Smithsonian scientists were published in OA journals. This is up from 7% in 2008, and the share is expected to keep growing. Most OA journals operate under an “author pays” model whereby the authors, their institution, or the funding organization pay a fee to have their manuscript managed, peer-reviewed and (if accepted) published in an online journal. The article is then available online to readers with no subscription or other payment required.
A previous assessment of OA publications by Smithsonian authors revealed more than 500 items published under this model for which the Repository lacked a digital reprint. Since then, the Smithsonian Research Online (SRO) staff have looked for a way to capture this content systematically, and a partial solution has now been tested and implemented.
The National Institutes of Health public access policy requires publications resulting from NIH research funding to be deposited in their digital archive, PubMed Central (PMC). However, many publishers choose to contribute their entire journal run to PMC, not only the papers funded by the NIH. The list of journals depositing all articles in this large archive includes several OA journals to which Smithsonian scientists are frequent contributors. Among them are PLoS One, PhytoKeys and ZooKeys. These three journals had published nearly 270 Smithsonian-authored papers as of 2013. (Readers of this blog may remember that ZooKeys was the journal which published the account of the olinguito, a new species of mammal recently discovered by scientists at the National Museum of Natural History.)
Recently the SRO staff began a systematic effort to capture this freely available content for the Repository. NIH staff have developed tools that let external repository and data managers harvest publications from PMC systematically, and Libraries’ staff have begun working on this automated capture. Staff first collect the Digital Object Identifier (DOI) for each paper lacking a reprint in our own Repository. The NIH tool then accepts a batch of DOIs, queries the PubMed Central archive and, where matches are found, generates links to the electronic reprints in PDF form. The resulting list of links can be used to download the associated files and save them automatically to a storage device housed in the Libraries’ Digital Services Division.
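For readers curious about what this kind of batch lookup involves, the steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Libraries’ actual script or the NIH tool itself: the endpoint URL, the query parameter names and the shape of the JSON response are all hypothetical placeholders, and the sketch stops at producing the list of PDF links rather than downloading anything.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: a stand-in for whichever PMC lookup service is used.
LOOKUP_URL = "https://example.org/pmc-lookup"


def normalize_doi(raw):
    """Strip whitespace and common resolver prefixes, and lowercase the DOI."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(dois, base_url=LOOKUP_URL):
    """Build one lookup URL for a batch of DOIs (parameter names are assumed)."""
    return base_url + "?" + urlencode({"ids": ",".join(dois), "format": "json"})


def match_pdf_links(response_text):
    """Map DOI -> PDF link for records that have a match in the archive.

    Assumes a JSON body like {"records": [{"doi": ..., "pdf": ...}, ...]},
    where DOIs without a match carry no "pdf" key.
    """
    records = json.loads(response_text).get("records", [])
    return {normalize_doi(r["doi"]): r["pdf"] for r in records if r.get("pdf")}
```

In practice, each URL produced by `build_query` would be fetched in turn, the responses passed to `match_pdf_links`, and the resulting links handed to a downloader. Keeping the matching step separate from the download step means the list of links can be reviewed before any files are saved.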
It is likely that the SRO will add hundreds of publications to the Repository using this method over the coming months. This is significant not only for the volume of content added, but also because it further reduces the effort required of authors to contribute to the SRO program. Automated capture of metadata already collects author, title, page numbers, etc. for nearly 75% of all publications. This new method of harvesting the digital texts means that, in some cases, an author’s works will be included without the author having to think about it. This proactive collection of data is a primary reason for the success of the SRO program over the past 7 years.