On July 10-14, 2012, Smithsonian Libraries staff members JJ Ford, Gilbert Borrego and Grace Costantino attended the 8th Annual Wikimania Conference in Washington, D.C. to explore possible collaborations between Wikipedia and the Biodiversity Heritage Library. The Biodiversity Heritage Library (BHL), of which the Smithsonian Libraries is a founding member, is an open access, global digital library initiative dedicated to digitizing the biodiversity-related materials held in the collections of BHL consortium member libraries.
Since 2009, BHL staff have been looking at Wikipedia as a way to drive new user traffic to the Biodiversity Heritage Library while improving the content and accuracy of Wikipedia’s articles. This symbiotic relationship has had a few bumps along the way, but our recent attendance at the 8th Annual Wikimania Conference reaffirmed our commitment to increasing our Wikipedia efforts, which include adding our Flickr images to the Wikimedia Commons file repository as well as inserting species citations and external links to auto-generated BHL taxon name bibliographies. During this week-long conference, we were inspired by the sense of mission, ingenuity and passion that our fellow Wikipedians demonstrated.
Two Missions Collide: Free, Open, and Global! Wikipedia we love you!
Early on, BHL staff ran into some obstacles with our Wikipedia edits. It seems we lacked the user “clout” necessary to add BHL links en masse. As newcomers who had not been “validated” by the Wikipedia community, we found that many of our citations were subsequently deleted, which proved a tad frustrating since we had no idea why this was happening! At this year’s conference, we learned that Wikipedia is subject to rampant vandalism, so many links from unverified or new users are deleted. You must earn authority over time in Wikipedia. Top users are awarded virtual honors dubbed “barnstars.” Yes, anyone may edit, but in Wikipedia only the unbiased, vetted truth sticks.
Contrary to popular derision, Wikipedia’s standards for trusted citations, fact checking and article quality are intensely rigorous. Wikipedians have developed rubrics for quality, tutorials for writing articles, and lists of articles that need user help. Wikipedia is built upon the hard work of a global network of passionate, tech-savvy people who are motivated altruistically rather than financially and who have developed a highly complex information ecosystem. Frankly, we were in awe of the spirit Wikipedians had for Open Knowledge, not to mention the amount of free work they seemed willing to do. Beyond traditional editing efforts, Team BHL has also been exploring potential tech developments for two of the ten official Wikimedia Foundation projects: Wikisource and Wikimedia Commons.
Wikisource is a project that gives users the opportunity to augment open content texts with corrections, hyperlinks and notes. This really piqued our interest because free-text searching in BHL has been one of the most requested improvements by our users and remains on our tech development to-do list. The main obstacle that we face is that optical character recognition (OCR) software is marginally accurate at best, and the errors present in uncorrected OCR texts remain one of the most stubborn hindrances to free-text searching of the BHL corpus.
After sitting in on the Wikisource presentation given by Andrea Zanni, a Wikisource sysop, advocate, and volunteer, we were extremely impressed by this project and its potential future application for BHL text files. Finally: a platform that opens up the possibility of crowd-sourcing user-generated corrections for BHL OCR text! Moreover, the multi-layered DjVu file format that Wikisource accepts allows users to add their own links into the text’s OCR, further augmenting the usefulness of the original resource. Lastly, perhaps the most exciting application of Wikisource is its potential use with manuscripts. For instance, handwritten scientific field notes or Linnaeus’ personal letters are not accompanied by a text file; these invaluable scholarly resources could be transcribed by Wikisource users and thus exposed to user search and discovery.
Currently, there are still few options for extracting the corrections and re-integrating them into BHL. Nevertheless, the Wikisource developers were among the most enthusiastic and driven folks at the conference; we will be watching them closely for future improvements to Wikisource that might help make the “Full-text Search Dream” and manuscript transcription a reality. As users, how likely would you be to help make corrections to BHL texts using the Wikisource interface? (Screenshot below)
Currently, there are 3,181 BHL links in Wikipedia. We can only hope that this number continues to grow as word about BHL spreads. Our continued efforts are only a small piece of the pie. We count on our users to help us vet Wikipedia’s biodiversity articles, all of which benefit from citations pulled from BHL. Please help us spread knowledge about life on Earth to new user communities by becoming a Wikimanian yourself. We may hold an “edit-a-thon” in the future for interested users — if the idea takes hold. Tell us what you think by voting in our poll below. In the meantime, feel free to sail solo by adding BHL links and citations to Wikipedia. For helpful tips on how to do this, our Technical Director, Chris Freeland, put together a presentation that offers a quick primer on Wikipedia editing for BHL. We depend on user feedback to drive our technical development efforts, so please let us know what you think about our involvement with Wikipedia and the two aforementioned projects. And be sure to check back for future posts about SIL’s further involvement in the 8th Annual Wikimania Conference events!
– Text by JJ Ford, Biodiversity Heritage Library Librarian
– Formatted and Edited by Grace Costantino, Biodiversity Heritage Library Program Manager