Looking for a Linked Data intern!

by Erin Rushing

The Smithsonian Libraries seeks a computer science or MLS student for the Taxonomic Literature 2 Linked Data Mining internship. This is a paid internship carrying a stipend of $500 per week (full time) or a total of $1500 (part time), and it will take place in January/February 2013. It may be performed in person at the National Museum of Natural History in Washington, D.C., or remotely. Applications will be accepted until October 15th, 2012. Further project details are below or at http://library.si.edu/internships/taxonomic-literature-2-linked-data-mining-paid-internship.

Dates preferred: Winter term (January-February) 2013
Full time or Part time: Either full time for three weeks or part time, totaling 105 hours. This is a paid internship, carrying a stipend of $500 per week (full time) or a total of $1500 (part time).
Intern Supervisor: Joel Richard
Location of internship: Remote or Local (Washington, DC)
Desired knowledge/skill sets:
One of: B.S. in Computer Science or related field OR MLS/MLIS, current student or recent graduate (within 6 months)
Must have: Experience with databases or large datasets; knowledge of at least one programming language (e.g., Ruby, Python, Perl)
Desirable: experience or education in the Natural Sciences

Brief description of project:
Taxonomic Literature 2 (TL-2) is the premier publication of the International Association for Plant Taxonomy (IAPT): a 15-volume guide to the literature of systematic botany published between 1753 and 1940. It is organized by author and includes numbered entries for each author’s publications. It provides suggested abbreviations for use in taxonomic publications: abbreviations for the author’s name, short titles for publications, and abbreviations of those short titles. TL-2 is the standard by which authors’ names and titles should be abbreviated. TL-2 is now offered online as a searchable database at http://www.sil.si.edu/digitalcollections/tl-2/. The plan is to provide TL-2 as linked open data (LOD) to increase its utility for the botany community.

Possible activities to explore in this internship include one or more of:
•    Data mining for additional Linked Open Data elements using Google Refine, e.g., geographic names, species names, institutions.
•    Linking data elements to other Linked Open Data sources on the web, e.g., VIAF, Freebase, DBpedia.
•    Exploring the use of visualizations to provide additional insights into the data.
End results of the internship will be at least two of:
•    An informational graphic interpreting some of the data (existing data or new data elements created via data mining).
•    Identification of elements for linking, with links out to other Linked Open Data source(s).
•    A report of an analysis of the data with suggestions for future work, challenges faced in mining or linking data, etc.

If the internship is remote, frequent check-ins via Skype or GTalk (or phone) will be the primary means of communication. The internship can be full-time or part-time (20 hrs/week or more) with total time spent on the project not to exceed 105 hours.
Please apply via SOLAA (https://solaa.si.edu/solaa/SOLAAHome.html). Select “Smithsonian Institution Libraries” as the placement unit, then “Smithsonian Institution Libraries Internship Program” as the program and “Taxonomic Literature 2 Linked Data Mining” as the specific project. Paper and email applications will not be accepted.

