
Saturday, June 2, 2012

How to get a born-for-print bibliography into RDF

It began life as a Word file for a printed-on-paper dissertation. I want it to become linked data so that I can hook it up to other linked data I'm putting online. Here's a quick-and-basic way that involves no programming, script-writing, or other computational heroics on my part:
  • Open the Word file in LibreOffice and save it (download copy here). The basic structure is one citation per paragraph, with a tab separating a short title from the full citation. E.g.:
Ager 1989    S. Ager, “Judicial Imperialism: the Case of Melitaia,” AHB 3.5 (1989) 107-114.
Ager 1996    S. Ager, Interstate arbitrations in the Greek world, 337-90 B.C., Berkeley, 1996.
Aichinger 1982    A. Aichinger, “Grenzziehung durch kaiserliche Sonderbeauftragte in den römischen provinzen,” ZPE 48 (1982) 193-204.
  • Delete everything that isn't part of the list of short titles and citations, such as the title and introductory material (download copy here).
  • "File" -> "Save As..." -> File Type = "Text Encoded" (tick the "Edit filter settings" checkbox) -> "Save" -> (in the filter options, make sure "Unicode (UTF-8)" is the chosen encoding) -> "OK" (see here).
  • Close the text file in LibreOffice.
  • Open a new spreadsheet in LibreOffice Calc. (Don't use Excel for this; it will make a mess of your Unicode text. The same goes for exporting to CSV from Word.)
  • "File" -> "Open..." -> File Type = "Text CSV (*.csv, *.txt)" -> "Open"
  • In the "Text Import" dialog box, make sure the character set is "Unicode (UTF-8)" and change the "separator" from "comma" to "tab"
  • Click "OK"
  • Make sure the spreadsheet gives you two columns (one for the short title and the other for the full citation).
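  • Insert an empty row at the top. Type shortTitle in its first cell and shortDescription in its second (no quotes); these headings match the shortTitle and shortDescription properties of the Bibliographic Ontology (bibo), whose namespace the command below uses. Save the file, keeping the tab-delimited format (see here).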
  • Make sure Python is installed on your computer, then download the tab2n3.py script from the W3C website and save it in the same folder as your data.
  • Open a command window or terminal and navigate to the folder where your data is.
  • Type the following:
$ python tab2n3.py -id -schema -namespace http://purl.org/ontology/bibo/ < BoundaryDisputesJustDataHeadings.csv > BoundaryDisputes.ttl
  • Open the resulting .ttl file in the text editor of your choice. You've got RDF! (see here).
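The "two columns" check in the spreadsheet step boils down to making sure every line splits into exactly two tab-separated fields. For anyone who doesn't mind a few lines of scripting after all, here is a minimal sketch of that check in Python. It assumes the UTF-8, tab-delimited file saved from LibreOffice; the script name check_columns.py is hypothetical.

```python
import sys


def find_bad_rows(lines):
    """Yield (line_number, field_count) for each non-blank line that does not
    split into exactly two tab-separated fields (short title, full citation)."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored
        field_count = len(line.split("\t"))
        if field_count != 2:
            yield number, field_count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_columns.py BoundaryDisputesJustDataHeadings.csv
    # ("utf-8-sig" also copes with a byte-order mark at the start of the file)
    with open(sys.argv[1], encoding="utf-8-sig") as handle:
        bad = list(find_bad_rows(handle))
    for number, field_count in bad:
        print(f"line {number}: {field_count} fields")
    print("OK: every line has two fields" if not bad else f"{len(bad)} problem line(s)")
```

A line with one field probably means a citation that was split across two paragraphs in the Word file; a line with three or more probably has a stray tab.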
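For comparison, here is a minimal hand-rolled sketch of the same conversion, in case tab2n3.py isn't to hand. It is not tab2n3.py and does not reproduce that script's output; it assumes the two-column, tab-separated UTF-8 file with the shortTitle/shortDescription header row, and it mints one IRI per citation under a hypothetical base (http://example.org/boundary-disputes/) that you would replace with your own. The script name bib2ttl.py is also hypothetical.

```python
import sys
from urllib.parse import quote

# Namespace of the Bibliographic Ontology, as used in the tab2n3.py command above.
BIBO = "http://purl.org/ontology/bibo/"
# Hypothetical base URI for the citation resources; replace it with your own.
BASE = "http://example.org/boundary-disputes/"
HEADER = ["shortTitle", "shortDescription"]


def turtle_literal(text):
    """Escape a string as a double-quoted Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_turtle(lines):
    """Convert 'short title<TAB>full citation' lines into a Turtle document."""
    out = [f"@prefix bibo: <{BIBO}> .", ""]
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2 or fields == HEADER:
            continue  # skip blank or malformed lines and the header row
        short_title, citation = (field.strip() for field in fields)
        # Mint a subject IRI from the short title, e.g. "Ager 1989" -> .../Ager_1989
        subject = BASE + quote(short_title.replace(" ", "_"), safe="")
        out.append(f"<{subject}>")
        out.append(f"    bibo:shortTitle {turtle_literal(short_title)} ;")
        out.append(f"    bibo:shortDescription {turtle_literal(citation)} .")
        out.append("")
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python bib2ttl.py INPUT.csv OUTPUT.ttl (bib2ttl.py is a hypothetical name)
    with open(sys.argv[1], encoding="utf-8-sig") as src:
        turtle = rows_to_turtle(src)
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(turtle)
```

Usage: python bib2ttl.py BoundaryDisputesJustDataHeadings.csv BoundaryDisputes.ttl. Writing full IRIs in angle brackets, rather than prefixed names, sidesteps Turtle's restrictions on which characters a prefixed local name may contain, and percent-encoding the short title keeps the IRIs plain ASCII even for accented author names.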
