horothesia: June 2012

Sunday, June 24, 2012

"About Roman Emperors" open dataset updated

The latest version is now online at the base URI: http://www.paregorios.org/resources/roman-emperors/. Major updates:

URIs for emperor profile docs (with links to coinage) on the Portable Antiquities Scheme website (courtesy Dan Pett)
URIs for emperors as coined by the nomisma.org project (courtesy Dan Pett)
More viaf.org IDs for emperors (courtesy Dan Pett)
More alternate names (courtesy Roko Rumora)
More detail and description of third-party resources in both the HTML and RDF
Slightly more readable HTML pages
Complete dump files now available in CSV, RDF+XML, and Turtle

Friday, June 22, 2012

"About Roman Emperors" linked dataset published

A few days ago I blogged about an open linked dataset about Roman Emperors. I've now more formally published the dataset online at http://www.paregorios.org/resources/roman-emperors/.

I'll be adding more features and data, and improving the dataset description, in the coming weeks. More information on how to contribute is also forthcoming (and I have a couple of early contributions by others to incorporate as soon as possible!).

I'll blog more here with the label romemplod whenever there's a significant update.

Tuesday, June 19, 2012

Roman Emperors as Linked Data

You can jump right to the roman-emperors github repository here. I repeat the README file here for the benefit of those who'd rather look before they leap:

This dataset uses the published dbpedia resource URIs for Roman Emperors (the persons themselves) as a starting point for making useful assertions about these individuals in the linked data space. The main goal is to align these URIs with any other key URIs (now or in the future) for the same persons and then to attribute these "same as" relationships with links to descriptive documents or other data that have not so far made it into the linked data graph (especially legacy web resources). Multiple names for the emperors are only incidental to the dataset; no attempt is being made to produce (in this dataset) a comprehensive set of alternate names.

It's still a work in progress, but I've made it available under the Open Data Commons Public Domain Dedication and License so anyone who's interested can pitch in and help, or make use of it freely.

Both RDF (Turtle) and CSV versions are included.

Friday, June 15, 2012

People from my dissertation in RDF

On the road to turning my dissertation into linked data, I've minted URIs for, and produced basic RDF for, all of the historical individuals I dealt with in examining boundary disputes internal to the early Roman empire.

I used foaf:Person, foaf:name, and bio:olb (the latter from the BIO Vocabulary for Biographical Information, developed by Ian Davis and David Galbraith). The Roman emperors who appear in my list have been aligned to dbpedia resources using owl:sameAs. I intend to do more alignments in future to resources like dbpedia and viaf.org.
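To make the shape of these descriptions concrete, here is a minimal sketch in Python of how one person might be written out as Turtle using the properties named above. It is an illustration only, not the XSLT-based process described in this post: the person URI under example.org is hypothetical, the one-line biography is made up for the example, and the function names are my own.

```python
# Namespaces for the vocabularies named in the post.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bio": "http://purl.org/vocab/bio/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def prefix_block():
    """Return the @prefix declarations needed by person_to_turtle()."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())


def person_to_turtle(uri, name, olb, same_as=None):
    """Return a Turtle description of one person.

    uri     -- the minted URI for the person
    name    -- value for foaf:name
    olb     -- value for bio:olb (a one-line biography)
    same_as -- optional URI of an equivalent resource (e.g. in dbpedia)
    """
    lines = [
        f"<{uri}> a foaf:Person ;",
        f'    foaf:name "{escape_literal(name)}" ;',
        f'    bio:olb "{escape_literal(olb)}"',
    ]
    if same_as:
        lines[-1] += " ;"
        lines.append(f"    owl:sameAs <{same_as}>")
    lines[-1] += " ."
    return "\n".join(lines)


if __name__ == "__main__":
    print(prefix_block())
    print()
    print(person_to_turtle(
        uri="http://example.org/people/hadrian",   # hypothetical URI
        name="Hadrian",
        olb="Roman emperor, 117-138",              # illustrative text only
        same_as="http://dbpedia.org/resource/Hadrian",
    ))
```

Building the Turtle by string formatting keeps the sketch dependency-free; for anything beyond a small list, an RDF library would be a safer way to handle escaping and serialization.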

Here's the XML I started from (part of an Open Document Text format file I converted from Word), and the XSLT I used to produce the Turtle RDF, which was then cleaned up by hand.

More to come.

Saturday, June 2, 2012

How to get a born-for-print bibliography into RDF

It began life as a Word file for a printed-on-paper dissertation. I want it to become linked data so that I can hook up other linked data I'm putting online. Here's a quick-and-basic way that involves no programming, writing of scripts, or other computational heroics on my part:

Open the Word file in Libre Office and save it (download copy here). The basic structure puts one citation per paragraph, with a tab dividing a short title from a full citation. E.g.:

Ager 1989    S. Ager, “Judicial Imperialism: the Case of Melitaia,” AHB 3.5 (1989) 107-114.
Ager 1996    S. Ager, Interstate arbitrations in the Greek world, 337-90 B.C., Berkeley, 1996.
Aichinger 1982    A. Aichinger, “Grenzziehung durch kaiserliche Sonderbeauftragte in den römischen provinzen,” ZPE 48 (1982) 193-204.

Rip out everything (like title, introductory materials, etc.) that's not the list of short titles and citations (download copy here).
"Save as ..." -> File Type = "text encoded" (select the "edit filter settings" checkbox) -> "Save" -> (in filter options, make sure "Unicode (UTF-8)" is the chosen encoding) -> "OK" (see here).
Close the text file in Libre Office.
Open a new spreadsheet file in Libre Office (don't use Excel for this; it will make a mess of your Unicode text. Ditto exporting to CSV from Word).
"File" -> "Open..." -> File Type = "Text CSV (*.csv, *.txt)" -> "Open"
In the "Text Import" dialog box, make sure the character set is "Unicode (UTF-8)" and change the "separator" from "comma" to "tab"
Click "OK"
Make sure the spreadsheet gives you two columns (one for the short title and the other for the full citation).
Add an empty top row and in the first cell type "shortTitle" (no quotes). Enter the string "shortDescription" in the second cell (no quotes). Save the file (still in the tab-delimited format). (see here).
If you have python installed on your computer, download the tab2n3.py script from the W3C website and save it into the same folder as your data.
Open a command window or terminal and navigate to the folder where your data is.
Type the following:

$ python tab2n3.py -id -schema -namespace http://purl.org/ontology/bibo/ < BoundaryDisputesJustDataHeadings.csv > BoundaryDisputes.ttl

Open the resulting ttl file in the text-editor of your choice. You've got RDF! (see here).
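For anyone curious about what the last conversion step amounts to, here is a rough standalone sketch in Python. It is not the W3C tab2n3.py script and makes no claim to reproduce its output: the function name, the blank-node labels (_:row1, _:row2, ...) and the exact Turtle layout are assumptions made for illustration. It reads tab-delimited text whose first row holds the column headings and turns each heading into a property in the chosen namespace.

```python
import csv
import io

# Namespace used in the command above.
BIBO = "http://purl.org/ontology/bibo/"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def tsv_to_turtle(handle, namespace=BIBO):
    """Convert tab-delimited rows (first row = column headings) to Turtle.

    Each data row becomes one blank node; each heading becomes a property
    in the given namespace. Headings are assumed to be valid Turtle local
    names (e.g. shortTitle), and empty cells are skipped.
    """
    # QUOTE_NONE: treat quotation marks in citations as ordinary characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    out = [f"@prefix : <{namespace}> .", ""]
    for number, row in enumerate(reader, start=1):
        pairs = [
            f':{column} "{escape_literal(value.strip())}"'
            for column, value in row.items()
            if column and value and value.strip()
        ]
        if pairs:
            out.append(f"_:row{number} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Two columns, as in the spreadsheet prepared in the steps above.
    sample = (
        "shortTitle\tshortDescription\n"
        "Ager 1996\tS. Ager, Interstate arbitrations in the Greek world, "
        "337-90 B.C., Berkeley, 1996.\n"
    )
    print(tsv_to_turtle(io.StringIO(sample)))
```

To run it against a real file rather than the inline sample, open the file with open(path, encoding="utf-8", newline="") and pass the handle to tsv_to_turtle(), so that the UTF-8 text survives intact.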

Friday, June 1, 2012

Ancient Studies Needs Open Bibliographic Data and Associated URIs

Update 1: links throughout, minor formatting changes, proper Creative Commons Public Domain tools, parenthetical about import path from Endnote and such, fixing a few typos.

The NEH-funded Linked Ancient World Data Institute, still in progress at ISAW, has got me thinking about a number of things. One of them is bibliography and linked data. Here's a brain dump, intended to spark conversation and collaboration.

What We Need

As much bibliographic data as possible, for both primary and secondary sources (print and digital), publicly released to third parties under either a public domain declaration or an unrestrictive open license.
Stable HTTP URIs for every work and author included in those datasets.

Why

Bibliographic and citation collection and management are integral to every research and publication project in ancient studies. We could save each other a lot of time, and get more substantive work done in the field, if it were simpler and easier to do. We could more easily and effectively tie together disparate work published on the web (and appearing on the web through retrospective digitization) if we had a common infrastructure and shared point of reference. There's already a lot of digital data in various hands that could support such an effort, but a good chunk of it is not out where anybody with good will and talent can get at it to improve it, build tools around it, etc.

What I Want You (and Me) To Do If You Have Bibliographic Data

Release it to the world through a third party. No matter what format it's in, give a copy to someone else whose function is hosting free data on the web. Dump it into a public repository at github.com or sourceforge.net. Put it into a shared library at Zotero, Bibsonomy, Mendeley, or another bibliographic content website (most have easy upload/import paths from Endnote and other citation management applications). Hosting a copy yourself is fine, but giving it to a third party demonstrates your bona fides, gets it out of your nifty but restrictive search engine or database, and increments your bus number.
Release it under a Creative Commons Public Domain Mark or Public Domain Dedication (CC0). Or if you can't do that, find as open a Creative Commons or similar license as you can. Don't try to control it. If there's some aspect of the data that you can't (because of rights encumbrance) or don't want to (why?) give away to make the world a better place, find a quick way to extract, filter, or excerpt that aspect and get the rest out.
Alert the world to your philanthropy. Blog or tweet about it. Post a link to the data on your institutional website. Above all, alert Chuck Jones and Phoebe Acheson so it gets announced via Ancient World Online and/or Ancient World Open Bibliographies.
Do the same if you have other useful data, like identifiers for modern or ancient works or authors.
Get in touch with me and/or anyone else to talk about the next step: setting up stable HTTP URIs corresponding to this stuff.

Who I'm Talking To

First of all, I'm talking to myself, my collaborators, and my team-mates at ISAW. I intend to eat my own dogfood.

Here are other institutions and entities I know about who have potentially useful data.

The Open Library : data about books is already out there and available, and there are ways to add more
Perseus Project : a huge, FRBR-ized collection of MODS records for Greek and Latin authors, works, and modern editions thereof.
Center for Hellenic Studies: identifiers for Greek and Latin authors and works
L'Année Philologique and its institutional partners like the American Philological Association: the big collection of analytic secondary bibliography for classics (journal articles)
TOCS-IN: a collaboratively collected batch of analytic secondary bibliography for classics
Papyri.info and its contributing project partners: TEI bibliographic records for much of the bibliography produced for or cited by Greek and Latin papyrologists (plus other ancient language/script traditions in papyrology)
Gnomon Bibliographische Datenbank: masses of bibliographic data for books and articles for classics
Any and every university library system that has a dedicated or easily extracted set of associated catalog records. Especially any with unique collections (e.g., Cincinnati) or those with databases of analytical bibliography down to the level of articles in journals and collections.
Ditto any and every ancient studies digital project that has bibliographic data in a database.

Comments, Reactions, Suggestions

Welcome, encouraged, and essential. By comment here or otherwise (but not private email please!).