SyntaxHighlighter

Thursday, April 18, 2013

Citing Sources in Digital Annotations

I'm collaborating with other folks both in and outside ISAW on a variety of digital scholarly projects in which Linked Open Data is playing a big role. We're using the Resource Description Framework (RDF) to provide descriptive information for, and make cross-project assertions about, a variety of entities of interest and the data associated with them (places, people, themes/subjects, creative works, bibliographic items, and manuscripts and other text-bearing objects). So, for example, I can produce the following assertions in RDF (using the Terse RDF Triple Language, or TuRTLe):

<http://syriaca.org/place/45> a <http://geovocab.org/spatial#Feature> ;
  rdfs:label "Serugh" ;
  rdfs:comment "An ancient city where Jacob of Serugh was bishop."@en ;
  foaf:primaryTopicOf <http://en.wikipedia.org/wiki/Suruç> ;
  owl:sameAs <http://pleiades.stoa.org/places/658405#this> .

This means: 'There's a resource identified with the Uniform Resource Identifier (URI) "http://syriaca.org/place/45" about which the following is asserted:
(Folks familiar with what Sean Gillies has done for the Pleiades RDF will recognize my debt to him in the what proceeds.)

But there are plenty of cases in which just issuing a couple of triples to encode an assertion about something isn't sufficient; we need to be able to assign responsibility/origin for those assertions and to link them to supporting argument and evidence (i.e., standard scholarly citation practice). For this purpose, we're very pleased by the Open Annotation Collaboration, whose Open Annotation Data Model was recently updated and expanded in the form of a W3C Community Draft (8 February 2013) (the participants in Pelagios use basic OAC annotations to assert geographic relationships between their data and Pleiades places).


A basic OADM annotation uses a series of RDF triples to link together a "target" (the thing you want to make an assertion about) and a "body" (the content of your assertion). You can think of them as footnotes. The "target" is the range of text after which you put your footnote number (only in OADM you can add a footnote to any real, conceptual, or digital thing you can identify) and the "body" is the content of the footnote itself. The OADM draft formally explains this structure in section 2.1. This lets me add an annotation to the resource from our example above (the ancient city of Serugh) by using the URI "http://syriaca.org/place/45" as the target of an annotation) thus:
<http://syriaca.org/place/45/anno/desc6> a oa:Annotation ;
  oa:hasBody <http://syriaca.org/place/45/anno/desc6/body> ;
  oa:hasTarget <http://syriaca.org/place/45> ;
  oa:motivatedBy oa:describing ;
  oa:annotatedBy <http://syriaca.org/editors.xml#tcarlson> ;
  oa:annotatedAt "2013-04-03T00:00:01Z" ;
  oa:serializedBy <https://github.com/paregorios/srpdemo1/blob/master/xsl/place2ttl.xsl> ;
  oa:serializedAt "2013-04-17T13:35:05.771-05:00" .

<http://syriaca.org/place/45/anno/desc6/body> a cnt:ContentAsText, dctypes:Text ;
  cnt:chars "an ancient town, formerly located near Sarug."@en ;
  dc:format "text/plain" ;

I hope you'll forgive me for not spelling that all out in plain text, as all the syntax and terms are explained in the OADM. What I'm concerned about in this blog post is really what the OADM doesn't explicitly tell me how to do, namely: show that the annotation body is actually a quotation from a published book. The verb oa:annotatedBy lets me indicate that the annotation itself was made (i.e., the footnote was written) by a resource identified by the URI "http://syriaca.org/editors.xml#tcarlson". If I'd given you a few more triples, you could have figured out that that resource is a real person named Thomas Carlson, who is one of the editors working on the Syriac Reference Portal project. But how do I indicate (as he very much wants to do because he's a responsible scholar and has no interest in plagiarizing anyone) that he's deliberately quoting a book called The Scattered Pearls: A History of Syriac Literature and Sciences? Here's what I came up with (using terms from Citation Typing Ontology and the DCMI Metadata Terms):
<http://syriaca.org/place/45/anno/desc7/body> a cnt:ContentAsText, dctypes:Text ;
  cnt:chars "a small town in the Mudar territory, between Ḥarran and Jarabulus. [Modern name, Suruç (tr.)]"@en ;
  dc:format "text/plain" ;
  cito:citesAsSourceDocument <http://www.worldcat.org/oclc/255043315> ;
  dcterms:biblographicCitation  "The Scattered Pearls: A History of Syriac Literature and Sciences, p. 558"@en .

The addition of the triple containing cito:citesAsSourceDocument lets me make a machine-actionable link to the additional structured bibliographic data about the book that's available at Worldcat (but it doesn't say anything about page numbers!). The addition of the triple containing dcterms:bibliographicCitation lets me provide a human-readable citation.

I'd love to have feedback on this approach from folks in the OAC, CITO, DCTERMS, and general linked data communities. Could I do better? Should I do something differently?


The SRP team is currently evaluating a sample batch of such annotations, which you're also welcome to view. The RDF can be found here. These files are generated from the TEI XML here using the XSLT here.

No comments: