Tuesday, June 11, 2013
As readers of this blog will know, the About Roman Emperors dataset is built upon the backbone of Wikipedia. More specifically, HTTP URIs like http://dbpedia.org/resource/Nero are programmatically created by the dbpedia project on the basis of Wikipedia content, and I used them to identify uniquely each emperor. Web pages and other resources about those emperors are then linked -- using FOAF and other vocabularies -- with those identifiers. These little packages of data about the emperors and their web pages are serialized in both HTML and RDF form on my website.
Today I've added some JavaScript to the HTML views of each emperor's page (e.g., "About the Roman Emperor Hadrian"). When the page loads, the JavaScript fires off a query at the dbpedia SPARQL endpoint, asking for title, abstract, image, and label information associated with the corresponding emperor's URI. Whatever it gets back is presented in the gray-backgrounded column to the right.
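To give a flavor of what gets asked, here's a minimal SPARQL sketch against the dbpedia endpoint, using Nero's URI from above (the property choices -- rdfs:label, dbo:abstract, foaf:depiction -- are illustrative, not necessarily a verbatim copy of the script's query):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?label ?abstract ?image
WHERE {
  <http://dbpedia.org/resource/Nero> rdfs:label ?label ;
    dbo:abstract ?abstract .
  # the image is optional so the query still succeeds if dbpedia has no depiction
  OPTIONAL { <http://dbpedia.org/resource/Nero> foaf:depiction ?image }
  FILTER ( lang(?label) = "en" && lang(?abstract) = "en" )
}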
See what you think.
Friday, June 7, 2013
TTL added to dumps of "About Roman Emperors" dataset
Linked from the HTML landing page for the dataset and from the VoID file, or from the roman-emperors repo on GitHub.
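For reference, a VoID description typically advertises a dump with something along these lines (a generic Turtle sketch, not a verbatim excerpt from the file; the dump URL shown is a placeholder):
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.paregorios.org/resources/roman-emperors/> a void:Dataset ;
    dcterms:title "About Roman Emperors" ;
    void:dataDump <http://www.paregorios.org/resources/roman-emperors/roman-emperors.ttl> .   # placeholder dump path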
Labels:
lawdi,
linkeddata,
LOD,
rdf,
roman emperors
Monday, June 3, 2013
Updates to the "About Roman Emperors" Linked Dataset
I have just published a set of updates to my self-published About Roman Emperors dataset (with contributions from Ryan Baumann, Daniel Pett, and Roko Rumora).
Significant changes:
- TTL and CSV dumps have been suspended (i.e., withdrawn) until they can be brought up to date with the master RDF/XML version.
- Missing article titles from De Imperatoribus Romanis have been added to the RDF/XML and HTML versions
- RDF/XML has been re-ordered and consolidated so it's easier to read and more sensible
- Terms have been added from the http://schema.org vocabulary
Prior posts explaining the purpose of the dataset and its contents are best accessed via a search on the romemplod label.
Thursday, April 18, 2013
Citing Sources in Digital Annotations
I'm collaborating with other folks both in and outside ISAW on a variety of digital scholarly projects in which Linked Open Data is playing a big role. We're using the Resource Description Framework (RDF) to provide descriptive information for, and make cross-project assertions about, a variety of entities of interest and the data associated with them (places, people, themes/subjects, creative works, bibliographic items, and manuscripts and other text-bearing objects). So, for example, I can produce the following assertions in RDF (using the Terse RDF Triple Language, or TuRTLe):
<http://syriaca.org/place/45> a <http://geovocab.org/spatial#Feature> ;
rdfs:label "Serugh" ;
rdfs:comment "An ancient city where Jacob of Serugh was bishop."@en ;
foaf:primaryTopicOf <http://en.wikipedia.org/wiki/Suruç> ;
owl:sameAs <http://pleiades.stoa.org/places/658405#this> .
This means: 'There's a resource identified with the Uniform Resource Identifier (URI) "http://syriaca.org/place/45" about which the following is asserted:
- it is a "Feature" as defined in the NeoGeo Spatial Ontology;
- the human-readable version of its name is "Serugh";
- a human-readable description (in the English language) of it is "An ancient city where Jacob of Serugh was bishop.";
- it is the primary topic of a document that is identified by the URI "http://en.wikipedia.org/wiki/Suruç"; and
- it is the same resource as that identified by another URI: "http://pleiades.stoa.org/places/658405#this".'
(Folks familiar with what Sean Gillies has done for the Pleiades RDF will recognize my debt to him in what precedes.)
But there are plenty of cases in which just issuing a couple of triples to encode an assertion about something isn't sufficient; we need to be able to assign responsibility/origin for those assertions and to link them to supporting argument and evidence (i.e., standard scholarly citation practice). For this purpose, we're very pleased by the Open Annotation Collaboration, whose Open Annotation Data Model was recently updated and expanded in the form of a W3C Community Draft (8 February 2013) (the participants in Pelagios use basic OAC annotations to assert geographic relationships between their data and Pleiades places).
A basic OADM annotation uses a series of RDF triples to link together a "target" (the thing you want to make an assertion about) and a "body" (the content of your assertion). You can think of them as footnotes. The "target" is the range of text after which you put your footnote number (only in OADM you can add a footnote to any real, conceptual, or digital thing you can identify) and the "body" is the content of the footnote itself. The OADM draft formally explains this structure in section 2.1. This lets me add an annotation to the resource from our example above (the ancient city of Serugh) by using the URI "http://syriaca.org/place/45" as the target of an annotation, thus:
<http://syriaca.org/place/45/anno/desc6> a oa:Annotation ;
oa:hasBody <http://syriaca.org/place/45/anno/desc6/body> ;
oa:hasTarget <http://syriaca.org/place/45> ;
oa:motivatedBy oa:describing ;
oa:annotatedBy <http://syriaca.org/editors.xml#tcarlson> ;
oa:annotatedAt "2013-04-03T00:00:01Z" ;
oa:serializedBy <https://github.com/paregorios/srpdemo1/blob/master/xsl/place2ttl.xsl> ;
oa:serializedAt "2013-04-17T13:35:05.771-05:00" .
<http://syriaca.org/place/45/anno/desc6/body> a cnt:ContentAsText, dctypes:Text ;
cnt:chars "an ancient town, formerly located near Sarug."@en ;
dc:format "text/plain" .
I hope you'll forgive me for not spelling that all out in plain text, as all the syntax and terms are explained in the OADM. What I'm concerned about in this blog post is really what the OADM doesn't explicitly tell me how to do, namely: show that the annotation body is actually a quotation from a published book. The verb oa:annotatedBy lets me indicate that the annotation itself was made (i.e., the footnote was written) by a resource identified by the URI "http://syriaca.org/editors.xml#tcarlson". If I'd given you a few more triples, you could have figured out that that resource is a real person named Thomas Carlson, who is one of the editors working on the Syriac Reference Portal project. But how do I indicate (as he very much wants to do because he's a responsible scholar and has no interest in plagiarizing anyone) that he's deliberately quoting a book called The Scattered Pearls: A History of Syriac Literature and Sciences? Here's what I came up with (using terms from Citation Typing Ontology and the DCMI Metadata Terms):
<http://syriaca.org/place/45/anno/desc7/body> a cnt:ContentAsText, dctypes:Text ;
cnt:chars "a small town in the Mudar territory, between Ḥarran and Jarabulus. [Modern name, Suruç (tr.)]"@en ;
dc:format "text/plain" ;
cito:citesAsSourceDocument <http://www.worldcat.org/oclc/255043315> ;
dcterms:bibliographicCitation "The Scattered Pearls: A History of Syriac Literature and Sciences, p. 558"@en .
The addition of the triple containing cito:citesAsSourceDocument lets me make a machine-actionable link to the additional structured bibliographic data about the book that's available at Worldcat (but it doesn't say anything about page numbers!). The addition of the triple containing dcterms:bibliographicCitation lets me provide a human-readable citation.
I'd love to have feedback on this approach from folks in the OAC, CITO, DCTERMS, and general linked data communities. Could I do better? Should I do something differently?
The SRP team is currently evaluating a sample batch of such annotations, which you're also welcome to view. The RDF can be found here. These files are generated from the TEI XML here using the XSLT here.
Labels:
bibliography,
collaboration,
concordia,
data,
interop,
isaw,
lawdi,
linkeddata,
LOD,
neogeography,
patterns,
pelagios,
pleiades,
rdf,
tei,
xml,
xslt
Thursday, October 4, 2012
Pleiades Machine Tags for Blog Posts? Yes!
So, a few minutes ago I noticed a new post in my feed reader from a blog I've admired for a while: Javier Andreu Pintado's Oppida Imperii Romani. I've thought for a long time that I ought to get in touch with him (we don't know each other from Adam as far as I know) and see if we could figure out a more-or-less automated way to get his posts to show up on the associated Pleiades pages.
Then it hit me:
Why can't we just use labels incorporating Pleiades IDs like we've been doing with machine tags on Flickr and query the Blogger API to get the associated posts?
Why not indeed. It turns out it just works.
To test, I added the string "pleiades:depicts=579885" as a label on my blog post from last December, "Pleiades, Flickr, and the Ancient World Image Bank" (since that tag is used in an example in that post; I recognize that the blog post doesn't actually depict that place, which is what that label term ought to mean, but this is just a test).
Then I went to the Google APIs Explorer page for the Blogger "list posts" function (which I found by googling) and entered my blog's ID and the label string in the appropriate fields.
And, in a matter of milliseconds, I got back a JSON representation of my blog post.
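If you'd rather skip the Explorer UI, the same lookup can be made straight from the command line; here's a sketch assuming the Blogger v3 REST API (the blog ID and API key are placeholders you'd supply yourself):
$ # the label string is percent-encoded; the response is the same JSON the Explorer returns
$ curl "https://www.googleapis.com/blogger/v3/blogs/YOUR_BLOG_ID/posts?labels=pleiades%3Adepicts%3D579885&key=YOUR_API_KEY"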
So now I'm thinking we might explore the possibility of creating a widget on Pleiades place pages to feature blog posts tagged like this from selected blogs. It appears that, to execute the API queries against Blogger, we have to do them blog-by-blog with known IDs, but that's probably OK anyway so we can curate the process of blog selection and prevent spam.
It occurs to me that the Pelagios community might be interested in looking at this approach in order to build a gateway service to inject blog posts into the Pelagios network.
And while I'm name-checking, I wonder if any Wordpress aficionados out there can come up with a functionally equivalent mechanism.
Labels:
blogs,
flickr,
interop,
lawdi,
linkeddata,
machine tags,
pelagios,
pleiades
Sunday, June 24, 2012
"About Roman Emperors" open dataset updated
The latest version is now online at the base URI: http://www.paregorios.org/resources/roman-emperors/. Major updates:
- URIs for emperor profile docs (with links to coinage) on the Portable Antiquities Scheme website (courtesy Dan Pett)
- URIs for emperors as coined by the nomisma.org project (courtesy Dan Pett)
- More viaf.org IDs for emperors (courtesy Dan Pett)
- More alternate names (courtesy Roko Rumora)
- More detail and description of third-party resources in both the HTML and RDF
- Slightly more readable HTML pages
- Complete dump files now available in CSV, RDF+XML, and Turtle
Friday, June 22, 2012
"About Roman Emperors" linked dataset published
A few days ago I blogged about an open linked dataset about Roman Emperors. I've now more formally published the dataset online at http://www.paregorios.org/resources/roman-emperors/.
I'll be adding more features and data, and improving the dataset description in coming weeks. More information on how to contribute is also forthcoming (and I have a couple of early contributions by others to incorporate as soon as possible!).
I'll blog more here with the label romemplod whenever there's a significant update.
Tuesday, June 19, 2012
Roman Emperors as Linked Data
You can jump right to the roman-emperors github repository here. I repeat the README file here for the benefit of those who'd rather look before they leap:
Both RDF (Turtle) and CSV versions are included.
This dataset uses the published dbpedia resource URIs for Roman Emperors (the persons themselves) as a starting point for making useful assertions about these individuals in the linked data space. The main goal is to align these URIs with any other key URIs (now or in the future) for the same persons and then to attribute these "same as" relationships with links to descriptive documents or other data that have not so far made it into the linked data graph (especially legacy web resources). Multiple names for the emperors are only incidental to the dataset; no attempt is being made to produce (in this dataset) a comprehensive set of alternate names.
It's still a work in progress, but I've made it available under the Open Data Commons Public Domain Dedication and License so anyone who's interested can pitch in and help, or make use of it freely.
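To give a concrete picture of what that alignment looks like, here is a hand-written Turtle sketch (not an excerpt from the dataset itself; the VIAF identifier and the De Imperatoribus Romanis URL below are placeholders):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://dbpedia.org/resource/Nero>
    owl:sameAs <http://viaf.org/viaf/000000000> ;                      # placeholder VIAF URI
    foaf:isPrimaryTopicOf <http://www.roman-emperors.org/nero.htm> .   # placeholder DIR article URL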
Friday, June 15, 2012
People from my dissertation in RDF
On the road to turning my dissertation into linked data, I've minted URIs for, and produced basic RDF for, all of the historical individuals I dealt with while examining boundary disputes internal to the early Roman empire.
I used foaf:Person, foaf:name, and bio:olb (the latter from the BIO Vocabulary for Biographical Information, developed by Ian Davis and David Galbraith). The Roman emperors who appear in my list have been aligned to dbpedia resources using owl:sameAs. I intend to do more alignments in future to resources like dbpedia and viaf.org.
Here's the XML I started from (part of an Open Document Text format file I converted from Word), and the XSLT I used to produce the Turtle RDF, which was then cleaned up by hand.
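To make that concrete, a single record comes out looking more or less like this (a schematic sketch: the URI, name, and one-line bio are made up for illustration, not copied from the dataset):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://www.paregorios.org/resources/demarc/person/0001#this> a foaf:Person ;
    foaf:name "Nero" ;
    bio:olb "Roman emperor, 54-68 CE." ;
    owl:sameAs <http://dbpedia.org/resource/Nero> .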
More to come.
Labels:
boundaries,
demarc,
lawdi,
LOD
Saturday, June 2, 2012
How to get a born-for-print bibliography into RDF
It began life as a Word file for a printed-on-paper dissertation. I want it to become linked data so that I can hook it up to other linked data I'm putting online. Here's a quick-and-basic way that involves no programming, writing of scripts, or other computational heroics on my part:
- Open the Word file in Libre Office and save it (download copy here). The basic structure puts one citation per paragraph, with a tab dividing a short title from a full citation. E.g.:
Ager 1989	S. Ager, “Judicial Imperialism: the Case of Melitaia,” AHB 3.5 (1989) 107-114.
Ager 1996	S. Ager, Interstate arbitrations in the Greek world, 337-90 B.C., Berkeley, 1996.
Aichinger 1982	A. Aichinger, “Grenzziehung durch kaiserliche Sonderbeauftragte in den römischen provinzen,” ZPE 48 (1982) 193-204.
- Rip out everything (like title, introductory materials, etc.) that's not the list of short titles and citations (download copy here).
- "Save as ..." -> File Type = "text encoded" (select the "edit filter settings" checkbox) -> "Save" -> (in filter options, make sure "Unicode (UTF-8)" is the chosen encoding) -> "OK" (see here).
- Close the text file in Libre Office.
- Open a new spreadsheet file in Libre Office (don't use Excel for this; it will make a mess of your Unicode text. Ditto exporting to CSV from Word)
- "File" -> "Open..." -> File Type = "Text CSV (*.csv, *.txt)" -> "Open"
- In the "Text Import" dialog box, make sure the character set is "Unicode (UTF-8)" and change the "separator" from "comma" to "tab"
- Click "OK"
- Make sure the spreadsheet gives you two columns (one for the short title and the other for the full citation).
- Add an empty top row and in the first cell type "shortTitle" (no quotes). Enter the string "shortDescription" in the second cell (no quotes). Save the file (still in the tab-delimited format). (see here).
- If you have python installed on your computer, download the tab2n3.py script from the W3C website and save it into the same folder as your data.
- Open a command window or terminal and navigate to the folder where your data is.
- Type the following:
$ python tab2n3.py -id -schema -namespace http://purl.org/ontology/bibo/ < BoundaryDisputesJustDataHeadings.csv > BoundaryDisputes.ttl
- Open the resulting ttl file in the text-editor of your choice. You've got RDF! (see here).
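For clarity, the tab-delimited file handed to the script is just the header row followed by one citation per line, with a single tab separating the two columns (the entries are repeated from the example above):
shortTitle	shortDescription
Ager 1989	S. Ager, “Judicial Imperialism: the Case of Melitaia,” AHB 3.5 (1989) 107-114.
Ager 1996	S. Ager, Interstate arbitrations in the Greek world, 337-90 B.C., Berkeley, 1996.
In the output, each row should become a node whose properties are drawn from those two column headers under the http://purl.org/ontology/bibo/ namespace supplied on the command line; exactly how the script names and identifies each node depends on its -id and -schema options, so check the generated file rather than trusting my description.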
Labels:
bibliography,
lawdi,
linkeddata,
LOD
Friday, June 1, 2012
Ancient Studies Needs Open Bibliographic Data and Associated URIs
Update 1: links throughout, minor formatting changes, proper Creative Commons Public Domain tools, parenthetical about import path from Endnote and such, fixing a few typos.
The NEH-funded Linked Ancient World Data Institute, still in progress at ISAW, has got me thinking about a number of things. One of them is bibliography and linked data. Here's a brain dump, intended to spark conversation and collaboration.
What We Need
- As much bibliographic data as possible, for both primary and secondary sources (print and digital), publicly released to third parties under either a public domain declaration or an unrestrictive open license.
- Stable HTTP URIs for every work and author included in those datasets.
Why
Bibliographic and citation collection and management are integral to every research and publication project in ancient studies. We could save each other a lot of time, and get more substantive work done in the field, if it was simpler and easier to do. We could more easily and effectively tie together disparate work published on the web (and appearing on the web through retrospective digitization) if we had a common infrastructure and shared point of reference. There's already a lot of digital data in various hands that could support such an effort, but a good chunk of it is not out where anybody with good will and talent can get at it to improve it, build tools around it, etc.
What I Want You (and Me) To Do If You Have Bibliographic Data
- Release it to the world through a third party. No matter what format it's in, give a copy to someone else whose function is hosting free data on the web. Dump it into a public repository at github.com or sourceforge.net. Put it into a shared library at Zotero, Bibsonomy, Mendeley, or another bibliographic content website (most have easy upload/import paths from Endnote, and other citation management applications). Hosting a copy yourself is fine, but giving it to a third party demonstrates your bona fides, gets it out of your nifty but restrictive search engine or database, and increments your bus number.
- Release it under a Creative Commons Public Domain Mark or Public Domain Dedication (CC0). Or if you can't do that, find as open a Creative Commons or similar license as you can. Don't try to control it. If there's some aspect of the data that you can't (because of rights encumbrance) or don't want to (why?) give away to make the world a better place, find a quick way to extract, filter, or excerpt that aspect and get the rest out.
- Alert the world to your philanthropy. Blog or tweet about it. Post a link to the data on your institutional website. Above all, alert Chuck Jones and Phoebe Acheson so it gets announced via Ancient World Online and/or Ancient World Open Bibliographies.
- Do the same if you have other useful data, like identifiers for modern or ancient works or authors.
- Get in touch with me and/or anyone else to talk about the next step: setting up stable HTTP URIs corresponding to this stuff.
First of all, I'm talking to myself, my collaborators, and my team-mates at ISAW. I intend to eat my own dogfood.
Here are other institutions and entities I know about who have potentially useful data.
- The Open Library : data about books is already out there and available, and there are ways to add more
- Perseus Project : a huge, FRBR-ized collection of MODS records for Greek and Latin authors, works, and modern editions thereof.
- Center for Hellenic Studies: identifiers for Greek and Latin authors and works
- L'Année Philologique and its institutional partners like the American Philological Association: the big collection of analytic secondary bibliography for classics (journal articles)
- TOCS-IN: a collaboratively collected batch of analytic secondary bibliography for classics
- Papyri.info and its contributing project partners: TEI bibliographic records for much of the bibliography produced for or cited by Greek and Latin papyrologists (plus other ancient language/script traditions in papyrology)
- Gnomon Bibliographische Datenbank: masses of bibliographic data for books and articles for classics
- Any and every university library system that has a dedicated or easily extracted set of associated catalog records. Especially any with unique collections (e.g., Cincinnati) or those with databases of analytical bibliography down to the level of articles in journals and collections.
- Ditto any and every ancient studies digital project that has bibliographic data in a database.
Comments, Reactions, Suggestions
Welcome, encouraged, and essential. By comment here or otherwise (but not private email please!).
Tuesday, February 7, 2012
Playing with PELAGIOS: Open Context and Labels
Latest in the Playing with PELAGIOS series.
I've just modified the tooling and re-run the Pleiades-oriented-view-of-the-GAWD report to include the RDF triples just published by Open Context and to exploit, when available, rdfs:label on the annotation target in order to produce more human-readable links in the HTML output. This required the addition of an OPTIONAL clause to the SPARQL query, as well as modifications to the results-processing XSLT. The new versions are indicated/linked on the report page.
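In case it's useful to anyone replicating this, the change amounts to a pattern like the following (a schematic sketch; the variable names are illustrative, not necessarily those in the published query):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oac: <http://www.openannotation.org/ns/>
SELECT ?annotation ?target ?targetLabel
WHERE {
  ?annotation oac:hasTarget ?target .
  # new bit: pick up a human-readable label for the target when the publisher supplies one
  OPTIONAL { ?target rdfs:label ?targetLabel }
}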
You can see the results of these changes, for example, in the Antiochia/Theoupolis page.
Labels:
gawd,
lawdi,
linkeddata,
LOD,
opencontext,
pelagios,
pelagiosplay,
sparql
Monday, February 6, 2012
Playing with PELAGIOS: The GAWD is Live
This is the latest in an ongoing series chronicling my dalliances with data published by the PELAGIOS project partners.
I think it's safe to say that, thanks to the PELAGIOS partner institutions, we do have a Graph of Ancient World Data (GAWD) on the web. It's still in early stages, and one has to do some downloading, unzipping, and so forth to engage with it at the moment, but indeed the long-awaited day has dawned.
Here's the perspective, as of last Friday, from the vantage point of Pleiades. I've used SPARQL to query the GAWD for all information resources that the partners claim (via their RDF data dumps) are related to Pleiades information resources. I.e., I'm pulling out a list of information resources about texts, pictures, objects, grouped by their relationships to what Pleiades knows about ancient places (findspot, original location, etc.). I've sorted that view of the graph by the titles Pleiades gives to its place-related information resources and generated an HTML view of the result. It's here for your browsing pleasure.
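Schematically, the query behind the report looks something like this (a simplified sketch rather than the exact query used; it reuses the oac:hasBody/oac:hasTarget pattern from my earlier Nomisma experiment):
PREFIX oac: <http://www.openannotation.org/ns/>
SELECT ?place ?resource
WHERE {
  # annotations whose body is a Pleiades place and whose target is a partner's information resource
  ?annotation oac:hasBody ?place ;
              oac:hasTarget ?resource .
  FILTER regex(str(?place), "^http://pleiades.stoa.org/places/")
}
ORDER BY ?place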
Next Steps and Desiderata
For various technical reasons, I'm not yet touching the data of a couple of PELAGIOS partners (CLAROS and SPQR), but these will hopefully be resolved soon. I still need to dig into figuring out what Open Context is doing on this front. Other key resources -- especially those emanating from ISAW -- are not yet ready to produce RDF (but we're working on it).
There are a few things I'd like the PELAGIOS partners to consider/discuss adding to their data:
- Titles/labels for the information resources (using rdfs:label?). This would make it possible for me to produce more intuitive/helpful labels for users of my HTML index. Descriptions would be cool too. As would some indication of the type of thing(s) a given resource addresses (e.g., place, statue, inscription, text)
- Categorization of the relationships between their information resources and Pleiades information resources. Perhaps some variation of the terms originally explored by Concordia (whence the GAWD moniker), as someone on the PELAGIOS list has already suggested.
What would you like to see added to the GAWD? What would you do with it?
Thursday, February 2, 2012
Playing with PELAGIOS: Dealing with a bazillion RDF files
Latest in a Playing with PELAGIOS series
Some of the PELAGIOS partners distribute their annotation RDF in a relatively small number of files. Others (like SPQR and ANS) have a very large number of files. This makes the technique I used earlier for adding triples to the database ungainly. Fortunately, 4store provides some command line methods for loading triples.
First, stop the 4store http server (why?):
$ killall 4s-httpd
Try to import all the RDF files. Rats!
$ 4s-import -a pelagios *.rdf
-bash: /Applications/4store.app/Contents/MacOS/bin/4s-import: Argument list too long
Bash to the rescue (but note that doing one file at a time has a cost on the 4store side):
$ for f in *.rdf; do 4s-import -av pelagios $f; done
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.00000.rdf>
Pass 1, processed 10 triples (10)
Pass 2, processed 10 triples, 8912 triples/s
Updating index
Index update took 0.000890 seconds
Imported 10 triples, average 4266 triples/s
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.101.rdf>
Pass 1, processed 11 triples (11)
Pass 2, processed 11 triples, 9856 triples/s
Updating index
Index update took 0.000936 seconds
Imported 11 triples, average 4493 triples/s
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.10176.rdf>
Pass 1, processed 8 triples (8)
Pass 2, processed 8 triples, 6600 triples/s
Updating index
Index update took 0.000892 seconds
Imported 8 triples, average 3256 triples/s
...
This took a while. There are 86,200 files in the ANS annotation batch.
Note the use of the -a option on 4s-import to ensure the triples are added to the current contents of the database, rather than replacing them! Note also the -v option, which is what gives you the report (otherwise, it's silent and that makes my ctrl-c finger twitchy).
Now, back to the SPARQL mines.
Labels:
lawdi,
linkeddata,
LOD,
pelagios,
pelagiosplay,
pleiades
Wednesday, February 1, 2012
Playing with PELAGIOS: Nomisma
So, I want to see how hard it is to query the RDF that PELAGIOS partners are putting together. The first experiment is documented below.
Step 1: Set up a Triplestore (something to load the RDF into and support queries)
Context: I'm a triplestore n00b.
I found Jeni Tennison's Getting Started with RDF and SPARQL Using 4store and RDF.rb and, though I had no interest in messing around with Ruby as part of this exercise, the recommendation of 4store as a triplestore sounded good, so I went hunting for a Mac binary and downloaded it.
Step 2: Grab RDF describing content in Nomisma.org
Context: I'm a point-and-click expert.
I downloaded the PELAGIOS-conformant RDF data published by Nomisma.org at http://nomisma.org/nomisma.org.pelagios.rdf.
Background: "Nomisma.org is a collaborative effort to provide stable digital representations of numismatic concepts and entities, for example the generic idea of a coin hoard or an actual hoard as documented in the print publication An Inventory of Greek Coin Hoards (IGCH)."
Step 3: Fire up 4store and load in the nomisma.org RDF
Context: I'm a 4store n00b, but I can cut and paste, read and reason, and experiment.
Double-clicked the 4store icon in my Applications folder. It opened a terminal window.
To create and start up an empty database for my triples, I followed the 4store instructions and Tennison's post (mutatis mutandis) and so typed the following in the terminal window ("pelagios" is the name I gave to my database; you could call yours "ray" or "jay" if you like):
$ 4s-backend-setup pelagios
$ 4s-backend pelagios
Then I started up 4store's SPARQL http server and aimed it at the still-empty "pelagios" database so I could load my data and try my hand at some queries:
$ 4s-httpd pelagios
Loading the nomisma data was then as simple as moving to the directory where I'd saved the RDF file and typing:
$ curl -T nomisma.org.pelagios.rdf 'http://localhost:8080/data/http://nomisma.org/nomisma.org.pelagios.rdf/'
Note how the URI base for nomisma items is appended to the URL string passed via curl. This is how you specify the "model URI" for the graph of triples that gets created from the RDF.
Step 4: Try to construct a query and dig out some data.
Context: I'm a SPARQL n00b, but I'd done some SQL back in the day and XML and namespaces are pretty much burned into my soul at this point.
Following Tennison's example, I pointed my browser at http://localhost:8080/test/. I got 4store's SPARQL test query interface. I googled around looking grumpily at different SPARQL "how-tos" and "getting starteds" and trying stuff and pondering repeated failure until this worked:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oac: <http://www.openannotation.org/ns/>
SELECT ?x
WHERE {
?x oac:hasBody <http://pleiades.stoa.org/places/462086> .
}
That's "find the ID of every OAC Annotation in the triplestore that's linked to Pleiades Place 462086" (i.e., Akragas/Agrigentum, modern Agrigento in Sicily). It's a list like this:
- http://nomisma.org/nomisma.org.pelagios.rdf#igch1910-agrigentum-5
- http://nomisma.org/nomisma.org.pelagios.rdf#igch2089-agrigentum-24
- http://nomisma.org/nomisma.org.pelagios.rdf#igch2101-agrigentum-32
- ...
51 IDs in all.
But what I really want is a list of the IDs of the nomisma entities themselves so I can go look up the details and learn things. Back to the SPARQL mines until I produced this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oac: <http://www.openannotation.org/ns/>
SELECT ?nomismaid
WHERE {
?x oac:hasBody <http://pleiades.stoa.org/places/462086> .
?x oac:hasTarget ?nomismaid .
}
Now I have a list of 51 nomisma IDs: one for the mint and 50 coin hoards that illustrate the economic network in which the ancient city participated (e.g., http://nomisma.org/id/igch2081).
Cost: about 2 hours of time, 1 cup of coffee, and three favors from Sebastian Heath on IRC.
Up next: Arachne, the object database of the Deutsches Archäologisches Institut.
Labels:
lawdi,
linkeddata,
LOD,
nomisma,
pelagios,
pelagiosplay,
pleiades