SyntaxHighlighter

Thursday, December 6, 2012

Pleiades Hackday: Improving Descriptions

A Pleiades hack day is now underway in the Pleiades IRC channel. We've decided to focus on cleaning up descriptions for Pleiades place resources on the island of Sardinia. The following query gives you most of the Pleiades content for Sardinia and Corsica:

http://tinyurl.com/aotxaum

Here's our process:


  1. Pick a place to improve
  2. Drop the title and URL into the IRC channel, so everyone knows you're working on it
  3. Check out a working copy of the place
  4. In your working copy, write a better description and connect the place to the place resource for the island you're working on.
  5. Submit for review
  6. Ping paregorios or servilius_ahala in IRC to get it reviewed and published

We're also building an FAQ on the subject of "What Makes a Good Pleiades Description?".

Monday, December 3, 2012

Latest at Maia and Electra

In response to a request from the author, I've pulled the feed for http://www.iosa.it from the Maia and Electra feed aggregators (is is moribund). I have added to both the feed for Stefano Costa's blog.

Thursday, October 4, 2012

New Blogs Added to Planet Atlantides

The following blogs have been added to the Maia Atlantis feed aggregator:

The following blogs have been added to the Electra Atlantis feed aggregator:
  • Ancient World Mapping Center
  • Archeomatica: Tecnologie per i Beni Culturali

Pleiades Machine Tags for Blog Posts? Yes!

So, a few minutes ago I noticed a new post in my feed reader from a blog I've admired for a while: Javier Andreu Pintado's Oppida Imperii Romani. I've thought for a long time that I ought to get in touch with him (we don't know each other from Adam as far as I know) and see if we could figure out a more-or-less automated way to get his posts to show up on the associated Pleiades pages.

Then it hit me:

Why can't we just use labels incorporating Pleiades IDs like we've been doing with machine tags on Flickr and query the Blogger API to get the associated posts?

Why not indeed. It turns out it just works.

To test, I added the string "pleiades:depicts=579885" as a label on my blog post from last December, "Pleiades, Flickr, and the Ancient World Image Bank" (Since that tag is used in an example in that post. I recognize that the blog post doesn't actually depict that place, which is what that label term ought to mean, but this is just a test).

Then I went to the Google APIs Explorer page for the Blogger "list posts" function (which I found by googling) and entered by blog's ID and the label string in the appropriate fields.



And, in a matter of milliseconds, I got back a JSON representation of my blog post.



So now I'm thinking we might explore the possibility of creating a widget on Pleiades place pages to feature blog posts tagged like this from selected blogs. It appears that, to execute the API queries against Blogger, we have to do them blog-by-blog with known IDs, but that's probably OK anyway so we can curate the process of blog selection and prevent spam.

It occurs to me that the Pelagios community might be interested in looking at this approach in order to build a gateway service to inject blog posts into the Pelagios network.

And while I'm name-checking, I wonder if any Wordpress aficionados out there can come up with a functionally equivalent mechanism.

Tuesday, August 28, 2012

Text of my talk at CIEGL 2012


CIEGL 2012 Paper: Efficient Digital Publishing for Inscriptions
cc-by

2. I considered giving this talk the following title:

why build a submarine to cross the Tiber?

It's a question we've heard a lot over the years in various forms. And by "we" I mean not just digital epigraphers -- if you'll accept such an appellation -- but the large and growing number of scholars and practitioners across the humanities who seek to bring computational methods to bear on the evidence, analysis, and publication of our scholarly work.

3. To build a submarine.

The phrase implies something complicated, expensive, time-consuming. Something with special capabilities. Perhaps a little bit dangerous.

4. To cross the Tiber.

Something we know how to do and have been doing for years using a small number of well-known techniques (bridges, boats, swimming). Something commonplace and, given its ubiquity, easy and inexpensive (at least if calculated per trip over time).

Whether asked rhetorically or in earnest, it's a question that deserves an answer. Time is precious. Funds are limited. There are many texts.

But maybe we don't just want to cross the Tiber. Maybe we want to explore the oceans.  And this is the point: for what reasons are we publishing inscriptions in digital form? Are the tools and methods we use fit for that purpose?

5. Why are we digitizing inscriptions? Why are we digitally publishing inscriptions? What uses do we hope to make of digital epigraphy?

I think it's safe to say that no sane person would prefer to lose the ability to search the epigraphic texts that are now available digitally. By my calculations, that's perhaps fifty or sixty percent of the Greek and Latin epigraphic harvest, probably a bit less if we successfully resolved all the duplicates within and across databases. Do you look forward to a day when all Greek and Latin inscriptions can be searched?

So we agree that "search" is a righteous use of digital technology and that we are making good progress toward the goal of "comprehensive search" we set for ourselves in Rome.

6. The relationship between "search" and digital epigraphy was treated at length by Hugh Cayless, Charlotte Roueché, Gabriel Bodard and myself in a 2007 contribution to Digital Humanities Quarterly, part of a themed volume entitled Changing the Center of Gravity: Transforming Classical Studies Through Cyberinfrastructure, which was assembled in honor of the late Ross Scaife.  Authors were asked to review digital developments in various subfields and to imagine the state of that field with respect to digital technology in ten year's time. When we wrote, we observed that the vast majority of digital epigraphic editions were still published solely in print, but we predicted that, by 2017, the situation will have changed drastically. We imagined a world in which computing will be as central to consuming epigraphic publications as it is now in making them.

Yet, the biggest challenge we still face in meeting the goals we identified in the DHQ article by 2017 is in making the transition to online publishing a reality by producing tools that are fit for the purpose. It's now no harder to create a traditional print-style epigraphic edition and put it online in HTML or PDF format than it is to get it ready to publish in print. A growing number of journals and institutional repositories -- though few yet devoted specifically to classics or epigraphy -- can now provide a publication venue for such born-digital editions that meets minimum expectations for stability, accessibility, citation, and preservation. Moreover,  I can't imagine that any of the major database projects would refuse the gift of clean digital texts and basic descriptive data corresponding to new publications in print or online, although the speed with which they might be able to incorporate same into their databases would be a function of available time and manpower.

7. But this scenario still assumes the old underlying assumptions inherited from a dead-tree era: an epigraphic publication is the work of an individual or small number of scholars, brought forth in a static format that, once discovered, must be read and understood by humans before it can be disassembled -- more or less by hand -- and absorbed into someone else's discovery or research process. From that process, new results will eventually emerge and get published in a similar way.

This is inefficient.

Time is precious. We are few. There are many texts and many questions to answer. Why are humans still doing work better fit for machines?

8. If we are to embrace and exploit the full range of benefits offered by the digital world, we have to remake our suppositions about not only publication, but about the entire research process. To the extent possible, our epigraphic publications must not only be online, but they must also meet certain additional criteria: universal discoverability, stability of access, reliability of citation, ubiquity of descriptive terminology, facility for download, readiness for reuse and remixing. Further, they must become open to incremental revision and microcontribution at levels below that of the complete text or edition.

Universal discoverability means that our editions -- and the discrete elements of their context -- must be discoverable in every way their potential readers and users might wish. So, yes, we must be able to search and browse for them via the homepage of an individual database or online publication, but they also must surface in Google, Bing, and their successors, as well as in library catalogs, introductory handbooks, and course materials. Links, created manually or automatically, ought to bring users to inscriptions from other web resources for ancient studies. Other special-purpose digital systems ought to be able to discover and access epigraphic content via open application programming interfaces. Changes and updates to contents should be reflected not only in a human-readable web page or printed conference handout, but also with a live web feed on the site that can be consumed and syndicated by automatic readers and aggregators.

Stability of access means that I should be able to revisit a given online publication at exactly the same web address I used to read it a month ago. A year ago. Ten years ago. For as long as the web exists. Don't make me run the query again. Ideally, that web address -- the Universal Resource Identifier or URI -- will be as short and easy to remember as possible.

Then there's Reliability of citation. If we cannot cite digital publications in a consistent and dependable way, those publications are of no value to the scholarly enterprise. I cannot cite your text for comparison, acknowledge your argument in making my own, or do any of the other things for which we must make scholarly reference if your online publication is not readily citable. This implies more than just stability of access, though that is essential. It also means that you must give me a way to include a reference to the appropriate part of the publication in the URI. So, if you're publishing 10 or 1,000 or 100,000 inscriptions online, you should give me a URI for each one of them. Just posting a PDF or returning a composite page of search results containing all of them doesn't cut it.

[[[Ubiquity of descriptive terminology]]]

Suppose I want to find all funerary inscriptions that have been published online that were originally written in Greek or Latin and that likely date to the third century after Christ. What if I want to narrow that group of inscriptions to a particular geographic context or pull out only those whose texts that contain Roman tria nomina?  We need an agreed mechanism for structuring and expressing the requisite descriptive elements in a standard, discoverable manner on the web.

There is an emerging web standard for this purpose. It is called Linked Data and, though we do not have time to explore it in detail here today, I believe that there is an urgent and immediate need for a collaborative effort, with real but modest funding behind it, to define and publish descriptive vocabularies for epigraphy that can be used in linked data applications. This work should build upon the metadata normalization efforts already well underway within the EAGLE consortium, and it might well benefit from the complementary work done a few years ago by EpiDoc volunteers to produce multi-language translations of many of the terms used by the Rome database. This would allow us to use a common set of machine-actionable terms for such key aspects as material, type of support, identifiable personages and places, mode of inscription, and date.

There is already a collaborative Linked Data mechanism for communicating geographic relationships in ancient studies. It is called Pelagios, and it already links XYZ different projects in various subfields of ancient studies to the Pleiades online gazetteer.

By joining Pelagios, epigraphic databases and publications would make their contents immediately discoverable by geographic location alongside the contents of the Perseus digital library, the DAI object database Arachne, the holdings of the British Museum, the archaeological reports published by Fasti Online, the coins database of the American Numismatic Society, and the documentary papyri (among others).

Facility of download and readiness for reuse and remixing. It is here perhaps that the epigraphic community faces its greatest twenty-first century challenge. We must decide no less than whether to embrace or forfeit the full promise of the digital age, for if scholars are unable to use digital surrogates -- programs designed to retrieve, analyze, and reformat data for a specific research need -- and use them across the full corpus of classical epigraphy, we will have forfeited the digital promise.

It is no secret that there is a history of disagreement and conflict in our community around the mere idea of putting published epigraphic texts in a digital database. Though such digitization and distribution is now established practice, the resulting databases and publications still assert a hodge-podge of rights and guidance for use, or else are distinguished by silence on the issue. In some cases, users are debarred from reuse of texts or other information in their own publications.

Let me offer, ex cathedra, a prescription for healing.

First, the social and institutional steps. Each person, project, institution, journal or publisher that puts inscriptions online should make a clear and complete statement of the rights it asserts over the publication, and the behavior it allows and expects of its users. "All rights reserved" or "for personal use only" are regimes that preclude most of the best of what we could do in the digital realm this century. If you choose such a path, you are in my opinion, standing in the way of progress. Moreover, you will find many colleagues who, depending on the legal jurisdiction and their experience, will contest any copyright claims to the text itself, and reuse it anyway. Far better than that fraught scenario is to use a standard open license, or even to make a public domain declaration. There are now several licenses crafted by the Creative Commons and the Open Knowledge Foundation that preserve the copyright and data rights that are permissible by law in your jurisdiction, while freely granting users a range of uses that are consistent with academic needs and practice. By choosing an appropriate CC or OKF license for your epigraphic publication, you will help bring the future.

Technologically, we need to make downloading and reusing easier. "Click here to download" is a good start, but to make serious change we will have to move beyond the PDF file to provide formats that can be chewed up and remixed by computational agents without losing the significance of the more discipline-specific aspects of the presentation. (We will pass over in silence the abomination of posting Microsoft Word files online).

So, an epigraphic edition in HTML or plain text, using standard Unicode encodings for all characters, is an excellent improvement over the PDF. I'd urge you, whenever possible, to go further and provide EpiDoc XML for download. The chief additional virtues of EpiDoc being (with regard to reuse) that the constituent elements of the edition (text, translation, commentary) are distinguished from each other in a consistent, machine-actionable manner, and that the semantics of the sigla you use in the text itself are represented unambiguously for further computational processing.

And here let me offer another parenthetical call for action. If EpiDoc is to live up to position of esteem it has now obtained in Greek and Latin epigraphic circles, we must make it easier and cheaper to use. I'll speak more about some aspects in a minute, but allow me to observe here that there is a critical need to identify and obtain funding -- probably a relatively small amount -- to convene some working meetings aimed at completing and updating the EpiDoc guidelines, and to pay for a postdoc or graduate student to support and contribute to that effort steadily for an academic year or so. The present state of the Guidelines is, I'm sad to say, close to useless. Many good people have tried to redress the problem, but job pressures and the lack of resources necessary to create real working time have so far stymied progress. The state of the EpiDoc Guidelines is a train wreck waiting to happen. We need to fix it.

I'd like to wrap up the discussion of digital benefits with a few words on subject of microcontribution. Microcontribution is another area in which we could enhance scholarly progress in epigraphy. By microcontribution I mean the incorporation into an online, scholarly publication of any contribution of content or interpretation in a unit too small to have been given its own article in a traditional journal. Have you ever read someone else's text and thought "I'd emend this bit differently?" Can you provide a realiable identification for a personal or geographic name that puzzled the original editor? We do have in print articles the genre of "notes on inscriptions" of one type or another -- and the annual bibliographic reviews strive mightily to keep these connected to the published editions for us, but what if that were made to happen automatically in an online environment?

So, are the tools now at hand for digital publication of epigraphy fit for purpose? Do they do all these things for us? Are they effective? Efficient? User-friendly?

No.

Of course, that's partly because we've had such a huge task of catching up to do. But we're making good, consistent progress on retrospective digitization. We cannot ignore the future.

And we are seeing important gains in some areas. The new interface to the Heidelberg database, for example, uses clear, stable URIs for each inscription's record (and for each image and bibliographic entry). Similar URIs were always available, but they were not foregrounded in the application and so required extra effort on the part of a user to discover. But I think you'll all agree that consistency of citation is better supported in the new system. On the reuse front, Heidelberg has long given clear guidance on expectations -- a reused text should indicate its origin in EDH.

But we have a long way to go in other areas. I would encourage those who currently manage (or are planning) large databases or digital corpora of inscriptions to look closely at what the papyrologists have been doing with papyri.info. Not only does the system provide descriptive information, texts, translations, and images -- at varying levels of completeness -- for some 80,000 papyri in a manner consistent with many of the desiderata I have enumerated above, but also it serves as the publication of record for a small but growing number of born-digital papyrological editions and microcontributions, all created and managed in EpiDoc.

The papyrologists have benefited, of course, from the significant largesse of the Andrew W. Mellon Foundation, as well as support from the U.S. National Endowment for the Humanities, in bringing this resource to life. Fortunately, both the institutions involved and the funders felt that it was essential to produce the software under open license, so it's ripe for reuse. But it's a complex piece of software that would require modification and extension to support the needs of epigraphers. It, like the major databases we already have in the field, would need an institutional home, tech support staff, and on-going funding. At NYU we are waiting for the National Endowment for the Humanities to tell us if they will give us funding to begin the customization work that would lead to a version of Papyri.info for epigraphy. If funded, this effort will move forward in collaboration with EAGLE and other major database projects, but we will also seek to provide, as soon as possible, an online environment for the creation, collaborative emendation, and digital publication of epigraphic texts from individuals and projects.



Monday, August 27, 2012

My talk at CIEGL 2012: Efficient Digital Publishing for Inscriptions

I just noticed that the program page for the 14th International Congress of Greek and Latin Epigraphy doesn't have a link to the abstract for the talk I'm giving in John Bodel's panel on "Inscriptions in the Digital World" tomorrow afternoon. So, here it is:


Efficient Digital Publishing for Inscriptions
Tom Elliott
Associate Director for Digital Programs and Senior Research Scholar
Institute for the Study of the Ancient World, New York University

No one would argue at a gathering such as this that it will ever be possible to make it “easy” to publish inscriptions. But we should at least remind ourselves of the range of expertise and the quantity of effort the task demands. For a long time it has been the common – and justified – expectation that preparing digital epigraphic editions requires much more effort and expense than doing the work in print, especially when one attempts to seize opportunities that are unique to or enhanced by the digital context. Indeed, this has sometimes been offered as a sufficient reason not to adopt a digital approach, or to limit oneself to the dissemination of digital facsimiles of a print object (e.g., PDF files).

My paper will focus on struggles and successes in making the EpiDoc approach to epigraphic publication efficient and effective: capitalizing on the benefits of semantic markup and digital access while lowering hurdles to adoption. In particular, we will consider recent developments in open-source software for the iterative, collaborative, and “born-EpiDoc” digital publication of inscriptions and papyri.

Thursday, August 23, 2012

New in Maia: hmmlorientalia

To the Maia Atlantis feed aggregator I've just added Adam McCollum's blog hmmlorientalia, which he glosses as "some remarks—often with photos!—about manuscripts and the languages, literature, scholarship, and history of Christian culture in the Middle East". My thanks to Paul Dilley, whose post today alerted me to McCollum's blog.

Saturday, July 14, 2012

Added Elijah Meeks' blog to Electra Atlantis

How is it that I've neglected to add Digital Humanities Specialist to the Electra Atlantis Feed aggregator so long? One of the great mysteries of our time. At least I've now rectified the oversight.

Wednesday, July 11, 2012

Changes to Maia and Electra Atlantis

I've removed the NEH Office of Digital Humanities feed from the Electra Atlantis feed aggregator as it is returning 404 in the aftermath of NEH's move to a new web platform. I'll reinstate it when the feed is resuscitated.

I've added the following blogs to both Electra and Maia:

Sunday, June 24, 2012

"About Roman Emperors" open dataset updated

The latest version is now online at the base URI: http://www.paregorios.org/resources/roman-emperors/. Major updates:

  • URIs for emperor profile docs (with links to coinage) on the Portable Antiquities Scheme website (courtesy Dan Pett)
  • URIs for emperors as coined by the nomisma.org project (courtesy Dan Pett)
  • More viaf.org IDs for emperors (courtesy Dan Pett)
  • More alternate names (courtesy Roko Rumora)
  • More detail and description of third-party resources in both the HTML and RDF
  • Slightly more readable HTML pages
  • Complete dump files now available in CSV, RDF+XML, and Turtle

Friday, June 22, 2012

"About Roman Emperors" linked dataset published

A few days ago I blogged about an open linked dataset about Roman Emperors. I've now more formally published the dataset online at http://www.paregorios.org/resources/roman-emperors/.

I'll be adding more features and data, and improving the dataset description in coming weeks. More information on how to contribute is also forthcoming (and I have a couple of early contributions by others to incorporate as soon as possible!).

I'll blog more here with the label romemplod whenever there's a significant update.

Tuesday, June 19, 2012

Roman Emperors as Linked Data

You can jump right to the roman-emperors github repository here. I repeat the README file here for the benefit of those who'd rather look before they leap:
This dataset uses the published dbpedia resource URIs for Roman Emperors (the persons themselves) as a starting point for making useful assertions about these individuals in the linked data space. The main goal is to align these URIs with any other key URIs (now or in the future) for the same persons and then to attribute these "same as" relationships with links to descriptive documents or other data that have not so far made it into the linked data graph (especially legacy web resources). Multiple names for the emperors are only incidental to the dataset; no attempt is being made to produce (in this dataset) a comprehensive set of alternate names.
It's still a work in progress, but I've made it available under the Open Data Commons Public Domain Dedication and License so anyone who's interested can pitch in and help, or make use of it freely.

Both RDF (Turtle) and CSV versions are included.

Friday, June 15, 2012

People from my dissertation in RDF

On the road to turning my dissertation into linked data I've minted URIs for, and produced basic RDF for, all of the historical individuals I dealt with examining boundary disputes internal to the early Roman empire.

I used foaf:Person, foaf:name, and bio:olb (the latter from the BIO Vocabulary for Biographical Information, developed by Ian Davis and David Galbraith). The Roman emperors who appear in my list have been aligned to dbpedia resources using owl:sameAs. I intend to do more alignments in future to resources like dbpedia and viaf.org.

Here's the XML I started from (part of an Open Document Text format file I converted from Word), and the XSLT I used to produce the Turtle RDF, which was then cleaned up by hand.

More to come.

Saturday, June 2, 2012

How to get a born-for-print bibliography into RDF

It began life as a Word file for a printed-on-paper dissertation. I want it to become linked data so that I can hook up other linked data I'm putting online. Here's a quick-and-basic way that involves no programming, writing of scripts, or other computational heroics on my part:
  • Open the Word file in Libre Office and save it (download copy here). The basic structure puts one citation per paragraph, with a tab dividing a short title from a full citation. E.g.:  
Ager 1989    S. Ager, “Judicial Imperialism: the Case of Melitaia,” AHB 3.5 (1989) 107-114.
Ager 1996    S. Ager, Interstate arbitrations in the Greek world, 337-90 B.C., Berkeley, 1996.
Aichinger 1982    A. Aichinger, “Grenzziehung durch kaiserliche Sonderbeauftragte in den römischen provinzen,” ZPE 48 (1982) 193-204.
  •  Rip out everything (like title, introductory materials, etc.) that's not the list of short titles and citations (download copy here).
  • "Save as ..." -> File Type = "text encoded" (select the "edit filter settings" checkbox) -> "Save" -> (in filter options, make sure "Unicode (UTF-8)" is the chosen encoding) -> "OK" (see here).
  • Close the text file in Libre Office.
  • Open a new spreadsheet file in Libre Office (don't use Excel for this; it will make a mess of your Unicode text. Ditto exporting to CSV from Word)
  • "File" -> "Open..." -> File Type = "Text CSV (*.csv, *.txt)" -> "Open"
  • In the "Text Import" dialog box, make sure the character set is "Unicode (UTF-8)" and change the "separator" from "comma" to "tab"
  • Click "OK"
  • Make sure the spreadsheet gives you two columns (one for the short title and the other for the full citation).
  • Add an empty top row and in the first cell type "shortTitle" (no quotes). Enter the string "shortDescription" in the second cell (no quotes). Save the file (still in the tab-delimited format). (see here).
  • If you have python installed on your computer, download the tab2n3.py script from the W3C website and save it into the same folder as your data.
  • Open a command window or terminal and navigate to the folder where your data is.
  • Type the following:
$ python tab2n3.py -id -schema -namespace http://purl.org/ontology/bibo/ < BoundaryDisputesJustDataHeadings.csv > BoundaryDisputes.ttl
  • Open the resulting ttl file in the text-editor of your choice. You've got RDF! (see here).

Friday, June 1, 2012

Ancient Studies Needs Open Bibliographic Data and Associated URIs

Update 1:  links throughout, minor formatting changes, proper Creative Commons Public Domain tools, parenthetical about import path from Endnote and such, fixing a few typos.

The NEH-funded Linked Ancient World Data Institute, still in progress at ISAW, has got me thinking about a number of things. One of them is bibliography and linked data. Here's a brain dump, intended to spark conversation and collaboration.

What We Need

  • As much bibliographic data as possible, for both primary and secondary sources (print and digital), publicly released to third parties under either a public domain declaration or an unrestrictive open license.
  • Stable HTTP URIs for every work and author included in those datasets.

Why

Bibliographic and citation collection and management are integral to every research and publication in project in ancient studies. We could save each other a lot of time, and get more substantive work done in the field, if it was simpler and easier to do. We could more easily and effectively tie together disparate work published on the web (and appearing on the web through retrospective digitization) if we had a common infrastructure and shared point of reference. There's already a lot of digital data in various hands that could support such an effort, but a good chunk of it is not out where anybody with good will and talent can get at it to improve it, build tools around it, etc.

What I Want You (and Me) To Do If You Have Bibliographic Data
  1. Release it to the world through a third party. No matter what format it's in, give a copy to someone else whose function is hosting free data on the web. Dump it into a public repository at github.com or sourceforge.net. Put it into a shared library at Zotero, Bibsonomy, Mendeley, or another bibliographic content website (most have easy upload/import paths from Endnote, and other citation management applications). Hosting a copy yourself is fine, but giving it to a third party demonstrates your bona fides, gets it out of your nifty but restrictive search engine or database, and increments your bus number.
  2. Release it under a Creative Commons Public Domain Mark or Public Domain Dedication (CC0).  Or if you can't do that, find as open a Creative Commons or similar license as you can. Don't try to control it. If there's some aspect of the data that you can't (because of rights encumberance) or don't want to (why?) give away to make the world a better place, find a quick way to extract, filter, or excerpt that aspect and get the rest out.
  3. Alert the world to your philanthropy. Blog or tweet about it. Post a link to the data on your institutional website. Above all, alert Chuck Jones and Phoebe Acheson so it gets announced via Ancient World Online and/or Ancient World Open Bibliographies.
  4. Do the same if you have other useful data, like identifiers for modern or ancient works or authors.
  5. Get in touch with me and/or anyone else to talk about the next step: setting up stable HTTP URIs corresponding to this stuff.
Who I'm Talking To

First of all, I'm talking to myself, my collaborators, and my team-mates at ISAW. I intend to eat my own dogfood.

Here are other institutions and entities I know about who have potentially useful data.
  • The Open Library : data about books is already out there and available, and there are ways to add more
  • Perseus Project : a huge, FRBR-ized collection of MODS records for Greek and Latin authors, works, and modern editions thereof.
  • Center for Hellenic Studies: identifiers for Greek and Latin authors and works
  • L'Année Philologique and its institutional partners like the American Philological Association: the big collection of analytic secondary bibliography for classics (journal articles)
  • TOCS-IN: a collaboratively collected batch of analytic secondary bibliography for classics
  • Papyri.info and its contributing project partners: TEI bibliographic records for  much of the bibliography produced for or cited by Greek and Latin papyrologists (plus other ancient language/script traditions in papyrology)
  • Gnomon Bibliographische Datenbank: masses of bibliographic data for books and articles for classics
  • Any and every university library system that has a dedicated or easily extracted set of associated catalog records. Especially any with unique collections (e.g., Cincinnati) or those with databases of analytical bibliography down to the level of articles in journals and collections.
  • Ditto any and every ancient studies digital project that has bibliographic data in a database.
Comments, Reactions, Suggestions

Welcome, encouraged, and essential. By comment here or otherwise (but not private email please!).

Wednesday, May 23, 2012

First pass at extracting useful data from my dissertation

You'll find context in yesterday's post on the dissertation.

It turns out it wasn't as hard as I anticipated to start getting useful information extracted from my born-digital-for-printing-on-dead-trees dissertation. Here's a not-yet-perfect xml serialization (borrowing tags from the TEI) of "instance" information found in the diss narrative:

https://github.com/paregorios/demarc/blob/master/xml/instances.xml

Each instance is a historical event (or in some cases event series) relating to boundary demarcation or dispute within the empire. Here's a comparison between the original formatting for paper and the xml.

For paper:

XML:
<?xml version="1.0" encoding="UTF-8"?>
<div type="instance" xml:id="INST9">
  <idno type="original">INST9</idno>
  <head>A Negotiated Boundary between the <placeName 
    type="ancient">Zamucci</placeName> and the <placeName 
    type="ancient">Muduciuvi</placeName></head>
  <p rend="indent">Burton 2000, no. 78</p>
  <p>Date(s): <date>AD 86</date></p>
  <p type="treDisputeStatement">This boundary marker was placed in 
    accordance with the agreement of both parties (<foreign xml:lang="la">ex 
    conven/tione utrarumque nationum</foreign>), and therefore may be taken as
    evidence of a <hi rend="bold">boundary dispute</hi>.</p>
  <p rend="indent">This single boundary marker from coastal <placeName 
    type="modern">Libya</placeName> provides the only evidence for the resolution
    of a boundary dispute between these two indigenous peoples. The date of the 
    demarcation, as calculated from the imperial titulature, places the event in 
    the same year as the reported ‘destruction’ of the <placeName 
    type="ancient">Nasamones</placeName> by <placeName type="ancient">Legio III 
    Augusta</placeName> as a consequence of a tax revolt in which tax collectors 
    were killed.<note n="286"> Zonaras 11.19. </note> It is not clear whether 
    the boundary action was related to the conflict, or merely took advantage of
    the temporary presence of the legionary legate in what ought to have been
    part of the proconsular province. Surviving documentation for proconsuls
    during the 80s AD is incomplete, and therefore we cannot say who was
    governing <placeName type="ancient">Africa Proconsularis </placeName>at the 
    time of this demarcation.<note n="287"> Thomasson 1996, 45-48. </note>
    Neither party seems to have been related to the <placeName 
    type="ancient">Nasamones</placeName>; rather, they are thought to be sub-
    tribes of the <placeName type="ancient">Macae.</placeName><note 
    n="288">Mattingly 1994, 27-28, 32, 74, 76.. </note></p>
</div>



One thing that made this a lot easier than it might of been was the way I used styles in Microsoft Word back when I created the original version of the document. Rather than just painting formatting onto my text for headings, paragraphs, strings of characters, and so forth, I created a custom "style" for each type of thing I wanted to paint (e.g., an "instance heading" or a "personal name"). I associated the desired visual formatting with each of these, but the names themselves (since the captured semantic distinctions that I was interested in) provided hooks today for writing this stuff out as sort-of TEI XML.

There's more to do, obviously, but this was a satisfying first step.

Tuesday, May 22, 2012

Five Minutes to Ancient World Linked Data JavaScript

Probably not even that long:

  • Signed in to blogger
  • Went to the blog overview
  • Selected the "template" menu option
  • Selected the "Edit HTML" button 
  • Selected the "Proceed" button because I am fearless!
  • Scrolled to the bottom of the HTML "head" element and pasted in the following two lines:
<script src='http://isawnyu.github.com/awld-js/lib/requirejs/require.min.js' type='text/javascript'></script>
<script src='http://isawnyu.github.com/awld-js/awld.js?autoinit' type='text/javascript'></script>
  • Save and enjoy

Information about (and code) for the Ancient World Linked Data JavaScript library.

Open-Access Epigraphic Evidence for Boundary Disputes in the Roman Empire

I've been sitting on the dissertation way too long. So here it is, unleashed upon the world under the terms of a Creative Commons Attribution Share-alike license.

I have visions of hacking it up into a super-cool, linked data, ever-updated information resource, but there's no reason -- even though it's pretty niche in a lot of ways -- why anyone who might benefit from having, using, or critiquing it meantime should have to wait for that to happen.

Comments, questions, and post-release reviews are welcome via comment here, or via email to tom.elliott@nyu.edu, or on your own blog. And feel free to fork the repos and play around if you're mob-epigraphically inclined.

Friday, February 10, 2012

Give Me the Zotero Item Keys!

I fear and hope that this post will cause someone smarter than me to pipe up and say UR DOIN IT WRONG ITZ EZ LYK DIS ...

Here's the use case:

The Integrating Digital Papyrology project (and friends) have a Zotero group library populated with 1,445 bibliographic records that were developed on the basis of an old, built-by-hand Checklist of Editions of Greek and Latin Papyri (etc.). A lot of checking and improving was done to the data in Zotero.

Separately, there's now a much larger pile of bibliographic records related to papyrology that were collected (on different criteria) by the Bibliographie Papyrologique project. They have been machine-converted (into TEI document fragments) from a sui generis Filemaker Pro database and are now hosted via papyri.info (the raw data is on github).

There is considerable overlap between these two datasets, but also signifcant divergeance. We want to merge "matching" records in a carefully supervised way, making sure not to lose any of the extra goodness that BP adds to the data but taking full advantage of the corrections and improvements that were done to the Checklist data.

We started by doing an export-to-RDF of the Zotero data and, as a first step, that was banged up (programmatically) against the TEI data on the basis of titles. Probable matches were hand-checked and a resulting pairing of papyri.info bibliographic ID numbers against Zotero short titles was produced. You can see the resulting XML here.

I should point out that almost everything up to here including the creation and improvement of the data, as well as anything below regarding the bibliography in papyri.info, is the work of others. Those others include Gabriel Bodard, Hugh Cayless, James Cowey, Carmen Lantz, Adam Prins, Josh Sosin, and Jen Thum. And the BP team. And probably others I'm forgetting at the moment or who have labored out of my sight. I erect this shambles of a lean-to on the shoulders of giants.

To guide the work of our bibliographic researchers in analyzing the matched records, I wanted to create an HTML file that looks like this:
  • Checklist Short Title = Papyri.info ID number and Full Title String
  • BGU 10 = PI idno 7513: Papyrusurkunden aus ptolemäischer Zeit. (Ägyptische Urkunden aus den Staatlichen Museen zu Berlin. Griechische Urkunden. X. Band.)
  • etc. 
In that list, I wanted items to the left to be linked to the online view of the Zotero record at zotero.org and items on the right linked to the online view of the TEI record at papyri.info. The XML data we got from the initial match process provided the papyri.info bibliographic ID numbers, from which it's easy to construct the corresponding URIs, e.g., http://papyri.info/biblio/7513.

But Zotero presented a problem. URIs for bibliographic records in Zotero server use alphanumeric "item keys" like this: CJ3WSG3S (as in https://www.zotero.org/groups/papyrology/items/itemKey/CJ3WSG3S/).

That item key string is not, to my knowledge, included in any of the export formats produced by the Zotero desktop client, nor is it surfaced in its interface (argh). It appears possible to hunt them down programmatically via the Zotero Read API, though I haven't tried it for reasons that will be explained shortly. It is certainly possible to hunt for them manually via the web interface, but I'm not going to try that for more than about 3 records.

How I got the Zotero item keys

So, I have two choices at this point: write some code to automate hunting the item keys via the Zotero Read API or crack open the Zotero SQLLite database on my local client and see if the item keys are lurking in there too. Since I'm on a newish laptop on which I hadn't yet installed XCode, which seems to be a prerequisite to installing support for a Python virtual environment, which is the preferred way to get pip, which is the preferred install prerequisite for pyzotero, which is the python wrapper for the Zotero API, I had to make some choices about which yaks to shave.

I decided to start the (notoriously slow) XCode download yak and then have a go at the SQLLite yak while that was going on.

I grabbed the trial version of RazorSQL (which looked like a good shortcut after a few minutes of Googling), made a copy of my Zotero database, and started poking around. I thought about looking for detailed documentation (starting here I guess), but direct inspection started yielding results so I just kept going commando-style. It became clear at once that I wasn't going to find a single table containing my bibliographic entries. The Zotero client database is all normalized and modularized and stuff. So I viewed table columns and table contents as necessary and started building a SQL query to get at what I wanted. Here's what ultimately worked:

SELECT itemDataValues.value, items.key FROM items 
INNER JOIN libraries ON items.libraryID = libraries.libraryID
INNER JOIN groups ON libraries.libraryID = groups.libraryID
INNER JOIN itemData ON items.itemID = itemData.itemID
INNER JOIN itemDataValues ON itemData.valueID = itemDataValues.valueID
INNER JOIN fields ON itemData.fieldID = fields.fieldID
WHERE groups.name= "Papyrology" AND fields.fieldID=116

The SELECT statement gets me two values for each match dredged up by the rest of the query: a value stored in the itemDataValues table and a key stored in the items table. The various JOINs are used to get us close to the specific value (i.e., a short title) that we want. 116 in the fieldID field of the fields table corresponds to the short title field you see in your Zotero client. I found that out by inspecting the fields table; I could have used more JOINs to be able to use the string "shortTitle" in my WHERE clause, but that would have just taken more time.

The results of that query against my database looked like this:

P.Cair.Preis.    2245UKTH
CPR 18           26K8TAJT
P.Bodm. 28       282XKDE9
P.Gebelen        29ETKPXC
O.Krok           2BBMS7NS
P.Carlsb. 5      2D2ZNT4C
P.Mich.Aphrod.   2DTD2NIZ
P.Carlsb. 9      2FWF6T6I
P.Col. 1         2G4CF756
P.Lond.Copt. 2   2GAEU5QP
P.Harr. 1        2GCCNGJV
O.Deir el-Bahari 2GH3FEA2
P.Harrauer       2H3T6EU2
(etc).

So, copy that tabular result out of the RazorSQL GUI, paste it into a new LibreOffice spreadsheet and save it and I've got an XML file that I can dip into from the XSLT I had already started on to produce my HTML view.

Here's the resulting HTML file.

On we go.

Oh, and for those paying attention to such things, XCode finished downloading about two-thirds of the way through this process ...

Tuesday, February 7, 2012

Playing with PELAGIOS: Open Context and Labels

Latest in the Playing with PELAGIOS series.

I've just modified the tooling and re-run the Pleiades-oriented-view-of-the-GAWD report to include the RDF triples just published by Open Context and to exploit, when available, rdfs:label on the annotation target in order to produce more human-readable links in the HTML output. This required the addition of an OPTIONAL clause to the SPARQL query, as well as modifications to the results-processing XSLT. The new versions are indicated/linked on the report page.

You can see the results of these changes, for example, in the Antiochia/Theoupolis page.

Monday, February 6, 2012

Playing with PELAGIOS: The GAWD is Live

The is the lastest in an on-going series chronicling my dalliances with data published by the PELAGIOS project partners.

I think it's safe to say that, thanks to the PELAGIOS partner institutions, that we do have a Graph of Ancient World Data (GAWD) on the web. It's still in early stages, and one has to do some downloading, unzipping, and so forth to engage with it at the moment, but indeed the long-awaited day has dawned.

Here's the perspective, as of last Friday, from the vantage point of Pleiades. I've used SPARQL to query the GAWD for all information resources that the partners claim (via their RDF data dumps) are related to Pleiades information resources. I.e., I'm pulling out a list of information resources about texts, pictures, objects, grouped by their relationships to what Pleiades knows about ancient places (findspot, original location, etc.). I've sorted that view of the graph by the titles Pleiades gives to its place-related information resources and generated an HTML view of the result. It's here for your browsing pleasure.

Next Steps and Desiderata

For various technical reasons, I'm not yet touching the data of a couple of PELAGIOS partners (CLAROS and SPQR), but the will hopefully be resolved soon. I still need to dig into figuring out what Open Context is doing on this front. Other key resources -- especially those emanating from ISAW -- are not yet ready to produce RDF (but we're working on it).

There are a few things I'd like the PELAGIOS partners to consider/discuss adding to their data:

  • Titles/labels for the information resources (using rdfs:label?). This would make it possible for me to produce more intuitive/helpful labels for users of my HTML index. Descriptions would be cool too. As would some indication of the type of thing(s) a given resource addresses (e.g., place, statue, inscription, text)
  • Categorization of the relationships between their information resources and Pleaides information resources. Perhaps some variation of the terms originally explored by Concordia (whence the GAWD moniker), as someone on the PELAGIOS list has already suggested.
What would you like to see added to the GAWD? What would you do with it?

Thursday, February 2, 2012

Playing with PELAGIOS: Dealing with a bazillion RDF files

Latest in a Playing with PELAGIOS series

Some of the PELAGIOS partners distribute their annotation RDF in a relatively small number of files. Others (like SPQR and ANS) have a very large number of files. This makes the technique I used earlier for adding triples to the database ungainly. Fortunately, 4store provides some command line methods for loading triples.

First, stop the 4store http server (why?):
$ killall 4s-httpd
Try to import all the RDF files.  Rats!
$ 4s-import -a pelagios *.rdf
-bash: /Applications/4store.app/Contents/MacOS/bin/4s-import: Argument list too long
Bash to the rescue (but note that doing one file at a time has a cost on the 4store side):
$ for f in *.rdf; do 4s-import -av pelagios $f; done
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.00000.rdf>
Pass 1, processed 10 triples (10)
Pass 2, processed 10 triples, 8912 triples/s
Updating index
Index update took 0.000890 seconds
Imported 10 triples, average 4266 triples/s
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.101.rdf>
Pass 1, processed 11 triples (11)
Pass 2, processed 11 triples, 9856 triples/s
Updating index
Index update took 0.000936 seconds
Imported 11 triples, average 4493 triples/s
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.10176.rdf>
Pass 1, processed 8 triples (8)
Pass 2, processed 8 triples, 6600 triples/s
Updating index
Index update took 0.000892 seconds
Imported 8 triples, average 3256 triples/s
... 
This took a while. There are 86,200 files in the ANS annotation batch.

Note the use of the -a option on 4s-import to ensure the triples are added to the current contents of the database, rather than replacing them! Note also the -v option, which is what gives you the report (otherwise, it's silent and that makes my ctrl-c finger twitchy).

Now, back to the SPARQL mines.

Wednesday, February 1, 2012

Playing with PELAGIOS: Arachne was easy after nomisma

Querying Pleiades annotations out of Arachne RDF was as simple as loading the Arachne Objects by Places RDF file into 4store the same way I did nomisma and running the same SPARQL query.  Cost: 5 minutes. Now I know about 29 objects in the Arachne database that they think are related to Akragas/Agrigentum. For example:

Playing with PELAGIOS: Nomisma

So, I want to see how hard it is to query the RDF that PELAGIOS partners are putting together. The first experiment is documented below.

Step 1: Set up a Triplestore (something to load the RDF into and support queries)

Context: I'm a triplestore n00b. 

I found Jeni Tennison's Getting Started with RDF and SPARQL Using 4store and RDF.rb and, though I had no interest in messing around with Ruby as part of this exercise, the recommendation of 4store as a triplestore sounded good, so I went hunting for a Mac binary and downloaded it.

Step 2: Grab RDF describing content in Nomisma.org

Context: I'm a point-and-click expert.

I downloaded the PELAGIOS-conformant RDF data published by Nomisma.org at http://nomisma.org/nomisma.org.pelagios.rdf.

Background: "Nomisma.org is a collaborative effort to provide stable digital representations of numismatic concepts and entities, for example the generic idea of a coin hoard or an actual hoard as documented in the print publication An Inventory of Greek Coin Hoards (IGCH)."

Step 3: Fire up 4store and load in the nomisma.org 

Context: I'm a 4store n00b, but I can cut and paste, read and reason, and experiment.

Double-clicked the 4store icon in my Applications folder. It opened a terminal window.

To create and start up an empty database for my triples, I followed the 4store instructions and Tennison's post (mutatis mutandis) and so typed the following in the terminal window ("pelagios" is the name I gave to my database; you could call yours "ray" or "jay" if you like):
$ 4s-backend-setup pelagios
$ 4s-backend pelagios
Then I started up 4store's SPARQL http server and aimed it at the still-empty "pelagios" database so I could load my data and try my hand at some queries:
$ 4s-httpd pelagios
Loading the nomisma data was then as simple as moving to the directory where I'd saved the RDF file and typing:
$ curl -T nomisma.org.pelagios.rdf 'http://localhost:8080/data/http://nomisma.org/nomisma.org.pelagios.rdf/'
Note how the URI base for nomisma items is appended to the URL string passed via curl. This is how you specify the "model URI" for the graph of triples that gets created from the RDF.

Step 4: Try to construct a query and dig out some data.

Context: I'm a SPARQL n00b, but I'd done some SQL back in the day and XML and namespaces are pretty much burned into my soul at this point. 

Following Tennison's example, I pointed my browser at http://localhost:8080/test/. I got 4store's SPARQL test query interface. I googled around looking grumpily at different SPARQL "how-tos" and "getting starteds" and trying stuff and pondering repeated failure until this worked:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oac: <http://www.openannotation.org/ns/>

SELECT ?x
WHERE {
 ?x oac:hasBody <http://pleiades.stoa.org/places/462086> .
} 

That's "find the ID of every OAC Annotation in the triplestore that's linked to Pleiades Place 462086" (i.e., Akragas/Agrigentum, modern Agrigento in Sicily). It's a list like this:
  • http://nomisma.org/nomisma.org.pelagios.rdf#igch1910-agrigentum-5
  • http://nomisma.org/nomisma.org.pelagios.rdf#igch2089-agrigentum-24
  • http://nomisma.org/nomisma.org.pelagios.rdf#igch2101-agrigentum-32
  • ...
51 IDs in all.

But what I really want is a list of the IDs of the nomisma entities themselves so I can go look up the details and learn things. Back to the SPARQL mines until I produced this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oac: <http://www.openannotation.org/ns/>

SELECT ?nomismaid
WHERE {
 ?x oac:hasBody <http://pleiades.stoa.org/places/462086> .
 ?x oac:hasTarget ?nomismaid .
} 

Now I have a list of 51 nomisma IDs: one for the mint and 50 coin hoards that illustrate the economic network in which the ancient city participated (e.g., http://nomisma.org/id/igch2081).

Cost: about 2 hours of time, 1 cup of coffee, and three favors from Sebastian Heath on IRC.

Up next: Arachne, the object database of the Deutsches Archäologisches Institut.



Tuesday, January 17, 2012

Wednesday, January 11, 2012

Changes to Electra and Maia Atlantis

I've just added the feed for the news blog on the following site to both Maia and Electra:

The following sites have been removed from Maia for the reasons indicated (they were not in Electra):
  • Antiquated Vagaries: feed returns 401 (i.e., it's been taken private)
Records for some other blogs were updated to reflect the fact that their feeds had moved to new URLs (with proper forwarding instructions). I pass over these details in silence here.