Tuesday, August 28, 2012

Text of my talk at CIEGL 2012


CIEGL 2012 Paper: Efficient Digital Publishing for Inscriptions
cc-by

2. I considered giving this talk the following title:

why build a submarine to cross the Tiber?

It's a question we've heard a lot over the years in various forms. And by "we" I mean not just digital epigraphers -- if you'll accept such an appellation -- but the large and growing number of scholars and practitioners across the humanities who seek to bring computational methods to bear on the evidence, analysis, and publication of our scholarly work.

3. To build a submarine.

The phrase implies something complicated, expensive, time-consuming. Something with special capabilities. Perhaps a little bit dangerous.

4. To cross the Tiber.

Something we know how to do and have been doing for years using a small number of well-known techniques (bridges, boats, swimming). Something commonplace and, given its ubiquity, easy and inexpensive (at least if calculated per trip over time).

Whether asked rhetorically or in earnest, it's a question that deserves an answer. Time is precious. Funds are limited. There are many texts.

But maybe we don't just want to cross the Tiber. Maybe we want to explore the oceans.  And this is the point: for what reasons are we publishing inscriptions in digital form? Are the tools and methods we use fit for that purpose?

5. Why are we digitizing inscriptions? Why are we digitally publishing inscriptions? What uses do we hope to make of digital epigraphy?

I think it's safe to say that no sane person would prefer to lose the ability to search the epigraphic texts that are now available digitally. By my calculations, that's perhaps fifty or sixty percent of the Greek and Latin epigraphic harvest, probably a bit less if we successfully resolved all the duplicates within and across databases. Do you look forward to a day when all Greek and Latin inscriptions can be searched?

So we agree that "search" is a righteous use of digital technology and that we are making good progress toward the goal of "comprehensive search" we set for ourselves in Rome.

6. The relationship between "search" and digital epigraphy was treated at length by Hugh Cayless, Charlotte Roueché, Gabriel Bodard and myself in a 2007 contribution to Digital Humanities Quarterly, part of a themed volume entitled Changing the Center of Gravity: Transforming Classical Studies Through Cyberinfrastructure, which was assembled in honor of the late Ross Scaife. Authors were asked to review digital developments in various subfields and to imagine the state of that field with respect to digital technology in ten years' time. When we wrote, we observed that the vast majority of digital epigraphic editions were still published solely in print, but we predicted that, by 2017, the situation would have changed drastically. We imagined a world in which computing would be as central to consuming epigraphic publications as it is now in making them.

Yet, the biggest challenge we still face in meeting the goals we identified in the DHQ article by 2017 is in making the transition to online publishing a reality by producing tools that are fit for the purpose. It's now no harder to create a traditional print-style epigraphic edition and put it online in HTML or PDF format than it is to get it ready to publish in print. A growing number of journals and institutional repositories -- though few yet devoted specifically to classics or epigraphy -- can now provide a publication venue for such born-digital editions that meets minimum expectations for stability, accessibility, citation, and preservation. Moreover,  I can't imagine that any of the major database projects would refuse the gift of clean digital texts and basic descriptive data corresponding to new publications in print or online, although the speed with which they might be able to incorporate same into their databases would be a function of available time and manpower.

7. But this scenario still rests on the old assumptions inherited from a dead-tree era: an epigraphic publication is the work of an individual or small number of scholars, brought forth in a static format that, once discovered, must be read and understood by humans before it can be disassembled -- more or less by hand -- and absorbed into someone else's discovery or research process. From that process, new results will eventually emerge and get published in a similar way.

This is inefficient.

Time is precious. We are few. There are many texts and many questions to answer. Why are humans still doing work better fit for machines?

8. If we are to embrace and exploit the full range of benefits offered by the digital world, we have to remake our suppositions not only about publication, but about the entire research process. To the extent possible, our epigraphic publications must not only be online, but they must also meet certain additional criteria: universal discoverability, stability of access, reliability of citation, ubiquity of descriptive terminology, facility for download, readiness for reuse and remixing. Further, they must become open to incremental revision and microcontribution at levels below that of the complete text or edition.

Universal discoverability means that our editions -- and the discrete elements of their context -- must be discoverable in every way their potential readers and users might wish. So, yes, we must be able to search and browse for them via the homepage of an individual database or online publication, but they also must surface in Google, Bing, and their successors, as well as in library catalogs, introductory handbooks, and course materials. Links, created manually or automatically, ought to bring users to inscriptions from other web resources for ancient studies. Other special-purpose digital systems ought to be able to discover and access epigraphic content via open application programming interfaces. Changes and updates to contents should be reflected not only in a human-readable web page or printed conference handout, but also with a live web feed on the site that can be consumed and syndicated by automatic readers and aggregators.

Stability of access means that I should be able to revisit a given online publication at exactly the same web address I used to read it a month ago. A year ago. Ten years ago. For as long as the web exists. Don't make me run the query again. Ideally, that web address -- the Uniform Resource Identifier or URI -- will be as short and easy to remember as possible.

Then there's Reliability of citation. If we cannot cite digital publications in a consistent and dependable way, those publications are of no value to the scholarly enterprise. I cannot cite your text for comparison, acknowledge your argument in making my own, or do any of the other things for which we must make scholarly reference if your online publication is not readily citable. This implies more than just stability of access, though that is essential. It also means that you must give me a way to include a reference to the appropriate part of the publication in the URI. So, if you're publishing 10 or 1,000 or 100,000 inscriptions online, you should give me a URI for each one of them. Just posting a PDF or returning a composite page of search results containing all of them doesn't cut it.
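To make the point about one URI per inscription concrete, here is a minimal sketch in Python. The base address `https://example.org/inscriptions/`, the identifier `insc-0042`, and both function names are assumptions invented for illustration; they do not describe the scheme of any existing database.

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote, urlsplit

# Hypothetical publication root; a real project would use its own stable domain.
BASE = "https://example.org/inscriptions/"


def inscription_uri(identifier: str, part: Optional[str] = None) -> str:
    """Build one short, stable URI per inscription.

    An optional fragment points at a part of the publication (for
    example a line of text), so that a citation can be precise.
    """
    uri = BASE + quote(identifier, safe="")
    if part:
        uri += "#" + quote(part, safe="")
    return uri


def parse_inscription_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Recover the inscription identifier and optional part from a URI."""
    pieces = urlsplit(uri)
    identifier = unquote(pieces.path.rsplit("/", 1)[-1])
    part = unquote(pieces.fragment) or None
    return identifier, part
```

The design choice worth noting is that the identifier, not a search query, lives in the address: a URI built this way can be cited in a footnote and resolved again later without rerunning anything.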

Ubiquity of descriptive terminology.

Suppose I want to find all funerary inscriptions that have been published online that were originally written in Greek or Latin and that likely date to the third century after Christ. What if I want to narrow that group of inscriptions to a particular geographic context or pull out only those whose texts contain Roman tria nomina? We need an agreed mechanism for structuring and expressing the requisite descriptive elements in a standard, discoverable manner on the web.

There is an emerging web standard for this purpose. It is called Linked Data and, though we do not have time to explore it in detail here today, I believe that there is an urgent and immediate need for a collaborative effort, with real but modest funding behind it, to define and publish descriptive vocabularies for epigraphy that can be used in linked data applications. This work should build upon the metadata normalization efforts already well underway within the EAGLE consortium, and it might well benefit from the complementary work done a few years ago by EpiDoc volunteers to produce multi-language translations of many of the terms used by the Rome database. This would allow us to use a common set of machine-actionable terms for such key aspects as material, type of support, identifiable personages and places, mode of inscription, and date.
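As a rough illustration of what shared, machine-actionable terms would buy us, the following Python sketch models a handful of descriptive statements as subject-predicate-object triples and answers the funerary, Greek-or-Latin, third-century question posed above. Everything here is assumed for the example: the `https://example.org/` vocabulary and inscription URIs, the sample records, and the `find_inscriptions` helper. A real Linked Data application would use a published vocabulary and an RDF toolkit rather than plain tuples.

```python
# Hypothetical namespaces; no such vocabulary has been published.
VOCAB = "https://example.org/vocab/"
INSC = "https://example.org/inscriptions/"

# Invented sample data: (subject, predicate, object) statements.
TRIPLES = [
    (INSC + "1", VOCAB + "type", VOCAB + "type/funerary"),
    (INSC + "1", VOCAB + "language", VOCAB + "language/latin"),
    (INSC + "1", VOCAB + "century", "3"),
    (INSC + "2", VOCAB + "type", VOCAB + "type/funerary"),
    (INSC + "2", VOCAB + "language", VOCAB + "language/greek"),
    (INSC + "2", VOCAB + "century", "2"),
    (INSC + "3", VOCAB + "type", VOCAB + "type/dedicatory"),
    (INSC + "3", VOCAB + "language", VOCAB + "language/greek"),
    (INSC + "3", VOCAB + "century", "3"),
    (INSC + "4", VOCAB + "type", VOCAB + "type/funerary"),
    (INSC + "4", VOCAB + "language", VOCAB + "language/greek"),
    (INSC + "4", VOCAB + "century", "3"),
]


def find_inscriptions(triples, criteria):
    """Return sorted subject URIs that satisfy every criterion.

    `criteria` maps a predicate URI to the set of acceptable values;
    a subject matches when it has at least one acceptable value for
    each listed predicate.
    """
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
    return sorted(
        subject
        for subject, props in facts.items()
        if all(props.get(p, set()) & allowed for p, allowed in criteria.items())
    )


# Funerary inscriptions in Greek or Latin dated to the third century.
QUERY = {
    VOCAB + "type": {VOCAB + "type/funerary"},
    VOCAB + "language": {VOCAB + "language/greek", VOCAB + "language/latin"},
    VOCAB + "century": {"3"},
}
```

Calling `find_inscriptions(TRIPLES, QUERY)` on this sample data returns the URIs of inscriptions 1 and 4. The query only works across projects if every project uses the same term URIs, which is the argument for an agreed vocabulary.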

There is already a collaborative Linked Data mechanism for communicating geographic relationships in ancient studies. It is called Pelagios, and it already links XYZ different projects in various subfields of ancient studies to the Pleiades online gazetteer.

By joining Pelagios, epigraphic databases and publications would make their contents immediately discoverable by geographic location alongside the contents of the Perseus digital library, the DAI object database Arachne, the holdings of the British Museum, the archaeological reports published by Fasti Online, the coins database of the American Numismatic Society, and the documentary papyri (among others).

Facility of download and readiness for reuse and remixing. It is here perhaps that the epigraphic community faces its greatest twenty-first century challenge. We must decide no less than whether to embrace or forfeit the full promise of the digital age, for if scholars are unable to use digital surrogates -- programs designed to retrieve, analyze, and reformat data for a specific research need -- and use them across the full corpus of classical epigraphy, we will have forfeited the digital promise.

It is no secret that there is a history of disagreement and conflict in our community around the mere idea of putting published epigraphic texts in a digital database. Though such digitization and distribution is now established practice, the resulting databases and publications still assert a hodge-podge of rights and guidance for use, or else are distinguished by silence on the issue. In some cases, users are debarred from reuse of texts or other information in their own publications.

Let me offer, ex cathedra, a prescription for healing.

First, the social and institutional steps. Each person, project, institution, journal or publisher that puts inscriptions online should make a clear and complete statement of the rights it asserts over the publication, and the behavior it allows and expects of its users. "All rights reserved" or "for personal use only" are regimes that preclude most of the best of what we could do in the digital realm this century. If you choose such a path, you are, in my opinion, standing in the way of progress. Moreover, you will find many colleagues who, depending on the legal jurisdiction and their experience, will contest any copyright claims to the text itself, and reuse it anyway. Far better than that fraught scenario is to use a standard open license, or even to make a public domain declaration. There are now several licenses crafted by the Creative Commons and the Open Knowledge Foundation that preserve the copyright and data rights that are permissible by law in your jurisdiction, while freely granting users a range of uses that are consistent with academic needs and practice. By choosing an appropriate CC or OKF license for your epigraphic publication, you will help bring the future.

Technologically, we need to make downloading and reusing easier. "Click here to download" is a good start, but to make serious change we will have to move beyond the PDF file to provide formats that can be chewed up and remixed by computational agents without losing the significance of the more discipline-specific aspects of the presentation. (We will pass over in silence the abomination of posting Microsoft Word files online.)

So, an epigraphic edition in HTML or plain text, using standard Unicode encodings for all characters, is an excellent improvement over the PDF. I'd urge you, whenever possible, to go further and provide EpiDoc XML for download. With regard to reuse, the chief additional virtues of EpiDoc are that the constituent elements of the edition (text, translation, commentary) are distinguished from each other in a consistent, machine-actionable manner, and that the semantics of the sigla you use in the text itself are represented unambiguously for further computational processing.
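A small Python sketch may help show what "machine-actionable" means in practice. The XML below is an invented, deliberately incomplete fragment in the style of EpiDoc (TEI namespace, `div` elements typed as edition, translation, and commentary, and a `supplied` element marking restored text); it is not a valid, complete EpiDoc file, and the two helper functions are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented fragment for illustration only; not a complete EpiDoc document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="edition"><ab>D<supplied reason="lost">is</supplied> <expan><abbr>M</abbr><ex>anibus</ex></expan></ab></div>
    <div type="translation"><p>To the spirits of the departed</p></div>
    <div type="commentary"><p>Illustrative fragment only.</p></div>
  </body></text>
</TEI>"""


def sections(xml_text):
    """Map each typed div (edition, translation, ...) to its plain text."""
    root = ET.fromstring(xml_text)
    result = {}
    for div in root.iter(TEI + "div"):
        kind = div.get("type")
        if kind:
            # Join all text nodes and collapse runs of whitespace.
            result[kind] = " ".join("".join(div.itertext()).split())
    return result


def restored_text(xml_text):
    """List the stretches of text the editor marked as restored."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter(TEI + "supplied")]
```

With this fragment, `sections(SAMPLE)["edition"]` yields `Dis Manibus`, and `restored_text(SAMPLE)` yields `["is"]`. A program can therefore separate text from translation and commentary, and can tell restored letters from preserved ones, without a human reading the brackets.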

And here let me offer another parenthetical call for action. If EpiDoc is to live up to the position of esteem it has now obtained in Greek and Latin epigraphic circles, we must make it easier and cheaper to use. I'll speak more about some aspects in a minute, but allow me to observe here that there is a critical need to identify and obtain funding -- probably a relatively small amount -- to convene some working meetings aimed at completing and updating the EpiDoc guidelines, and to pay for a postdoc or graduate student to support and contribute to that effort steadily for an academic year or so. The present state of the Guidelines is, I'm sad to say, close to useless. Many good people have tried to redress the problem, but job pressures and the lack of resources necessary to create real working time have so far stymied progress. The state of the EpiDoc Guidelines is a train wreck waiting to happen. We need to fix it.

I'd like to wrap up the discussion of digital benefits with a few words on the subject of microcontribution. Microcontribution is another area in which we could enhance scholarly progress in epigraphy. By microcontribution I mean the incorporation into an online, scholarly publication of any contribution of content or interpretation in a unit too small to have been given its own article in a traditional journal. Have you ever read someone else's text and thought "I'd emend this bit differently"? Can you provide a reliable identification for a personal or geographic name that puzzled the original editor? We do have in print articles the genre of "notes on inscriptions" of one type or another -- and the annual bibliographic reviews strive mightily to keep these connected to the published editions for us -- but what if that were made to happen automatically in an online environment?

So, are the tools now at hand for digital publication of epigraphy fit for purpose? Do they do all these things for us? Are they effective? Efficient? User-friendly?

No.

Of course, that's partly because we've had such a huge task of catching up to do. But we're making good, consistent progress on retrospective digitization. We cannot ignore the future.

And we are seeing important gains in some areas. The new interface to the Heidelberg database, for example, uses clear, stable URIs for each inscription's record (and for each image and bibliographic entry). Similar URIs were always available, but they were not foregrounded in the application and so required extra effort on the part of a user to discover. But I think you'll all agree that consistency of citation is better supported in the new system. On the reuse front, Heidelberg has long given clear guidance on expectations -- a reused text should indicate its origin in EDH.

But we have a long way to go in other areas. I would encourage those who currently manage (or are planning) large databases or digital corpora of inscriptions to look closely at what the papyrologists have been doing with papyri.info. Not only does the system provide descriptive information, texts, translations, and images -- at varying levels of completeness -- for some 80,000 papyri in a manner consistent with many of the desiderata I have enumerated above, but also it serves as the publication of record for a small but growing number of born-digital papyrological editions and microcontributions, all created and managed in EpiDoc.

The papyrologists have benefited, of course, from the significant largesse of the Andrew W. Mellon Foundation, as well as support from the U.S. National Endowment for the Humanities, in bringing this resource to life. Fortunately, both the institutions involved and the funders felt that it was essential to produce the software under open license, so it's ripe for reuse. But it's a complex piece of software that would require modification and extension to support the needs of epigraphers. It, like the major databases we already have in the field, would need an institutional home, tech support staff, and on-going funding. At NYU we are waiting for the National Endowment for the Humanities to tell us if they will give us funding to begin the customization work that would lead to a version of Papyri.info for epigraphy. If funded, this effort will move forward in collaboration with EAGLE and other major database projects, but we will also seek to provide, as soon as possible, an online environment for the creation, collaborative emendation, and digital publication of epigraphic texts from individuals and projects.



1 comment:

  1. Unfortunately, the speed at which I spoke proved a bit too slow to get this in in the requisite 15 minutes (despite some earlier trial runs). So I had to cut off before getting to microcontribution in detail. I did add a few extemporaneous comments about papyri.info.
