

Thursday, April 18, 2013

Citing Sources in Digital Annotations

I'm collaborating with other folks both in and outside ISAW on a variety of digital scholarly projects in which Linked Open Data is playing a big role. We're using the Resource Description Framework (RDF) to provide descriptive information for, and make cross-project assertions about, a variety of entities of interest and the data associated with them (places, people, themes/subjects, creative works, bibliographic items, and manuscripts and other text-bearing objects). So, for example, I can produce the following assertions in RDF (using the Terse RDF Triple Language, or TuRTLe):

<http://syriaca.org/place/45> a <http://geovocab.org/spatial#Feature> ;
  rdfs:label "Serugh" ;
  rdfs:comment "An ancient city where Jacob of Serugh was bishop."@en ;
  foaf:primaryTopicOf <http://en.wikipedia.org/wiki/Suruç> ;
  owl:sameAs <http://pleiades.stoa.org/places/658405#this> .

This means: 'There's a resource identified with the Uniform Resource Identifier (URI) "http://syriaca.org/place/45" about which the following is asserted: it is a spatial feature; it is labeled "Serugh"; it is described in English as "An ancient city where Jacob of Serugh was bishop"; it is the primary topic of the Wikipedia article on Suruç; and it is the same resource that Pleiades identifies with the URI "http://pleiades.stoa.org/places/658405#this".'

(Folks familiar with what Sean Gillies has done for the Pleiades RDF will recognize my debt to him in what follows.)

But there are plenty of cases in which just issuing a couple of triples to encode an assertion about something isn't sufficient; we need to be able to assign responsibility/origin for those assertions and to link them to supporting argument and evidence (i.e., standard scholarly citation practice). For this purpose, we're very pleased by the Open Annotation Collaboration, whose Open Annotation Data Model was recently updated and expanded in the form of a W3C Community Draft (8 February 2013) (the participants in Pelagios use basic OAC annotations to assert geographic relationships between their data and Pleiades places).


A basic OADM annotation uses a series of RDF triples to link together a "target" (the thing you want to make an assertion about) and a "body" (the content of your assertion). You can think of annotations as footnotes: the "target" is the range of text after which you put your footnote number (except that in OADM you can attach a footnote to any real, conceptual, or digital thing you can identify), and the "body" is the content of the footnote itself. The OADM draft formally explains this structure in section 2.1. This lets me add an annotation to the resource from our example above (the ancient city of Serugh) by using the URI "http://syriaca.org/place/45" as its target, thus:
<http://syriaca.org/place/45/anno/desc6> a oa:Annotation ;
  oa:hasBody <http://syriaca.org/place/45/anno/desc6/body> ;
  oa:hasTarget <http://syriaca.org/place/45> ;
  oa:motivatedBy oa:describing ;
  oa:annotatedBy <http://syriaca.org/editors.xml#tcarlson> ;
  oa:annotatedAt "2013-04-03T00:00:01Z" ;
  oa:serializedBy <https://github.com/paregorios/srpdemo1/blob/master/xsl/place2ttl.xsl> ;
  oa:serializedAt "2013-04-17T13:35:05.771-05:00" .

<http://syriaca.org/place/45/anno/desc6/body> a cnt:ContentAsText, dctypes:Text ;
  cnt:chars "an ancient town, formerly located near Sarug."@en ;
  dc:format "text/plain" ;

I hope you'll forgive me for not spelling that all out in plain text, as all the syntax and terms are explained in the OADM. What I'm concerned about in this blog post is really what the OADM doesn't explicitly tell me how to do, namely: show that the annotation body is actually a quotation from a published book. The verb oa:annotatedBy lets me indicate that the annotation itself was made (i.e., the footnote was written) by a resource identified by the URI "http://syriaca.org/editors.xml#tcarlson". If I'd given you a few more triples, you could have figured out that that resource is a real person named Thomas Carlson, who is one of the editors working on the Syriac Reference Portal project. But how do I indicate (as he very much wants to do because he's a responsible scholar and has no interest in plagiarizing anyone) that he's deliberately quoting a book called The Scattered Pearls: A History of Syriac Literature and Sciences? Here's what I came up with (using terms from Citation Typing Ontology and the DCMI Metadata Terms):
<http://syriaca.org/place/45/anno/desc7/body> a cnt:ContentAsText, dctypes:Text ;
  cnt:chars "a small town in the Mudar territory, between Ḥarran and Jarabulus. [Modern name, Suruç (tr.)]"@en ;
  dc:format "text/plain" ;
  cito:citesAsSourceDocument <http://www.worldcat.org/oclc/255043315> ;
  dcterms:bibliographicCitation "The Scattered Pearls: A History of Syriac Literature and Sciences, p. 558"@en .

The addition of the triple containing cito:citesAsSourceDocument lets me make a machine-actionable link to the additional structured bibliographic data about the book that's available at WorldCat (but it doesn't say anything about page numbers!). The addition of the triple containing dcterms:bibliographicCitation lets me provide a human-readable citation, page number included.
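
For those who want to kick the tires: snippets like these parse cleanly with a library such as rdflib, once you add the prefix declarations my excerpts omit. A minimal sketch (the prefix bindings are the standard ones for these vocabularies, supplied by me; the literals are abbreviated):

# Parse the annotation body with rdflib and pull out the machine-actionable
# citation link. Prefix declarations are added here because the excerpts
# above omit them.
from rdflib import Graph, URIRef

ttl = """
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix cito: <http://purl.org/spar/cito/> .

<http://syriaca.org/place/45/anno/desc7/body> a cnt:ContentAsText, dctypes:Text ;
  cnt:chars "a small town in the Mudar territory"@en ;
  dc:format "text/plain" ;
  cito:citesAsSourceDocument <http://www.worldcat.org/oclc/255043315> ;
  dcterms:bibliographicCitation "The Scattered Pearls, p. 558"@en .
"""

g = Graph()
g.parse(data=ttl, format="turtle")
cites = URIRef("http://purl.org/spar/cito/citesAsSourceDocument")
for body, source in g.subject_objects(cites):
    print(body, "cites", source)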

I'd love to have feedback on this approach from folks in the OAC, CITO, DCTERMS, and general linked data communities. Could I do better? Should I do something differently?


The SRP team is currently evaluating a sample batch of such annotations, which you're also welcome to view. The RDF can be found here. These files are generated from the TEI XML here using the XSLT here.

Saturday, June 2, 2012

How to get a born-for-print bibliography into RDF

It began life as a Word file for a printed-on-paper dissertation. I want it to become linked data so that I can hook it up to other linked data I'm putting online. Here's a quick-and-basic way that involves no programming, writing of scripts, or other computational heroics on my part:
  • Open the Word file in LibreOffice and save it (download copy here). The basic structure puts one citation per paragraph, with a tab dividing a short title from a full citation. E.g.:
Ager 1989    S. Ager, “Judicial Imperialism: the Case of Melitaia,” AHB 3.5 (1989) 107-114.
Ager 1996    S. Ager, Interstate arbitrations in the Greek world, 337-90 B.C., Berkeley, 1996.
Aichinger 1982    A. Aichinger, “Grenzziehung durch kaiserliche Sonderbeauftragte in den römischen provinzen,” ZPE 48 (1982) 193-204.
  •  Rip out everything (like title, introductory materials, etc.) that's not the list of short titles and citations (download copy here).
  • "Save as ..." -> File Type = "text encoded" (select the "edit filter settings" checkbox) -> "Save" -> (in filter options, make sure "Unicode (UTF-8)" is the chosen encoding) -> "OK" (see here).
  • Close the text file in LibreOffice.
  • Open a new spreadsheet file in LibreOffice (don't use Excel for this; it will make a mess of your Unicode text. Ditto exporting to CSV from Word).
  • "File" -> "Open..." -> File Type = "Text CSV (*.csv, *.txt)" -> "Open"
  • In the "Text Import" dialog box, make sure the character set is "Unicode (UTF-8)" and change the "separator" from "comma" to "tab"
  • Click "OK"
  • Make sure the spreadsheet gives you two columns (one for the short title and the other for the full citation).
  • Add an empty top row and in the first cell type "shortTitle" (no quotes). Enter the string "shortDescription" in the second cell (no quotes). Save the file (still in the tab-delimited format). (see here).
  • If you have Python installed on your computer, download the tab2n3.py script from the W3C website and save it into the same folder as your data.
  • Open a command window or terminal and navigate to the folder where your data is.
  • Type the following:
$ python tab2n3.py -id -schema -namespace http://purl.org/ontology/bibo/ < BoundaryDisputesJustDataHeadings.csv > BoundaryDisputes.ttl
  • Open the resulting ttl file in the text-editor of your choice. You've got RDF! (see here).
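
If you'd rather not fetch the script (or want to see how little magic is involved), here's a rough Python stand-in of my own devising. To be clear: this is a sketch, not tab2n3.py itself; the subject identifiers it mints are arbitrary, and it assumes the two-column, headers-in-first-row layout described above.

# A rough stand-in for tab2n3.py: read a tab-delimited file whose first row
# holds property names (shortTitle, shortDescription) and emit Turtle using
# the bibo namespace, echoing the command above. Not the W3C script.
import csv

with open("BoundaryDisputesJustDataHeadings.csv", encoding="utf-8", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    headers = next(reader)  # shortTitle, shortDescription
    print("@prefix bibo: <http://purl.org/ontology/bibo/> .\n")
    for i, row in enumerate(reader, start=1):
        # Escape backslashes and double quotes for Turtle literals.
        values = [v.replace("\\", "\\\\").replace('"', '\\"') for v in row]
        props = " ;\n  ".join(f'bibo:{h} "{v}"' for h, v in zip(headers, values))
        print(f"<#entry{i}>\n  {props} .\n")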

Friday, June 1, 2012

Ancient Studies Needs Open Bibliographic Data and Associated URIs

Update 1: links throughout, minor formatting changes, proper Creative Commons Public Domain tools, parenthetical about import paths from EndNote and such, fixing a few typos.

The NEH-funded Linked Ancient World Data Institute, still in progress at ISAW, has got me thinking about a number of things. One of them is bibliography and linked data. Here's a brain dump, intended to spark conversation and collaboration.

What We Need

  • As much bibliographic data as possible, for both primary and secondary sources (print and digital), publicly released to third parties under either a public domain declaration or an unrestrictive open license.
  • Stable HTTP URIs for every work and author included in those datasets.

Why

Bibliographic and citation collection and management are integral to every research and publication project in ancient studies. We could save each other a lot of time, and get more substantive work done in the field, if they were simpler and easier to do. We could more easily and effectively tie together disparate work published on the web (and appearing on the web through retrospective digitization) if we had a common infrastructure and shared point of reference. There's already a lot of digital data in various hands that could support such an effort, but a good chunk of it is not out where anybody with good will and talent can get at it to improve it, build tools around it, etc.

What I Want You (and Me) To Do If You Have Bibliographic Data
  1. Release it to the world through a third party. No matter what format it's in, give a copy to someone else whose function is hosting free data on the web. Dump it into a public repository at github.com or sourceforge.net. Put it into a shared library at Zotero, BibSonomy, Mendeley, or another bibliographic content website (most have easy upload/import paths from EndNote and other citation management applications). Hosting a copy yourself is fine, but giving it to a third party demonstrates your bona fides, gets it out of your nifty but restrictive search engine or database, and increments your bus number.
  2. Release it under a Creative Commons Public Domain Mark or Public Domain Dedication (CC0). Or if you can't do that, find as open a Creative Commons or similar license as you can. Don't try to control it. If there's some aspect of the data that you can't (because of rights encumbrance) or don't want to (why?) give away to make the world a better place, find a quick way to extract, filter, or excerpt that aspect and get the rest out.
  3. Alert the world to your philanthropy. Blog or tweet about it. Post a link to the data on your institutional website. Above all, alert Chuck Jones and Phoebe Acheson so it gets announced via Ancient World Online and/or Ancient World Open Bibliographies.
  4. Do the same if you have other useful data, like identifiers for modern or ancient works or authors.
  5. Get in touch with me and/or anyone else to talk about the next step: setting up stable HTTP URIs corresponding to this stuff.
Who I'm Talking To

First of all, I'm talking to myself, my collaborators, and my team-mates at ISAW. I intend to eat my own dogfood.

Here are other institutions and entities I know about who have potentially useful data.
  • The Open Library : data about books is already out there and available, and there are ways to add more
  • Perseus Project : a huge, FRBR-ized collection of MODS records for Greek and Latin authors, works, and modern editions thereof.
  • Center for Hellenic Studies: identifiers for Greek and Latin authors and works
  • L'Année Philologique and its institutional partners like the American Philological Association: the big collection of analytic secondary bibliography for classics (journal articles)
  • TOCS-IN: a collaboratively collected batch of analytic secondary bibliography for classics
  • Papyri.info and its contributing project partners: TEI bibliographic records for much of the bibliography produced for or cited by Greek and Latin papyrologists (plus other ancient language/script traditions in papyrology)
  • Gnomon Bibliographische Datenbank: masses of bibliographic data for books and articles for classics
  • Any and every university library system that has a dedicated or easily extracted set of associated catalog records. Especially any with unique collections (e.g., Cincinnati) or those with databases of analytical bibliography down to the level of articles in journals and collections.
  • Ditto any and every ancient studies digital project that has bibliographic data in a database.
Comments, Reactions, Suggestions

Welcome, encouraged, and essential. By comment here or otherwise (but not private email please!).

Friday, February 10, 2012

Give Me the Zotero Item Keys!

I fear and hope that this post will cause someone smarter than me to pipe up and say UR DOIN IT WRONG ITZ EZ LYK DIS ...

Here's the use case:

The Integrating Digital Papyrology project (and friends) have a Zotero group library populated with 1,445 bibliographic records that were developed on the basis of an old, built-by-hand Checklist of Editions of Greek and Latin Papyri (etc.). A lot of checking and improving was done to the data in Zotero.

Separately, there's now a much larger pile of bibliographic records related to papyrology that were collected (on different criteria) by the Bibliographie Papyrologique project. They have been machine-converted (into TEI document fragments) from a sui generis FileMaker Pro database and are now hosted via papyri.info (the raw data is on GitHub).

There is considerable overlap between these two datasets, but also significant divergence. We want to merge "matching" records in a carefully supervised way, making sure not to lose any of the extra goodness that BP adds to the data but taking full advantage of the corrections and improvements that were done to the Checklist data.

We started by doing an export-to-RDF of the Zotero data and, as a first step, that was banged up (programmatically) against the TEI data on the basis of titles. Probable matches were hand-checked and a resulting pairing of papyri.info bibliographic ID numbers against Zotero short titles was produced. You can see the resulting XML here.
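
For the curious, the flavor of that title-matching step is roughly as follows. This is emphatically a toy sketch, not the actual code (which was other people's work, as noted below); the sample data is a made-up miniature:

# Toy sketch of title-based matching: normalize titles on both sides and
# flag exact matches as probable pairs for hand-checking. Inputs are
# made-up miniatures of the real datasets.
import re

def normalize(title):
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

zotero = {"BGU 10": "Papyrusurkunden aus ptolemäischer Zeit."}   # shortTitle -> title
papyri_info = {7513: "Papyrusurkunden aus ptolemäischer Zeit."}  # biblio ID -> title

index = {normalize(title): short for short, title in zotero.items()}
probable_matches = []
for idno, title in papyri_info.items():
    short = index.get(normalize(title))
    if short is not None:
        probable_matches.append((short, idno))  # e.g., ("BGU 10", 7513)

print(probable_matches)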

I should point out that almost everything up to here including the creation and improvement of the data, as well as anything below regarding the bibliography in papyri.info, is the work of others. Those others include Gabriel Bodard, Hugh Cayless, James Cowey, Carmen Lantz, Adam Prins, Josh Sosin, and Jen Thum. And the BP team. And probably others I'm forgetting at the moment or who have labored out of my sight. I erect this shambles of a lean-to on the shoulders of giants.

To guide the work of our bibliographic researchers in analyzing the matched records, I wanted to create an HTML file that looks like this:
  • Checklist Short Title = Papyri.info ID number and Full Title String
  • BGU 10 = PI idno 7513: Papyrusurkunden aus ptolemäischer Zeit. (Ägyptische Urkunden aus den Staatlichen Museen zu Berlin. Griechische Urkunden. X. Band.)
  • etc. 
In that list, I wanted items to the left to be linked to the online view of the Zotero record at zotero.org and items on the right linked to the online view of the TEI record at papyri.info. The XML data we got from the initial match process provided the papyri.info bibliographic ID numbers, from which it's easy to construct the corresponding URIs, e.g., http://papyri.info/biblio/7513.

But Zotero presented a problem. URIs for bibliographic records on the Zotero server use alphanumeric "item keys" like this: CJ3WSG3S (as in https://www.zotero.org/groups/papyrology/items/itemKey/CJ3WSG3S/).

That item key string is not, to my knowledge, included in any of the export formats produced by the Zotero desktop client, nor is it surfaced in its interface (argh). It appears possible to hunt them down programmatically via the Zotero Read API, though I haven't tried it for reasons that will be explained shortly. It is certainly possible to hunt for them manually via the web interface, but I'm not going to try that for more than about 3 records.

How I got the Zotero item keys

So, I have two choices at this point: write some code to automate hunting the item keys via the Zotero Read API, or crack open the Zotero SQLite database on my local client and see if the item keys are lurking in there too. Since I'm on a newish laptop on which I hadn't yet installed Xcode, which seems to be a prerequisite to installing support for a Python virtual environment, which is the preferred way to get pip, which is the preferred install prerequisite for pyzotero, which is the Python wrapper for the Zotero API, I had to make some choices about which yaks to shave.
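
For the record, the road not taken would presumably look something like this with pyzotero, once all the yaks are shaved. I haven't run this; the group ID and API key are placeholders, and the response shape may vary with the API version:

# Sketch: pull every top-level item in the group library via the Zotero
# Read API and map short titles to item keys. GROUP_ID and API_KEY are
# placeholders, and this is untested.
from pyzotero import zotero

GROUP_ID = "12345"  # numeric ID of the group library (placeholder)
API_KEY = "..."     # a key with read access to the group (placeholder)

zot = zotero.Zotero(GROUP_ID, "group", API_KEY)
keys = {}
for item in zot.everything(zot.top()):
    data = item.get("data", item)  # newer API versions nest fields under "data"
    short = data.get("shortTitle")
    if short:
        keys[short] = item["key"]

print(keys.get("CPR 18"))  # ideally: 26K8TAJT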

I decided to start the (notoriously slow) Xcode download yak and then have a go at the SQLite yak while that was going on.

I grabbed the trial version of RazorSQL (which looked like a good shortcut after a few minutes of Googling), made a copy of my Zotero database, and started poking around. I thought about looking for detailed documentation (starting here I guess), but direct inspection started yielding results so I just kept going commando-style. It became clear at once that I wasn't going to find a single table containing my bibliographic entries. The Zotero client database is all normalized and modularized and stuff. So I viewed table columns and table contents as necessary and started building a SQL query to get at what I wanted. Here's what ultimately worked:

SELECT itemDataValues.value, items.key FROM items 
INNER JOIN libraries ON items.libraryID = libraries.libraryID
INNER JOIN groups ON libraries.libraryID = groups.libraryID
INNER JOIN itemData ON items.itemID = itemData.itemID
INNER JOIN itemDataValues ON itemData.valueID = itemDataValues.valueID
INNER JOIN fields ON itemData.fieldID = fields.fieldID
WHERE groups.name = 'Papyrology' AND fields.fieldID = 116

The SELECT statement gets me two values for each match dredged up by the rest of the query: a value stored in the itemDataValues table and a key stored in the items table. The various JOINs are used to get us close to the specific value (i.e., a short title) that we want. 116 in the fieldID field of the fields table corresponds to the short title field you see in your Zotero client. I found that out by inspecting the fields table; I could have used more JOINs to be able to use the string "shortTitle" in my WHERE clause, but that would have just taken more time.

The results of that query against my database looked like this:

P.Cair.Preis.    2245UKTH
CPR 18           26K8TAJT
P.Bodm. 28       282XKDE9
P.Gebelen        29ETKPXC
O.Krok           2BBMS7NS
P.Carlsb. 5      2D2ZNT4C
P.Mich.Aphrod.   2DTD2NIZ
P.Carlsb. 9      2FWF6T6I
P.Col. 1         2G4CF756
P.Lond.Copt. 2   2GAEU5QP
P.Harr. 1        2GCCNGJV
O.Deir el-Bahari 2GH3FEA2
P.Harrauer       2H3T6EU2
(etc).

So: copy that tabular result out of the RazorSQL GUI, paste it into a new LibreOffice spreadsheet, and save it. That gives me an XML file I can dip into from the XSLT I had already started on to produce my HTML view.

Here's the resulting HTML file.

On we go.

Oh, and for those paying attention to such things, Xcode finished downloading about two-thirds of the way through this process ...

Thursday, November 19, 2009

Bridging Institutional Repository and Bibliographic Management

As an institution, ISAW has an interest in disseminating, preserving and promoting the research products and publications of its faculty, research staff, students, affiliates and collaborators. Our parent institution, NYU, has made a commitment to the persistent dissemination of such materials when voluntarily contributed to its Faculty Digital Archive (FDA). We'll use the FDA as a locus for materials that fit well into DSpace (with which the FDA is realized) and that aren't rights-constrained. But we also need mechanisms for developing and publishing the whole bibliographic story of a particular faculty member, research group, project or conference with links from the individual entries to digital copies wherever they may be (e.g., the FDA, JSTOR, Internet Archive, Google Books). For this function, we like Zotero. Atop Zotero's robust and ubiquitous feed documents, we can build interoperability with our website and other tools and venues in a way that is also completely visible to commercial and third-party search and discovery tools.

There will be a number of iterations necessary to reach a fully robust solution, but we're already taking some of the first steps.

As an early experiment with the FDA, we had a student assistant input all of my boss's articles in PDF format, along with descriptive metadata (see: Roger Bagnall's Publications). The default metadata schema in the FDA wasn't a perfect fit for journal article citations, but the FDA staff is now working with us to extend the schema to meet our needs. We're using the Zotero data model as a guide.

Given that the metadata in this collection is the only structured dataset around for Roger's articles, I wanted to be able to get it all back out to use for other things. The FDA does provide web feeds, but (unlike Zotero) these aren't comprehensive for a given context and don't incorporate all the metadata fields. But we can use FDA's OAI-PMH interface to get the full metadata with a query like:

http://archive.nyu.edu/request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_2451_28115

where "hdl_2451_28115" is the identifier for the "Roger Bagnall's Publications" container I linked to above. (Special thanks to Ekaterina Pechekhonova on the NYU Digital Library team, who helped me with syntax).

As a further experiment, I wrote an XSL transform to convert the OAI-PMH XML document into the RDF XML Zotero can import. There are a couple of inelegant hacks in the transform (mainly to get at substrings within single fields), but I'm still happy with the results. The import into Zotero went smoothly:

http://www.zotero.org/paregorios/items/collection/1505597

Next steps: move this to a shared Zotero library so Roger, a student assistant and members of our digital projects team can collaborate to enter the rest of the publications (books, book sections, etc.) and fix any errors in the article records. Then we'll look at the process for using that metadata (via another transform) to help us populate the FDA. We'll also start working on parsing and aggregating Zotero's feeds for use on our website (in Roger's online profile and aggregated with other affiliates' feeds to provide a "recent publications" section).

We're also experimenting with Zotero for the bibliography of our Pleiades project (a collaborative online gazetteer of the Greek and Roman world), and as a component in a potential replacement for the Checklist of Editions of Greek, Latin, Demotic and Coptic Papyri, Ostraca and Tablets. On a more personal level, I've taken to doing all my bookmarking with Zotero and have set up a folder in my library (with associated feed) so that colleagues can follow what I'm citing on a daily basis.

Friday, September 26, 2008

Reuters (EndNote) sues George Mason over Zotero

By way of the Courthouse News Service we hear that:

Thomson Reuters demands $10 million and an injunction to stop George Mason University from distributing its new Web browser application, Zotero ... Reuters claims George Mason is violating its license agreement and destroying the EndNote customer base.

Tuesday, July 22, 2008

Prayer Answered: BMCR gets feed!

Thanks to Troels I got a happy surprise this morning. The Bryn Mawr Classical Review has published a web feed of recent reviews. This is great news for dissemination, data sharing and general all-around knowing what's going on. I have already added the feed to the Maia Atlantis aggregator. Now we just need to encourage more people -- including me -- to write and submit more reviews of web publications!

Thursday, May 8, 2008

Open Library API, Bibo Ontology and Digital Bibliographies

I bet we're going to want to fiddle with the Open Library API and the Bibo Ontology in the context of the Pleiades bibliography application (and some others we're thinking about, like a next-generation Checklist of Editions for papyri and the like).
  • Seek and get digital books from the Open Library.
  • Use Bibo in other-than-html serializations of the underlying MODS records, and maybe even microformatishly in the HTML version. (We already use COinS -- for interop with Zotero -- but it's lossy, ungainly and suboptimally human-readable).
Thanks to Dave Pattern (via planet code4lib) for the pointer to the OL API.
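
To make the first bullet above concrete, here's the sort of lookup I have in mind, sketched against the Open Library Books API's bibkeys query as currently documented (the ISBN is just a placeholder):

# Look up a book on the Open Library Books API and print its title and
# Open Library URL. The bibkey is a placeholder; OCLC: and LCCN: keys
# work the same way.
import json
import urllib.request

bibkey = "ISBN:0451526538"  # placeholder
url = ("https://openlibrary.org/api/books?bibkeys=" + bibkey +
       "&format=json&jscmd=data")
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

record = data.get(bibkey)
if record:
    print(record.get("title"), "->", record.get("url"))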

Monday, February 25, 2008

Thraces and Moesi update

Nikolay Sharankov has just posted a thorough and helpful response to my recent query about a Roman boundary inscription from Bulgaria.

He includes revised readings -- not only of the inscription for which I have the photo -- but for all four originally published by Hristov in Minalo.

Wednesday, October 24, 2007

Citations =? links

I've been thinking a lot about scholarly citation and hyperlinking lately. I've been struggling to test mentally the degree to which the latter can support what I see as the distinct functions of the former:
  • attribution of argument or idea
  • source of fact or quotation
  • pointer to further information not directly germane to the issue at hand
  • disambiguation
  • classification
I'd be grateful for suggested additions to, and refinements of, the list of functions. It would also be helpful to learn of any literature on the subject.

Of course the goal is "computationally actionable citation."

Thursday, September 27, 2007

Bibliographic Proximity

Shawn is trying out the Pleiades Atom+GeoRSS feeds in Yahoo Pipes, and is also thinking about geoparsing locational data from regular bibliography.

Seems to me it would be fun to throw something like Sean Gillies' Mush into the mix to get spatial correlations between the Pleiades gazetteer (if I can call it that in this context) and Shawn's geo-bib.

Then, if we could apply some version of the combined process to, say, the new acquisitions list of the Burnam Classics Library in Cincinnati, we'd have some nifty pre-processing that could speed identification of new works to cite in the Pleiades bibliography.