SyntaxHighlighter

Friday, February 10, 2012

Give Me the Zotero Item Keys!

I fear and hope that this post will cause someone smarter than me to pipe up and say UR DOIN IT WRONG ITZ EZ LYK DIS ...

Here's the use case:

The Integrating Digital Papyrology project (and friends) have a Zotero group library populated with 1,445 bibliographic records that were developed on the basis of an old, built-by-hand Checklist of Editions of Greek and Latin Papyri (etc.). A lot of checking and improving was done to the data in Zotero.

Separately, there's now a much larger pile of bibliographic records related to papyrology that were collected (on different criteria) by the Bibliographie Papyrologique project. They have been machine-converted (into TEI document fragments) from a sui generis Filemaker Pro database and are now hosted via papyri.info (the raw data is on github).

There is considerable overlap between these two datasets, but also signifcant divergeance. We want to merge "matching" records in a carefully supervised way, making sure not to lose any of the extra goodness that BP adds to the data but taking full advantage of the corrections and improvements that were done to the Checklist data.

We started by doing an export-to-RDF of the Zotero data and, as a first step, that was banged up (programmatically) against the TEI data on the basis of titles. Probable matches were hand-checked and a resulting pairing of papyri.info bibliographic ID numbers against Zotero short titles was produced. You can see the resulting XML here.

I should point out that almost everything up to here including the creation and improvement of the data, as well as anything below regarding the bibliography in papyri.info, is the work of others. Those others include Gabriel Bodard, Hugh Cayless, James Cowey, Carmen Lantz, Adam Prins, Josh Sosin, and Jen Thum. And the BP team. And probably others I'm forgetting at the moment or who have labored out of my sight. I erect this shambles of a lean-to on the shoulders of giants.

To guide the work of our bibliographic researchers in analyzing the matched records, I wanted to create an HTML file that looks like this:
  • Checklist Short Title = Papyri.info ID number and Full Title String
  • BGU 10 = PI idno 7513: Papyrusurkunden aus ptolemäischer Zeit. (Ägyptische Urkunden aus den Staatlichen Museen zu Berlin. Griechische Urkunden. X. Band.)
  • etc. 
In that list, I wanted items to the left to be linked to the online view of the Zotero record at zotero.org and items on the right linked to the online view of the TEI record at papyri.info. The XML data we got from the initial match process provided the papyri.info bibliographic ID numbers, from which it's easy to construct the corresponding URIs, e.g., http://papyri.info/biblio/7513.

But Zotero presented a problem. URIs for bibliographic records in Zotero server use alphanumeric "item keys" like this: CJ3WSG3S (as in https://www.zotero.org/groups/papyrology/items/itemKey/CJ3WSG3S/).

That item key string is not, to my knowledge, included in any of the export formats produced by the Zotero desktop client, nor is it surfaced in its interface (argh). It appears possible to hunt them down programmatically via the Zotero Read API, though I haven't tried it for reasons that will be explained shortly. It is certainly possible to hunt for them manually via the web interface, but I'm not going to try that for more than about 3 records.

How I got the Zotero item keys

So, I have two choices at this point: write some code to automate hunting the item keys via the Zotero Read API or crack open the Zotero SQLLite database on my local client and see if the item keys are lurking in there too. Since I'm on a newish laptop on which I hadn't yet installed XCode, which seems to be a prerequisite to installing support for a Python virtual environment, which is the preferred way to get pip, which is the preferred install prerequisite for pyzotero, which is the python wrapper for the Zotero API, I had to make some choices about which yaks to shave.

I decided to start the (notoriously slow) XCode download yak and then have a go at the SQLLite yak while that was going on.

I grabbed the trial version of RazorSQL (which looked like a good shortcut after a few minutes of Googling), made a copy of my Zotero database, and started poking around. I thought about looking for detailed documentation (starting here I guess), but direct inspection started yielding results so I just kept going commando-style. It became clear at once that I wasn't going to find a single table containing my bibliographic entries. The Zotero client database is all normalized and modularized and stuff. So I viewed table columns and table contents as necessary and started building a SQL query to get at what I wanted. Here's what ultimately worked:

SELECT itemDataValues.value, items.key FROM items 
INNER JOIN libraries ON items.libraryID = libraries.libraryID
INNER JOIN groups ON libraries.libraryID = groups.libraryID
INNER JOIN itemData ON items.itemID = itemData.itemID
INNER JOIN itemDataValues ON itemData.valueID = itemDataValues.valueID
INNER JOIN fields ON itemData.fieldID = fields.fieldID
WHERE groups.name= "Papyrology" AND fields.fieldID=116

The SELECT statement gets me two values for each match dredged up by the rest of the query: a value stored in the itemDataValues table and a key stored in the items table. The various JOINs are used to get us close to the specific value (i.e., a short title) that we want. 116 in the fieldID field of the fields table corresponds to the short title field you see in your Zotero client. I found that out by inspecting the fields table; I could have used more JOINs to be able to use the string "shortTitle" in my WHERE clause, but that would have just taken more time.

The results of that query against my database looked like this:

P.Cair.Preis.    2245UKTH
CPR 18           26K8TAJT
P.Bodm. 28       282XKDE9
P.Gebelen        29ETKPXC
O.Krok           2BBMS7NS
P.Carlsb. 5      2D2ZNT4C
P.Mich.Aphrod.   2DTD2NIZ
P.Carlsb. 9      2FWF6T6I
P.Col. 1         2G4CF756
P.Lond.Copt. 2   2GAEU5QP
P.Harr. 1        2GCCNGJV
O.Deir el-Bahari 2GH3FEA2
P.Harrauer       2H3T6EU2
(etc).

So, copy that tabular result out of the RazorSQL GUI, paste it into a new LibreOffice spreadsheet and save it and I've got an XML file that I can dip into from the XSLT I had already started on to produce my HTML view.

Here's the resulting HTML file.

On we go.

Oh, and for those paying attention to such things, XCode finished downloading about two-thirds of the way through this process ...

4 comments:

Anonymous said...

Bibliontology RDF exports the item keys. You can also modify another export translator to export them quite trivially. (Use the uri property of the item returned by Zotero.nextItem().)

John Muccigrosso said...

You needed Xcode anyway.

Stefano Costa said...

This discussion might be useful for your purposes (I didn't try it myself): http://forums.zotero.org/discussion/8306/testing-zoteroselect/

Tom Elliott said...

Alas, biblioontology RDF export produces: [Javascript Application] An error occurred while trying to export the selected file.