Thursday, November 19, 2009

Bridging Institutional Repository and Bibliographic Management

As an institution, ISAW has an interest in disseminating, preserving and promoting the research products and publications of its faculty, research staff, students, affiliates and collaborators. Our parent institution, NYU, has made a commitment to the persistent dissemination of such materials when voluntarily contributed to its Faculty Digital Archive (FDA). We'll use the FDA as a locus for materials that fit well into DSpace (the platform on which the FDA is built) and that aren't rights-constrained. But we also need mechanisms for developing and publishing the whole bibliographic story of a particular faculty member, research group, project or conference, with links from the individual entries to digital copies wherever they may be (e.g., the FDA, JSTOR, Internet Archive, Google Books). For this function, we like Zotero. Atop Zotero's robust and ubiquitous feed documents, we can build interoperability with our website and other tools and venues in a way that is also completely visible to commercial and third-party search and discovery tools.

There will be a number of iterations necessary to reach a fully robust solution, but we're already taking some of the first steps.

As an early experiment with the FDA, we had a student assistant input all of my boss's articles in PDF format, along with descriptive metadata (see: Roger Bagnall's Publications). The default metadata schema in the FDA wasn't a perfect fit for journal article citations, but the FDA staff is now working with us to extend the schema to meet our needs. We're using the Zotero data model as a guide.

Given that the metadata in this collection is the only structured dataset around for Roger's articles, I wanted to be able to get it all back out to use for other things. The FDA does provide web feeds, but (unlike Zotero's) these aren't comprehensive for a given context and don't incorporate all the metadata fields. We can, however, use the FDA's OAI-PMH interface to get the full metadata with a query like:

where "hdl_2451_28115" is the identifier for the "Roger Bagnall's Publications" container I linked to above. (Special thanks to Ekaterina Pechekhonova on the NYU Digital Library team, who helped me with the syntax.)
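For readers unfamiliar with OAI-PMH, a request for every record in one set generally takes the shape sketched below. This is only an illustration: the base URL is a hypothetical placeholder (not the FDA's real endpoint), and the choice of the "oai_dc" metadata format is an assumption, picked because every OAI-PMH repository is required to support it. Only the set identifier comes from the collection discussed above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a stand-in, not the FDA's actual OAI-PMH address.
BASE_URL = "https://repository.example.edu/oai/request"


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL limited to one set."""
    params = {
        "verb": "ListRecords",              # standard OAI-PMH verb
        "metadataPrefix": metadata_prefix,  # assumed format: unqualified Dublin Core
        "set": set_spec,                    # identifier of the collection (set)
    }
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, "hdl_2451_28115")
print(url)
```

A large collection may come back in several pages; in that case the response carries a resumptionToken that has to be sent in a follow-up request, which this sketch does not handle.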

As a further experiment, I wrote an XSL transform to convert the OAI-PMH XML document into the RDF/XML that Zotero can import. There are a couple of inelegant hacks in the transform (mainly to get at substrings within single fields), but I'm still happy with the results. The import into Zotero went smoothly.
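To give a feel for the first half of such a conversion, here is a minimal sketch in Python rather than XSL (so it is not the transform described above). It assumes the repository returns unqualified Dublin Core ("oai_dc") records, and it only pulls a few fields into plain dictionaries; writing Zotero-importable RDF is left out. The sample record is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; real records would carry more fields.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Invented Article Title</dc:title>
          <dc:creator>Author, Example</dc:creator>
          <dc:date>2001</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text):
    """Return one dict per OAI-PMH record with title, creators and date."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [el.text for el in record.iterfind(".//dc:creator", NS)],
            "date": record.findtext(".//dc:date", default="", namespaces=NS),
        })
    return records


records = extract_records(SAMPLE)
print(records)
```

Splitting a single field into journal title, volume and pages (the "substrings within single fields" problem) would need extra, collection-specific parsing on top of this.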

Next steps: move this to a shared Zotero library so Roger, a student assistant and members of our digital projects team can collaborate to enter the rest of the publications (books, book sections, etc.) and fix any errors in the article records. Then we'll look at the process for using that metadata (via another transform) to help us populate the FDA. We'll also start working on parsing and aggregating Zotero's feeds for use on our website (in Roger's online profile and aggregated with other affiliates' feeds to provide a "recent publications" section).
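The feed aggregation could start from something like the following sketch. It assumes the feeds are Atom documents and that every "updated" value is a UTC timestamp in the same format, so that plain string sorting orders them correctly. The function name, the sample feed and its URLs are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format (RFC 4287).
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample feed standing in for one affiliate's library feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example library</title>
  <entry>
    <title>Older invented item</title>
    <updated>2009-10-01T12:00:00Z</updated>
    <link rel="alternate" href="https://example.org/items/1"/>
  </entry>
  <entry>
    <title>Newer invented item</title>
    <updated>2009-11-15T09:30:00Z</updated>
    <link rel="alternate" href="https://example.org/items/2"/>
  </entry>
</feed>"""


def recent_entries(feed_texts, limit=5):
    """Merge entries from several Atom feeds and return the newest ones."""
    entries = []
    for text in feed_texts:
        root = ET.fromstring(text)
        for entry in root.iterfind("atom:entry", ATOM):
            link = entry.find("atom:link", ATOM)
            entries.append({
                "title": entry.findtext("atom:title", default="", namespaces=ATOM),
                "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
                "href": link.get("href") if link is not None else None,
            })
    # Same-format UTC timestamps sort chronologically as plain strings.
    entries.sort(key=lambda e: e["updated"], reverse=True)
    return entries[:limit]


recent = recent_entries([SAMPLE_FEED], limit=1)
print(recent)
```

Passing several feed documents in the list yields a combined "recent publications" listing; fetching the feeds over HTTP is deliberately left out.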

We're also experimenting with Zotero for the bibliography of our Pleiades project (a collaborative online gazetteer of the Greek and Roman world), and as a component in a potential replacement for the Checklist of Editions of Greek, Latin, Demotic and Coptic Papyri, Ostraca and Tablets. On a more personal level, I've taken to doing all my bookmarking with Zotero and have set up a folder in my library (with an associated feed) so that colleagues can follow what I'm citing on a daily basis.

1 comment:

Amanda French said...

Interesting stuff, Tom. Kind of weird to hear that the FDA's metadata scheme wasn't already good for journal articles -- what the heck else was the FDA supposed to be for?

Since I've been working so much with Omeka (a history and archives-oriented web publishing platform developed by the same folks that brought us Zotero), I've been looking closely at the FDA. I was excited to see that Tamiment Library had FDA collections for things like the Communist Party Papers; I was hoping we could import at least the metadata for those items using OAI-PMH. But it turns out those are collections in name only -- there's nothing in them. Very few faculty seem to be using the repository, too, especially in the humanities -- apparently the Stern Business School publishes a lot of its working papers there, but that's about all.

I do recognize that the FDA is meant to be mainly for faculty research products, not archival materials held in the library, but it's the only repository I know of at NYU that's OAI-compliant.

Anyway, enough complaining. Thanks for blogging about this -- I think you're absolutely right that we need tools that *both* archive *and* disseminate research.