Friday, February 29, 2008

Clean URLs, Sebastian Heath and feed aggregation

PDQ Submission

I see that Sebastian has submitted his post on URLs to PDQ. That seems to me to be a good thing, and not just because I'm a clean URL fetishist. Sebastian helped me explain our Pleiades URLs and the thinking behind them -- their sanity is Sean's brainchild -- in a series of posts and comments last fall.

I've been thinking more about URLs lately (which partly prompted yesterday's mouthing off about the TEI website) ... and here's why:

This morning, I gave a talk to a group at the British School in Rome (remotely, alas). The topic was "Atom+GeoRSS for interoperability". I imagined a feed-of-feeds that might aggregate glosses/citations/summaries of content on multiple websites related to the archaeology and epigraphy of Cyrene (this, an example of what we'd like to do for every place cited in Pleiades).

Most feed aggregators now bring feed content together thematically as a consequence of selection and filtering criteria established by the aggregator's editor (consider my Planet Atlantides, or Planet OSGeo or Planet Code4Lib). But we could also bring feed entries together by virtue of the values contained in the href attribute on <link rel="related"> tags (think: "all entries related to Cyrene, if the href value is the Pleiades URL for Cyrene"). Spatial correlation (containment, proximity) could also be a great way to aggregate feed content (see Sean Gillies' Mush demo) if your feeds have coordinates in them (by way of GeoRSS) or if you can geocode on the basis of an asserted link relationship with a reference that has coordinates.
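To make the link-based idea concrete, here is a minimal sketch in Python of selecting Atom entries by the href on their <link rel="related"> elements. The sample feed is invented, and the Pleiades-style URL for Cyrene is a placeholder on a made-up host rather than a real identifier; only the Atom namespace and the rel/href attributes come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Placeholder place URL (assumption): stands in for the Pleiades URL for Cyrene.
CYRENE = "http://pleiades.example.org/places/cyrene"

# Invented sample feed; the first entry reuses a fantasy URL from this post.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Invented sample feed</title>
  <entry>
    <title>Inscription C61300</title>
    <link rel="alternate" href="http://ircyr.kcl.ac.uk/inscriptions/C61300"/>
    <link rel="related" href="http://pleiades.example.org/places/cyrene"/>
  </entry>
  <entry>
    <title>Something from elsewhere</title>
    <link rel="alternate" href="http://example.org/items/1"/>
    <link rel="related" href="http://pleiades.example.org/places/elsewhere"/>
  </entry>
</feed>"""


def entries_related_to(feed_xml, place_url):
    """Return titles of entries whose rel="related" link points at place_url."""
    wanted = place_url.rstrip("/")
    titles = []
    root = ET.fromstring(feed_xml)
    for entry in root.iter(ATOM + "entry"):
        for link in entry.findall(ATOM + "link"):
            href = link.get("href", "").rstrip("/")
            if link.get("rel") == "related" and href == wanted:
                titles.append(entry.findtext(ATOM + "title", default=""))
                break  # one matching link is enough for this entry
    return titles


if __name__ == "__main__":
    print(entries_related_to(SAMPLE_FEED, CYRENE))
```

An aggregator built this way would run the same filter over every feed it harvests and merge the matches into one result feed. The comparison is a plain string match, which only works if every publisher cites exactly the same URL for the place.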

I hacked up a fake, hypothetical result feed:
And a fantasy of one way the content could be exploited in the Pleiades website:
The latter was generated from the former using an xslt stylesheet (atom2nearby.xsl).

In these mockups, I re-imagined the URLs of the resources glossed/described in the various entries (in some cases, I imagined their web resources structure pretty much from scratch!). The consequences for various otherwise innocent websites are as follows:

Inscriptions of Roman Cyrenaica: Cyrene
Base URL (now):
http://ircyr.kcl.ac.uk/
Real URL for collection/search of resources (now):
n/a (in development)
Fantasy URL for collection/search of resources:
http://ircyr.kcl.ac.uk/inscriptions/cyrene
Real URL for basic resource (inscription) now:
n/a
Fantasy URL for basic resource:
http://ircyr.kcl.ac.uk/inscriptions/C61300

Epigraphische Datenbank Heidelberg
Base URL (now):
http://www.uni-heidelberg.de/institute/sonst/adw/edh/
Fantasy base URL:
http://edh.uni-heidelberg.de/
Real URL for collection/search of resources (now):
http://edh12.iaw.uni-heidelberg.de/offen/suchen2.html?fuan=Cyrene (hidden by frames and a form)
Fantasy URL for collection/search of resources:
http://edh.uni-heidelberg.de/findspots/cyrene
Real URL for basic resource (inscription) now:
http://edh12.iaw.uni-heidelberg.de/offen/suchen2.html?hdnr=000838 (hidden by frames and a form)
Fantasy URL for basic resource (inscription):
http://edh.uni-heidelberg.de/inscriptions/HD000838

Cyrenaica Archaeological Project
Base URL (now):
http://www.cyrenaica.org
Real URL for collection/search of resources (now):
n/a (in development)
Fantasy URL for collection/search of resources (in this case, an index to a reference of features and monuments on the site, as indexed in an earlier publication, Bonacasa 2000):
http://www.cyrenaica.org/bonacasa-2000/
Real URL for basic resource (monument/feature) now:
n/a
Fantasy URL for basic resource:
http://www.cyrenaica.org/bonacasa-2000/fontana-dei-buoi-di-euripilo

American Numismatic Society
Base URL (now):
http://numismatics.org/
Real URL for collection/search of resources (now):
http://publicserver.numismatics.org/collection/accnum/list?field1=mint&field1op=equals&field1kws=Cyrene (hidden by a form)
Fantasy URL for collection/search of resources:
http://numismatics.org/mints/cyrene
Real URL for basic resource (coin) now:
http://numismatics.org/dnid/numismatics.org:1997.9.200
Fantasy URL for basic resource (coin) -- I left Sebastian's DNIDs, but I was tempted to replace them with:
http://numismatics.org/coins/1997.9.200/

So, what principles do I think I've embodied in these fantasy URLs? Simplicity. Intuition. Implementation-hiding. Elimination of redundancy. Linkability and browseability for collections and likely searches (and therefore visibility of database content that would otherwise be hidden from 3rd party search engines).
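As a rough illustration of the implementation-hiding point, a site could keep its existing search backend and put a thin routing layer in front of it. The sketch below is hypothetical: the route patterns, function names, the capitalized search term, and the six-digit HD number are my assumptions, and the backend URLs are simply the "real" ones listed above, not a description of how EDH or the ANS actually work.

```python
import re
from urllib.parse import urlencode

# Backend search endpoints as listed in this post (current at time of writing).
EDH_SEARCH = "http://edh12.iaw.uni-heidelberg.de/offen/suchen2.html"
ANS_SEARCH = "http://publicserver.numismatics.org/collection/accnum/list"


def _ans_mint(match):
    # Assumption: the backend wants the mint name capitalized, e.g. "Cyrene".
    query = {
        "field1": "mint",
        "field1op": "equals",
        "field1kws": match.group("name").capitalize(),
    }
    return ANS_SEARCH + "?" + urlencode(query)


def _edh_findspot(match):
    return EDH_SEARCH + "?" + urlencode({"fuan": match.group("name").capitalize()})


def _edh_inscription(match):
    return EDH_SEARCH + "?" + urlencode({"hdnr": match.group("number")})


# Hypothetical routing table: clean path pattern -> builder for the backend URL.
ROUTES = [
    (re.compile(r"^/mints/(?P<name>[a-z-]+)$"), _ans_mint),
    (re.compile(r"^/findspots/(?P<name>[a-z-]+)$"), _edh_findspot),
    (re.compile(r"^/inscriptions/HD(?P<number>\d{6})$"), _edh_inscription),
]


def resolve(path):
    """Map a clean URL path to the backend URL that serves it, or None."""
    for pattern, build in ROUTES:
        match = pattern.match(path)
        if match:
            return build(match)
    return None


if __name__ == "__main__":
    print(resolve("/mints/cyrene"))
    print(resolve("/inscriptions/HD000838"))
```

The public path never mentions forms, frames, or query parameters, so the backend can change without breaking anyone's links.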

Your thoughts?

5 comments:

  1. A very small comment. The ANS already accepts URLs of the form:

    http://numismatics.org/collection/1858.1.1 .

    'collection' is more generic than 'coin' but otherwise the same idea.

    Tom, fantastic, no?

  2. This is absolutely right. The IRCyr URL will probably include a directory level with publication-date (for permanency in case of second editions etc.) http://ircyr.kcl.ac.uk/ircyr2010/ircyr_C06300 vel sim. (I don't know that we won't keep the .html and .xml suffixes, however.) Have to see how Cocoon handles it otherwise.

    (Brief note: EDH does use the URL http://www.epigraphische-datenbank-heidelberg.de/ but I don't think that currently persists further down into the site.)

  3. GB: thanks for that clarification; one question: why repeat the string "ircyr" thrice in the url?

  4. I have to admit my first reaction was: so long as the functionality is there, does the URL matter? After some thought, I think you’re right: it does. From a programmer’s perspective, so long as the URLs are reliable, it doesn’t matter what form they take. But from a user perspective this is like saying that it doesn’t matter if a table of contents is at the front or back of a book. True, but a simple system aids usability.

    Further if this approach is adopted across several projects then it means that inter-operability is being built in to the sites from the start, making all sorts of things possible. Shared XML formats would seem to be a missed opportunity if there wasn’t a standard way of accessing them.

    Perhaps URLs should be included in any discussion of XML standards for archaeological/historical data?
