horothesia: August 2008

Friday, August 29, 2008

Public Transit in Huntsville

Good to learn, via the Huntsville Times, that the "City Looks to Improve Public Transit System." I hope that Louise Heidish (the PR/Marketing Specialist they've hired) knows something about the web and can advise the city on some good, working, location-based and webfeed-enabled services to make trip planning and stop-finding easier, and to keep us abreast of route changes and other news. Doesn't look like she has a web presence though ...

Get Paid to Read Greek!

From Greg Crane:

Contribute to the Greek and Latin Treebanks!

We are currently looking for advanced students of Greek and Latin to contribute syntactic analyses (via a web-based system) to our existing Latin Treebank (described below) and our emerging Greek Treebank as well (for which we have just received funding). We particularly encourage students at various levels to design research projects around this new tool. We are looking in particular for the following:

Get paid to read Greek! We have a limited number of research assistantships for advanced students of the languages who can work for the project from their home institutions. We particularly encourage students who can use the analyses that they produce to support research projects of their own.
We also encourage classes of Greek and Latin to contribute as well. Creating the syntactic analyses provides a new way to address the traditional task of parsing Greek and Latin. Your class work can then contribute to a foundational new resource for the study of Greek and Latin - both courses as a whole and individual contributors are acknowledged in the published data.
Students and faculty interested in conducting their own original research based on treebank data will have the option to submit their work for editorial review to have it published as part of the emerging Scaife Digital Library.

To contribute, please contact David Bamman (david.bamman@tufts.edu) or Gregory Crane (gregory.crane@tufts.edu).

For more information, see http://nlp.perseus.tufts.edu/syntax/treebank/.

Thursday, August 28, 2008

Barrington Atlas ID update: maps 89-99

README file for Barrington Atlas Identifiers, version published 2008-08-28
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99
List of all maps presently covered: 7-99

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

No changes to previously released IDs.

All 2,183 of you

Sometime this month, the diligent pigeons at Google Analytics tallied the two-thousandth unique visitor in the history of this blog. Here's what they tell me about y'all:

56% of you prefer Firefox (as opposed to 28% for Internet Explorer and 12% for Opera)
68% of you use Windows (as opposed to 25% Macintosh and 6% Linux)
On average you view 1.6 pages per visit and spend less than 2 minutes on the site per visit
80% of you are "bounces" (i.e., either you got here from somewhere else and didn't like what you saw, or you are info-snackers, just dipping quickly into the latest post and then fluttering on)
Your top 5 languages are: English, French, German, Greek and Italian
Top 5 countries: United States, United Kingdom, Canada, Greece, Germany
Top 5 cities: London, New York, Athens (Greece), Lexington, Washington
Top 10 referring sites (other than search engines): Sean Gillies' Blog, Pleiades, The Stoa, Current Epigraphy, David Meadows' Rogue Classicism, Planet Atlantides, Ancient World Bloggers Group, Bill Caraher's The Archaeology of the Mediterranean World, Hugh Cayless's Scriptio Continua and Alun Salt's Archaeoastronomy

Wednesday, August 27, 2008

AIA News feed?

Can I really be right that the AIA website does not have a webfeed for its News section?

The First Thousand Years of Greek

The Center for Hellenic Studies has just announced, via its website, a project led by Neel Smith (Holy Cross) entitled "The First Thousand Years of Greek." I reproduce the entire announcement here, since the CHS website isn't set up to let me link directly to the announcement itself:

The First Thousand Years of Greek aims to create a corpus, to be made available under a free license, of TEI-compliant texts and lemmatized word indices coordinated with the on-line Liddell-Scott-Jones lexicon from the Perseus project. The coverage ultimately should include at least one version of every Greek text known to us from manuscript transmission from the beginning of alphabetic writing in Greece through roughly the third century CE.
In 2008, the capabilities of consumer-level personal computers, the tools available specifically for working with ancient Greek, and above all the publication of digital resources under licenses enabling scholarly use place the dream of the First Thousand Years of Greek within reach. Gregory Crane and the Perseus project have augmented Liddell-Scott-Jones with unique identifiers on every entry, and released this under a Creative Commons (free) license. Peter Heslin, whose work has always been a model of appropriate free licensing, has recently published in Diogenes 3 a polished library for working with the TLG E corpus, and by applying the open-sourced Perseus morphological parser to every word in the TLG E word list and then publishing the resulting index, has shown how even data sets with a restrictive license like the TLG can be used to create valuable new free resources. Hugh Cayless' transcoding transformer has become an indispensable piece of the programmer's toolkit, as support for Unicode continues to mature in a range of programming languages on different operating systems. At the Center for Hellenic Studies, Neel Smith and Christopher Blackwell have led the development of Canonical Text Services (information at chs, or mirrored here), a network service that retrieves passages of text identified by canonical references.

By combining public-domain readings of ancient texts or translations, which can be automatically transferred from digital collections such as the TLG, Perseus, and Project Gutenberg, with existing free resources, the CHS team will automate —and make it possible for others to automate— the most tedious aspects of creating the First Thousand Years of Greek. What we currently lack, and must create manually, is shockingly basic: an inventory of existing ancient Greek texts. The TLG Canon is a useful reference, but it is an inventory of print volumes, not of Greek texts. (So Ptolemy's Geography appears as two works in the TLG Canon because the TLG used two different print editions for different parts of the work; and of course entries for texts in “fragments” collections appear in the TLG Canon even though they do not exist as independent texts.) An inventory of Greek texts preserved by manuscript transmission will necessarily present a selection of material that is radically different from the material found in the TLG Canon.

In addition to historical metadata included in such an inventory, we need to determine for each text how it should be cited, and how that citation scheme should be mapped on to the TEI's semantic markup. There is no way to avoid making these editorial decisions individually for each text included in the First Thousand Years of Greek, but once the citation scheme has been organized for a given text, we should be able to extract readings automatically from the TLG, Perseus, or Project Gutenberg, and then apply software to the extracted content to generate the new texts and indices of the First Thousand Years of Greek.

The quality of existing digital and print editions across the set of texts covered by the First Thousand Years of Greek will not be perfectly even. This will certainly mean that coverage of some parts of the project will advance more quickly than others. The CHS team expects that by beginning with material already available in good digital and print sources, we can gather a significant corpus quickly, and continue to expand its coverage over time. In the fall of 2008, the project is focusing on the first thousand years of Greek verse, with the goal of creating a complete corpus of all Greek texts in verse known through manuscript copying through the third century CE. The CHS welcomes collaborators, and invites any individuals, groups, or institutions who would like to contribute or just find out more about the First Thousand Years of Greek to email the project lead, Neel Smith, at first1kyears at chs.harvard.edu.

Tuesday, August 26, 2008

The Canadian Epigraphic Mission of Xanthos - Letoon (Lycia)

It's been a long time since I had an interesting conversation about digital approaches to epigraphic publication with Patrick Baker during the Epigraphic Congress in Barcelona. It's been not quite so long -- but clearly too long -- since I had a close look at what he and Gaétan Thériault have been doing since then with the Xanthos/Letoon epigraphy site:

Creative-commons licensing (cc-by-nc-nd)
Info on the project
Yearly reports on the survey seasons
Articles, papers, lectures and conferences
A documentary database including photographs of inscriptions and squeezes

Cool.

Thursday, August 21, 2008

Ann Macy Roth on Egypt in Huntsville: Monday 25 August 2008

The North Alabama Society of the Archaeological Institute of America is hosting Professor Ann Macy Roth for two talks next Monday:

Hatshepsut: Women and Power, 2:20 p.m. in Roberts 419 on the UAH campus
Androgyny and Blurred Boundaries in Ancient Egypt, 7:30 p.m. in the Chan Auditorium (first floor Business Administration Building) on the UAH campus

You can read more about Dr. Roth's work, and much else, on the NASAIA blog, Excavate!

Wednesday, August 20, 2008

BAtlas ID Update: Maps 28-34, 67-71, 81-83

README file for Barrington Atlas Identifiers, version published 2008-08-20
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 28, 29, 30, 31, 32, 33, 34, 67, 68, 69, 70, 71, 81, 82, 83
List of all maps presently covered: 7-88

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

No changes to previously released IDs.

Natural Language Toolkit (NLTK) penetration?

I'd be interested to know of digital classicists, antiquisters and those inhabiting neighboring nodes who are making use of NLTK and what your impressions of strengths and weaknesses are.

Tuesday, August 19, 2008

BAtlas ID update: Maps 7-9, 26-27

README file for Barrington Atlas Identifiers, version published 2008-08-19
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 7, 8, 9, 26, 27
List of all maps presently covered: 7-27, 35-66, 72-80, 84-88

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

No changes to previously released IDs.

Maia Adjustments

The following feeds have been removed from Maia because they are generating errors on access and I have not been able to identify alternatives:

Digital Arts and Humanities: Classics and Ancient History: the entire site seems to be having problems
The Scribal Guild: Epigraphy and Archaeology of the West Semitic World: the feed returns nothing and the front page now reads: "This blog is protected, to view it you must log in"

The following feeds remain removed from Maia for reasons previously identified:

Saving Antiquities for Everyone News [xml]: still abusing the "pubdate" field by populating with event dates instead of announcement publication dates. This causes notices to hover at the top of the aggregator until the date of the event -- sometimes months away -- arrives.
Bryn Mawr Classical Review: Most Recent Articles [xml]: still abusing the "pubdate" field by repopulating it for all prior entries when a new entry is added, even if the prior entries remain unchanged. This causes old articles to "recycle" as new in feed readers subscribed to the aggregator feed. But note that the Bryn Mawr Classical Review Blog, which replicates the review content, does not have this problem and has been included in Maia since last Friday.

Monday, August 18, 2008

BAtlas ID update: Maps 19, 41-48

README file for Barrington Atlas Identifiers, version published 2008-08-15
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 19, 41, 42, 43, 44, 45, 46, 47, 48
List of all maps presently covered: 10-25, 35-66, 72-80, 84-88

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

No changes to previously released IDs.

Friday, August 15, 2008

Atlantides: lists, comments out; BMCR in

Jo Cook is not the only one who didn't like my experiment with including comment and list archive feeds in the Maia and Electra aggregators. I don't like it either. They're out.

Meanwhile, BMCR has started producing a blog version of their reviews. It puts out a feed that properly handles the pubdate element. I've added that feed to Maia (see earlier comments on the direct BMCR feed). Thanks to Camilla MacKay for the notice.

Friday, August 8, 2008

BAtlas ID update: Maps 14-18, 24, 25, 39, 40

README file for Barrington Atlas Identifiers, version published 2008-08-08
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 14, 15, 16, 17, 18, 24, 25, 39, 40
List of all maps presently covered: 10-18, 20-25, 35-40, 49-65, 72-80, 84-88

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

No changes to previously released IDs.

Model Beijing

I was psyched to learn, through this morning's Huntsville Times, that the simulation development group at my old employer, AEgis Technologies Group, is getting their 3D modeling work showcased on NBC's Olympics coverage. I had a chance to get the guided tour last year; they're doing some great work, using a combination of sharp people, innovative methods, DigitalGlobe (and other) imagery and open-source software.

They've set up a demo site where you can find out more and play with some of the models: Virtual Beijing of Olympic Proportions.

I do have to correct what must be a blunder in the Huntsville Times' write-up: the images used cannot all be free ... it's the software they're using that's open-source. Or maybe the confusion is over the difference between "freely available" (i.e., not classified) and "free" (as in better than cheap).

Thursday, August 7, 2008

Bamboo Rising: Are Databases the "New Ground" of Humanities Research?

For those who didn't have a chance to participate in one of the initial Project Bamboo workshops, or who haven't had an opportunity to catch up with what's going on now in that context, I thought I might provide a pointer to the Project Bamboo Planning Wiki.

One current activity there is an attempt to Identify Themes of Arts and Humanities Scholarly Practice. My feed reader tells me that there's only one actual theme defined in this new section (just a bit ago), but I bet there will be more soon. The sole present one was offered by F. Allan Hanson (U. of Kansas, Anthropology):

Ground of Research: "Humanities research is changing (or will change, or should change) from being grounded in texts (bibliographies) to relational databases."

I bet my legions of gentle readers have some opinions about this assertion. Feel free to comment in the comments, or on your blog, or on a public list ... or in the Bamboo Planning Wiki itself.

NYU Programming Job: Papyrological Navigator

New York University: Programmer/Analyst (7421BR)

New York University’s Division of the Libraries seeks a Programmer/Analyst to work on the "Papyrological Navigator" (http://papyri.info), a major web-based research portal that provides scholars worldwide with access to texts, transcriptions, images and metadata related to ancient texts on papyri, pottery fragments and other material. The incumbent will work closely with the Project Coordinator (at Columbia University) and with scholars involved in the project at NYU's Institute for the Study of the Ancient World, Duke University and the University of Heidelberg, as well as with NYU Digital Library Technology staff.

The incumbent's initial responsibilities will include: migrating existing PN software applications from Columbia University to NYU; optimizing performance as needed; establishing a robust production environment at NYU for the ongoing ingest and processing of new and updated Greek text transcriptions, metadata and digital images; performing both analysis and programming of any required changes or enhancements to current PN applications.

This is a grant-funded position and is available for 2 years.

Candidates should have the following skills:

Bachelor's degree in computer or information science and 3 years of relevant experience or equivalent combination
Must include experience developing applications using Java
Demonstrated knowledge of Java, Tomcat, Saxon, Lucene, Apache, SQL, XML, XSLT
Experience with metadata standards (e.g. TEI, EpiDoc)
Experience working in Unix/Linux environments
Preferred: Experience with image serving software (eRez/FSI), Java Portlets, Apache Jetspeed-2, and Velocity templates.
Preferred: Experience designing, building, and deploying distributed systems.
Preferred: Experience working with non-Roman Unicode-based textual data (esp. Greek)
Excellent communication and analytical skills

Applicants should submit resume and cover letter, which reflects how applicant’s education and experience match the job requirements.

Please apply through NYU's application management system: www.nyu.edu/hr/jobs/apply.

At this page click on "External Applicants" then "Search Openings." Type 7421BR in the "Keyword Search" field and select search. NYU offers a generous benefit package including 22 days of vacation annually. NYU is an Equal Opportunity/Affirmative Action Employer.

New York University Libraries: Library facilities at New York University serve the school’s 40,000 students and faculty and contain more than 4 million volumes. New York University is a member of the Association of Research Libraries, the Research Libraries Group, the Digital Library Federation; serves as the administrative headquarters of the Research Library Association of South Manhattan, a consortium that includes three academic institutions. The Library’s website URL is http://library.nyu.edu

Tuesday, August 5, 2008

BAtlas IDs: Maps 10-13, 20-21, 49

README file for Barrington Atlas Identifiers, version published 2008-08-05
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 10, 11, 12, 13, 20, 21, 49
List of all maps presently covered: 10, 11, 12, 13, 20, 21, 22, 23, 35, 36, 37, 38, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 72, 73, 74, 75, 76, 77, 78, 79, 80, 84, 85, 86, 87, 87 inset, 88

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

No changes to previously released IDs.

Pondering Change to Atlantides Aggregators: Excavation Blogs

The subscription list for Maia Atlantis is getting pretty huge. In a recent post, Bill Caraher reminded me that there's a big (and growing) genre of excavation blogs. I think this genre is heavily underrepresented in the Atlantides feed aggregator constellation.

It occurred to me that it might be worthwhile to put dig-specific blogs into their own aggregator, and pull the few currently in Maia out and put them there too.

On the up side, that might help keep Maia to a manageable size. On the down side it would mean splitting up what has, until now, been a one-stop shop for ancient world blog content. And there would inevitably be some blogs in which lots of interesting non-excavation posts appear alongside hard-core dig news and status.

Thoughts?

Monday, August 4, 2008

BAtlas ID update: maps 23, 84, 85, 87, 87 inset, 88 and fixed dates

README file for Barrington Atlas Identifiers, version published 2008-08-04
Reference URL: http://atlantides.org/batlas

Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 23, 84, 85, 87, 87 inset, 88
List of all maps presently covered: 22, 23, 35, 36, 37, 38, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 72, 73, 74, 75, 76, 77, 78, 79, 80, 84, 85, 86, 87, 87 inset, 88

Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.

All readme files, dated folders and compressed tar files have been modified and renamed as necessary to redress the erroneous substitution of 2007 for 2008. No changes to IDs have occurred.

Sunday, August 3, 2008

BMCR and SAFE Events Feeds Pulled from Maia

It is with regret that this morning I have pulled the Bryn Mawr Classical Review (BMCR) Most Recent Articles feed from the list of feeds aggregated by Maia Atlantis. I took this step in accordance with my own Atlantis Suppression Policy.

It would appear that every time the BMCR adds a new article, dates on all articles in the feed are updated to present. As of this writing every single entry contains an identical "pubdate" tag with the value "03:49:18, Sunday, 03 August 2008" even though some of the entries have been in the list since it was first deployed a few weeks ago. This is non-standard behavior, and has the effect of pushing all the BMCR entries, in a block, to the top of any feed reader or aggregator, ahead of other content that is actually new. And in most feed readers they will show up highlighted or bolded, to indicate "new content." The appropriate behavior is to adjust dates only on those entries that have been added or substantially changed.

The Saving Antiquities for Everyone (SAFE) Events Feed is also blocked because it is forward-dating announcements of events to the date of the event, rather than the date of the entry. For example, the current feed contains a single entry with the following pubdate: "Thu, 16 Oct 2008 07:00:00 EST". This is the event date, not the publication date of the feed entry. This is also abuse of feed entry date fields and has the effect of causing these entries to linger at the top of the aggregation list for weeks or months until the date of the event passes.
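
Both problems are easy enough to test for. Here is a minimal sketch in Python, using only the standard library, that flags a feed whose entries all carry the same date and lists any entries dated in the future. It is an illustration and not the check Maia actually runs: the function name audit_pubdates is made up, and the sketch assumes an RSS 2.0 feed (where the element is spelled pubDate) with RFC 822-style dates such as the SAFE example above. Dates in other layouts would need their own parser.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def audit_pubdates(rss_xml, now=None):
    """Return (all_identical, future_titles) for an RSS 2.0 document.

    Hypothetical helper for illustration only. Assumes every <item>
    carries an RFC 822-style <pubDate>.
    """
    now = now or datetime.now(timezone.utc)
    root = ET.fromstring(rss_xml)
    stamps = []
    future_titles = []
    for item in root.iter("item"):
        raw = item.findtext("pubDate")
        if raw is None:
            continue  # entry has no date at all; nothing to compare
        when = parsedate_to_datetime(raw.strip())
        if when.tzinfo is None:
            # Treat a date without a usable zone as UTC (an assumption).
            when = when.replace(tzinfo=timezone.utc)
        stamps.append(when)
        if when > now:
            future_titles.append(item.findtext("title", default="(untitled)"))
    # More than one entry, all sharing one timestamp: the "recycling" symptom.
    all_identical = len(stamps) > 1 and len(set(stamps)) == 1
    return all_identical, future_titles
```

The first value in the result corresponds to the BMCR symptom (every entry re-dated at once) and the second to the SAFE symptom (entries dated to the event rather than to the announcement).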

I will be contacting the editors of both resources in the hopes of resolving these technical difficulties so that their content can once again be featured in Maia Atlantis.

Friday, August 1, 2008

Hidden Web: Don't Love It, Leave It

There's been a bit of buzz lately about Google's "failure" to effectively search the "hidden (deep) web". In the discussions I've been seeing, the hidden web is equated with stuff in academic and digital library repositories, i.e., "OAI-based resources" (which I assume to mean OAI/PMH).

I have to say: repositories != hidden web. The hidden web is simply the stuff the search engines don't find. Systems that surface information about their content only through OAI/PMH interfaces might make up a small part of the hidden web because they're not being surfaced to the bots, but frankly the hidden web holds way more stuff than what's in Fedora and DSpace at universities. Just ask Wikipedia.

The assertion that repository content == the hidden web is circular and false rhetoric that obscures the real problem: people are fighting the web instead of working with it. If you fight it, it will ignore you. This sort of thinking also makes hay for enterprises like the Internet Search Environment Number that seem to me to be trying to carve out business models that exploit, perpetuate and promote the cloistering of content and the rationing of information discovery.

Yesterday, Peter Millington posted what's effectively the antidote on the JISC-REPOSITORIES list (cross-posted to other lists). I reproduce it here in full because it's good advice not just for repositories but for anybody who is putting complex collections of content on the web and wants that content to be discoverable and useful:

Ways to snatch defeat from the jaws of victory
Peter Millington
SHERPA Technical Development Officer
University of Nottingham

You may have set up your repository and filled it with interesting papers, but it is still possible to screw things up technically so that search engines and harvesters cannot index your material. Here are seven common gotchas spotted by SHERPA:
Require all visitors to have a username and password
Do not have a 'Browse' interface with hyperlinks between pages
Set a 'robots.txt' file and/or use 'robots' meta tags in HTML headers that prevent search engine crawling
Restrict access to embargoed and/or other (selected) full texts
Accept poor quality or restrictive PDF files
Hide your OAI Base URL
Have awkward URLs
Full explanations and some solutions are given at: http://www.sherpa.ac.uk/documents/ways-to-screw-up.html

If you know of any other ways in which things may go awry, please contact us and we will consider adding them to the list.

I'm happy to say: Pleiades gets a clean bill of health if we count nos. 5 and 6 as non-applicable (since we're not a repository per se and we don't have a compelling use case for OAI/PMH or PDF).
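
If you want to check gotcha no. 3 against your own site, here is a minimal sketch using Python's standard urllib.robotparser module. The crawlable helper, the sample rules and the example.org URLs are all hypothetical, made up for illustration; nothing here comes from SHERPA or from the Pleiades configuration.

```python
from urllib.robotparser import RobotFileParser


def crawlable(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`.

    Hypothetical helper for illustration; it parses the rules from a
    string, so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Sample rules (invented): one file blocks everything, one blocks a single path.
block_all = "User-agent: *\nDisallow: /"
block_some = "User-agent: *\nDisallow: /private/"

print(crawlable(block_all, "http://example.org/places/1"))
print(crawlable(block_some, "http://example.org/places/1"))
```

A site that answers False for its own content pages is hiding itself from the search engines, whatever else it gets right.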

Disclaimer: we are exploring the use of OAI/ORE through our Concordia project. One of the things we like most about it is that its primary serialization format is Atom, which is already indexed by the big search engines. With the web.

Can you believe it?

So, how long was it before the Mulder/Hades/Orpheus nexus dropped on your head like an anvil from Zeus? It was the dog that put me over the edge.