thoughts and comments across the boundaries of computing, ancient history, epigraphy and geography ... oh, and barbeque, coffee and rockets
Friday, August 29, 2008
Public Transit in Huntsville
Get Paid to Read Greek!
Contribute to the Greek and Latin Treebanks!
We are currently looking for advanced students of Greek and Latin to contribute syntactic analyses (via a web-based system) to our existing Latin Treebank (described below) and our emerging Greek Treebank as well (for which we have just received funding). We particularly encourage students at various levels to design research projects around this new tool. We are looking in particular for the following:
- Get paid to read Greek! We have a limited number of research assistantships for advanced students of the languages who can work for the project from their home institutions. We particularly encourage students who can use the analyses that they produce to support research projects of their own.
- We also encourage classes of Greek and Latin to contribute as well. Creating the syntactic analyses provides a new way to address the traditional task of parsing Greek and Latin. Your class work can then contribute to a foundational new resource for the study of Greek and Latin - both courses as a whole and individual contributors are acknowledged in the published data.
- Students and faculty interested in conducting their own original research based on treebank data will have the option to submit their work for editorial review to have it published as part of the emerging Scaife Digital Library.
For more information, see http://nlp.perseus.tufts.edu/syntax/treebank/.
Thursday, August 28, 2008
Barrington Atlas ID update: maps 89-99
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99
List of all maps presently covered: 7-99
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- No changes to previously released IDs.
All 2,183 of you
- 56% of you prefer Firefox (as opposed to 28% for Internet Explorer and 12% for Opera)
- 68% of you use Windows (as opposed to 25% Macintosh and 6% Linux)
- On average you view 1.6 pages per visit and spend less than 2 minutes on the site per visit
- 80% of you are "bounces" (i.e., either you got here from somewhere else and didn't like what you saw, or you are info-snackers, just dipping quickly into the latest post and then fluttering on)
- Your top 5 languages are: English, French, German, Greek and Italian
- Top 5 countries: United States, United Kingdom, Canada, Greece, Germany
- Top 5 cities: London, New York, Athens (Greece), Lexington, Washington
- Top 10 referring sites (other than search engines): Sean Gillies' Blog, Pleiades, The Stoa, Current Epigraphy, David Meadows' Rogue Classicism, Planet Atlantides, Ancient World Bloggers Group, Bill Caraher's The Archaeology of the Mediterranean World, Hugh Cayless's Scriptio Continua and Alun Salt's Archaeoastronomy
Wednesday, August 27, 2008
AIA News feed?
The First Thousand Years of Greek
The First Thousand Years of Greek aims to create a corpus, to be made available under a free license, of TEI-compliant texts and lemmatized word indices coordinated with the on-line Liddell-Scott-Jones lexicon from the Perseus project. The coverage ultimately should include at least one version of every Greek text known to us from manuscript transmission from the beginning of alphabetic writing in Greece through roughly the third century CE.
In 2008, the capabilities of consumer-level personal computers, the tools available specifically for working with ancient Greek, and above all the publication of digital resources under licenses enabling scholarly use place the dream of the First Thousand Years of Greek within reach. Gregory Crane and the Perseus project have augmented Liddell-Scott-Jones with unique identifiers on every entry, and released this under a Creative Commons (free) license. Peter Heslin, whose work has always been a model of appropriate free licensing, has recently published in Diogenes 3 a polished library for working with the TLG E corpus, and by applying the open-sourced Perseus morphological parser to every word in the TLG E word list and then publishing the resulting index, has shown how even data sets with a restrictive license like the TLG can be used to create valuable new free resources. Hugh Cayless' transcoding transformer has become an indispensable piece of the programmer's toolkit, as support for Unicode continues to mature in a range of programming languages on different operating systems. At the Center for Hellenic Studies, Neel Smith and Christopher Blackwell have led the development of Canonical Text Services (information at chs, or mirrored here), a network service that retrieves passages of text identified by canonical references.
By combining public-domain readings of ancient texts or translations, which can be automatically transferred from digital collections such as the TLG, Perseus, and Project Gutenberg, with existing free resources, the CHS team will automate —and make it possible for others to automate— the most tedious aspects of creating the First Thousand Years of Greek. What we currently lack, and must create manually, is shockingly basic: an inventory of existing ancient Greek texts. The TLG Canon is a useful reference, but it is an inventory of print volumes, not of Greek texts. (So Ptolemy's Geography appears as two works in the TLG Canon because the TLG used two different print editions for different parts of the work; and of course entries for texts in “fragments” collections appear in the TLG Canon even though they do not exist as independent texts.) An inventory of Greek texts preserved by manuscript transmission will necessarily present a selection of material that is radically different from the material found in the TLG Canon.
In addition to historical metadata included in such an inventory, we need to determine for each text how it should be cited, and how that citation scheme should be mapped on to the TEI's semantic markup. There is no way to avoid making these editorial decisions individually for each text included in the First Thousand Years of Greek, but once the citation scheme has been organized for a given text, we should be able to extract readings automatically from the TLG, Perseus, or Project Gutenberg, and then apply software to the extracted content to generate the new texts and indices of the First Thousand Years of Greek.
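To make the idea of mapping a citation scheme onto TEI markup concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the two-level "book.line" scheme, the `SCHEME` table, the `resolve` helper and the sample markup are hypothetical, not the project's actual design, and the sample omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical citation scheme for one text: each level of a dotted
# reference ("1.2" = book 1, line 2) maps to one TEI element, optionally
# qualified by its "type" attribute.
SCHEME = [("div", "book"), ("l", None)]

# Illustrative TEI-like fragment (namespace omitted for brevity).
SAMPLE = """<text><body>
<div type="book" n="1"><l n="1">first line</l><l n="2">second line</l></div>
<div type="book" n="2"><l n="1">another line</l></div>
</body></text>"""


def resolve(root, reference, scheme=SCHEME):
    """Walk down the tree one citation level at a time; None if not found."""
    node = root
    for value, (tag, type_) in zip(reference.split("."), scheme):
        candidates = [
            el for el in node.iter(tag)
            if el.get("n") == value
            and (type_ is None or el.get("type") == type_)
        ]
        if not candidates:
            return None
        node = candidates[0]
    return node


if __name__ == "__main__":
    hit = resolve(ET.fromstring(SAMPLE), "1.2")
    print(hit.text if hit is not None else "no such passage")
```

The point of the sketch is only that the per-text editorial decision lives in a small table like `SCHEME`; once that exists, passage extraction can be automated.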
The quality of existing digital and print editions across the set of texts covered by the First Thousand Years of Greek will not be perfectly even. This will certainly mean that coverage of some parts of the project will advance more quickly than others. The CHS team expects that by beginning with material already available in good digital and print sources, we can gather a significant corpus quickly, and continue to expand its coverage over time. In the fall of 2008, the project is focusing on the first thousand years of Greek verse, with the goal of creating a complete corpus of all Greek texts in verse known through manuscript copying through the third century CE. The CHS welcomes collaborators, and invites any individuals, groups, or institutions who would like to contribute or just find out more about the First Thousand Years of Greek to email the project lead, Neel Smith, at
first1kyears at chs.harvard.edu.
Tuesday, August 26, 2008
The Canadian Epigraphic Mission of Xanthos - Letoon (Lycia)
- Creative-commons licensing (cc-by-nc-nd)
- Info on the project
- Yearly reports on the survey seasons
- Articles, papers, lectures and conferences
- A documentary database including photographs of inscriptions and squeezes
Thursday, August 21, 2008
Ann Macy Roth on Egypt in Huntsville: Monday 25 August 2008
- Hatshepsut: Women and Power, 2:20 p.m. in Roberts 419 on the UAH campus
- Androgyny and Blurred Boundaries in Ancient Egypt, 7:30 p.m. in the Chan Auditorium (first floor Business Administration Building) on the UAH campus
Wednesday, August 20, 2008
BAtlas ID Update: Maps 28-34, 67-71, 81-83
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 28, 29, 30, 31, 32, 33, 34, 67, 68, 69, 70, 71, 81, 82, 83
List of all maps presently covered: 7-88
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- No changes to previously released IDs.
Natural Language Toolkit (NLTK) penetration?
Tuesday, August 19, 2008
BAtlas ID update: Maps 7-9, 26-27
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 7, 8, 9, 26, 27
List of all maps presently covered: 7-27, 35-66, 72-80, 84-88
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- No changes to previously released IDs.
Maia Adjustments
- Digital Arts and Humanities: Classics and Ancient History: the entire site seems to be having problems
- The Scribal Guild: Epigraphy and Archaeology of the West Semitic World: the feed returns nothing and the front page now reads: "This blog is protected, to view it you must log in"
- Saving Antiquities for Everyone News [xml]: still abusing the "pubdate" field by populating with event dates instead of announcement publication dates. This causes notices to hover at the top of the aggregator until the date of the event -- sometimes months away -- arrives.
- Bryn Mawr Classical Review: Most Recent Articles [xml]: still abusing the "pubdate" field by repopulating it for all prior entries when a new entry is added, even if the prior entries remain unchanged. This causes old articles to "recycle" as new in feed readers subscribed to the aggregator feed. But note that the Bryn Mawr Classical Review Blog, which replicates the review content, does not have this problem and has been included in Maia since last Friday.
Monday, August 18, 2008
BAtlas ID update: Maps 19, 41-48
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 19, 41, 42, 43, 44, 45, 46, 47, 48
List of all maps presently covered: 10-25, 35-66, 72-80, 84-88
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- No changes to previously released IDs.
Friday, August 15, 2008
Atlantides: lists, comments out; BMCR in
Meanwhile, BMCR has started producing a blog version of their reviews. It puts out a feed that properly handles the pubdate element. I've added that feed to Maia (see earlier comments on the direct BMCR feed). Thanks to Camilla MacKay for the notice.
Friday, August 8, 2008
BAtlas ID update: Maps 14-18, 24, 25, 39, 40
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 14, 15, 16, 17, 18, 24, 25, 39, 40
List of all maps presently covered: 10-18, 20-25, 35-40, 49-65, 72-80, 84-88
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- No changes to previously released IDs.
Model Beijing
They've set up a demo site where you can find out more and play with some of the models: Virtual Beijing of Olympic Proportions.
I do have to correct what must be a blunder in the Huntsville Times' write-up: the images used cannot all be free ... it's the software they're using that's open-source. Or maybe the confusion is over the difference between "freely available" (i.e., not classified) and "free" (as in better than cheap).
Thursday, August 7, 2008
Bamboo Rising: Are Databases the "New Ground" of Humanities Research?
One current activity there is an attempt to Identify Themes of Arts and Humanities Scholarly Practice. My feed reader tells me that there's only one actual theme defined in this new section (just a bit ago), but I bet there will be more soon. The sole present one was offered by F. Allan Hanson (U. of Kansas, Anthropology):
- Ground of Research: "Humanities research is changing (or will change, or should change) from being grounded in texts (bibliographies) to relational databases."
NYU Programming Job: Papyrological Navigator
New York University: Programmer/Analyst (7421BR)
New York University’s Division of the Libraries seeks a Programmer/Analyst to work on the "Papyrological Navigator" (http://papyri.info), a major web-based research portal that provides scholars worldwide with access to texts, transcriptions, images and metadata related to ancient texts on papyri, pottery fragments and other material. The incumbent will work closely with the Project Coordinator (at Columbia University) and with scholars involved in the project at NYU's Institute for the Study of the Ancient World, Duke University and the University of Heidelberg, as well as with NYU Digital Library Technology staff.
The incumbent's initial responsibilities will include: migrating existing PN software applications from Columbia University to NYU; optimizing performance as needed; establishing a robust production environment at NYU for the ongoing ingest and processing of new and updated Greek text transcriptions, metadata and digital images; performing both analysis and programming of any required changes or enhancements to current PN applications.
This is a grant-funded position and is available for 2 years.
Candidates should have the following skills:
- Bachelor's degree in computer or information science and 3 years of relevant experience or equivalent combination
- Must include experience developing applications using Java
- Demonstrated knowledge of Java, Tomcat, Saxon, Lucene, Apache, SQL, XML, XSLT
- Experience with metadata standards (e.g. TEI, EpiDoc)
- Experience working in Unix/Linux environments
- Preferred: Experience with image serving software (eRez/FSI), Java Portlets, Apache Jetspeed-2, and Velocity templates.
- Preferred: Experience designing, building, and deploying distributed systems.
- Preferred: Experience working with non-Roman Unicode-based textual data (esp. Greek)
- Excellent communication and analytical skills
Applicants should submit resume and cover letter, which reflects how applicant’s education and experience match the job requirements.
Please apply through NYU's application management system: www.nyu.edu/hr/jobs/apply.
At this page click on "External Applicants" then "Search Openings." Type 7421BR in the "Keyword Search" field and select search. NYU offers a generous benefit package including 22 days of vacation annually. NYU is an Equal Opportunity/Affirmative Action Employer.
New York University Libraries: Library facilities at New York University serve the school’s 40,000 students and faculty and contain more than 4 million volumes. New York University is a member of the Association of Research Libraries, the Research Libraries Group, and the Digital Library Federation, and serves as the administrative headquarters of the Research Library Association of South Manhattan, a consortium that includes three academic institutions. The Library’s website URL is http://library.nyu.edu
Tuesday, August 5, 2008
BAtlas IDs: Maps 10-13, 20-21, 49
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 10, 11, 12, 13, 20, 21, 49
List of all maps presently covered: 10, 11, 12, 13, 20, 21, 22, 23, 35, 36, 37, 38, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 72, 73, 74, 75, 76, 77, 78, 79, 80, 84, 85, 86, 87, 87 inset, 88
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- No changes to previously released IDs.
Pondering Change to Atlantides Aggregators: Excavation Blogs
It occurred to me that it might be worthwhile to put dig-specific blogs into their own aggregator, and pull the few currently in Maia out and put them there too.
On the up side, that might help keep Maia to a manageable size. On the down side it would mean splitting up what has, until now, been a one-stop shop for ancient world blog content. And there would inevitably be some blogs in which lots of interesting non-excavation posts appear alongside hard-core dig news and status.
Thoughts?
Monday, August 4, 2008
BAtlas ID update: maps 23, 84, 85, 87, 87 inset, 88 and fixed dates
Reference URL: http://atlantides.org/batlas
Background: http://horothesia.blogspot.com/search/label/batlasids
New maps covered in this release: 23, 84, 85, 87, 87 inset, 88
List of all maps presently covered: 22, 23, 35, 36, 37, 38, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 72, 73, 74, 75, 76, 77, 78, 79, 80, 84, 85, 86, 87, 87 inset, 88
Major classes of change from prior versions are listed below. Consult individual files named like map22-diff.txt for output files differencing from prior version to this version.
- All readme files, dated folders and compressed tar files have been modified and renamed as necessary to redress the erroneous substitution of 2007 for 2008. No changes to IDs have occurred.
Sunday, August 3, 2008
BMCR and SAFE Events Feeds Pulled from Maia
It would appear that every time the BMCR adds a new article, dates on all articles in the feed are updated to present. As of this writing every single entry contains an identical "pubdate" tag with the value "03:49:18, Sunday, 03 August 2008" even though some of the entries have been in the list since it was first deployed a few weeks ago. This is non-standard behavior, and has the effect of pushing all the BMCR entries, in a block, to the top of any feed reader or aggregator, ahead of other content that is actually new. And in most feed readers they will show up highlighted or bolded, to indicate "new content." The appropriate behavior is to adjust dates only on those entries that have been added or substantially changed.
The Saving Antiquities for Everyone (SAFE) Events Feed is also blocked because it is forward-dating announcements of events to the date of the event, rather than the date of the entry. For example, the current feed contains a single entry with the following pubdate: "Thu, 16 Oct 2008 07:00:00 EST". This is the event date, not the publication date of the feed entry. This is also abuse of feed entry date fields and has the effect of causing these entries to linger at the top of the aggregation list for weeks or months until the date of the event passes.
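For anyone maintaining a feed, both problems are easy to check for mechanically. What follows is a rough sketch, not the code behind Maia: the function name `audit_pubdates` and the sample feeds are made up for illustration, and it assumes an RSS 2.0 feed whose `pubDate` values follow the usual RFC 822 style (anything else is simply reported as unparseable).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Illustrative feeds only; titles and structure are invented.
ALL_SAME = """<rss version="2.0"><channel>
<item><title>A</title><pubDate>Sun, 03 Aug 2008 03:49:18 GMT</pubDate></item>
<item><title>B</title><pubDate>Sun, 03 Aug 2008 03:49:18 GMT</pubDate></item>
</channel></rss>"""

FORWARD_DATED = """<rss version="2.0"><channel>
<item><title>Event</title><pubDate>Thu, 16 Oct 2008 07:00:00 EST</pubDate></item>
</channel></rss>"""


def audit_pubdates(rss_xml, now):
    """Return warnings about suspicious pubDate values in an RSS 2.0 feed."""
    warnings = []
    dates = []
    for item in ET.fromstring(rss_xml).iter("item"):
        raw = item.findtext("pubDate")
        if not raw:
            continue
        try:
            dates.append(parsedate_to_datetime(raw.strip()))
        except (TypeError, ValueError):
            # Different Python versions raise different errors here.
            warnings.append("unparseable pubDate: " + raw.strip())
    # Every entry stamped with the same moment suggests wholesale re-dating.
    if len(dates) > 1 and len(set(dates)) == 1:
        warnings.append("all entries share one pubDate")
    # A date after "now" suggests an event date, not a publication date.
    if any(d > now for d in dates):
        warnings.append("at least one pubDate is in the future")
    return warnings


if __name__ == "__main__":
    now = datetime(2008, 8, 3, 12, 0, tzinfo=timezone.utc)
    for feed in (ALL_SAME, FORWARD_DATED):
        print(audit_pubdates(feed, now))
```

Passing `now` in explicitly keeps the check repeatable; a real monitor would presumably use the current time and fetch the feed over HTTP.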
I will be contacting the editors of both resources in the hopes of resolving these technical difficulties so that their content can once again be featured in Maia Atlantis.
Friday, August 1, 2008
Hidden Web: Don't Love It, Leave It
I have to say: repositories != hidden web. The hidden web is simply the stuff the search engines don't find. Systems that surface information about their content only through OAI/PMH interfaces might make up a small part of the hidden web because they're not being surfaced to the bots, but frankly the hidden web holds way more stuff than what's in Fedora and DSpace at universities. Just ask Wikipedia.
The assertion that repository content == the hidden web is circular and false rhetoric that obscures the real problem: people are fighting the web instead of working with it. If you fight it, it will ignore you. This sort of thinking also makes hay for enterprises like the Internet Search Environment Number that seem to me to be trying to carve out business models that exploit, perpetuate and promote the cloistering of content and the rationing of information discovery.
Yesterday, Peter Millington posted what's effectively the antidote on the JISC-REPOSITORIES list (cross-posted to other lists). I reproduce it here in full because it's good advice not just for repositories but for anybody who is putting complex collections of content on the web and wants that content to be discoverable and useful:
Ways to snatch defeat from the jaws of victory
Peter Millington
SHERPA Technical Development Officer
University of Nottingham
You may have set up your repository and filled it with interesting papers, but it is still possible to screw things up technically so that search engines and harvesters cannot index your material. Here are seven common gotchas spotted by SHERPA:
1. Require all visitors to have a username and password
2. Do not have a 'Browse' interface with hyperlinks between pages
3. Set a 'robots.txt' file and/or use 'robots' meta tags in HTML headers that prevent search engine crawling
4. Restrict access to embargoed and/or other (selected) full texts
5. Accept poor quality or restrictive PDF files
6. Hide your OAI Base URL
7. Have awkward URLs
Full explanations and some solutions are given at: http://www.sherpa.ac.uk/documents/ways-to-screw-up.html
If you know of any other ways in which things may go awry, please contact us and we will consider adding them to the list.
I'm happy to say: Pleiades gets a clean bill of health if we count nos. 5 and 6 as non-applicable (since we're not a repository per se and we don't have a compelling use case for OAI/PMH or PDF).
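The 'robots.txt' gotcha in the list above is one you can test for yourself. Here is a small sketch using Python's standard `urllib.robotparser`; the helper name `crawlable`, the rules and the example.org URLs are made up for illustration, and a real check would read the site's live robots.txt rather than a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides one directory from every crawler.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def crawlable(robots_txt, url, agent="*"):
    """Report whether the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/browse/",
                "http://example.org/private/paper.pdf"):
        print(url, crawlable(ROBOTS_TXT, url))
```

This only covers the robots.txt half of that item; 'robots' meta tags in HTML headers would need a separate look at the pages themselves.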
Disclaimer: we are exploring the use of OAI/ORE through our Concordia project. One of the things we like most about it is that its primary serialization format is Atom, which is already indexed by the big search engines. With the web.