Some of the PELAGIOS partners distribute their annotation RDF in a relatively small number of files. Others (like SPQR and ANS) have a very large number of files. This makes the technique I used earlier for adding triples to the database ungainly. Fortunately, 4store provides some command line methods for loading triples.
First, stop the 4store http server (why?):
$ killall 4s-httpd
Try to import all the RDF files. Rats!
$ 4s-import -a pelagios *.rdf
-bash: /Applications/4store.app/Contents/MacOS/bin/4s-import: Argument list too long
Bash to the rescue (but note that doing one file at a time has a cost on the 4store side):
$ for f in *.rdf; do 4s-import -av pelagios "$f"; done
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.00000.rdf>
Pass 1, processed 10 triples (10)
Pass 2, processed 10 triples, 8912 triples/s
Updating index
Index update took 0.000890 seconds
Imported 10 triples, average 4266 triples/s
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.101.rdf>
Pass 1, processed 11 triples (11)
Pass 2, processed 11 triples, 9856 triples/s
Updating index
Index update took 0.000936 seconds
Imported 11 triples, average 4493 triples/s
Reading <file:///Users/paregorios/Documents/files/P/pelagios-data/coins/0000.999.10176.rdf>
Pass 1, processed 8 triples (8)
Pass 2, processed 8 triples, 6600 triples/s
Updating index
Index update took 0.000892 seconds
Imported 8 triples, average 3256 triples/s
...
This took a while. There are 86,200 files in the ANS annotation batch.
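About that "Argument list too long" error: it isn't 4store complaining. The shell expands *.rdf itself, and the operating system refuses to start a program whose arguments (together with the environment) exceed its ARG_MAX limit. The following is a small diagnostic sketch, not part of my original session, that compares the size of the expanded glob with that limit; arg_bytes is a helper name made up for illustration.

```shell
#!/usr/bin/env bash
# Why did `4s-import -a pelagios *.rdf` fail? The shell expands the glob,
# then exec() rejects argument lists whose total size exceeds ARG_MAX.

# arg_bytes (hypothetical helper): print how many bytes a list of arguments
# occupies when handed to exec(), i.e. each argument's length plus a NUL byte.
# printf is a bash builtin, so calling it with a huge list never goes
# through exec() and does not hit the same limit.
arg_bytes() {
  if (( $# == 0 )); then
    echo 0
    return
  fi
  # tr strips the leading padding that BSD/macOS wc adds to its count
  printf '%s\0' "$@" | wc -c | tr -d ' '
}

shopt -s nullglob          # no matches -> empty list, not a literal "*.rdf"
files=(*.rdf)              # a bash array is not subject to ARG_MAX

echo "RDF files:      ${#files[@]}"
echo "argument bytes: $(arg_bytes "${files[@]}")"
echo "ARG_MAX:        $(getconf ARG_MAX)"
```

Run in the directory holding the ANS files, the second number should dwarf the third; in a directory with only a handful of files it won't.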
Note the use of the -a option on 4s-import to ensure the triples are added to the current contents of the database, rather than replacing them! Note also the -v option, which is what gives you the report (otherwise, it's silent and that makes my ctrl-c finger twitchy).
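Somewhere between one giant 4s-import call and 86,200 tiny ones there is a middle path: let xargs split the filenames into batches that each fit under the argument limit. This is an untimed sketch, not what I actually ran; it assumes 4s-import accepts several files per call (which the failed *.rdf attempt suggests), and it should mean fewer separate imports and, presumably, fewer index updates than the loop. The function name import_rdf_batched and the IMPORT_CMD override are hypothetical, the latter only there so the function can be tried without 4store installed.

```shell
#!/usr/bin/env bash
# Batch-load RDF files into a 4store KB. xargs packs as many filenames into
# each 4s-import call as the system's argument-size limit allows, so there
# are far fewer invocations than with one call per file.
#   -a : add to the existing KB contents instead of replacing them
#   -v : verbose, print the per-file import report
#
# IMPORT_CMD is a hypothetical override (not a 4store feature) so the
# function can be tested without 4store; it defaults to the real 4s-import.
import_rdf_batched() {
  local kb="$1" dir="${2:-.}"
  local -a files=("$dir"/*.rdf)   # a bash array is not subject to ARG_MAX

  # With no match the glob stays literal, so check that the first entry exists.
  if [[ ! -e "${files[0]}" ]]; then
    echo "no .rdf files found in $dir" >&2
    return 1
  fi

  # printf is a builtin, so the full list never goes through exec(); xargs
  # splits it into batches and runs: 4s-import -av KB file1 file2 ...
  printf '%s\0' "${files[@]}" | xargs -0 "${IMPORT_CMD:-4s-import}" -av "$kb"
}
```

Usage, with 4s-httpd stopped as before: import_rdf_batched pelagios /Users/paregorios/Documents/files/P/pelagios-data/coins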
Now, back to the SPARQL mines.