er at xs4all.nl
Sat Jun 8 13:54:11 CDT 2013
On Mon, June 3, 2013 15:24, Roderic Page wrote:
> Hi Erik,
> Sorry for the delay in replying. I need to do a bit of work on this, but for now there are a couple of ways to get the data.
> 1. There is a crude Darwin Core Archive dump at http://bionames.org/data/darwincore/bionames.zip (warning: ~144 Mb). This
> has names, references, and a map between names and references.
Thank you, that is really nice to play with.
But it also raises some questions:
- The zipfile contains three .tsv files:
$ unzip -l /tmp/bionames.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
154968043  05-30-2013 21:15   taxa.tsv
717387476  05-30-2013 21:16   media.tsv
129913824  05-30-2013 21:10   references.tsv
     3255  05-29-2013 19:16   meta.xml
1002272598                    4 files
Two of the .tsv files (taxa.tsv and media.tsv) end with a line that looks like an error message:
$ unzip -p /tmp/bionames.zip taxa.tsv | tail -1
failed [/Users/rpage/Desktop/bionames-data/darwincore/doi/simple_taxa.php:50]: SELECT * FROM names WHERE id= LIMIT 1
$ unzip -p /tmp/bionames.zip media.tsv | tail -1
failed [/Users/rpage/Desktop/bionames-data/darwincore/doi/simple_media.php:83]: SELECT * FROM names WHERE id= LIMIT 1
Now, it's easy enough to grep out those error lines, but it does make one wonder: are the files truncated, i.e. would they
have been larger had the error not occurred?
The row counts of the imported tables are now:
which correspond to the number of lines in the .tsv files. Are these correct?
- Is it worthwhile to check http://bionames.org/data/darwincore/bionames.zip regularly? Will it be updated
(weekly, monthly, yearly)?
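For the first point, a minimal shell sketch of the cleanup and the sanity checks described above. It assumes the only debris is lines beginning with "failed [" (as seen at the end of taxa.tsv and media.tsv); the function names strip_error_lines, check_field_counts and count_records are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: cleanup and sanity checks for the bionames .tsv files.
# Assumes the only debris is lines beginning with "failed [".
# Function names are hypothetical.

# strip_error_lines: copy stdin to stdout without the "failed [" lines.
strip_error_lines() {
  grep -v '^failed \['
}

# check_field_counts: print every line whose number of tab-separated fields
# differs from that of line 1; exit status 1 if any were found.
check_field_counts() {
  awk -F'\t' 'NR == 1 { n = NF; next }
              NF != n { printf "line %d: %d fields (expected %d)\n", NR, NF, n; bad = 1 }
              END { exit bad }'
}

# count_records: number of data lines on stdin, minus an optional number of
# header lines ($1, default 0). grep -c '' also counts a final line that
# lacks a trailing newline, which wc -l would miss.
count_records() {
  local header_lines=${1:-0} n
  # grep exits 1 when it counts nothing; that is not an error here
  n=$(grep -v '^failed \[' | grep -c '') || true
  echo $(( n > header_lines ? n - header_lines : 0 ))
}
```

Typical use would be "unzip -p /tmp/bionames.zip taxa.tsv | strip_error_lines > taxa.tsv", then "check_field_counts < taxa.tsv" and "count_records < taxa.tsv" (passing 1 if meta.xml declares ignoreHeaderLines="1"), comparing the last figure with SELECT count(*) on the imported table. A clean field-count check can't rule out truncation on its own: a dump cut off exactly at a line boundary passes it.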
I'll attach a (pretty basic) bash script for loading the .tsv files into postgres.
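As a rough illustration of one way to do the loading (a hypothetical sketch, not the attached script), the function below writes one psql \copy command per file. It assumes tables named taxa, media and references already exist with their columns in the order given in meta.xml, and that the file paths contain no single quotes. The table names are double-quoted because REFERENCES is a reserved word in SQL.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the attached script. Prints one psql \copy
# meta-command per .tsv file, loading FILE.tsv into an existing table
# named after the file (taxa.tsv -> "taxa"). Table names are double-quoted
# because REFERENCES is a reserved word in SQL. Assumes the file paths
# contain no single quotes.
emit_copy_commands() {
  local f table q="'"
  for f in "$@"; do
    table=$(basename "$f" .tsv)
    # FORMAT text: tab-delimited, \N means NULL, backslash starts an escape
    printf '\\copy "%s" FROM %s%s%s WITH (FORMAT text)\n' "$table" "$q" "$f" "$q"
  done
}
```

Usage would be something like "emit_copy_commands taxa.tsv media.tsv references.tsv | psql -d bionames" (the database name is made up). Because COPY's text format treats backslash as an escape character, any fields containing literal backslashes would need escaping first, and the "failed [" lines have to be stripped beforehand.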
More information about the Taxacom