name server coverage

Roderic Page
Tue Jun 7 10:48:04 CDT 2005

Doug's email raises some interesting issues about data access and
comparing results of queries.

If taxonomic name databases did provide information in a standard
fashion, and in bulk, then some of these could be tackled. The
Taxonomic Search Engine essentially takes information from each
database and translates it internally into a consistent format (XML). In
the same way, when it serves LSIDs for a name, the metadata is in the
same format regardless of database. So, one way to compare databases
would be to compare the metadata generated for the "same" name in each
dataset. Of course, the limitation here is that the metadata is only as
good as the source database -- if the database doesn't provide an easy
way to retrieve synonyms then those will be absent from the metadata.
Some databases make this relatively easy; in other cases synonym
information might not be provided, or may require extensive screen
scraping.

There are standards being developed for serving taxonomic name
information, although from my own perspective I find these somewhat

The other aspect of this is bulk access to data. The Taxonomic Search
Engine has a web service interface, so that with a script one could
automate the process of reading a list of names in a file, querying
each database TSE supports, and comparing the information returned for
each name.
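
As a rough illustration of the kind of script described above, here is
a minimal Python sketch. It deliberately does not call the real TSE web
service: the lookup is passed in as a function, and fake_lookup, the
database labels "db_a" and "db_b", the field names, and the placeholder
names are all made up so the example runs offline. A real script would
replace fake_lookup with a call to the actual service and parse the XML
it returns.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumed record shape: field name -> list of values (e.g. "synonyms").
Record = Dict[str, List[str]]
# Assumed lookup signature: (database, name) -> record, or None if not found.
Lookup = Callable[[str, str], Optional[Record]]


def read_names(lines: Iterable[str]) -> List[str]:
    """Return the non-empty, stripped names, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def collect(names: List[str], databases: List[str],
            lookup: Lookup) -> Dict[str, Dict[str, Optional[Record]]]:
    """Query every database for every name."""
    return {name: {db: lookup(db, name) for db in databases} for name in names}


def differing_fields(by_db: Dict[str, Optional[Record]]) -> List[str]:
    """Fields whose values differ among the databases that returned a record."""
    found = [rec for rec in by_db.values() if rec is not None]
    fields = sorted({f for rec in found for f in rec})
    return [f for f in fields
            if len({tuple(sorted(rec.get(f, []))) for rec in found}) > 1]


# Made-up stand-in data; not real output from any database.
FAKE = {
    ("db_a", "Aus bus"): {"author": ["Smith"], "synonyms": ["Aus cus"]},
    ("db_b", "Aus bus"): {"author": ["Smith"], "synonyms": []},
}


def fake_lookup(db: str, name: str) -> Optional[Record]:
    """Hypothetical replacement for a real web service call."""
    return FAKE.get((db, name))


if __name__ == "__main__":
    names = read_names(["Aus bus\n", "\n", "Xus yus\n"])
    results = collect(names, ["db_a", "db_b"], fake_lookup)
    for name, by_db in results.items():
        print(name, differing_fields(by_db))
```

Passing the lookup in as a parameter keeps the comparison logic
independent of how any particular database is queried.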

One issue in all this is what constitutes an error. In many cases
differences among databases might simply be variation in completeness
(e.g., one database records a pair of names as synonyms whereas another
doesn't). This will depend in part on the aims of the database. I guess
that for a nomenclator, failing to record two names as synonyms is not
an error but an omission, but if a database such as ITIS fails to
report such a relationship it may result in an error (i.e., the two
"accepted" taxa will appear in the ITIS classification as distinct
taxa, whereas they are synonyms).
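
The distinction between an omission and an error could be sketched
roughly as follows. This is only one possible reading of the paragraph
above, not an established method: the Database class, its fields, and
the three labels are assumptions made for the example, and the names
are placeholders. A database with no "accepted" names stands in for a
pure nomenclator.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class Database:
    """Hypothetical summary of what one database asserts."""
    name: str
    # Each synonym relationship is stored as an unordered pair of names.
    synonym_pairs: Set[FrozenSet[str]] = field(default_factory=set)
    # Names treated as accepted taxa; left empty for a pure nomenclator.
    accepted: Set[str] = field(default_factory=set)


def classify(pair: FrozenSet[str], other: Database) -> str:
    """Classify how `other` treats a synonym pair recorded elsewhere."""
    if pair in other.synonym_pairs:
        return "agree"
    if pair <= other.accepted:
        # Both names are listed as distinct accepted taxa.
        return "possible error"
    # The relationship is simply not recorded.
    return "omission"


if __name__ == "__main__":
    pair = frozenset({"Aus bus", "Aus cus"})
    nomenclator = Database("nomenclator")
    checklist = Database("checklist", accepted={"Aus bus", "Aus cus"})
    for db in (nomenclator, checklist):
        print(db.name, classify(pair, db))
```

Anything flagged as a "possible error" would still need a person to
decide which database is right.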

Clearly, there is a lot of scope for developing tools to automate the
comparison of taxonomic data.

On the other issue of submitting corrections, for a single record it
would be straightforward for the database web site to populate a form
with the current details of the record, then provide fields for the
user to make their suggested changes. This would avoid the issue of
parsing email messages. To correct names in bulk, probably the way
forward is for users to have a tool (perhaps a standalone program) that
they can use to download a bunch of names, make their changes, then
submit those to the database curator. None of this is rocket science,
it just(!) needs time, effort, and money...
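
For the bulk case, the core of such a tool might be little more than a
comparison of the downloaded records with the user's edited copy,
producing a list of proposed changes for the curator. The sketch below
assumes records are simple field/value dictionaries keyed by a record
identifier; the identifiers, field names, and values are invented for
the example, and how the changes would actually be submitted is left
open.

```python
from typing import Dict, List, Tuple

# Assumed record shape: field name -> value.
Record = Dict[str, str]
# One proposed change: (record id, field, old value, new value).
Change = Tuple[str, str, str, str]


def proposed_changes(original: Dict[str, Record],
                     edited: Dict[str, Record]) -> List[Change]:
    """List every field whose value differs between original and edited."""
    changes: List[Change] = []
    for rec_id in sorted(edited):
        before = original.get(rec_id, {})
        after = edited[rec_id]
        for fld in sorted(set(before) | set(after)):
            old, new = before.get(fld, ""), after.get(fld, "")
            if old != new:
                changes.append((rec_id, fld, old, new))
    return changes


if __name__ == "__main__":
    # Invented example data.
    downloaded = {"rec1": {"name": "Aus bus", "author": "Smth"}}
    corrected = {"rec1": {"name": "Aus bus", "author": "Smith"}}
    for change in proposed_changes(downloaded, corrected):
        print(change)
```

Sending only the differences, rather than whole records, would make it
easier for a curator to review what is being proposed.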



