[Taxacom] validation of taxon names
bboyle at email.arizona.edu
Thu Feb 16 16:46:20 CST 2012
Sorry to be getting into this discussion rather late. I haven't been monitoring taxacom for a few days.
As Tony mentioned in his response, there are several solutions out there at the moment, each with its own limitations and caveats. For plants, one of those solutions is the iPlant Taxonomic Name Resolution Service (http://tnrs.iplantcollaborative.org/), a project I've been involved with for the past couple of years.
The purpose of the iPlant TNRS is to provide tools for bulk resolution of large lists of names--such as yours--against taxonomic databases. The TNRS is not a primary provider of taxonomic data, but rather a "window" on other databases; it provides a means of accessing the content of those databases. The tools consist of Dmitry Mozzherin's GNI Name Parser (http://gni.globalnames.org/parsers/new; with thanks to Dmitry), a somewhat modified version of Tony's TAXAMATCH (http://www.cmar.csiro.au/datacentre/taxamatch.htm...with thanks and apologies to Tony), and additional code for disambiguating results which are machine-resolvable and flagging those which are not. Finally, there is a user interface and an API.
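To make the parse / match / flag workflow above a little more concrete, here is a minimal Python sketch. It is not the TNRS code and does not call the TNRS API: the standard-library difflib module stands in for TAXAMATCH, strip_authorship() is a crude stand-in for a real name parser, and the reference list, the function names and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib

# Illustrative reference list standing in for a taxonomic source (assumption).
REFERENCE_NAMES = ["Lamia amputator", "Lamia textor", "Quercus robur"]


def strip_authorship(name):
    """Crude stand-in for a real name parser: keep only genus and epithet."""
    return " ".join(name.split()[:2])


def resolve(name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return the best match for a submitted name plus a status flag."""
    canonical = strip_authorship(name)
    # difflib is used here in place of TAXAMATCH (assumption, not the real matcher).
    candidates = difflib.get_close_matches(canonical, reference, n=3, cutoff=cutoff)
    if not candidates:
        matched, status = None, "no match"
    elif canonical in candidates:
        # Exact hit: machine-resolvable.
        matched, status = canonical, "resolved"
    elif len(candidates) == 1:
        # A single fuzzy candidate: also treated as machine-resolvable.
        matched, status = candidates[0], "resolved"
    else:
        # Several plausible candidates: flag for a human to decide.
        matched, status = candidates[0], "ambiguous"
    return {"submitted": name, "matched": matched, "status": status}


if __name__ == "__main__":
    for submitted in ["Lamia amputator Guerin-Meneville, 1844",
                      "Lamia amputatr",
                      "Zzzz yyyy"]:
        print(resolve(submitted))
```

A real matcher would also have to cope with infraspecific ranks, hybrids and authorship comparison, all of which this sketch ignores.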
Currently we are at version 2.0, which uses Tropicos (http://www.tropicos.org/) as its source of names and synonymy (the latter as determined by their computed acceptance algorithm). As a Neotropics-centric botanist, I can attest that Tropicos provides extremely thorough coverage of bryophytes, ferns and seed plants of the Americas; indeed, it is the major electronic reference for a large number of Neotropical herbaria, many of which synchronize their taxonomy with Tropicos. Tropicos is also comprehensive for many regions of the Old World--Madagascar, for example. Of course if you need to resolve algae, fungi or insects, or have a lot of plant names from Indonesia, then the TNRS may not do the job.
As a redistributor, the TNRS is not able to allow users to download its cache of source databases. But it will enable you to resolve your names in a single pass against all the names in those databases: for example, all 1.2 million names in Tropicos.
The TNRS will only be as useful as its taxonomic sources. If Tropicos is not adequate for your purposes, then the current version of the TNRS won't be adequate either. As of version 3.0 (currently under development) the TNRS will provide access to multiple sources of taxonomy. If you prefer NCBI to Tropicos, you will be able to use NCBI. If you are with a government agency that requires you to standardize taxonomy according to an "approved" source, then you will be able to use that source (in the US, many agencies are required to use USDA Plants, http://plants.usda.gov/java/). Most importantly, you will be able to combine sources and rank them, so that "gold standard" monographic taxonomies can be applied to specific clades, and the remaining names mopped up using more comprehensive but less stringent sources.
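The idea of ranked sources can be sketched in a few lines of Python. This is only an illustration of the principle, not how version 3.0 is or will be implemented: the source labels, the resolve_ranked() function and the dictionaries are hypothetical, and the names ("Aus bus" and so on) are placeholders rather than real taxa.

```python
# Hypothetical sources, listed from highest to lowest priority.
# Each maps a submitted name to the name that source accepts.
RANKED_SOURCES = [
    ("monograph", {"Aus bus": "Aus bus", "Aus cus": "Aus bus"}),
    ("broad_checklist", {"Aus bus": "Aus dus", "Xus yus": "Xus zus"}),
]


def resolve_ranked(name, ranked_sources=RANKED_SOURCES):
    """Return (accepted_name, source_label) from the first source that knows the name."""
    for label, synonymy in ranked_sources:
        if name in synonymy:
            return synonymy[name], label
    # No source recognises the name.
    return None, None


if __name__ == "__main__":
    for submitted in ["Aus bus", "Xus yus", "Nus nus"]:
        print(submitted, "->", resolve_ranked(submitted))
```

Because the sources are consulted in order, the monograph's opinion on "Aus bus" wins over the broader checklist's, while "Xus yus", which the monograph does not cover, is mopped up by the checklist.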
The TNRS approach is agnostic in that it does not attempt to reconcile these sources against each other; rather, it provides access to what is currently out there. By no means is the TNRS intended to be a substitute for more comprehensive solutions such as the Global Names Architecture. However, until solutions such as the GNA are fully implemented, applications like the TNRS can provide a practical means of cleaning and merging the enormous quantities of useful but taxonomically "dirty" data that many of us depend upon for our research.
Finally, although iPlant's funding and mission constrain the taxonomic scope of the iPlant TNRS to plants, we invite the community to develop other instances capable of resolving all organism names. The entire code base through version 2.0 is freely available via our GitHub repository (http://tnrs.iplantcollaborative.org/about.html#source_code).
Brad Boyle, Ph.D.
Dept. of Ecology and Evolutionary Biology
University of Arizona
P.O. Box 210088
Tucson, AZ 85721, USA
bboyle at email.arizona.edu
On Feb 14, 2012, at 11:00 AM, taxacom-request at mailman.nhm.ku.edu wrote:
> Message: 13
> Date: Tue, 14 Feb 2012 09:42:10 +0100
> From: Armand Turpel <armand.turpel.mnhn at gmail.com>
> Subject: [Taxacom] validation of taxon names
> To: taxacom at mailman.nhm.ku.edu
> Message-ID: <4F3A1E62.80307 at gmail.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
> We have a database with over 80000 species taxon names which we want to
> compare and validate against other databases. Doing this job isn't very
> 1. The majority of organizations only provide web interfaces to search
> for single taxon names.
> 2. Copyrights of data are some times not very clear
> 3. Quality of data is doubtful.
> - Lamia amputator Guérin-Méneville, 1844
> - Lamia amputator Guerin-Meneville, 1844
> - Lamia amputator Guérin-Méneville
> - ?...
> The only organization we know that provide its whole database for
> download is species2000 (catalogue of life > COL). We created a
> postgresql version for the COL data from which it is possible to compare
> a big number of taxon names in one run. Postgresql provide good fuzzy
> string algorithms. But the COL data isn't error free and it isn't
> complete for our region.
> The question is: Which organization provide trustful, complete (as
> possible) and full accessible data?