[Taxacom] Towards a consensus higher classification of organisms (was: List of Orders of the world), misspellings, etc...

David Remsen dremsen at gbif.org
Fri Jun 13 05:55:22 CDT 2008


In the 2005 publication "Evolution of the Insects," the authors  
Grimaldi and Engel state "All accumulated information of a species is  
tied to a scientific name, a name that serves as a link between what  
has been learned in the past and what we today add to the body of  
knowledge."

All that accumulated information is coming online, here and there,  
from initiatives large and small. GBIF currently provides access to  
135 million records of specimen and observation data living in 1200  
different databases.  The Biodiversity Heritage Library has digitized  
almost 6 million pages of the sort of literature that many Taxacom  
readers would like to access.  They each intend to increase the  
volumes of data available by at least an order of magnitude.  These  
are just two of many initiatives mobilising biodiversity  
information.   Inherent to all of that information are taxon names,  
which, as the quote alludes, form the primary means by which you or I  
will find that information.

We facilitate access to these data by building indexes.  We are, in a  
sense, "hoovering" them out of these resources.   While this was  
clearly meant to be a disparaging term in the context of this  
discussion, when replaced with "indexing" it forms the logical basis  
for discovery of these resources.    We certainly can assume this  
index is messy.  It will contain correctly and incorrectly spelled  
names.  It will contain names that do and do not comply with the  
nomenclatural codes.  It contains names that are taxonomically (valid  
or accepted or correct) and those that are not.  It is not, however,  
without value.  On the contrary, it forms the ultimate basis for what  
Grimaldi and Engel were referring to and to what I suspect Tony means  
when he refers to the "complete"  Catalogue of Life.

It would be great if each namestring in that list magically aligned  
with some existing authoritative taxonomic/nomenclatural dictionary  
that would distinguish the correctly spelled from the misspelled, the  
nomenclaturally sound from the unsound, and the taxonomically  
(accepted or valid or correct) from the non.   Of course they don't,  
because there is no such dictionary.  If there were, it should be a  
virtual dictionary composed of lots of contributing members.

GBIF has an index of names that are linked to the 135 million specimen  
and observation records that one might say were hoovered out of 2000+   
databases.  There are approximately  3.2 million distinct namestrings  
purported to be species names in that index.  We use the Catalogue of  
Life as our primary taxonomic 'backbone' and it matches fewer than one  
third of these.
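
To make the "matches fewer than one third" figure concrete, here is a
minimal sketch of how such a match rate could be computed.  It is not
GBIF's actual procedure.  The function names (normalise, match_rate)
and both sample lists are hypothetical, and the matching is plain
exact matching after trivial case and whitespace clean-up.

```python
def normalise(name):
    """Lower-case a namestring and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_rate(index_names, backbone_names):
    """Return (fraction of distinct index names found in the backbone,
    set of distinct normalised names that were not found)."""
    backbone = {normalise(n) for n in backbone_names}
    distinct = {normalise(n) for n in index_names}
    if not distinct:
        return 0.0, set()
    unmatched = distinct - backbone
    return (len(distinct) - len(unmatched)) / len(distinct), unmatched


# Hypothetical sample data, for illustration only.
BACKBONE = ["Yersinia pestis", "Pedobacter heparinus"]
INDEX = [
    "Yersinia pestis",
    "yersinia  pestis",          # same namestring after normalisation
    "Bacillus pestis",
    "Flavobacterium heparinum",
    "Pedobacter heparinus",
    "Halobacter salinaria",
]

rate, unmatched = match_rate(INDEX, BACKBONE)
print(f"{rate:.0%} of distinct namestrings matched the backbone")
print("unmatched:", sorted(unmatched))
```

The unmatched set is the interesting output: those are the namestrings
that need a nomenclatural or lexical explanation before they can be
tied to the backbone.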

This is not a criticism of the COL which provides the largest  
taxonomic "backbone" we have.  But the ambitions of the COL are toward  
organizing taxa and what we need added to this (aside from access to a  
wider range of taxonomic opinion)  is a separate mechanism to organise  
names.  We also need a wider and more distinct set of semantic and  
syntactic assertions about those names.   The problem lies in how, and  
from whom, we try to get this information.

What I would like to see emerge is a system that recognises a  
distinction between:

1. An index of names that link to content as part of a discovery  
process.  These exist and for good reason, and they are all trying,  
more or less ineffectively, to reconcile matters of syntax and  
semantics to two primary classes of information that many of you  
curate: nomenclature and taxonomy.

2. A coordinated "dictionary" of nomenclature that serves to  
explicitly reconcile the code-compliant from the non-code compliant  
names.   This dictionary of nomenclature should be derived from  
services provided by the existing nomenclatural databases and not  
originate within the index.    Among these services I would like to  
see emerge from the nomenclators are:

   --  A  consensus higher classification of names that would serve as  
a scaffolding for a subsequent consensus catalog of taxa.   Users of  
the GBIF data portal seeking data on birds would only receive a  
small subset of data records tied to avian names if the COL forms the  
basis of a search on Aves.    We can do better.

  -- access to statements of nomenclatural status for those names that  
some would prefer to forget but who themselves forgot to remove them  
from the literature and other places where they may inadvertently crop  
up.

--  access to name dictionaries that provide the basis for de-aliasing  
all of the little lexical variations that occur in-situ within the  
information we index and that enable those who build these indexes to  
reconcile these variants to their authoritative nomenclator analogs.   
This is  why Tony wrote his TaxaMatch programme and we wrote our  
LexMapper algorithm and so on.   A global names index,  conceptually  
separate from, but coordinated with, the primary nomenclators ensures  
the sort of qualitative separation we need while allowing these  
efforts to be effectively informed.

-- nomenclatural synonymy that ties all objective combinations to the  
type

3. A virtual directory of taxonomic opinion, coordinated with the  
above and as diverse as reality dictates, tied to nomenclature via  
globally unique and persistent nomenclatural identifiers, and itself  
serving similar taxonomic identifiers.    Such a directory requires the  
development of coordinated discovery services itself and standard  
mechanisms to index and resolve information provided.
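
The de-aliasing of lexical variants mentioned in the list above can be
sketched roughly as follows.  This is neither TaxaMatch nor LexMapper.
It stands in Python's general-purpose difflib similarity for a real
name-matching algorithm, and the reconcile function, the 0.9 cutoff
and the small NOMENCLATOR list are all assumptions made for
illustration.

```python
import difflib


def normalise(name):
    """Lower-case a namestring and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def reconcile(namestring, nomenclator, cutoff=0.9):
    """Map a namestring found in the wild to the closest entry in an
    authoritative list, or return None if nothing is similar enough.

    Uses difflib's generic string similarity, not a taxonomy-aware
    algorithm; the cutoff value is an arbitrary assumption.
    """
    lookup = {normalise(n): n for n in nomenclator}
    hits = difflib.get_close_matches(
        normalise(namestring), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Hypothetical stand-in for a nomenclator service.
NOMENCLATOR = [
    "Yersinia pestis",
    "Pedobacter heparinus",
    "Sphingobacterium spiritivorum",
]

for variant in ["Pedobacter heparinum", "yersina  pestis", "Homo sapiens"]:
    print(repr(variant), "->", reconcile(variant, NOMENCLATOR))
```

Keeping the authoritative list as an input, rather than building it
inside the matching code, mirrors the separation argued for above: the
index does the matching, the nomenclators supply what is matched
against.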

Hopefully we come full circle where the resolution of taxonomic  
expertise results in the sort of semantics that Tony and I can use  
across those indexes of content that provide access to all of that  
accumulated information.  To use this more authoritative information  
means we need access and we need copies.  Attributable, temporary,  
cached copies but copies.   It's not good enough to see it's been  
done.  It needs to be applied.

In my opinion, distinguishing these components provides synergies that  
will effectively embed the efforts of authoritative taxonomy within  
the evolving cyberinfrastructure in a measurable way that can  
insulate the collision we see between informatics and biology and  
place taxonomy as the centerpiece of the whole show where you and I  
know it should be.

Three cups of coffee and a whole morning.  Try to be kind.
David Remsen

On Jun 12, 2008, at 3:52 PM, B.J.Tindall wrote:

> Dear Tony,
> But I do appreciate what you are doing and yes,
> believe me I do know that there are all sorts of
> errors out there. I am also not being negative
> because if you had the expertise to appreciate
> the mess that was out there you would probably be
> as critical as I am. Examples:
> Halobacter:
> Halobacter Wainoe, Tindall & Ingvorsen, 1999 (0)
> - as a co-author of the paper I can tell you the
> organism entered the literature as Halorhabdus
> The other example:
> Halobacter (1)
> Halobacter salinaria (Harrison & Kennedy, 1922)
> Anderson, 1954 makes reference to only one
> species of many in that genus that didn't make it
> on to the bench mark of the Approved Lists of
> Bacterial Names, so why is only one there?
>
> Bacillus pestis (Lehmann & Neumann, 1896) Migula,
> 1900 Bacillaceae CoL2006/BIO-1918-8570 is an
> older, no longer used name for Yersinia pestis - not even mentioned
>
> the entry:
> http://www.marine.csiro.au/mirrorsearch/ir_search.go?searchtxt=Pedobacter+heparinus&hlevel=species
> lists only Cytophaga heparina (Payza & Korn,
> 1956) Christensen, 1980 , Sphingobacterium
> heparinum (Payza & Korn, 1956) Takeuchi & Yokota,
> 1993 Pedobacter heparinus (Payza & Korn 1956)
> Steyn et al., 1998 while Flavobacterium heparinum
> Payza & Korn 1956 is not mentioned
>
> Flavobacterium yabuuchiae Holmes et al., 1988 -
> no indication of the fact that it is also considered
> to be a synonym of Sphingobacterium spiritivorum.
>
> There is absolutely no way of telling whether the
> names listed are names recognised by the current
> Bacteriological Code or simply ballast from the
> past. We are currently running at about 10,000
> names in use and 20,000 names we would prefer to forget.
>
> Haloincola (2) Halomonadaceae SN2000 unverified -
> but SN 2000 doesn't include this genus in this
> family (which it doesn't belong to either).
>
> and I only spent 15 minutes on the site.
>
> I am afraid that your site isn't the only one to
> be dogged by problems with regards names of
> prokaryotes, some are significantly worse. The
> major problem is that the average user wouldn't be
> able to distinguish the problem information from
> correct information and this just causes
> unnecessary confusion. This is particularly
> worrying if one appreciates that we know all
> names that the current Code recognises and also
> the links between the appropriate synonyms etc.
>
> Sorry.
> Brian
>
>
> At 14:04 12.06.08, Tony.Rees at csiro.au wrote:
>> Penny, Brian,
>>
>> First, I was commending Parker as a worthy
>> exercise that has not yet been surpassed, but
>> was overdue for an update - including correcting
>> any shortcomings of course (and yes, I also
>> noticed the omission of the springtails).
>> Second, the implied criticism of "uncritically
>> just hoovering in anything on the web" is
>> unfair. What I am actually attempting to do is
>> to fill gaps in the available Catalogue of Life
>> compilation from other supposedly
>> "authoritative" lists (including some not
>> available anywhere in electronic form, and
>> others not yet published, courtesy of their
>> authors), and then address some of the issues of
>> mis-matching names, and hierarchies to a lesser
>> degree in due course (the latter a secondary
>> consideration). My ultimate reason for this is
>> to have a local list (or construct a web
>> service) that will attempt to answer, at a
>> machine readable level and in a consistent
>> manner, the two questions (a) is this genus /
>> genus + species combination marine or nonmarine,
>> and (b) is it extant or fossil (or potentially
>> both), also correcting errors in sources as I
>> come across them (and I can assure you that the
>> supposed "gold standard" Catalogue of Life is by
>> no means error free). Since the alternative
>> appears to be to wait for the CoL to be complete
>> (another xx years??) and even then it will not
>> contain the habitat information that I seek, and
>> will also miss all the fossil taxa, I feel that
>> if there is a requirement for such a list "now",
>> one has no practical alternative to constructing one's own...
>>
>> I guess I was hoping for constructive criticism
>> rather than negativity. If the latter is a
>> general response, it is a simple matter to
>> disable the high level search options once more
>> and just use the system to suit my own needs and
>> those of my "immediate" clients (principally,
>> OBIS and others with similar habitat-specific
>> requirements). But maybe there are persons on
>> the list who see *some* wisdom in this approach
>> - note the "Interim" in the IRMNG title - when
>> there is something better to use, I will be the first to use it.
>>
>> - Tony
>>
>> -----Original Message-----
>> From: taxacom-bounces at mailman.nhm.ku.edu
>> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of B.J.Tindall
>> Sent: Thursday, 12 June 2008 7:21 PM
>> To: taxacom at mailman.nhm.ku.edu
>> Subject: Re: [Taxacom] Towards a consensus
>> higher classification of organisms (was: List of
>> Orders of the world), misspellings, etc...
>>
>> ...and I hope that real benchmark lists of
>> formally registered names, like the system in use
>> in prokaryote nomenclature won't get swamped by
>> those sites that uncritically just hoover in anything on the web.....
>> Brian
>>
>>
>> At 10:06 12.06.08, Penny Greenslade wrote:
>>> I hope any such list will not follow Parker (ed.)'s "Synopsis and
>>> Classification of Living Organisms" (publ. 1982) too closely as  
>>> that book
>>> left out a whole (abundant and widespread)
>> Class of organisms, the Collembola.
>>>
>>> Penelope Greenslade
>>>
>>> At 01:14 PM 12/06/2008 +1000, Tony.Rees at csiro.au wrote:
>>>> Parker
>>>> (ed.)'s "Synopsis and Classification of Living Organisms" (publ.  
>>>> 1982),
>>>
>>> Penelope Greenslade
>>> Division of Botany and Zoology
>>> Australian National University
>>> GPO Box
>>> Australian Capital Territory 0200
>>> Australia
>>> Telephone 61 (0) 2 6125 0774
>>> Facsimile   61 (0)2 6125 5573
>>>
>>>
>>> _______________________________________________
>>> Taxacom mailing list
>>> Taxacom at mailman.nhm.ku.edu
>>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> Dr.B.J.Tindall
> DSMZ-Deutsche Sammlung von Mikro-
> organismen und Zellkulturen GmbH
> Inhoffenstraße 7B
> 38124 Braunschweig
> Germany
> Tel. ++49 531-2616-224
> Fax  ++49 531-2616-418
> http://www.dsmz.de
> Director: Prof. Dr. Erko Stackebrandt
> Local court: Braunschweig HRB 2570
> Chairman of the management board: MR Dr. Axel Kollatschny
>
> DSMZ - A member of the Leibniz Association (WGL)
>
>
>




