[Taxacom] Towards a consensus higher classification of organisms (was: List of Orders of the world), misspellings, etc...

B.J.Tindall bti at dsmz.de
Fri Jun 13 09:37:54 CDT 2008

Dear David,
I can't recall using the term "hoovering" as a 
disparaging term. In the case of prokaryotes, as 
you well know, the list of names to be used is 
defined, together with the basionyms and synonyms 
and much more. The question is how to tackle the 
harvesting of the data. In the case of 
prokaryotes the names usually seem to be cleaned 
of the accompanying information when they get 
"indexed". This would be rather like scanning a 
dictionary and simply indexing every word without 
taking into account that significant parts of the 
dictionary actually qualify what the word means.

As to specific comments on some of your points:
1) this is not easy but also requires a more 
formal regulation of what is going on
2) as you know most of this is already possible with prokaryotes
3) my feeling is that in order to do some of this 
one has to invest in areas where one can realise 
these goals in a reasonable time - perhaps one 
group at a time and also making sure that there 
is continued support both in terms of manpower 
and finances. Every small success story might be 
more useful than trying to do all of it all at 
once. There is a danger of massive funding followed by massive pull out.


At 12:55 13.06.08, David Remsen wrote:
>In the 2005 publication "Evolution of the Insects," the authors
>Grimaldi and Engel state "All accumulated information of a species is
>tied to a scientific name, a name that serves as a link between what
>has been learned in the past and what we today add to the body of
>All that accumulated information is coming on line, here and there,
>from initiatives large and small. GBIF currently provides access to
>135 million records of specimen and observation data living in 1200
>different databases.  The Biodiversity Heritage Library has digitized
>almost 6 million pages of the sort of literature that many Taxacom
>readers would like to access.  They each intend to increase the
>volumes of data available by at least an order of magnitude.  These
>are just two of many initiatives mobilising biodiversity
>information.   Inherent to all of that information are taxon names and
>as the quote alludes, these form the primary means by which you or I will
>find that information.
>We facilitate access to these data by building indexes.  We are, in a
>sense, "hoovering" them out of these resources.   While this was
>clearly meant to be a disparaging term in the context of this
>discussion, when replaced with "indexing" it forms the logical basis
>for discovery of these resources.    We certainly can assume this
>index is messy.  It will contain correctly and incorrectly spelled
>names.  It will contain names that do and do not comply with the
>nomenclatural codes.  It contains names that are taxonomically (valid
>or accepted or correct) and those that are not.  It is not, however,
>without value.  On the contrary, it forms the ultimate basis for what
>Grimaldi and Engel were referring to and to what I suspect Tony means
>when he refers to the "complete"  Catalogue of Life.
>It would be great if each namestring in that list magically aligned
>with some existing authoritative taxonomic/nomenclatural dictionary
>that would distinguish the correctly spelled from the misspelled, the
>nomenclaturally sound from the unsound, and the taxonomically
>(accepted or valid or correct) from the non.   Of course they don't
>because there is no such dictionary.  If there was it should be a
>virtual dictionary composed of lots of contributing members.
>GBIF has an index of names that are linked to the 135 million specimen
>and observation records that one might say were hoovered out of 2000+
>databases.  There are approximately  3.2 million distinct namestrings
>purported to be species names in that index.  We use the Catalogue of
>Life as our primary taxonomic 'backbone' and it matches less than one
>third of these.
>This is not a criticism of the COL which provides the largest
>taxonomic "backbone" we have.  But the ambitions of the COL are toward
>organizing taxa and what we need added to this (aside from access to a
>wider range of taxonomic opinion)  is a separate mechanism to organise
>names.  We also need a wider and more distinct set of semantic and
>syntactic assertions about those names.   The problem lies in how, and
>from whom, we try to get this information.
>What I would like to see emerge is a system that recognises a
>distinction between:
>1. An index of names that link to content as part of a discovery
>process.  These exist and for good reason, and they are all trying,
>more or less ineffectively, to reconcile matters of syntax and
>semantics to two primary classes of information that many of you
>curate: nomenclature and taxonomy.
>2. A coordinated "dictionary" of nomenclature that serves to
>explicitly reconcile the code-compliant from the non-code compliant
>names.   This dictionary of nomenclature should be derived from
>services provided by the existing nomenclatural databases and not
>originate within the index.    Among these services I would like to
>see emerge from the nomenclators are:
>   --  A  consensus higher classification of names that would serve as
>a scaffolding for a subsequent consensus catalog of taxa.   Users of
>the GBIF data portal seeking data on birds would only receive a
>small subset of data records tied to avian names if the COL forms the
>basis of a search on Aves.    We can do better.
>  -- access to statements of nomenclatural status for those names that
>some would prefer to forget but who themselves forgot to remove them
>from the literature and other places where they may inadvertently crop
>--  access to name dictionaries that provide the basis for de-aliasing
>all of the little lexical variations that occur in-situ within the
>information we index and that enable those who build these indexes to
>reconcile these variants to their authoritative nomenclator analogs.
>This is  why Tony wrote his TaxaMatch programme and we wrote our
>LexMapper algorithm and so on.   A global names index,  conceptually
>separate from, but coordinated with, the primary nomenclators ensures
>the sort of qualitative separation we need while allowing these
>efforts to be effectively informed.
>-- nomenclatural synonymy that tie all objective combinations to the
>3. coordinated with a virtual directory of taxonomic opinion as
>diverse as reality dictates, tied to nomenclature via globally unique
>and persistent nomenclatural identifiers, and themselves serving
>similar taxonomic identifiers.    Such a directory requires the
>development of coordinated discovery services itself and standard
>mechanisms to index and resolve information provided.
>Hopefully we come full circle where the resolution of taxonomic
>expertise results in the sort of semantics that Tony and I can use
>across those indexes of content that provide access to all of that
>accumulated information.  To use this more authoritative information
>means we need access and we need copies.  Attributable, temporary,
>cached copies but copies.   It's not good enough to see it's been
>done.  It needs to be applied.
>In my opinion, distinguishing these components provides synergies that
>will effectively embed the efforts of authoritative taxonomy within
>the evolving cyberinfrastructure in a measurable way that can
>insulate the collision we see between informatics and biology and
>place taxonomy as the centerpiece of the whole show where you and I
>know it should be.
>Three cups of coffee and a whole morning.  Try to be kind.
>David Remsen
>On Jun 12, 2008, at 3:52 PM, B.J.Tindall wrote:
>>Dear Tony,
>>But I do appreciate what you are doing and yes,
>>believe me I do know that there are all sorts of
>>errors out there. I am also not being negative
>>because if you had the expertise to appreciate
>>the mess that was out there you would probably be
>>as critical as I am. Examples:
>>Halobacter Wainoe, Tindall & Ingvorsen, 1999 (0)
>>- as a co-author of the paper I can tell you the
>>organism entered the literature as Halorhabdus
>>The other example:
>>Halobacter (1)
>>Halobacter salinaria (Harrison & Kennedy, 1922)
>>Anderson, 1954 makes reference to only one
>>species of many in that genus that didn't make it
>>on to the bench mark of the Approved Lists of
>>Bacterial Names, so why is only one there?
>>Bacillus pestis (Lehmann & Neumann, 1896) Migula,
>>1900 Bacillaceae CoL2006/BIO-1918-8570 is an
>>older, no longer used name for Yersinia pestis - not even mentioned
>>the entry:
>>lists only Cytophaga heparina (Payza & Korn,
>>1956) Christensen, 1980 , Sphingobacterium
>>heparinum (Payza & Korn, 1956) Takeuchi & Yokota,
>>1993 Pedobacter heparinus (Payza & Korn 1956)
>>Steyn et al., 1998 while Flavobacterium heparinum
>>Payza & Korn 1956 is not mentioned
>>Flavobacterium yabuuchiae Holmes et al., 1988 -
>>no indication of the fact that it is also considered
>>to be a synonym of Sphingobacterium spiritivorum.
>>There is absolutely no way of telling whether the
>>names listed are names recognised by the current
>>Bacteriological Code or simply ballast from the
>>past. We are currently running at about 10,000
>>names in use and 20,000 names we would prefer to forget.
>>Haloincola (2) Halomonadaceae SN2000 unverified -
>>but SN 2000 doesn't include this genus in this
>>family (which it doesn't belong to either).
>>and I only spent 15 minutes on the site.
>>I am afraid that your site isn't the only one to
>>be dogged by problems with regard to names of
>>prokaryotes, some are significantly worse. The
>>major problem is that the average user wouldn't be
>>able to distinguish the problem information from
>>correct information and this just causes
>>unnecessary confusion. This is particularly
>>worrying if one appreciates that we know all
>>names that the current Code recognises and also
>>the links between the appropriate synonyms etc.
>>At 14:04 12.06.08, Tony.Rees at csiro.au wrote:
>>>Penny, Brian,
>>>First, I was commending Parker as a worthy
>>>exercise that has not yet been surpassed, but
>>>was overdue for an update - including correcting
>>>any shortcomings of course (and yes, I also
>>>noticed the omission of the springtails).
>>>Second, the implied criticism of "uncritically
>>>just hoovering in anything on the web" is
>>>unfair. What I am actually attempting to do is
>>>to fill gaps in the available Catalogue of Life
>>>compilation from other supposedly
>>>"authoritative" lists (including some not
>>>available anywhere in electronic form, and
>>>others not yet published, courtesy of their
>>>authors), and then address some of the issues of
>>>mis-matching names, and hierarchies to a lesser
>>>degree in due course (the latter a secondary
>>>consideration). My ultimate reason for this is
>>>to have a local list (or construct a web
>>>service) that will attempt to answer, at a
>>>machine readable level and in a consistent
>>>manner, the two questions (a) is this genus /
>>>genus + species combination marine or nonmarine,
>>>and (b) is it extant or fossil (or potentially
>>>both), also correcting errors in sources as I
>>>come across them (and I can assure you that the
>>>supposed "gold standard" Catalogue of Life is by
>>>no means error free). Since the alternative
>>>appears to be to wait for the CoL to be complete
>>>(another xx years??) and even then it will not
>>>contain the habitat information that I seek, and
>>>will also miss all the fossil taxa, I feel that
>>>if there is a requirement for such a list "now",
>>>one has no practical alternative to constructing one's own...
>>>I guess I was hoping for constructive criticism
>>>rather than negativity. If the latter is a
>>>general response, it is a simple matter to
>>>disable the high level search options once more
>>>and just use the system to suit my own needs and
>>>those of my "immediate" clients (principally,
>>>OBIS and others with similar habitat-specific
>>>requirements). But maybe there are persons on
>>>the list who see *some* wisdom in this approach
>>>- note the "Interim" in the IRMNG title - when
>>>there is something better to use, I will be the first to use it.
>>>- Tony
>>>-----Original Message-----
>>>From: taxacom-bounces at mailman.nhm.ku.edu
>>>[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of B.J.Tindall
>>>Sent: Thursday, 12 June 2008 7:21 PM
>>>To: taxacom at mailman.nhm.ku.edu
>>>Subject: Re: [Taxacom] Towards a consensus
>>>higher classification of organisms (was: List of
>>>Orders of the world), misspellings, etc...
>>>...and I hope that real benchmark lists of
>>>formally registered names, like the system in use
>>>in prokaryote nomenclature won't get swamped by
>>>those sites that uncritically just hoover in anything on the web.....
>>>At 10:06 12.06.08, Penny Greenslade wrote:
>>>>I hope any such list will not follow Parker (ed.)'s "Synopsis and
>>>>Classification of Living Organisms" (publ. 1982) too closely as
>>>>that book left out a whole (abundant and widespread) Class of
>>>>organisms, the Collembola.
>>>>Penelope Greenslade
>>>>At 01:14 PM 12/06/2008 +1000, Tony.Rees at csiro.au wrote:
>>>>>(ed.)'s "Synopsis and Classification of Living Organisms" (publ.
>>>>Penelope Greenslade
>>>>Division of Botany and Zoology
>>>>Australian National University
>>>>GPO Box
>>>>Australian Capital Territory 0200
>>>>Telephone 61 (0) 2 6125 0774
>>>>Facsimile   61 (0)2 6125 5573
>>>>Taxacom mailing list
>>>>Taxacom at mailman.nhm.ku.edu
>>DSMZ-Deutsche Sammlung von Mikro-
>>organismen und Zellkulturen GmbH
>>Inhoffenstraße 7B
>>38124 Braunschweig
>>Tel. ++49 531-2616-224
>>Fax  ++49 531-2616-418
>>Director: Prof. Dr. Erko Stackebrandt
>>Local court: Braunschweig HRB 2570
>>Chairman of the management board: MR Dr. Axel Kollatschny
>>DSMZ - A member of the Leibniz Association (WGL)

DSMZ-Deutsche Sammlung von Mikro-
organismen und Zellkulturen GmbH
Inhoffenstraße 7B
38124 Braunschweig
Tel. ++49 531-2616-224
Fax  ++49 531-2616-418
Director: Prof. Dr. Erko Stackebrandt
Local court: Braunschweig HRB 2570
Chairman of the management board: MR Dr. Axel Kollatschny

DSMZ - A member of the Leibniz Association (WGL)
