names vs. "names" (was: Names for BioDiv Informatics)

B.J.Tindall bti at DSMZ.DE
Wed Feb 9 07:50:33 CST 2005

I was wondering whether I should comment on an earlier posting from Richard
Pyle, and then this comprehensive post came in. I have no quarrel with what
David wrote, but it occurs to me that if we have three levels, then end
users need to know exactly at which level they are working. This is
particularly important once one introduces the "code layer" and the
"taxonomic layer". My assumption is that in order to understand these two
levels one would need to know something about how the codes handle names
and then how taxonomies tie these names into taxonomic concepts. My
experience in bacteriology is that precious few people are taught the
principles behind taxonomy properly and you can count the experts in the
Bacteriological Code on the fingers of one hand. The result, to date, has
been an intermingling of David/Rich's levels 1, 2 and 3 in a rather chaotic
fashion in some major databases, despite the fact that the source of the
names was originally a level 2 (Code) names database. While we can correct
this now, the future depends on there being people out there properly
trained in bacterial taxonomy and the interpretation of the Bacteriological
Code. Perhaps it is a sobering thought when one considers that the vast
majority of bacteriologists working in quality control environments
(identification of organisms in the pharmaceutical and food industry)
rarely have any direct training in bacterial taxonomy or identification
when they take up their positions.
In summary, making all this information available depends (at least) on
experts providing input at levels 2 and 3, and one would need to communicate
the differences between the different levels to the end users. Rather than
replacing taxonomists and code experts this seems to imply that the
significance of these two areas would increase, simply because end users
can now access the data and are indirectly forced to deal with the topics
of taxonomy and codes.

At 12:59 8.2.2005 -0500, Dave Remsen wrote:
>I hope that any biological name infrastructure will be
>necessarily broad, encompassing the widest possible
>definition of the term “name.”  I like Rich's submission of
>“any text or symbol intended to represent organisms.”   A
>broad definition is inclusive.  It should account for
>vernacular terms, ad-hoc terms, OTUs, misspellings and
>anything else that is currently recorded or used as an
>exchange currency in reference to organisms.  From our
>library perspective this is critical.  We have no control
>over the names recorded in our information assets.  They
>come as they are with all the rough edges intact and they
>are essentially immutable.  We have to account for them
>even if they are frowned upon by some, but an accounting is
>not a blessing.  Proper attributes can eventually qualify
>these things.  Vernacular names are a primary gateway to
>biological information by the majority of humanity.  A
>well-defined vernacular concept provides the scientific
>community with an opportunity to engage them.  (We
>conservatively estimate 4 million+ vernacular names.)
>At the library/taxonomy meeting in London this past
>weekend, we made a reference to a “second taxonomic
>impediment.”  This is the lack of clear separation between
>nomenclature and taxonomy both within and between
>technological implementations.  Compilations of the two
>are based on entirely different rate-limiting steps.
>  Names are factual data objects while taxonomic concepts
>represent subjective information.  Accounting for the
>factual existence of names is not a compilation of
>taxonomic concepts, but the two are often not separated,
>with the result that the rate of compilation is
>significantly reduced.  A separate, factually-defined
>names compilation can proceed much more quickly.   This
>lack of formal separation within information systems also
>impedes collaboration and interchange.  I have seen more
>taxonomic data models in the past two years than I care to
>remember but one thing is quite clear to me and that is
>there are generally very good reasons for them.   I can’t
>imagine a single generalized taxonomic model that would
>work for everyone (although I did try once).   But today
>what this implies is that for every new model one must
>essentially rebuild the underlying nomenclatural component
>that must be a part of the system.  The same name, same
>authors, same sets of homotypic names, etc.     The same
>facts, over and over again and different efforts get
>different subsets of this same information yet none of it
>is new science and it all costs money.
>Our survival strategy within uBio has been based on a
>philosophy of making friends and seeking collaborations.
>  One way we realized early on we could do this was to
>separate our taxonomic data model from our nomenclatural
>subsystem creating essentially two new services:  a
>biological name server (called NameBank) and a taxonomic
>name server (TNS).  Now folks could disagree on the
>utility of our taxonomic model and yet still find common
>ground for sharing nomenclature.  Multiple taxonomic
>models could, in principle, share the same set of
>nomenclatural facts.  Work out issues of data
>quality/completeness assessment so that multiple parties
>can add and elevate the data quality and add an attribution
>subsystem that provides an itemized accounting to all
>contributors and we have the potential of a single
>contribution having multi-dimensional and accountable
>utility to the contributors.
>We have been advocating a layered informatics architecture
>for some time now for the simple reason that (a) it
>works and (b) it might allow us to survive.  Layered
>systems are how the internet protocols work. I like to
>point to TCP/IP and UDP/IP as two different transport
>models that share the same address protocol.  Different
>taxonomic models may address different points in the
>taxonomic concept continuum but they should certainly be
>able to share a common nomenclature.
>I see at least three separate layers.  Starting at the bottom:
>1. A generalized biological name service (NameBank or
>something like it):  Broad enough to account for all forms
>of an inclusive consensus names definition with sufficient
>disambiguating attributes.  All scientific and vernacular
>names, OTUs, objective synonymy.  There are millions of
>these.   All attributes are known facts.   Start real
>simple.  What are the minimum things we would all agree
>on?  Stop there.  Draw line.  Build that.
>2. Code Layer – Some subset of the entries in Level 1 will
>be names that would fall within the scope of the
>nomenclatural codes.  Many would not.   This service would
>add the code-related attributes to the service object
>(perhaps Linnean core) to provide a higher order taxonomic
>service layer with a vetted list of NameBank entries.
>3. Taxonomic Layer – Taxonomic data models would sit atop
>either the code layer or the underlying NameBank layer.
>This separation allows the compilation of known recorded
>names, the application of the codes, and the enterprise of
>taxonomy to proceed independently.
>That's our wish list from here.
>David Remsen
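
[Editorial illustration, not part of either message.] The three layers
described above can be sketched roughly as follows. This is a minimal
sketch only: every class, field and value here is hypothetical and is
not the schema of NameBank, TNS or any nomenclatural code. "Aus bus" and
"Aus cus" are placeholder names, and the authorships are invented for
the example. The point it tries to show is that two taxonomic models
can disagree while sharing one set of level 1 name records.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameRecord:
    """Level 1: any recorded name string, with factual attributes only."""
    name_id: int
    text: str
    kind: str  # e.g. "scientific", "vernacular", "misspelling"


@dataclass(frozen=True)
class CodeRecord:
    """Level 2: code-related attributes for names within a code's scope."""
    name_id: int
    code: str        # which nomenclatural code applies (placeholder value)
    authorship: str  # invented for this example


@dataclass
class Taxonomy:
    """Level 3: one taxonomic model; it refers to names by id only."""
    label: str
    # Maps a name_id treated as a synonym to the name_id accepted in its place.
    accepted: dict = field(default_factory=dict)

    def accepted_name(self, name_id, names):
        """Return the text of the name this taxonomy accepts for name_id."""
        return names[self.accepted.get(name_id, name_id)].text


# Level 1: one shared store of names, including a vernacular one.
names = {
    1: NameRecord(1, "Aus bus", "scientific"),
    2: NameRecord(2, "Aus cus", "scientific"),
    3: NameRecord(3, "common aus", "vernacular"),
}

# Level 2: only the subset of names that falls under a code.
code_layer = {
    1: CodeRecord(1, "hypothetical code", "Author 1900"),
    2: CodeRecord(2, "hypothetical code", "Writer 1950"),
}

# Level 3: two taxonomies that disagree but share the same name records.
lumper = Taxonomy("A", accepted={2: 1})  # treats "Aus cus" as a synonym
splitter = Taxonomy("B")                 # accepts both names

print(lumper.accepted_name(2, names))
print(splitter.accepted_name(2, names))
```

Neither taxonomy copies or modifies the name records; each one only
holds references to them, which is the kind of reuse the layered
proposal argues for.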

* Dr.B.J.Tindall      E-MAIL bti at                           *          
* DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH *
* Mascheroder Weg 1b, D-38124 Braunschweig, Germany                *
* Tel.: ++ 531 2616 0 (general)                                    *
* Tel.: ++ 531 2616 224 (direct)                                   *
* Fax:  ++ 531 2616 418                                            *
*                                                                  *
* Homepage:                          *
* E-MAIL: contact at (general enquiries)                      *
*         sales at (sales)                                    *
