names vs. "names" (was: Names for BioDiv Informatics)

Dave Remsen dremsen at MBL.EDU
Tue Feb 8 12:59:16 CST 2005


I hope that any biological name infrastructure will be
necessarily broad, encompassing the widest possible
definition of the term “name.”  I like Rich's submission of
“any text or symbol intended to represent organisms.”   A
broad definition is inclusive.  It should account for
vernacular terms, ad-hoc terms, OTUs, misspellings and
anything else that is currently recorded or used as an
exchange currency in reference to organisms.  From our
library perspective this is critical.  We have no control
over the names recorded in our information assets.  They
come as they are with all the rough edges intact and they
are essentially immutable.  We have to account for them
even if they are frowned upon by some, but an accounting is
not a blessing.  Proper attributes can eventually qualify
these things.  Vernacular names are a primary gateway to
biological information for the majority of humanity.  A
well-defined vernacular concept provides the scientific
community with an opportunity to engage them.  (We
conservatively estimate 4 million+ vernacular names.)

At the library/taxonomy meeting in London this past
weekend, we made a reference to a “second taxonomic
impediment.”  This is the lack of clear separation between
nomenclature and taxonomy both within and between
technological implementations.  Compilations of the two
are based on entirely different rate-limiting steps.
  Names are factual data objects while taxonomic concepts
represent subjective information.  Accounting for the
factual existence of names is not a compilation of
taxonomic concepts, but the two are often not separated,
with the result that the rate of compilation is
significantly reduced.  A separate, factually-defined
names compilation can proceed much more quickly.   This
lack of formal separation within information systems also
impedes collaboration and interchange.  I have seen more
taxonomic data models in the past two years than I care to
remember but one thing is quite clear to me and that is
there are generally very good reasons for them.   I can’t
imagine a single generalized taxonomic model that would
work for everyone (although I did try once).   But today,
what this implies is that for every new model one must
essentially rebuild the underlying nomenclatural component
that must be a part of the system:  the same names, same
authors, same sets of homotypic names, etc.  The same
facts, over and over again, and different efforts get
different subsets of this same information, yet none of it
is new science and it all costs money.

Our survival strategy within uBio has been based on a
philosophy of making friends and seeking collaborations.
  One way we realized early on we could do this was to
separate our taxonomic data model from our nomenclatural
subsystem creating essentially two new services:  a
biological name server (called NameBank) and a taxonomic
name server (TNS).  Now folks could disagree on the
utility of our taxonomic model and yet still find common
ground for sharing nomenclature.  Multiple taxonomic
models could, in principle, share the same set of
nomenclatural facts.  Work out issues of data
quality/completeness assessment so that multiple parties
can add to and improve the data, add an attribution
subsystem that provides an itemized accounting to all
contributors, and a single contribution has the potential
for multi-dimensional and accountable utility to its
contributors.
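
To make the attribution idea a bit more concrete, here is a
minimal sketch in Python.  It is not how NameBank does it;
the class, the method names, the contributor labels and the
placeholder name "Aus bus" are all made up for illustration.
It only shows that a shared name store can keep an itemized
record of who supplied what.

```python
from collections import defaultdict


class AttributedNameStore:
    """Hypothetical shared name store that remembers its contributors."""

    def __init__(self):
        # name text -> contributors, in the order they supplied the name
        self._contributors = defaultdict(list)

    def contribute(self, name, contributor):
        """Record a contribution; return True if the name was new."""
        is_new = name not in self._contributors
        if contributor not in self._contributors[name]:
            self._contributors[name].append(contributor)
        return is_new

    def statement(self, contributor):
        """Itemized accounting for one contributor."""
        lists = self._contributors.values()
        return {
            # names this contributor was the first to supply
            "first_recorded": sum(1 for c in lists if c[0] == contributor),
            # names already present that this contributor also supplied
            "confirmed": sum(1 for c in lists if contributor in c[1:]),
        }


store = AttributedNameStore()
store.contribute("Aus bus", "project_a")   # new name
store.contribute("Aus bus", "project_b")   # same fact, second source
store.contribute("Aus cus", "project_b")   # new name

print(store.statement("project_a"))
print(store.statement("project_b"))
```

The same fact is stored once, but every party that supplied
it still gets credit in its own statement.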

We have been advocating a layered informatics architecture
for some time now for the simple reason that (a) it
works and (b) it might allow us to survive.  Layered
systems are how the Internet protocols work.  I like to
point to TCP/IP and UDP/IP as two different transport
models that share the same address protocol.  Different
taxonomic models may address different points in the
taxonomic concept continuum but they should certainly be
able to share a common nomenclature.

I see at least three separate layers, starting at the
bottom:

1. A generalized biological name service (NameBank or
something like it):  Broad enough to account for all forms
of an inclusive consensus names definition with sufficient
disambiguating attributes.  All scientific and vernacular
names, OTUs, objective synonymy.  There are millions of
these.   All attributes are known facts.   Start real
simple.  What are the minimum things we would all agree
on?  Stop there.  Draw line.  Build that.

2. Code Layer – Some subset of the entries in Layer 1 will
be names that would fall within the scope of the
nomenclatural codes.  Many would not.   This service would
add the code-related attributes to the service object
(perhaps Linnean core) to provide a higher order taxonomic
service layer with a vetted list of NameBank entries.

3. Taxonomic Layer – Taxonomic data models would sit atop
either the code layer or the underlying NameBank layer.

This separation allows the compilation of known recorded
names, the application of the codes, and the enterprise of
taxonomy to proceed independently.
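
The three layers above can be sketched roughly as follows in
Python.  This is only an illustration under my own
assumptions, not the NameBank or TNS implementation:  every
class, field and function name here is hypothetical, and the
example data is deliberately tiny.  The point it shows is
that two taxonomic models can disagree while pointing at the
same name records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NameRecord:
    """Layer 1: a recorded name string with purely factual attributes."""
    name_id: int
    text: str
    kind: str  # "scientific", "vernacular", "otu", "misspelling", ...


@dataclass(frozen=True)
class CodeRecord:
    """Layer 2: code-related attributes for a Layer 1 entry."""
    name_id: int
    code: str
    authorship: str


@dataclass
class TaxonConcept:
    """Layer 3: one taxonomic opinion; it refers to names only by id."""
    accepted_id: int
    synonym_ids: List[int] = field(default_factory=list)


class NameService:
    """Layer 1 store. It records names and makes no taxonomic judgment."""

    def __init__(self) -> None:
        self._records: Dict[int, NameRecord] = {}
        self._index: Dict[Tuple[str, str], int] = {}

    def register(self, text: str, kind: str) -> NameRecord:
        """Add a name once; registering it again returns the same record."""
        key = (text, kind)
        if key not in self._index:
            name_id = len(self._records) + 1
            self._index[key] = name_id
            self._records[name_id] = NameRecord(name_id, text, kind)
        return self._records[self._index[key]]

    def get(self, name_id: int) -> NameRecord:
        return self._records[name_id]


class CodeLayer:
    """Layer 2 store: the vetted subset of names that fall under a code."""

    def __init__(self, names: NameService) -> None:
        self._names = names
        self._records: Dict[int, CodeRecord] = {}

    def vet(self, name_id: int, code: str, authorship: str) -> CodeRecord:
        self._names.get(name_id)  # raises KeyError if Layer 1 lacks the name
        record = CodeRecord(name_id, code, authorship)
        self._records[name_id] = record
        return record

    def lookup(self, name_id: int) -> Optional[CodeRecord]:
        return self._records.get(name_id)


def names_used(concept: TaxonConcept, service: NameService) -> List[str]:
    """Resolve a concept's ids back to the shared Layer 1 name strings."""
    ids = [concept.accepted_id, *concept.synonym_ids]
    return [service.get(i).text for i in ids]


# Layer 1: record the names as found, vernaculars included.
names = NameService()
puma = names.register("Puma concolor", "scientific")
felis = names.register("Felis concolor", "scientific")
cougar = names.register("cougar", "vernacular")

# Layer 2: only the code-governed names get code attributes.
codes = CodeLayer(names)
codes.vet(felis.name_id, "ICZN", "Linnaeus, 1771")
codes.vet(puma.name_id, "ICZN", "(Linnaeus, 1771)")

# Layer 3: two models that disagree, built on the same name records.
model_a = TaxonConcept(accepted_id=puma.name_id, synonym_ids=[felis.name_id])
model_b = TaxonConcept(accepted_id=felis.name_id, synonym_ids=[puma.name_id])

print(names_used(model_a, names))
print(names_used(model_b, names))
print(codes.lookup(cougar.name_id))  # the vernacular has no code record
```

Neither model stores a name string of its own.  Swap in a
different taxonomic model and Layers 1 and 2 do not change.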

That's our wish list from here.

David Remsen



