names vs. "names" (was: Names for BioDiv Informatics)

Richard Pyle deepreef at BISHOPMUSEUM.ORG
Wed Feb 9 08:37:31 CST 2005

> Actually, this very strongly reminds me of an attitude all too often found
> among the compilers of databases.
> "Yes, we know this name will be erroneous. Yes, we realize that including it
> in the database will perpetuate and propagate the error. However, if it was
> once used in a book (even if in clear error) we are including it. We realize
> this means the world will go to hell in a handbasket, but we don't care. As
> long as our database is 'complete' we are happy."

Evidently you know a completely different set of database compilers than I
know.  Either that, or you miss the point.

By your logic, the World Wide Web is causing the world to go to "hell in a
handbasket", because a certain fraction of the information displayed on web
pages is erroneous.

The reason most people feel that the overall benefits of the internet and
the correct information that it gives us access to outweigh the costs of the
bogus information, is that most people have little difficulty separating the

In the paradigm that David Remsen described, the difference between a "good"
and a "bad" name will be VASTLY more obvious than the difference between a
reliable web page and a hoax.  This is because the presentation of the names
will always be within its relevant context, with all the necessary caveats
and corrections.

You insinuate that database compilers promote the perpetuation and
propagation of errors.  The exact opposite is true. Unlike the case with
bogus web sites, the goal of the name indexers is to ELIMINATE the
perpetuation and propagation of errors.  Once someone (anyone) has revealed
the erroneous nature of a name, that information gets attached to the name,
and is presented in all its glory to anyone else who encounters that name.

The reason why it is ESSENTIAL to capture every string of characters that
has been used in a book to represent a name of an organism (even if in clear
error), is that there is potentially useful biological information in that
book, associated with that erroneous name.  The strategy is to develop a
thesaurus such that everyone can see that "When Smith used the bogus name
'xyzpdq', he used it to represent a taxonomic concept that is congruent with
the taxonomic concept well-defined by Jones, who represented it by the name
'Aus bus'."  Without establishing this thesaural connection between "Aus
bus" SEC Jones and "xyzpdq" SEC Smith, a future researcher looking for
information on "Aus bus" would likely never realize that Smith, though
deficient in nomenclatural prowess, nevertheless published critical
information about the taxon.
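
To make the thesaurus idea concrete, here is a minimal sketch in Python.  It
is only an illustration under my own assumptions: the names NameUsage and
NameThesaurus, and the way congruence and notes are stored, are hypothetical
and are not taken from any existing indexing system.  Each name is recorded
together with who used it (the "SEC" part), congruent usages are grouped
under one concept, and a correction is attached once to the usage it
concerns.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class NameUsage:
    """Hypothetical record: a name string exactly as one author used it."""
    name: str
    author: str

    def __str__(self):
        return f'"{self.name}" SEC {self.author}'


class NameThesaurus:
    """Hypothetical index grouping usages that denote congruent concepts."""

    def __init__(self):
        self._ids = count()      # source of fresh concept ids
        self._concept_of = {}    # NameUsage -> concept id
        self._members = {}       # concept id -> set of NameUsage
        self._notes = {}         # NameUsage -> list of annotations

    def add(self, usage):
        """Register a usage (erroneous or not) and return its concept id."""
        if usage not in self._concept_of:
            cid = next(self._ids)
            self._concept_of[usage] = cid
            self._members[cid] = {usage}
            self._notes[usage] = []
        return self._concept_of[usage]

    def mark_congruent(self, a, b):
        """Record that two usages represent congruent taxonomic concepts."""
        keep, drop = self.add(a), self.add(b)
        if keep == drop:
            return
        # Merge the second concept group into the first.
        for usage in self._members.pop(drop):
            self._concept_of[usage] = keep
            self._members[keep].add(usage)

    def annotate(self, usage, note):
        """Attach a caveat or correction to one usage; it is stored once."""
        self.add(usage)
        self._notes[usage].append(note)

    def lookup(self, name):
        """Return every usage congruent with any usage of `name`, with notes."""
        cids = {cid for u, cid in self._concept_of.items() if u.name == name}
        found = [u for cid in cids for u in self._members[cid]]
        return sorted(((u, self._notes[u]) for u in found),
                      key=lambda pair: str(pair[0]))


jones = NameUsage("Aus bus", "Jones")
smith = NameUsage("xyzpdq", "Smith")

index = NameThesaurus()
index.mark_congruent(jones, smith)
index.annotate(smith, "erroneous name; concept congruent with Aus bus SEC Jones")

for usage, notes in index.lookup("Aus bus"):
    print(usage, notes)
```

A search on "Aus bus" then also surfaces Smith's usage, caveat included, so
the information Smith published under the bogus name is not lost.  Leaving
"xyzpdq" out of the index would make that link impossible to record.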

The goal of organism name indexing is NOT simply to accumulate the largest
collection of names.  The goal is to stop the perpetuation and propagation
of errors.  Without indexing the bogus names and identifying them as such,
the future world is at risk of more perpetuation and propagation of such.

> This "first layer" is the step where relevant information is excluded.
> If too much information is excluded (i.e. just the minimum is recorded) the
> database will have to be thrown out as error-riddled and unusable once it is
> complete (but it won't be thrown out, it will be there forever as an excuse
> never to do it right). I am beginning to despair of databases (unless built
> from a solid taxonomic basis).

I think that your despair is a result of the "taxonomists' blinders" that I
alluded to in an earlier post.  The universe of biological information is
not owned by, nor is it the exclusive domain of, taxonomists.  It is our job as
taxonomists to assist the non-taxonomic world in accessing information about
organisms by providing them with (mostly) unique identifiers (names), so that
they can communicate with greater efficiency. I believe that the next step
in fulfilling this job is to sort out the wheat from the chaff in
nomenclature, and do it in a way that any given correction needs to be made
only ONCE, and thenceforth forever known by all future users of biological


