names vs. "names" (was: Names for BioDiv Informatics)

Richard Pyle deepreef at BISHOPMUSEUM.ORG
Thu Feb 10 09:53:00 CST 2005

Paul van Rijckevorsel wrote:

> Multiple users means multiple entry by those users, from lots of different
> angles. It seems to me that content will be the driving force in
> determining form ("form follows function" ;-)).

I completely agree!  The idea is that there are enough databases out there
now that approach the level of function that we imagine we'd want (at each
of the three levels David defined), that we can use those exemplars to start
working on the prototype "form" (infrastructure).  The exemplar datasets are
robust and varied enough that it's likely that the infrastructure will be
able to scale to all of taxonomic nomenclature (the idealist in me starts
to reveal itself...)

> On the other hand we have been
> hearing for years about nifty packages that will allow users to
> conveniently enter data. This has led to all kinds of databases, with
> degrees of compatibility. It seems to be going on and on. What was sold
> yesterday as the ultimate answer to all databasing problems is hopelessly
> obsolete
> Any new "standard" launched seems to add just one more war between
> database makers ;-)

No arguments!!!  However, if you ask the people "on the ground" -- the ones
who have been beating their heads against this elusive promise for more than
a decade now -- I think you'll find that most of them feel that this time,
we really are on the cusp of something big, something different.  Some of
the factors that are different now include:

- More ubiquitous broadband access to the internet
- A trend towards open-source solutions, rather than proprietary ones
- International standards that seem to be, or likely soon will be, catching
fire (e.g., ABCD/BioCase, DarwinCore/DiGIR, TCS)
- International collaboration of unprecedented scales (GBIF)
- Crossing of a threshold in price/performance of computer hardware
- The existence of a functioning digitally-based system for bacterial
nomenclature
- An explicit interest in "going digital" by the custodians of at least one
of the other "big" Codes of nomenclature (watch this space)
- Achieving a critical mass of practicing taxonomists who belong to the
"computer-comfortable" generation

I could probably go on; but the point is that, while I can understand why
this may seem like the "same old, same old" to an outside observer, the
"chatter" among the bioinformatics (sensu lato) "insiders" is reaching a

> > One of the wonderful things about electronic information technology is
> > that once one person puts all that effort into developing the
> > infrastructure, it's available for everyone else to use (assuming an
> > open-source paradigm). Once one taxonomist identifies the original
> > description of a taxon name, and creates a complete bibliographic index of
> > all subsequent uses of that name, no one else ever has to repeat the same
> > work.
> ***
> To quite a large extent, the latter is true of a good monograph, as well.

Agreed -- except that the cost of distributing physical copies of a
monograph onto the desktop of every taxonomist who needs access to it is
VASTLY greater than the cost of downloading a PDF via the internet.  The
amount of time it takes to leaf through the pages (or even to scan through
the index) of a paper monograph to locate a particular piece of information
is much greater than the time it takes to electronically search a document
for a word or phrase.  Though neither of these examples is database-driven
per se, they are part of the broader "electronic information technology"
that I was referencing in the quoted paragraph above.

Where the database fits in is as an index.  No one would dispute the value
of a traditional index of a book for its role in increasing the efficiency
of accessing information contained in the book.  The promise of electronic
information technology and the sorts of databases that we are discussing is
that they can serve the function of an index -- not of a single published
work, or even a series of volumes; but potentially as an index to EVERY
piece of biological information that ever has been, or ever will be,
documented in any form.  That dream is, of course, a long way off -- but the
foundation for it is being built right now, and is the focus of these

> > Bill Eschmeyer devoted several decades of his professional life to
> > creating the electronic Catalog of Fishes, and the ichthyological world is
> > a MUCH, MUCH better place because of it.
> ***
> Obviously invaluable, but likely content-driven in design?

Yes, and that content is one of the "exemplars" (in my mind, at least) upon
which the broader infrastructure should be based.

> ***
> Actually it was the blithe treatment of "vernacular names" that got to me.
> This is very well if there are standardized common names, as for the birds
> of NA, or even common names (in widespread use), but vernacular names as
> such are a terrible morass. Actually the 4 million vernacular names does
> not sound like a bad estimate for Angiosperms alone. At 250k-400k Angiosperm
> species, that is an average of 10-16 vernacular names per species. A major
> plant species will easily have hundreds, if not thousands, of vernacular
> names.

I think the point is that the development of an index of vernacular names
need not be seen as a distraction from (or competition to) the development
of a "Level-2" sort of subset index that focuses on Linnean-domain names.
Indeed, I think the slice of pie devoted to the latter is far greater than
what is devoted to the former (as a taxonomist, I'm happy with that fact).
However, as David pointed out, the vernacular names cannot be ignored,
because they serve as a gateway to biological information for a huge number
of people (the sort of people who vote for the politicians that hold the
purse-strings to government funding of science).  Also (putting on my
information-manager's hat), Linnaean names are fundamentally a
highly-refined subset of the broader scope of "text or symbols intended to
represent organisms" (sort of like how science in general can be thought of
as a highly-refined subset of philosophy).

In any case, the development of an information management infrastructure for
dealing with Linnean-domain names need not be bogged down by concerns about
vernacular names, but should be developed with an awareness of how it will
integrate with a system designed to accommodate a broader scope of "text or
symbols intended to represent organisms".
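
To make the subset idea concrete, here is a minimal sketch in Python.  It is
only an illustration under stated assumptions: the names NameKind, NameString
and linnaean_subset are hypothetical and do not come from uBio, TCS or any
other existing system.  It shows one general record type for any "text or
symbols intended to represent organisms", with Linnaean-domain names picked
out as a filtered subset rather than stored in a separate structure.

```python
from dataclasses import dataclass
from enum import Enum


class NameKind(Enum):
    """Hypothetical classification of a name string."""
    LINNAEAN = "linnaean"      # names governed by a Code of nomenclature
    VERNACULAR = "vernacular"  # common names, in any language


@dataclass(frozen=True)
class NameString:
    """Any text or symbol intended to represent an organism (assumed model)."""
    text: str
    kind: NameKind


def linnaean_subset(names):
    """Return only the Linnaean-domain names from a broader collection."""
    return [n for n in names if n.kind is NameKind.LINNAEAN]


# Illustrative records only; a real index would hold far more detail.
names = [
    NameString("Homo sapiens", NameKind.LINNAEAN),
    NameString("human", NameKind.VERNACULAR),
]

level2 = linnaean_subset(names)
```

Under this design the "Level-2" index is a view over the broader one, so work
on vernacular names can proceed later without restructuring the Linnaean
records.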

> Nobody should think of going there, not without making a well-crafted plan
> first (and finding lots of resources).

I think what the uBio folks (as well as a number of others) are working on
right now is the "well-crafted plan" part, without really "stealing" lots of
resources from the "Level-2" camp.

Good lord!  Look at the time...

