names & numbers

Richard Pyle deepreef at BISHOPMUSEUM.ORG
Sun Oct 14 00:40:05 CDT 2001

Hi Jim,

Thanks for chiming in!

> What the heck is a 'type without a name'?

Ha! I knew I could count on you to catch that one!  I was beginning to worry
for a minute there that I might have been too obtuse ...

> A transient fantasy of a faceless taxonomist?   Of course, ALL my
> collections are types just waiting to happen...  :)

Actually, although I was trying to be ironic in my original post, it just
occurred to me that there actually *are* types without names out there on
Museum shelves (or, I should say, specimens whose labels designate them as
types of names that never got published). But that's an entirely separate

> >Overall, I think it is necessary to regard a "name" as a distinct entity,
> >and the type as an attribute (a biologically defining attribute) of that
> >entity.
> If you model the universe that way you will come unstuck, big time.  They
> are both entities with their own definable attributes and they have a
> relationship to each other.  In fact a type is nothing more than a
> specimen with one particular attribute set - you could model it as a
> subentity of the specimen entity if you like...

I think this comes down to a simple miscommunication deal.  The original
version of this message (which you got directly, Jim) had an elaboration of
that miscommunication, but since Taxacom refused that version of this post
as too long, I've excised it from this version.

> This particular thread has come up several times on Taxacom in the past
> and much the same things are said and it NEVER resolves itself.  Why?

That's a question I keep hoping that someone would answer.

> For a number of reasons, I suspect: general misunderstanding of the
> relationships between names, types, taxa and taxonomic concepts;
> freedom-loving anarchists vs. anally retentive control freaks; evil
> doctrinaire centralist oligarchy with their big data sets of global
> hegemony vs. uncontrollable decentralist rabble with their individual
> uncoordinated datasets that are of damn all use to anyone but themselves;
> my taxonomy vs. anyone else who dares to question it; etc., etc...

That's what I would have figured as well, but where are they?  I keep hearing
about all these staunch opponents to the notion of assigning universal
serial numbers to taxon names, but so far this thread hasn't seemed to have
shaken them out. Eric Dunbar even posed the question point-blank, and I'm
still waiting for a response.

> On top of that is another ugly issue that also has an unpleasant habit of
> recurring on Taxacom - what exactly constitutes a species or taxon.  The
> problem here is that they can be circumscribed in one or more of three
> not entirely compatible and not entirely mutually exclusive ways:

Yes, but that can of worms can be dealt with separately.  I've been
carefully trying in my posts on this thread to distinguish between names and
circumscriptions.  The names are objective and clearly (for the most part,
anyway) defined within the context of the IC_N codes.  If we can't all come
to terms on settling *that* one, we're NEVER going to get very far building
consensus on how to manage circumscriptions (taxon definitions).  In my
(perhaps atypical) world view on this, a taxon NAME is defined by its
primary type, and nothing else.  A TAXON is defined by its circumscription,
which can include details about morphology, relationships, etc., and
essentially serves as an assertion about where to draw the boundaries
between the kin of the primary type, and the kin of other primary types
that are regarded as belonging to separate species.  An "original
description" provides both, while subsequent assertions only play with the
latter (unless you regard new combinations of Genus + species as new names
in themselves....but that's yet another topic for yet another thread).

> So... what to do?
> Numbers and codes?  Don't go there...  Humans use names for a reason -
> communication.

And my point has been (and continues to be) that the numbers are mainly for
the computers only.  I'm not talking about *replacing* names with numbers
(I'd argue vehemently against that sort of proposal). And unlike others, I'm
not even arguing that the numbers would ever be exposed to human eyes
outside of a computer programmer's office. And even then, there would seldom
be any need for a programmer to ever actually *type* one of these numbers
using a keyboard -- they would remain more or less permanently imprisoned
within an electronic realm.

I say: let's hide the numbers from the humans; let the humans go on using
the names the way they have been using them quite successfully for the past
two and a half centuries.  But for crying out loud, let us computer nerds
find a way to come to agreement on how we make our database programs
reference any particular name, so that our data can be shared effortlessly
via electronic conduits in the cooperative spirit of what science was meant
to be.
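A minimal sketch of what "hiding the numbers" could look like, in Python. Everything here is invented for illustration: the placeholder names "Aus bus" and "Aus cus", the integer keys, and the catalog number are assumptions, not values from any real database.

```python
# Hypothetical name registry. The integer keys are arbitrary and never
# shown to the user; only the name strings are.
NAME_BY_ID = {
    1001: "Aus bus",
    1002: "Aus cus",
}
ID_BY_NAME = {name: key for key, name in NAME_BY_ID.items()}


def to_id(typed_name: str) -> int:
    """Translate what a human typed into the internal identifier."""
    return ID_BY_NAME[typed_name.strip()]


def to_name(name_id: int) -> str:
    """Translate an internal identifier back into a name for display."""
    return NAME_BY_ID[name_id]


# Records store only the identifier, never the name string itself.
specimens = [{"catalog": "CAT-0001", "name_id": to_id("Aus bus")}]


def label(specimen: dict) -> str:
    """Build the human-readable label; the number stays out of sight."""
    return f"{specimen['catalog']}: {to_name(specimen['name_id'])}"


print(label(specimens[0]))
```

The person at the keyboard types and reads names only; the key appears solely inside the stored record.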

> It is easy enough to design a database application that handles a single
> agreed taxonomy - we do it all the time - and even to handle quite
> complex synonymies, but we seem to fall in a heap when it comes to
> dealing with all but the very simplest of alternative taxonomies.

It's actually not that hard to design a system that can track EVERY taxonomy
ever asserted by any taxonomist (or non-taxonomist, for that matter).  In
fact, it's amazingly simple, once you separate names from circumscriptions.
You and I have already had this conversation (If I recall, it took us about
10 long back-and-forth messages to come to the realization that we were both
arguing precisely the same point.) What's not so easy is getting the world of
taxonomists (or even the more limited world of taxonomy data buffs) to get
together and make it happen, so that the legions of folks who are doing the
real work (sniffing through all the original descriptions and subsequent
circumscription assertions) can dump the fruits of their efforts into a
common data pool.

> An approach we are trying here is to record taxonomies and synonymy as
> they are published, by each published reference.  This means that the
> compilers of the database do not have to venture an opinion - they are
> recording, as a simple matter of fact, what one particular taxonomist
> said or implied at one particular time.

I should forward you the enormous volume of email between myself and Bill
Eschmeyer a couple of years ago on PRECISELY this issue.  You think my posts
to Taxacom are long - you ain't seen NOTHING yet! :-)

To do it right, you only need two separate entities:  NAMES, and ASSERTIONS
(=circumscriptions).  Everything you enter into the database is OBJECTIVE -
no opinions by the data gatherers required.  If ITIS or some other agency
wants to select one particular taxonomy, then they need only designate one
assertion record for each name record to serve as the "current" or "correct"
circumscription of that name. From a data concept perspective, it's pretty
straightforward.  The real work is in trudging through all those
publications to capture all those taxonomic assertions in electronic form.
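The two-entity idea can be sketched roughly as follows, in Python. This is only an illustration under simplifying assumptions: a circumscription is reduced to a single "treated_as" field (which name the source regards as valid), and all names, sources, years, and keys are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    name_id: int        # arbitrary surrogate key
    text: str           # the name as published
    primary_type: str   # the specimen that defines the name


@dataclass(frozen=True)
class Assertion:
    assertion_id: int
    name_id: int        # the name this assertion is about
    source: str         # who said it
    year: int           # when they said it
    treated_as: int     # name_id the source regards as the valid name


# Objective records: names, and what each source said about them.
names = {
    1: Name(1, "Aus bus", "hypothetical holotype A"),
    2: Name(2, "Aus cus", "hypothetical holotype B"),
}
assertions = [
    Assertion(10, 1, "Smith 1900", 1900, treated_as=1),
    Assertion(11, 2, "Smith 1900", 1900, treated_as=2),
    Assertion(12, 2, "Jones 1950", 1950, treated_as=1),  # cus sunk into bus
]

# An agency's chosen taxonomy: exactly one assertion designated per name.
current = {1: 10, 2: 12}


def current_valid_name(name_id: int) -> str:
    """Return the valid name for name_id under the designated taxonomy."""
    chosen = next(a for a in assertions if a.assertion_id == current[name_id])
    return names[chosen.treated_as].text


print(current_valid_name(2))
```

Switching to a different taxonomy means changing the entries in `current`; the name and assertion records themselves are never edited.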

Incidentally, you need not restrict it to just "published" references.  Any
identification of a Museum specimen can serve as an assertion, just as any
unpublished pers. comm. from an expert.

> No one can argue with this, other than the fact that we may have entered
> the data wrongly - we are simply trying to record the history of
> taxonomic concepts surrounding a particular name.

EXACTLY!  Jim, if you didn't have that scruffy beard of yours, I'd want to
kiss you! :)

> Another downer is that it is not what most users or clients want.  They
> want to be told 'what is truth' and they do not want it qualified
> with any ifs or buts.

There's nothing stopping you or any other organization or person from
selecting one of the historical assertions and calling it "current". This
way you get the best of both worlds -- you capture the objective data, and
then as a separate meta-layer you select among those historical data.
Ultimately, this simply becomes itself yet another assertion.

> To handle this we have had to indicate that a
> particular taxonomy has been 'accepted', in whole or in part, for our
> purposes, in effect making the database itself a reference to be
> cited...  at this point things start to get a little complicated...

Not really that much more complicated -- if you don't want to deal with a
separate "accepted assertions" layer, then just add one more assertion
record (attributed to yourself) among the heaps of others, and then offer
the client only your own most recent assertions.
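A rough Python sketch of that second option, where the database's own opinion is just one more assertion record. The authors, years, placeholder names, and the "Our database" label are all assumptions made up for the example.

```python
from typing import List, NamedTuple, Optional


class Assertion(NamedTuple):
    name: str        # the name being treated
    treated_as: str  # the name the author regards as valid for it
    author: str
    year: int


# Historical assertions plus the database's own, all in one heap.
HISTORY = [
    Assertion("Aus cus", "Aus cus", "Smith", 1900),
    Assertion("Aus cus", "Aus bus", "Jones", 1950),
    Assertion("Aus cus", "Aus cus", "Our database", 1999),
    Assertion("Aus cus", "Aus bus", "Our database", 2001),
]


def accepted(name: str, history: List[Assertion],
             us: str = "Our database") -> Optional[str]:
    """Return our most recent treatment of a name, or None if we have none."""
    ours = [a for a in history if a.name == name and a.author == us]
    if not ours:
        return None
    return max(ours, key=lambda a: a.year).treated_as


print(accepted("Aus cus", HISTORY))
```

The client sees a single answer, while the earlier assertions (including the database's own superseded ones) stay on record.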

> Summary; taxon codes and numbers - don't do it!

How can we see the solution in so precisely the same way, but diverge on
this one point?  Methinks we're stuck in a rut of miscommunication again.
Let me wrap it up in two simple questions:

1) Does your database assign an arbitrary surrogate primary key to each
taxon name entity, for use as an internal constant and unique identifier for
that name?

2) Would the world not be a better place if my database, and the Species
2000 database, and the ITIS database, and the Index Kewensis database, and
every other taxon name database out there used the same set of arbitrary
surrogate primary keys for the same taxon names?
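To illustrate what question 2 would buy, here is a small Python sketch of two hypothetical databases that happen to use the same arbitrary keys for the same names. The keys, field names, and values are invented; no real database's schema is implied.

```python
# Two hypothetical databases keyed by the same shared name identifiers.
museum_records = {
    5001: {"specimens": 12},
    5002: {"specimens": 3},
}
checklist_records = {
    5001: {"region": "Region A"},
    5003: {"region": "Region B"},
}


def merge_by_key(*databases: dict) -> dict:
    """Combine per-name records from several sources using the shared key."""
    merged: dict = {}
    for db in databases:
        for key, fields in db.items():
            # No string matching on names is needed: same key, same name.
            merged.setdefault(key, {}).update(fields)
    return merged


combined = merge_by_key(museum_records, checklist_records)
print(combined[5001])
```

Without shared keys, the same merge would have to match on name strings, with all the spelling and authorship variants that entails.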

What am I missing?

