names & numbers

Richard Pyle deepreef at BISHOPMUSEUM.ORG
Tue Oct 16 07:34:36 CDT 2001


I said:

> >To do it right, you only need two separate entities:  NAMES, and
> >ASSERTIONS (=circumscriptions).

[etc...]

To which Doug replied:

> I don't think this claim applies to higher classification, however:
> most people have databases designed in a strict "parent:child"
> hierarchy. If taxon X belongs to subfamily Y of family Z in one
> classification, and to tribe A of subfamily B of family C in another,
> you have VERY different parent:child links all leading down to the
> same terminal taxon.

Yes, and this is very easy to track with the system I use.  I'll elaborate
in greater detail to you in a separate email off-list, so as not to bore the
non-computer nerds (anyone who is interested in getting a copy, please let
me know).

> If anything, this sort of variant supra-generic
> classification is probably *more* common (in terms of numbers of
> species involved) than disagreement about species and genus placement.
> Maybe others here will see a simple workaround to accommodate it, but
> I don't - it's an extremely complex thing to do.

It's not hard at all -- you just need a list of "NAMES", and a
(comprehensive) list of "ASSERTIONS" by experts about those names.  Things
like proper spelling, Parent, etc. go in the ASSERTIONS table.  So for each
NAME, you can have as few as one ASSERTION record (the original
description), or as many records as there have been assertions made by
various experts about the status of that name following its original
description.  To map your example, the NAMES table would have the following
records:

Name    Rank
----------------
X       Genus
Y       Subfamily
Z       Family
A       Tribe
B       Subfamily
C       Family

You'd also have a REFERENCE table:

Citation
--------------
Smith, 1999
Jones, Pers. Comm., 2001

And the ASSERTIONS table would have the following:

ID      Citation                Name    Parent
------------------------------------
1       Smith, 1999             X       Y
2       Smith, 1999             Y       Z
3       Jones, 2001             X       A
4       Jones, 2001             A       B
5       Jones, 2001             B       C

So, by moving the hierarchy details to the ASSERTIONS table, you've just
tracked two separate classifications, and of course, you're not limited in
the number of classifications that you can track using this method.
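
A minimal sketch of the sample data above, assuming the ASSERTIONS rows are
held as plain Python tuples. The function name "lineage" and the cycle guard
are illustrative assumptions, not part of the system described here. It
walks the Parent links asserted by one citation only:

```python
# Sample ASSERTIONS rows from the table above: (ID, Citation, Name, Parent).
ASSERTIONS = [
    (1, "Smith, 1999", "X", "Y"),
    (2, "Smith, 1999", "Y", "Z"),
    (3, "Jones, 2001", "X", "A"),
    (4, "Jones, 2001", "A", "B"),
    (5, "Jones, 2001", "B", "C"),
]


def lineage(name, citation):
    """Return [name, parent, grandparent, ...] as asserted by one citation.

    Hypothetical helper: only the assertions made in `citation` are used.
    """
    parent_of = {
        child: parent
        for _, cit, child, parent in ASSERTIONS
        if cit == citation
    }
    chain = [name]
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in chain:  # guard against accidental cycles in the data
            break
        chain.append(parent)
    return chain


print(lineage("X", "Smith, 1999"))  # ['X', 'Y', 'Z']
print(lineage("X", "Jones, 2001"))  # ['X', 'A', 'B', 'C']
```

The same terminal name X resolves to two different hierarchies, depending
only on which citation's assertions are followed.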

I've shown the sample data in simple form.  Bringing this back to the
original point of this thread, you would, of course, substitute a unique and
constant "TaxonNameID" number for each "Name", which would serve as the link
to the ASSERTIONS table.  And, you'd have a similar ID field for the
REFERENCE table as well.
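
One way the ID-based version might look, sketched with Python's built-in
sqlite3 module. Only "TaxonNameID" comes from the description above; the
other column names (TaxonRank, ReferenceID, AssertionID, ParentID), the ID
values, and the helper "parents_of" are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Names (
    TaxonNameID INTEGER PRIMARY KEY,   -- unique, constant ID for each name
    Name        TEXT NOT NULL,
    TaxonRank   TEXT NOT NULL
);
CREATE TABLE Reference (
    ReferenceID INTEGER PRIMARY KEY,
    Citation    TEXT NOT NULL
);
CREATE TABLE Assertions (
    AssertionID INTEGER PRIMARY KEY,
    ReferenceID INTEGER NOT NULL REFERENCES Reference(ReferenceID),
    TaxonNameID INTEGER NOT NULL REFERENCES Names(TaxonNameID),
    ParentID    INTEGER REFERENCES Names(TaxonNameID)
);
""")

conn.executemany("INSERT INTO Names VALUES (?, ?, ?)", [
    (1, "X", "Genus"), (2, "Y", "Subfamily"), (3, "Z", "Family"),
    (4, "A", "Tribe"), (5, "B", "Subfamily"), (6, "C", "Family"),
])
conn.executemany("INSERT INTO Reference VALUES (?, ?)", [
    (1, "Smith, 1999"), (2, "Jones, Pers. Comm., 2001"),
])
conn.executemany("INSERT INTO Assertions VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 2), (2, 1, 2, 3),                 # Smith: X in Y, Y in Z
    (3, 2, 1, 4), (4, 2, 4, 5), (5, 2, 5, 6),   # Jones: X in A, A in B, B in C
])


def parents_of(name):
    """List every (Citation, Parent name) pair ever asserted for `name`."""
    return conn.execute("""
        SELECT r.Citation, p.Name
        FROM Assertions a
        JOIN Names n     ON n.TaxonNameID = a.TaxonNameID
        JOIN Names p     ON p.TaxonNameID = a.ParentID
        JOIN Reference r ON r.ReferenceID = a.ReferenceID
        WHERE n.Name = ?
        ORDER BY a.AssertionID
    """, (name,)).fetchall()


print(parents_of("X"))
```

Because the Parent link lives in Assertions rather than in Names, the row
for X in Names never changes, however many placements are recorded for it.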

You can use this same pair of tables (well...three tables if you count the
Reference table) to track synonymies as well.  Just imagine the additional
field "ValidName" in the ASSERTIONS table, which is another link back to the
NAMES table. In the ASSERTIONS table, if field "Name" equals field
"ValidName", then the experts noted in the Citation regarded the "Name" as
valid.  If they are not equal, then said experts regarded the "Name" as a
junior synonym of the "ValidName". If you really wanted to map synonymies
accurately, you'd split them out into a separate table that linked one
Assertion record back to another Assertion record....but that's getting
beyond the scope of this thread....
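
The "ValidName" rule can be sketched in a few lines of Python. The rows
below are invented for illustration (the name "W" and the second assertion
do not appear in the example above), and "status" is a hypothetical helper:

```python
# Hypothetical rows: (Citation, Name, ValidName).
ASSERTIONS = [
    ("Smith, 1999", "X", "X"),   # Smith treats X as valid
    ("Jones, 2001", "X", "W"),   # invented: Jones treats X as a synonym of W
]


def status(name, valid_name):
    """Apply the rule: Name == ValidName means the name was treated as valid."""
    if name == valid_name:
        return "valid"
    return "junior synonym of " + valid_name


for citation, name, valid_name in ASSERTIONS:
    print(citation + ": " + name + " is " + status(name, valid_name))
```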

> I think it
> necessitates a disassociation of the terminal taxa and the
> alternative hierarchies within the data (in other words, taxa are
> stored *without* being classified,

Yes, exactly - the "NAMES" table.

> and the phylogenies are stored as
> separate data sets,

Yes, exactly - the "ASSERTIONS" table.

> so you only get the computer to spit out a
> classification for a taxon after you choose which phylogeny you wish
> to view), OR, you stick with the default, reliance on a single
> classification.

The way I have it set up (see my other email to you), you pick a single
"status" for each NAME from among the linked ASSERTIONS to serve as your
"current" status.  Part of the status is the Parent, so the base unit of
classification is selected in this way.  You can create as many "sets" of
these "current status" assertions for all taxa within a certain scope, and
these alternative sets become your alternative "classifications".
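
As a rough sketch of that idea, assuming a classification is stored simply
as a set of chosen assertion IDs (the dictionary layout, the classification
labels, and the function "build_classification" are all assumptions, not
the actual setup):

```python
# Sample assertions keyed by ID: ID -> (Name, Parent).
ASSERTIONS = {
    1: ("X", "Y"),
    2: ("Y", "Z"),
    3: ("X", "A"),
    4: ("A", "B"),
    5: ("B", "C"),
}

# Each alternative classification is one set of "current status" assertions.
CLASSIFICATIONS = {
    "Smith, 1999": {1, 2},
    "Jones, 2001": {3, 4, 5},
}


def build_classification(chosen_ids):
    """Turn a set of chosen assertion IDs into a {name: parent} mapping.

    Raises ValueError if two chosen assertions cover the same name, since
    a classification designates one current assertion per taxon name.
    """
    parent_of = {}
    for assertion_id in sorted(chosen_ids):
        name, parent = ASSERTIONS[assertion_id]
        if name in parent_of:
            raise ValueError("more than one current assertion for " + name)
        parent_of[name] = parent
    return parent_of


for label, chosen in CLASSIFICATIONS.items():
    print(label, build_classification(chosen))
```

Choosing assertions 1 and 3 together would be rejected here, because both
supply a Parent for X within a single classification.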

> Right now, *all* the major on-line databases,
> including ITIS, choose the latter option; a single, invariant higher
> classification.

But the system Jim describes does what my system does, which is to say, it
tracks *ALL* classifications (or at least is set up to track them all - you
just need to do the legwork and crank the data in). Moreover, I was invited
to join a meeting last year involving some ITIS folks, and the outcome was a
data schema along the basic lines of what I just described.

> You can't have a child link to two different parents,

Not if that link is contained within the NAMES table ... but if the link to
the parent is kept in the ASSERTIONS table, then you can have as many
different links to different Parents as have ever been asserted for any
given taxon name (or you can even pers. comm. yourself, and make up a new
one).  Each classification designates one assertion to be the "correct" or
"current" assertion, so you build a complete set of assertions (one per
taxon name), in order to create one complete classification.

> so ITIS will presumably *never* be able to designate alternative
> classifications above the terminal taxa - unless they radically
> change their data structure.

Maybe some folks at ITIS can comment on that. I'm waiting to hear back as to
whether a synopsis of the aforementioned meeting has been posted to the
FGDC website yet (I couldn't find it during a brief look through:
http://www.fgdc.gov/index.html).

Aloha,
Rich



