[Taxacom] Data query

Richard Pyle deepreef at bishopmuseum.org
Wed Jun 26 00:08:14 CDT 2013

I have to confess, it was a bit of baiting on my part.... to solicit
something to the effect of:

> No doubt ZooBank/GNUB has a take on this 
> which will or may assist - Rich?

So, it's not a ZooBank thing, because ZooBank only deals in names &
nomenclatural Acts.  But, of course, ZooBank sits on top of GNUB, and the
scope of GNUB is every TaxonNameUsage (TNU) ever.  The really cool thing
about using TNUs as the core object for taxonomy is that you can do a LOT
of elegant stuff with them.

For those really interested in this technical stuff, read on.  For everyone
else, PRESS DELETE NOW!  It only gets ugly from here.....

A TNU is an individual usage of an individual "name".  The word "name" here
applies to the terminal epithet only.  The essential attributes of a TNU are
a pointer to the Protonym for the name, a pointer to the Reference in
which the TNU occurred, the Rank at which the Reference treated the name, the
immediate taxonomic Parent of the name, whether or not it was treated as
valid, and exactly how it was spelled.  It's important to understand this
basic structure in order to understand the answer to Tony's question (below).
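
As a minimal sketch of that structure (not GNUB's actual schema; the class
and field names are assumptions chosen for illustration), a TNU can be
modelled as a small record in Python:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNU:
    """One usage of one name in one reference (hypothetical field names)."""
    tnu_id: int               # identifier of this usage
    protonym_id: int          # TNU id of the original description of the name
    reference: str            # reference in which the usage occurred
    rank: str                 # rank at which the reference treated the name
    valid_id: int             # TNU id of the usage treated as the valid taxon
    parent_id: Optional[int]  # TNU id of the immediate parent, None if unknown
    spelling: str             # exactly how the name was spelled


# The first row of the example table that follows, written as a record.
example = TNU(1, 1, "Linn., 1758", "Gen.", 1, None, "Aus")
print(example)
```

Storing validity as a pointer to another TNU, rather than as a yes/no flag,
is what lets a single field say both "this is a synonym" and "of what".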

So, let's take a simple example:

Linnaeus 1758 established the genus name "Aus" and within it, the species
name "bus".  We therefore have two TNUs:

TNU	Prot	Reference	Rank	Valid	Parent	Spelling
1	1	Linn., 1758	Gen.	1	--	Aus
2	2	Linn., 1758	Sp.	2	1	bus

So, both of these TNUs are Protonyms (AKA "original descriptions"), because
in both cases TNU=Prot.  Both were treated as valid taxa, because in both
cases TNU=Valid.  We don't know what parent taxon Linn. 1758 placed the
genus Aus in, but we can see that he places the species "bus" within the
genus Aus (based on Parent=1).  And we see how Linnaeus spelled both names.
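
The two checks just described (TNU=Prot and TNU=Valid) and the Parent lookup
can be sketched in a few lines of Python.  The tuple layout simply mirrors the
column headings above; the helper names are made up for illustration:

```python
from collections import namedtuple

# Columns in the same order as the table above; "--" is stored as None.
TNU = namedtuple("TNU", "tnu prot reference rank valid parent spelling")

rows = [
    TNU(1, 1, "Linn., 1758", "Gen.", 1, None, "Aus"),
    TNU(2, 2, "Linn., 1758", "Sp.", 2, 1, "bus"),
]
by_id = {r.tnu: r for r in rows}


def is_protonym(r):
    """A usage is the original description when it points at itself."""
    return r.tnu == r.prot


def is_valid(r):
    """A usage was treated as a valid taxon when Valid points at itself."""
    return r.tnu == r.valid


def parent_spelling(r):
    """Spelling of the immediate parent, or None when none is recorded."""
    return by_id[r.parent].spelling if r.parent is not None else None


for r in rows:
    print(r.spelling, is_protonym(r), is_valid(r), parent_spelling(r))
```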

Now, suppose Smith 1950 reviews the group, and describes another species
(Aus cus).  We now have 3 more TNUs:

TNU	Prot	Reference	Rank	Valid	Parent	Spelling
3	1	Sm., 1950	Gen.	3	--	Aus
4	2	Sm., 1950	Sp.	4	3	bus
5	5	Sm., 1950	Sp.	5	3	cus

You should see that Smith treated both of Linnaeus' names, spelled them the
same way, and added a new Protonym (TNU=5) to establish the new species "Aus
cus".

Now suppose Pyle comes along in 2012  and describes a new genus ("Xus") for
"bus", and decides that "cus" is a synonym of "bus":

TNU	Prot	Reference	Rank	Valid	Parent	Spelling
6	6	Pyle, 2012	Gen.	6	--	Xus
7	2	Pyle, 2012	Sp.	7	6	bus
8	5	Pyle, 2012	Sp.	7	--	cus

The key thing to note is that Pyle places Linnaeus' "bus" in his new genus
"Xus", and regards Smith's "cus" as a synonym of "bus" (Valid=7 for TNU 8).

If you can imagine extending this TNU table out to millions of individual
usages, for names at all ranks, etc., you're basically capturing the
essentials of every treatment and classification of every name that has ever
been documented.  That's what GNUB is all about -- indexing these basic
elements of every single TaxonNameUsage.
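
To make this concrete, here is a small Python sketch that holds the eight
usages from the three tables above in one list and reads a treatment back out
of each row.  It is only an illustration under assumed names (usages_of,
full_name, describe), not how GNUB is implemented:

```python
from collections import namedtuple

TNU = namedtuple("TNU", "tnu prot reference rank valid parent spelling")

# All eight usages from the tables above; "--" is stored as None.
TNUS = [
    TNU(1, 1, "Linn., 1758", "Gen.", 1, None, "Aus"),
    TNU(2, 2, "Linn., 1758", "Sp.", 2, 1, "bus"),
    TNU(3, 1, "Sm., 1950", "Gen.", 3, None, "Aus"),
    TNU(4, 2, "Sm., 1950", "Sp.", 4, 3, "bus"),
    TNU(5, 5, "Sm., 1950", "Sp.", 5, 3, "cus"),
    TNU(6, 6, "Pyle, 2012", "Gen.", 6, None, "Xus"),
    TNU(7, 2, "Pyle, 2012", "Sp.", 7, 6, "bus"),
    TNU(8, 5, "Pyle, 2012", "Sp.", 7, None, "cus"),
]
by_id = {t.tnu: t for t in TNUS}


def usages_of(prot):
    """Every recorded usage of one name, in table order."""
    return [t for t in TNUS if t.prot == prot]


def full_name(t):
    """Spelling prefixed by the parent's spelling when a parent is recorded."""
    if t.parent is None:
        return t.spelling
    return f"{by_id[t.parent].spelling} {t.spelling}"


def describe(t):
    """Summarise how one reference treated one name."""
    if t.tnu == t.valid:
        return f"{full_name(t)} (valid, {t.reference})"
    return f"{t.spelling} = synonym of {full_name(by_id[t.valid])} ({t.reference})"


for t in usages_of(2) + usages_of(5):
    print(describe(t))
```

Here usages_of(2) returns TNUs 2, 4 and 7, which is the complete recorded
history of the name "bus" in this toy data set.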

Now, on to Tony's question: How does GNUB manage multiple classifications?
The way it works is through a structure we developed that maps an "Accepted
Taxonomy" according to a particular "MetaAuthority".  This is all explained
in much more detail on pp. 35-36 of this publication:
http://systbio.org/files/phyloinformatics/1.pdf (note, what is referred to
in that publication as an "Assertion" is what I now refer to as a TNU).

Catalog of Life is an example of a MetaAuthority, as is WoRMS, or
WikiSpecies, or any individual taxonomist who wants to build a "preferred"
classification. The way it works is that each MetaAuthority selects a
particular TNU for each Protonym to represent their view.  The full
classification is assembled from the set of MetaAuthority-selected TNUs.

So, suppose Rich Pyle is a MetaAuthority, and he prefers to follow his own
2012 classification.  His Accepted Taxonomy records would look something
like this:

MetaAuth	Prot	TNU
Pyle	6	6
Pyle	2	7
Pyle	5	8

According to this MetaAuthority (Pyle) the Genus "Xus" (Prot=6) should be
treated according to Pyle, 2012 (TNU=6), which is as a valid genus.  The
species "bus" (Prot=2) is treated as a valid species within "Xus" (TNU=7).
The species "cus" (Prot=5) is treated as a junior synonym of "bus" (TNU=8).
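
A rough Python sketch of these Accepted Taxonomy records follows.  The record
layout (MetaAuthority, Protonym, selected TNU) is taken from the three rows
above; the function names and the consistency check are assumptions added for
illustration:

```python
from collections import namedtuple

TNU = namedtuple("TNU", "tnu prot reference rank valid parent spelling")

# Only the three Pyle, 2012 usages are needed for this example.
by_id = {t.tnu: t for t in [
    TNU(6, 6, "Pyle, 2012", "Gen.", 6, None, "Xus"),
    TNU(7, 2, "Pyle, 2012", "Sp.", 7, 6, "bus"),
    TNU(8, 5, "Pyle, 2012", "Sp.", 7, None, "cus"),
]}

# Accepted Taxonomy records: (MetaAuthority, Protonym, selected TNU).
ACCEPTED = [("Pyle", 6, 6), ("Pyle", 2, 7), ("Pyle", 5, 8)]


def selected_tnu(meta, prot):
    """Return the usage a MetaAuthority selected for a Protonym."""
    for m, p, tnu_id in ACCEPTED:
        if m == meta and p == prot:
            usage = by_id[tnu_id]
            # Assumed sanity check: the usage must be a usage of that name.
            if usage.prot != prot:
                raise ValueError(f"TNU {tnu_id} is not a usage of Protonym {prot}")
            return usage
    raise KeyError((meta, prot))


def status(usage):
    """Valid at its rank, or a synonym of the usage that Valid points to."""
    if usage.tnu == usage.valid:
        return f"valid {usage.rank}"
    return f"synonym of {by_id[usage.valid].spelling}"


for prot in (6, 2, 5):
    u = selected_tnu("Pyle", prot)
    print(prot, u.spelling, status(u))
```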

Now, let's say Tony Rees represents a different MetaAuthority.  He's got a
bit of a different view.  He thinks that "cus" is correctly treated as a
synonym of bus, but he doesn't buy Pyle's argument for "Xus", so he wants to
place "bus" in "Aus".  The Rees Accepted Taxonomy might look like this:

MetaAuth	Prot	TNU
Rees	1	3
Rees	2	4
Rees	5	8

He thinks Aus (Prot=1) is a valid genus, so he's following Smith's treatment
(TNU=3).  He thinks bus (Prot=2) is a valid species and belongs in the genus
Aus, so he's following Smith's treatment for that name as well (TNU=4).  But
he agrees with Pyle that cus is properly considered a synonym, so for that
name (Prot=5), he's following Pyle's treatment (TNU=8).
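
One detail worth spelling out: TNU 8 points at TNU 7 (Pyle's "bus" in "Xus")
as its valid usage, yet Rees follows TNU 4 for "bus".  The sketch below
handles this by hopping from a selected usage to the Protonym of its Valid or
Parent usage and then asking the same MetaAuthority which TNU it selected for
that Protonym.  That resolution rule is an assumption made for this example
(GNUB may well do it differently), and the code only covers genus and species
ranks:

```python
from collections import namedtuple

TNU = namedtuple("TNU", "tnu prot reference rank valid parent spelling")

by_id = {t.tnu: t for t in [
    TNU(1, 1, "Linn., 1758", "Gen.", 1, None, "Aus"),
    TNU(2, 2, "Linn., 1758", "Sp.", 2, 1, "bus"),
    TNU(3, 1, "Sm., 1950", "Gen.", 3, None, "Aus"),
    TNU(4, 2, "Sm., 1950", "Sp.", 4, 3, "bus"),
    TNU(5, 5, "Sm., 1950", "Sp.", 5, 3, "cus"),
    TNU(6, 6, "Pyle, 2012", "Gen.", 6, None, "Xus"),
    TNU(7, 2, "Pyle, 2012", "Sp.", 7, 6, "bus"),
    TNU(8, 5, "Pyle, 2012", "Sp.", 7, None, "cus"),
]}

# MetaAuthority -> {Protonym: selected TNU}, from the two record sets above.
ACCEPTED = {
    "Pyle": {6: 6, 2: 7, 5: 8},
    "Rees": {1: 3, 2: 4, 5: 8},
}


def accepted_name(meta, prot):
    """Name under which one MetaAuthority currently files a Protonym."""
    usage = by_id[ACCEPTED[meta][prot]]
    if usage.tnu != usage.valid:
        # Synonym: continue with the name it is a synonym of, as treated
        # by this same MetaAuthority.
        return accepted_name(meta, by_id[usage.valid].prot)
    if usage.parent is None:
        return usage.spelling
    # Valid species: prefix the genus as treated by this MetaAuthority.
    return f"{accepted_name(meta, by_id[usage.parent].prot)} {usage.spelling}"


for meta, choices in ACCEPTED.items():
    for prot in choices:
        print(meta, by_id[prot].spelling, "->", accepted_name(meta, prot))
```

Under these assumptions "cus" resolves to "Xus bus" for Pyle and to "Aus bus"
for Rees, even though both selected the same usage (TNU=8) for that name.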

I've only touched the tip of the iceberg of how this works, but the important
point is that using this "MetaAuthority" approach, you can have an unlimited
number of classifications, and you can switch between them very quickly.
Using this technique, we can do things like:
"Show me a checklist of the species of fishes from Johnston Atoll, presented
according to the FishBase MetaAuthority."  What this would do is scour the
database for all records of fishes from Johnston Atoll, no matter how they
were originally identified, then translate them into the classification used
by FishBase today.  With the click of the mouse button, you can take this
same list and re-present it according to, say, the WoRMS classification, or
Richard Pyle's personal classification, or any other MetaAuthority's
classification.  This "translation" is done on the fly at query time, so you
can pretty quickly compare two alternate classifications to see where they
are the same, and where they differ.
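
A toy version of that query-time translation, in Python, might look like the
following.  The occurrence records are invented for illustration (they are
not real Johnston Atoll data), the Pyle and Rees MetaAuthorities from above
stand in for FishBase and WoRMS, and the resolution rule inside accepted_name
is an assumption rather than GNUB's documented behaviour:

```python
from collections import namedtuple

TNU = namedtuple("TNU", "tnu prot reference rank valid parent spelling")

by_id = {t.tnu: t for t in [
    TNU(1, 1, "Linn., 1758", "Gen.", 1, None, "Aus"),
    TNU(2, 2, "Linn., 1758", "Sp.", 2, 1, "bus"),
    TNU(3, 1, "Sm., 1950", "Gen.", 3, None, "Aus"),
    TNU(4, 2, "Sm., 1950", "Sp.", 4, 3, "bus"),
    TNU(5, 5, "Sm., 1950", "Sp.", 5, 3, "cus"),
    TNU(6, 6, "Pyle, 2012", "Gen.", 6, None, "Xus"),
    TNU(7, 2, "Pyle, 2012", "Sp.", 7, 6, "bus"),
    TNU(8, 5, "Pyle, 2012", "Sp.", 7, None, "cus"),
]}

ACCEPTED = {
    "Pyle": {6: 6, 2: 7, 5: 8},
    "Rees": {1: 3, 2: 4, 5: 8},
}

# Hypothetical occurrence records, each identified by the TNU it followed.
RECORDS = [2, 5, 7, 4]


def accepted_name(meta, prot):
    """Resolve a Protonym through one MetaAuthority's selected usages."""
    usage = by_id[ACCEPTED[meta][prot]]
    if usage.tnu != usage.valid:
        return accepted_name(meta, by_id[usage.valid].prot)
    if usage.parent is None:
        return usage.spelling
    return f"{accepted_name(meta, by_id[usage.parent].prot)} {usage.spelling}"


def checklist(meta, records):
    """Translate every identification into one MetaAuthority's names."""
    return sorted({accepted_name(meta, by_id[t].prot) for t in records})


def differences(meta_a, meta_b):
    """Names that both MetaAuthorities treat, but file differently."""
    shared = ACCEPTED[meta_a].keys() & ACCEPTED[meta_b].keys()
    diffs = {}
    for prot in sorted(shared):
        a, b = accepted_name(meta_a, prot), accepted_name(meta_b, prot)
        if a != b:
            diffs[by_id[prot].spelling] = (a, b)
    return diffs


print(checklist("Pyle", RECORDS))
print(checklist("Rees", RECORDS))
print(differences("Pyle", "Rees"))
```

Nothing is stored per classification except the small ACCEPTED mapping, so
switching MetaAuthority amounts to re-running the same lookup with a
different key.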

Believe it or not, it's much simpler than it seems based on the description

OK, that's enough....

