[Taxacom] the hurdle for all biodiv informatics initiatives

Stephen Thorpe s.thorpe at auckland.ac.nz
Wed Feb 17 20:07:58 CST 2010

>But when computers commimicate [sic] with other computers (as is happening with increasing frequency, and is what is needed to *really* make computers useful tools for taxonomists), the computers should "speak" to each other in their own language

A string is a string is a string to a computer. The string 'Feronia sodalis LeConte 1848' is just as good an identifier as the string 'urn:lsid:ubio.org:namebank:6755946', from the machine's perspective, surely? There seems to be a redundancy here ...
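To make the contrast concrete: an LSID has a fixed, documented syntax (urn:lsid:authority:namespace:object, with an optional fourth revision part). A program can therefore split it into parts and learn which authority to ask for resolution without knowing anything about taxonomy. A name string carries no such machine-readable structure. The Python sketch below is a minimal illustration that assumes only that generic syntax. `parse_lsid` is a hypothetical helper, and it does not actually resolve anything.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of a Life Science Identifier: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its parts; raise ValueError if the string is not an LSID."""
    parts = text.strip().split(":")
    # The "urn:lsid:" prefix is case-insensitive; expect 5 parts, or 6 with a revision.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


print(parse_lsid("urn:lsid:ubio.org:namebank:6755946").authority)  # ubio.org
```

By contrast, `parse_lsid("Feronia sodalis LeConte 1848")` raises `ValueError`. The name string is a perfectly good label, but nothing in its form tells a program where it came from or how to look it up.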

From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Richard Pyle [deepreef at bishopmuseum.org]
Sent: Thursday, 18 February 2010 2:02 p.m.
To: 'taxacom at mailman.nhm.ku.edu'
Subject: Re: [Taxacom] the hurdle for all biodiv informatics initiatives

Wolfgang Lorenz wrote:

> Dear taxacomers,
> all those initiatives, wikispecies not less than the alliance
> of GBIF, EoL, CoL, etc. are still under preparation, not yet
> in full operation, so it seems we should be patient,
> especially since they take very different approaches and roles.

The above sentence is spot-on, and one of the most thoughtful contributions
to this series of threads, in my opinion.

> What we can make out, at this early stage, is the taxonomic
> names problem as probably THE major hurdle for all those
> projects!

I couldn't agree more!!!

> Roderic Page's defense of DOIs for publications is
> quite understandable to me, but I'm in doubt when he writes:
> >>Lastly, imagine if we had similar services for the other things we
> >>care about, such as taxonomic names ..<
> Taxonomic names are not just things like other things, but
> even if we take them as abstracted "name objects" only, -
> cannot we study plenty of examples out there to see the
> massive problem?

Indeed; several of us have been doing that for quite some time now.

> And, isn't this a major reason why many taxonomists are so
> skeptical about what biodiversity informatics was doing so far?

I don't follow the logic here. But more below.

> Take the nearctic beetle Cyclotrachelus sodalis (LeConte
> 1848) for just one of so many examples: six different generic
> combinations (objective synonyms) have been used for it.
> These six names are listed as separate species in the
> following name-aggregators, with a total of 18 (!!) different
> LSIDs so far:


I think you misunderstand the purpose of these services.  You say above that
"These six names are listed as separate species" -- when in fact, these
services do NOT assign LSIDs to "species" (by anyone's definition of that
term).  The LSIDs are assigned to text strings purported to represent taxon
names.  The text string "Feronia sodalis" is separate from "Feronia sodalis
LeConte 1848", which is why it shows up as two separate records in uBio.
Now....I'm not a fan of assigning GUIDs to unique text strings, because the
text strings are themselves unique, so there is not much advantage to
creating a separate identifier for the *text string*.  However, there are
many, many other examples of why GUIDs (= persistent identifiers) can be
*extremely* useful when assigned to taxon names and related information.
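A toy registry shows why two records appear: if identifiers are minted per exact text string, then "Feronia sodalis" and "Feronia sodalis LeConte 1848" are simply two different keys. This is a hypothetical sketch. `NameStringRegistry` is invented for illustration and is not how uBio is actually implemented. It assumes identifiers are sequential integers.

```python
from itertools import count
from typing import Dict, Optional


class NameStringRegistry:
    """Hypothetical registry that mints one identifier per distinct text string."""

    def __init__(self, start: int = 1) -> None:
        self._ids: Dict[str, int] = {}
        self._counter = count(start)

    def register(self, name_string: str) -> int:
        # Exact string matching: any difference in spelling, spacing or
        # authorship yields a new record with a new identifier.
        if name_string not in self._ids:
            self._ids[name_string] = next(self._counter)
        return self._ids[name_string]

    def lookup(self, name_string: str) -> Optional[int]:
        return self._ids.get(name_string)


registry = NameStringRegistry()
print(registry.register("Feronia sodalis"))               # 1
print(registry.register("Feronia sodalis LeConte 1848"))  # 2
print(registry.register("Feronia sodalis"))               # 1 (already registered)
```

Each number stands in one-to-one for a string the registry already stores. The number therefore adds no identifying power of its own, which is the redundancy described above.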

> Why do we need such identifiers and who can take control of it???
> Instead of machine-only-readable identifiers, which are
> obviously "out of human control" in so many examples, we
> could have perfectly stable, unique and readable Name Strings
> for each available name, registered and resolvable in a
> future ZooBank:
> ZS-Feronia_sodalis
> ZS-Feronia_sodalis/Eumolops_sodalis
> ZS-Feronia_sodalis/Evarthrus_sodalis
> ZS-Feronia_sodalis/Pterostichus_sodalis
> ZS-Feronia_sodalis/Cyclotrachelus_sodalis
> ZS-Feronia_sodalis/Abax_sodalis

This approach seems very sensible to anyone who has never needed to sort
out homonyms.  If you add author and year, you can reduce the problem of
homonymy, but you increase the problem of establishing a standard way of
representing author & year data.
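The trade-off can be sketched in a few lines of Python. Two-part keys collide for a pair of homonyms. Three-part keys that include authorship do not, but they become sensitive to how authorship is written. The genus "Examplea" and its authors are hypothetical placeholders, and `normalize_authorship` is a deliberately naive clean-up, not a standard.

```python
import re

# Hypothetical homonyms: same genus and species epithet, different authors.
HOMONYMS = [
    ("Examplea", "similis", "Smith 1850"),
    ("Examplea", "similis", "Jones 1902"),
]


def key_without_author(genus: str, species: str) -> str:
    """Two-part key in the spirit of the ZS- strings above."""
    return f"{genus}_{species}"


def key_with_author(genus: str, species: str, authorship: str) -> str:
    """Three-part key; only as stable as the authorship formatting."""
    return f"{genus}_{species}_{authorship}"


def normalize_authorship(authorship: str) -> str:
    """Naive clean-up: treat parentheses and commas as spaces, collapse whitespace."""
    return " ".join(re.sub(r"[(),]", " ", authorship).split())


print(len({key_without_author(g, s) for g, s, _ in HOMONYMS}))    # 1: collision
print(len({key_with_author(g, s, a) for g, s, a in HOMONYMS}))    # 2: distinguished
print(key_with_author("Feronia", "sodalis", "LeConte 1848")
      == key_with_author("Feronia", "sodalis", "LeConte, 1848"))  # False
print(normalize_authorship("(LeConte, 1848)"))                    # LeConte 1848
```

Even with normalization, an abbreviation such as "Lec." (used in the message below) still will not match "LeConte". Stripping parentheses also throws away information: in zoological names, parentheses around the authorship indicate that the species is now placed in a genus other than the original one.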

> Together with such standardized name strings, Zoobank should
> store information that belongs to a name but does not form
> part of it, like
> author+date, page numbers, type information, grammar, etc.
> And the idea is that projects like GBIF, EoL, etc. could
> build upon these human-readable identifiers by just adding
> something like a usage instance number. E.g., GBIF, when it
> has parsed occurrence records for "Cyclotrachelus sodalis
> Lec." and "Abax sodalis" it can assign human-readable
> ID-strings, perhaps in the following format:
> ZS-Feronia_sodalis/Cyclotrachelus_sodalis#2345
> ZS-Feronia_sodalis/Abax_sodalis#324
> With such standardized strings it should be much less of a
> problem for humans AND computers to know what's in those
> names. This is not the solution for ALL problems, of course,
> but a solid nomenclatural basis could bring us a huge step
> forward, IMHO ... or do I miss something?
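The proposed format is regular enough to parse mechanically. The sketch below assumes the grammar implied by the examples: a `ZS-` prefix, the original binomial joined by an underscore, an optional `/` plus a later combination, and an optional `#` plus a usage-instance number. It handles plain binomials only (no subgenera or subspecies). `parse_zs` is a hypothetical name, not part of any existing ZooBank or GBIF service.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar, inferred from the examples: ZS-Genus_species[/Genus_species][#number]
ZS_PATTERN = re.compile(
    r"ZS-(?P<original>[A-Z][a-z]+_[a-z]+)"
    r"(?:/(?P<combination>[A-Z][a-z]+_[a-z]+))?"
    r"(?:#(?P<usage>\d+))?"
)


class ZSName(NamedTuple):
    original: str               # original combination, e.g. "Feronia sodalis"
    combination: Optional[str]  # later combination, if any
    usage: Optional[int]        # usage-instance number added by an aggregator, if any


def parse_zs(identifier: str) -> ZSName:
    match = ZS_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a ZS-style name string: {identifier!r}")
    combination = match.group("combination")
    usage = match.group("usage")
    return ZSName(
        original=match.group("original").replace("_", " "),
        combination=combination.replace("_", " ") if combination is not None else None,
        usage=int(usage) if usage is not None else None,
    )


print(parse_zs("ZS-Feronia_sodalis/Cyclotrachelus_sodalis#2345"))
# ZSName(original='Feronia sodalis', combination='Cyclotrachelus sodalis', usage=2345)
```

Every string for this species shares the root ZS-Feronia_sodalis, which makes grouping the six combinations easy. For the same reason, a second, unrelated Feronia sodalis (a homonym) would land on the same root unless something such as authorship were added.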

I don't think you're necessarily missing anything.  We already have a
human-friendly way of displaying a taxon name; e.g. "Feronia sodalis LeConte
1848".  That's worked reasonably well for human brains for the past couple
of centuries.  The only reason we're having a conversation about GUIDs is
that -- as has been proven from decades of experience trying to use computer
databases as tools to manage taxonomic information -- these sorts of
human-friendly identifiers are not well suited to establishing resolvable
links among digitized information.  The exceptions (homonyms, mis-spellings,
etc.) seem rare to us, but even low-frequency exceptions to general rules
create all sorts of confusion and complexity when we try to build robust
computer databases as tools for taxonomists.  My point here is that when
humans communicate with humans, or when computers communicate with humans,
the human side of the equation should always see the human-friendly
identifier. But when computers commimicate with other computers (as is
happening with increasing frequency, and is what is needed to *really* make
computers useful tools for taxonomists), the computers should "speak" to
each other in their own language.

Just as you would not want to confuse a human with
"urn:lsid:zoobank.org:act:44530E74-95E2-4F58-B1F0-9816AFD37772", neither
would you want to confuse a computer with "Corydoras sodalis Nijssen &
Isbrücker 1986".
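Here is a minimal sketch of that division of labor. It assumes machine-side records store only an opaque identifier and that a lookup table supplies the human-readable label at display time. The `urn:lsid:example.org` identifiers and the catalog numbers are made up for illustration and are not real registry entries.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical identifiers mapped to human-readable labels (not real registry records).
LABELS: Dict[str, str] = {
    "urn:lsid:example.org:names:1": "Feronia sodalis LeConte 1848",
    "urn:lsid:example.org:names:2": "Corydoras sodalis Nijssen & Isbrücker 1986",
}


@dataclass(frozen=True)
class Occurrence:
    """Machine-side record: refers to its name only through an opaque identifier."""
    catalog_number: str
    name_id: str


def display(occ: Occurrence, labels: Dict[str, str]) -> str:
    """Human-facing edge: swap the identifier for its label, falling back to the raw ID."""
    return f"{occ.catalog_number}: {labels.get(occ.name_id, occ.name_id)}"


records = [
    Occurrence("HYPO-0001", "urn:lsid:example.org:names:1"),
    Occurrence("HYPO-0002", "urn:lsid:example.org:names:2"),
]
for occ in records:
    print(display(occ, LABELS))
```

Correcting a misspelled label means changing one entry in `LABELS`. Every record that links by identifier picks up the change without being edited.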

I was going to include an elaborate explanation of GNUB (which Paul Kirk
alluded to), and how it has the potential to solve essentially *all* of
these problems.  But I'm out of time right now.  Depending on how this
thread goes, I may take the time to explain more.  Otherwise:  watch the GN*
space (GNA/GNI/GNUB/GNOMA/CiteBank).



Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu

The Taxacom archive going back to 1992 may be searched with either of these methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
