[Taxacom] the hurdle for all biodiv informatics initiatives

Richard Pyle deepreef at bishopmuseum.org
Wed Feb 17 20:41:34 CST 2010

> A string is a string is a string to a computer. The string 
> 'Feronia sodalis LeConte 1848' is just as good an identifier 
> as the string 'urn:lsid:ubio.org:namebank:6755946', from the 
> machine's perspective, surely? There seems to be a redundancy here ...

Well...two things.  First, we don't know whether the computer is resolving
GUIDs as strings or in binary form.  For example, the UUID
"44530E74-95E2-4F58-B1F0-9816AFD37772" is really just a
less-human-unfriendly (I hesitate to say "more human friendly"!) way of
representing 128 consecutive 1s and 0s.  I don't know whether UUIDs are
resolved by computers in their textual form or in their binary form.
Personally, I don't care -- I just want the computer to do whatever is most
efficient for its own purposes.  I, as a human, never want to see
"44530E74-95E2-4F58-B1F0-9816AFD37772" -- I only want to see "Corydoras
sodalis Nijssen & Isbrücker 1986".
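
To make the text-versus-binary point concrete, here is a minimal Python
sketch using the standard uuid module.  It is only an illustration of the two
representations of the UUID quoted above; it says nothing about how any
particular resolver or database actually stores or compares UUIDs.

```python
import uuid

# The UUID quoted in the message, in its textual (human-readable) form.
text_form = "44530E74-95E2-4F58-B1F0-9816AFD37772"

# Parse the text into a UUID object.
u = uuid.UUID(text_form)

# The same identifier as 16 raw bytes, i.e. 128 bits.
binary_form = u.bytes

# The same identifier written out as 128 consecutive 1s and 0s.
bit_string = format(u.int, "0128b")

print(len(text_form))    # 36 characters (32 hex digits + 4 hyphens)
print(len(binary_form))  # 16 bytes
print(len(bit_string))   # 128 bits

# Both forms carry the same information: the bytes round-trip to the text.
round_trip = str(uuid.UUID(bytes=binary_form)).upper()
print(round_trip == text_form)
```

Either form identifies the same thing; which one the machine uses internally
is an implementation detail that the human reader never needs to see.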

Second, it seems to us humans that "Feronia sodalis LeConte 1848" is an
adequately unique identification string.  But it's a lousy GUID, for the
following reasons:
- There may be homonyms (even when you include authorship & year)
- There may be different ways of rendering the author name "LeConte" (e.g.,
"Le Conte", "Leconte", "leConte", etc.)
- Later research may reveal a change in the information (e.g., maybe it was
really published in 1849), which will lead to a temptation to change the
text identifier (e.g., from "Feronia_sodalis_LeConte_1848" to
"Feronia_sodalis_LeConte_1849").  This defeats one of the main properties of
a GUID (i.e., that it never changes).
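
The reasons listed above can be sketched in Python.  This is a hypothetical
illustration, not any real database design: the NameRecord class and its
field names are invented here, and the identifier is a freshly generated
random UUID, not a real GUID assigned to this name by any service.

```python
import uuid
from dataclasses import dataclass


@dataclass
class NameRecord:
    """Hypothetical record: an opaque identifier plus editable name data."""
    guid: uuid.UUID  # opaque; assigned once and never edited
    genus: str
    epithet: str
    author: str
    year: int

    def label(self) -> str:
        # The human-readable string is derived from the data, not stored as the key.
        return f"{self.genus} {self.epithet} {self.author} {self.year}"


# Different renderings of the author name yield different strings, so a
# computer comparing them character by character sees four identifiers.
authors = ["LeConte", "Le Conte", "Leconte", "leConte"]
name_strings = {f"Feronia sodalis {a} 1848" for a in authors}
print(len(name_strings))  # 4 distinct strings for one intended name

# With an opaque identifier, the data can be corrected without touching the key.
record = NameRecord(uuid.uuid4(), "Feronia", "sodalis", "LeConte", 1848)
guid_before = record.guid
label_before = record.label()

record.year = 1849  # hypothetical correction to the publication year

print(label_before)    # Feronia sodalis LeConte 1848
print(record.label())  # Feronia sodalis LeConte 1849
print(record.guid == guid_before)  # the identifier is unchanged
```

The label that humans read changes with the correction, while the identifier
that machines exchange stays the same.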

There are other reasons as well.  But my general feeling is that it's best
to rely on taxonomists to sort out the subtleties of taxonomy, and it's best
to rely on computer scientists to sort out the subtleties of identifiers that
make sense for computer-computer communication.  If I've learned anything
over the past 2+ decades as a taxonomist with one foot in the computer
database world, it is that, as a general rule, biologists are terrible
database developers (I include myself as Exhibit "A").

