[Taxacom] Chameleons, GBIF, and the Red List

Richard Pyle deepreef at bishopmuseum.org
Sat Aug 23 13:06:56 CDT 2014


Hi Jason,

Yes -- absolutely!!!  I think that one of the fundamental problems is the difference between identifiers that are friendly to humans (e.g., InstitutionCode+CollectionCode+CatalogNumber), and those that are friendly to computers (i.e., actually ARE globally unique, persistent, and actionable).  Even when attempting to address the computer-friendly kind, we still desperately cling to human-friendly-ish identifiers (e.g., integers).  Sometimes the human-friendly-ish ones can work (e.g., DOIs); sometimes not (e.g., LSIDs).

Even if museums were to get serious about proper (i.e., globally unique, persistent, actionable) identifiers on their specimens, it's likely that multiple schemes would be adopted (some might use DOIs, some LSIDs, some home-grown, some UUIDs coupled with multiple dereferencing services, some ARKs, etc.).  Therefore, I think GBIF and iDigBio should get together and develop a universal scheme that aggregators can adopt, which proxies the hodge-podge source identifiers through a common set of tools.  The LOD/triple-store folks will moan ("UGHH!!  More sameAs inferencing!!!"), but the rest of us will be happy.
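
To make the proxy idea a bit more concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions, not an existing GBIF or iDigBio tool: the namespace URL, the function names (classify, proxy_id), and the rough patterns used to recognize each scheme are all hypothetical.  The idea is simply that whatever identifier a source supplies gets classified, lightly normalized, and mapped to one stable aggregator-side identifier.

```python
import re
import uuid

# Hypothetical namespace for aggregator-issued proxy identifiers (an assumption).
PROXY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/specimen-proxy")

# Rough recognizers for each scheme; real syntax checks would be stricter.
# The boolean says whether the captured value may safely be lowercased.
SCHEMES = [
    ("doi", re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.I), True),
    ("lsid", re.compile(r"^(urn:lsid:\S+)$", re.I), False),
    ("ark", re.compile(r"^(ark:/?\d{5,}/\S+)$", re.I), False),
    ("uuid", re.compile(
        r"^(?:urn:uuid:)?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
        re.I), True),
]


def classify(source_id):
    """Return (scheme, normalized value) for a source identifier."""
    text = source_id.strip()
    for scheme, pattern, fold_case in SCHEMES:
        match = pattern.match(text)
        if match:
            value = match.group(1)
            return scheme, value.lower() if fold_case else value
    # Anything unrecognized is treated as a home-grown identifier, kept verbatim.
    return "local", text


def proxy_id(source_id):
    """Derive a deterministic proxy identifier from any source identifier."""
    scheme, value = classify(source_id)
    return str(uuid.uuid5(PROXY_NAMESPACE, f"{scheme}:{value}"))
```

Because the proxy is derived deterministically, two spellings of the same DOI (for example "doi:10.1234/ABC" and "https://doi.org/10.1234/abc", both made-up values) land on the same proxy identifier, while a home-grown catalog string simply passes through as a "local" identifier.  A real service would also need a registry of which source identifiers are known to refer to the same specimen, which is exactly the sameAs bookkeeping mentioned above.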

One thing that is clear, though:  after more than a decade of our community focusing intensively on identifiers for our data objects, we haven't gotten very far on this front (certainly not as far as we could/should have by now).

Yet, I remain optimistic.... (perhaps naively so)

Aloha,
Rich

> -----Original Message-----
> From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
> Of JF Mate
> Sent: Friday, August 22, 2014 1:44 PM
> To: Taxacom
> Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
> 
> As a total ignoramus on the subject, couldn't we just track the records by
> assigning unique identifiers to the actual specimens in the same way as we
> already do with GenBank?
> 
> Jason
> On 23/08/2014 1:37 AM, "Stephen Thorpe" <stephen_thorpe at yahoo.co.nz>
> wrote:
> 
> > And what happens when an old name is now split into two or more taxa?
> > For example, "cryptic species". What value/status does data have which
> > is associated with pre-split concepts? A name, Aus bus, could refer
> > either to a species complex before a split, or to one and only one of
> > the cryptic species after the split. What if some modern workers
> > reject the split?
> >
> > Stephen
> >
> > --------------------------------------------
> > On Sat, 23/8/14, Richard Pyle <deepreef at bishopmuseum.org> wrote:
> >
> >  Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
> >  To: "'TAXACOM'" <taxacom at mailman.nhm.ku.edu>
> >  Received: Saturday, 23 August, 2014, 5:06 AM
> >
> >  I have not followed this thread closely, but it seems to me that the
> >  main problems people complain about regarding data harvested by
> >  aggregators like GBIF fall into two broad categories:
> >  1) The indicated geographic location is bad
> >  2) The indicated taxon is bad
> >
> >  Bad geography comes in two basic forms:
> >  a) The stated geographic place is not correct.  This could be due to
> >  bad original data or bad digitization, but there is generally no way
> >  to fix this other than fixing it at the source.
> >
> >  b) The stated geographic place is correct, but the associated lat/long
> >  coordinates are either missing or wrong.  This one could be improved
> >  through various georeferencing algorithms and tools and/or
> >  crowd-sourcing.
> >
> >  Bad taxonomy also comes in two basic forms:
> >  a) The organism was misidentified.  Again, there is no real way to fix
> >  this other than to fix it at the source.  Sometimes a reasonable
> >  inference can be made by a good taxonomist, but that always comes
> >  with risks.
> >
> >  b) The name used to represent the organism was "correct" in the
> >  context in which the organism was identified, but the name is not
> >  consistent with "modern" representations of "accepted" taxonomy.
> >  There are many reasons for this, such as abbreviated or misspelled
> >  names, names that are objectively unavailable via the relevant Code
> >  (e.g., not validly published), names that are now widely regarded as
> >  heterotypic synonyms of other names, names that are classified in a
> >  different genus from what modern taxonomists follow, and text-strings
> >  that are really not representative of Linnean-style scientific names
> >  at all.
> >
> >  Of these various categories of problems, I suspect it's the last one
> >  that represents the largest portion of the "mess".  The good news is
> >  that help is on the way.
> >
> >  If you've got some time, and have an interest in this sort of thing,
> >  grab a cup of coffee and read on.  Otherwise, hit "delete" now.
> >
> >  -------------------
> >
> >  Still here?  Cool.
> >
> >  OK, so one of the prototype services Rob Whitton and I developed
> >  through NSF funding of the Global Names Architecture is a service we
> >  call "real-time taxonomic translation".  Basically, this is a service
> >  that "translates" taxon names into the "modern" equivalent.  The best
> >  way to demonstrate the power of this service is through a specific
> >  example.
> >
> >  When Rob and I are wearing our fish-nerd hats instead of our
> >  database-nerd hats, we are collaborating with colleagues at NOAA to
> >  develop a comprehensive checklist of the fishes of the Northwestern
> >  Hawaiian Islands that is "evidence-based" (i.e., occurrence-based with
> >  explicit evidence supporting each occurrence).  When this is published
> >  later this year (or possibly early next year), I think it will
> >  represent a very cool model for how all regional organism checklists
> >  should be done in the future.  But for this Taxacom post, I want to
> >  focus on just one small component of it:  how real-time taxonomic
> >  translation works.
> >
> >  So, the "evidence" behind the occurrences we are using to develop this
> >  checklist comes from various sources: Museum specimens, recorded
> >  observations, photos and videos, and, of course, historical literature
> >  reports.  On the literature reports, so far we have captured 2,856
> >  Occurrence records based on reports in 24 publications going back 114
> >  years.  If we only look at the raw taxon names as they appeared in
> >  these 24 publications, we get a list of 675 distinct scientific names.
> >  Obviously, the prevailing taxonomy has changed over these 114 years,
> >  so many of those names are not consistent with the "modern"
> >  interpretation of the relevant taxonomy.  It would take many hours of
> >  time from multiple experts to review all of those 675 names and figure
> >  out all the corrected spellings, etc.  However, using the real-time
> >  taxonomic translation service Rob and I developed, we can convert
> >  these 675 historical names into the 506 "accepted" names as we would
> >  use them today.  And it does so in a few seconds (i.e., in "real
> >  time").
> >
> >  A short explanation of how it works is as follows:
> >
> >  All 2,856 literature-based occurrence records are tied to a "Taxon
> >  Name Usage" (TNU) instance (i.e., the usage of a taxon name within a
> >  publication).  These represent how the original publication recorded
> >  the name.  For example, what we now call Acanthurus triostegus had
> >  been variously recorded in these literature citations by the following
> >  names:
> >  Acanthurus triostegus (Linnaeus, 1758)
> >  Hepatus triostegus (Linnaeus, 1758)
> >  Acanthurus triostegus sandvicensis Streets, 1877
> >  Hepatus sandvicensis (Streets, 1877)
> >  Teuthis sandvicensis (Streets, 1877)
> >
> >  Similarly, what we now call Coris flavovittata has been recorded
> >  variously as:
> >  Coris flavovittata (Bennett, 1828)
> >  Coris lepomis Jenkins, 1901
> >  Julis eydouxii Valenciennes in Cuvier & Valenciennes, 1839
> >  Julis flavovittata Bennett, 1828
> >
> >  ...and so on for all the different names.
> >
> >  Every TNU is linked to what we call the "Protonym" of the name.  This
> >  is roughly equivalent to the botanical "basionym"; it essentially
> >  represents the original description of the name.  Taking the second
> >  example above, there are three distinct Protonyms represented among
> >  the four names used for Coris flavovittata:
> >  flavovittata Bennett, 1828
> >  eydouxii Valenciennes in Cuvier & Valenciennes, 1839
> >  lepomis Jenkins, 1901
> >
> >  The taxonomic translation service is built around the
> >  "Meta-Authority" (Authority of Authorities) concept.  A Meta-Authority
> >  is any organization or individual who wants to assert an "accepted"
> >  taxonomy.  For example, ITIS, CoL, WoRMS, etc. are all
> >  Meta-Authorities, because they assert an "accepted" usage for each
> >  taxon name.  For our checklist paper, we have established our own
> >  Meta-Authority (technically now recorded as the "Rob Whitton
> >  Meta-Authority", but functionally it is the Bishop Museum
> >  Meta-Authority).  Each Meta-Authority has a specific scope of interest
> >  -- which might be very large (ITIS, CoL, WoRMS, etc.), or might be
> >  very small (e.g., a single family or geographic region).
> >
> >  In any case, what a Meta-Authority does is, for each name within the
> >  scope of interest, it makes a statement along the lines "For Protonym
> >  A, I/We follow the Treatment of Reference X"
> >
> >  In this case, the Rob Whitton/Bishop Museum Meta-Authority has made
> >  these assertions:
> >  - For the protonym "flavovittata Bennett, 1828", we follow the
> >    treatment of Randall 2007 [who treats it as a valid species within
> >    the genus Coris].
> >  - For the protonym "eydouxii Valenciennes in Cuvier & Valenciennes,
> >    1839", we follow the treatment of Eschmeyer 2004 [who treats it as a
> >    junior synonym of flavovittata Bennett, 1828].
> >  - For the protonym "lepomis Jenkins, 1901", we follow the treatment of
> >    Eschmeyer 2004 [who treats it as a junior synonym of flavovittata
> >    Bennett, 1828].
> >
> >  This is how we are able to collapse those messy 675 names spanning 114
> >  years of taxonomic history into the 506 names that we (the experts of
> >  the fishes of the Northwestern Hawaiian Islands) regard as "accepted"
> >  in a few seconds.
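
The TNU -> Protonym -> Meta-Authority chain described above can be sketched roughly as follows.  This is a minimal illustration in Python, not the actual GNUB service: the dictionary layout, the constant names, and the function name translate are assumptions made for the example, and only the names and treatments come from the Coris flavovittata case in this post.

```python
FLAVOVITTATA = "flavovittata Bennett, 1828"
EYDOUXII = "eydouxii Valenciennes in Cuvier & Valenciennes, 1839"
LEPOMIS = "lepomis Jenkins, 1901"

# TNU (name as it appeared in a publication) -> Protonym (original description).
TNU_TO_PROTONYM = {
    "Coris flavovittata (Bennett, 1828)": FLAVOVITTATA,
    "Julis flavovittata Bennett, 1828": FLAVOVITTATA,
    "Coris lepomis Jenkins, 1901": LEPOMIS,
    "Julis eydouxii Valenciennes in Cuvier & Valenciennes, 1839": EYDOUXII,
}

# Meta-Authority assertions: "For Protonym A, we follow the treatment of Reference X".
FOLLOWED_REFERENCE = {
    FLAVOVITTATA: "Randall 2007",
    EYDOUXII: "Eschmeyer 2004",
    LEPOMIS: "Eschmeyer 2004",
}

# What each followed reference says about each protonym:
# ("valid", accepted name) or ("synonym", senior protonym).
TREATMENTS = {
    ("Randall 2007", FLAVOVITTATA): ("valid", "Coris flavovittata"),
    ("Eschmeyer 2004", EYDOUXII): ("synonym", FLAVOVITTATA),
    ("Eschmeyer 2004", LEPOMIS): ("synonym", FLAVOVITTATA),
}


def translate(tnu_name):
    """Translate a name as published into the Meta-Authority's accepted name."""
    protonym = TNU_TO_PROTONYM[tnu_name]
    visited = set()
    while protonym not in visited:
        visited.add(protonym)
        reference = FOLLOWED_REFERENCE[protonym]
        status, value = TREATMENTS[(reference, protonym)]
        if status == "valid":
            return value
        # Junior synonym: continue with the senior protonym's own treatment.
        protonym = value
    raise ValueError(f"circular synonymy while translating {tnu_name!r}")


for name in TNU_TO_PROTONYM:
    print(f"{name} -> {translate(name)}")
```

In this toy data the four names as published reduce to three Protonyms and then to a single accepted name, Coris flavovittata.  Swapping in a different set of Meta-Authority assertions would change the result without touching the TNU or Protonym data, which is the point of keeping those layers separate.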
> >
> >  If anyone wants more details on how it works, I'd be happy to explain
> >  further.
> >
> >  The main limitations of this service are:
> >  1) It's limited to the names within the Global Names Usage Bank (GNUB;
> >  currently 543,989 TNUs linked to 195,369 Protonyms); and
> >  2) There is currently only one Meta-Authority implemented
> >
> >  We already have funding from NSF to address limitation #1, by
> >  developing a workflow to capture millions of protonyms and tens of
> >  millions of TNUs through integrating GNUB, GNI, BHL, and multiple
> >  other taxonomic data sources.  We also plan to expand the
> >  Meta-Authority list to include the "big" ones (e.g., ITIS/CoL, WoRMS,
> >  NCBI), and develop tools to make it easy for any individual or
> >  organization to create their own personal Meta-Authority.  And, we
> >  just submitted a proposal to NSF to (among other things) develop this
> >  real-time taxonomic translation service into a set of tools that can
> >  be very easily applied to any list of taxon names.
> >
> >  If we are successful, users of GBIF data will have the option of
> >  selecting any Meta-Authority they want (one of the big ones, or their
> >  own), and then be able to translate (in real time) all the taxon names
> >  as they appear in the GBIF dataset into the "accepted" modern/clean
> >  equivalent names according to the selected Meta-Authority.  And the
> >  Meta-Authorities aren't just for species-level names -- they also
> >  provide full "accepted" classifications all the way up to Kingdom.
> >
> >  Obviously this won't solve all the problems with aggregated data, but
> >  it will help solve a lot of them.
> >
> >  OK, enough for now....
> >
> >  Aloha,
> >  Rich
> >
> >
> >  Richard L. Pyle, PhD
> >  Database Coordinator for Natural Sciences
> >  Associate Zoologist in Ichthyology
> >  Dive Safety Officer
> >  Department of Natural Sciences, Bishop Museum
> >  1525 Bernice St., Honolulu, HI 96817
> >  Ph: (808)848-4115, Fax: (808)847-8252
> >  email: deepreef at bishopmuseum.org
> >  http://hbs.bishopmuseum.org/staff/pylerichard.html
> >
> >  Note: This disclaimer formally apologizes for the disclaimer below,
> >  over which I have no control.
> >
> >
> >  _______________________________________________
> >  Taxacom Mailing List
> >  Taxacom at mailman.nhm.ku.edu
> >  http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> >  The Taxacom Archive back to 1992 may be searched at:
> >  http://taxacom.markmail.org
> >
> >  Celebrating 27 years of Taxacom in 2014.
> >



