[Taxacom] Chameleons, GBIF, and the Red List

David Campbell pleuronaia at gmail.com
Thu Aug 21 17:04:07 CDT 2014

Many databases are valuable tools and compilations (though many appear to
merely copy data from other databases, which makes errors harder to get rid
of while adding no new information; that impression is, however, biased by
the fact that I am looking up obscure taxa).  Nevertheless, most of them
contain very high levels of errors and omissions.  Thus, it takes a
reasonable level of taxonomic expertise to assess whether the database
you're using is reliable and which parts of the data are more or less
trustworthy (if the database gives enough evidence to tell).

Yet many people and agencies apparently regard the databases as quicker
and cheaper substitutes for asking someone with appropriate taxonomic
expertise.  Unfortunately, databases make it extremely easy to generate
lots of data without really knowing what you are doing or whether any of it
is valid.

This ties into the often-bemoaned (among taxonomists, at any rate) problem
of the lack of employment for taxonomists.  Taxonomic expertise is often
not seen as necessary for either generating or analyzing the data.  In
fact, you can generate and analyze data without such expertise as long as
quality and accuracy are not major concerns.  But if
we want to have bioinformatics and not merely biomisinformatics, then it
will be necessary at some point for someone to actually fund the task of
having qualified people assess the data.

Better data capture would help a good deal.  I have no connection with the
relevant programming and thus do not appreciate the challenges, but it
seems as though optical character recognition needs a lot more work.  Part
of the problem, both for character recognition and data entry, is that
computers, along with far too many of those tasked with entering collection
data, have no clue what the words mean.  If the items are perceived as
arbitrary combinations of letters (e.g., scientific names, localities,
authors, etc. unfamiliar to the person), then misinterpretations or errors
will seem like just another bunch of letters.

On Thu, Aug 21, 2014 at 2:52 PM, James Macklin <james.macklin at gmail.com>
wrote:

> Hi Rod,
> Sorry, a little slow... I also think it is important to stress the data
> quality life cycle here. What we still do not do well is connect the
> expert work done on these specimens or their digital derivatives (or
> observations, I guess), which are not done by the source/owner, back to
> them so the source/owner can clean/update the record and provide it to GBIF
> and/or other aggregators. The literature is one path where there is
> reference to the specimens used but as we know not everything ends up
> published this way. Further, extracting the information from the literature
> can be challenging even today. Lyubomir and Pensoft make this easy
> (thanks!) but we are still a long way from convincing other publishers to
> include the specimen data in a readily accessible form (or even mandating
> its presence as evidence). Another way to get expert knowledge back to the
> source is through annotation. Those of you who know me realize that my
> colleagues and I have spent a fair bit of time studying this problem and
> coming up with solutions (FilteredPush). I would say that in general there
> are now reasonable solutions for achieving distributed annotation at
> various levels of complexity but there is still a challenge/bottleneck in
> pushing these annotations back to the source and into their collection
> management system. The bottleneck is potentially at the source that must
> process the annotations. If we automate (or even semi-automate) the annotation
> process through curation workflows, something my colleagues and I are now
> focusing on, we could potentially flood the "curators" of the
> specimens/data. Then the question becomes how much the owners are committed
> to processing potentially valuable modifications/additions and adding them
> to their database. Certainly data curation and positions to support it are
> in their infancy. The annotations that are not processed by the source
> still have value and can inform the aggregators but have to be dealt with
> in a slightly different manner. So, this returns to the issue of when GBIF
> takes in a record update (or a new record), what metadata follows it to say
> it has been changed (created) based on some form of expertise...
> I think we also need to be careful of the use of the term "expert."  I
> think it is reasonable to assume that a taxonomist is not going to be any
> better at georeferencing a specimen based on the collecting event data
> (assuming this person was not associated with the collecting event) than a
> geographer, historian or even a citizen that happens to live near where the
> event took place. So, in the case of the Chameleon paper, and others like
> it, the issue really relates to taxonomic expertise and thus the name that
> appears associated with the record and not the entire record necessarily.
> Papers like the Chameleon one are quick to judge the end product but do not
> take into consideration what an achievement it is to simply have a GBIF
> resource and the challenges the greater "we" have overcome just to get this
> far! Let's stop highlighting the problem yet again and get to work on
> solving it and making the GBIF resource more valuable to all ;-)
> Best,
> James
> James Macklin, Ph.D.
> Research Scientist
> Botany and Biodiversity Informatics
> Associate Curator of the AAFC National Vascular Plant Collection (DAO)
> Agriculture and Agri-Food Canada
> Ottawa, Ontario, Canada
> On Thu, Aug 21, 2014 at 6:20 AM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
> > Just to follow up on this discussion:
> >
> > Stephen, I think I often come across as grumpy, but your cynicism makes me
> > look like a fanboy, so thank you for that ;) Can we maybe assume that
> > GBIF’s primary goal isn’t to keep bureaucrats happy, that it’s genuinely
> > trying to provide access to basic biodiversity information in one place
> > because that seems like a worthwhile goal - leaving aside whether GBIF is
> > the best way to tackle that goal.
> >
> > Bob, if I understand your argument correctly, it’s that access to mostly
> > unvetted biodiversity data isn’t much use, and in your view that’s mostly
> > what GBIF is serving up. Assuming that it would be nice to have access to
> > good-quality distributional data in one place, what if GBIF provided, say,
> > distributions of species that had been cleaned and had some degree of
> > expert scrutiny? In other words, if a researcher publishes an
> > evidence-based distribution map, what if that were stored on GBIF in a
> > citable form (e.g., with a DOI), so that others could download that
> > distribution and make use of it?
> >
> > I guess this was the thinking behind the now abandoned SDR project (see
> > https://code.google.com/p/gbif-sdr/wiki/PortalIntegration ), and is
> > perhaps where the Map of Life http://mol.org is headed (although at the
> > moment it’s simply showing you a bunch of distributions from different
> > sources).
> >
> > Lyubo, I couldn’t agree more: having links to literature related to a
> > record would be great. Many of our online biodiversity databases are devoid
> > of links to the evidence for a particular assertion, but as more and more
> > literature comes online we can do something to fix that. +1 for extracting
> > from the literature, especially if we can automate this at scale (although
> > that will give Bob nightmares).
> >
> > Regards
> >
> > Rod
> >
> > ---------------------------------------------------------
> > Roderic Page
> > Professor of Taxonomy
> > Institute of Biodiversity, Animal Health and Comparative Medicine
> > College of Medical, Veterinary and Life Sciences
> > Graham Kerr Building
> > University of Glasgow
> > Glasgow G12 8QQ, UK
> >
> > Email:  Roderic.Page at glasgow.ac.uk
> > Tel:  +44 141 330 4778
> > Skype:  rdmpage
> > Facebook:  http://www.facebook.com/rdmpage
> > LinkedIn:  http://uk.linkedin.com/in/rdmpage
> > Twitter:  http://twitter.com/rdmpage
> > Blog:  http://iphylo.blogspot.com
> > ORCID:  http://orcid.org/0000-0002-7101-9767
> > Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> >
> > _______________________________________________
> > Taxacom Mailing List
> > Taxacom at mailman.nhm.ku.edu
> > http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> > The Taxacom Archive back to 1992 may be searched at:
> > http://taxacom.markmail.org
> >
> > Celebrating 27 years of Taxacom in 2014.
> >

Dr. David Campbell
Assistant Professor, Geology
Department of Natural Sciences
Box 7270
Gardner-Webb University
Boiling Springs NC 28017
