[Taxacom] Chameleons, GBIF, and the Red List

Richard Pyle deepreef at bishopmuseum.org
Sun Aug 24 17:05:06 CDT 2014


What you describe is EXACTLY what GBIF and others in the biodiversity informatics world are hoping to achieve.

My previous post (reply to Stephen) covered these parts of the process:
1) Data exist in thousands of databases around the world
2) Aggregators like GBIF make our lives MUCH easier in helping us to discover those data
3) We, the experts of the world, spend hours "cleaning" data after GBIF has so helpfully allowed us to locate it.

What you're talking about is the next step:

4) After we, the experts of the world, have spent hours "cleaning" the data, how do we allow those efforts to propagate back to the sources, so that the NEXT person who encounters those records through GBIF can benefit from the toils of us experts?

There are two basic roadblocks to achieving this final step.

First, as has been made ABUNDANTLY clear in this thread, the data do NOT belong to GBIF.  They belong to the hundreds (thousands?) of institutions around the world that manage those thousands of databases.  Ultimately, those corrections have to find their way back to the source databases, so that GBIF can re-index them with the corrections included.  And believe me, GBIF and others have tried to do this EXTENSIVELY -- for many years.  A lot of the mechanisms are being developed (e.g., FilteredPush), but so far there has been slow adoption of those mechanisms by the thousands of source databases.  There are many reasons for this, but I suspect the main reason is that institutions are barely keeping up with day-to-day activities on ever-shrinking budgets, and simply do not have the time or IT expertise to implement the corrections to the datasets that they manage. Thus, because the source data remain "unclean", the aggregated data in GBIF remain unclean.

The second major roadblock is the lack of "proper" identifiers (globally unique, persistent, actionable) for these occurrence records.  The only way that corrections that you make in your downloaded copy of GBIF data can propagate back is if you can report exactly which records need cleaning (along with the corrected information).  GBIF does assign its own locally unique identifier (integer), which could be used for this purpose -- but only for piping the data back to GBIF.  GBIF can relay the corrections back to the source databases, but that will only be helpful to the rest of us if the source incorporates the fixes.
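
To make "reporting back exactly which records need cleaning" a bit more concrete, here is a minimal Python sketch of how a correction report might be built by comparing a downloaded copy against the cleaned copy. It is only an illustration under assumptions: the function name, the report layout, and the sample records are all invented for this example, and the records are keyed on a `gbifID` column standing in for GBIF's locally unique integer identifier. It does not represent any existing GBIF or FilteredPush mechanism.

```python
def build_correction_report(original_rows, cleaned_rows, key="gbifID"):
    """Compare two copies of the same records and list every changed field.

    Hypothetical helper: each report entry names the record identifier,
    the field, the value as downloaded, and the corrected value.
    """
    originals = {row[key]: row for row in original_rows}
    report = []
    for cleaned in cleaned_rows:
        before = originals.get(cleaned[key])
        if before is None:
            continue  # not part of the original download; nothing to compare
        for field, new_value in cleaned.items():
            if field == key:
                continue
            old_value = before.get(field, "")
            if old_value != new_value:
                report.append({
                    "record_id": cleaned[key],
                    "field": field,
                    "old_value": old_value,
                    "new_value": new_value,
                })
    return report


# Invented sample data: the second record has a latitude with the wrong sign.
downloaded = [
    {"gbifID": "1001", "scientificName": "Chamaeleo calyptratus", "decimalLatitude": "15.35"},
    {"gbifID": "1002", "scientificName": "Chamaeleo calyptratus", "decimalLatitude": "-15.35"},
]
cleaned = [
    {"gbifID": "1001", "scientificName": "Chamaeleo calyptratus", "decimalLatitude": "15.35"},
    {"gbifID": "1002", "scientificName": "Chamaeleo calyptratus", "decimalLatitude": "15.35"},
]

report = build_correction_report(downloaded, cleaned)
for entry in report:
    print(entry)
```

The point of the sketch is that the whole report hinges on the identifier: without a key that both the expert's copy and the source database recognize, there is no reliable way to say which record a correction applies to.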

There is actually a third roadblock, which has the potential to become a major one, but we haven't bumped into it much yet because we still can't get past the first two. And that is, institutions will not automatically assume that every "correction" that is sent to them is actually "correct".  Managers of those data will in almost all cases want to review the changes to ensure that they are appropriate for updating in the source database. And this process, of course, requires time and resources that most institutions simply do not have.

There may be another solution, however, which is for GBIF to cache corrections submitted by people like you and other experts, such that these annotations/corrections can be made visible to all users of GBIF data; not just the source datasets.  Perhaps this feature already exists.  Perhaps the politics of implementing such a feature are too daunting to overcome.
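
As a rough sketch of what such a cache might look like in practice, the Python below keeps submitted corrections as separate annotations and attaches them to a record when it is served, leaving the source values untouched. Everything here is hypothetical: the `Annotation` class, the `with_annotations` function, the `gbifID` key, and the sample record are assumptions made for illustration, not a description of any feature GBIF actually has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A suggested correction, stored apart from the source record."""
    record_id: str
    field: str
    suggested_value: str
    annotator: str


def with_annotations(record, annotations, key="gbifID"):
    """Return a copy of the record with matching annotations attached.

    The provider's values are not overwritten; users see the original
    data and the suggestions side by side and can decide for themselves.
    """
    view = dict(record)
    view["annotations"] = [a for a in annotations if a.record_id == record[key]]
    return view


# Invented sample data.
record = {"gbifID": "1002", "decimalLatitude": "-15.35"}
cache = [
    Annotation("1002", "decimalLatitude", "15.35", "expert-a"),
    Annotation("2001", "scientificName", "Chamaeleo calyptratus", "expert-b"),
]

view = with_annotations(record, cache)
print(view["decimalLatitude"], len(view["annotations"]))
```

Keeping the suggestion separate from the source value matters because of the third roadblock above: the institution still gets to review the change before anything in its own database is altered.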

But the bottom line is that we really do need to address this fourth step, so that we can more effectively benefit from the work of others, and (conversely), so that our own efforts will benefit more than just ourselves.


> -----Original Message-----
> From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
> Of Bob Mesibov
> Sent: Sunday, August 24, 2014 11:28 AM
> To: Donat Agosti
> Cc: TAXACOM; quentin groom
> Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
> Donat Agosti wrote:
> "I feel, the discussion is too much centered on data that has not the
> information content needed, like studying a Landsat image at 30 meter
> resolution and discussing what tree species is shown"
> Excellent metaphor! For most scientific uses, you need much more data than
> is provided by any available database. Can you get everything you need
> online? No. Do existing aggregators like GBIF offer a helpful starting point?
> For some people and some uses, yes.
> But now the important question: when you have all the information you
> need, and clean it and enrich it, do you publish it online in a usable form? I
> don't know what Quentin Groom's project was about, nor do I know if he
> published his final data.
> In my own case, every one of my 12123 locality records for Australian
> Millipedes is freely available in CSV format (and in abbreviated form in KML)
> from the 'Millipedes of Australia' website. This store is larger and more up to
> date and contains fewer errors than any aggregator store, or even, the
> combined data providers' stores (because certain providers have been slow
> to add my edits to particular records, or to upload them to their own or
> aggregator stores).
> But if people like me and Quentin publish data freely to the Web and
> aggregators don't use this improved/extended data, aggregation looks less
> and less useful.
> --
> Dr Robert Mesibov
> Honorary Research Associate
> Queen Victoria Museum and Art Gallery, and School of Land and Food,
> University of Tasmania
> Home contact:
> PO Box 101, Penguin, Tasmania, Australia 7316
> (03) 64371195; 61 3 64371195
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> The Taxacom Archive back to 1992 may be searched at:
> http://taxacom.markmail.org
> Celebrating 27 years of Taxacom in 2014.
