[Taxacom] Chameleons, GBIF, and the Red List

Michael Heads m.j.heads at gmail.com
Sun Aug 24 17:26:56 CDT 2014

Richard wrote:

As Donat has already explained, we're not talking about "GBIF data", we're
talking about data managed by hundreds (thousands?) of institutions in
thousands of databases around the planet.  GBIF is doing us the
(tremendously valuable) service of aggregating those data in one place, to
make it incredibly easy for us to locate it.

But it's not getting any data from many of the most important collections
in the US, UK, etc., or any from the largest country in the world, the one with
the largest expanse of forest, etc. That data (in the collections of Moscow,
etc.) is incorporated in works such as 'Birds of Russia' and from there into
sites such as IUCN, but it's not in GBIF. So GBIF is not really a *global*
biodiversity information facility - in practice it doesn't supply reliable
information on global distributions, even in the best-known groups. IUCN is
much more useful.

On Mon, Aug 25, 2014 at 10:05 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:

> Bob,
> What you describe is EXACTLY what GBIF and others in the biodiversity
> informatics world are hoping to achieve.
> My previous post (reply to Stephen) covered these parts of the process:
> 1) Data exist in thousands of databases around the world
> 2) Aggregators like GBIF make our lives MUCH easier in helping us to
> discover those data
> 3) We, the experts of the world, spend hours "cleaning" data after GBIF
> has so helpfully allowed us to locate it.
> What you're talking about is the next step:
> 4) After we, the experts of the world, have spent hours "cleaning" the
> data, how do we allow those efforts to propagate back to the sources, so
> that the NEXT person who encounters those records through GBIF can benefit
> from the toils of us experts?
> There are two basic roadblocks to achieving this final step.
> First, as has been made ABUNDANTLY clear in this thread, the data do NOT
> belong to GBIF.  They belong to the hundreds (thousands?) of institutions
> around the world that manage those thousands of databases.  Ultimately,
> those corrections have to find their way back to the source databases, so
> that GBIF can re-index them with the corrections included.  And believe me,
> GBIF and others have tried to do this EXTENSIVELY -- for many years.  A lot
> of the mechanisms are being developed (e.g., FilteredPush), but so far
> there has been slow adoption of those mechanisms by the thousands of source
> databases.  There are many reasons for this, but I suspect the main reason
> is that institutions are barely keeping up with day-to-day activities on
> ever-shrinking budgets, and simply do not have the time or IT expertise to
> implement the corrections to the datasets that they manage. Thus, because
> the source data remain "unclean", the aggregated data in GBIF remain
> unclean.
> The second major roadblock is the lack of "proper" identifiers (globally
> unique, persistent, actionable) for these occurrence records.  The only way
> that corrections you make in your downloaded copy of GBIF data can propagate
> back to the sources is if you can report exactly which records need cleaning (along with the
> corrected information).  GBIF does assign its own locally unique identifier
> (integer), which could be used for this purpose -- but only for piping the
> data back to GBIF.  GBIF can relay the corrections back to the source
> databases, but that will only be helpful to the rest of us if the source
> incorporates the fixes.
> There is actually a third roadblock, which has the potential to become a
> major one, but we haven't bumped into it much yet because we still
> can't get past the first two roadblocks. And that is, institutions will not
> automatically assume that every "correction" that is sent to them is
> actually "correct".  Managers of those data will in almost all cases want
> to review the changes to ensure that they are appropriate for updating in
> the source database. And this process, of course, requires time and
> resources that most institutions simply do not have.
> There may be another solution, however, which is for GBIF to cache
> corrections submitted by people like you and other experts, such that these
> annotations/corrections can be made visible to all users of GBIF data; not
> just the source datasets.  Perhaps this feature already exists.  Perhaps
> the politics of implementing such a feature are too daunting to overcome.
> But the bottom line is that we really do need to address this fourth step,
> so that we can more effectively benefit from the work of others, and
> (conversely), so that our own efforts will benefit more than just ourselves.
> Aloha,
> Rich
> > -----Original Message-----
> > From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
> > Of Bob Mesibov
> > Sent: Sunday, August 24, 2014 11:28 AM
> > To: Donat Agosti
> > Cc: TAXACOM; quentin groom
> > Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
> >
> > Donat Agosti wrote:
> >
> > "I feel, the discussion is too much centered on data that has not the
> > information content needed, like studying a Landsat image at 30 meter
> > resolution and discussing what tree species is shown"
> >
> > Excellent metaphor! For most scientific uses, you need much more data than
> > is provided by any available database. Can you get everything you need
> > online? No. Do existing aggregators like GBIF offer a helpful starting point?
> > For some people and some uses, yes.
> >
> > But now the important question: when you have all the information you
> > need, and clean it and enrich it, do you publish it online in a usable form?
> > I don't know what Quentin Groom's project was about, nor do I know if he
> > published his final data.
> >
> > In my own case, every one of my 12,123 locality records for Australian
> > millipedes is freely available in CSV format (and in abbreviated form in KML)
> > from the 'Millipedes of Australia' website. This store is larger and more
> > up to date and contains fewer errors than any aggregator store, or even the
> > combined data providers' stores (because certain providers have been slow
> > to add my edits to particular records, or to upload them to their own or
> > aggregator stores).
> >
> > But if people like me and Quentin publish data freely to the Web and
> > aggregators don't use this improved/extended data, aggregation looks less
> > and less useful.
> > --
> > Dr Robert Mesibov
> > Honorary Research Associate
> > Queen Victoria Museum and Art Gallery, and School of Land and Food,
> > University of Tasmania
> > Home contact:
> > PO Box 101, Penguin, Tasmania, Australia 7316
> > (03) 64371195; 61 3 64371195
> > _______________________________________________
> > Taxacom Mailing List
> > Taxacom at mailman.nhm.ku.edu
> > http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> > The Taxacom Archive back to 1992 may be searched at:
> > http://taxacom.markmail.org
> >
> > Celebrating 27 years of Taxacom in 2014.

Dunedin, New Zealand.

My recent books:

*Molecular panbiogeography of the tropics.* 2012. University of California
Press, Berkeley. www.ucpress.edu/book.php?isbn=9780520271968

*Biogeography of Australasia:  A molecular analysis*. 2014. Cambridge
University Press, Cambridge. www.cambridge.org/9781107041028
