[Taxacom] Chameleons, GBIF, and the Red List

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Sun Aug 24 17:29:09 CDT 2014

I agree with Rich that data cleaning would greatly enhance the value of GBIF, but I see a huge "political" roadblock here. The awkward dilemma for GBIF is to facilitate data cleaning while at the same time not publicly admitting "imperfection", for, like it or not, they are already "selling themselves" as a reliable source of data. Taxonomists like Rich won't be suckered in by this, but many 'crats working in local and national government (e.g. biosecurity agencies, etc.) will be so suckered, and quite possibly GBIF funding is driven by their needs/wants.


On Mon, 25/8/14, Richard Pyle <deepreef at bishopmuseum.org> wrote:

 Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
 To: "'Bob Mesibov'" <mesibov at southcom.com.au>, "'Donat Agosti'" <agosti at amnh.org>
 Cc: "'TAXACOM'" <taxacom at mailman.nhm.ku.edu>, "'quentin groom'" <quentin at br.fgov.be>
 Received: Monday, 25 August, 2014, 10:05 AM

 What you describe is EXACTLY what GBIF and others in the biodiversity
 informatics world are hoping to

 My previous post (reply to Stephen) covered these parts of the process:

 1) Data exist in thousands of databases around the world
 2) Aggregators like GBIF make our lives MUCH easier in helping us to
    discover those data
 3) We, the experts of the world, spend hours "cleaning" data after GBIF
    has so helpfully allowed us to locate it.

 What you're talking about is the next

 4) After we, the experts of the world, have spent hours "cleaning" the
    data, how do we allow those efforts to propagate back to the sources,
    so that the NEXT person who encounters those records through GBIF can
    benefit from the toils of us

 There are two basic roadblocks to achieving this final step.

 First, as has been made ABUNDANTLY clear in this thread, the data do NOT
 belong to GBIF.  They belong to the hundreds (thousands?) of institutions
 around the world that manage those thousands of databases.  Ultimately,
 those corrections have to find their way back to the source databases, so
 that GBIF can re-index them with the corrections included.  And believe
 me, GBIF and others have tried to do this EXTENSIVELY -- for many years.
 A lot of the mechanisms are being developed (e.g., FilteredPush), but so
 far there has been slow adoption of those mechanisms by the thousands of
 source databases.  There are many reasons for this, but I suspect the main
 reason is that institutions are barely keeping up with day-to-day
 activities on ever-shrinking budgets, and simply do not have the time or
 IT expertise to implement the corrections to the datasets that they
 manage.  Thus, because the source data remain "unclean", the aggregated
 data in GBIF remain unclean.

 The second major roadblock is the lack of "proper" identifiers (globally
 unique, persistent, actionable) for these occurrence records.  The only
 way that corrections that you make in your downloaded copy of GBIF data
 can propagate is if you can report back on exactly which records need
 cleaning (along with the corrected information).  GBIF does assign its own
 locally unique identifier (integer), which could be used for this
 purpose -- but only for piping the data back to GBIF.  GBIF can relay the
 corrections back to the source databases, but that will only be helpful to
 the rest of us if the source incorporates the fixes.
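
As an illustration of reporting back "exactly which records need cleaning
(along with the corrected information)", here is a minimal sketch that
compares a downloaded copy with a cleaned copy and lists each changed field
against the record's identifier. Everything in it is an assumption made for
the example: the column names (record_id, scientific_name, decimal_latitude),
the made-up rows, and the helper names read_rows and diff_records. It is not
a GBIF or FilteredPush interface.

```python
import csv
import io

# Made-up rows for illustration; the column names are assumptions.
DOWNLOADED = """record_id,scientific_name,decimal_latitude
1001,Genus alpha,-18.9
1002,Genus beta,18.9
"""

CLEANED = """record_id,scientific_name,decimal_latitude
1001,Genus alpha,-18.9
1002,Genus beta,-18.9
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def diff_records(original_rows, cleaned_rows, key="record_id"):
    """List every changed field, keyed by the record's identifier."""
    cleaned_by_id = {row[key]: row for row in cleaned_rows}
    corrections = []
    for row in original_rows:
        fixed = cleaned_by_id.get(row[key])
        if fixed is None:
            continue  # record absent from the cleaned copy; nothing to report
        for field, old_value in row.items():
            if field == key or field not in fixed:
                continue
            if fixed[field] != old_value:
                corrections.append({
                    "record_id": row[key],
                    "field": field,
                    "old": old_value,
                    "new": fixed[field],
                })
    return corrections


corrections = diff_records(read_rows(DOWNLOADED), read_rows(CLEANED))
for c in corrections:
    print(c)
```

The report is only as useful as the identifier in it: with a locally unique
integer it can be matched by the aggregator alone, whereas a globally unique,
persistent identifier would let the source database match it as well.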

 There is actually a third roadblock, which has the potential to become a
 major one, but we haven't bumped into it much yet because we still can't
 get past the first two.  And that is, institutions will not automatically
 assume that every "correction" that is sent to them is actually
 "correct".  Managers of those data will in almost all cases want to review
 the changes to ensure that they are appropriate for updating in the source
 database.  And this process, of course, requires time and resources that
 most institutions simply do not have.

 There may be another solution, however, which is for GBIF to cache
 corrections submitted by people like you and other experts, such that
 these annotations/corrections can be made visible to all users of GBIF
 data; not just the source datasets.  Perhaps this feature already exists.
 Perhaps the politics of implementing such a feature are too daunting to
 overcome.
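
The idea of cached corrections can be sketched as an annotation layer that
sits beside the source record rather than overwriting it. This is only a
sketch under stated assumptions: the Annotation class, its fields, the
accepted_by_source flag, the view_record helper, and the sample values are
all hypothetical, and nothing here describes an existing GBIF feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    record_id: str      # identifier of the record being annotated
    field: str          # which field the correction targets
    proposed: str       # the value the annotator believes is correct
    author: str         # who submitted the correction
    accepted_by_source: bool = False  # set once the data manager has reviewed it


def view_record(record, annotations):
    """Return the source values unchanged, alongside annotations for this record."""
    notes = [a for a in annotations if a.record_id == record["record_id"]]
    return {
        "source": dict(record),
        "pending": [a for a in notes if not a.accepted_by_source],
        "accepted": [a for a in notes if a.accepted_by_source],
    }


# Made-up record and annotations for illustration.
record = {"record_id": "1002", "decimal_latitude": "18.9"}
annotations = [
    Annotation("1002", "decimal_latitude", "-18.9", "expert-a"),
    Annotation("2001", "scientific_name", "Genus gamma", "expert-b"),
]
view = view_record(record, annotations)
print(view["source"], len(view["pending"]), len(view["accepted"]))
```

Keeping the source values untouched means every user can see a proposed
correction immediately, while the decision to accept it stays with the
institution that owns the record.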

 But the bottom line is that we really do need to address this fourth step,
 so that we can more effectively benefit from the work of others, and
 (conversely), so that our own efforts will benefit more than just
 ourselves.

 > -----Original Message-----
 > From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
 > Of Bob Mesibov
 > Sent: Sunday, August 24, 2014 11:28 AM
 > To: Donat Agosti
 > Cc: TAXACOM; quentin groom
 > Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
 >
 > Donat Agosti
 >
 > "I feel, the discussion is too much centered on data that has not the
 > information content needed, like studying a Landsat image at 30 meter
 > resolution and discussing what tree species is shown"
 >
 > Excellent metaphor! For most scientific uses, you need much more data than
 > is provided by any available database. Can you get everything you need
 > online? No. Do existing aggregators like GBIF offer a helpful starting point?
 > For some people and some uses, yes.
 >
 > But now the important question: when you have all the information you
 > need, and clean it and enrich it, do you publish it online in a usable form? I
 > don't know what Quentin Groom's project was about, nor do I know if he
 > published his final
 >
 > In my own case, every one of my 12123 locality records for
 > Millipedes is freely available in CSV format (and in abbreviated form in KML)
 > from the 'Millipedes of Australia' website. This store is larger and more up to
 > date and contains fewer errors than any aggregator store, or even, the
 > combined data providers' stores (because certain providers have been slow
 > to add my edits to particular records, or to upload them to their own or
 > aggregator stores).
 >
 > But if people like me and Quentin publish data freely to the Web and
 > aggregators don't use this improved/extended data, aggregation looks less
 > and less
 >
 > --
 > Dr Robert Mesibov
 > Honorary Research
 > Queen Victoria Museum and Art Gallery, and School of Land and Food,
 > University of Tasmania
 > Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
 > (03) 64371195; 61 3 64371195
 >
 > Taxacom Mailing List
 > Taxacom at mailman.nhm.ku.edu
 > http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
 > The Taxacom Archive back to 1992 may be searched at:
 > http://taxacom.markmail.org
 > Celebrating 27 years of Taxacom in 2014.
