[Taxacom] Chameleons, GBIF, and the Red List

Roderic Page Roderic.Page at glasgow.ac.uk
Sun Aug 24 18:38:17 CDT 2014


To pick up one thing Rich mentioned:


There may be another solution, however, which is for GBIF to cache corrections submitted by people like you and other experts, such that these annotations/corrections can be made visible to all users of GBIF data; not just the source datasets.  Perhaps this feature already exists.  Perhaps the politics of implementing such a feature are too daunting to overcome.


I think the idea of GBIF “caching corrections” is pretty much where we are heading, even if accidentally. Elsewhere I’ve discussed (and rejected) the idea of annotations as sticky notes on databases http://iphylo.blogspot.co.uk/2014/04/more-on-annotating-biodiversity-data.html, and the “filtered push” model assumes, as Rich points out, that there’s somebody on the other end with the time to make use of all these annotations (see http://iphylo.blogspot.co.uk/2014/03/rethinking-annotating-biodiversity-data.html ).

Given that GBIF has duplicate specimen records already (e.g., occurrences that come directly from the original museum, and also via other projects such as FishBase, or from DNA barcoding projects), we can think of these different records as being “annotations” (e.g., this occurrence is what the museum says about this specimen, this other occurrence is what BOLD or GenBank say). So, we could cluster these and end up knowing that this museum specimen is the voucher for these DNA sequences.
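
As a rough illustration of that clustering, here is a minimal sketch. It assumes records can be matched on the Darwin Core terms institutionCode and catalogNumber after simple normalisation; the records, the "source" and "sequenceId" fields, and the matching rule are all made up for the example, and real matching would need to be far more forgiving.

```python
from collections import defaultdict

def specimen_key(record):
    """Build a crude matching key from institution code and catalogue number."""
    inst = record.get("institutionCode", "").strip().upper()
    cat = record.get("catalogNumber", "").strip().upper().replace(" ", "")
    return (inst, cat)

def cluster_occurrences(records):
    """Group records that appear to describe the same physical specimen."""
    clusters = defaultdict(list)
    for record in records:
        clusters[specimen_key(record)].append(record)
    return dict(clusters)

# Hypothetical records: two museum specimens and one barcode-derived record.
records = [
    {"source": "museum", "institutionCode": "XYZ", "catalogNumber": "12345",
     "scientificName": "Aus bus"},
    {"source": "barcode", "institutionCode": "xyz", "catalogNumber": " 12345",
     "sequenceId": "SEQ-1"},
    {"source": "museum", "institutionCode": "XYZ", "catalogNumber": "99999",
     "scientificName": "Cus dus"},
]

clusters = cluster_occurrences(records)
for key, members in clusters.items():
    print(key, [m["source"] for m in members])
```

Any cluster holding both a museum record and a sequence-derived record is then a candidate "this specimen is the voucher for these sequences" link.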

If people download GBIF data, clean it, and then send it back, we could simply cluster the new data with the old, and people could then see what has happened to the data once a user has scrutinised it.
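
Once an original record and its cleaned counterpart sit in the same cluster, seeing "what has happened to the data" is just a field-by-field comparison. A minimal sketch, assuming both versions are flat dictionaries; the field names follow Darwin Core, but the values and the "source" field are invented for the example:

```python
def field_changes(original, cleaned, ignore=("source",)):
    """Return {field: (old, new)} for every field whose value differs."""
    changes = {}
    for field in sorted(set(original) | set(cleaned)):
        if field in ignore:
            continue
        old, new = original.get(field), cleaned.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes

# Hypothetical pair: the user has fixed the sign of the latitude.
original = {"source": "provider", "catalogNumber": "12345",
            "scientificName": "Aus bus", "decimalLatitude": "41.1"}
cleaned = {"source": "user", "catalogNumber": "12345",
           "scientificName": "Aus bus", "decimalLatitude": "-41.1"}

changes = field_changes(original, cleaned)
print(changes)
```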

Regards

Rod
---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page at glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ


On 24 Aug 2014, at 23:05, Richard Pyle <deepreef at bishopmuseum.org> wrote:

Bob,

What you describe is EXACTLY what GBIF and others in the biodiversity informatics world are hoping to achieve.

My previous post (reply to Stephen) covered these parts of the process:
1) Data exist in thousands of databases around the world
2) Aggregators like GBIF make our lives MUCH easier in helping us to discover those data
3) We, the experts of the world, spend hours "cleaning" data after GBIF has so helpfully allowed us to locate it.

What you're talking about is the next step:

4) After we, the experts of the world, have spent hours "cleaning" the data, how do we allow those efforts to propagate back to the sources, so that the NEXT person who encounters those records through GBIF can benefit from the toils of us experts?

There are two basic roadblocks to achieving this final step.

First, as has been made ABUNDANTLY clear in this thread, the data do NOT belong to GBIF.  They belong to the hundreds (thousands?) of institutions around the world that manage those thousands of databases.  Ultimately, those corrections have to find their way back to the source databases, so that GBIF can re-index them with the corrections included.  And believe me, GBIF and others have tried to do this EXTENSIVELY -- for many years.  A lot of the mechanisms are being developed (e.g., FilteredPush), but so far there has been slow adoption of those mechanisms by the thousands of source databases.  There are many reasons for this, but I suspect the main reason is that institutions are barely keeping up day-to-day activities with ever-shrinking budgets, and simply do not have the time or IT expertise to implement the corrections to the datasets that they manage. Thus, because the source data remain "unclean", the aggregated data in GBIF remain unclean.

The second major roadblock is the lack of "proper" identifiers (globally unique, persistent, actionable) for these occurrence records.  The only way that corrections you make in your downloaded copy of GBIF data can be propagated is if you can report back on exactly which records need cleaning (along with the corrected information).  GBIF does assign its own locally unique identifier (integer), which could be used for this purpose -- but only for piping the data back to GBIF.  GBIF can relay the corrections back to the source databases, but that will only be helpful to the rest of us if the source incorporates the fixes.

There is actually a third roadblock, which has the potential to become a major one, but we haven't bumped into it much yet because we still can't get past the first two. And that is, institutions will not automatically assume that every "correction" that is sent to them is actually "correct".  Managers of those data will in almost all cases want to review the changes to ensure that they are appropriate for updating in the source database. And this process, of course, requires time and resources that most institutions simply do not have.

There may be another solution, however, which is for GBIF to cache corrections submitted by people like you and other experts, such that these annotations/corrections can be made visible to all users of GBIF data; not just the source datasets.  Perhaps this feature already exists.  Perhaps the politics of implementing such a feature are too daunting to overcome.
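
To make the idea concrete, here is a minimal sketch of such a cache. It assumes corrections are keyed by the integer identifier that GBIF assigns to each record, and that they are displayed alongside the provider's values rather than overwriting them. The "gbif_id" key, the record, and the submitted correction are hypothetical; this is not a description of any existing GBIF feature.

```python
def submit_correction(cache, gbif_id, field, value, submitted_by):
    """Store a proposed correction against a record's GBIF identifier."""
    cache.setdefault(gbif_id, []).append(
        {"field": field, "value": value, "submitted_by": submitted_by}
    )

def record_with_annotations(record, cache):
    """Return a copy of the record plus any cached corrections.

    The provider's values are left untouched; corrections are only attached.
    """
    view = dict(record)
    view["proposed_corrections"] = list(cache.get(record["gbif_id"], []))
    return view

# Hypothetical record and correction.
cache = {}
record = {"gbif_id": 1001, "scientificName": "Aus bus", "country": "Austria"}
submit_correction(cache, 1001, "country", "Australia", "expert@example.org")

view = record_with_annotations(record, cache)
print(view["country"], view["proposed_corrections"])
```

Keeping the provider's value and the proposed value side by side avoids deciding, at the aggregator, whether every "correction" is actually correct.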

But the bottom line is that we really do need to address this fourth step, so that we can more effectively benefit from the work of others, and (conversely), so that our own efforts will benefit more than just ourselves.

Aloha,
Rich


-----Original Message-----
From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
Of Bob Mesibov
Sent: Sunday, August 24, 2014 11:28 AM
To: Donat Agosti
Cc: TAXACOM; quentin groom
Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List

Donat Agosti wrote:

"I feel, the discussion is too much centered on data that has not the
information content needed, like studying a Landsat image at 30 meter
resolution and discussing what tree species is shown"

Excellent metaphor! For most scientific uses, you need much more data than
is provided by any available database. Can you get everything you need
online? No. Do existing aggregators like GBIF offer a helpful starting point?
For some people and some uses, yes.

But now the important question: when you have all the information you
need, and clean it and enrich it, do you publish it online in a usable form? I
don't know what Quentin Groom's project was about, nor do I know if he
published his final data.

In my own case, every one of my 12123 locality records for Australian
Millipedes is freely available in CSV format (and in abbreviated form in KML)
from the 'Millipedes of Australia' website. This store is larger and more up to
date and contains fewer errors than any aggregator store, or even the
combined data providers' stores (because certain providers have been slow
to add my edits to particular records, or to upload them to their own or
aggregator stores).

But if people like me and Quentin publish data freely to the Web and
aggregators don't use this improved/extended data, aggregation looks less
and less useful.
--
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and School of Land and Food,
University of Tasmania Home contact:
PO Box 101, Penguin, Tasmania, Australia 7316
(03) 64371195; 61 3 64371195
_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
The Taxacom Archive back to 1992 may be searched at:
http://taxacom.markmail.org

Celebrating 27 years of Taxacom in 2014.
