[Taxacom] Chameleons, GBIF, and the Red List

Donat Agosti agosti at amnh.org
Sun Aug 24 06:52:00 CDT 2014

What is useful? What is an error?

I have seen neither a definition of what GBIF data ought to be nor a statement of the use cases for which it has been collected. Only with those might it be possible to make bold statements about whether the data is wrong or useful. Having this defined would also allow us to judge whether a research project is properly designed, that is, whether it makes assumptions about the data to be used that might be wrong. Thus, the problem is not on GBIF's side but rather on the researchers' side.

But one might also ask whether the notion of "GBIF-data" is correct. GBIF is not a primary data provider, and often neither are the underlying institutions, in the sense that the data they provide have been collected with the same protocol, the same purpose, and the same level of detail. There will thus never be "GBIF-data", but always a collection of data of different origins, purposes, etc.

GBIF provides a fantastic overview, the only one that exists, of what data is available somewhere.

It also offers a unique chance to criticize data that exists, because each data point (observation record) is put into a context and can thus be scrutinized.

But probably most importantly, studies such as the chameleon conservation study are very helpful for showing what data is needed. Obviously, by turning to GBIF data, the authors had no better data at hand, and they are nevertheless very critical. This highlights what is probably the really important point: what data is needed for successful conservation work. Such studies should thus be used to guide future data collection and lead to detailed metadata.

GBIF at the same time could categorize data based on their collecting protocol and other characteristics of the data. Since this information is in most cases not available, we as a community should rather look to the future and ask how we can collect such data. The 500,000,000 records might be a hint, that is, the bird observation data from ongoing surveys. There is another huge potential pool, namely specimen images put on Flickr or iNature with adequate metadata (GPS record, time stamp, scientific or vernacular name) and the possibility to re-identify.

I feel the discussion is too much centered on data that lacks the information content needed, like studying a Landsat image at 30 meter resolution and discussing what tree species is shown. We know what we are now used to seeing when looking at high-resolution remote sensing images. We had better make sure our observation records are also high resolution.


-----Original Message-----
From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Bob Mesibov
Sent: Sunday, August 24, 2014 11:22 AM
Cc: quentin groom
Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List

For some reason, Quentin Groom's posts to TAXACOM don't seem to be showing up. Here is his latest post:

Date: Sun, 24 Aug 2014 09:19:59 +0100
From: "quentin groom" <quentin at br.fgov.be>
To: Bob Mesibov <mesibov at southcom.com.au>
Cc: Roderic Page <Roderic.Page at glasgow.ac.uk>,TAXACOM <taxacom at mailman.nhm.ku.edu>
Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List

Hi Bob

I don't know anything about chameleons, but for a recent vascular plant project I was working on, about 60% of the data from GBIF was useful. I have not looked into the case of chameleons, but I'm sure someone could explain why these data are so bad, and I expect many of the problems will not be general difficulties with museum data, but specific problems of chameleons.
> "GBIF is just raw data."
> Misleading. GBIF contains abbreviated/truncated data. In my own 
> published GBIF audit, 45% of the records lacked the locality text from 
> the provider's database. Other fields are missing from other 
> providers' databases. GBIF is also seriously incomplete, because not 
> all potential providers have uploaded data, existing providers haven't 
> uploaded all their data, and updating is slow.

Yes, this shows a lack of commitment by the providers to give good-quality data. Nevertheless, what they give us is still data. You should really be complaining to your provider to get your data updated on GBIF.
> "If I use data from GBIF I am sceptical and some data is more reliable 
> than others."
> Good, you're a Skeptical User. In that sentence you could have said 
> 'from [any source]'.
> "What is the alternative? Should I go to all the different providers 
> individually and collect the same data individually. It would be an 
> impossible task, even to discover who had what."
> It has never been easy to discover who had what, and it still isn't, 
> because GBIF is incomplete. I can't speak for every taxonomist and 
> conservation biologist, but I do my taxonomy without attempting to 
> track every possible specimen in every possible repository around the 
> world. I find the largest relevant collections and the key specimens 
> from the literature, and start from there. 'Start' means contacting 
> the collections, not relying on what GBIF tells me is registered.

In the project I mentioned above I did exactly what you are suggesting. It took me a few minutes to get the data from GBIF providers and then a few days to clean that up. It then took me two years to get the other 50% of the data from various herbaria and the literature, which all needed digitizing and georeferencing. As I'm neither a polyglot nor well travelled, I'm sure I did a worse job of this than a local could have done. What you are suggesting would be excessively expensive for all but a few rare species.
> If the goal is gathering up specimens or records, GBIF offers one of 
> several possible approaches, *depending on your subject matter*. The 
> alternatives for many taxa are expert-compiled and -vetted online 
> resources. Evidently your subject matter isn't covered by those 
> alternatives. As I said earlier, those alternatives offer better data, 
> more of it, contactable compilers and (usually) better updating.

Such things just don't exist except for a tiny fraction of rare species.

> "Progress of GBIF is slow, because of the massive political 
> challenges, the tiny budget and the inaction of providers, but we 
> would be significantly impoverished without it."
> We're back to market research. Who is the "we" in that sentence, and 
> what does "significantly" mean for the purposes that "we" use it?
> You personally find GBIF very useful. I don't, and neither do the 
> Europeans I mentioned in my first post, nor the chameleon investigators.
> We would be significantly *enriched* if either (a) there was more 
> support for expert-managed, taxon- or area-specific online resources,  
> or (b) there was more support for better data curation at provider 
> level. The marginal benefit of feeding either (a) or (b) would be 
> greater than feeding the same resources into GBIF. As for GBIF's 'tiny 
> budget', it's E4+ million per year, of which ca 1/3 goes to

If that 4 million were dispersed to all the museums and herbaria, it wouldn't be enough to curate more than a handful of specimens. However, using GBIF you can quickly spot errors in your data and correct them, just as you have done.
I expect the large costs for management come from the difficulties of getting countries to open up their data. I'm sure they would prefer to spend their time creating useful tools for science rather than lobbying.
All the best
Dr. Quentin Groom
(Botany and Information Technology)
Botanic Garden, Meise
Domein van Bouchout
B-1860 Meise
ORCID: 0000-0002-0596-5376
Landline:   +32 (0) 226 009 20 ext. 364
FAX:      +32 (0) 226 009 45
E-mail:     quentin.groom at br.fgov.be
Skype name: qgroom
Website:    www.botanicgarden.be
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and School of Land and Food, University of Tasmania
Home contact:
PO Box 101, Penguin, Tasmania, Australia 7316
(03) 64371195; 61 3 64371195
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
The Taxacom Archive back to 1992 may be searched at: http://taxacom.markmail.org

Celebrating 27 years of Taxacom in 2014.
