[Taxacom] These discussions about GBIF

Bob Mesibov mesibov at southcom.com.au
Fri Aug 22 19:28:03 CDT 2014

...here and on Rod Page's iPhylo blog, aren't getting very far, as usual. In fact, the best summary of what I see as the key issues comes from Rod Page in his recent iPhylo comments, to wit:

"Part of the problem, I suspect, is that while aggregators may feel that a global database is by definition a good thing, it's not at all obvious to everyone else."

"We don't make it easy to get data directly into GBIF, which has a cumbersome, hierarchical data submission process, and no mechanism for data citation. We should be thinking about ways to make submission of expert-curated data a no-brainer so that articles that use GBIF data do not end up becoming articles about the quality of that data."

"GBIF doesn't have an easy mechanism for people to directly contribute expert-curated data, nor does it provide a mechanism for making such data citable (and hence give contributors metrics on how the data they've contributed is being used). I think part of the problem with the "moral" argument for data sharing is that it also happens to benefit the aggregator that says "it's your duty to share". The benefits for those doing the sharing are less obvious, so it's in the interests of the aggregators to ensure there are real, tangible benefits to sharing."

So while on the one hand you have people like Stephen Thorpe and the chameleon folk saying GBIF is pretty useless (but see my Taxacom posts about Casual vs Skeptical Users), you have others in the biodiversity informatics community spruiking some new shiny API under development (see iPhylo) that'll cure the malaise and make everyone happy to fix everyone else's data. From the middle ground (e.g. GBIF director Donald Hobern) we get platitudes.

The core failure of the aggregators since the start of aggregation in the 1990s has been an unwillingness to understand why and how people look for biodiversity data. Digital compilations began much earlier as expert-driven projects for expert uses. The ease of using, checking and updating such compilations made them clearly superior to anything on paper. Compilations like these have since gone online as what I've been calling 'bottom-up' resources, and what Hobern calls 'expert-managed silos'. ('Silos' because users can't directly contribute.) Their numbers continue to increase. They're used by the same customers who looked for authoritative paper sources in the 1980s.

The aggregators' mistake was to try to scale this up to include all biodiversity and to offer single-portal Web interfaces for searching, querying and analysing all biodiversity data. Yes, it *could* be done, but for whom and for what purposes? Has any aggregator ever done any marketing research? It's a new product, its development costs millions and you just throw it into the marketplace and hope someone buys it, because *you* think it's a good idea? And you're disappointed that everyone isn't dropping whatever else they're doing to make it bigger and shinier?

The customers still want information about specifics: particular taxa and particular places. They want some assurance that the information is correct and up to date. Their best search strategy is to look on the Web to see what's available. If the choice is between an expert-driven project for expert uses whose builders can be directly contacted, and an aggregator with less data, lower data quality and (after what? 10 years?) no effective feedback to compilers — which is the better choice?

GBIF is just another silo. It's bigger than any other but its size has been achieved at the cost of data quality, and it's still a long, long way from complete and up to date. Tinkering around the edges with new APIs doesn't fix the core problem, any more than spending big bucks on advertising will sell a dud product. Expert-driven projects for expert uses from contactable experts will be around for as long as the Web makes their distribution cheap and easy. To work effectively (no platitudes, please) towards its daydream, GBIF needs to spend many more millions, and will still wind up with mostly unvetted data, just as EoL will wind up with mostly empty pages.

If half the money that GBIF has gobbled up had gone to data providers to employ data curators or to develop data curation programs, GBIF would be a useful source for information not available from expert-managed silos. It didn't, and it won't. Why am I not surprised?

Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Land and Food, University of Tasmania
Home contact:
PO Box 101, Penguin, Tasmania, Australia 7316
(03) 64371195; 61 3 64371195
