[Taxacom] Data quality in aggregated datasets
stephen_thorpe at yahoo.co.nz
Sun Apr 21 18:42:43 CDT 2013
I think it needs to be pointed out that we need BOTH: (1) (interested third party) > (data provider) > aggregator; AND (2) aggregator > (data provider) > aggregator; and ALSO (interested third party) > aggregator > (data provider) > aggregator! Only with all of these mechanisms do we have a hope of clean data...
From: Robert Mesibov <mesibov at southcom.com.au>
To: Dean Pentcheff <pentcheff at gmail.com>
Cc: "taxacom at mailman.nhm.ku.edu" <taxacom at mailman.nhm.ku.edu>
Sent: Monday, 22 April 2013 11:33 AM
Subject: Re: [Taxacom] Data quality in aggregated datasets
Dean Pentcheff wrote:
"There is much more than silence, but making a working system takes both an initial effort and changes in the way provider systems work. It will take time to take effect."
My question was "What mechanisms are there for detecting and fixing errors besides (interested third party) > (data provider) > aggregator?". Isn't FilteredPush a project to streamline just that mechanism, with cooperating data providers?
The 'silence' I referred to is coming from the aggregators, who seem committed to ignoring this kind of error-fixing mechanism: aggregator > (data provider) > aggregator.
GBIF likes to emphasise that it is only a facilitator. It doesn't own the data it publishes, it merely provides a place for data holders to 'expose' what they have. It is resolutely ignoring the opportunity provided in this for an outside party (GBIF or a GBIF-contracted service) to do some basic record-checking, then collaborate with the data holder to make corrections or add 'Queried' flags. There would be benefits in this for all interested parties: the data holders, GBIF as publisher, and end-users. This isn't happening.
GBIF has been going for how many years? And has finally gotten around to talking about offering advice to participants about data quality: http://community.gbif.org/pg/groups/21292/biodiversity-data-quality-interest-group/
As I suggested in an earlier post and in my ZooKeys paper, the barriers to data-checking at aggregator level aren't technical. Call them 'policy' or 'attitudinal' barriers; they're not unlike Person A being reluctant to tell Person B that they've made a mistake, because A wants to remain friends with B and doesn't want to upset B, and anyway, what's a little mistake?
The analogy fails because the aggregators (A) are multi-million dollar organisations hoping to service a global community, and dealing with multi-million dollar organisations (B) whose 'mission statements' probably talk about a commitment to 'continuous improvement'.
Note: I say all this (and I published the ZooKeys paper) without much hope of seeing reform. As explained in the paper, I've created an alternative for my little basket of the world's species occurrence records, and unlike the aggregators, I write directly to data providers with messages like 'Could you please check [records]? It looks like there are errors in the lat/lon's, which probably should be [X]. Many thanks.'
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
The Taxacom Archive back to 1992 may be searched with either of these methods:
(1) by visiting http://taxacom.markmail.org/
(2) a Google search specified as: site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here
Celebrating 26 years of Taxacom in 2013.