[Taxacom] Data quality in aggregated datasets

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Fri Apr 19 20:48:27 CDT 2013

Again, Bob: VERIFIABILITY is the key! An error isn't a problem if it is noticed; it is only a problem if it goes unnoticed! So, allow the user to detect the errors, and then find some way to fix them, but the first step is the most important! For the purposes of this step, the data provider can count as a user, but sometimes errors are better detected by people other than those who made them! For one example, we have the phenomenon of people reading what they intended to write, instead of what they actually did write!

From: Robert Mesibov <mesibov at southcom.com.au>
To: Doug Yanega <dyanega at ucr.edu> 
Cc: TAXACOM <taxacom at mailman.nhm.ku.edu> 
Sent: Saturday, 20 April 2013 1:23 PM
Subject: Re: [Taxacom] Data quality in aggregated datasets


I couldn't agree more with your summary. The main point I tried to make in the paper is that an alternative to (interested third party) > (data provider) > aggregator is aggregator > (data provider) > aggregator, and despite much conferencing, workshopping and spending of money over the years, the aggregators are not yet prepared to deal directly with the data quality issue. As you indicate, the problem isn't primarily a technical one; it's a policy and funding and willingness one.

It may be unfair, but I see the situation this way: On the one hand, the aggregators sell themselves to funders and the user world by saying 'Look how much data we have! Just think how useful all that information could be!' They persuade data providers to hand over data with 'We've got a great interface for presenting your data! And you can retain data rights!'

An additional selling point would be 'We collaborate with data providers in checking and upgrading data! These are demonstrably the best data you can find!' But although the aggregators know perfectly well that the data they receive needs cleaning, they apparently feel that they have nothing to gain by helping the data providers do the cleaning, or doing it themselves with data provider help. The aggregators have already made their sales pitch, and they've bought/hired their servers, hired their programmers and staff, and issued their disclaimers: 'We take no responsibility whatsoever for data quality or fitness for use.' Why do more?

Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu

The Taxacom Archive back to 1992 may be searched with either of these methods:

(1) by visiting http://taxacom.markmail.org/

(2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here

Celebrating 26 years of Taxacom in 2013.
