[Taxacom] Data quality in aggregated datasets

Robert Mesibov mesibov at southcom.com.au
Fri Apr 19 20:23:38 CDT 2013


I couldn't agree more with your summary. The main point I tried to make in the paper is that an alternative to (interested third party) > (data provider) > aggregator is aggregator > (data provider) > aggregator, and despite much conferencing, workshopping and spending of money over the years, the aggregators are not yet prepared to deal directly with the data quality issue. As you indicate, the problem isn't primarily a technical one; it's one of policy, funding and willingness.

It may be unfair, but I see the situation this way: On the one hand, the aggregators sell themselves to funders and the user world by saying 'Look how much data we have! Just think how useful all that information could be!' They persuade data providers to hand over data with 'We've got a great interface for presenting your data! And you can retain data rights!'

An additional selling point would be 'We collaborate with data providers in checking and upgrading data! These are demonstrably the best data you can find!' But although the aggregators know perfectly well that the data they receive need cleaning, they apparently feel that they have nothing to gain by helping the data providers do the cleaning, or by doing it themselves with data provider help. The aggregators have already made their sales pitch, and they've bought/hired their servers, hired their programmers and staff, and issued their disclaimers: 'We take no responsibility whatsoever for data quality or fitness for use.' Why do more?

Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195
