[Taxacom] Data quality in aggregated datasets

Alastair Culham a.culham at reading.ac.uk
Sat Apr 20 02:10:42 CDT 2013

A group of us reviewed GBIF data quality many years back - http://www.plosone.org/article/info:doi/10.1371/journal.pone.0001124

Some of those issues have been addressed and some remain.

Aggregated data have problems that result from aggregation itself, but also from the variable-quality source data that are common in large datasets.

Dr Alastair Culham
Centre for Plant Diversity and Systematics
Harborne Building, School of Biological Sciences
University of Reading, Whiteknights, Reading, RG6 6AS

Curator, Reading University Herbarium (RNG)
Associate Editor, Botanical Journal of the Linnean Society
Programme Director, MSc Plant Diversity
i4Life Coordinator

From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] on behalf of Robert Mesibov [mesibov at southcom.com.au]
Sent: 19 April 2013 23:04
Subject: [Taxacom] Data quality in aggregated datasets

There have been occasional grumblings here on Taxacom about data quality in the aggregator world, e.g. in GBIF, but what would happen if you methodically audited a sample of aggregated species occurrence records? What sorts of errors would you find? Would they be rare? Frequent?

I've done an audit of this kind for Australian millipede records in GBIF and the Atlas of Living Australia (ALA) and published the results in ZooKeys: http://www.pensoft.net/journals/zookeys/article/5111/a-specialist

The audit results can't be generalised to all taxa and all parts of the world, but they're pretty disappointing. GBIF and ALA, however, disclaim all responsibility for data problems. If there's an error, it's the fault of the data provider. So how do errors in online databases get discovered and fixed?

In this particular case, an interested third party (me) finds problems and alerts the data provider directly. The data provider fixes the errors and in the fullness of time sends corrected records to the aggregator. (Although I found evidence that erroneous records can persist through an update.)

What about aggregated datasets in general? What mechanisms are there for detecting and fixing errors besides (interested third party) > (data provider) > (aggregator)?

[Long silence.]

Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195
