[Taxacom] Data quality of aggregated datasets

Poly, William WPoly at calacademy.org
Fri May 3 07:28:31 CDT 2013

And as error-ridden data are disseminated, used in analyses, and published, it will become more difficult to
correct and purge both the errors and the conclusions based on them.

From: taxacom-bounces at mailman.nhm.ku.edu On Behalf Of mesibov at southcom.com.au
Sent: Friday, May 03, 2013 12:10 AM
Subject: Re: [Taxacom] Data quality of aggregated datasets

It's a nice visualisation, but if it leads people to think 'Aw, heck, most
of the records are more or less in the right place, what's the diff?',
then they've missed the point (and would be surprised at the sharpness of
millipede range boundaries). But I don't think 15%-off-by-at-least-5-km is
good enough.

The point of my paper, which Rod has noted on his iPhylo blog, is that
aggregator-published errors need fixing, and there isn't a working
mechanism to do that. All the displacements except those from provider G
(see my paper) are fixed now at source (data provider) because I contacted
the sources with corrections and queries, and the sources collaborated
with me. That's a fix from an interested outsider, and neither GBIF nor
ALA was involved. The correct data are in the sources' databases and on
my Millipedes of Australia website. When they get to GBIF and ALA is
anybody's guess.

On Taxacom, at least, GBIF and ALA are holding firm to the views that (a)
they aren't responsible for errors they perpetuate by publishing on the
Web, and (b) error detection and fixing need to be done by Somebody Else
for their benefit as data publishers.

The idea that an aggregator can check and upgrade/correct the data it
publishes by collaborating directly with the source is evidently
incomprehensible to aggregator management. No queries go back to the
sources, no data-cleaning protocols are insisted upon by aggregators
before the sources upload data, and the only error-detection and -fixing
mechanism in sight is vague hand-waving by aggregators about the whole
biodiversity community sharing the responsibility for getting things
right. Also not good enough.

(Posted from the New South Wales bush, where I've been collecting
millipedes, of course.)

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu

The Taxacom Archive back to 1992 may be searched with either of these methods:

(1) by visiting http://taxacom.markmail.org

(2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here

Celebrating 26 years of Taxacom in 2013.
