[Taxacom] Data quality of aggregated datasets

mesibov at southcom.com.au
Thu May 2 23:10:45 CDT 2013


It's a nice visualisation, but if it leads people to think 'Aw, heck, most
of the records are more or less in the right place, what's the diff?',
then they've missed the point (and would be surprised at the sharpness of
millipede range boundaries). In any case, I don't think
15%-off-by-at-least-5-km is good enough.
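
If anyone wants to quantify that sort of displacement in their own
dataset, it only takes a few lines. Here's a minimal sketch in Python,
assuming you can line up the aggregator-published coordinates against
corrected ones; the record field names are made up for illustration:

from math import radians, sin, cos, asin, sqrt

def km_apart(lat1, lon1, lat2, lon2):
    # Great-circle distance in km via the haversine formula.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# 'records' holds one dict per occurrence, pairing the aggregator's
# coordinates with the corrected ones (hypothetical field names).
displaced = [r for r in records
             if km_apart(r['agg_lat'], r['agg_lon'],
                         r['corr_lat'], r['corr_lon']) >= 5]

Nothing clever there, which is rather the point: the check is trivial
once you have corrected coordinates to compare against. Getting the
corrections is the hard part.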

The point of my paper, which Rod has noted on his iPhylo blog, is that
aggregator-published errors need fixing, and there isn't a working
mechanism for doing that. All the displacements except those from provider G
(see my paper) have now been fixed at source (by the data providers),
because I contacted the sources with corrections and queries, and the
sources collaborated with me. That's a fix from an interested outsider;
neither GBIF nor ALA was involved. The correct data are in the sources'
databases and on my Millipedes of Australia website. When the corrections
will get to GBIF and ALA is anybody's guess.

On Taxacom, at least, GBIF and ALA are holding firm to the views that (a)
they aren't responsible for errors they perpetuate by publishing them on
the Web, and (b) error detection and fixing need to be done by Somebody
Else for their benefit as data publishers.

The idea that an aggregator can check and upgrade/correct the data it
publishes by collaborating directly with the source is evidently
incomprehensible to aggregator management. No queries go back to the
sources, no data-cleaning protocols are insisted upon by aggregators
before the sources upload data, and the only error-detection and -fixing
mechanism in sight is vague hand-waving by aggregators about the whole
biodiversity community sharing the responsibility for getting things
right. Also not good enough.
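
To be concrete about what a data-cleaning protocol could look like: even
a trivial automated check at upload time would catch the grossest
placement errors. A sketch, using the real Darwin Core coordinate fields
but an entirely made-up bounding-box test for Australian records:

# Rough bounding box for Australia (incl. Tasmania); illustrative only,
# and obviously no substitute for checking against the stated locality.
LAT_RANGE = (-44.0, -9.0)
LON_RANGE = (112.0, 154.0)

def looks_plausible(rec):
    # Cheap pre-upload sanity check on one occurrence record.
    lat = rec.get('decimalLatitude')
    lon = rec.get('decimalLongitude')
    if lat is None or lon is None:
        return False                 # no coordinates at all
    if (lat, lon) == (0.0, 0.0):
        return False                 # 'null island'
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])

A check like that wouldn't catch a record placed at the wrong locality
within Australia, but it would stop coordinates landing in the sea or on
the wrong continent, and it would cost an aggregator almost nothing to run.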

(Posted from the New South Wales bush, where I've been collecting
millipedes, of course.)




