[Taxacom] Data quality of aggregated datasets

David Campbell pleuronaia at gmail.com
Fri May 3 12:38:57 CDT 2013


If an error of several kilometers in a range doesn't seem like much (and such
distances are quite significant for snails as well as for millipedes), how
about the errors of several hundred million years that I noticed in a
published paper that took the "what's the oldest date in the Paleobiology
Database listed for this higher taxon" approach to calibrating its molecular
clock?  These problems resulted from at least four causes:
1. A homonym.
2. Data about 90 years old, with an unduly broad interpretation of an extant
   higher taxon.
3. Not knowing that early fossils related to one of the extant higher taxa of
   interest are assigned to a different, paraphyletic higher taxon.
4. Generally poor data for the class in question in that database; no one has
   taken on that group.

Without evidence as to the quality of the data, there's no reason to trust
the biodiversity databases or the results based on them.  Ironically, by
failing to support the work of expert taxonomists to check the data, the
system has produced a situation where only expert taxonomists can make much
use of the databases, because they have the knowledge to judge what's
reasonable and what's not.



On Fri, May 3, 2013 at 8:28 AM, Poly, William <WPoly at calacademy.org> wrote:

>
> And as error-ridden data are disseminated, used in analyses, and
> published, it will be more difficult to
> correct and purge the errors and conclusions based on them.
>
>
>
> ________________________________________
> From: taxacom-bounces at mailman.nhm.ku.edu [
> taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of mesibov at southcom.com.au [
> mesibov at southcom.com.au]
> Sent: Friday, May 03, 2013 12:10 AM
> To: TAXACOM
> Subject: Re: [Taxacom] Data quality of aggregated datasets
>
> It's a nice visualisation, but if it leads people to think 'Aw, heck, most
> of the records are more or less in the right place, what's the diff?',
> then they've missed the point (and would be surprised at the sharpness of
> millipede range boundaries). But I don't think 15%-off-by-at-least-5-km is
> good enough.
>
> The point of my paper, which Rod has noted on his iPhylo blog, is that
> aggregator-published errors need fixing, and there isn't a working
> mechanism to do that. All the displacements except those from provider G
> (see my paper) are fixed now at source (data provider) because I contacted
> the sources with corrections and queries, and the sources collaborated
> with me. That's a fix from an interested outsider, and neither GBIF nor
> ALA were involved. The correct data are in the sources' databases and on
> my Millipedes of Australia website. When they get to GBIF and ALA is
> anybody's guess.
>
> On Taxacom, at least, GBIF and ALA are holding firm to the views that (a)
> they aren't responsible for errors they perpetuate by publishing on the
> Web, and (b) error detection and fixing needs to be done by Somebody Else
> for their benefit as data publishers.
>
> The idea that an aggregator can check and upgrade/correct the data it
> publishes by collaborating directly with the source is evidently
> incomprehensible to aggregator management. No queries go back to the
> sources, no data-cleaning protocols are insisted upon by aggregators
> before the sources upload data, and the only error-detection and -fixing
> mechanism in sight is vague hand-waving by aggregators about the whole
> biodiversity community sharing the responsibility for getting things
> right. Also not good enough.
>
> (Posted from the New South Wales bush, where I've been collecting
> millipedes, of course.)
>
>
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The Taxacom Archive back to 1992 may be searched with either of these
> methods:
>
> (1) by visiting http://taxacom.markmail.org
>
> (2) a Google search specified as:
> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>
> Celebrating 26 years of Taxacom in 2013.
>



-- 
Dr. David Campbell
Assistant Professor, Geology
Department of Natural Sciences
Gardner-Webb University
Boiling Springs NC 28017


