[Taxacom] Data quality of aggregated datasets

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Mon May 6 15:50:16 CDT 2013


Yeah, Donald, that is the theory, but in practice we never seem to get very far beyond the raw data stage, and sites like GBIF are presenting that raw data in graphical form to the "general public". For another example, I always sigh when I read a taxonomic revision aimed at a wider audience and see lots of "distribution maps" at the end, many with just one or two plotted points, simply because the number of positive samples for that species is low (and, for some groups, like mites and nematodes, because there are so few people working on them and the critters are so labour intensive to slide mount and identify). I think that there is an issue here.

Besides, a simple point plotted on a map is pretty meaningless without (A) a time coordinate (i.e. was it collected there in 2013 or 1758?), (B) a reliability score on the likelihood of the data being correct, and (C) a context stating whether the map is supposed to represent natural distribution, man-influenced distribution, historical distribution (perhaps back into prehistory), interception records, etc. There just seems to be a rush to "do stuff" (e.g. use GPS) without much thought going into making what we are doing meaningful.

There is a similar dilemma with host plant documentation (of insects). If you document every species of plant that an insect species has been found on, you can get a whole load of incidental records, which can create quite spurious "host plant relationships" in people's minds. One school of thought says that you do record everything, and then use that data to work out the true host plants. The other school of thought says that one should not mention a plant unless one is certain that there is a true host plant relationship, for fear of drowning in incidental and quite meaningless information ...
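
To make points (A) to (C) a bit more concrete, here is a rough Python sketch of what a plotted point might carry before it goes anywhere near a map. It is only an illustration: the names OccurrencePoint, Context and points_for_map, and the 0 to 1 reliability scale, are made up for this example and are not taken from GBIF or any existing schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Context(Enum):
    """(C) What kind of distribution the record is evidence for (hypothetical categories)."""
    NATURAL = "natural"
    HUMAN_INFLUENCED = "human-influenced"
    HISTORICAL = "historical"
    INTERCEPTION = "interception"


@dataclass(frozen=True)
class OccurrencePoint:
    species: str
    latitude: float
    longitude: float
    year: Optional[int]   # (A) time coordinate; None when the date is unknown
    reliability: float    # (B) assumed score from 0.0 (doubtful) to 1.0 (verified)
    context: Context      # (C) what the record is supposed to represent


def points_for_map(
    points: List[OccurrencePoint],
    context: Context,
    since_year: int,
    min_reliability: float,
) -> List[OccurrencePoint]:
    """Keep only points that match the map's stated purpose.

    Undated points are dropped, because without (A) there is no way to tell
    whether they say anything about the period the map claims to show.
    """
    return [
        p
        for p in points
        if p.context is context
        and p.year is not None
        and p.year >= since_year
        and p.reliability >= min_reliability
    ]
```

A "present natural distribution" map would then be drawn only from something like points_for_map(points, Context.NATURAL, since_year=1990, min_reliability=0.5), where the cut-off values are themselves judgment calls that ought to be stated alongside the map.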
Cheers, Stephen

From: Donald Hobern [GBIF] <dhobern at gbif.org>
To: taxacom at mailman.nhm.ku.edu 
Sent: Tuesday, 7 May 2013 7:32 AM
Subject: Re: [Taxacom] Data quality of aggregated datasets


You're right, Stephen.  We are dealing with raw data.  The work of the
"aggregators" should first be to organise all of these raw data as an
evidence-base for understanding the recorded distribution of any species in
time and space, and secondly to provide the tools to support
automated/expert/community evaluation and where appropriate correction of
this evidence.  The result of this work should be an assessed and filtered
version of all available evidence that can then serve as the foundation for
understanding and modelling our best estimate of the actual distribution and
abundance of each species through time and space.

Donald

----------------------------------------------------------------------
Donald Hobern - GBIF Director - dhobern at gbif.org 
Global Biodiversity Information Facility http://www.gbif.org/ 
GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
Tel: +45 3532 1471  Mob: +45 2875 1471  Fax: +45 2875 1480
----------------------------------------------------------------------

-----Original Message-----
Date: Mon, 6 May 2013 00:54:28 -0700 (PDT)
From: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Subject: Re: [Taxacom] Data quality of aggregated datasets

Perhaps there ought to be a paradigm shift here? Georeferenced specimen
records are raw data, that's all, and should not be offered as anything
other than raw data. A map of plotted georeferenced specimen records is
pretty useless, and often misleading or just plain wrong. What we (usually)
actually want to know is the postulated present distribution of the species.
This is an area, not a set of points. It also has fuzzy boundaries, and is
sometimes better expressed in words rather than visually, e.g. the species
is widespread in the South Island, east of the Southern Alps.

Cheers, Stephen


_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

Celebrating 26 years of Taxacom in 2013.