[Taxacom] Data quality of aggregated datasets

Tom Schweich tas27 at schweich.com
Tue May 7 16:44:53 CDT 2013

On 5/6/2013 12:32 PM, It was written:
> You're right.  We are dealing with raw data.  The work of the
> "aggregators" should first be to organise all of these raw data as an
> evidence-base for understanding the recorded distribution of any species in
> time and space,
I would like to challenge the assertion that the "aggregators ... 
organize ... [the] raw data."   I agree that maybe that's what they 
*should* do, but something entirely different is actually happening.

My case in point is the distribution of Frasera paniculata. The 
aggregators will show that this taxon occurs in the state of Nevada.  I 
disagree.  You may remember that I posted previously about this, stating 
that my comments on GBIF and EOL from three years ago have had no 
effect.  The staff at EOL responded very kindly saying that my comment 
still lived there. I also learned that EOL gets its data from 
DiscoverLife, which gets its data from GBIF, which gets its data from 
USDA Plants.   They also gave me a URL through which I could direct 
comments to USDA Plants.

Following the chain of data aggregation, I went to the USDA Plants web 
site.  There, I learned that the source of the "raw data" from which it 
was determined that F. paniculata occurs in Nevada is: Kartesz, J.T. 
1988. A flora of Nevada. Ph.D. Dissertation.  This morning I went to the 
library and read what the dissertation says about F. paniculata.  It 
says "... reported from the Pahranagat Mts., Lincoln Co."  There is 
no reported collector, no collection number, no date of collection, and 
no voucher.

I would assert that this is not RAW data.  It is, at best, aggregated 
data, i.e., the writer aggregated or summarized what other unidentified 
person(s) said.   At worst, it is completely false, perhaps caused by 
the common error of mistaking F. albomarginata for F. paniculata.   Even 
if you pick a middle point and say something like "it's a reasonable 
speculation,"  by the time you get through the four layers of 
aggregation, it's presented as fact.
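
To make the chain concrete, here is a minimal sketch of how one might 
trace a distribution claim back to its origin and check whether that 
origin carries the details of a real specimen record.  The Source 
class, its field names, and both helper functions are hypothetical 
illustrations, not the data model of any of the sites named above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    """One link in a provenance chain (hypothetical model)."""
    name: str
    upstream: Optional["Source"] = None   # where this source got its data
    collector: Optional[str] = None
    collection_number: Optional[str] = None
    collection_date: Optional[str] = None
    voucher: Optional[str] = None


def provenance_chain(source: Source) -> List[str]:
    """Return source names from the given source back to the origin."""
    chain = []
    node: Optional[Source] = source
    while node is not None:
        chain.append(node.name)
        node = node.upstream
    return chain


def origin_is_vouchered(source: Source) -> bool:
    """True only if the origin has collector, number, date, and voucher."""
    origin = source
    while origin.upstream is not None:
        origin = origin.upstream
    return all([origin.collector, origin.collection_number,
                origin.collection_date, origin.voucher])


# The chain described in this message; the origin has no specimen details.
kartesz = Source("Kartesz 1988, A flora of Nevada")
usda = Source("USDA Plants", upstream=kartesz)
gbif = Source("GBIF", upstream=usda)
discoverlife = Source("DiscoverLife", upstream=gbif)
eol = Source("EOL", upstream=discoverlife)

print(" <- ".join(provenance_chain(eol)))
print("origin vouchered:", origin_is_vouchered(eol))
```

The point of the sketch is that the check only ever looks at the 
origin: every layer stacked on top of it adds a name to the chain but 
no evidence.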

This just doesn't sound like "aggregators ... organize ... [the] raw 
data" to me.  It sounds more like "aggregators aggregate other 
aggregators' aggregated data."

Whether USDA Plants will review the source (raw vs. aggregate) and 
veracity of their data remains to be seen.  I have posted a comment at 
USDA Plants and am now waiting for a response.

Tom Schweich KJ6BIT tas27 at schweich.com
