[Taxacom] Data quality of aggregated datasets
tas27 at schweich.com
Tue May 7 16:44:53 CDT 2013
On 5/6/2013 12:32 PM, It was written:
> You're right. We are dealing with raw data. The work of the
> "aggregators" should first be to organise all of these raw data as an
> evidence-base for understanding the recorded distribution of any species in
> time and space,
I would like to challenge the assertion that the "aggregators ...
organize ... [the] raw data." I agree that maybe that's what they
*should* do, but something entirely different is actually happening.
My case in point is the distribution of Frasera paniculata. The
aggregators will show that this taxon occurs in the state of Nevada. I
disagree. You may remember that I posted previously about this, stating
that my comments on GBIF and EOL from three years ago have had no
effect. The staff at EOL responded very kindly saying that my comment
still lived there. I also learned that EOL gets its data from
DiscoverLife, which gets its data from GBIF, which gets its data from
USDA Plants. They also gave me a URL through which I could direct
comments to USDA Plants.
Following the chain of data aggregation, I went to the USDA Plants web
site. There, I learned that the source of the "raw data" used to
determine that F. paniculata occurs in Nevada was: Kartesz, J.T.
1988. A flora of Nevada. Ph.D. Dissertation. This morning I went to the
library and read what the dissertation says about F. paniculata. It
says "... reported from the Pahranagat Mts., Lincoln Co." There is
no reported collector, no collection number, and no date of collection.
I would assert that this is not RAW data. It is, at best, aggregated
data, i.e., the writer aggregated or summarized what other unidentified
person(s) said. At worst, it is completely false, perhaps caused by
the common error of mistaking F. albomarginata for F. paniculata. Even
if you pick a middle point and say something like "it's a reasonable
speculation," by the time you get through the four layers of
aggregation, it's presented as fact.
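The chain described above can be sketched in a few lines of Python. This
is only an illustration: the Source class, its field names, and the
is_vouchered() check are hypothetical and do not reflect how any of these
sites actually store or exchange data. It simply walks from EOL back to
the dissertation and asks whether the origin record carries a collector,
a collection number, and a collection date.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    """Hypothetical provenance record; not any aggregator's real schema."""
    name: str
    upstream: Optional["Source"] = None
    collector: Optional[str] = None
    collection_number: Optional[str] = None
    collection_date: Optional[str] = None

    def is_vouchered(self) -> bool:
        # Treat a record as raw data only if all three specimen details exist.
        return all([self.collector, self.collection_number, self.collection_date])


def trace(source: Optional[Source]) -> List[Source]:
    """Follow upstream links until a source with no upstream is reached."""
    chain = []
    while source is not None:
        chain.append(source)
        source = source.upstream
    return chain


# The chain as described in this message; the dissertation entry has no
# collector, number, or date, so those fields are left empty.
kartesz = Source("Kartesz 1988, A flora of Nevada")
usda = Source("USDA Plants", upstream=kartesz)
gbif = Source("GBIF", upstream=usda)
discoverlife = Source("DiscoverLife", upstream=gbif)
eol = Source("EOL", upstream=discoverlife)

chain = trace(eol)
origin = chain[-1]
layers = len(chain) - 1  # aggregators sitting between the reader and the origin

print(" <- ".join(s.name for s in chain))
print("layers of aggregation:", layers)
print("origin is vouchered raw data:", origin.is_vouchered())
```

Under these assumptions, the check fails at the origin: every layer
passes the record along, but none of them can point to a specimen.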
This just doesn't sound like "aggregators ... organize ... [the] raw
data" to me. It sounds more like "aggregators aggregate other
aggregators' aggregated data."
Whether USDA Plants will review the source (raw vs. aggregate) and
veracity of its data remains to be seen. I posted a comment at USDA
Plants and am now waiting for a response.
Tom Schweich KJ6BIT tas27 at schweich.com