[Taxacom] Data quality of aggregated datasets

Richard Pyle deepreef at bishopmuseum.org
Tue May 7 05:04:56 CDT 2013


The problem with using the number of decimal places to represent accuracy is
that you can express accuracy only in powers of ten. Moreover, assuming you
store these values as numbers, ten percent of your values will be one order
of magnitude off in terms of implied precision (i.e., ten percent of values
will have "0" as the final digit, which numeric data fields will trim); one
percent will be off by two orders of magnitude, and so on.  The combination
of an arbitrarily precise point plus a radius, when used as Bob describes
below, is a far more flexible and powerful method for representing both
place and accuracy.  As Bob says, the correct way to interpret a
point+radius is as the definition of a circle within which the occurrence
has a high probability of having happened. There are methods for
calculating the radius such that unknown datum, datum error, and other
factors are taken into account.
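
To make the trailing-zero point concrete, here is a minimal Python sketch.
It assumes a Python float stands in for a numeric database field, and the
helper names (decimals_written, decimals_stored) are made up for
illustration. It takes the 100 two-decimal values 22.00 to 22.99 as typed
and counts how many lose decimal places once stored as numbers.

```python
from collections import Counter
from decimal import Decimal


def decimals_written(text: str) -> int:
    """Decimal places in a coordinate exactly as the recorder typed it."""
    return max(0, -Decimal(text).as_tuple().exponent)


def decimals_stored(text: str) -> int:
    """Decimal places left after the value passes through a numeric field.

    Assumption: a Python float plays the role of the numeric field, and
    trailing zeros are dropped when the number is written back out.
    """
    stored = Decimal(repr(float(text))).normalize()
    return max(0, -stored.as_tuple().exponent)


# Every two-decimal value from 22.00 to 22.99, as typed.
typed = [f"22.{i:02d}" for i in range(100)]

# Orders of magnitude of implied precision lost by each value.
lost_orders = Counter(decimals_written(t) - decimals_stored(t) for t in typed)

print("unchanged:        ", lost_orders[0])
print("off by one order: ", lost_orders[1])
print("off by two orders:", lost_orders[2])
```

Under these assumptions, ten of the hundred values come back with fewer
decimal places than were written, and one of those ten (22.00) loses both.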

Note: Place some emphasis on the word "arbitrarily" above when I talk about
"arbitrarily precise".  As long as an appropriate radius value is provided,
there is absolutely no harm in representing the "point" part via arbitrarily
precise numbers, such as -17.6000003814697, 145.699996948242.  There is,
however, non-trivial harm when doing so while relying on the number of
included digits as a representation of accuracy.
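
For what it's worth, the long digit strings above are consistent with
17°36'S 145°42'E having passed through a 32-bit float somewhere along the
way. That is a guess about the processing, not something the records
themselves state. A small Python sketch of that assumption, with
hypothetical helper names:

```python
import struct


def dms_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees and minutes to signed decimal degrees."""
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * (degrees + minutes / 60.0)


def through_float32(value: float) -> float:
    """Round-trip a value through IEEE 754 single precision.

    Assumption: this mimics a pipeline step that stored the coordinate
    in a 32-bit float field before widening it again.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


lat = dms_to_decimal(17, 36, "S")   # about -17.6
lon = dms_to_decimal(145, 42, "E")  # about 145.7

print(f"{through_float32(lat):.13f}")
print(f"{through_float32(lon):.12f}")
```

The extra digits come from binary storage alone. They say nothing about
how well the locality was known.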

Note also the difference between precision and accuracy.  The location of a
collected insect could (theoretically) be represented by coordinates precise
to within a few cm or a few mm (depending on what kind of insect we're
talking about).  The point+radius approach is intended to represent
accuracy, not precision.  The precision of the true location will always be
limited by the physical size of the organism; but the uncertainty in the
recorded location will generally be much larger than this.
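
As a rough illustration of how a point+radius record carries accuracy
separately from the digits of the point, here is a minimal Python sketch.
The PointRadius class and its method names are hypothetical, the 2000 m
radius is a made-up value rather than a georeferenced one, and distances
use a spherical haversine approximation rather than a proper geodetic
calculation.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


@dataclass(frozen=True)
class PointRadius:
    """A circle centre in decimal degrees plus an uncertainty radius in metres."""
    lat: float
    lon: float
    radius_m: float

    def distance_to(self, lat: float, lon: float) -> float:
        """Great-circle (haversine) distance in metres from the centre."""
        phi1, phi2 = math.radians(self.lat), math.radians(lat)
        dphi = phi2 - phi1
        dlam = math.radians(lon - self.lon)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
        return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    def could_contain(self, lat: float, lon: float) -> bool:
        """True if the given location lies inside the uncertainty circle."""
        return self.distance_to(lat, lon) <= self.radius_m


# Over-precise centre from the thread, with an illustrative radius.
record = PointRadius(-17.6000003814697, 145.699996948242, 2000.0)

print(record.distance_to(-17.6, 145.7))    # shift caused by the extra digits
print(record.could_contain(-17.6, 145.7))
print(record.could_contain(-17.7, 145.7))  # roughly 11 km further south
```

The shift introduced by the extra digits is well under a metre, so it has
no practical effect on which locations fall inside the circle.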

Aloha,
Rich

> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Robert Mesibov
> Sent: Monday, May 06, 2013 11:20 PM
> To: Quentin Groom
> Cc: TAXACOM
> Subject: Re: [Taxacom] Data quality of aggregated datasets
> 
> Quentin Groom wrote:
> 
> "This is the problem I've always had with the point-radius method. It
> encourages people to document a very precise coordinate and then account
> for the error in the radius. The error should be obvious from the number
> of decimal places you write, just like any other measurement."
> 
> I think that depends on how you understand the point-radius method. The
> idea is that there's a circle which completely contains the area searched
> or sampled. The point is simply an estimate of that circle's centre, and
> is not meant to be an estimate of the location of the actual collecting
> site (assuming there was just one) plus a measurement error. The
> point+radius define a circular *area* containing the collecting site in
> an easily understandable way.
> 
> How many decimal places you use for the point's location should obviously
> (to me, anyway) depend on the magnitude of the radius. Recording
> 22°06'57.54"S 117°53'15.31"E +/- 100 m is, I think, bizarre. [We've had a
> discussion about this on Taxacom before, and some listers think that
> rounding off is throwing away data.]
> 
> You can also define a collecting *area* by the implied uncertainty in a
> single point estimate, as you suggest. However, I don't think many people
> on this list would know the uncertainty at a glance in 22.116°S
> 117.888°E. A computer can pull it out, but eyeballing the area in a
> point-radius record is easier.
> 
> Another difficulty with implied uncertainty occurs with the
> above-mentioned computer when UTM data are converted to lat/lon, or
> vice-versa, or for that matter with lat/lon format conversions. In my
> audit paper in ZooKeys I cite a wonderful GBIF/ALA example where '12 km SE
> of Millaa Millaa' (Queensland, 1971) got processed from 17°36'S 145°42'E
> to -17.6000003814697 145.699996948242. Implied uncertainty of a few atomic
> radii, maybe?
> --
> Dr Robert Mesibov
> Honorary Research Associate
> Queen Victoria Museum and Art Gallery, and School of Agricultural Science,
> University of Tasmania
> Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
> Ph: (03) 64371195; 61 3 64371195
> 




