[Taxacom] Data quality of aggregated datasets

Doug Yanega dyanega at ucr.edu
Tue May 7 15:21:17 CDT 2013

On 5/7/13 12:04 PM, Dave Vieglais wrote:
> In addition, the confidence associated with the error estimate should 
> also be recorded. For example, does "+/- 100m" refer to the 95% 
> confidence interval? Circular Error Probable (50%)? 3 sigma ellipse 
> (98.9%)? etc.
This sounds like what I mean by false precision. If your error radius is 
"The ballpark estimate for the wandering radius of this particular 
entomologist after he parked his car" then there is NO utility in trying 
to assign confidence limits and probability values to that radius. If 
you just say "We arbitrarily draw a 2 km radius in order to accommodate 
all reasonable sources of error LESS than 2 km in extent", then, 
likewise, no further parameters are necessary or appropriate. Even for 
georeferences that I personally recorded using a GPS, I will at least 
double the actual distance I walked from that point when providing an 
error radius, rounded to the nearest multiple of 100 meters, just so it 
is NOT necessary to specify anything more detailed - because it is 
vastly simpler to just use a large enough error value to SUBSUME any of 
the error values one could painstakingly calculate or quantify using 
other means. Why would anyone bother trying to calculate a confidence 
interval for a label that says only "6.35 miles S of Chicago, IL", or 
"Delhi, India"? Pursuing false precision just wastes time, since 
it doesn't increase the value of one's data; that is, in effect, the 
criterion that distinguishes *false* precision. More numbers, or more 
decimal places, are valuable *only up to a point*. What is important, as 
Rich Pyle notes, is ACCURACY.
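
As a rough sketch of the GPS rule of thumb above (double the distance 
walked, then express it in 100 m units), something like the following 
would do. The function name and parameters are made up for illustration. 
Note one assumption: it rounds UP to the next multiple of 100 m rather 
than to the nearest one, so that the result never drops below double the 
distance walked.

```python
import math

def conservative_error_radius_m(distance_walked_m, factor=2.0, step_m=100):
    """Return a deliberately generous error radius in meters.

    distance_walked_m: how far the collector strayed from the GPS point.
    factor: padding multiplier ("at least double" in the text above).
    step_m: granularity of the reported radius (100 m in the text above).
    Assumption: round up, not to nearest, so the radius stays >= factor * distance.
    """
    if distance_walked_m < 0:
        raise ValueError("distance walked cannot be negative")
    padded = factor * distance_walked_m
    # Round up to a whole number of steps; never report less than one step.
    return max(step_m, math.ceil(padded / step_m) * step_m)

print(conservative_error_radius_m(120))  # 240 m padded, reported as 300
print(conservative_error_radius_m(430))  # 860 m padded, reported as 900
```

The point of the padding is that one coarse number stands in for every 
smaller error source, so nothing else needs to be recorded alongside it.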

On 5/7/13 12:35 PM, Dean Pentcheff wrote:
> I agree. I see those "elaborate best practices" (as encoded by Chapman 
> & Wieczorek in the BioGeomancer document) as a codification of 
> well-thought-through rules of thumb that can be applied in the 
> absence of other information. The key (in my mind) is that the 
> "objective, and very precise" estimates always yield to additional 
> information.
> In your example, the critical piece of additional information is 
> "campground". If the hypothetical label was just "4 mi E Logan", I 
> don't think you could do much better than the automatic estimated 
> location. But "campground 4 mi E Logan" lets you (yes you, an expert 
> :) snuffle around for that feature, find it, assess whether it's 
> likely to be the campground in question (how many other campgrounds 
> are in that area?), and if it seems reasonable, assign that as the 
> high-probability collection location.
Actually, a human would still do much better than Biogeomancer if the 
label was just "4 mi E Logan", because the software would start from the 
center of the city, whereas a human would start from the edge AND measure 
along a road. The bigger the city, and the more its roads deviate from 
perfectly straight lines along cardinal compass headings, the worse an 
automated system will perform. One does not have to be an expert; one 
simply has 
to realize that (1) human beings driving cars measure distances from 
boundaries and landmarks, using their odometers, (2) roads can curve, 
and (3) roads can go at angles other than increments of 45 degrees 
relative to a starting point. Show me an automated georeferencing tool 
that incorporates all three of those realities and I'll be the first 
person to hail that tool as the answer to our prayers. For the time 
being, I simply don't accept that you can get high-quality georef data 
except by human analysis. Remember, empirical results indicate only 
about a 40% match between the two protocols (automated and human), and 
only 60% of records overlap within a reasonable error radius.
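
To put a number on the first two of those points, here is a minimal 
sketch comparing a naive "4 miles due east of the city center" estimate 
with one that starts at the city edge and shortens the odometer distance 
to allow for a curving road. Every input is hypothetical: the center 
coordinates, the 3 km center-to-edge distance, and the sinuosity of 1.2 
(road length divided by straight-line length) are invented for 
illustration, and the road is still assumed to head due east, so point 
(3) is not handled at all.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation
KM_PER_MILE = 1.609344

def destination(lat_deg, lon_deg, distance_km, bearing_deg):
    """Point reached from a start point after distance_km along an initial bearing."""
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    brg = math.radians(bearing_deg)
    d = distance_km / EARTH_RADIUS_KM  # angular distance
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(brg))
    lon2 = lon1 + math.atan2(math.sin(brg) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical inputs, for illustration only.
CITY_CENTER = (41.74, -111.83)   # made-up city center (lat, lon)
CENTER_TO_EAST_EDGE_KM = 3.0     # made-up distance from center to city limit
ROAD_SINUOSITY = 1.2             # made-up road length / straight-line length

label_km = 4 * KM_PER_MILE       # "4 mi E" as written on the label

# Automated style: straight line, due east, from the center.
naive = destination(*CITY_CENTER, label_km, 90.0)

# Human style: start at the edge and treat 4 mi as odometer distance.
human_like = destination(
    *CITY_CENTER, CENTER_TO_EAST_EDGE_KM + label_km / ROAD_SINUOSITY, 90.0)

gap_km = haversine_km(naive, human_like)
print(f"naive point:      {naive[0]:.4f}, {naive[1]:.4f}")
print(f"human-like point: {human_like[0]:.4f}, {human_like[1]:.4f}")
print(f"gap between them: {gap_km:.2f} km")
```

With these invented inputs the two estimates land roughly 1.9 km apart, 
and that is before any change in road heading is considered.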


Doug Yanega      Dept. of Entomology       Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314     skype: dyanega
phone: (951) 827-4315 (disclaimer: opinions are mine, not UCR's)
   "There are some enterprises in which a careful disorderliness
         is the true method" - Herman Melville, Moby Dick, Chap. 82
