Imprecise locality records

Doug Yanega dyanega at POP.UCR.EDU
Tue Aug 22 15:56:18 CDT 2000

Bob Mesibov wrote:

>A widely used procedure for dealing with uncertain locality records in
>databases is to specify them as well-defined points in the main locator
>fields, then to add an appropriate uncertainty to a precision field.
>This uncertainty entry can be something like (plus or minus) '1 mile' or
>(plus or minus) '50 miles', and details (as Peter Rauch says) can be
>spelled out in the metadata.

You *can* do this, but does everyone actually go to the trouble? Isn't
there anyone here with a database of collection localities that does NOT
have some sort of "precision" field? I know at least one MAJOR collection
of this type, and there must be others. "Widely used" and "universal" are
two very different things, especially when ambitious folks talk of
integrating all the collection databases in the world into one functional
whole. They CANNOT be integrated if some databases lack fields such as
this, which are essential for certain highly-desirable purposes such as
actual/theoretical distribution maps. Obviously, a record such as "Palm
Canyon" may well incorporate a variety of habitat types, elevations, etc.,
and simply choosing an arbitrary lat/long could be quite misleading - a
record such as this is better excluded from certain types of analyses, and
my question was along the lines of can we come up with a standard way of
indicating this sort of thing. A "linear displacement error" field will
*NOT* necessarily accomplish this - points 1 mile apart in the middle of a
homogeneous habitat at a uniform elevation are not likely to represent a
problem, while points 1 mile apart in an area of high topographic relief
can be like night and day. You can't just look at linear error and make an
arbitrary cutoff. Sure, for MOST purposes an error of a mile is probably
okay, including simple plots of distribution, but not for ALL purposes.
To do it right requires that you look at everything within that error zone
and evaluate the possible values for parameters you might assume to be
uniform (if they are, then it's safe to include that point). But that's an
incredible LOT of work. It's extremely hard to accommodate, but there's a
whole planet full of potential database users for whom this could all be
important, so it's worth thinking about whether we'll ever be able to deal
with this practically, and if not, what sort of practical solution *can* we
come up with to maximize the utility of these databases and simultaneously
minimize the errors that can arise from using them "as is".
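
As a rough illustration of the "look at everything within that error zone"
check, here is a minimal Python sketch under loud assumptions. The function
`elevation_at` is a hypothetical lookup (for instance, something backed by
a digital elevation model) that you would have to supply yourself, and
elevation merely stands in for whatever parameter you would otherwise
assume to be uniform. The ring sampling and the tolerance are illustrative
choices, not a proposed standard.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def points_in_error_zone(lat, lon, radius_km, rings=3, per_ring=8):
    """Yield the centre point plus samples on concentric rings within the radius."""
    yield lat, lon
    for r in range(1, rings + 1):
        d = radius_km * r / rings  # ring distance from the centre, in km
        for k in range(per_ring):
            theta = 2 * math.pi * k / per_ring
            dlat = d * math.cos(theta) / KM_PER_DEG_LAT
            # A degree of longitude shrinks with the cosine of the latitude.
            dlon = d * math.sin(theta) / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
            yield lat + dlat, lon + dlon


def safe_to_include(lat, lon, radius_km, elevation_at, max_range_m):
    """True if sampled elevations inside the error zone span at most max_range_m.

    elevation_at(lat, lon) -> metres is a hypothetical, user-supplied lookup.
    """
    elevations = [elevation_at(a, b) for a, b in points_in_error_zone(lat, lon, radius_km)]
    return max(elevations) - min(elevations) <= max_range_m
```

With a flat stand-in surface such a record would pass, and with a steep one
it would be excluded, even though the linear error is the same in both
cases. A coarse sample like this can of course miss a narrow canyon
entirely, which is part of why doing it properly is so much work.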

>A harder but more satisfactory approach is to upgrade the precision of
>the original record. The taxonomic literature is rich in instances of
>locality corrections and clarifications, the result of taxonomists
>chasing up information in expedition reports, field notebooks and
>people's memories. It's the kind of tedious, nit-picking work that should only
>be done when precise locality information is taxonomically important.

That's one of the most important aspects: getting accurately georeferenced
data, even when the locality info is nearly exact, is still not a process
that can be automated. I have yet to see a program that can take a label
like "23 km S Tamazanchale" and do what a human can do: first catch the
typo in the town name (essential), then look at a road map, trace the road
(which in this case goes more W than S), and give a close match. Most of us
do not have the personnel to devote to this sort of time-consuming
detective work, and we also might not have (in the present example) a map
of Mexico that indicates minutes of lat/long, not just degrees, so even if we
DO the legwork, our resources may be inadequate to give a *precise* data
point (does anyone know of any on-line or *easily-acquired* printed maps of
Mexico that go to minutes, incidentally? I could REALLY use some...).
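
For comparison, the part a program *can* do is the naive straight-line
reading of such a label. The Python sketch below uses the standard
great-circle destination formula on a spherical Earth; the function name
and the starting coordinates are made up for illustration (they are
placeholders, not the real position of any town), and nothing here knows
about roads or typos.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def naive_offset(lat, lon, distance_km, bearing_deg):
    """Point reached by travelling distance_km from (lat, lon) on a fixed initial bearing.

    Bearing is in degrees clockwise from north (180 = due south).
    """
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance
    theta = math.radians(bearing_deg)
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lam2)


# Placeholder town coordinates, for illustration only.
town_lat, town_lon = 21.0, -99.0
due_south = naive_offset(town_lat, town_lon, 23, 180)   # literal reading of "23 km S"
south_west = naive_offset(town_lat, town_lon, 23, 225)  # if the route actually heads SW
```

The two results differ by many kilometres, and neither follows an actual
road, which is exactly the gap the human with the road map is filling.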

>You're out of luck if the collector is long dead, there were never any
>field notes and the label says something like 'Brushy Creek', which
>wanders for 100 miles through a landscape lacking any other named
>features to which the collector could have referred. I've seen published
>maps in which such localities are shown as very large question marks
>over arbitrary points!

As well they should be - if a point is arbitrary, the users of that data need
to know it.


Doug Yanega        Dept. of Entomology         Entomology Research Museum
Univ. of California - Riverside, Riverside, CA 92521
phone: (909) 787-4315 (standard disclaimer: opinions are mine, not UCR's)
  "There are some enterprises in which a careful disorderliness
        is the true method" - Herman Melville, Moby Dick, Chap. 82
