[Taxacom] Inappropriate accuracy of locality data

Arthur Chapman taxacom3 at achapman.org
Tue Nov 30 17:01:47 CST 2010

Thanks Richard

I may talk to you off line about this sometime.

One other thing we are exploring in conjunction with others is the idea 
of foot-printed, temporal gazetteers. Most gazetteers in the past have 
given only a single lat/long for each location, although some are now 
moving toward polygons. For our biodiversity purposes, however, if we are 
using a collection locality like "20 km NW of Toowoomba", there is 
uncertainty about what is meant by Toowoomba: the centre, the Post 
Office, the Railway Station, or did they mean from the outskirts? So the 
footprint of Toowoomba adds to the uncertainty (as described in the Best 
Practices Document). But if the collection was made in 1865, then 
Toowoomba was an entirely different entity, with an entirely different 
footprint from the one it has now, so there are distinct implications 
with respect to uncertainty. It can be quite complicated, and if people 
wish to continue to use the point-radius method, then they can do so (I 
only wish they would; most of the data supplied to GBIF has the 
uncertainty field left blank), but by using footprints we are hopeful of 
being able to reduce the area implied by the uncertainty.
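
As a rough illustration of how the footprint feeds into a point-radius 
result for a locality like "20 km NW of Toowoomba", here is a minimal 
Python sketch. It is not the calculation from the Best Practices 
Document; it simply adds three contributions (extent of the named place, 
precision of the stated distance, and imprecision of the compass 
direction), which will tend to overestimate. The function names, the 
approximate coordinates for Toowoomba, the footprint radii and the 1 km 
distance precision are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def offset_point(lat, lon, bearing_deg, distance_km):
    """Destination reached from (lat, lon) along a bearing on a sphere."""
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)


def point_radius(lat, lon, bearing_deg, distance_km,
                 place_radius_km, distance_precision_km,
                 bearing_halfwidth_deg):
    """Return (lat, lon, uncertainty_km) for 'distance + direction of place'.

    Assumption: the uncertainty is a plain sum of the three contributions.
    """
    new_lat, new_lon = offset_point(lat, lon, bearing_deg, distance_km)
    # Chord subtended by the bearing half-width at the stated distance.
    direction_km = 2 * distance_km * math.sin(
        math.radians(bearing_halfwidth_deg) / 2)
    uncertainty_km = place_radius_km + distance_precision_km + direction_km
    return new_lat, new_lon, uncertainty_km


# Approximate centre of Toowoomba (assumed values, for illustration only).
TOOWOOMBA = (-27.56, 151.95)

# "NW" on an 8-point compass: bearing 315 degrees, half-width 22.5 degrees.
# The two footprint radii (5 km and 10 km) are hypothetical.
small_town = point_radius(*TOOWOOMBA, 315, 20, 5, 1, 22.5)
large_town = point_radius(*TOOWOOMBA, 315, 20, 10, 1, 22.5)
print(small_town)
print(large_town)
```

The point is the same in both cases; only the radius changes with the 
assumed extent of the town, which is why the footprint at the date of 
collection matters.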

Another issue that has been discussed is the lack of documented 
gazetteer locations for many historic collection localities.  One 
suggestion has been to harvest these locations from GBIF, look at the 
lat/longs given there, and modify them if necessary. One such locality in 
Australia was a certain mile peg on the Great Northern Highway in 
Western Australia.  It was where many collectors turned off the highway, 
so you get collections labelled with things like "50km NNW of Mile Peg 
484 on Great Northern Highway".  That mile peg doesn't exist any more 
(the road may even be in a different place) and it doesn't appear in any 
gazetteer.  It has also been suggested that we could develop a wiki-like 
gazetteer of collecting localities that could be edited on-line by 
individuals, somewhat like Stephen's wiki approach.  Such a gazetteer 
could then be linked into the BioGeomancer workbench for others to use.
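
To make the idea of a gazetteer entry with dated footprints a little 
more concrete, here is a minimal Python sketch of what one record might 
look like. Everything in it is hypothetical: the class and field names, 
the date ranges and the polygon coordinates are invented for 
illustration and do not describe BioGeomancer or any existing gazetteer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Footprint:
    valid_from: int                      # first year the footprint applies
    valid_to: Optional[int]              # last year, or None if current
    polygon: List[Tuple[float, float]]   # (lat, lon) vertices
    source: str                          # who supplied or edited it


@dataclass
class GazetteerEntry:
    name: str
    footprints: List[Footprint]

    def footprint_for(self, year: int) -> Optional[Footprint]:
        """Return the footprint valid in the given year, if any."""
        for fp in self.footprints:
            if fp.valid_from <= year and (
                    fp.valid_to is None or year <= fp.valid_to):
                return fp
        return None


# Hypothetical entry: both polygons and both date ranges are made up.
toowoomba = GazetteerEntry(
    name="Toowoomba",
    footprints=[
        Footprint(1860, 1900,
                  [(-27.55, 151.94), (-27.55, 151.96),
                   (-27.57, 151.96), (-27.57, 151.94)],
                  "hypothetical historical map"),
        Footprint(1901, None,
                  [(-27.50, 151.90), (-27.50, 152.00),
                   (-27.62, 152.00), (-27.62, 151.90)],
                  "hypothetical current boundary"),
    ],
)

print(toowoomba.footprint_for(1865).source)
print(toowoomba.footprint_for(2010).source)
```

A locality such as the mile peg could be held the same way, with a 
footprint that simply ends in the year the peg was removed.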

Just a few thoughts on where we may be heading.


Arthur Chapman
Toowoomba, Queensland, Australia

On 1/12/2010 8:22 AM, Richard Pyle wrote:
>> We wish to cover uncertainty in more depth, especially the possibility
>> of using foot-printed uncertainty rather than just the point-radius
>> method most are using now. This would use polygons and include things
>> like buffering, roads, etc.
>
> I generated a system years ago for buffering lines and polygons. Basically,
> a line or polygon is defined by a sequence of points, and if each point has
> an error radius around it, that radius can be translated into a buffer zone
> around the margins of the line or polygon.  By attaching the error distance
> to each point (rather than abstracted to the polygon as a whole), there can
> be different error/buffer zones on different parts of a polygon. For
> example, if a terrestrial plant specimen is vaguely ascribed from some area
> that abuts a water body, and you can say with confidence that the plant
> would not have occurred over the water body, then the points along the
> polygon that border the water body can have relatively small error; whereas
> the borders of the same polygon with less distinct boundaries could have
> larger error values associated with the relevant points.
>
> I eventually mothballed this, for several reasons:
> 1) I could never decide whether the error around the border of a polygon
> applied only externally to the defined polygon border, or both internally
> and externally (error values on points that describe a line would
> obviously apply to both sides of the line).
> 2) I did not have any tools for easily generating or visualizing these
> buffered lines & polygons.
> 3) Nobody else seemed very interested.
>
> The value of the point-radius approach is that it's the easiest and least
> time-consuming way to translate the existing human-friendly locality
> descriptors attached to most of our data content into something that can be
> processed by a computer.  The only caveat is that the scale of the error
> radius needs to be acknowledged when interpreting the values for analytical
> purposes.
>
> For example, if I have a fish specimen with locality data "Oahu, Hawaiian
> Islands", then a point-radius result would put a point at the geographic
> center of Oahu (where the fish was surely not captured), and describe a
> circle that included the entire island plus enough of the surrounding water
> to encompass the likely collecting place for the specimen.  Such information
> would be useless for modeling where on Oahu that species occurs.  But it's
> reasonably useful for modeling where, within the Hawaiian Archipelago, the
> species occurs -- and it would be very precise information on the scale of
> modeling where within the Pacific Ocean the species occurs.
>
> I don't fault the point-radius approach taken so far; but I agree it would
> be valuable to also develop a standard for describing lines and polygons
> that is practical to use within our community.
>
> Aloha,
> Rich
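
The per-point buffering described in the quoted message can be sketched 
in a few lines of Python. This is only one possible reading of it, not 
Rich's actual system: each vertex carries its own error radius, the 
buffer width is interpolated linearly between neighbouring vertices, and 
a point counts as inside the buffer if it lies within that width of the 
line. The function name is made up, the coordinates are assumed to be 
planar (e.g. km in a projected grid), and checking the radius at the 
nearest point on each segment is an approximation.

```python
import math


def within_buffered_line(p, vertices):
    """True if point p falls inside the buffer around a polyline.

    vertices: list of (x, y, error_radius) in planar units such as km.
    The buffer width between two vertices is linearly interpolated
    from their error radii (an assumption of this sketch).
    """
    px, py = p
    for (x1, y1, r1), (x2, y2, r2) in zip(vertices, vertices[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            t = 0.0  # degenerate segment: both vertices coincide
        else:
            # Fraction along the segment of the point nearest to p.
            t = ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
            t = max(0.0, min(1.0, t))
        cx, cy = x1 + t * dx, y1 + t * dy
        radius = r1 + t * (r2 - r1)
        if math.hypot(px - cx, py - cy) <= radius:
            return True
    return False


# A line whose error grows from 1 km at one end to 3 km at the other.
line = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0)]
print(within_buffered_line((5.0, 1.9), line))
print(within_buffered_line((1.0, 1.5), line))
```

For a polygon border, the same test can be run on the ring closed by 
repeating the first vertex at the end. As written it applies the error 
to both sides of the border, which is exactly the question raised in 
point 1 of the quoted message, and it does not test whether a point lies 
in the polygon's interior.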
