Wed Aug 17 10:22:28 CDT 1994

   The availability of electronic and on-line gazetteers is certainly a
great advance and makes life much easier for those of us dealing with
specimen-based data.  However, I get the feeling that the connection between
the actual biological object we have in hand and the information returned
from the gazetteer has become obscured.

   As I see it, the information that is critical is the actual spot on the
ground where the specimen came from.  What we need is to document that spot
as accurately and unambiguously as possible.  There are several methods to
do this, with the most common being to reference a grid system (e.g.
latitude/longitude) or to refer to some identifiable, named place (e.g. a
town, river, or mountain).  In the past, the system used was as much a
matter of convenience as anything: named places were relatively easy to get
and close enough to the actual spot for 99% of the scientific studies being
undertaken.  Today, the use of computer analysis and the large amounts of
data available make coding with reference to a map-grid system essential.
This explains the emphasis on geocoding old place-name based records with
lat-long.

   The problem I've run into while doing this geocoding is that many of the
old place-name based records don't translate directly without introducing a
significant amount of error.  For example, it's common to have specimens
reported as from "10 miles north of Town X".  It's very tempting to look up
Town X in the gazetteer, calculate an offset for 10 miles north, modify the
latitude, and call it good.  It's quick, easy and probably pretty close.
Unfortunately, this can give a false sense of accuracy for the location.  As
often as not, checking a detailed map will show that the road that leads out
of Town X to the north actually goes northwest, not north, and that there is
no easy access to the area directly north of the town.  We now have two
possibilities: first, the collector drove northwest out of town for 10
miles, pulled off on the side of the road and picked up the specimens; or
second, they really did travel over hill and dale exactly 10 miles on
compass bearing 0 degrees from the centre of town and collected on that
apparently random spot in the middle of nowhere.  Having been around insect
collectors for a few years now I would say that the first scenario is much
more likely.
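
   To put a rough number on the gap between these two readings, here is a
minimal sketch in Python.  It assumes a spherical Earth, and the coordinates
for "Town X" and the 315 degree road bearing are hypothetical values chosen
only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius; spherical approximation only
KM_PER_MILE = 1.609344


def destination(lat_deg, lon_deg, distance_km, bearing_deg):
    """Point reached from a start position after travelling distance_km
    along an initial compass bearing on a sphere."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    ang = distance_km / EARTH_RADIUS_KM          # angular distance
    lat2 = math.asin(math.sin(lat1) * math.cos(ang)
                     + math.cos(lat1) * math.sin(ang) * math.cos(bearing))
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(ang) * math.cos(lat1),
        math.cos(ang) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


def separation_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical gazetteer position for "Town X" (not a real record).
town_lat, town_lon = -35.0, 147.0
ten_miles_km = 10 * KM_PER_MILE

# Reading 1: the label taken literally, bearing 0 degrees.
north = destination(town_lat, town_lon, ten_miles_km, 0.0)
# Reading 2: the road actually heads northwest (assumed bearing 315).
northwest = destination(town_lat, town_lon, ten_miles_km, 315.0)

gap_km = separation_km(*north, *northwest)
print(f"literal north:  {north[0]:.4f}, {north[1]:.4f}")
print(f"along the road: {northwest[0]:.4f}, {northwest[1]:.4f}")
print(f"difference:     {gap_km:.1f} km")
```

   With these assumed values the two candidate spots are roughly 12 km
apart, which is most of the 10 miles quoted on the label.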

    Collecting from rivers poses the same sort of problem.  The chances are
very high that collecting will be done near where a road crosses the river,
and this can only be determined from a very specific data label or by having
a look at a map.  The use of a gazetteer is especially risky in this case
because the mapping agency that geocoded the river will pick a biologically
arbitrary spot to represent the river in their gazetteer, and it's unlikely
that this spot will correspond to likely collecting points on the river.

    In summary, here is what I've found after geocoding some 20,000 ant
records from Australia.  On-line gazetteers are a great research tool and
can save a tremendous amount of time.  However, they are only a tool and
what is critical is the actual spot on the ground where the specimen came
from, and this is independent of the quality or completeness of any
gazetteer.  Also, every locality record has some level of error associated
with it (even GPS boxes have errors, even if they are only on the order of
metres).  It is very helpful (and more honest) to include some error
estimate along with the geocoded data.  This is especially important if the
records are going to be used for environmental modelling or if a GIS is
going to be used to draw conclusions from the data (for example, to
estimate the elevation or vegetation type from the lat/long alone).  It's
important to know that a locality originally recorded as "New York City"
could be anywhere within a 30 mile radius, while "35.34'51"S 147.28'10"E" is
probably within a few metres of the actual spot on the ground.
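
   One way to carry an error estimate along with each geocoded record is
sketched below in Python.  The record layout, the field names and the
`usable` helper are hypothetical, not part of any gazetteer or database
format.  The New York City coordinates are approximate, the second
coordinate is read as degrees, minutes and seconds (an assumption about the
notation), and the 5 metre error is an illustrative stand-in for "a few
metres".

```python
from dataclasses import dataclass

KM_PER_MILE = 1.609344


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.
    South and west come out negative."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


@dataclass
class Locality:
    label: str        # locality text as written on the specimen label
    lat: float        # decimal degrees, south negative
    lon: float        # decimal degrees, west negative
    error_km: float   # estimated radius of uncertainty around lat/lon


def usable(records, max_error_km):
    """Keep only records precise enough for a given analysis."""
    return [r for r in records if r.error_km <= max_error_km]


records = [
    # Named place: the point is approximate, the error radius is large.
    Locality("New York City", 40.71, -74.01, 30 * KM_PER_MILE),
    # Coordinate from the label: small (illustrative) error radius.
    Locality("surveyed point",
             dms_to_decimal(35, 34, 51, "S"),
             dms_to_decimal(147, 28, 10, "E"),
             0.005),
]

# Example: only accept records good to within 1 km, e.g. before reading
# elevation or vegetation type off a map layer.
for record in usable(records, max_error_km=1.0):
    print(record.label, round(record.lat, 5), round(record.lon, 5))
```

   Keeping the error radius as its own field means a later analysis can
decide for itself which records are precise enough, rather than treating
every lat/long as equally reliable.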

