[Taxacom] Algorithms for misspelled taxon names (was: the hurdle for all biodiv informatics initiatives)

Tony.Rees at csiro.au
Fri Feb 26 16:54:26 CST 2010

Paul (dipteryx at freeler.nl) wrote:

I am glad to see that Tony is "happy to speak further to this topic"; this is
quite informative. This approach is different in focus from what I had in mind.
The errors that are made in filling in a query window (or a spreadsheet as in
the case of the fossil molluscs) look substantially different in nature from
the variation I would expect in the literature and in databases. These are
indeed misspellings, resulting from the interaction between a person
(apparently in haste) and a keyboard. On the other hand the variation I saw
in the GNI has a much narrower range (in line with what I encounter in the
literature) and looks much more predictable.
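The two kinds of error Paul distinguishes suggest two different matching strategies. Below is a minimal Python sketch of that contrast. It does not show the algorithm discussed in this thread or in any particular tool, and every rule in it is hypothetical. Arbitrary keyboard slips are caught by a general edit distance, here the optimal string alignment variant of Damerau-Levenshtein, which counts a swap of two adjacent letters as a single edit. Predictable literature variants are instead folded together by normalization rules before comparison. The names `osa_distance`, `normalize` and `VARIANT_RULES`, and the three example rules, are illustrative assumptions, not a vetted list.

```python
import re


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein):
    insertion, deletion, substitution and transposition of two adjacent
    characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


# Hypothetical folding rules for "predictable" spelling variants of an
# epithet; illustrative only, not a vetted or complete list.
VARIANT_RULES = [
    (re.compile(r"ii$"), "i"),  # e.g. -ii / -i endings
    (re.compile(r"ae"), "e"),   # e.g. caeruleus / ceruleus
    (re.compile(r"y"), "i"),    # e.g. sylvestris / silvestris
]


def normalize(epithet: str) -> str:
    """Fold the hypothetical variants above into a single comparison key."""
    key = epithet.lower()
    for pattern, replacement in VARIANT_RULES:
        key = pattern.sub(replacement, key)
    return key


# A hasty keyboard slip (two letters swapped) is one edit away:
print(osa_distance("Calliphroa", "Calliphora"))  # 1
# Predictable literature variants collapse to the same key:
print(normalize("smithii") == normalize("smithi"))  # True
print(normalize("sylvestris") == normalize("silvestris"))  # True
```

Edit distance makes no assumption about which mistakes are likely, which suits haphazard typing. Rules are cheaper and more precise, but they only cover variants someone has anticipated.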



Actually, I think you will find many examples of misspelled names (not just authorities) in the GNI as well; it's just that one normally searches the GNI by an input name, so variant spellings typically do not show up (except at species level, when one has searched without specifying a species).
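The effect of search-by-name can be illustrated with a minimal Python sketch. It assumes a toy in-memory list of name strings. The two misspellings in `INDEX` are invented, and nothing here reflects how the GNI is actually built. `exact_lookup`, `genus_listing` and `fuzzy_lookup` are hypothetical helpers. The fuzzy search uses the standard library's `difflib.get_close_matches`, which scores candidates with a generic similarity ratio rather than any taxon-specific matching algorithm.

```python
import difflib

# Toy index of name strings. "Calliphora vicinia" and "Calliphora vomitora"
# are invented misspellings standing in for variants an aggregator may hold.
INDEX = [
    "Calliphora vicina",
    "Calliphora vicinia",
    "Calliphora vomitoria",
    "Calliphora vomitora",
    "Lucilia sericata",
]


def exact_lookup(name, index):
    """Search by input name: only an identical string is returned."""
    return [n for n in index if n == name]


def genus_listing(genus, index):
    """List every name string under a genus, variant spellings included."""
    return [n for n in index if n.split()[0] == genus]


def fuzzy_lookup(name, index, cutoff=0.9):
    """Candidates whose difflib similarity ratio to `name` is >= cutoff,
    best match first."""
    return difflib.get_close_matches(name, index, n=10, cutoff=cutoff)


print(exact_lookup("Calliphora vicina", INDEX))  # ['Calliphora vicina']
print(genus_listing("Calliphora", INDEX))  # all four Calliphora strings
print(fuzzy_lookup("Calliphora vicina", INDEX))
```

The exact lookup hides the variant, while the genus listing and the fuzzy search both surface it. Raising or lowering `cutoff` trades missed variants against spurious matches. The value 0.9 is an arbitrary choice for this toy list.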

In any case, you are probably right that errors made by non-experts and/or by OCR may be less predictable than those made by professional taxonomists in the recent literature. Bear in mind, though, that at least some misspellings reflect old, often handwritten labels, which may themselves be misspelled (but faithfully transcribed), mis-transcribed, or perhaps both.

I'd certainly be interested in the experiences of others in this area.

Regards - Tony


Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu

The Taxacom archive going back to 1992 may be searched with either of these methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
