[Taxacom] Errors in compilations
stephen_thorpe at yahoo.co.nz
Thu Jan 16 13:09:45 CST 2014
One possibly significant assumption you are making is that the actual final error rate is a known quantity. In my experience, this is not always the case.
From: Bob Mesibov <mesibov at southcom.com.au>
To: Francisco Welter-Schultes <fwelter at gwdg.de>
Cc: TAXACOM <taxacom at mailman.nhm.ku.edu>
Sent: Thursday, 16 January 2014 9:46 PM
Subject: Re: [Taxacom] Errors in compilations
"In an excellently compiled list of available molluscan genera (MNHN Paris, Bouchet & Rocroi) I controlled some 4500 entries and documented the error classes. Only 10 names were misspelled in the final list, and a few of them were debatable (o/oe problem). In 140 cases author and date were incorrectly cited, in 55 cases the name was unavailable. 50 concerned an incorrectly given original source, 200 entries had problems with the page number, 200 genera had incorrect type species, 400 an incorrectly given mode of type designation. I found only 8 overlooked names, but had no method to obtain a reliable figure on that error class."
Many thanks for letting us know the error categories in this particular list. So these are errors you found in the list *after* basic data cleaning - they are errors in what you might call the 'meaningful content' of the cleaned list, and they amount to *at most* ca 20-25% of the 4500 entries (some entries might have more than one of the error categories), i.e. no more than (10+140+55+50+200+200+400)/4500 = 1055/4500, or about 23%.
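For anyone who wants to check the arithmetic, here is a rough sketch in Python. The category labels are my own shorthand for the classes quoted above, the total of 4500 is only approximate ("some 4500 entries"), and the 8 overlooked names are left out because they are not entries in the list. The lower bound assumes the opposite extreme to the upper bound, i.e. that the error categories overlap as much as possible.

```python
# Error counts per category, as quoted above. Labels are shorthand, not
# the original author's terms.
error_counts = {
    "misspelled name": 10,
    "author/date incorrectly cited": 140,
    "name unavailable": 55,
    "incorrect original source": 50,
    "page number problem": 200,
    "incorrect type species": 200,
    "incorrect mode of type designation": 400,
}

TOTAL_ENTRIES = 4500  # "some 4500 entries", so treat as approximate

# Upper bound: every error sits in a different entry (no overlap).
total_errors = sum(error_counts.values())
upper_bound = total_errors / TOTAL_ENTRIES

# Lower bound: all errors are packed into as few entries as possible,
# so the largest single category sets the minimum number of bad entries.
lower_bound = max(error_counts.values()) / TOTAL_ENTRIES

print(f"errors recorded: {total_errors}")
print(f"entries with at least one error: {lower_bound:.1%} to {upper_bound:.1%}")
```

The true proportion of affected entries lies somewhere between the two figures, depending on how often a single entry carries more than one class of error.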
Also, you suggest that nearly 100% of these errors arise at the time that data are manually entered, but looking at your categories I can imagine that many of them actually arise before the entries were compiled, and that some of them come from the publications used as sources. I apologise for using the phrase 'original publications', which is ambiguous. What I meant was 'source publications', i.e. the publications used by the compiler.
And you estimate that even after correcting the errors noted above, there is still something like a 2-5% error rate? Would that consist entirely of overlooked names or publications?
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
PO Box 101, Penguin, Tasmania, Australia 7316
(03) 64371195; 61 3 64371195