[Taxacom] Errors in compilations
mesibov at southcom.com.au
Thu Jan 16 02:46:08 CST 2014
"In an excellently compiled list of available molluscan genera (MNHN Paris, Bouchet & Rocroi) I controlled some 4500 entries and documented the error classes. Only 10 names were misspelled in the final list, and a few of them were debatable (o/oe problem). In 140 cases author and date were incorrectly cited, in 55 cases the name was unavailable. 50 concerned an incorrectly given original source, 200 entries had problems with the page number, 200 genera had incorrect type species, 400 an incorrectly given mode of type designation. I found only 8 overlooked names, but had no method to obtain a reliable figure on that error class."
Many thanks for letting us know the error categories in this particular list. So these are errors you found in the list *after* basic data cleaning - they are errors in what you might call the 'meaningful content' of the cleaned list, and they amount to *at most* ca 20-25% of the 4500 entries (some entries might have more than one of the error categories), i.e. no more than (10+140+55+50+200+200+400)/4500 = 1055/4500, or about 23%.
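The upper bound above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the category labels are my own shorthand for the classes in the quoted message, and the 8 overlooked names are left out because they are not entries in the list. The lower bound rests on an assumption not stated in the quote, namely that each category count refers to distinct entries, so that at least as many entries are affected as the largest single category contains.

```python
# Error counts per category, taken from the quoted message.
# The labels are shorthand, not the original compiler's terms.
error_counts = {
    "misspelled name": 10,
    "author/date incorrectly cited": 140,
    "name unavailable": 55,
    "original source incorrect": 50,
    "page number problem": 200,
    "type species incorrect": 200,
    "mode of type designation incorrect": 400,
}
total_entries = 4500  # "some 4500 entries" in the quoted message

# Upper bound: every error sits in a different entry (no overlap).
total_errors = sum(error_counts.values())
upper_bound = total_errors / total_entries

# Lower bound (assumption: counts within a category are distinct entries):
# all errors overlap as much as possible, so the largest category decides.
lower_bound = max(error_counts.values()) / total_entries

print(f"errors summed over categories: {total_errors}")
print(f"share of entries affected: between {lower_bound:.1%} and {upper_bound:.1%}")
```

The true share of affected entries lies somewhere between the two figures, depending on how often one entry carries more than one kind of error.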
Also, you suggest that nearly 100% of these errors arise at the time that data are manually entered, but looking at your categories I can imagine that many of them actually arose before the entries were compiled, and that some of them come from the publications used as sources. I apologise for using the phrase 'original publications', which is ambiguous. What I meant was 'source publications', i.e. the publications used by the compiler.
And you estimate that even after correcting the errors noted above, there is still something like a 2-5% error rate? Would that consist entirely of overlooked names or publications?
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
PO Box 101, Penguin, Tasmania, Australia 7316
(03) 64371195; 61 3 64371195