[Taxacom] A new way to view taxonomic publications
r.page at bio.gla.ac.uk
Fri Jun 21 22:57:49 CDT 2013
On 22 Jun 2013, at 03:29, Donat Agosti <agosti at amnh.org> wrote:
> For my purpose I want to have an OCR accuracy rate between 99.9 and 99.99%
So this is the crux of the problem. You set a very high bar that BHL will struggle to meet in a lot of cases. This then sets limits on what you can achieve.
An alternative is to accept that things will be messier than that, and set your expectations appropriately. Plus we can think about ways to cope with messy text. It strikes me that there is a misplaced obsession with "clean" data that gets in the way of making progress. You want the world to be one way, but it's the other way.