[Taxacom] A new way to view taxonomic publications

Roderic Page r.page at bio.gla.ac.uk
Sat Jun 22 04:01:13 CDT 2013


Hi Brian,

Sure, but this is not a problem we face alone. For example, a Google search for "ocr german gothic" yields

http://www.frakturschrift.com/en:start

http://aclweb.org/anthology/W/W11/W11-4115.pdf

http://superuser.com/questions/244170/how-to-ocr-pdf-files-with-old-german-gothic-fraktur-text

Regards

Rod

On 22 Jun 2013, at 09:46, Dr Brian Taylor wrote:

> 
> Rod, Donat et al,
> 
> However good OCR and scanning get, there will still be problems with old
> literature. For example, archaic German - see
> http://hol.osu.edu/literature-viewer.html?id=4833&page=15
> 
> One of my human helpers struggled to read it, let alone translate it
> into English.
> 
> And what about Asian scripts?
> 
> Brian
> 
> 
> 
> On 22/06/2013 04:57, "Roderic Page" <r.page at bio.gla.ac.uk> wrote:
> 
>> Hi Donat,
>> 
>> 
>> On 22 Jun 2013, at 03:29, Donat Agosti <agosti at amnh.org> wrote:
>> 
>>> For my purposes I want an OCR accuracy rate of between 99.9 and 99.99%
>> 
>> So this is the crux of the problem. You set a very high bar that BHL will
>> struggle to meet in a lot of cases. This then sets limits on what you can
>> achieve.
>> 
>> An alternative is to accept that things will be messier than that, and set
>> your expectations appropriately. Plus, we can think about ways to cope with
>> messy text. It strikes me that there is a misplaced obsession with "clean"
>> data that gets in the way of making progress. You want the world to be one
>> way, but it's the other way.
>> 
>> Regards
>> 
>> Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
Skype: rdmpage
Facebook: http://www.facebook.com/rdmpage
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page
Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ORCID id: http://orcid.org/0000-0002-7101-9767



