[Taxacom] A new way to view taxonomic publications

David Campbell pleuronaia at gmail.com
Sat Jun 22 06:48:29 CDT 2013


>
> Sadly OCR does struggle in two very useful sections of a document: the
> table of contents and the index. Part of the problem lies with a page full
> of 'funny' words not in the software's dictionary ;-) Then there are other
> problems usually to do with layout such as non-aligned columns and leading
> lines which break the OCR accuracy.
>

My impression from BHL runs counter to this: I often get hits for taxon
names in the index but not on the corresponding text page.  This doesn't
reflect a systematic search for such cases; I have simply been checking
each reference that pops up to see what hits it has, and often getting only
a link to the index.  Thus, I probably did not notice cases where a text
citation was found but the index entry wasn't.  Certainly the layout issues
often wreak havoc with OCR of indices (based on my personal attempts at
making clean OCR), but messed-up layout doesn't affect taxon name searches.
Of course, OCR won't help much if the scanning was poorly done, which is
another point at which bucket labels are needed.

Identification of bucket quality will tend to clash with the self-interest
of promoting how great one's project is or how well the grand promises of a
grant application have been fulfilled.


--
Dr. David Campbell
Assistant Professor, Geology
Department of Natural Sciences
Gardner-Webb University
Boiling Springs NC 28017


