Richard Pyle deepreef at BISHOPMUSEUM.ORG
Mon Jan 3 20:00:34 CST 2005

> We are playing with image manipulation
> functions in php to dynamically/interactively crop and navigate the
> page image so that the current record is at the top of the image just
> under the converted data record.

Yeah, that's exactly what I figured the tricky bit would be.  I gather that
there's no easy way to deduce the physical Y-mapping of each name on each
page based on the name-sequence alone, because not all names occupy the same
number of lines.
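
To make the Y-mapping point concrete, here is a minimal sketch in Python. It
assumes the number of printed lines per record is already known, which is
exactly the information the name-sequence alone does not give you. The
function name, the fixed line height, and the top margin are all hypothetical
and not taken from the project being discussed.

```python
def record_y_offsets(line_counts, line_height_px, top_margin_px=0):
    """Estimate the top Y pixel of each record on a scanned page.

    line_counts    -- printed lines occupied by each record, in page order
                      (assumed known; not derivable from sequence alone)
    line_height_px -- assumed constant height of one printed line
    top_margin_px  -- assumed blank space above the first record
    """
    offsets = []
    lines_before = 0  # running total of lines above the current record
    for count in line_counts:
        offsets.append(top_margin_px + lines_before * line_height_px)
        lines_before += count
    return offsets


# Three records occupying 2, 1 and 3 lines on a page with 20 px lines
# and a 100 px top margin.
offsets = record_y_offsets([2, 1, 3], line_height_px=20, top_margin_px=100)
```

With equal line counts the offsets would simply be a multiple of the record
index; it is the varying counts that force the running total.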

> It's interesting to note that the conversion service we chose that
> purported to do double-keying actually initially used OCR tools as
> well.

Yeah, that was pretty evident after a quick glance at Page 1. I'm surprised
they even pretended they didn't use it.  It's pretty damn hard to
"accidentally" enter a Copyright symbol when one intends to type an "e"
character.  Other characters like "A", "?", "$" aren't exactly common
mis-type characters either...
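
One rough way to screen converted text for this kind of OCR tell-tale is to
flag any character a keyer could not plausibly produce from a plain keyboard.
The sketch below is in Python and assumes the records are expected to be
plain ASCII; the allowed set and the function name are hypothetical. It would
catch a stray Copyright symbol but not substitutions such as "?" or "$",
which are ordinary keyboard characters.

```python
import string

# Characters assumed to be directly typeable on a plain keyboard.
TYPEABLE = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")


def flag_untypeable(text, allowed=TYPEABLE):
    """Return (index, character) pairs for characters outside the allowed set."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in allowed]


# A record where OCR read an "e" as a Copyright symbol.
hits = flag_untypeable("Ac\u00a9r rubrum")
```

Legitimate diacritics in author names would also be flagged, so the allowed
set would need widening for real data.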

