[Taxacom] BHL survey: scan quality

Chris Freeland Chris.Freeland at mobot.org
Fri May 7 18:48:12 CDT 2010

These are all great suggestions, and exactly the kind of detail we want and need to improve BHL. The issue is that not all of BHL's scanned books have b&w PDFs made available by Internet Archive. But if that's a format that more people want ready access to, we'll make that happen.

Chris Freeland
Technical Director, BHL

-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Neal Evenhuis
Sent: Fri 5/7/2010 3:48 PM
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] BHL survey: scan quality

At 10:13 AM -1000 5/7/10, Dean Pentcheff wrote:
>I will confirm what Karl Magnacca had to say.
>The scanning strategy for most materials at BHL (via Internet Archive,
>of course) seems to be to make the visual appearance of the pages as
>close to the original as possible, including yellowed paper, etc. This
>comes at the expense (since there are always tradeoffs) of highly
>resolved text. When it comes to plates, the results are visually
>appealing, but often of poor actual resolution.
>The PDFs are generated using "Luradocument", which achieves excellent
>compression of those images, but with the cost of long rendering times
>for the pages (again, as mentioned by Karl Magnacca). The result is
>that the PDFs can be very cumbersome to use on anything but the very
>fastest desktop computers.

I have had the same problems with speed of rendering for each page of 
BHL pdfs and it is indeed frustrating at times. But the solution is 
already there. It is just not available on the BHL website.

Internet Archive (http://www.archive.org) has all of the BHL 
documents (since they are sent there from BHL nodes for compression 
and pdf-ing, etc.) and also has these documents in various formats. 
When you google a particular book title, a link to the document on 
the Internet Archive site comes up. Once you are there, the book 
defaults to the scanned text file, but the navigation box on the left 
lists all the available file formats. One of these is a b/w PDF.
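
For anyone who would rather script this than click through the 
navigation box, here is a minimal Python sketch. It assumes Internet 
Archive's public metadata endpoint (https://archive.org/metadata/<identifier>) 
and its download URL pattern (https://archive.org/download/<identifier>/<filename>) 
as they exist today; neither is mentioned in this thread. The function 
names and the identifier in the usage note are hypothetical, and the 
exact format label that marks a b/w PDF is not assumed, so the sketch 
simply lists every PDF it finds along with its label.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL patterns for Internet Archive's public endpoints.
METADATA_URL = "https://archive.org/metadata/{identifier}"
DOWNLOAD_URL = "https://archive.org/download/{identifier}/{name}"


def fetch_metadata(identifier):
    """Fetch the JSON metadata record for one Internet Archive item."""
    with urlopen(METADATA_URL.format(identifier=identifier)) as resp:
        return json.load(resp)


def list_formats(metadata):
    """Return the sorted, de-duplicated format labels of an item's files."""
    files = metadata.get("files", [])
    return sorted({f.get("format", "unknown") for f in files})


def pdf_links(identifier, metadata):
    """Map each PDF file name to a (format label, download URL) pair."""
    links = {}
    for f in metadata.get("files", []):
        name = f.get("name", "")
        if name.lower().endswith(".pdf"):
            url = DOWNLOAD_URL.format(identifier=identifier, name=quote(name))
            links[name] = (f.get("format", "unknown"), url)
    return links
```

To use it, call fetch_metadata("some-item-identifier") with the 
identifier that appears in the item's archive.org URL, then pass the 
result to list_formats() or pdf_links() and pick whichever PDF variant 
renders fastest for you.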

Whenever I can, I download these in place of the default color 
PDF (the only choice on the BHL website). Once downloaded, the pages 
render much faster, although still not nearly as fast as the Google 
Books version of the same document (probably because Internet Archive 
does not compress the b/w pages).

My suggestion is to make more file formats available for download 
on the BHL website than are there now (i.e., "PDF" (color only), 
"OCR" (= scanned text files, not proofread), "Images", or "All").

Seems simple to do -- just add more links to the downloadable file 
selection menu for each document on the BHL site.

Why are these other file types only available on the Internet Archive 
site and not the BHL site?



Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu

The Taxacom archive going back to 1992 may be searched with either of these methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
