[Taxacom] BHL and print on demand publishers

Curtis Clark lists at curtisclark.org
Sun Apr 1 12:19:28 CDT 2012

On 3/31/2012 8:41 AM, Roderic Page wrote:
> I think what meets your need is versioning, e.g. this is the original 
> scan, here is the original OCR, this is the version with the OCR 
> corrected, this is the version with the image contrast improved, 
> resulting in cleaner OCR, etc. You want to be able to roll back to the 
> original, but also enable people to clean up, fix, or annotate the data.

I'm concerned that this is the world as you'd like it to be, rather than 
as it is. If a musical recording is by-sa (requiring attribution and 
share-alike), I can take a chunk of it, modify it however I want, and 
use it in a new composition. I'm required by the license to indicate the 
source musical work, but there's no requirement that I indicate exactly 
which 12-second part I used, or how I transformed it, and I may have 
transformed it heavily. Only a very sophisticated analysis of the 
original and my derivative might have any chance at all of finding the 
portion of the original that I used.

If I take a description of a new name from a long monographic work that 
is licensed by-sa and change the wording (especially if it is a plant 
description in Latin, where discrepancies are harder for most of us to 
spot), a simple fuzzy search might well find the original passage and 
highlight my change ("Did you mean 'frutex *gracilis*'?"), but a user 
might never think to look, and if I change every word (allowed by the 
license), I could effectively create a new description out of air.
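
To make the fuzzy-search point concrete, here is a rough sketch using Python's standard difflib module. The Latin phrases are invented for illustration and are not taken from any real protologue, and the function names best_match and changed_words are hypothetical. It only shows the general idea: a single changed word is easy to locate, while a fully reworded passage no longer matches anything.

```python
import difflib


def best_match(query, passages):
    """Return (score, passage) for the passage most similar to query.

    Similarity is difflib's word-level ratio, between 0.0 and 1.0.
    """
    scored = [
        (difflib.SequenceMatcher(None, query.split(), p.split()).ratio(), p)
        for p in passages
    ]
    return max(scored)


def changed_words(original, altered):
    """List (original_words, altered_words) pairs where the texts differ."""
    a, b = original.split(), altered.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example passages standing in for a scanned monograph.
corpus = [
    "frutex gracilis, foliis ovatis, floribus albis",
    "herba annua, caulibus erectis, floribus luteis",
]

# One word changed: the source passage is still the clear best match.
lightly_edited = "frutex robustus, foliis ovatis, floribus albis"
score, source = best_match(lightly_edited, corpus)
print(round(score, 2), changed_words(source, lightly_edited))

# Every word changed: nothing in the corpus matches at all.
fully_reworded = "arbor alta, ramis patentibus, corolla rubra"
print(best_match(fully_reworded, corpus)[0])
```

The second case is the worrying one: once every word differs, a search of this kind has nothing to hold on to, and only an explicit link back to the source passage would reveal where the text came from.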

Of course we want to clean up the data, but the license only requires 
citing the original, not anything that a reasonable user might call 

> As an aside, I also think scanning nomenclatural-only portions would 
> be a mistake, and one BHL hasn't made. There are economies of effort 
> by scanning in bulk irrespective of whether the item scanned is 
> relevant to a particular group. A bit like whole genome sequencing, 
> you go after everything, not the bits some feel will be relevant.

I certainly agree about the original digitization, but, just as I might 
pull a single gene from a genome sequence for analysis, an aggregator 
might want only the nomenclatural bits. And, absent a requirement to 
provide explicit backward links to the relevant sections, I have to 
trust the aggregator not to make changes. As Ronald Reagan said, 
"Trust, but verify". The easier it is to verify, the easier it is to trust.

> Lastly, Creative Commons rests on copyright, and it's not clear to me 
> how you copyright a fact, nor whether you'd want to.

I'm not sure what part of this is a fact, beyond that a specific new 
name was described in the literature. We might all agree that a 
protologue is a fact, but I'm not sure that has ever been tested in court.

Curtis Clark        http://www.csupomona.edu/~jcclark
After 2012-01-02:
Biological Sciences                   +1 909 869 4140
Cal Poly Pomona, Pomona CA 91768
