[Taxacom] BHL and print on demand publishers
lists at curtisclark.org
Sun Apr 1 12:19:28 CDT 2012
On 3/31/2012 8:41 AM, Roderic Page wrote:
> I think what meets your need is versioning, e.g. this is the original
> scan, here is the original OCR, this is the version with the OCR
> corrected, this is the version with the image contrast improved,
> resulting in cleaner OCR, etc. You want to be able to roll back to the
> original, but also enable people to clean up, fix, or annotate the data.
I'm concerned that this is the world as you'd like it to be, rather than
as it is. If a musical recording is by-sa (requiring attribution and
share-alike), I can take a chunk of it, modify it however I want, and
use it in a new composition. I'm required by the license to indicate the
source musical work, but there's no requirement that I indicate exactly
which 12-second part I used, or how I transformed it, and I may have
transformed it heavily. Only a very sophisticated analysis of the
original and my derivative might have any chance at all of finding the
portion of the original that I used.
If I take a description of a new name from a long monographic work that
is licensed by-sa and change the wording (especially if it is a plant
description in Latin, where discrepancies are harder for most of us to
spot), a simple fuzzy search might well find the original passage and
highlight my change ("Did you mean 'frutex *gracilis*'?"), but a user
might never think to look. And if I change every word (allowed by the
license), I could effectively create a new description out of thin air.
Of course we want to clean up the data, but the license only requires
citing the original, not anything that a reasonable user might call
> As an aside, I also think scanning nomenclatural-only portions would
> be a mistake, and one BHL hasn't made. There are economies of effort
> by scanning in bulk irrespective of whether the item scanned is
> relevant to a particular group. A bit like whole genome sequencing,
> you go after everything, not the bits some feel will be relevant.
I certainly agree about the original digitization, but, just as I might
pull a single gene from a genome sequence for analysis, an aggregator
might want only the nomenclatural bits. And, absent a requirement to
provide explicit backward links to the relevant sections, I have to
trust the aggregator not to make changes. As Ronald Reagan said,
"Trust, but verify." The easier it is to verify, the easier it is to trust.
> Lastly, Creative Commons rests on copyright, and it's not clear to me
> how you copyright a fact, nor whether you'd want to.
I'm not sure what part of this is a fact, beyond that a specific new
name was described in the literature. We might all agree that a
protologue is a fact, but I'm not sure that has ever been tested in court.
Curtis Clark http://www.csupomona.edu/~jcclark
Biological Sciences +1 909 869 4140
Cal Poly Pomona, Pomona CA 91768