[Taxacom] Sherborn & literature code challenge

Chris Thompson xelaalex at cox.net
Thu Sep 1 18:01:47 CDT 2011


Chuck:

I fully understand what you are saying, but I need to note in regard to 
your statement:

"... accept that the volume of those results could not be achieved by the 
available imperfect humans in several short lifetimes ..."

There was one man who, in half a lifetime, indexed all the zoological 
literature published between 1758 and 1850. That was Charles Davies 
Sherborn, who produced the Index Animalium.

We will celebrate him and his great achievement this coming October. See the 
announcement at the ICZN website (http://iczn.org/).

Cheers

Chris

-----Original Message----- 
From: Chuck Miller
Sent: Thursday, September 01, 2011 2:49 PM
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Life and Literature Code Challenge

"you can't expect anything sensible to result from automation ..."
" I have discovered a few BHL bibliographies for taxa in EoL which are 
totally irrelevant to the taxon"

So, "a few" errors make automated results not "sensible"?  It certainly 
makes them not "perfect".  But, is perfection required for results to be 
sensible?  Even published bibliographies can have errors.  So does that mean 
we can't expect anything sensible from imperfect humans either?

Rather than obsess about imperfection, perhaps we should be more "sensible" 
about the usefulness of automated results and accept that the volume of 
those results could not be achieved by the available imperfect humans in 
several short lifetimes.  There are nearly 35 million pages in BHL now. 
Better and smarter automation is a sensible way to deal with that volume.

Chuck Miller

-----Original Message-----
From: Stephen Thorpe [mailto:stephen_thorpe at yahoo.co.nz]
Sent: Wednesday, August 31, 2011 4:04 PM
To: Chris Thompson; John Mignault; taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Life and Literature Code Challenge

BHL uses uncorrected OCR which misses many names, even when they are in 
standard binomial form! Also, I have discovered a few BHL bibliographies for 
taxa in EoL which are totally irrelevant to the taxon, and just the result 
of some sort of confusion! At best, BHL bibliographies should only be 
considered to be raw materials for a sensible bibliography which highlights 
the important references. This must, of course, be done manually, and the 
Wiki sites are a good place for this to be done ...

you can't expect anything sensible to result from automation ...

Stephen

From: Chris Thompson <xelaalex at cox.net>
To: John Mignault <john at mignault.net>; taxacom at mailman.nhm.ku.edu
Sent: Thursday, 1 September 2011 8:28 AM
Subject: Re: [Taxacom] Life and Literature Code Challenge

John:

I appreciate the CHALLENGE, but I should remind ALL that "supposedly" the 
BHL literature is automatically linked to species pages of the Encyclopedia 
of Life (EoL). That is, as BHL literature is digitized, the contents are 
scanned by uBio for scientific names. And then a link is made to the 
appropriate species page.

So, the real challenge is getting the programmers of EoL to find a way to 
properly prioritize the order in which references to BHL literature are 
listed.

And that may be in part something that taxonomists must do manually. That 
is, make a taxonomic judgment about what are the most important references 
beyond the obvious first (original description) and see that the links 
appear in the proper order from most important to insignificant.

For all of you who do not know our Encyclopedia of Life, go to www.eol.org 
and look, for example, at the species page for Musca domestica Linnaeus, the 
common house fly, and click on the BHL link [beware, the EoL will be changing 
soon]:

http://www.eol.org/pages/730039

You will see more than a hundred or so links, but none to the original 
description (Linnaeus 1758) simply because Linnaeus NEVER made the 
combination Musca domestica in the TEXT. The genus name is in the running 
header and the epithet is left justified in the margin! So, the combination 
is not picked up in the automatic scanning by the uBio people, etc.
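
A minimal sketch of the failure described above, assuming the name finder 
only recognizes a genus that is immediately followed by its epithet. The 
function name, the pattern, and both sample strings are illustrative 
assumptions; this is not the uBio algorithm and not the actual text of 
Linnaeus 1758.

```python
import re


def mentions_binomial(text, genus, epithet):
    """Return True if genus and epithet appear next to each other in text.

    Hypothetical matcher: the two words must be separated only by whitespace.
    """
    pattern = rf"\b{re.escape(genus)}\s+{re.escape(epithet)}\b"
    return re.search(pattern, text) is not None


# Invented sample: a modern sentence with the combination written out.
modern_text = "The common house fly, Musca domestica, is found worldwide."

# Invented sample mimicking the layout described above: genus in the running
# header, epithet alone at the left margin, other text in between.
linnaean_layout = (
    "RUNNING HEADER: Musca.\n"
    "(other entries on the same page)\n"
    "domestica.   (diagnosis text)\n"
)

print(mentions_binomial(modern_text, "Musca", "domestica"))
print(mentions_binomial(linnaean_layout, "Musca", "domestica"))
```

Under these assumptions the first call finds the name and the second does 
not, because the genus and the epithet never sit side by side in the 
scanned text.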

But also just look at the mass of links. Sam Six-pack, who might want to 
learn what was buzzing about his bud, would be totally confused!

Sincerely,

Chris Thompson
from home

-----Original Message-----
From: John Mignault
Sent: Wednesday, August 31, 2011 3:40 PM
To: taxacom at mailman.nhm.ku.edu
Subject: [Taxacom] Life and Literature Code Challenge

The Biodiversity Heritage Library is sponsoring a Code Challenge as part of 
the Life and Literature conference being held in Chicago November 14-15.

The Biodiversity Heritage Library (BHL) is a consortium of 12 natural 
history and botanical libraries that cooperate to digitize and make 
accessible the legacy literature of biodiversity held in their collections 
and to make that literature available for open access and responsible use as 
a part of a global “biodiversity commons.” BHL also serves as the 
foundational literature component of the Encyclopedia of Life (EOL). BHL 
content may be freely viewed through the online reader or downloaded in part 
or as a complete work in PDF, OCR text, or JPG2000 file formats.

Your challenge is to provide

    a new, innovative way to use, disseminate or display BHL data
    a description of what your project is trying to accomplish
    the source code to reproduce the application
    any libraries or supporting code needed to reproduce the application
    any build instructions or scripts needed to build the application, or 
instructions on how to run it
    any notes about your experience implementing this code: how you came up 
with your design, blind alleys you went up, or surprising problems you ran 
into or anything else you want to share.


The dataset
Through local and global digitization efforts, BHL has digitized over 
32 million pages of taxonomic literature, representing over 45,000 titles 
and 87,000 volumes (January 2011). The entire dataset is freely 
available and accessible via many open methods.


Timeline

Deadline for entries is October 17, 2011. The winner will be announced on 
November 1, 2011.

More details are available on our website at 
http://www.lifeandliterature.org/p/code-challenge.html

Thanks, and enter!

--j

--
John Mignault
Systems Librarian
The LuEsther T Mertz Library
The New York Botanical Garden

_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these
methods:

(1) by visiting http://taxacom.markmail.org

(2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom
your search terms here

