[Taxacom] Life and Literature Code Challenge

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Thu Sep 1 16:33:22 CDT 2011


No, no! I was trying to make the point that automation just creates the raw materials, not the finished product, and that getting the last step right requires work that only humans can do ...


From: Chuck Miller <Chuck.Miller at mobot.org>
To: taxacom at mailman.nhm.ku.edu
Sent: Friday, 2 September 2011 6:49 AM
Subject: Re: [Taxacom] Life and Literature Code Challenge

"you can't expect anything sensible to result from automation ..."
" I have discovered a few BHL bibliographies for taxa in EoL which are totally irrelevant to the taxon"

So, "a few" errors makes automated results not "sensible"?  It certainly makes them not "perfect".  But, is perfection required for results to be sensible?  Even published bibliographies can have errors.  So does that mean we can't expect anything sensible from imperfect humans either?

Rather than obsess about imperfection, perhaps we should be more "sensible" about the usefulness of automated results and accept that the volume of those results could not be achieved by the available imperfect humans in several short lifetimes.  There are nearly 35 million pages in BHL now.  Better and smarter automation is a sensible way to deal with that volume.

Chuck Miller

-----Original Message-----
From: Stephen Thorpe [mailto:stephen_thorpe at yahoo.co.nz] 
Sent: Wednesday, August 31, 2011 4:04 PM
To: Chris Thompson; John Mignault; taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Life and Literature Code Challenge

BHL uses uncorrected OCR which misses many names, even when they are in standard binomial form! Also, I have discovered a few BHL bibliographies for taxa in EoL which are totally irrelevant to the taxon, and just the result of some sort of confusion! At best, BHL bibliographies should only be considered to be raw materials for a sensible bibliography which highlights the important references. This must, of course, be done manually, and the Wiki sites are a good place for this to be done ...
 
you can't expect anything sensible to result from automation ...
 
Stephen

From: Chris Thompson <xelaalex at cox.net>
To: John Mignault <john at mignault.net>; taxacom at mailman.nhm.ku.edu
Sent: Thursday, 1 September 2011 8:28 AM
Subject: Re: [Taxacom] Life and Literature Code Challenge

John:

I appreciate the CHALLENGE, but I should remind ALL that "supposedly" the BHL literature is automatically linked to species pages of the Encyclopedia of Life (EoL). That is, as BHL literature is digitized, the contents are scanned by uBio for scientific names. And then a link is made to the appropriate species page.

So, the real challenge is getting the programmers of EoL to find a way to properly prioritize the order in which references to BHL literature are listed.

And that may be in part something that taxonomists must do manually. That is, make a taxonomic judgment about what are the most important references beyond the obvious first (original description) and see that the links appear in the proper order from most important to insignificant.

For those of you who do not know our Encyclopedia of Life, go to www.eol.org and look, for example, at the species page for Musca domestica Linnaeus, the common house fly, and click on the BHL link [beware, the EoL will be changing soon]:

http://www.eol.org/pages/730039 

You will see a hundred or more links, but none to the original description (Linnaeus 1758), simply because Linnaeus NEVER made the combination Musca domestica in the TEXT. The genus name is in the running header and the epithet is left justified in the margin! So, the combination is not picked up in the automatic scanning by the uBio people, etc.
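
[Editorial illustration] The failure described above can be sketched in a few lines of Python. This is only a toy, assuming a naive matcher that looks for a capitalized genus followed by a lowercase epithet on the same line; it is not uBio's actual name-finding method, and the function name and both sample page texts are hypothetical.

```python
import re

# Naive assumption: a binomial is a capitalized genus word followed by a
# lowercase epithet of three or more letters on the same line.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_binomials(page_text):
    """Return (genus, epithet) pairs found within single lines of OCR text."""
    found = []
    for line in page_text.splitlines():
        found.extend(BINOMIAL.findall(line))
    return found


# Hypothetical OCR text where the name appears in running text.
modern_page = "Musca domestica is common in dwellings."

# Hypothetical OCR text laid out like the 1758 example: genus alone in the
# running header, epithet alone at the left margin of a later line.
old_page = "MUSCA.\ndomestica.   description text follows here"

print(find_binomials(modern_page))
print(find_binomials(old_page))
```

Under these assumptions the first page yields the pair and the second yields nothing, because the genus and the epithet never sit next to each other on one line. Recovering such names would need layout-aware logic or a human reader.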

But also just look at the mass of links. Sam six-pack, who might want to learn what was buzzing about his bud, would be totally confused!

Sincerely,

Chris Thompson
from home

-----Original Message-----
From: John Mignault
Sent: Wednesday, August 31, 2011 3:40 PM
To: taxacom at mailman.nhm.ku.edu
Subject: [Taxacom] Life and Literature Code Challenge

The Biodiversity Heritage Library is sponsoring a Code Challenge as part of the Life and Literature conference being held in Chicago November 14-15.

The Biodiversity Heritage Library (BHL) is a consortium of 12 natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” BHL also serves as the foundational literature component of the Encyclopedia of Life (EOL). BHL content may be freely viewed through the online reader or downloaded in part or as a complete work in PDF, OCR text, or JPG2000 file formats.

Your challenge is to provide

    a new, innovative way to use, disseminate or display BHL data
    a description of what your project is trying to accomplish
    the source code to reproduce the application
    any libraries or supporting code needed to reproduce the application
    any build instructions or scripts needed to build the application, or instructions on how to run it
    any notes about your experience implementing this code: how you came up with your design, blind alleys you went up, or surprising problems you ran into or anything else you want to share.


The dataset
Through local and global digitization efforts, BHL has digitized over 32 million pages of taxonomic literature, representing over 45,000 titles and 87,000 volumes (January 2011). The entire -corpus- dataset is freely available and accessible via many open methods.


Timeline

Deadline for entries is October 17, 2011. The winner will be announced on November 1, 2011.

More details are available on our website at http://www.lifeandliterature.org/p/code-challenge.html 

Thanks, and enter!

--j

--
John Mignault
Systems Librarian
The LuEsther T Mertz Library
The New York Botanical Garden

_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom 

The Taxacom archive going back to 1992 may be searched with either of these
methods:

(1) by visiting http://taxacom.markmail.org 

(2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here



