[Taxacom] Towards a consensus higher classification of organisms (was: List of Orders of the world), misspellings, etc...

Chuck Miller Chuck.Miller at mobot.org
Tue Jun 24 14:45:32 CDT 2008


I have to make a shameless plug here for Botanicus (www.botanicus.org)
which is integrated with Tropicos (www.tropicos.org).  

The Botanicus project has been digitizing 18th, 19th and early 20th
century botanical reference literature for several years now and is
continuing to do so.  Today there are 1,927 volumes and 882,409 pages,
with 171,410 protologue links from Tropicos.  All the pages are OCRed
using Prime, and the protologue pages are cross-referenced in Tropicos.
The OCRed text is also scanned by uBio, whose algorithms detect
scientific names as well as they can.

Example for Poa annua L.:
http://www.tropicos.org/Name/25509881
http://www.botanicus.org/page/358087

The Botanicus digitized pages are also included within the Biodiversity
Heritage Library.  http://www.biodiversitylibrary.org/item/13829

Chuck

Chuck Miller
VP-IT & CIO
Missouri Botanical Garden
4344 Shaw Boulevard
St. Louis, MO 63110 USA

-----Original Message-----
From: Jim Croft [mailto:jim.croft at gmail.com] 
Sent: Monday, June 23, 2008 5:50 PM
To: Jerry Cooper
Cc: Taxacom
Subject: Re: [Taxacom] Towards a consensus higher
classification of organisms (was: List of Orders of the world),
misspellings, etc...

> IndexFungorum (and the decades of printed indexes on which it is based)
> is a very good example of the index that Rod describes (and IPNI isn't).
> The fact that protologues are linked to IndexFungorum as jpegs of page
> scans, as opposed to OCR'd documents, is therefore largely irrelevant.
> From a nomenclatural standpoint the combination of the name index
> and the page scans satisfies most needs.

We have been experimenting with this as part of the Australian Plant
Name Index and, to our surprise, found it was possible to sort of OCR
the document as it was being PDFed, so instead of just a graphic you
ended up with a facsimile whose text was more or less searchable.  We
were looking for an escape route that would let the protologue be
parsed and endatabased some time in the future, when we had the
time, the staff and the technology.  We were able to convince
ourselves that making the PDFs was not a sunk investment of time,
because the text could indeed be extracted when we needed it.

Now all we need is a bunch of slaves chained to scanners and a library
starting at 1753...

jim
