Re: Text Extraction Again (from Taxonomic e-text)

agosti agosti at AMNH.ORG
Fri Jan 23 09:48:05 CST 2004

The AMNH, Ohio State University, the University of Massachusetts, and
Magdeburg University (Germany) were awarded an NSF/DFG bilateral grant
to work on such issues over the next three years.

One goal is specifically to further develop the XML markup for
systematics publications that has been developed over the last few
years, and then to develop tools to extract taxonomic and geographic
names, which will be used to build specific thesauri and gazetteers.
Tools will also be created for more sophisticated searches of this
legacy data, and we will look for ways to mine this wealth of data.

Target groups are ants, the entire set of AMNH publications, and bugs,
with the intention of making the approach as general as possible.

Further information can be obtained via the PI, Tom Moritz
(tmoritz at, and a first attempt to use the XML on

Donat Agosti

Dr. Donat Agosti
Senior NRC Fellow
Jet Propulsion Laboratory
California Institute of Technology
Mail Stop 300-227
4800 Oak Grove Drive
Pasadena, CA 91109-8099
phone: 1-818-354 1219 (w); 323 - 465 1956 (h)

-----Original Message-----
From: Taxacom Discussion List [mailto:TAXACOM at LISTSERV.NHM.KU.EDU] On
Behalf Of Beach, James H
Sent: Friday, January 23, 2004 9:23 AM
Subject: Text Extraction Again (from Taxonomic e-text)

Does anyone have information on recent attempts to use text extraction
software on taxonomic e-texts and databases for the purposes of
extracting taxonomic names or other taxon attribute data?

I recall there was an Australian project 2-3 years ago that had some
success extracting names and character data for the purpose of
automating diagnostic key construction.

We are interested in the possibility of using data extraction techniques
to populate prototype taxon concept databases we are building for our
semantic web "SEEK" Project.

Any pointers would be appreciated.  Many thanks,

Jim B.
James H. Beach
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4645, Fax: 785 864-5335
Televideocon: (H.323):
