Text Extraction Again (from Taxonomic e-text)

Mike Dallwitz mike.dallwitz at NETSPEED.COM.AU
Sun Jan 25 00:04:42 CST 2004


- From: "Beach, James H" <beach at KU.EDU>

> Does anyone have information on recent attempts to use text
> extraction software on taxonomic e-texts and databases for the
> purposes of extracting taxonomic names or other taxon attribute data?

Here are some programs that extract taxon attribute data from
natural-language descriptions.

Diederich, J., Fortuner, R. & Milton, J. (1999). Computer-assisted
data extraction from the taxonomical literature.
http://math.ucdavis.edu/~milton/genisys.html

Gouda, E. J. TAXASOFT DELTA Programs (DDCONV).
http://botu07.bio.uu.nl/taxasoft/

Taylor, A. (1996). Extracting Knowledge from Biological Descriptions.
http://www.cse.unsw.edu.au/~andrewt/papers/nlp_vlkb95/nlp_vlkb95.html

I'm sceptical about such programs, because most conventional
descriptions are so bad (i.e. non-comparative) that even human readers
find it difficult to extract useful information from them.

When using these programs, keep in mind that a character list is not
just a list of words or phrases that have been used to describe a group
of organisms. Constructing a character list requires taxonomic wisdom
and judgement.

The first test I would apply to a program for creating a descriptive
database from descriptions would be to see whether it can reconstruct a
DELTA database from natural-language descriptions generated from that
database.
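The round-trip test above can be sketched in a few lines. This is a
hypothetical illustration, not DELTA code: the toy table, the sentence
template and the names generate_description, naive_extractor and
round_trip_score are assumptions made up for the example. A real test
would generate the descriptions with the DELTA programs and pass them
to the extraction program under evaluation.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a descriptive database: taxon -> {character: state}.
Matrix = Dict[str, Dict[str, str]]
# An extractor takes description text plus the known character names
# and returns the {character: state} pairs it believes it found.
Extractor = Callable[[str, List[str]], Dict[str, str]]


def generate_description(attributes: Dict[str, str]) -> str:
    """Render one taxon's coded attributes as one short sentence per character."""
    return " ".join(f"{char.capitalize()} {state}." for char, state in attributes.items())


def naive_extractor(text: str, characters: List[str]) -> Dict[str, str]:
    """Placeholder extractor that matches '<Character> <state>.' sentences.

    Longer character names are tried first so that, e.g., 'leaf arrangement'
    is not swallowed by a shorter name such as 'leaf'.
    """
    found: Dict[str, str] = {}
    ordered = sorted(characters, key=len, reverse=True)
    for sentence in text.split("."):
        sentence = sentence.strip()
        for char in ordered:
            prefix = char.capitalize() + " "
            if sentence.startswith(prefix):
                found[char] = sentence[len(prefix):]
                break
    return found


def round_trip_score(matrix: Matrix, extractor: Extractor) -> float:
    """Fraction of the original (taxon, character, state) entries recovered."""
    characters = sorted({c for attrs in matrix.values() for c in attrs})
    total = correct = 0
    for attrs in matrix.values():
        recovered = extractor(generate_description(attrs), characters)
        for char, state in attrs.items():
            total += 1
            if recovered.get(char) == state:
                correct += 1
    return correct / total if total else 1.0


# Hypothetical two-taxon, two-character table.
EXAMPLE_MATRIX: Matrix = {
    "Taxon A": {"leaf arrangement": "opposite", "petal number": "5"},
    "Taxon B": {"leaf arrangement": "alternate", "petal number": "4"},
}

if __name__ == "__main__":
    print(round_trip_score(EXAMPLE_MATRIX, naive_extractor))  # 1.0
```

Because naive_extractor is written as the exact inverse of the toy
generator, its perfect score says nothing about real descriptions; the
point is only the shape of the test. A program that cannot score well
even on text generated from a known table is unlikely to do better on
conventional descriptions.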

--
Mike Dallwitz
13 Warrambool Close, Giralang ACT 2617, Australia
Phone: +61 2 6241 2884
Email: mike.dallwitz at netspeed.com.au  Internet: http://delta-intkey.com
