Text Extraction Again (from Taxonomic e-text)

Mike Dallwitz mike.dallwitz at NETSPEED.COM.AU
Sun Jan 25 00:04:42 CST 2004

- From: "Beach, James H" <beach at KU.EDU>

> Does anyone have information on recent attempts to use text
> extraction software on taxonomic e-texts and databases for the
> purposes of extracting taxonomic names or other taxon attribute data?

Here are some programs that extract taxon attribute data from
natural-language descriptions.

Diederich, J., Fortuner, R. & Milton, J. (1999). Computer-assisted
data extraction from the taxonomical literature.

Gouda, E. J. TAXASOFT DELTA Programs (DDCONV).

Taylor, A. (1996). Extracting Knowledge from Biological Descriptions.

I'm sceptical about such programs, because most conventional
descriptions are so bad (i.e. non-comparative) that even people find it
difficult to extract useful information from them.

When using these programs, keep in mind that a character list is not
just a list of words or phrases that have been used to describe a group
of organisms. Constructing a character list requires taxonomic wisdom
and judgement.

The first test I would apply to a program for creating a descriptive
database from descriptions would be to see whether it can reconstruct a
DELTA database from natural-language descriptions generated from that
database.
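
The round-trip test described above can be sketched as follows. This is
only an illustration under stated assumptions: the character list, the
coded data, and the functions generate_description, extract_items and
round_trip_mismatches are all hypothetical, the data structures are plain
Python dictionaries rather than the real DELTA format, and the extractor
is a naive stand-in for whichever extraction program is being tested.

```python
# Hypothetical character list: character name -> allowed states.
# (Not real DELTA syntax; a plain dictionary used for illustration.)
CHARACTERS = {
    "leaf shape": ["ovate", "linear"],
    "flower colour": ["white", "yellow"],
}

# Hypothetical coded database: taxon -> {character: state}.
DATABASE = {
    "Taxon A": {"leaf shape": "ovate", "flower colour": "white"},
    "Taxon B": {"leaf shape": "linear", "flower colour": "yellow"},
}


def generate_description(items):
    """Render one taxon's coded items as a natural-language description."""
    return " ".join(
        f"{char.capitalize()} {state}." for char, state in items.items()
    )


def extract_items(text, characters):
    """Naive stand-in for an extraction program.

    Matches sentences that begin with a known character name and end
    with one of that character's known states.
    """
    items = {}
    for sentence in text.split("."):
        sentence = sentence.strip().lower()
        for char, states in characters.items():
            if sentence.startswith(char):
                remainder = sentence[len(char):].strip()
                if remainder in states:
                    items[char] = remainder
    return items


def round_trip_mismatches(database, characters):
    """List (taxon, character, original, recovered) for items not recovered."""
    mismatches = []
    for taxon, items in database.items():
        recovered = extract_items(generate_description(items), characters)
        for char, state in items.items():
            if recovered.get(char) != state:
                mismatches.append((taxon, char, state, recovered.get(char)))
    return mismatches


mismatches = round_trip_mismatches(DATABASE, CHARACTERS)
print(f"{len(mismatches)} item(s) not recovered")
```

In practice the extractor would be the program under test and would not
be handed the character list in advance. The useful part of the sketch is
the comparison step: every coded item in the original database is checked
against what was recovered from the generated text.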

Mike Dallwitz
13 Warrambool Close, Giralang ACT 2617, Australia
Phone: +61 2 6241 2884
Email: mike.dallwitz at netspeed.com.au  Internet: http://delta-intkey.com
