Taxonomic Name Extraction | TaxonGrab v1.1

Roderic Page at BIO.GLA.AC.UK
Tue Jun 7 22:53:40 CDT 2005

This looks nice, and there is a great need for this sort of tool. For
example, in the recent ECAT e-conference, I suggested using text
extraction tools to get names from PubMed, as almost every week there
are new taxon names appearing in PubMed, all tied to a publication.

As one example, consider this paper:

The abstract is:

Aploparaksis demshini n. sp. is described from a woodcock Scolopax
rusticola L. from different parts of the Palaearctic (Lithuania,
Karelia, the Urals, Primorskiy Kray). It differs from the most similar
species A. belopolskajae Bondarenko, 1988, a parasite of snipes
Gallinago spp., in the form and length of the rostellar hooks and the
smaller cirrus, and from two other similar species, A. clavata
Spasskaya, 1966 and A. schilleri Webster, 1955, by having an
embryophore with polar thickenings and a spindle-shaped cirrus. The
life-cycle of the parasite was studied under experimental conditions.
The metacestodes were commonly located under the chlorogogenous tissue
of the intestine of the earthworms Eisenia foetida(Savigny),
Dendrobaena octaedra (Savigny) and E. nordenskioldi(Eisen), and in the
wall of the intestine of the enchytraeid Briodrilus arcticus(Bell). The
metacestodes exhibit a pattern of postembryonal development typical for
the cysticercoid modification termed an 'ovoid diplocyst'.

The NLP tool returns:
A. belopolskajae
A. clavata
A. schilleri
Aploparaksis demshini
Dendrobaena octaedra
Scolopax rusticola

Note that it missed Briodrilus arcticus(Bell), Eisenia
foetida(Savigny), and E. nordenskioldi(Eisen) -- I guess because of the
missing space between name and authority -- and also Gallinago (in the
abstract as "Gallinago spp."). I wonder whether it can also figure out
that A. clavata is actually Aploparaksis clavata?
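
The two fixes suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not how TaxonGrab works internally. The function names are made up, and the abbreviation rule (use the most recent fully spelled genus with the same initial) is an assumption that will fail when two genera in a text share an initial. Names must be supplied in the order they appear in the text, not alphabetically.

```python
import re

def separate_authorities(text):
    """Insert a space where '(Authority)' is glued to a lowercase epithet,
    e.g. 'foetida(Savigny)' becomes 'foetida (Savigny)'."""
    return re.sub(r"([a-z])\(([A-Z])", r"\1 (\2", text)

def expand_genus_abbreviations(names):
    """Expand 'A. clavata' to 'Aploparaksis clavata' using the most recent
    fully spelled genus with the same initial. Abbreviations with no
    earlier match are returned unchanged."""
    last_genus = {}  # initial letter -> most recently seen full genus
    expanded = []
    for name in names:
        genus, _, rest = name.partition(" ")
        if re.fullmatch(r"[A-Z]\.", genus):
            full = last_genus.get(genus[0])
            expanded.append(f"{full} {rest}" if full else name)
        else:
            last_genus[genus[0]] = genus
            expanded.append(name)
    return expanded

# Names in order of appearance in the abstract above.
names = ["Aploparaksis demshini", "Scolopax rusticola",
         "A. belopolskajae", "A. clavata", "A. schilleri"]
print(expand_genus_abbreviations(names))
print(separate_authorities("Eisenia foetida(Savigny) and E. nordenskioldi(Eisen)"))
```

Running the spacing fix before name extraction would give a tool that expects "name (Authority)" a chance to see the three earthworm names.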

Another example is

Here, NLP picks up P. juxtanucleare and Plasmodium dominicana, but
misses Diptera: Culicidae: Culicinae [written in the oft-used format
(Diptera: Culicidae: Culicinae)], and Galliformes.
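
Because the parenthesised, colon-separated format is so regular, the higher taxa in it could be picked up with a simple pattern. The Python below is a rough sketch under that assumption. The function name and the sample sentence are invented for illustration, and a bare name such as Galliformes would still need a lexicon or some other cue.

```python
import re

# Matches '(Diptera: Culicidae: Culicinae)': two or more capitalised
# words separated by colons, all inside one pair of parentheses.
RANK_LIST = re.compile(r"\(\s*([A-Z][a-z]+(?:\s*:\s*[A-Z][a-z]+)+)\s*\)")

def higher_taxa(text):
    """Return the higher-taxon names found in '(A: B: C)' lists,
    in order of appearance and without duplicates."""
    taxa = []
    for match in RANK_LIST.finditer(text):
        for part in match.group(1).split(":"):
            part = part.strip()
            if part not in taxa:
                taxa.append(part)
    return taxa

# Invented sentence, used only to exercise the pattern.
print(higher_taxa("a mosquito (Diptera: Culicidae: Culicinae) vector"))
```

A single name in parentheses, such as an authority like "(Savigny)", contains no colon and is therefore ignored.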

So, an impressive start, but I guess this problem will need more work.

One other comment -- the site displays the OSI logo, but the software
doesn't seem to be available (apart from a link in the "NLP Analysis -
File upload" window, which leads only to a fragment of PHP code).

I don't mean these comments to be negative; I think this is very timely.

On 7 Jun 2005, at 20:37, Drew Koning wrote:

> Greetings.
> In conjunction with the NSF, I've written a web-based NLP solution to
> extract taxonomic names from text. I would greatly appreciate any
> feedback your community can provide.
> This tool was written under a National Science Foundation Grant:
> "Collaborative Research: Development of new digital library
> applications in the context of a basic ontology for biosystematics
> information using the literature of entomology"
> Cheers,
> Drew Koning
> +1 212.496.3569
> Informatics - American Museum of Natural History
> Central Park West @ 79th Street
> New York, NY 10024

Professor Roderic D. M. Page
Editor, Systematic Biology
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792

