Taxonomic Name Extraction | TaxonGrab v1.1
drew at AMNH.ORG
Tue Jun 7 21:08:36 CDT 2005
I'm glad you appreciated the application. As for the issues you raised,
syntax will always be a problem. I tried to create rules that fit the
formatting used in the majority of scientific publications, but individual
tweaking will always be needed for more 'unusual' formatting.
Also, the fragment of PHP code available is the nuts and bolts of the
system, but the rest of the source code will be available on SourceForge,
as soon as I finalize the current version.
> This looks nice, and there is a great need for this sort of tool. For
> example, in the recent ECAT e-conference, I suggested using text
> extraction tools to get names from PubMed, as almost every week there
> are new taxon names appearing in PubMed, all tied to a publication.
> As one example, consider this paper:
> The abstract is:
> Aploparaksis demshini n. sp. is described from a woodcock Scolopax
> rusticola L. from different parts of the Palaearctic (Lithuania,
> Karelia, the Urals, Primorskiy Kray). It differs from the most similar
> species A. belopolskajae Bondarenko, 1988, a parasite of snipes
> Gallinago spp., in the form and length of the rostellar hooks and the
> smaller cirrus, and from two other similar species, A. clavata
> Spasskaya, 1966 and A. schilleri Webster, 1955, by having an
> embryophore with polar thickenings and a spindle-shaped cirrus. The
> life-cycle of the parasite was studied under experimental conditions.
> The metacestodes were commonly located under the chlorogogenous tissue
> of the intestine of the earthworms Eisenia foetida(Savigny),
> Dendrobaena octaedra (Savigny) and E. nordenskioldi(Eisen), and in the
> wall of the intestine of the enchytraeid Briodrilus arcticus(Bell). The
> metacestodes exhibit a pattern of postembryonal development typical for
> the cysticercoid modification termed an 'ovoid diplocyst'.
> The NLP tool results in:
> A. belopolskajae
> A. clavata
> A. schilleri
> Aploparaksis demshini
> Dendrobaena octaedra
> Scolopax rusticola
> Note that it missed Briodrilus arcticus(Bell), Eisenia
> foetida(Savigny), and E. nordenskioldi(Eisen) -- I guess because of the
> missing space between name and authority -- and also Gallinago (in the
> abstract as "Gallinago spp."). I wonder whether it can also figure out
> that A. clavata is actually Aploparaksis clavata?
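
One possible way to resolve abbreviated genera, sketched in Python under some
assumptions: the names are supplied in the order they occur in the text (not
alphabetically, as in the output above), and an abbreviation refers to the
most recently seen full genus with the same initial. The function name
expand_genus_abbreviations is hypothetical and this is not how TaxonGrab
works.

```python
import re


def expand_genus_abbreviations(names):
    """Expand 'A. clavata' using the latest full genus with that initial.

    Assumes names arrive in document order. An abbreviation with no earlier
    matching genus is left unchanged.
    """
    last_genus = {}  # initial letter -> most recently seen full genus
    expanded = []
    for name in names:
        genus, _, rest = name.partition(" ")
        if re.fullmatch(r"[A-Z]\.", genus):
            full = last_genus.get(genus[0])
            expanded.append(f"{full} {rest}" if full else name)
        else:
            last_genus[genus[0]] = genus
            expanded.append(name)
    return expanded


names_in_text_order = [
    "Aploparaksis demshini",
    "Scolopax rusticola",
    "A. belopolskajae",
    "A. clavata",
    "A. schilleri",
]
for name in expand_genus_abbreviations(names_in_text_order):
    print(name)
```

The heuristic can guess wrong when two genera with the same initial are
discussed close together, so the expanded names would still need checking.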
> Another example is
> Here, NLP picks up P. juxtanucleare and Plasmodium dominicana, but
> misses Diptera: Culicidae: Culicinae [written in the oft-used format
> (Diptera: Culicidae: Culicinae)], and Galliformes.
> So, an impressive start, but I guess this problem will need more work.
> One other comment -- the site displays the OSI logo, but the software
> doesn't seem to be available (apart from a link on NLP Analysis - File
> upload window which is to a fragment of PHP code).
> I don't mean these comments to be negative, I think this is very timely
> On 7 Jun 2005, at 20:37, Drew Koning wrote:
>> In conjunction with the NSF, I've written a web-based NLP solution to
>> extract taxonomic names from text. I would greatly appreciate any
>> your community can provide.
>> This tool was written under a National Science Foundation Grant:
>> "Collaborative Research: Development of new digital library
>> applications in the context of a basic ontology for biosystematics
>> information using literature of entomology"
>> Drew Koning
>> +1 212.496.3569
>> Informatics - American Museum of Natural History
>> Central Park West @ 79th Street
>> New York, NY 10024
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
> Phone: +44 141 330 4778
> Fax: +44 141 330 2792
> email: r.page at bio.gla.ac.uk
> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
> Subscribe to Systematic Biology through the Society of Systematic
> Biologists Website: http://systematicbiology.org
> Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/