[Taxacom] HHDB: hemihomonyms

Pat LaFollette pat at lafollette.com
Tue Jan 19 02:39:28 CST 2010

>>>While at it, the works might also be coded
>>>by discipline: entomology, ornithology, malacology, etc.
>>If you can explain a librarian the difference between malacology and
>>entomology... this will be a funny job...
>Furthermore, this may have unintended consequences. For the 
>terrestrial biota it is not uncommon to have plant and animal names 
>in the same article, even if the journal is, say, zoological. In my 
>efforts to extract articles from BHL 
>(http://biostor.org/) I've come across 
>numerous articles on insects that are full of the names of host plants.
I agree that it is important to avoid unintended consequences, and it is 
true that animal names are occasionally mentioned in botanical 
journals, and plant names in zoological journals.  I think coding 
journals as botanical or zoological as an aid in resolving 
hemihomonyms would have few adverse results, however.  Only 
hemihomonyms would be affected.  If the name of a host plant 
mentioned in an entomological article is not a hemihomonym, it would 
be indexed as a plant.  The few errors would be trivial compared to 
those resulting from, for example, OCR errors.
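
A minimal sketch of the rule described above, assuming a simple lookup 
design. The table names, the function name, and the sample entries are 
illustrative assumptions, not part of BHL or any existing indexer. 
Only names listed as hemihomonyms consult the journal's code; every 
other name is indexed from what is already known about it.

```python
# Hypothetical lookup tables; the entries are examples only.
HEMIHOMONYMS = {"Oenanthe", "Pieris"}  # genus names valid under both codes
KNOWN_KINGDOM = {"Quercus": "plant", "Lymantria": "animal"}
JOURNAL_KINGDOM = {"botanical": "plant", "zoological": "animal"}


def resolve_kingdom(genus, journal_discipline):
    """Return 'plant', 'animal', or None when the name cannot be resolved."""
    if genus in HEMIHOMONYMS:
        # Ambiguous name: fall back on the journal's discipline code.
        # An uncoded journal (None) leaves the name unresolved.
        return JOURNAL_KINGDOM.get(journal_discipline)
    # Unambiguous name: the journal code is ignored, so a host plant
    # named in a zoological journal is still indexed as a plant.
    return KNOWN_KINGDOM.get(genus)


print(resolve_kingdom("Quercus", "zoological"))
print(resolve_kingdom("Pieris", "zoological"))
print(resolve_kingdom("Pieris", "botanical"))
```

With this design, the only names that can be misassigned are 
hemihomonyms used outside the journal's main discipline.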

If the works on BHL were additionally coded so that searches could be 
restricted by discipline, search results would be much better 
focused.  As an example, looking at just the first 20 titles returned 
by a recent search for some marine molluscan genera, 7 titles were 
botanical, 4 entomological, 1 zoophytes, and 1 protozoa.  The 
remaining 7 titles were on topic, 1 molluscan, 1 paleontological, and 
5 mixed content journals with molluscan articles.  The 13 off-topic 
titles were retrieved for a diversity of reasons, mostly beyond 
anyone's control: hemihomonyms, homonyms (junior and senior), species 
epithets matching generic names (Cordia myxa for Myxa, etc.), 
line-wrap hyphenation (Oscilla-toriaceae for Oscilla, etc.), 
placenames, descriptive terms, and OCR errors.
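
To make the arithmetic concrete, here is a small sketch that rebuilds 
the tally of those 20 titles and applies a discipline restriction. 
The code labels and the choice of which codes count as on topic for a 
molluscan search are assumptions made for illustration; BHL does not 
necessarily store or expose anything like this.

```python
from collections import Counter

# Discipline codes for the 20 titles, using the counts given above.
hits = (
    ["botany"] * 7
    + ["entomology"] * 4
    + ["zoophytes"]
    + ["protozoa"]
    + ["malacology"]
    + ["paleontology"]
    + ["mixed"] * 5
)


def restrict(results, wanted):
    """Keep only the results whose discipline code is in `wanted`."""
    return [code for code in results if code in wanted]


# Assumed set of codes relevant to a search for marine molluscan genera.
wanted = {"malacology", "paleontology", "mixed"}
on_topic = restrict(hits, wanted)

print(Counter(hits))
print(len(hits), len(on_topic), len(hits) - len(on_topic))
```

Restricting by code keeps the 7 relevant titles and drops the other 
13, whatever the reason each was retrieved.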

Hopefully, OCR errors will be reduced in time, and some text 
filtering could help with line-wrap hyphenation, but the only path I 
can see to dramatically improve search results is discipline coding 
of works.  Not every title needs to be coded to improve search results, 
just the obvious ones: Fauna coleopterorum helvetica, Flora of 
tropical Africa, List of the shells of South America, etc.  Anyone 
with biological training and language skills could code the majority 
of items in BHL by just scanning the title.
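
As a rough illustration of how far the obvious titles could be taken 
before a person looks at them, here is a keyword sketch. The keyword 
list and function name are hypothetical, and this is a crude stand-in 
for a trained reader scanning the title, not a proposal for how BHL 
should do it. Titles that match nothing are left uncoded.

```python
# Hypothetical keyword rules, checked in order; first match wins.
TITLE_KEYWORDS = [
    ("coleopter", "entomology"),
    ("flora", "botany"),
    ("shells", "malacology"),
]


def code_title(title):
    """Return a discipline code for an obvious title, or None if unsure."""
    lowered = title.lower()
    for keyword, discipline in TITLE_KEYWORDS:
        if keyword in lowered:
            return discipline
    return None  # not obvious from the title: leave uncoded


for title in (
    "Fauna coleopterorum helvetica",
    "Flora of tropical Africa",
    "List of the shells of South America",
):
    print(title, "->", code_title(title))
```

Returning None rather than guessing matches the point that not every 
title has to be coded for searches to improve.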


>Roderic Page
>Professor of Taxonomy
>Graham Kerr Building
>University of Glasgow
>Glasgow G12 8QQ, UK
>Email: r.page at bio.gla.ac.uk
>Tel: +44 141 330 4778
>Fax: +44 141 330 2792
>AIM: rodpage1962 at aim.com
>Twitter: http://twitter.com/rdmpage
>Blog: http://iphylo.blogspot.com
>Home page: 

Patrick I LaFollette
Research Associate in Malacology
Natural History Museum of Los Angeles County
pat at lafollette.com 
