[Taxacom] Machine-tagging Flickr images with taxonomic names

D Patterson dpatterson at mbl.edu
Thu Feb 7 08:00:54 CST 2008


Within EOL we'll create gap- and punctuation-free versions of names within reconciliation groups, thereby providing a context for disambiguation should any additional ambiguities arise. This system should become available for third-party indexing within 2008.

D J Patterson


-----Original Message-----
From: "Roderic Page" <r.page at bio.gla.ac.uk>
To: "Ken-ichi" <kenichi.ueda at gmail.com>
Cc: taxacom at mailman.nhm.ku.edu; "Patrick Leary" <pleary at mbl.edu>
Sent: 2/7/08 6:12 AM
Subject: Re: [Taxacom] Machine-tagging Flickr images with taxonomic names

This is encouraging. I suspect a search of uBio might turn up some  
genuine cases (if only because there are homonyms of species names).

Flickr's API documentation
(http://www.flickr.com/services/api/misc.tags.html) makes clear that
they strip spaces and punctuation, but store the original tag as well
(see also
http://weblog.terrellrussell.com/2007/06/clean-and-store-your-raw-tags-like-flickr/).
The fact that names in the Catalogue of Life contain additional
whitespace is an argument for the utility of Flickr's approach.
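
To make the "clean, but keep the raw tag" idea concrete, here is a minimal Python sketch. It is not Flickr's code: the names Tag and make_tag are made up for illustration, and the cleaning rule (lowercase, drop every non-alphanumeric character) is an assumption about what "strip spaces and punctuation" amounts to. Flickr's actual rules may differ in detail.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    raw: str    # the tag exactly as the user typed it
    clean: str  # the normalized form used for matching


def make_tag(raw: str) -> Tag:
    """Keep the raw tag and derive a clean form from it.

    Assumption: "clean" means lowercase with every non-alphanumeric
    character dropped. Flickr's real rules may differ in detail.
    """
    clean = "".join(ch for ch in raw if ch.isalnum()).lower()
    return Tag(raw=raw, clean=clean)


# Two spellings that differ only in whitespace share one clean form,
# while each raw spelling is still available for display.
a = make_tag("Mesosa indica")
b = make_tag("Mesosa  indica")
print(a.clean, a.clean == b.clean)
```

Matching on the clean form while displaying the raw one means stray whitespace in a source database does not split one name into two tags.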

Regards

Rod


On 7 Feb 2008, at 08:25, Ken-ichi wrote:

> I've been working with the Catalogue of Life 2007 data
> (http://www.catalogueoflife.org), so I figured I'd run a query to see
> how many name collisions came up when you concatenate binomials.
> Assuming I did this correctly, there are actually very few collisions
> within the CoL data for binomials.  The only ones I found seem to be
> typos, mostly involving whitespace.  The CoL data certainly aren't
> comprehensive at the species level, but they do have 978,880 distinct,
> accepted names.
>
> Here's what I did, and the result (let me know if I screwed up):
>
>
> mysql> SELECT
>     ->   LOWER(REPLACE(name, ' ', '')) as tag,
>     ->   COUNT(name) as count,
>     ->   GROUP_CONCAT(name SEPARATOR ', ') as names
>     -> FROM
>     ->   (SELECT DISTINCT name FROM col_taxa
>     ->    WHERE taxon='Species' AND is_accepted_name=1) as s
>     -> GROUP BY tag
>     -> HAVING count > 1
>     -> ORDER BY count DESC;
> +--------------------------+-------+-------------------------------------------------------+
> | tag                      | count | names                                                 |
> +--------------------------+-------+-------------------------------------------------------+
> | apomecynaflavomarmorata  |     2 | Apomecyna flavomarmorata, Apomecyna  flavomarmorata   |
> | astathesholorufa         |     2 | Astathes  holorufa, Astathes holorufa                 |
> | curculiosaltatoralni     |     2 | Curculio saltator alni, Curculio saltatoralni         |
> | curculiosaltatorsalicis  |     2 | Curculio saltator salicis, Curculio saltatorsalicis   |
> | curculiosaltatorulmi     |     2 | Curculio saltator ulmi, Curculio saltatorulmi         |
> | dymasiusangustatus       |     2 | Dymasius  angustatus, Dymasius angustatus             |
> | jordanoleiopusgardneri   |     2 | Jordanoleiopus  gardneri, Jordanoleiopus gardneri     |
> | merionoedatosawai        |     2 | Merionoeda tosawai, Merionoeda tosawai                |
> | mesosaindica             |     2 | Mesosa indica, Mesosa  indica                         |
> | mimozotaleminuta         |     2 | Mimozotale  minuta, Mimozotale minuta                 |
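
For anyone without the CoL database to hand, the same collision check as the query above can be sketched in a few lines of Python. The function name find_collisions and the sample list are illustrative only; the sample names are copied from the result above, plus one name that has no collision. Like the query, this removes spaces only, not punctuation, and it does not try to reproduce MySQL's collation rules.

```python
from collections import defaultdict


def find_collisions(names):
    """Group distinct names by their space-free, lowercase tag.

    Mirrors the query above: only spaces are removed, and only tags
    shared by more than one distinct name are returned.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.replace(" ", "").lower()].add(name)
    return {tag: sorted(group) for tag, group in groups.items() if len(group) > 1}


# Sample rows copied from the result above, plus one name with no collision.
sample = [
    "Mesosa indica",
    "Mesosa  indica",
    "Curculio saltator alni",
    "Curculio saltatoralni",
    "Apomecyna flavomarmorata",
]
for tag, group in find_collisions(sample).items():
    print(tag, len(group), group)
```

Note that the Curculio rows are not whitespace typos in the same sense: a trinomial and a binomial happen to collapse to the same tag once spaces are removed.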




