[Taxacom] Machine-tagging Flickr images with taxonomic names
D Patterson
dpatterson at mbl.edu
Thu Feb 7 08:00:54 CST 2008
Within EOL we'll create gap- and punctuation-free versions of names within reconciliation groups, thereby providing a context for disambiguation should any additional ambiguities arise. This system should become available for third-party indexing during 2008.
D J Patterson
-----Original Message-----
From: "Roderic Page" <r.page at bio.gla.ac.uk>
To: "Ken-ichi" <kenichi.ueda at gmail.com>
Cc: taxacom at mailman.nhm.ku.edu; "Patrick Leary" <pleary at mbl.edu>
Sent: 2/7/08 6:12 AM
Subject: Re: [Taxacom] Machine-tagging Flickr images with taxonomic names
This is encouraging. I suspect a search of uBio might turn up some
genuine cases (if only because there are homonyms of species names).
Flickr's API documentation
(http://www.flickr.com/services/api/misc.tags.html ) makes clear that
they strip spaces and punctuation but store the original tag as well
(see also
http://weblog.terrellrussell.com/2007/06/clean-and-store-your-raw-tags-like-flickr/ ).
The fact that names in the Catalogue of Life contain
additional whitespace is an argument for the utility of Flickr's
approach.
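A minimal Python sketch of that clean-and-store-raw approach. It assumes the cleaned form is simply the lowercased tag with whitespace and punctuation removed; Flickr's exact normalization rules may differ, and `clean_tag` and `TagStore` are hypothetical names, not part of any Flickr API.

```python
import re
from collections import defaultdict


def clean_tag(raw: str) -> str:
    """Assumed normalization: lowercase, then drop every character that is
    not a letter or digit (spaces, hyphens, periods, underscores, ...)."""
    return re.sub(r"[\W_]+", "", raw.lower())


class TagStore:
    """Hypothetical store that indexes on the cleaned tag but keeps every
    raw spelling it has seen, so the original form is never lost."""

    def __init__(self) -> None:
        self._raw_by_clean = defaultdict(set)

    def add(self, raw: str) -> str:
        key = clean_tag(raw)
        self._raw_by_clean[key].add(raw)  # keep the raw tag alongside the clean key
        return key

    def raw_forms(self, key: str) -> list[str]:
        # .get() avoids creating an empty entry for an unknown key
        return sorted(self._raw_by_clean.get(key, set()))
```

Here `Puma concolor`, `puma-concolor` and `Puma  concolor` all map to the single tag `pumaconcolor`, while `raw_forms("pumaconcolor")` still returns all three original spellings.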
Regards
Rod
On 7 Feb 2008, at 08:25, Ken-ichi wrote:
> I've been working with the Catalogue of Life 2007 data
> (http://www.catalogueoflife.org), so I figured I'd run a query to see
> how many name collisions came up when you concatenate binomials.
> Assuming I did this correctly, there are actually very few collisions
> within the CoL data for binomials. The only ones I found seem to be
> typos, mostly involving whitespace. The CoL data certainly aren't
> comprehensive at the species level, but they do have 978,880 distinct,
> accepted names.
>
> Here's what I did, and the result (let me know if I screwed up):
>
>
> mysql> SELECT
> -> LOWER(REPLACE(name, ' ', '')) as tag,
> -> COUNT(name) as count,
> -> GROUP_CONCAT(name SEPARATOR ', ') as names
> -> FROM
> -> (SELECT DISTINCT name FROM col_taxa WHERE taxon='Species' AND is_accepted_name=1) as s
> -> GROUP BY tag
> -> HAVING count > 1
> -> ORDER BY count DESC;
> +--------------------------+-------+-------------------------------------------------------+
> | tag                      | count | names                                                 |
> +--------------------------+-------+-------------------------------------------------------+
> | apomecynaflavomarmorata  |     2 | Apomecyna flavomarmorata, Apomecyna flavomarmorata    |
> | astathesholorufa         |     2 | Astathes holorufa, Astathes holorufa                  |
> | curculiosaltatoralni     |     2 | Curculio saltator alni, Curculio saltatoralni         |
> | curculiosaltatorsalicis  |     2 | Curculio saltator salicis, Curculio saltatorsalicis   |
> | curculiosaltatorulmi     |     2 | Curculio saltator ulmi, Curculio saltatorulmi         |
> | dymasiusangustatus       |     2 | Dymasius angustatus, Dymasius angustatus              |
> | jordanoleiopusgardneri   |     2 | Jordanoleiopus gardneri, Jordanoleiopus gardneri      |
> | merionoedatosawai        |     2 | Merionoeda tosawai, Merionoeda tosawai                |
> | mesosaindica             |     2 | Mesosa indica, Mesosa indica                          |
> | mimozotaleminuta         |     2 | Mimozotale minuta, Mimozotale minuta                  |
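The same collision check can be sketched in Python for readers without a MySQL copy of the CoL data. This assumes the accepted species names are already loaded as a list of strings (loading them from `col_taxa` is not shown), and `concat_key` and `find_collisions` are hypothetical helpers. Python compares strings exactly, whereas MySQL's default collations are case-insensitive, so names that differ only in capitalization count as distinct here and would be reported as collisions; results may therefore not match the SQL exactly.

```python
from collections import defaultdict


def concat_key(name: str) -> str:
    """Mirror LOWER(REPLACE(name, ' ', '')): remove space characters
    (not tabs or other whitespace), then lowercase."""
    return name.replace(" ", "").lower()


def find_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group distinct names by concatenated key and keep only keys shared by
    more than one name, largest groups first (like HAVING count > 1
    ORDER BY count DESC)."""
    groups = defaultdict(list)
    for name in sorted(set(names)):  # set() plays the role of SELECT DISTINCT
        groups[concat_key(name)].append(name)
    collisions = {tag: members for tag, members in groups.items() if len(members) > 1}
    # sorted() is stable, so equal-sized groups keep their insertion order
    return dict(sorted(collisions.items(), key=lambda item: len(item[1]), reverse=True))
```

Called on a list containing `Curculio saltator alni` and `Curculio saltatoralni`, `find_collisions` groups them under the tag `curculiosaltatoralni`, as in the table above.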