[Taxacom] Machine-tagging Flickr images with taxonomic names
Ken-ichi Ueda
kueda at ischool.berkeley.edu
Thu Feb 7 02:30:19 CST 2008
(Sent from the wrong address, sorry about the dupes, Patrick and Roderic)
I've been working with the Catalogue of Life 2007 data
(http://www.catalogueoflife.org), so I figured I'd run a query to see
how many name collisions came up when you concatenate binomials.
Assuming I did this correctly, there are actually very few collisions
within the CoL data for binomials. The only ones I found seem to be
typos, mostly involving whitespace. The CoL data certainly aren't
comprehensive at the species level, but they do have 978,880 distinct,
accepted names.
Here's what I did, and the result (let me know if I screwed up, sorry
about the formatting):
mysql> SELECT
-> LOWER(REPLACE(name, ' ', '')) as tag,
-> COUNT(name) as count,
-> GROUP_CONCAT(name SEPARATOR ', ') as names
-> FROM
-> (SELECT DISTINCT name FROM col_taxa WHERE taxon='Species' AND is_accepted_name=1) as s
-> GROUP BY tag
-> HAVING count > 1
-> ORDER BY count DESC;
+--------------------------+-------+-------------------------------------------------------+
| tag                      | count | names                                                 |
+--------------------------+-------+-------------------------------------------------------+
| apomecynaflavomarmorata  |     2 | Apomecyna flavomarmorata, Apomecyna flavomarmorata    |
| astathesholorufa         |     2 | Astathes holorufa, Astathes holorufa                  |
| curculiosaltatoralni     |     2 | Curculio saltator alni, Curculio saltatoralni         |
| curculiosaltatorsalicis  |     2 | Curculio saltator salicis, Curculio saltatorsalicis   |
| curculiosaltatorulmi     |     2 | Curculio saltator ulmi, Curculio saltatorulmi         |
| dymasiusangustatus       |     2 | Dymasius angustatus, Dymasius angustatus              |
| jordanoleiopusgardneri   |     2 | Jordanoleiopus gardneri, Jordanoleiopus gardneri      |
| merionoedatosawai        |     2 | Merionoeda tosawai, Merionoeda tosawai                |
| mesosaindica             |     2 | Mesosa indica, Mesosa indica                          |
| mimozotaleminuta         |     2 | Mimozotale minuta, Mimozotale minuta                  |
| mimozotaletrivittata     |     2 | Mimozotale trivittata, Mimozotale trivittata          |
| mispilacoomani           |     2 | Mispila coomani, Mispila coomani                      |
| monochamusspectabilis    |     2 | Monochamus spectabilis, Monochamus spectabilis        |
| oplatoceraoberthuri      |     2 | Oplatocera oberthuri, Oplatocera oberthuri            |
| plagithmysusswezeyi      |     2 | Plagithmysus swezeyi, Plagithmysus swezeyi            |
| prosopoceralepesmei      |     2 | Prosopocera lepesmei, Prosopocera lepesmei            |
| pterolophiaalbosignata   |     2 | Pterolophia albosignata, Pterolophia albosignata      |
| pterolophiamediomaculata |     2 | Pterolophia mediomaculata, Pterolophia mediomaculata  |
| pterolophiapedongana     |     2 | Pterolophia pedongana, Pterolophia pedongana          |
| pterolophiayunnanensis   |     2 | Pterolophia yunnanensis, Pterolophia yunnanensis      |
| xoanoderavitticollis     |     2 | Xoanodera vitticollis, Xoanodera vitticollis          |
+--------------------------+-------+-------------------------------------------------------+
21 rows in set (3 min 29.94 sec)
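
For anyone without the CoL database loaded, the same check can be sketched in a few lines of Python. This is only an illustration of the idea behind the query, not the query itself: the function names and the sample list are made up for the example, and the double-space variant of "Mesosa indica" is a hypothetical stand-in for the kind of whitespace typo the results seem to show.

```python
from collections import defaultdict


def collapse_name(name: str) -> str:
    """Mirror LOWER(REPLACE(name, ' ', '')): drop spaces, then lowercase."""
    return name.replace(" ", "").lower()


def find_collisions(names):
    """Group distinct names by collapsed tag; keep tags shared by 2+ names."""
    groups = defaultdict(set)  # a set plays the role of SELECT DISTINCT
    for name in names:
        groups[collapse_name(name)].add(name)
    return {
        tag: sorted(variants)
        for tag, variants in groups.items()
        if len(variants) > 1  # same filter as HAVING count > 1
    }


# Hypothetical sample; the real input would be the accepted species names.
sample = [
    "Curculio saltator alni",
    "Curculio saltatoralni",
    "Mesosa indica",
    "Mesosa  indica",  # assumed double-space typo, for illustration only
    "Alcedo atthis",
]

collisions = find_collisions(sample)
for tag, variants in sorted(collisions.items()):
    print(tag, len(variants), variants)
```

Note that, like the query, this only strips ordinary spaces. Tabs or non-breaking spaces in a name would survive and could hide further collisions.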
-Ken-ichi
On Feb 6, 2008 12:21 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:
> Flickr allows you to tag photos with tags that include spaces, but
> collapses them when saving the tag. You can still retrieve the photo
> with the tag that includes white space. For example
>
> http://www.flickr.com/photos/tags/diomedeaexulans/
>
> and
>
> http://www.flickr.com/photos/tags/diomede%20aexulans/
>
> retrieve albatross photos. I can insert spaces pretty much wherever I
> like and still get the same pictures. Hence, users could still
> recover pictures using the original binomial tag (i.e., the species
> name).
>
> It would be interesting to know how many collisions this might cause.
> In other words, how many times does deleting the white space from two
> different binomials result in the same text string? Sounds like
> something uBio could answer very quickly. If the answer is "not
> many", then I don't see a problem with people simply tagging photos
> with regular tags (and/or binomials as machine tags).
>
> Regards
>
> Rod
>
>
>
>
>
> On 6 Feb 2008, at 00:10, Andy Mabbett wrote:
>
> > In message
> > <1a9849d0802051347w766bf234n7c9de04ea1b5ee00 at mail.gmail.com>,
> > Ken-ichi <kenichi.ueda at gmail.com> writes
> >
> >> This works:
> >>
> >> taxonomy:binomial=Alcedo_atthis
> >>
> >> (see http://flickr.com/photos/ken-ichi/2240715004/)
> >
> > Thank you, but that too gets collapsed, as "helvellalacunosa"; try
> > selecting the tag's link and you'll get:
> >
> > <http://flickr.com/photos/ken-ichi/tags/taxonomy%3Abinomial%3Dhelvellalacunosa/>
> >
> > Nice picture, BTW!
> >
> > Incidentally, several people kindly made the same suggestion, but by
> > writing to me directly. Perhaps the list can be reconfigured to default
> > to replying to the group; or people could just double check before
> > sending!
> >
> >> I'd also be interested to hear about any conventions or emerging
> >> conventions on taxonomic tagging in folksonomies like Flickr. I tend
> >> to just tag with genus and the binomial on Flickr, but some machine tag
> >> standard would definitely add a lot of value to Flickr as a
> >> biodiversity informatics resource.
> >
> > I've also tagged my image with:
> >
> > taxonomy:genus=Alcedo
> >
> > and:
> >
> > taxonomy:specific=atthis
> >
> > but I suppose others might use:
> >
> > taxonomy:epithet=atthis
> >
> > and even "binominal" instead of "binomial".
> >
> >
> > Oh why do we make things so complicated!
> >
> > --
> > Andy Mabbett
> >
> > * Are you using Microformats, yet: <http://microformats.org/> ?
> >
> > _______________________________________________
> > Taxacom mailing list
> > Taxacom at mailman.nhm.ku.edu
> > http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> >
>
> ----------------------------------------
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
>
> Phone: +44 141 330 4778
> Fax: +44 141 330 2792
> email: r.page at bio.gla.ac.uk
> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> iChat: aim://rodpage1962
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>
> Subscribe to Systematic Biology through the Society of Systematic
> Biologists Website: http://systematicbiology.org
> Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
> Find out what we know about a species: http://ispecies.org
> Rod's rants on phyloinformatics: http://iphylo.blogspot.com
> Rod's rants on ants: http://semant.blogspot.com
>
>
>
>