[Taxacom] Wikipedia classification

Tony.Rees at csiro.au Tony.Rees at csiro.au
Tue Jun 30 21:38:13 CDT 2009


One thing I forgot. In the blog post mentioned, Rod notes a problem with the Wikipedia-style page URL, e.g. http://en.wikipedia.org/wiki/Latreillia , in the case of genus-level homonyms (in this instance: Diptera or Brachyura?): it is possible to land on the wrong instance, and that is what happens here when following a link from the crab family page http://en.wikipedia.org/wiki/Latreilliidae . There are LOTS of these at genus level - in IRMNG my count of non-unique genus names is currently 68730 and rising. So the problem is non-trivial; see for example http://www.marine.csiro.au/mirrorsearch/ir_search.go?searchtxt=Ceratium&hlevel=genus or, perhaps, http://www.marine.csiro.au/mirrorsearch/ir_search.go?searchtxt=Wagneria&hlevel=genus (using David Remsen's favourite example).
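To make the homonym problem concrete, here is a minimal Python sketch. It assumes a flat list of (genus name, higher group) records; the function names and the "Uniquus" record are made up for illustration, and only the Latreillia pair comes from the example above. It is not how IRMNG or Wikipedia work internally.

```python
from collections import defaultdict

# Illustrative records only: (genus name, higher group).
# The two Latreillia rows mirror the Diptera/Brachyura homonym above;
# "Uniquus" is an invented, non-homonymous name.
RECORDS = [
    ("Latreillia", "Diptera"),
    ("Latreillia", "Brachyura"),
    ("Uniquus", "Brachyura"),
]

def find_homonyms(records):
    """Return genus names that occur under more than one higher group."""
    groups = defaultdict(set)
    for name, group in records:
        groups[name].add(group)
    return {name: sorted(gs) for name, gs in groups.items() if len(gs) > 1}

def name_only_url(name):
    """Wikipedia-style URL keyed on the bare name, so homonyms collide."""
    return "http://en.wikipedia.org/wiki/" + name

homonyms = find_homonyms(RECORDS)
for name, gs in homonyms.items():
    print(name_only_url(name), "is ambiguous between", ", ".join(gs))
```

Because the URL is built from the name alone, both Latreillia records map to the same page address, which is the collision described above.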

The problem is less severe when the hierarchy is embedded in the URL, but then a different set of problems arises, as previously noted...

- Tony

-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Tony.Rees at csiro.au
Sent: Wednesday, 1 July 2009 11:55 AM
To: r.page at bio.gla.ac.uk; taxacom at mailman.nhm.ku.edu
Subject: [ExternalEmail] Re: [Taxacom] Wikipedia classification

Dear Rod,

This is really a follow-up to your comments as per the i-phylo blog http://tinyurl.com/lcxn2s - however I thought I would post it here rather than there as the issues are certainly relevant to this group.

Most of the problems you encounter with the Wikipedia/Wikispecies type of collation are the result of (1) uncontrolled data entry (anyone can enter anything they want, and also change previously entered content ad hoc), (2) lack of a relational DB back end to enable any required linkages (such as true parent and child records, and re-use of content already entered at a different level of the hierarchy), and (3) - in Wikispecies at least - hard wiring the species pages to a taxonomic hierarchy in the form of the URL, which is therefore not a stable identifier, e.g. if you want to shift a species page or concept around in the taxonomic hierarchy in the light of new information or a changed opinion, to support multiple alternative classifications, or simply to correct an error. Without addressing these issues I believe that you will never get a decent scalable and maintainable system (which is why databases get the usage they do :) )
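Points (2) and (3) can be illustrated with a small sketch, assuming a single self-referencing table held in an in-memory SQLite database. The table layout, the taxon names and the path() helper are all hypothetical and are not taken from any of the systems mentioned; the point is only that a record's id survives a move while a hierarchy-derived path does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE taxon (
           id        INTEGER PRIMARY KEY,          -- stable identifier
           name      TEXT NOT NULL,
           parent_id INTEGER REFERENCES taxon(id)  -- true parent/child link
       )"""
)
# Made-up hierarchy: two groups under one root, one species under Group A.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?)",
    [(1, "Life", None), (2, "Group A", 1), (3, "Group B", 1), (4, "Species x", 2)],
)

def path(conn, taxon_id):
    """Build a hierarchy-style path by walking parent links up to the root."""
    names = []
    while taxon_id is not None:
        name, taxon_id = conn.execute(
            "SELECT name, parent_id FROM taxon WHERE id = ?", (taxon_id,)
        ).fetchone()
        names.append(name)
    return "/".join(reversed(names))

before = path(conn, 4)
# Reclassify the species: one UPDATE, and the id (4) does not change.
conn.execute("UPDATE taxon SET parent_id = 3 WHERE id = 4")
after = path(conn, 4)
print(before, "->", after)
```

A URL built from the path would have changed with the reclassification, whereas anything keyed on the id still resolves to the same record.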

On the other hand I agree that (a) there is nevertheless a lot of valuable content in these community-driven sites that it would be nice to leverage somehow, (b) many hands make light work, i.e. can accomplish much more than a single individual or small group, and (c) community-driven sites can be very responsive to new information as it is released.

Perhaps a hybrid approach is feasible - use a proper relational system to drive the taxonomic backbone but import relevant content from the wikiXXX world. However issue (1) above does not go away - the only way to deal with that is to have a moderating process so that content does not go live until it has been reviewed by some trusted party. There are also structural issues with the free text approach versus a greater degree of atomised content and, as appropriate, controlled terms in at least a subset of the fields. Whether wikiXXX can be morphed into such a beast remains an open question, of course - probably not very likely in my view - systems like WoRMS (e.g. see http://www.marinespecies.org/about.php) are much closer to that ideal already, and thus will always have more authoritative, structured and relational content, as well as buy-in from relevant experts to contribute.
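The moderating step could look roughly like the following Python sketch. The class and field names are invented for illustration and do not describe WoRMS or any wiki software; the only behaviour modelled is that imported content stays pending until a trusted reviewer approves it.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Contribution:
    taxon_id: int              # key into the relational taxonomic backbone
    text: str                  # free text imported from the wiki side
    status: str = "pending"    # stays "pending" until a trusted review
    reviewer: str = ""

@dataclass
class ModerationQueue:
    trusted: Set[str]          # hypothetical list of trusted reviewers
    items: List[Contribution] = field(default_factory=list)

    def submit(self, taxon_id, text):
        """Anyone may submit; nothing goes live at this point."""
        item = Contribution(taxon_id, text)
        self.items.append(item)
        return item

    def approve(self, item, reviewer):
        """Only a trusted party can release content."""
        if reviewer not in self.trusted:
            raise PermissionError(f"{reviewer} is not a trusted reviewer")
        item.status = "approved"
        item.reviewer = reviewer

    def live(self):
        """Content visible to users: approved items only."""
        return [c for c in self.items if c.status == "approved"]

queue = ModerationQueue(trusted={"editor1"})
draft = queue.submit(4, "Imported description text")
visible_before = list(queue.live())
queue.approve(draft, "editor1")
visible_after = list(queue.live())
```

This only covers the review gate; the atomised fields and controlled terms mentioned above would need a richer record than a single free-text string.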

Just some food for thought, possibly.

FYI if you want to see undesirable things that can happen when arguments develop over taxonomic issues in wikiXXX space, take a look at http://en.wikipedia.org/wiki/Talk:Sperm_Whale#catodon . It's enough to put you off contributing really - (I did elsewhere and decided to stop when things got too annoying).

- Tony


-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Roderic Page
Sent: Tuesday, 30 June 2009 2:46 AM
To: TAXACOM
Subject: [Taxacom] Wikipedia classification

As an exercise I attempted to recreate the Wikipedia classification of life by extracting information from all pages containing a "Taxobox". I've posted a blog post about the results ( http://iphylo.blogspot.com/2009/06/wikipedia-taxonomy-good-bad-and-very.html or http://tinyurl.com/lcxn2s ). You can go directly to the very crude browser I knocked together to navigate the Wikipedia pages here: http://bioguid.info/demos/wikipedia/
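For readers curious what pulling classification data out of a Taxobox might involve, here is a rough Python sketch. It is not the code used for the exercise above: it assumes the template's fields sit one per line as "| key = value", and the sample wikitext is made up apart from the Latreillia and Latreilliidae names taken from this thread.

```python
import re

# Made-up sample of Taxobox wikitext, one "| key = value" field per line.
SAMPLE = """{{Taxobox
| name = Example
| regnum = [[Animal]]ia
| familia = [[Latreilliidae]]
| genus = '''Latreillia'''
}}"""

FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.*)$")

def strip_markup(value):
    """Reduce [[Target|Label]] or [[Target]] to plain text; drop bold/italic quotes."""
    value = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", value)
    return value.replace("'''", "").replace("''", "").strip()

def parse_taxobox(wikitext):
    """Collect the key/value fields of a Taxobox into a dict."""
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = strip_markup(match.group(2))
    return fields

box = parse_taxobox(SAMPLE)
# Parent/child pairs follow from adjacent ranks, here family -> genus.
print(box["familia"], "->", box["genus"])
```

Real pages are messier (nested templates, fields sharing a line, missing ranks), so a line-based pattern like this would only be a starting point.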

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html







_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
