[Taxacom] PS: saturday morning fun
p.kirk at cabi.org
Mon Nov 29 16:06:01 CST 2010
I tried this for Drosophila but all I got was flies when, in fact, it's also a genus of mushrooms :-)
From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Stephen Thorpe
Sent: Mon 29/11/2010 21:58
To: David Remsen (GBIF)
Cc: taxacom at mailman.nhm.ku.edu
Subject: [Taxacom] PS: saturday morning fun
PS: the bit where I become suspicious and start doubting your apparently
pure motives is this: it would be very technically easy and fully consistent
with your stated aims and motivations to simply put on each of your taxon pages
a link to the corresponding Wikispecies page (for example, for Mimus,
http://species.wikimedia.org/wiki/Mimus, and note that all Wikispecies pages
have this simple URL structure http://species.wikimedia.org/wiki/NAME_OF_TAXON),
perhaps saying something like "here you may find useful data on this taxon,
though being open edit, we cannot vouch for its accuracy". But this would
require (1) genuinely pure motives on your part; and (2) a grasp of the
difference between theory and reality (i.e., in theory Wikispecies is unreliable
and a pointless waste of your time, but in *reality* ...)
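To illustrate how little is involved technically, here is a minimal Python sketch of building such a link from a taxon name, following the URL structure quoted above. The function name wikispecies_url and the constant WIKISPECIES_BASE are hypothetical, chosen only for this illustration. The handling of multi-word names assumes the usual MediaWiki convention of underscores in place of spaces. The sketch does not check that a page exists, nor does it deal with homonyms.

```python
from urllib.parse import quote

# Base URL as quoted in the message above.
WIKISPECIES_BASE = "http://species.wikimedia.org/wiki/"


def wikispecies_url(taxon_name: str) -> str:
    """Build a Wikispecies link for a taxon name (illustrative helper only)."""
    # Assumption: MediaWiki page titles use underscores in place of spaces,
    # so collapse any run of whitespace into a single underscore.
    title = "_".join(taxon_name.split())
    # Percent-encode anything that is not safe in a URL path segment.
    return WIKISPECIES_BASE + quote(title)


print(wikispecies_url("Mimus"))
```

For the example in the message, wikispecies_url("Mimus") gives http://species.wikimedia.org/wiki/Mimus.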
From: David Remsen (GBIF) <dremsen at gbif.org>
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
Sent: Tue, 30 November, 2010 10:30:05 AM
Subject: Re: [Taxacom] saturday morning fun
Thanks for the summary. I'd be interested to hear what various Catalogue of
Life providers think of all this. I know some taxonomic sectors, like the
Lepidoptera, derived from LepIndex NHM-London, have not been thoroughly
reviewed, falling into your 'raw' category.
You hit the nail on the head when you say it provides you with a starting point.
We use it as a starting point too. We could forgo this and simply leave the
raw data as it is but it seemed an improvement to go with it. We are trying to
expand the capacity to access other, perhaps more comprehensive or refined
sources, should they be offered or available. At the moment, that starting
place is one of the few places we can go. Of course, flaking together
disparate sets of even high-quality data introduces additional complications but
I'd be happy to take them on.
I'm sure we (at least I) have not fully grasped all the ramifications of this.
I've tried to relay some of the complexities and a rationale behind what we are
faced with and do. I failed to mention the constraints we are under to improve
the issues raised this weekend. Until very recently we have had 2.5 programmers
working on the entirety of our infrastructure with nearly no resources for the
portal to fix these problems. This will change in 2011.
On Nov 29, 2010, at 9:47 PM, Stephen Thorpe wrote:
You mention some key issues here. Let me focus on just one of them for the
moment, namely COL and its suitability as a data provider for GBIF. I suspect
that GBIF have basically just thought something like "well, COL is an
aggregation of trusted specialist databases in a form that GBIF can use" - but
the reality is *way* more complex. For me, when starting to compile a
Wikispecies page, I will often use COL as a *starting point only*, actually
little more than a convenient way of getting big lists of taxa formatted and put
on Wikispecies pages for further scrutiny. Sometimes, the COL data is so
obviously worse than useless, that I don't use it at all, not *even* as a
starting point. The data providers from COL vary widely in nature. Some of them
are near complete for their group, others are highly fragmentary. Some are
*very* raw, others are quite well polished. Sometimes, there are problems in the
way that COL interprets the data from sources, so all sorts of synonyms get
interpreted as valid, etc. Another issue, which I don't fully understand yet,
and I could perhaps be mistaken (???), is that even in COL 2010, much of the
data seems to have been harvested in 2008 ... I would have thought that COL 2010
would have harvested its data in 2010. If not, then COL is running a couple of
years behind its own data providers, who will typically not be completely
up-to-date either. So, in summary, I would say that COL is nothing more than a
convenient *starting point* for building solid biodiversity data, and it
requires a fair amount of careful and informed interpretation, not to mention a
great deal of manual work to improve on it. I'm not sure that GBIF has fully
grasped this? For example, in COL, the family Scarabaeidae is actually what
would almost universally be called the subfamily Scarabaeinae of the family
Scarabaeidae, and this is not at all obvious. So, COL is actually quite good if
you want data on Scarabaeinae, but completely lacking in any data whatsoever on
the *huge* scarabaeid subfamilies Melolonthinae and Rutelinae.