[Taxacom] saturday morning fun

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Sun Nov 28 18:56:56 CST 2010

>The bottom line is that the problems are ours, not GBIF's. GBIF is just showing 
>where they are

Not so fast, Crofty! 
GBIF pretends to be better than it is - that is the real problem. Have you ever 
tried to actually use GBIF? Much of it is gibberish, for example, a search for 
Coleoptera is here:

* it would be nice if the common name beetle was listed in the common names 
section! Not even on the page for the order Coleoptera 
http://data.gbif.org/species/13141193/ (and here, wtf are "type specimens" of 
Coleoptera, and why are hypotypes [=A specimen of a species, which, though not a 
member of the original type series, is known from a published description or 
listing] relevant here??? So hypotypes are type specimens that are not type 

* family and genus Coleoptera within the order - wtf!

* the data is just sooo fragmentary and inconsistent, compare 
http://data.gbif.org/species/browse/taxon/13148844 with 
http://species.wikimedia.org/wiki/Jacobsoniidae, particularly noting that only 
Wikispecies enables the user to find good references to verify the data

The nature of things is such that GBIF will never be anywhere near complete and 
consistent if it continues to try to aggregate "bits and bobs" - the job isn't 
as simple as assembling a jigsaw puzzle from its pieces, but more like having 
the same picture cut up into many different jigsaws, and all the pieces mixed 
together in one bag ...


From: Jim Croft <jim.croft at gmail.com>
To: David Remsen (GBIF) <dremsen at gbif.org>
Cc: "taxacom at mailman.nhm.ku.edu" <Taxacom at mailman.nhm.ku.edu>; Tim Robertson 
<trobertson at gbif.org>
Sent: Mon, 29 November, 2010 1:04:59 PM
Subject: Re: [Taxacom] saturday morning fun

On Mon, Nov 29, 2010 at 4:28 AM, David Remsen (GBIF) <dremsen at gbif.org> wrote:
>  While Stephen laments "
> 'official' databases like GBIF ... feeding us sh!t" we should
> collectively recall the old saw that when you point a finger at
> someone remember there are three pointing back at you.

To be fair, the only reason GBIF is 'feeding us shit' is because
'shit' is what we gave them.

As I said off list in a couple of places, triumphantly pointing out
errors in biodiversity databases is a bit of a cheap shot and not
particularly helpful. Who among us can honestly say we have not
created or perpetuated some of the aforementioned 'shit'?

Really looking forward to the day when systems are in place that will
enable correction rather than complaint.

> 1.  There is no "GBIF" data and GBIF is not a database.

Ah ha! As we suspected all along... GBIF is a state of mind...

>  It tells me that even Lynn Margulis
> would be surprised at over 9000 distinct Kingdom values for these
> data

How the hell does that happen?  Even assuming deliberate sabotage of
the data, there seems to be an extra couple of zeros on the number.

> "Phantom" genera appear where the same genus is listed in different
> higher taxa according to different sources.   Genera appear listed as
> Families or Orders.   How should we know what are legit?   Download
> the complete and official list of genera. Or the one of families or
> orders?  Where are they?   Do we teach our programmes the higher taxa
> suffixes and let them sort it out?  I say "Hah," sirs, simply "Hah!"

Humans... We have found that certain strains of humans are really good
at sorting out this sort of stuff.  :)
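
For what it is worth, the "teach our programmes the higher taxa
suffixes" idea can be sketched roughly as below. This is only an
illustration under stated assumptions: the suffix table, guess_rank and
flag_rank_conflicts are hypothetical names made up here, not anything
GBIF runs, and the table covers just a handful of standard endings
(zoological -oidea, -idae, -inae, -ini; botanical -ales, -aceae).
Zoological orders such as Coleoptera carry no standard ending, so the
check says nothing about them.

```python
# Hypothetical suffix table (an assumption for illustration only):
# a few standard rank endings from the zoological and botanical codes.
RANK_SUFFIXES = [
    ("oidea", "superfamily"),
    ("aceae", "family"),
    ("idae", "family"),
    ("inae", "subfamily"),
    ("ales", "order"),
    ("ini", "tribe"),
]


def guess_rank(name):
    """Return the rank implied by the name's ending, or None if unknown."""
    lowered = name.strip().lower()
    for suffix, rank in RANK_SUFFIXES:
        if lowered.endswith(suffix):
            return rank
    return None


def flag_rank_conflicts(records):
    """Return (name, claimed rank, implied rank) where the two disagree.

    `records` is an iterable of (name, claimed_rank) pairs. Names whose
    ending implies no rank are skipped, not treated as correct.
    """
    conflicts = []
    for name, claimed in records:
        implied = guess_rank(name)
        if implied is not None and implied != claimed.lower():
            conflicts.append((name, claimed, implied))
    return conflicts


if __name__ == "__main__":
    sample = [
        ("Jacobsoniidae", "genus"),   # family ending filed as a genus
        ("Coleoptera", "family"),     # wrong, but no suffix rule catches it
        ("Rosaceae", "family"),       # consistent
    ]
    for row in flag_rank_conflicts(sample):
        print(row)
```

The sample shows the limit: a family-style name filed as a genus gets
flagged, but "Coleoptera" listed as a family passes untouched, and a
genus whose name merely happens to end in one of these suffixes would
be flagged wrongly. That is where the humans come back in.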

> So, while your Saturday morning fun appears to make what we do seem
> sort of senseless and value-less, my Sunday evening response is not
> based on that view.

Yes. It is not meaningless or senseless. It is problematic. There is a

> You are seeing a processed version of what we
> see that originates in the big dirty messy world.   I think GBIF
> (Copenhagen) is remiss in that we have not properly put the noise in
> its proper proportion and done a better job of faceting and sorting
> the majority "good" data from the bad.   We work in a shifting
> landscape of noisy stakeholders all shouting that everything is a
> priority.   I'm sorry we still haven't gotten this straightened out
> but we hope to have a better solution very soon.

With 269 million records we expect a bit of noise and it is good to
see where the noise is and how much there is of it. Dealing with the
noise and turning it into signal (or trashing it) is the business
of taxonomy.

The c. 50% noise may seem a little frightening. But this is at the
unique name level.  At the occurrence level it will be much, much less
(I hope!). It would be interesting to know what proportion of these
records at the occurrence level would go straight through to the
keeper vs those requiring attention.

Thanks for pulling this together Dave; still to digest the stats...

The bottom line is that the problems are ours, not GBIF's. GBIF is
just showing where they are.  We can fix them, or, we can shoot the

jim (resisting biological imperative to beat the crap out of the
bearer of bad news)

Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
'A civilized society is one which tolerates eccentricity to the point
of doubtful sanity.'
 - Robert Frost, poet (1874-1963)

