[Taxacom] saturday morning fun
jim.croft at gmail.com
Sun Nov 28 18:04:59 CST 2010
On Mon, Nov 29, 2010 at 4:28 AM, David Remsen (GBIF) <dremsen at gbif.org> wrote:
> While Stephen laments "
> 'official' databases like GBIF ... feeding us sh!t" we should
> collectively recall the old saw that when you point a finger at
> someone remember there are three pointing back at you.
To be fair, the only reason GBIF is 'feeding us shit' is because
'shit' is what we gave them.
As I said off list in a couple of places, triumphantly pointing out
errors in biodiversity databases is a bit of a cheap shot and not
particularly helpful. Who among us can honestly say we have not
created or perpetuated some of the aforementioned 'shit'?
Really looking forward to the day when systems are in place that will
enable correction rather than complaint.
> 1. There is no "GBIF" data and GBIF is not a database.
Ah ha! As we suspected all along... GBIF is a state of mind...
> It tells me that even Lynn Margulis
> would be surprised at over 9000 distinct Kingdom values for these
How the hell does that happen? Even assuming deliberate sabotage of
the data, there seems to be an extra couple of zeros on the number.
> "Phantom" genera appear where the same genus is listed in different
> higher taxa according to different sources. Genera appear listed as
> Families or Orders. How should we know what are legit? Download
> the complete and official list of genera. Or the one of families or
> orders? Where are they? Do we teach our programmes the higher taxa
> suffixes and let them sort it out? I say "Hah," sirs, simply "Hah!"
Humans... We have found that certain strains of humans are really good
at sorting out this sort of stuff. :)
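For what it is worth, the "teach our programmes the higher taxa suffixes" idea can be sketched in a few lines. This is a minimal illustration only: the suffix table is a small, incomplete sample of the standard botanical and zoological rank endings, and the function names (guess_rank, flag_suspect_genera) are hypothetical, not part of any GBIF tool. It can only flag candidates; a human still has to decide whether a flagged name really is a family or order sitting in the genus column.

```python
# Illustrative, incomplete table of rank-indicating suffixes (an assumption
# for this sketch, not an authoritative list from either nomenclatural code).
RANK_SUFFIXES = {
    "aceae": "family",       # botanical
    "oideae": "subfamily",   # botanical
    "ales": "order",         # botanical
    "idae": "family",        # zoological
    "inae": "subfamily",     # zoological
    "oidea": "superfamily",  # zoological
}


def guess_rank(name):
    """Return the rank suggested by the name's ending, or None if no suffix matches."""
    lowered = name.strip().lower()
    # Try longer suffixes first so the most specific ending wins.
    for suffix in sorted(RANK_SUFFIXES, key=len, reverse=True):
        if lowered.endswith(suffix):
            return RANK_SUFFIXES[suffix]
    return None


def flag_suspect_genera(names):
    """Return (name, suggested_rank) pairs for names in a genus list that
    look like higher taxa. These are candidates for human review only."""
    flagged = []
    for name in names:
        rank = guess_rank(name)
        if rank is not None:
            flagged.append((name, rank))
    return flagged


if __name__ == "__main__":
    # Made-up genus column containing two higher-taxon names.
    print(flag_suspect_genera(["Rosa", "Rosaceae", "Felis", "Felidae"]))
```

A suffix match is only a hint, which is why the sketch returns a review list rather than silently reassigning ranks.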
> So, while your Saturday morning fun appears to make what we do seem
> sort of senseless and value-less, my Sunday evening response is not
> based on that view.
Yes. It is not meaningless as senseless. It is problematic. There is a
> You are seeing a processed version of what we
> see that originates in the big dirty messy world. I think GBIF
> (Copenhagen) is remiss in that we have not properly put the noise in
> its proper proportion and done a better job of faceting and sorting
> the majority "good" data from the bad. We work in a shifting
> landscape of noisy stakeholders all shouting that everything is a
> priority. I'm sorry we still haven't gotten this straightened out
> but we hope to have a better solution very soon.
With 269 million records we expect a bit of noise and it is good to
see where the noise is and how much there is of it. Dealing with the
noise and turning it into signal (or trashing it) is the business
The c. 50% noise may seem a little frightening. But this is at the
unique name level. At the occurrence level it will be much, much less
(I hope!). It would be interesting to know what proportion of these
records at the occurrence level would go straight through to the
keeper vs those requiring attention.
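To make the name-level versus occurrence-level distinction concrete, here is a small sketch with made-up records (none of this is GBIF data, and noise_fractions and the valid-name set are hypothetical). It assumes each occurrence record carries one name string and that some check exists to say whether a name is bad. Because misspellings tend to be rare, half of the unique names can be bad while only a small share of the records is affected.

```python
from collections import Counter


def noise_fractions(occurrence_names, is_bad):
    """Return (fraction of unique names that are bad,
               fraction of occurrence records carrying a bad name)."""
    counts = Counter(occurrence_names)
    if not counts:
        return 0.0, 0.0
    bad_names = [name for name in counts if is_bad(name)]
    name_level = len(bad_names) / len(counts)
    occurrence_level = sum(counts[name] for name in bad_names) / sum(counts.values())
    return name_level, occurrence_level


# Made-up example: two good names used often, two misspellings used once each.
valid = {"Quercus robur", "Pinus sylvestris"}
records = (
    ["Quercus robur"] * 6
    + ["Pinus sylvestris"] * 2
    + ["Quercus robr", "Pinus sylvestrs"]
)

name_level, occurrence_level = noise_fractions(records, lambda n: n not in valid)
print(name_level, occurrence_level)  # 0.5 0.2
```

In this toy case 50% of the unique names need attention but only 20% of the records do; the real proportions would have to be measured on the actual data.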
Thanks for pulling this together Dave; still to digest the stats...
The bottom line is that the problems are ours, not GBIF's. GBIF is
just showing where they are. We can fix them, or, we can shoot the
jim (resisting biological imperative to beat the crap out of the
bearer of bad news)
Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
'A civilized society is one which tolerates eccentricity to the point
of doubtful sanity.'
- Robert Frost, poet (1874-1963)
Please send URIs, not attachments: