[Taxacom] Chameleons, GBIF, and the Red List
pleuronaia at gmail.com
Mon Aug 25 13:31:54 CDT 2014
Overall, I haven't had much occasion to use GBIF-type data. As far as I
can tell, it's a good compiler of data and useful for seeing what has been
reported. The virus/insect confusion might point to flaws in data
aggregation. I encountered a case (in another database) where a snail was
being misreported as a sponge; my error report got the reply that I should
contact the source datasets. After some effort to find out how to reach
those, it turned out that the datasets were correct and the flaw was a
confusion between homonyms in the aggregator.
But the problem is not so much that GBIF is flawed. Rather, the overall
data model is flawed. Part of the reason that GBIF gets perceived as not
very useful is that searching for a taxon may yield dozens of hits,
but practically all are versions from different aggregators, each citing
no more than a supposed taxonomic name, possibly correctly placed in the
higher taxonomy. Having one good data aggregator is useful; having dozens
of varying quality is not.
Instead of the model being something like:
      Expert | Expert
           \ | /
Source ---> GBIF - Expert
           / | \
      Expert | Expert
the model at present is often:
Source ---> aggregation ----> non-expert thinks he's getting expert data
No one wants to fund or take responsibility for data accuracy. Projects
are all about how many records are entered, and the less time you spend on
checking accuracy, the more records you can enter. There's not much
incentive for experts to volunteer to fix someone else's data, especially
when there's no funding to generate good data.
Freshwater mollusks are perhaps the most urgent conservation need out there
(in terms of the percentage of imperiled species in a fairly diverse
group). But a quick look at Unionidae on GBIF reveals several problems:
Nomenclatural: wrong author and date for the family (should be Rafinesque,
1820); the incorrect original spelling "Christadens" of the unnecessary
replacement name Cristadens is listed as an accepted genus; various
misspellings, gender-agreement errors, etc.
Some cited genera belong in other families. This may account for the fact
that Unionidae are mapped in South America, New Guinea, Australia, and New
Zealand, where only other families occur. It doesn't account for the
oceanic dots, where nothing similar to Unionidae is present. Conversely,
unionids occur almost anywhere with fairly reliable freshwater in North
America and Eurasia, so there are a lot of blanks where there shouldn't be
(again, Russia is poorly represented).
Poor updating: several faunas have received significant taxonomic attention
in the past decade or so, but the data don't reflect those well. Some of
the taxonomic assignments are a century or more out of date.
Of course, these all reflect problems in GBIF's input rather than in GBIF
itself as an aggregator. But they suggest that the percentage of inaccurate
data is quite high, enough to give misleading results if you treated it as
up to date and authoritative. Particularly as Unionidae receive a high level
of attention because so many are imperiled, the results don't lead to
optimism about less well-studied groups.
GBIF in itself is a useful tool, but the quality of data being aggregated
is too low to produce reliable results. Unless there's funding for
taxonomic expertise to review data before it is uploaded, the problem will
persist. Even when corrections are made, the error has already made it
into the flock of aggregators and thus continues to be available to other
On Mon, Aug 25, 2014 at 2:58 AM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
> Hi Bob,
> > What's wrong with
> > sources > GBIF > expert > 'Checklist of fishes of the Northwestern
> > Islands'
> > (freely available online, readily update-able, compilers easily
> Because that's a very one-dimensional view of the data. It's not as if the
> only value of the occurrence records we cite is to fulfill our one narrow
> goal (species occurrence in one geographic area), so why should the "end
> game" for this information be such a specific synthesis?
> The model should be something like:
>       Expert | Expert
>            \ | /
> Source ---> GBIF - Expert
>            / | \
>       Expert | Expert
> [Not sure how that ASCII art will hold up through Taxacom....]
> Each expert gains something from the data (in our case, to help flesh out
> our checklist). The open question is how the value-added
> information/cleanup by the experts can trickle back to the source -- either
> directly, or via GBIF. I suspect the general picture Rod outlined may be
> what ends up happening.
> > or
> > sources > GBIF > expert > sources (straightaway) *and* 'Checklist' > GBIF
> > (eventually)
> Yes -- that's closer to it, I think.
> > Please think first of users, not GBIF.
> When did I ever give you the impression that I thought anything other than
> > You admitted that you hadn't been following this thread closely, so:
> > (1) from a single source, like a museum, GBIF may have *less information*
> > than the source database. That might be because the source didn't use the
> > GBIF IPT, or because a Procrustean truncator like OZCAM/ALA sat
> > in-between, or because it's too hard for staff to finagle their database
> > into Darwin Core format. If the source database is or could be online,
> > it's an alternative and superior data source *for the same record*, and
> > GBIF should link to it.
> > (2) the single-source record included in GBIF may be wrong. It's been
> > corrected (for taxonomy, or geography, or whatever) and is now available
> > in upgraded form at another location on the Web. Until your 'sources >
> > GBIF > expert > GBIF (eventually) > sources (ultimately)' pipeline
> > completes, GBIF should link to the other location.
> OK, thanks for clarifying. Agreed.
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> The Taxacom Archive back to 1992 may be searched at:
> Celebrating 27 years of Taxacom in 2014.
Dr. David Campbell
Assistant Professor, Geology
Department of Natural Sciences
Boiling Springs NC 28017