[Taxacom] new GBIF dataportal

Shorthouse, David dps1 at ualberta.ca
Sat Aug 4 19:04:53 CDT 2007

> David has captured most of my thoughts on this issue -- which I've spent a
> lot of time thinking about lately.  The basic issue is a trade-off between
> data accuracy, and data accessibility.  These two things tend to work
> against each other:  Do I expose all my data now, so that people can use
> the parts they need?  Or do I wait until I've verified all the content?  If
> it's too messy when released, it quickly gets branded as garbage and
> useless, and it's very difficult to overcome such a branding once applied,
> even when the data are later cleaned up.  If you wait to make sure it's
> clean before releasing, then it can remain inaccessible for years.

After sending my note and reading Rich's response, I realize now that I
should have stressed that it's not just MZNA that serves questionable data.
In no way did I mean to brand MZNA; it was merely one small example. ALL
providers are currently serving erroneous data to GBIF to some extent.
Though contentious, another approach may be for GBIF to produce a
dynamically-generated table of providers with columns of clean vs. dirty
record counts. If done well, this could act as a tool to promote healthy
competition among providers. Funding agencies would of course be able to see
the problem in black & white.
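As a minimal sketch of such a scorecard (assuming a hypothetical record
structure where each record carries its provider name and a list of
already-flagged errors; all field names here are invented for illustration):

```python
from collections import defaultdict

def provider_scorecard(records):
    """Tally clean vs. dirty records for each provider."""
    counts = defaultdict(lambda: {"clean": 0, "dirty": 0})
    for rec in records:
        status = "dirty" if rec.get("errors") else "clean"
        counts[rec["provider"]][status] += 1
    return dict(counts)

# Hypothetical input: two records from one provider, one flagged dirty.
records = [
    {"provider": "ProviderA", "errors": ["coordinates fall outside country"]},
    {"provider": "ProviderA", "errors": []},
    {"provider": "ProviderB", "errors": []},
]
print(provider_scorecard(records))
```

GBIF could regenerate such a table nightly from whatever validation flags it
already computes during indexing.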

Yet another approach is to build real-time validation functions into the
data-entry procedure rather than mopping up the big messes after the fact.
This requires an immense amount of infrastructure at large scales, but is
perfectly feasible at small scales (e.g.
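At that small scale, a data-entry form could run checks like these before a
record is ever committed (a rough sketch only; the field names and the
particular checks are assumptions, not any provider's actual schema):

```python
from datetime import date

def validate_record(rec):
    """Return a list of problems found at entry time; empty means the record passes."""
    problems = []
    lat = rec.get("decimalLatitude")
    lon = rec.get("decimalLongitude")
    if lat is None or not -90 <= lat <= 90:
        problems.append("latitude missing or out of range")
    if lon is None or not -180 <= lon <= 180:
        problems.append("longitude missing or out of range")
    try:
        date.fromisoformat(rec.get("eventDate", ""))
    except ValueError:
        problems.append("eventDate missing or not ISO formatted")
    return problems

# A well-formed record passes; a record with a bad latitude is caught
# immediately instead of surfacing later in an aggregator.
print(validate_record({"decimalLatitude": 53.5, "decimalLongitude": -113.5,
                       "eventDate": "2007-08-04"}))
print(validate_record({"decimalLatitude": 153.5, "eventDate": "last summer"}))
```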

David P. Shorthouse
Department of Biological Sciences
CW-403, Biological Sciences Centre
University of Alberta
Edmonton, AB   T6G 2E9
mailto:dps1 at ualberta.ca
