[Taxacom] new GBIF dataportal
Robert.Guralnick at colorado.edu
Sun Aug 5 16:15:09 CDT 2007
> After having sent my note & reading Rich's response, I realized now that I
> should have stressed that it's not just MZNA that serves questionable data.
> In no way did I mean to brand MZNA. Rather, it was one small example. ALL
> providers are currently serving erroneous data to GBIF to some extent.
> Though contentious, another approach may be for GBIF to produce a
> dynamically-generated table of providers that contains a column of clean vs.
> dirty records. If done well, this could act as a tool to promote healthy
> competition among providers. Funding agencies would of course be able to see
> the problem in black & white.
> Yet another approach is to build some real-time functions into the data
> entry procedure rather than mopping up the big messes after the fact. This
> requires an immense amount of infrastructure at large scales, but is
> perfectly feasible at small scales (e.g.
Hi all ---
Since I own the work product on this recently published Open Access
paper in /Ecology Letters/, and since it seems very relevant to the
discussion, I am going to quote the relevant section of that paper
verbatim:
METHODS FOR INCREASING BIODIVERSITY DATA QUALITY
/Taxonomic Data Quality Assessment and Validation/
The more difficult problem of misidentified taxa is being addressed by
two linked endeavors that will help to mitigate the problem. First,
digitization and automated markup of all taxonomic literature to
delineate important taxonomic, anatomical, locality, and other
information in original descriptions and revisions of a taxon have been
proven in concept (Koning et al. 2005). The major natural history museum
libraries are collaborating to digitize their works through the
Biodiversity Heritage Library Project (http://www.bhl.si.edu). Once this
process is underway, species-occurrence records in biodiversity data
portals can be linked to taxon names in the literature so that data
stewards and users can better estimate specimen identification accuracy.
Second, an important but still nascent next step will be to improve
dissemination of corrected identifications. Taxonomic experts often
check and correct misidentifications in collections, but traditional
practice has not been conducive to reporting these corrections to the
community at large. This is the reason that, ideally, GBIF data
providers themselves maintain the occurrence data that they share with
the network, and as needed update the names applied to those records
with an annotation as to the recency of update. The taxonomic community
also acknowledges that some groups are more taxonomically difficult than
others. This is metadata that should be linked to species-occurrence
records so that potentially naïve end-users would be able to determine
which groups have well vetted taxonomies and highly accurate specimen
identifications. This solution is preferable to withholding data for
problematic groups, because identifications at a more inclusive
taxonomic rank (e.g. genus or family) may be accurate, even though
included species-level identification(s) is/are not.
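
To make the last point of the quoted section concrete, here is a minimal sketch of linking group-level difficulty metadata to occurrence records, so that a user sees a name only at the rank considered trustworthy. Everything in it is an assumption for illustration and none of it comes from the paper or from GBIF: the `OccurrenceRecord` fields, the `GROUP_DIFFICULTY` ratings (invented values), the `TRUSTED_RANK` mapping, and the choice to fall back to genus when a group has no rating.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical difficulty ratings per family. The values are invented
# for illustration and are not taken from any real assessment.
GROUP_DIFFICULTY = {
    "Felidae": "low",
    "Chironomidae": "high",
}

# Assumed policy: the harder the group, the more inclusive the rank
# at which an identification is treated as reliable.
TRUSTED_RANK = {"low": "species", "moderate": "genus", "high": "family"}


@dataclass
class OccurrenceRecord:
    family: str
    genus: str
    species: str                            # full binomial
    identified_year: Optional[int] = None   # recency-of-update annotation


def reliable_name(record: OccurrenceRecord) -> Tuple[str, str]:
    """Return (rank, name) at the most specific rank treated as reliable."""
    difficulty = GROUP_DIFFICULTY.get(record.family)
    # Assumption: groups without a rating fall back to genus.
    rank = TRUSTED_RANK.get(difficulty, "genus")
    return rank, getattr(record, rank)


if __name__ == "__main__":
    rec = OccurrenceRecord("Chironomidae", "Chironomus",
                           "Chironomus plumosus", identified_year=1998)
    print(reliable_name(rec))
```

The record is never withheld. Only the rank at which its name is reported changes, which is the alternative to withholding data that the quoted passage argues for.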
full reference: Guralnick, R. P., A. W. Hill and M. Lane. 2007.
Towards a collaborative, global infrastructure for biodiversity
assessment. Ecology Letters 10(8):663-672 [doi:
Note, the full paper provides one view of how the field of biodiversity
informatics and data portals may continue to grow in the near future.
I'd be interested to hear any thoughts on the "vision" laid out there.
Assoc. Professor and Curator
Dept. of Ecol. and Evol. Biology
University of Colorado Museum of Natural History
University of Colorado Boulder
Boulder, CO 80309-0265