[Taxacom] new GBIF dataportal

Rob Guralnick Robert.Guralnick at colorado.edu
Sun Aug 5 16:15:09 CDT 2007

> After having sent my note & reading Rich's response, I realized now that I
> should have stressed that it's not just MZNA that serves questionable data.
> In no way did I mean to brand MZNA. Rather, it was one small example. ALL
> providers are currently serving erroneous data to GBIF to some extent.
> Though contentious, another approach may be for GBIF to produce a
> dynamically-generated table of providers that contains a column of clean vs.
> dirty records. If done well, this could act as a tool to promote healthy
> competition among providers. Funding agencies would of course be able to see
> the problem in black & white.
> Yet another approach is to build some real-time functions into the data
> entry procedure rather than mopping up the big messes after the fact. This
> requires an immense amount of infrastructure at large scales, but is
> perfectly feasible at small scales (e.g.
> http://ispiders.blogspot.com/2007/06/digir-for-collectors.html).
Hi all,

Since I own the work product of this recently published Open Access
paper in /Ecology Letters/, and since it seems very relevant to the
discussion, I will quote the relevant section verbatim:

/Taxonomic Data Quality Assessment and Validation/
The more difficult problem of misidentified taxa is being addressed by 
two linked endeavors that will help to mitigate the problem. First, 
digitization and automated markup of all taxonomic literature to 
delineate important taxonomic, anatomical, locality, and other 
information in original descriptions and revisions of a taxon have been 
proven in concept (Koning et al. 2005). The major natural history museum 
libraries are collaborating to digitize their works through the 
Biodiversity Heritage Library Project (http://www.bhl.si.edu). Once this 
process is underway, species-occurrence records in biodiversity data 
portals can be linked to taxon names in the literature so that data 
stewards and users can better estimate specimen identification accuracy.
Second, an important but still nascent next step will be to improve 
dissemination of corrected identifications. Taxonomic experts often 
check and correct misidentifications in collections, but traditional 
practice has not been conducive to reporting these corrections to the 
community at large. This is the reason that, ideally, GBIF data 
providers themselves maintain the occurrence data that they share with 
the network, and as needed update the names applied to those records 
with an annotation as to the recency of update.  The taxonomic community 
also acknowledges that some groups are more taxonomically difficult than 
others. This is metadata that should be linked to species-occurrence 
records so that potentially naïve end-users would be able to determine 
which groups have well vetted taxonomies and highly accurate specimen 
identifications. This solution is preferable to withholding data for 
problematic groups, because identifications at a more inclusive 
taxonomic rank (e.g. genus or family) may be accurate, even though 
included species-level identification(s) is/are not.

Full reference: Guralnick, R. P., A. W. Hill, and M. Lane. 2007.
Towards a collaborative, global infrastructure for biodiversity
assessment. Ecology Letters 10(8):663-672 [doi:

Note that the full paper offers one view of how biodiversity
informatics and its data portals may develop in the near future.
I'd be interested to hear any thoughts on the "vision" laid out there.

Best regards,
Rob Guralnick
Assoc. Professor and Curator
Dept. of Ecol. and Evol. Biology
University of Colorado Museum of Natural History
University of Colorado Boulder
Boulder, CO 80309-0265
