[Taxacom] new GBIF dataportal

Richard Pyle deepreef at bishopmuseum.org
Sat Aug 4 16:38:52 CDT 2007

David has captured most of my thoughts on this issue -- which I've spent a
lot of time thinking about lately.  The basic issue is a trade-off between
data accuracy and data accessibility.  These two things tend to work
against each other:  Do I expose all my data now, so that people can access
the parts they need?  Or do I wait until I've verified all the content?  If
it's too messy when released, it quickly gets branded as garbage and
useless, and it's very difficult to overcome such a branding once applied,
even when the data are later cleaned up.  If you wait to make sure it's all
clean before releasing, then it can remain inaccessible for years.

One of the most promising solutions to the problem is what David outlined
below, which is to provide robust feedback tools so that consumers of the
data can very easily report suspected errors back to the providers, and the
providers can easily make corrections.  An extension of this approach is to
develop login procedures to allow consumers to get in and correct the data
themselves (rather than just report the error and wait for over-worked data
managers to get around to it eventually).  With a reliable and comprehensive
logging/auditing system, I think this approach has great promise.
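
To make the logging/auditing idea concrete, here is a minimal sketch in
Python.  Everything in it (the Record and AuditEntry classes, the field
names, the sample record) is hypothetical and only illustrates the idea;
it is not how GBIF or any provider actually stores or corrects data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEntry:
    """One logged correction: who changed which field, from what to what."""
    user: str
    field_name: str
    old_value: Optional[str]
    new_value: str
    timestamp: datetime


@dataclass
class Record:
    """A hypothetical data record that consumers are allowed to correct."""
    record_id: str
    fields: dict
    history: list = field(default_factory=list)

    def correct(self, user: str, field_name: str, new_value: str) -> None:
        # Remember the previous value so the change can be audited or undone.
        old_value = self.fields.get(field_name)
        self.fields[field_name] = new_value
        self.history.append(
            AuditEntry(user, field_name, old_value, new_value,
                       datetime.now(timezone.utc))
        )

    def revert_last(self) -> None:
        # Undo the most recent correction using the audit trail.
        if not self.history:
            return
        entry = self.history.pop()
        if entry.old_value is None:
            self.fields.pop(entry.field_name, None)
        else:
            self.fields[entry.field_name] = entry.old_value


record = Record("example-1", {"country": "Spian"})
record.correct("some_user", "country", "Spain")
```

Because every change carries a user and a timestamp, a data manager can
review or roll back a bad edit instead of having to make every correction
personally.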

As we continue to develop a new prototype implementation of ZooBank, my
vision is to provide the best of both worlds.  The idea would be to expose
all the available data, and clearly distinguish the "verified" from the
"unverified" in an unambiguous, dichotomous way.  Obviously, careful thought
would need to be put into the criteria that constitute "verified", but I
believe this can be solved to the satisfaction of most.  But the key is to
provide consumers with an easy mechanism for contributing content and
corrections in a way that helps move data records from the "unverified" bin
to the "verified" bin.
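
As a rough illustration of the two-bin idea, here is a small Python
sketch.  The verification criterion used (a fixed number of independent
confirmations) is purely an assumption made for the example, as are the
class and function names; the real criteria would need the careful
thought mentioned above.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: how many distinct users must confirm a record.
REQUIRED_CONFIRMATIONS = 2


@dataclass
class NameRecord:
    """A hypothetical record that is either verified or unverified."""
    name: str
    confirmed_by: set = field(default_factory=set)

    def confirm(self, user: str) -> None:
        # A set ignores repeat confirmations from the same user.
        self.confirmed_by.add(user)

    @property
    def verified(self) -> bool:
        # Strictly dichotomous: a record is in one bin or the other.
        return len(self.confirmed_by) >= REQUIRED_CONFIRMATIONS


def split_bins(records):
    """Expose all records, but keep the two bins clearly apart."""
    verified = [r for r in records if r.verified]
    unverified = [r for r in records if not r.verified]
    return verified, unverified
```

All records stay visible either way; a contribution from a consumer only
changes which bin a record is shown in.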


> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu 
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of 
> Shorthouse, David
> Sent: Saturday, August 04, 2007 10:56 AM
> To: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] new GBIF dataportal
> > But a more serious question:
> > GBIF is heading towards full operation. The new data portal launched
> > July 2nd is a wonderful piece of programming work which gives us a
> > vision of what we can expect.
> > If only that massive lot of errors and wrong interpretations could be
> > avoided!
> > Wouldn't it be better if someone look at the data before they are put
> > in the public domain?
>
> I feel your dismay & I suspect many share the same opinion. 
> But, these and other similar errors also appear in 
> peer-reviewed publications. The only difference now is the 
> errors are not closed-access & all the dirty laundry is being 
> aired. More individuals like yourself who can critically 
> examine the data necessarily leads to better data & 
> consequently, better use of the data. This means you and 
> others like you have to be actively involved in the process & 
> assume some of the responsibility. GBIF has options available 
> for users to directly contact the provider who may not 
> realize they are serving erroneous data. To provide feedback 
> to the University of Navarra's Museum of Zoology, see one 
> such erroneous record here:
> http://data.gbif.org/occurrences/21924/. They also have 
> publicly available event logs on providers such that anyone & 
> everyone can see nomenclatural, geocoding, etc. issues. Logs 
> for MZNA are accessible at:
> http://data.gbif.org/datasets/provider/185/logs/.
> That being said, GBIF could improve its portal by:
> 1. Facilitating communal vetting of data & making it abundantly
> obvious that it is the providers & not GBIF itself who are
> responsible for some of the garbage that slips through.
> 2. Making these event logs for providers very apparent (email
> reports perhaps?) & making providers accountable. These event logs
> really should cause providers to sit up & take notice because
> the dirty laundry is now being pointed out.
> 3. Adding flashy disclaimers to catch one's eye such that the end
> user takes some responsibility prior to using acquired data
> in occurrence algorithms, etc.
> My two cents,
> David P. Shorthouse
> ------------------------------------------------------
> Department of Biological Sciences
> CW-403, Biological Sciences Centre
> University of Alberta
> Edmonton, AB   T6G 2E9
> mailto:dps1 at ualberta.ca
> http://canadianarachnology.webhop.net
> http://arachnidforum.webhop.net
> http://www.spiderwebwatch.org
> ------------------------------------------------------
> _______________________________________________
> Taxacom mailing list
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
