[Taxacom] Chameleons, GBIF, and the Red List

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Sun Aug 24 18:43:13 CDT 2014


Adding yet another record (an annotation record) for Peripsocus maoricus doesn't in any obvious way help to solve the problem of the non-existent virus of the same name. What would help to solve it would be an annotation fixed to the actual offending record stating that there is no such virus. What's to bet it doesn't get fixed any time soon...

Stephen

--------------------------------------------
On Mon, 25/8/14, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:

 Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
 To: "Richard Pyle" <deepreef at bishopmuseum.org>
 Cc: "Bob Mesibov" <mesibov at southcom.com.au>, "quentin groom" <quentin at br.fgov.be>, "TAXACOM" <taxacom at mailman.nhm.ku.edu>
 Received: Monday, 25 August, 2014, 11:38 AM
 
 To pick up one thing Rich mentioned:
 
 
 There may be another solution, however, which
 is for GBIF to cache corrections submitted by people like
 you and other experts, such that these
 annotations/corrections can be made visible to all users of
 GBIF data; not just the source datasets.  Perhaps this
 feature already exists.  Perhaps the politics of
 implementing such a feature are too daunting to overcome.
 
 
 I think the idea of GBIF “caching corrections” is pretty much where we
 are heading, even if accidentally. Elsewhere I’ve discussed (and
 rejected) the idea of annotations as sticky notes on databases
 (http://iphylo.blogspot.co.uk/2014/04/more-on-annotating-biodiversity-data.html),
 and the “filtered push” model assumes, as Rich points out, that there’s
 somebody on the other end with the time to make use of all these
 annotations (see
 http://iphylo.blogspot.co.uk/2014/03/rethinking-annotating-biodiversity-data.html).
 
 Given that GBIF has duplicate specimen records already (e.g.,
 occurrences that come directly from the original museum, and also via
 other projects such as FishBase, or from DNA barcoding projects), we can
 think of these different records as being “annotations” (e.g., this
 occurrence is what the museum says about this specimen, this other
 occurrence is what BOLD or GenBank say). So, we could cluster these and
 end up knowing that this museum specimen is the voucher for these DNA
 sequences.
 
 If people download GBIF data, clean it, and then send it back, we could
 simply cluster the new data with the old, and people could then see what
 has happened to the data when a user has scrutinised it.
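
A minimal sketch of the clustering idea, assuming each occurrence record carries an institution code and a catalogue number that can be normalised into a shared key. The field names, the `specimen_key` and `cluster_occurrences` helpers, and the sample records are all hypothetical; real records would need fuzzier matching than this.

```python
from collections import defaultdict

# Hypothetical records: one specimen reported by two providers, plus one
# unrelated specimen. Field names are illustrative, not an actual GBIF schema.
OCCURRENCES = [
    {"provider": "museum", "institution": "XYZ", "catalog": "I-12345", "name": "Aus bus"},
    {"provider": "barcoding", "institution": "xyz ", "catalog": "i 12345", "name": "Aus sp."},
    {"provider": "museum", "institution": "XYZ", "catalog": "I-99999", "name": "Cus dus"},
]


def specimen_key(record):
    """Build a clustering key from institution code and catalogue number."""
    institution = record["institution"].strip().upper()
    # Drop punctuation and spaces so "I-12345" and "i 12345" compare equal.
    catalog = "".join(ch for ch in record["catalog"] if ch.isalnum()).upper()
    return (institution, catalog)


def cluster_occurrences(records):
    """Group records that appear to describe the same physical specimen."""
    clusters = defaultdict(list)
    for record in records:
        clusters[specimen_key(record)].append(record)
    return dict(clusters)


if __name__ == "__main__":
    for key, members in cluster_occurrences(OCCURRENCES).items():
        print(key, [m["provider"] for m in members])
```

Each cluster then holds every provider's version of the same specimen side by side, so a cleaned copy sent back by a user would simply show up as one more member of the cluster.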
 
 Regards
 
 Rod
 ---------------------------------------------------------
 Roderic Page
 Professor of Taxonomy
 Institute of Biodiversity, Animal Health and Comparative Medicine
 College of Medical, Veterinary and Life Sciences
 Graham Kerr Building
 University of Glasgow
 Glasgow G12 8QQ, UK
 
 Email:  Roderic.Page at glasgow.ac.uk
 Tel:  +44 141 330 4778
 Skype:  rdmpage
 Facebook:  http://www.facebook.com/rdmpage
 LinkedIn:  http://uk.linkedin.com/in/rdmpage
 Twitter:  http://twitter.com/rdmpage
 Blog:  http://iphylo.blogspot.com
 ORCID:  http://orcid.org/0000-0002-7101-9767
 Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
 
 
 On 24 Aug 2014, at 23:05, Richard Pyle <deepreef at bishopmuseum.org> wrote:
 
 Bob,
 
 What you describe is EXACTLY what GBIF and others in the biodiversity
 informatics world are hoping to achieve.
 
 My previous post (reply to Stephen) covered these parts of the process:
 1) Data exist in thousands of databases around the world
 2) Aggregators like GBIF make our lives MUCH easier in helping us to
    discover those data
 3) We, the experts of the world, spend hours "cleaning" data after GBIF
    has so helpfully allowed us to locate it.
 
 What you're talking about is the next step:
 
 4) After we, the experts of the world, have spent hours "cleaning" the
    data, how do we allow those efforts to propagate back to the sources,
    so that the NEXT person who encounters those records through GBIF can
    benefit from the toils of us experts?
 
 There are two basic roadblocks to achieving this final step.
 
 First, as has been made ABUNDANTLY clear in this thread, the data do NOT
 belong to GBIF.  They belong to the hundreds (thousands?) of
 institutions around the world that manage those thousands of databases.
 Ultimately, those corrections have to find their way back to the source
 databases, so that GBIF can re-index them with the corrections included.
 And believe me, GBIF and others have tried to do this EXTENSIVELY -- for
 many years.  A lot of the mechanisms are being developed (e.g.,
 FilteredPush), but so far there has been slow adoption of those
 mechanisms by the thousands of source databases.  There are many reasons
 for this, but I suspect the main reason is that institutions are barely
 keeping up day-to-day activities with ever-shrinking budgets, and simply
 do not have time or IT expertise to implement the corrections to the
 datasets that they manage. Thus, because the source data remain
 "unclean", the aggregated data in GBIF remain unclean.
 
 The second major roadblock is the lack of "proper" identifiers (globally
 unique, persistent, actionable) for these occurrence records.  The only
 way that corrections you make in your downloaded copy of GBIF data can
 propagate is if you can report back on exactly which records need
 cleaning (along with the corrected information).  GBIF does assign its
 own locally unique identifier (integer), which could be used for this
 purpose -- but only for piping the data back to GBIF.  GBIF can relay
 the corrections back to the source databases, but that will only be
 helpful to the rest of us if the source incorporates the fixes.
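
To make the identifier point concrete, here is a rough sketch of reporting corrections keyed by record identifier, assuming the downloaded and cleaned copies are both dictionaries keyed by the integer identifier GBIF assigns. The `diff_records` function, the field names, and the sample values are hypothetical.

```python
def diff_records(original, cleaned):
    """Return one correction per changed field, keyed by record identifier."""
    corrections = []
    for record_id, old_record in original.items():
        new_record = cleaned.get(record_id)
        if new_record is None:
            # Record was dropped from the cleaned copy; nothing to report here.
            continue
        for field_name, old_value in old_record.items():
            new_value = new_record.get(field_name)
            if new_value != old_value:
                corrections.append(
                    {
                        "id": record_id,
                        "field": field_name,
                        "old": old_value,
                        "new": new_value,
                    }
                )
    return corrections


if __name__ == "__main__":
    # Hypothetical data: identifiers and values are made up for illustration.
    downloaded = {
        101: {"name": "Aus bus", "country": "AU"},
        102: {"name": "Cus dus", "country": "NZ"},
    }
    cleaned = {
        101: {"name": "Aus bus", "country": "NZ"},
        102: {"name": "Cus dus", "country": "NZ"},
    }
    for correction in diff_records(downloaded, cleaned):
        print(correction)
```

Without a stable identifier on both sides, there is no reliable way to say which record each correction refers to, which is the whole difficulty.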
 
 There is actually a third roadblock, which has the potential to become a
 major one, but we haven't bumped into it much yet because we still can't
 get past the first two. And that is, institutions will not automatically
 assume that every "correction" that is sent to them is actually
 "correct".  Managers of those data will in almost all cases want to
 review the changes to ensure that they are appropriate for updating in
 the source database. And this process, of course, requires time and
 resources that most institutions simply do not have.
 
 There may be another solution, however, which
 is for GBIF to cache corrections submitted by people like
 you and other experts, such that these
 annotations/corrections can be made visible to all users of
 GBIF data; not just the source datasets.  Perhaps this
 feature already exists.  Perhaps the politics of
 implementing such a feature are too daunting to overcome.
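
One way to picture such a cache, as a sketch only: corrections are stored alongside the source records and merged in at display time, so the source data are never overwritten. The `CorrectionCache` class and its fields are hypothetical and do not describe any existing GBIF feature.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionCache:
    """Stores expert corrections without touching the source records."""

    corrections: dict = field(default_factory=dict)

    def submit(self, record_id, field_name, new_value, submitted_by):
        """Cache one proposed correction against a record identifier."""
        self.corrections.setdefault(record_id, []).append(
            {"field": field_name, "new": new_value, "by": submitted_by}
        )

    def view(self, record_id, source_record):
        """Return the untouched source record plus any cached corrections."""
        return {
            "source": dict(source_record),
            "annotations": list(self.corrections.get(record_id, [])),
        }


if __name__ == "__main__":
    # Hypothetical record and correction, for illustration only.
    cache = CorrectionCache()
    source = {"name": "Aus bus", "country": "AU"}
    cache.submit(101, "country", "NZ", "expert A")
    print(cache.view(101, source))
```

Because the corrections sit beside the record rather than replacing it, every user can see them while the institution remains free to review them on its own schedule.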
 
 But the bottom line is that we really do need to address this fourth
 step, so that we can more effectively benefit from the work of others,
 and (conversely), so that our own efforts will benefit more than just
 ourselves.
 
 Aloha,
 Rich
 
 
 -----Original Message-----
 From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Bob Mesibov
 Sent: Sunday, August 24, 2014 11:28 AM
 To: Donat Agosti
 Cc: TAXACOM; quentin groom
 Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
 
 Donat Agosti wrote:
 
 "I feel, the discussion is too much centered on data that has not the
 information content needed, like studying a Landsat image at 30 meter
 resolution and discussing what tree species is shown"
 
 Excellent metaphor! For most scientific uses, you need much more data
 than is provided by any available database. Can you get everything you
 need online? No. Do existing aggregators like GBIF offer a helpful
 starting point? For some people and some uses, yes.
 
 But now the important question: when you have all the information you
 need, and clean it and enrich it, do you publish it online in a usable
 form? I don't know what Quentin Groom's project was about, nor do I know
 if he published his final data.
 
 In my own case, every one of my 12123 locality records for Australian
 Millipedes is freely available in CSV format (and in abbreviated form in
 KML) from the 'Millipedes of Australia' website. This store is larger
 and more up to date and contains fewer errors than any aggregator store,
 or even the combined data providers' stores (because certain providers
 have been slow to add my edits to particular records, or to upload them
 to their own or aggregator stores).
 
 But if people like me and Quentin publish data freely to the Web and
 aggregators don't use this improved/extended data, aggregation looks
 less and less useful.
 --
 Dr Robert Mesibov
 Honorary Research Associate
 Queen Victoria Museum and Art Gallery, and School of Land and Food,
 University of Tasmania
 Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
 (03) 64371195; 61 3 64371195
 _______________________________________________
 Taxacom Mailing List
 Taxacom at mailman.nhm.ku.edu
 http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
 The Taxacom Archive back to 1992 may be searched at:
 http://taxacom.markmail.org
 
 Celebrating 27 years of Taxacom in 2014.
 
 


