[Taxacom] Chameleons, GBIF, and the Red List

Richard Pyle deepreef at bishopmuseum.org
Tue Aug 26 11:18:12 CDT 2014


The model that Rod outlines is basically the model that GNUB has adopted for taxon names.

The same taxon name appears in multiple sources, but with somewhat different complements of metadata.  Once two records from two different sources are recognized as being the same, the metadata are merged and reconciled.  With enough sources for the same name, the full metadata for the record usually emerges.  Obviously, discrepancies need to be reconciled, but at least they can be identified as discrepancies (whereas this is not clear from a single source).
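
As a rough sketch of that merge-and-reconcile step (the function name `merge_records`, the source labels, and the placeholder name "Aus bus" are all made up for illustration; this is not GNUB's actual implementation), something like the following Python captures the idea:

```python
from collections import defaultdict


def merge_records(records):
    """Merge per-source metadata for records already matched as the same name.

    records: dict mapping a source label to a dict of field -> value.
    Returns (merged, discrepancies): fields on which all reporting sources
    agree, and fields where sources disagree (value -> list of sources).
    """
    values = defaultdict(dict)  # field -> {value: [sources reporting it]}
    for source, fields in records.items():
        for field, value in fields.items():
            if value is None or value == "":
                continue  # a missing value is not a disagreement
            values[field].setdefault(value, []).append(source)

    merged, discrepancies = {}, {}
    for field, by_value in values.items():
        if len(by_value) == 1:
            merged[field] = next(iter(by_value))
        else:
            discrepancies[field] = by_value  # left for a human to reconcile
    return merged, discrepancies


# Hypothetical records for one name, as seen in three sources.
records = {
    "source_a": {"name": "Aus bus", "author": "Smith", "year": 1900},
    "source_b": {"name": "Aus bus", "year": 1901, "rank": "species"},
    "source_c": {"name": "Aus bus", "author": "Smith", "year": 1900},
}
merged, discrepancies = merge_records(records)
```

No single source has the author, year, and rank together, but the merged record does, and the conflicting year is flagged rather than silently picked.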

It works very well for taxon names, because any name is likely to be in many different sources.  It's a little different with Occurrence data because it's not often the case that the same occurrence record exists in different databases with independent origin (most of the duplicates in GBIF likely come from re-sharing of the same original source record).  But it certainly can happen -- as in Rod's example of museum database vs. published literature, or in cases like botanical duplicate specimens, where specimens from the same gathering are distributed to different museums, each of which records different information.
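
The same idea carries over to those occurrence cases. When a museum database, the literature, and GenBank all say something about one specimen, Rod's quoted message below lists the options: go with one trusted source, or take a consensus weighted by how much each source is trusted. A minimal sketch, assuming made-up trust weights, source labels, and placeholder names (`resolve` is a hypothetical helper, not part of any GBIF or GNUB API):

```python
from collections import defaultdict


def resolve(assertions, trust=None, only_source=None):
    """Pick one value for a field from several sources' assertions.

    assertions: list of (source, value) pairs for a single field.
    trust: dict mapping source -> weight (sources not listed count as 1.0).
    only_source: if given, ignore everyone else and return that source's value.
    """
    if only_source is not None:
        for source, value in assertions:
            if source == only_source:
                return value
        return None  # the chosen source says nothing about this field

    trust = trust or {}
    score = defaultdict(float)
    for source, value in assertions:
        score[value] += trust.get(source, 1.0)
    if not score:
        return None
    # On a tie, the value asserted first wins.
    return max(score, key=score.get)


# Hypothetical identifications of one specimen from three sources.
identification = [
    ("museum", "Aus bus"),
    ("literature", "Aus cus"),
    ("genbank", "Aus cus"),
]

primary_only = resolve(identification, only_source="museum")
consensus = resolve(identification, trust={"museum": 1.5})
museum_favoured = resolve(identification, trust={"museum": 3.0})
```

Nothing is overwritten here: every assertion is kept, and each user decides at query time whose view to take.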

Aloha,
Rich

> -----Original Message-----
> From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
> Of Roderic Page
> Sent: Monday, August 25, 2014 6:34 AM
> To: Chuck Miller
> Cc: TAXACOM; Bob Mesibov
> Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
> 
> Hi Chuck,
> 
> No, or at least not in the way I think that you mean.
> 
> The “TripAdvisor” model is that the hotel publishes data about itself, and
> users then add comments supporting or disputing the attributes of the hotel.
> So, it’s assumed that there is an authoritative source of data on the hotel,
> and we get to put sticky notes on that information. These notes may be
> ignored by the hotel.
> 
> The model I’m proposing (based on http://fluidinfo.com ) is that everyone
> gets to publish the same kind of data, and then we reconcile that (based, in
> part, on how much we trust the sources).
> 
> Imagine, for example, that a Hotel says “our address is Cool Street”. In
> TripAdvisor someone may add a comment saying “the address is 130 Cool
> Street”, and somebody else might add “the post code is 12345”. At this point,
> there’s no mechanism for the hotel description to be updated to include the
> street number and post code. TripAdvisor will keep saying the hotel is in Cool
> Street until somebody at TripAdvisor reads the comments, talks to the hotel,
> and updates the information.
> 
> Imagine, instead, that we treat the three sources as equivalent, then we can
> add
> 
> “Cool Street”
> “130”
> “12345”
> 
> and get “130 Cool Street 12345”.
> 
> So, it’s a bit like TripAdvisor, but imagine that we restrict ourselves to just the
> comments, and we don’t treat the hotel as the definitive source of
> information. So, we combine the information from the comments, and from
> that make a summary of the data.
> 
> Of course, we might trust some commenters more than others - I find it
> useful to ignore any complaints about room size if the comments come from
> the US, because their expectations are frankly ridiculous ;) We might give
> extra weight to information provided by the hotel itself, or we may choose to
> trust someone else.
> 
> In the context of a museum or herbarium specimen, I would imagine that
> we’d have multiple sources of data, which might include:
> 
> 1. the digitised museum catalogue
> 2. the literature that mentions the specimen
> 3. the voucher information recorded in GenBank
> 
> Given this we can do a number of things:
> 
> 1. If we trust the museum, we can simply ignore the other sources and go
> with the “primary source”
> 2. If we trust the literature more, we may accept that
> 3. If the sequences suggest a different identification to what the museum
> says, we may choose to accept GenBank
> 4. We may choose to take the consensus of all sources, perhaps weighted by
> some measure of their past performance
> 
> One advantage of this approach is that it doesn’t rely on waiting for the
> museum to accept or reject corrections or annotations. I could geo-reference
> a bunch of specimens, upload those, and they’d be immediately available to
> anyone to use. But these wouldn’t overwrite the original museum’s data (in
> the same way that if I add a comment to a hotel listing, it doesn’t overwrite
> yours). Users could elect to ignore my georeferencing (for example, by
> saying “give me only data from the original provider”), they may elect to take
> just my version of the data, or they may take a synthesis of the data (a bit
> like the overall hotel rating TripAdvisor computes).
> 
> Hope this makes sense, I suspect I’ve not explained this terribly well. One
> nice outcome of this approach is that the problem of duplicate records
> becomes less a disaster and more of an opportunity.
> 
> Regards
> 
> Rod
> 
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
> 
> Email:  Roderic.Page at glasgow.ac.uk
> Tel:  +44 141 330 4778
> Skype:  rdmpage
> Facebook:  http://www.facebook.com/rdmpage
> LinkedIn:  http://uk.linkedin.com/in/rdmpage
> Twitter:  http://twitter.com/rdmpage
> Blog:  http://iphylo.blogspot.com
> ORCID:  http://orcid.org/0000-0002-7101-9767
> Citations:
> http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> 
> 
> On 25 Aug 2014, at 16:47, Chuck Miller
> <Chuck.Miller at mobot.org> wrote:
> 
> Rod,
> Re: "I prefer a different model, where data is considered to be "social" and
> we can all annotate it (in effect, the museums are themselves simply one
> annotator)."
> 
> Are you talking about a "TripAdvisor" or "Yelp" kind of application for
> biodiversity data records?
> 
> Chuck
> 
> -----Original Message-----
> From: Roderic Page [mailto:Roderic.Page at glasgow.ac.uk]
> Sent: Thursday, August 21, 2014 5:52 PM
> To: TAXACOM
> Cc: Bob Mesibov
> Subject: Re: [Taxacom] Chameleons, GBIF, and the Red List
> 
> A couple of quick comments.
> 
> Regarding expertise, I agree that there is lots that non experts can do, but
> also take Doug's point about the value of taxonomic input. I once saw a talk
> by Charles Godfray where he was describing the role taxonomic expertise
> played in building maps of mosquitoes that transmitted malaria (see, e.g.
> http://dx.doi.org/10.1371/journal.pmed.1000209 ). He said the role of the
> taxonomist wasn't the oft-assumed one of identifying specimens, instead it
> was to interpret distributional data from the literature in the light of changing
> taxonomies.
> 
> Where I differ from James is that I'm not really a fan of an annotation model
> where the focus is on annotating data and pushing those annotations back to
> the "primary providers". Given the scale of the problem, and that evidence is
> likely to be widely distributed (problems are often only uncovered when
> data is aggregated from different sources) I prefer a different model, where
> data is considered to be "social" and we can all annotate it (in effect, the
> museums are themselves simply one annotator). There's a bit more about
> this here:
> http://iphylo.blogspot.co.uk/2014/04/more-on-annotating-biodiversity-data.html
> Note that I'm not disputing that it would be nice to feed annotations back
> to collections, but that this isn't the main goal (and I think that it's
> pretty clear that there is going to be a huge bottleneck involving this
> process).
> 
> Regards
> 
> Rod
> 
> 
> On Thu, Aug 21, 2014 at 11:52 AM -0700, "James Macklin"
> <james.macklin at gmail.com> wrote:
> 
> Hi Rod,
> 
> Sorry, a little slow... I also think it is important to stress the data quality life
> cycle here. What we still do not do well is connect the expert work
> done on these specimens or their digital derivatives (or observations, I
> guess), which is not done by the source/owner, back to them so the
> source/owner can clean/update the record and provide it to GBIF and/or
> other aggregators. The literature is one path where there is reference to the
> specimens used but as we know not everything ends up published this way.
> Further, extracting the information from the literature can be challenging
> even today. Lyubomir and Pensoft make this easy (thanks!) but we are still a
> long way from convincing other publishers to include the specimen data in a
> readily accessible form (or even mandating its presence as evidence).
> Another way to get expert knowledge back to the source is through
> annotation. Those of you who know me realize that my colleagues and I have
> spent a fair bit of time studying this problem and coming up with solutions
> (FilteredPush). I would say that in general there are now reasonable
> solutions for achieving distributed annotation at various levels of complexity
> but there is still a challenge/bottleneck in pushing these annotations back to
> the source and into their collection management system. The bottleneck is
> potentially at the source that must process the annotations. If we automate
> (or even semi-automate) the annotation process through curation workflows,
> something my colleagues and I are now focusing on, we could potentially
> flood the "curators" of the specimens/data. Then the question becomes how
> much the owners are committed to processing potentially valuable
> modifications/additions and adding them to their database. Certainly data
> curation and positions to support it are in their infancy. The annotations that
> are not processed by the source still have value and can inform the
> aggregators but have to be dealt with in a slightly different manner. So, this
> returns to the issue of when GBIF takes in a record update (or a new record),
> what metadata follows it to say it has been changed (created) based on
> some form of expertise...
> 
> I think we also need to be careful of the use of the term "expert."  I think it is
> reasonable to assume that a taxonomist is not going to be any better at
> georeferencing a specimen based on the collecting event data  (assuming
> this person was not associated with the collecting event) than a geographer,
> historian or even a citizen that happens to live near where the event took
> place. So, in the case of the Chameleon paper, and others like it, the issue
> really relates to taxonomic expertise and thus the name that appears
> associated with the record and not the entire record necessarily.
> 
> Papers like the Chameleon paper are quick to judge the end product but do not
> take into consideration what an achievement it is to simply have a GBIF
> resource and the challenges the greater "we" have overcome just to get this
> far! Let's stop highlighting the problem yet again and get to work on solving it
> and making the GBIF resource more valuable to all ;-)
> 
> Best, James
> 
> James Macklin, Ph.D.
> Research Scientist
> Botany and Biodiversity Informatics
> Associate Curator of the AAFC National Vascular Plant Collection (DAO)
> Agriculture and Agri-Food Canada Ottawa, Ontario, Canada
> 
> 
> On Thu, Aug 21, 2014 at 6:20 AM, Roderic Page
> <Roderic.Page at glasgow.ac.uk> wrote:
> Just to follow up on this discussion:
> 
> Stephen, I think I often come across as grumpy, but your cynicism makes me
> look like a fanboy, so thank you for that ;) Can we maybe assume that GBIF's
> primary goal isn't to keep bureaucrats happy, that it's genuinely trying to
> provide access to basic biodiversity information in one place because that
> seems like a worthwhile goal - leaving aside whether GBIF is the best way to
> tackle that goal.
> 
> Bob, if I understand your argument correctly, it's that access to mostly
> unvetted biodiversity data isn't much use, and in your view that's mostly what
> GBIF is serving up. Assuming that it would be nice to have access to good-
> quality distributional data in one place, what if GBIF provided, say,
> distributions of species that had been cleaned and had some degree of
> expert scrutiny. In other words, say a researcher publishes an evidence-
> based distribution map, what if that was stored on GBIF in a citable form
> (e.g., had a DOI), and others could download that distribution and make use
> of it?
> 
> I guess this was the thinking behind the now abandoned SDR project (see
> https://code.google.com/p/gbif-sdr/wiki/PortalIntegration ), and is perhaps
> where the Map of Life http://mol.org is headed (although at the moment it's
> simply showing you a bunch of distributions from different sources).
> 
> Lyubo, I couldn't agree more, having links to literature related to a record
> would be great. Many of our online biodiversity databases are devoid of links
> to the evidence for a particular assertion, but as more and more literature
> comes online we can do something to fix that. +1 for extracting from the
> literature, especially if we can automate this at scale (although that will give
> Bob nightmares).
> 
> Regards
> 
> Rod
> 
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> The Taxacom Archive back to 1992 may be searched at:
> http://taxacom.markmail.org
> 
> Celebrating 27 years of Taxacom in 2014.