[Taxacom] Data quality in aggregated datasets
pentcheff at gmail.com
Thu Apr 25 14:23:03 CDT 2013
So, in this conversation, I think we've just reached #4.
Give me a ring when we're back to #1.
pentcheff at gmail.com
dpentche at nhm.org
On Thu, Apr 25, 2013 at 1:41 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:
> Leaving aside the issues of what both providers and aggregators can do to
> clean the data, we seem trapped in an endless cycle of :
> 1. OMG the data is broken!
> 2. SOMETHING MUST BE DONE!
> 3. Wave arms frantically, mention projects currently underway that will
> almost certainly solve the problem "real soon now".
> 4. ... [tumble weed]
> 5. Go to 1
> There are at least two things we need to do to tackle this problem, and until we
> do we're not being serious about data quality.
> 1. Identifiers
> In order to clean data, that data has to persist long enough for people or
> algorithms to act on it. If I add an annotation to a piece of data I want
> that information to persist, otherwise why would I bother? At the level of
> specimens we don't have identifiers, and few have shown any commitment to
> tackling this problem (a notable exception is Roger Hyam's work at the RBGE,
> see http://www.mapress.com/phytotaxa/content/2012/f/pt00073p030.pdf ).
> GBIF routinely deletes vast numbers (in some cases literally millions) of specimen
> URLs, so any attempt to attach annotations to those records is doomed.
> 2. Annotation tools
> Of course there are tools being developed by our community, but I've not
> seen any that look at all usable. In the real world we are used to tracking
> packages being couriered around the world (there's an app for that), and
> many will have come across feedback tools online where you can notify a
> site of an issue and engage in a conversation to resolve it. There are also
> more general annotation tools being developed, e.g. http://hypothes.is/
> Let's leverage these.
> Annotation rests on being able to identify the thing being annotated, and
> on the web, URLs serve that purpose. Until we have stable URLs for
> specimens, and these are used by everyone who has something to say about
> that specimen, then we are doomed to repeat steps 1-5.
> But of course, we know all this, and have done so for a while...
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
> Email: r.page at bio.gla.ac.uk
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> Skype: rdmpage
> Facebook: http://www.facebook.com/rdmpage
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID id: http://orcid.org/0000-0002-7101-9767