[Taxacom] Data quality in aggregated datasets

Dean Pentcheff pentcheff at gmail.com
Mon Apr 22 13:10:56 CDT 2013

Yes, it is. I had (over)interpreted your comment as implying a strongly
human-mediated process of email / phone / written corrections being
incorporated into the providers' systems, then moving those forward to the
aggregators. The rub with that, of course, is the high friction inherent in
one-off styles of data correction.

What FilteredPush offers (as I understand it, and my understanding is basic
at best!) is a standardized pipeline so that errors noticed in data
(whether noticed at a provider's site or an aggregator's site) get fed back
directly to the provider, who can then evaluate and decide what to do. The
key there is that aggregators and providers will have to bake FilteredPush
into their systems so that the pipeline exists.

The expected benefit is a drastic drop in the amount of individualized
interpretation and interaction demanded by nearly every error correction
these days. Ideally, whatever data site you're on, you'd have a "report
error on this record" capability that would pipe your error report right
back to the provider (without you as the error-noticer having to figure out
who/how/where to start sending emails or ringing phones).
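
A minimal sketch of the routing idea described above, in Python. It is not
FilteredPush's actual API or design: the names ErrorReport, Provider and
ReportRouter, and the provider "Example Museum", are hypothetical and exist
only to illustrate a "report error on this record" button that delivers the
report to whichever provider owns the record.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorReport:
    """One error report about one record (hypothetical structure)."""
    record_id: str
    provider: str
    note: str
    reported_via: str  # e.g. "aggregator" or "provider": where it was noticed


@dataclass
class Provider:
    """A data provider with a queue of reports awaiting evaluation."""
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, report: ErrorReport) -> None:
        # The provider still decides what, if anything, to do with the report.
        self.inbox.append(report)


class ReportRouter:
    """Maps record IDs to owning providers and forwards reports to them."""

    def __init__(self) -> None:
        self._providers = {}
        self._record_owner = {}

    def register_record(self, record_id: str, provider: Provider) -> None:
        self._providers[provider.name] = provider
        self._record_owner[record_id] = provider.name

    def report_error(self, record_id: str, note: str, reported_via: str) -> ErrorReport:
        owner = self._record_owner.get(record_id)
        if owner is None:
            raise KeyError(f"no known provider for record {record_id}")
        report = ErrorReport(record_id, owner, note, reported_via)
        self._providers[owner].receive(report)
        return report


# Usage: the person noticing the error never needs to know who the provider is.
router = ReportRouter()
museum = Provider(name="Example Museum")  # hypothetical provider
router.register_record("44323", museum)
router.report_error("44323", "coordinates look wrong", reported_via="aggregator")
```

The point of the sketch is only that the lookup from record to provider
happens inside the pipeline, not in the reporter's head.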

Will this fix everything? Of course not. Providers can still be asleep at
the wheel and fail to act on error reports. Aggregators can still fail to
properly update records with fixed content. But at least it's a step away
from the "Oy! Fred! Biff on the line here. Can ya give a peep to record
#44323? It looks wonky!" world in which we live now.

Dean Pentcheff
pentcheff at gmail.com
dpentche at nhm.org

On Sun, Apr 21, 2013 at 4:33 PM, Robert Mesibov <mesibov at southcom.com.au> wrote:

> Dean Pentcheff wrote:
> "http://wiki.filteredpush.org
> There is much more than silence, but making a working system takes both an
> initial effort and changes in the way provider systems work. It will take
> time to take effect."
> My question was "What mechanisms are there for detecting and fixing errors
> besides (interested third party) > (data provider) > aggregator?". Isn't
> FilteredPush a project to streamline just that mechanism, with cooperating
> data providers?
> The 'silence' I referred to is coming from the aggregators, who seem
> committed to ignoring this kind of error-fixing mechanism: aggregator >
> (data provider) > aggregator.
> GBIF likes to emphasise that it is only a facilitator. It doesn't own the
> data it publishes, it merely provides a place for data holders to 'expose'
> what they have. It is resolutely ignoring the opportunity provided in this
> for an outside party (GBIF or a GBIF-contracted service) to do some basic
> record-checking, then collaborate with the data holder to make corrections
> or add 'Queried' flags. There would be benefits in this for all interested
> parties: the data holders, GBIF as publisher, and end-users. This isn't
> happening.
> GBIF has been going for how many years? And has finally gotten around to
> talking about offering advice to participants about data quality:
> http://community.gbif.org/pg/groups/21292/biodiversity-data-quality-interest-group/
> As I suggested in an earlier post and in my ZooKeys paper, the barriers to
> data-checking at aggregator level aren't technical. Call them 'policy' or
> 'attitudinal' barriers; they're not unlike Person A being reluctant to
> tell Person B that they've made a mistake, because A wants to remain
> friends with B and doesn't want to upset B, and anyway, what's a little
> mistake?
> The analogy fails because the aggregators (A) are multi-million dollar
> organisations hoping to service a global community, and dealing with
> multi-million dollar organisations (B) whose 'mission statements' probably
> talk about a commitment to 'continuous improvement'.
> Note: I say all this (and I published the ZooKeys paper) without much hope
> of seeing reform. As explained in the paper, I've created an alternative
> for my little basket of the world's species occurrence records, and unlike
> the aggregators, I write directly to data providers with messages like
> 'Could you please check [records]? It looks like there are errors in the
> lat/lon's, which probably should be [X]. Many thanks.'
> --
> Dr Robert Mesibov
> Honorary Research Associate
> Queen Victoria Museum and Art Gallery, and
> School of Agricultural Science, University of Tasmania
> Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
> Ph: (03) 64371195; 61 3 64371195
