[Taxacom] taxonomic names databases

Nico Franz nico.franz at asu.edu
Thu Sep 1 12:52:38 CDT 2016


Not all of this discussion is adequately captured if we do not make some
qualitative or relative distinction between data quality and trust in data.
These two are clearly related but can nevertheless have different pathways
in our data environments and point to different means for resolution.

My sense is that in the following situation, many of us will not have to
hesitate for long to decide which option is preferable.

1. A dataset with 99 records that are "good", and 1 that is "bad" (needs
"repair"), and to which I have no direct editing access *in the system*,
because that system is not designed to give me that access, editing power,
and credit.

2. A dataset with 80 records that are "good", and 20 that are "bad", but
where the system design is such that I have the right to access, repair,
have that action stored permanently (provenance), and accredited to me.

The first dataset is of better quality, but the design tells me that it is
unfixable by me. Do I feel comfortable publishing on the 100 records?
Actually, not really. Is the act of someone with access fixing that 1
record for me a genuine solution? Also not really, because "good" (quality)
is often a function of time, and with time certain aspects of good quality
data are bound to deteriorate, and so the one-time fix does not operate at
the problem's root.

The second dataset is of worse quality, but in some sense it just tells me
what I already know about my specimen-level science, i.e. that if I am lazy
or not available to oversee the quality, then there might be issues. I may
decide to fix them, or not, depending on the level of quality that I need
for a particular intended set of inferences I wish to make. In either case,
that is my call, and I will get it to the point where I do feel comfortable
publishing. The design of the second system facilitates that, and *that* is
why I trust more, not because it has better data.
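
To make the second design a little more concrete, here is a minimal sketch
of a record whose repairs are stored permanently and accredited to whoever
made them. Everything in it is an assumption for illustration: the names
Record, Repair, repair() and credits() are hypothetical, "Aus bus" is a
placeholder name, and no existing aggregator is claimed to work this way.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Repair:
    """One accredited, timestamped change to a record (hypothetical structure)."""
    editor: str
    field_name: str
    old_value: str
    new_value: str
    timestamp: datetime


@dataclass
class Record:
    """A specimen-level record that keeps an append-only repair history."""
    values: dict
    history: list = field(default_factory=list)

    def repair(self, editor, field_name, new_value):
        # Capture the old value before overwriting it, so the change
        # remains traceable (provenance).
        entry = Repair(editor, field_name, self.values.get(field_name, ""),
                       new_value, datetime.now(timezone.utc))
        self.history.append(entry)
        self.values[field_name] = new_value
        return entry

    def credits(self):
        # Count repairs per editor, so each contribution is attributed.
        counts = {}
        for entry in self.history:
            counts[entry.editor] = counts.get(entry.editor, 0) + 1
        return counts


# Placeholder data: "Aus bus" / "Aus cus" are made-up names.
record = Record({"scientificName": "Aus bus"})
record.repair("expert_a", "scientificName", "Aus cus")
```

The point of the sketch is only that the repair, its provenance, and the
credit live in the same place as the data, so the decision to fix (or not)
stays with the expert.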

So then, at the surface this may sometimes look like a discussion about
data quality only. It is not. Too many aggregating systems are systemically
mis-designed, in that they fail to empower individual experts while
preserving a record of individual contributions and diversity of views.
Acceptance of a classificatory system, for instance, tends to be a
localized phenomenon, even in a regional community of multiple herbaria.
Nobody in particular believes in a single backbone. This failure to design
appropriately primarily affects trust, and secondarily quality, more so
over time. A great range of sound biological inferences are still possible.
But so are better designs.

Cheers, Nico


