[Taxacom] taxonomic names databases

Tony Rees tonyrees49 at gmail.com
Thu Sep 1 18:22:16 CDT 2016


Hi Nico, all,

I have to take issue with Nico's main point here, which seems to be that a
database with a higher level of residual errors that can be corrected by
"anybody" may be preferable to one with a lower level that is under the
control of a "gatekeeper", so to speak, who has sole editing rights. In my
experience, at least for the major systems with a track record of
scientific scrutiny and continuous effort to improve, the latter tend to
be much more reliable than the former: for example, why would I not defer
to Bill Eschmeyer's expertise for information on extant fishes, Paul Kirk's
on the fungi, Geoff Read's for Annelida, and so on? If I find errors or
inconsistencies in their systems' content I simply alert them and, 9 times
out of 10, receive a prompt and courteous reply and relevant action as
well as appreciation for spotting the error. I do not want carte blanche to
edit their systems and they would probably not appreciate it either!

In any event, no such system is ever perfect and one would be wise to
separately verify any data item considered "crucial" to a planned
publication etc. All databases have disclaimers about potential residual
errors; one simply has to make a judgement about which are more or less
trustworthy or fit for a particular intended use, and where to set the bar
beneath which it is simply better to ignore a particular data system as a
source of sufficiently trusted information. In reality most "aggregators"
of such data take the best sources they can (a subjective decision) and
then, hopefully, either have a proactive policy of detecting inherited
errors (such as inter-dataset comparisons and investigation of the
discrepancies revealed, going back to the original literature, and
numerous internal data integrity checks) or at least are reactive to
improvements suggested by others. At least that is what I aspire to,
while recognising that it will never be perfect, but hopefully still a lot
better than no equivalent product (hence "Interim" as the first word in the
name of my project, IRMNG).
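
As a rough illustration of the kind of inter-dataset comparison I mean, here
is a minimal Python sketch. It is not how IRMNG or any other system actually
does this; the function name, the name-to-family layout and the records
themselves are placeholders invented for the example. It simply lists every
name on which two sources disagree, or which only one of them holds, so that
a person can then go back to the literature and decide which is right.

```python
def find_discrepancies(source_a, source_b):
    """Return {name: (value_in_a, value_in_b)} for every name on which the
    two sources disagree. A name missing from one source shows up as None."""
    conflicts = {}
    for name in sorted(set(source_a) | set(source_b)):
        value_a = source_a.get(name)
        value_b = source_b.get(name)
        if value_a != value_b:
            conflicts[name] = (value_a, value_b)
    return conflicts


# Placeholder records (hypothetical, not real taxonomic data):
# genus name -> family assigned by each source.
source_a = {"Genus_a": "Family_x", "Genus_b": "Family_y", "Genus_c": "Family_z"}
source_b = {"Genus_a": "Family_x", "Genus_b": "Family_w"}

discrepancies = find_discrepancies(source_a, source_b)
for name, (value_a, value_b) in discrepancies.items():
    print(f"{name}: source A has {value_a!r}, source B has {value_b!r}")
```

The sketch only flags candidates for checking; it makes no attempt to say
which source is correct, which is where the human effort comes in.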

Just my 2 cents, as ever,

Best - Tony

Tony Rees, New South Wales, Australia
https://about.me/TonyRees

On 2 September 2016 at 03:52, Nico Franz <nico.franz at asu.edu> wrote:

> Not all of this discussion is adequately captured if we do not make some
> qualitative or relative distinction between data quality and trust in data.
> These two are clearly related but can nevertheless have different pathways
> in our data environments and point to different means for resolution.
>
> My sense is that in the following situation, many of us will not have to
> hesitate for long to decide which option is preferable.
>
> 1. A dataset with 99 records that are "good", and 1 that is "bad" (needs
> "repair"), and to which I have no direct editing access *in the system*
> where that system is designed to give me that access and editing power and
> -credit.
>
> 2. A dataset with 80 records that are "good", and 20 that are "bad", but
> where the system design is such that I have the right to access, repair,
> have that action stored permanently (provenance), and accredited to me.
>
> The first dataset is of better quality, but the design tells me that it is
> unfixable by me. Do I feel comfortable publishing on the 100 records?
> Actually, not really. Is the act of someone with access fixing that 1
> record for me a genuine solution? Also not really, because "good" (quality)
> is often a function of time, and with time certain aspects of good quality
> data are bound to deteriorate, and so the one-time fix does not operate at
> the problem's root.
>
> The second dataset is of worse quality, but in some sense it just tells me
> what I already know about my specimen-level science, i.e. that if I am lazy
> or not available to oversee the quality, then there might be issues. I may
> decide to fix them, or not, depending on the level of quality that I need
> for a particular intended set of inferences I wish to make. In either case,
> that is my call, and I will get it to the point where I do feel comfortable
> publishing. The design of the second system facilitates that, and *that* is
> why I trust more, not because it has better data.
>
> So then, at the surface this may sometimes look like a discussion about
> data quality only. It is not. Too many aggregating systems are systemically
> mis-designed to (not) empower individual experts while preserving a record
> of individual contributions and diversity of views. Acceptance of a
> classificatory system, for instance, tends to be a localized phenomenon,
> even in a regional community of multiple herbaria, for instance. Nobody in
> particular believes in a single backbone. This failure to design
> appropriately primarily affects trust, and secondarily quality, more so
> over time. A great range of sound biological inferences are still possible.
> But so are better designs.
>
> Cheers, Nico


