[Taxacom] taxonomic names databases

Nico Franz nico.franz at asu.edu
Thu Sep 1 18:58:45 CDT 2016


Thank you, Tony.

   I do think that I could have spoken more clearly, but I also think that
we look at things a bit differently here. In building bigger and bigger
"backbones" (which go all the way to the species-level tips, right?), the
lines between author and aggregator necessarily get blurred. But design can
model the distinction, and the lack thereof.

   On the author-to-aggregator spectrum, Eschmeyer (
http://www.calacademy.org/scientists/projects/catalog-of-fishes) is
evidently more on the author side of the spectrum than other "sources" of
similar or even smaller scope. I assume, though, that Dr. Eschmeyer (sorry
for this appearing to get personal - absolutely not intended, but I believe
needed to make the case) might not personally claim equally profound
expertise regarding the systematics of all fish lineages, in the sense of
all lineages equally being part of his active revisionary fish systematics
research program, so to speak. Whenever we speak of biodiversity, largely
reliably, we do draw on past and current expertise that is in effect partly
borrowed (from past authors) and distributed (in various sources). Blurry.
But it does matter immensely, I believe, that Dr. Eschmeyer is a person,
with a personal and internationally valued reputation, a personal record of
commitment to "his" domain. Someone that one can disagree with, combing
through the Eschmeyer catalog, and presumably a signal will come back,
either reconciliatory or resisting. Those features, to me, are features of
authorship.

   IRMNG is somewhere on that spectrum, to be sure, and likely not so far
from Eschmeyer (as evidenced by your objection). Though note the less
personalized citation practice: http://www.marinespecies.org/about.php

   Poor design, to my mind, is the kind of design where - qua aggregation -
a sense of authorship is weakened and obscured, to the point of "this
backbone just is". I believe such a design must result in lowered trust,
because I believe at a fundamental level that good taxonomy has an
individual, expert-driven "flavor". You might call it subjective. I think
the design has to honor the notion that expertise is personalized.

   The reason why I would not defer to Dr. Eschmeyer's expertise by fiat is
that in certain cases (fish experts please help me or let me dangle in
eternal shame and agony), I may well think that he is mistaken in his
personally researched or editorially chosen preferred classificatory
representation. Maybe I disagree with his filtering of the latest
phylogenetic inferences into the catalog.

   And I disagree on the "they won't appreciate much" issue too - again, to
me that points to design. For any given group, the aggregating environment
can in principle store multiple conflicting views, and flag these as such.
That takes nothing away from an author's unique contribution or motivation;
it just means designing for multiple views and offering choices in cases of
conflict. That is how taxonomy operates throughout the entire "primary"
literature (hard-to-define term, see above), except apparently in the
aggregation domain, where conflict and persistent disagreement tend to get
designed away (
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-208
).
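
   To make that design point concrete, here is a minimal, purely
illustrative sketch (in Python; all class, field, and source names are
hypothetical and not drawn from any existing aggregator) of a store that
keeps multiple attributed placements for the same taxon and flags conflict
instead of resolving it away:

# Hypothetical sketch: an aggregation store that keeps every attributed
# classificatory view of a taxon and flags conflicts, rather than silently
# collapsing them into a single backbone.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementAssertion:
    taxon: str    # name whose placement is asserted
    parent: str   # asserted parent taxon
    source: str   # person or catalog making the assertion
    year: int     # year the assertion was published


class ConflictAwareAggregator:
    def __init__(self):
        self._assertions = defaultdict(list)  # taxon -> list of assertions

    def add(self, assertion: PlacementAssertion) -> None:
        """Store a view without overwriting earlier, possibly conflicting ones."""
        self._assertions[assertion.taxon].append(assertion)

    def views(self, taxon: str) -> list:
        """Return every attributed placement of a taxon."""
        return list(self._assertions[taxon])

    def is_contested(self, taxon: str) -> bool:
        """Flag a taxon whose sources disagree about its parent."""
        parents = {a.parent for a in self._assertions[taxon]}
        return len(parents) > 1


# Usage: two sources disagree; both views stay visible and attributed.
agg = ConflictAwareAggregator()
agg.add(PlacementAssertion("Examplegenus", "Family-A", "Catalog X", 2015))
agg.add(PlacementAssertion("Examplegenus", "Family-B", "Smith 2016 revision", 2016))
print(agg.is_contested("Examplegenus"))  # True
for view in agg.views("Examplegenus"):
    print(view.source, "->", view.parent)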

   Is it likely much harder to build consistently expert-identifying,
conflict-embracing, but also scalable systems? Of course. But that does not
make the decision not to do so any less of a choice (pragmatic,
understandable), and one that has trust-related consequences.
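
   And, to illustrate the other half - the provenance and credit aspect
from the dataset example quoted below - an equally hypothetical sketch of
an append-only log in which a repair to a record is stored permanently and
accredited to the person who made it:

# Hypothetical sketch: an append-only provenance log, so that a repair is
# stored permanently and credited to the editor who made it.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RepairEvent:
    record_id: str   # identifier of the repaired record
    field_name: str  # which field was changed
    old_value: str
    new_value: str
    editor: str      # who gets the credit
    timestamp: datetime


class ProvenanceLog:
    def __init__(self):
        self._events = []  # append-only; nothing is ever overwritten

    def record_repair(self, record_id, field_name, old_value, new_value, editor):
        """Append a repair; the original value remains recoverable."""
        event = RepairEvent(record_id, field_name, old_value, new_value,
                            editor, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, record_id):
        """Full, attributed edit history of one record."""
        return [e for e in self._events if e.record_id == record_id]


# Usage: the fix survives, and so does the attribution.
log = ProvenanceLog()
log.record_repair("rec-0042", "genus", "Misspelledgenus", "Correctgenus", "A. Editor")
for e in log.history("rec-0042"):
    print(e.editor, "changed", e.field_name, ":", e.old_value, "->", e.new_value)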

Best, Nico


On Thu, Sep 1, 2016 at 4:22 PM, Tony Rees <tonyrees49 at gmail.com> wrote:

> Hi Nico, all,
>
> I have to take issue with Nico's main point here, which seems to be that a
> database with a higher level of residual errors, which can be corrected by
> "anybody", may be preferable to one with a lower level that is under the
> control of a "gatekeeper", so to speak, who has sole editing rights. In my
> experience, at least for the major systems with a track record of
> scientific scrutiny and continuous effort to improve, the latter tend to
> be much more reliable than the former: for example, why would I not defer to
> Bill Eschmeyer's expertise for information on extant fishes, Paul Kirk's on
> the fungi, Geoff Read for Annelida, and so on? If I find errors or
> inconsistencies in their system's content I simply alert them and, 9 times
> out of ten, receive a prompt and courteous reply and relevant action as
> well as appreciation for spotting the error. I do not want carte blanche to
> edit their systems and they would probably not appreciate it either!
>
> In any event, no such system is ever perfect and one would be wise to
> separately verify any data item considered "crucial" to a planned
> publication etc. All databases have disclaimers about potential residual
> errors; one simply has to make a judgement about which are more or less
> trustworthy or fit for a particular intended use, and where to set the bar
> beneath which it is simply better to ignore a particular data system as a
> source of sufficiently trusted information. In reality most "aggregators"
> of such data take the best sources they can (a subjective decision) and
> then, hopefully, either have a proactive policy of detecting inherited
> errors - such as inter-dataset comparisons and investigation of
> discrepancies as revealed, going back to the original literature, and
> numerous internal data integrity checks - or at least be reactive to
> improvements as suggested by others. At least that is what I aspire to -
> and recognise that it will never be perfect, but hopefully still a lot
> better than no equivalent product (hence "Interim" as the first word in the
> name of my project, IRMNG).
>
> Just my 2 cents, as ever,
>
> Best - Tony
>
> Tony Rees, New South Wales, Australia
> https://about.me/TonyRees
>
> On 2 September 2016 at 03:52, Nico Franz <nico.franz at asu.edu> wrote:
>
>> Not all of this discussion is adequately captured if we do not make some
>> qualitative or relative distinction between data quality and trust in
>> data.
>> These two are clearly related but can nevertheless have different pathways
>> in our data environments and point to different means for resolution.
>>
>> My sense is that in the following situation, many of us will not have to
>> hesitate for long to decide which option is preferable.
>>
>> 1. A dataset with 99 records that are "good", and 1 that is "bad" (needs
>> "repair"), but to which I have no direct editing access *in the system*,
>> i.e. the system is not designed to give me that access, editing power, and
>> credit.
>>
>> 2. A dataset with 80 records that are "good", and 20 that are "bad", but
>> where the system design is such that I have the right to access, repair,
>> have that action stored permanently (provenance), and have it accredited
>> to me.
>>
>> The first dataset is of better quality, but the design tells me that it is
>> unfixable by me. Do I feel comfortable publishing on the 100 records?
>> Actually, not really. Is the act of someone with access fixing that 1
>> record for me a genuine solution? Also not really, because "good" (quality)
>> is often a function of time, and with time certain aspects of good quality
>> data are bound to deteriorate, and so the one-time fix does not operate at
>> the problem's root.
>>
>> The second dataset is of worse quality, but in some sense it just tells me
>> what I already know about my specimen-level science, i.e. that if I am lazy
>> or not available to oversee the quality, then there might be issues. I may
>> decide to fix them, or not, depending on the level of quality that I need
>> for a particular intended set of inferences I wish to make. In either case,
>> that is my call, and I will get it to the point where I do feel comfortable
>> publishing. The design of the second system facilitates that, and *that* is
>> why I trust more, not because it has better data.
>>
>> So then, at the surface this may sometimes look like a discussion about
>> data quality only. It is not. Too many aggregating systems are systemically
>> mis-designed: they fail to empower individual experts while preserving a
>> record of individual contributions and diversity of views. Acceptance of a
>> classificatory system, for instance, tends to be a localized phenomenon,
>> even within a regional community of multiple herbaria. Nobody in
>> particular believes in a single backbone. This failure to design
>> appropriately primarily affects trust, and secondarily quality, more so
>> over time. A great range of sound biological inferences are still possible.
>> But so are better designs.
>>
>> Cheers, Nico
>> _______________________________________________
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
>> The Taxacom Archive back to 1992 may be searched at:
>> http://taxacom.markmail.org
>>
>> Injecting Intellectual Liquidity for 29 years.
>>
>
>


