[Taxacom] taxonomic names databases

Roderic Page Roderic.Page at glasgow.ac.uk
Tue Sep 6 13:59:46 CDT 2016


Hi Nico,

Thanks for the clarifications. I agree that there’s something of a disconnect between taxonomy and biodiversity informatics, and between the TDWG community and those who attend Evolution meetings (as I used to).

All sorts of reasons, with unfortunate consequences. Not sure I agree that synthesis is a bad thing, but I agree that we don’t deal with provenance very well (or the flip side of that, which is credit). I think there are ways we can tackle this, especially now that many of the things we care about (publications, specimens, sequences, trees, etc.) have identifiers, as indeed do people (i.e., ORCIDs). Hence it’s becoming practical to build systems that link assertions to evidence, and to who made those assertions. Indeed this is something GBIF is actively interested in, partly because it would like to engage more successfully with the research community (and not just those who access data via GBIF but those whose activity has generated the data it aggregates).

Regards,

Rod

On 6 Sep 2016, at 16:21, Nico Franz <nico.franz at asu.edu> wrote:

Thank you, Rod.

   The poor communication is on me, not you. Also, I am taking advantage of this being a forum where one can perhaps be a bit jarring to test out ideas and move forward based on reactions.

   My ultimate purpose is not to single out persons. It's just that lately I am fighting inside myself the notion that certain aspects of "biodiversity informatics" do not serve the individual, career-building taxonomist as well as they can and should. For instance, the notion that taxonomy and phylogenetics are "separate things". Or that we need a single "synthesis" of our current biodiversity knowledge in order to navigate through data. I think both notions are false, and also unnecessarily detrimental to the taxonomic agenda. I think science funding panels, in some of which I personally partake on occasion, are a bit drunk on the whole notion of "synthesis". More so than synthesis, we need provenance of conflicting views, and assessment tools for robustness of inferences given these conflicting views. But synthesizing conflict away is short-sighted; instead we need to embrace it. If "users demand" one tree, well then we can argue why that demand is not a sound reflection of our science and scientific business plan, and provide other means for users to relevantly meet goals.

   I also seem to observe a bit of an unfortunate separation between the TDWG community and, say, the folks who more frequently attend Evolution Meetings. And to me, not enough biodiversity informatics services seem to put individual taxonomists at the very top of their hierarchy of contributors to serve. (Certain services clearly do: ScratchPads, Pensoft, name/nomenclatural services, etc.) Maybe it's just that my feelings are hurt, but I also suspect there are things to learn here, more soberly.

   Your summary is very reasonable, thanks. Some clarifications.

   With "deflationary" I mean: saying that a certain practice or technology is not that meaningful or impactful. "We are just organizing things." "It is just a tool for navigation". "We are just synthesizing the data that are currently out there; we take what people give us". "We are just doing what the users want and need". "Classification does not matter very much".

   I am trying to causally connect this kind of thinking to other design features of aggregating biodiversity data services. And ultimately to a common but perhaps unfair or mistaken perception regarding trust in data provided by these services.

   Further clarifications.

   No, GBIF does not build one classification. GBIF (and many other services; I don't mean to be so specific) continues to build a chain of multiple versions, except the chain's versions are not well connected semantically. Each version is being used at the time to do other science with, of course.

   Clearly, in providing "the synthesis", GBIF (and others) does create some taxonomies whose structure is best and only accreditable to these "sources". An example that I am deeply familiar with is this: https://tree.opentreeoflife.org/taxonomy/browse?id=211889

   After working up about 20 recent belid trees and classifications in the Euler/X tool, and aligning them all, this is the one tree that stumped me. In the primary literature, there are at least two largely non-intersecting/self-propagating lineages of belid tree making (a bit of an east versus west story). This one is/was different, and not (by me) attributable (even in parts) to any actual author. That is cherry picking, but I claim that it is also a design feature.

   So, yes to your summary of my current thoughts. In biodiversity informatics we (often, by no means always) seem to have bought into certain design rationales and paradigms that affect trust in data. This design package tends to include a purported need for synthesis, which is a misnomer when viewed over time and often also tends to counter-act good provenance tracking between versions. And the package tends to eliminate exposure of conflict and uncertainty which in turn are pillars of the taxonomic enterprise. And it comes packaged with a notion that these are "just technical, operational needs" to meet the demands of users at scale.

   Maybe this package needs to be looked into more. Personally, I want us taxonomists to be aware, and to play the best role we can in getting good services. I am trying to point in all directions.

Best, Nico


On Tue, Sep 6, 2016 at 4:25 AM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
Hi Nico,

So, I’m trying to parse this paragraph into something I can act upon. Phrases like “deflationary stance”, “exclusion of that heterogenous community”, and “honor the notion that expertise is personalized” are *cough*, perhaps less than crystal clear (or I’m being lazy).

GBIF consumes a bunch of mutually inconsistent classifications and/or lists of names. These classifications and lists are rarely connected to evidence (for example, few cite the taxonomic literature supporting each name, and hardly any provide something useful such as a DOI for an article).

GBIF then applies a bunch of techniques to try and synthesise a single classification from this input, so that users (the majority of whom don’t care at all about taxonomic niceties) can navigate the data. These techniques have author(s) (mostly Markus Döring at GBIF), the code is open, and its development is public for all to see. It is, however, often hard for an outsider to work out how conflicts are resolved, or how some obvious errors have come about (e.g., http://dev.gbif.org/issues/browse/PF-2600 ).

If I understand your concerns correctly, they are:

1. GBIF builds a classification that may create new relationships not explicitly mentioned in the taxonomic literature (“novel theory making”). If GBIF were to claim that it simply takes what people give it and the synthesis doesn’t, of itself, create anything new, this would be a “deflationary stance”. To my knowledge GBIF doesn’t claim this; indeed, one of the goals of synthesis is to generate something more than a simple aggregation of things.

2. GBIF builds ONE classification (albeit one that evolves over time). Not everybody may agree with that classification (the “heterogenous community”). Note that GBIF links to all the input classifications, so you can still browse them. But yes, there is one “GBIF” viewpoint.

3. It is hard to go from the GBIF classification to the expertise that generated the names, lists, and classifications that are ultimately incorporated into that classification. If it were possible to do this, that could increase the level of trust people might have, and the willingness of experts to engage with the process of assembling the GBIF classification.

Is this a reasonable summary?

Regards,

Rod


On 2 Sep 2016, at 15:48, Nico Franz <nico.franz at asu.edu> wrote:

  Of course not all will agree with this view. But I think it is a
plausible position *for a taxonomist* to adopt. And that may mean that,
regardless of how certain aggregators prefer to perceive their activities
as merely this or that, for a good section of the expert community there
*is* a perception of novel theory making, and of novel theory making under
a design paradigm that can work to the exclusion of that heterogenous
community. A deflationary stance is not an effective way to work against
that perception. Acknowledgement does not negate the great value of
syntheses to some; instead I think it ultimately helps bring contributors,
users, and quality/trust issues closer together.

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page at glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate https://www.researchgate.net/profile/Roderic_Page




