[Taxacom] Likelihoodism versus probabilism; is it so simple?
Richard.Zander at mobot.org
Tue Oct 20 16:50:15 CDT 2020
Thanks for the messages pointing out that modern systematics uses a number of methods and various data to try to pin down evolutionary relationships. But ... as Jared wrote "by smartly using genomics and morphology with various branch support methods, we can derive much more robust phylogenies."
That's the rub. A phylogeny is not an evolutionary diagram, it is a cluster analysis using trait transformations between sets of traits. Posterior probabilities are not combinatorial probabilities; the reason they are called posterior is because they gauge how much better a hypothesis generates the data (note the likelihood function inherent in Bayes' Formula) than does the flat prior. The likelihood is another optimality criterion, where the best answer is the hypothesis most likely to generate the data compared to the next most likely, or the next few most likely.
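As a small worked illustration of the formula mentioned above, the sketch below applies Bayes' Formula to three candidate topologies under a flat prior. The likelihood values and topology labels are invented for illustration and come from no real analysis; the function name is likewise hypothetical.

```python
def posterior_flat_prior(likelihoods):
    """Apply Bayes' Formula with a flat (uniform) prior over the hypotheses.

    likelihoods maps each hypothesis to P(data | hypothesis).
    """
    prior = 1.0 / len(likelihoods)  # flat prior: every hypothesis equally weighted
    evidence = sum(prior * lk for lk in likelihoods.values())  # P(data)
    return {h: prior * lk / evidence for h, lk in likelihoods.items()}


# Hypothetical likelihoods for three rooted topologies (illustrative numbers only)
likelihoods = {
    "((A,B),C)": 0.008,
    "((A,C),B)": 0.0015,
    "((B,C),A)": 0.0005,
}

posterior = posterior_flat_prior(likelihoods)

# Ratio of the best hypothesis to the next best, before and after normalizing
ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
likelihood_ratio = likelihoods[ranked[0]] / likelihoods[ranked[1]]
posterior_ratio = posterior[ranked[0]] / posterior[ranked[1]]

for h in ranked:
    print(h, round(posterior[h], 3))
```

With these invented numbers the posteriors come out near 0.80, 0.15 and 0.05. Under a flat prior the posterior is just the likelihood rescaled to sum to one, so the ratio between the best and next-best hypothesis is the same before and after.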
Likelihood and Bayes Factors work okay when there are two entities involved, such as measuring the evolutionary distance between two species, as in trees with stem taxa at nodes. Otherwise phylogenies of sets of traits generating sets of traits dichotomously are difficult to translate to theoretical evolutionary trees with radiations.
From: Jared Bernard <bernardj at hawaii.edu>
Sent: Tuesday, October 20, 2020 3:36 AM
To: Tim Dickinson <tim.dickinson at utoronto.ca>; Richard Zander <Richard.Zander at mobot.org>; stephen_thorpe at yahoo.co.nz
Subject: Re: [Taxacom] Likelihoodism versus probabilism; is it so simple?
I'm not nearly as experienced as you all (so I vacillated about responding), but I completely agree with Tim's response. As someone who has written species descriptions and is doing molecular evolution work, I think anyone would be foolish to make interpretations based on a single method. Using an assortment of analyses and datasets is standard in systematic/evolution research. This goes for branch support statistics as well as the types/number of data (molecular markers, morphology, etc.).
Most trees use both ML bootstrap and posterior probabilities as branch support, and sometimes also MP. Brown & Thomson (2017) have suggested using Bayes factor either in the place of posterior probabilities or in addition, especially in large datasets, because MCMC can't distinguish between 0.99 and 0.999999 (orders of magnitude different).
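To make the "orders of magnitude" remark concrete, here is a minimal arithmetic sketch. It is not Brown & Thomson's procedure; it only converts posterior probabilities to odds and shows why a finite MCMC sample has trouble separating values this close to 1. The function names are hypothetical, and the sample-count reasoning assumes independent draws, which real MCMC samples are not.

```python
import math


def posterior_odds(p):
    """Odds in favor of a clade whose posterior probability is p."""
    return p / (1.0 - p)


def bayes_factor(posterior, prior):
    """Bayes factor as posterior odds divided by prior odds."""
    return posterior_odds(posterior) / posterior_odds(prior)


def samples_per_expected_miss(p):
    """Independent draws needed, on average, to see one tree lacking the clade."""
    return 1.0 / (1.0 - p)


low, high = 0.99, 0.999999
gap_in_orders = math.log10(posterior_odds(high)) - math.log10(posterior_odds(low))

for p in (low, high):
    print(p, "odds ~", round(posterior_odds(p)),
          "draws per miss ~", round(samples_per_expected_miss(p)))
print("difference in log10 odds ~", round(gap_in_orders, 2))
```

On the probability scale the two values differ by less than 0.01, but on the odds scale they are roughly four orders of magnitude apart. A posterior sample of, say, ten thousand trees would be expected to contain the clade in essentially every tree in the second case, so the estimate is reported as 1.0 either way.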
As far as I can tell, virtually everyone treats a tree as what it is: just a hypothesis. Clearly different genes may give incongruent trees, even conserved markers. Nowadays, publishing phylogenies based on fewer than a handful of markers is practically a thing of the past -- and studies now often use hundreds or even thousands of markers. As phylogenomics has blossomed, some hoped that incongruent phylogenies would come to an end. Maybe eventually, but it will always require various methods and careful analytics. Rokas et al. (2003) found this to be the case when analyzing yeast with 106 genes, emphasizing that even with massive amounts of data, the results can be inconsistent. Difficulties of analyzing phylogenomic datasets led to famous debates about the placement of comb jellies (Whelan et al. 2015) and arthropods (Philippe et al. 2005). Brown & Thomson (2017) advocate for using the Bayes factor to search for genes causing inordinate sway over a topology, as they demonstrate for the conflicting placement of turtles. For phylogenomic analyses, branch support can be inflated, so bootstraps are only reliable when ≥ 95%, and even higher for posterior probabilities.
Even though analyses based solely on morphology are prone to mistake homoplasy for genuine relationships (for example, Old- and New-World vultures, or falcons and raptors), morphological matrices are sometimes included with genetic datasets for analysis. My hope is that by smartly using genomics and morphology with various branch support methods, we can derive much more robust phylogenies.
On Fri, Oct 9, 2020 at 9:48 AM Tim Dickinson via Taxacom <taxacom at mailman.nhm.ku.edu> wrote:
Is it possible that this thread represents a kind of straw man take on
molecular phylogenetics? My very limited experience in this area
suggests that people who care about using molecular data to derive trees
that will contribute to the edifice of classifying organisms alive today
and making inferences about their ancestry are using more than just a
single source of data, and more than a single method for analyzing those
data. Also, they are developing methods with which to interrogate those
data more closely, and are trying to distinguish between results
stemming from methodological choices and those reflecting more closely
some aspect of biological and historical reality. These methods are
increasingly important as new sequencing methods are generating larger
and larger volumes of sequence data, not only from low copy number
nuclear genes, but also from organellar genomes.
For example, Rob Lanfear (ANU) has a blog post in
which he describes how the IQ-TREE software calculates the extent to
which individual sites on a DNA sequence, and individual loci in a
multilocus alignment, are concordant (and discordant) with a given node
on the ML tree calculated from the multilocus alignment. After showing
how high bootstrap values may be associated with low concordance values,
he writes, "A sensible biological interpretation here would be that this
resolution of the [multilocus] tree is a good best guess for the species
tree, but that there is plenty of conflicting signal meaning that: (i) I
wouldn't bet my house on this resolution of the species tree; (ii) even
if this is the correct resolution, there's probably a lot of conflicting
signal in the gene trees from processes like incomplete lineage
sorting." The example data used in this blog post come from this paper:
<https://doi.org/10.1093/sysbio/syx041> (which itself is an example of
querying data in order to understand why results may vary in relation to
I note that the conflict between gene trees can also emerge from purely
phenetic methods of comparing tree topologies. It appears that these
methods as well are being employed in order to better understand the
signal(s) present in comparative sequence data.
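The idea of concordance between individual loci and a node on a reference tree can be sketched in a few lines. This is a deliberately simplified toy, not IQ-TREE's actual calculation: it treats each gene tree as a set of rooted clades and counts the fraction of gene trees containing a given clade, whereas real concordance factors involve further details (for instance, how loci that cannot speak to a given branch are handled). The taxa, gene trees and function name are all invented for illustration.

```python
def clade_concordance(clade, gene_trees):
    """Fraction of gene trees that contain the given clade.

    Each gene tree is represented as a set of clades, and each clade
    as a frozenset of taxon labels.
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in gene_trees if clade in tree)
    return hits / len(gene_trees)


# Four hypothetical gene trees for taxa A, B, C (with D as the outgroup).
# Every tree supports the clade ABC, but they disagree on what is inside it.
gene_trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("BC"), frozenset("ABC")},
]

ab_support = clade_concordance("AB", gene_trees)
abc_support = clade_concordance("ABC", gene_trees)

print("A+B concordance:", ab_support)
print("A+B+C concordance:", abc_support)
```

In this toy case A+B is the most frequent resolution and would likely be the one recovered from the combined data, yet only half of the loci agree with it. That is the kind of gap between a well-supported node and low concordance that the quoted passage describes.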
The Systematic Biology paper referenced above may have been chosen by
Lanfear because of the way in which the authors discount the possible
role of conflict between individual gene trees, writing that "analyses
of concatenated data are still expected to converge on the same estimate
of phylogeny as data are increased," and explicitly rejecting the need
to consider coalescence models (so, yet another approach). My point is
that molecular phylogenetics in the era of next generation sequencing
seems increasingly to be reflecting on why data and methods give the
results they do, and how far those results can be trusted. I'm skeptical
that the simplistic contrast suggested by others captures all that's
going on in the field.
Senior Curator Emeritus
ROM Green Plant Herbarium (TRT)
Department of Natural History
Royal Ontario Museum
100 Queen's Park
CANADA M5S 2C6
Phone: (416) 782 1607 FAX: (416) 586 7921
E-mail: tim.dickinson at utoronto.ca - ORCID ID http://orcid.org/0000-0003-1366-145X
URL: http://labs.eeb.utoronto.ca/dickinson/NABFH-I/ (publications on Crataegus)