[Taxacom] Likelihoodism versus probabilism; is it so simple?

Tim Dickinson tim.dickinson at utoronto.ca
Fri Oct 9 14:48:04 CDT 2020

Is it possible that this thread represents a kind of straw man take on 
molecular phylogenetics? My very limited experience in this area 
suggests that people who care about using molecular data to derive trees 
that will contribute to the edifice of classifying organisms alive today 
and making inferences about their ancestry are using more than just a 
single source of data, and more than a single method for analyzing those 
data. Also, they are developing methods with which to interrogate those 
data more closely, and are trying to distinguish between results 
stemming from methodological choices and those reflecting more closely 
some aspect of biological and historical reality. These methods are 
increasingly important as new sequencing methods are generating larger 
and larger volumes of sequence data, not only from low copy number 
nuclear genes, but also from organellar genomes.

For example, Rob Lanfear (ANU) has a blog post 
(http://www.robertlanfear.com/blog/files/concordance_factors.html) in 
which he describes how the IQ-TREE software calculates the extent to 
which individual sites on a DNA sequence, and individual loci in a 
multilocus alignment, are concordant (and discordant) with a given node 
on the ML tree calculated from the multilocus alignment. After showing 
how high bootstrap values may be associated with low concordance values, 
he writes, "A sensible biological interpretation here would be that this 
resolution of the [multilocus] tree is a good best guess for the species 
tree, but that there is plenty of conflicting signal meaning that: (i) I 
wouldn’t bet my house on this resolution of the species tree; (ii) even 
if this is the correct resolution, there’s probably a lot of conflicting 
signal in the gene trees from processes like incomplete lineage 
sorting." The example data used in this blog post come from this paper: 
<https://doi.org/10.1093/sysbio/syx041> (which itself is an example of 
querying data in order to understand why results may vary in relation to 
methodological choices).
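
To make the idea of a per-locus concordance value concrete, here is a toy 
sketch in Python. It is not IQ-TREE's implementation; it only assumes 
that every gene tree covers the same set of taxa and that each tree has 
already been reduced to its set of bipartitions (splits). The function 
names and the five-taxon example data are hypothetical, made up for 
illustration.

```python
def normalize_split(side, taxa):
    """Return a canonical form of a bipartition: the side that excludes
    a fixed reference taxon (here, the alphabetically first one)."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def gene_concordance(branch, gene_tree_splits, taxa):
    """Percentage of gene trees that contain the given branch (split).

    Simplifying assumption: every gene tree includes all taxa, so every
    tree counts in the denominator.
    """
    target = normalize_split(branch, taxa)
    hits = sum(
        1
        for splits in gene_tree_splits
        if target in {normalize_split(s, taxa) for s in splits}
    )
    return 100.0 * hits / len(gene_tree_splits)


# Hypothetical example: five taxa, four gene trees, each written as the
# list of its internal splits (one side of each split is enough).
taxa = {"A", "B", "C", "D", "E"}
gene_trees = [
    [{"A", "B"}, {"D", "E"}],            # supports the AB branch
    [{"A", "B"}, {"C", "D"}],            # supports the AB branch
    [{"A", "C"}, {"D", "E"}],            # conflicts with it
    [{"C", "D", "E"}, {"A", "B", "C"}],  # supports it (same split, other side)
]

gcf = gene_concordance({"A", "B"}, gene_trees, taxa)
print(f"Concordance for branch AB|CDE: {gcf:.1f}%")
```

In this toy case three of the four loci contain the AB|CDE branch. A 
concatenated analysis of such data could still return high bootstrap 
support for that branch, which is the contrast Lanfear's post draws.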

I note that the conflict between gene trees can also emerge from purely 
phenetic methods of comparing tree topologies. These methods, too, appear 
to be in use as a way to better understand the signal(s) present in 
comparative sequence data.

The Systematic Biology paper referenced above may have been chosen by 
Lanfear because of the way in which the authors discount the possible 
role of conflict between individual gene trees, writing that "analyses 
of concatenated data are still expected to converge on the same estimate 
of phylogeny as data are increased," and explicitly rejecting the need 
to consider coalescence models (so, yet another approach). My point is 
that molecular phylogenetics in the era of next generation sequencing 
seems increasingly to be reflecting on why data and methods give the 
results they do, and how far those results can be trusted. I'm skeptical 
that the simplistic contrast suggested by others captures all that's 
going on in the field.


<Tim Dickinson
<Senior Curator Emeritus
<ROM Green Plant Herbarium (TRT)
<Department of Natural History
<Royal Ontario Museum
<100 Queen's Park
<Toronto  ON
<Phone:  (416) 782 1607     FAX:  (416) 586 7921
<E-mail:  tim.dickinson at utoronto.ca  -  ORCID ID http://orcid.org/0000-0003-1366-145X
<URL: http://www.rom.on.ca/en/collections-research/rom-staff/tim-dickinson
<URL: http://www.eeb.utoronto.ca/people/d-faculty/Dickinson.htm
<URL: http://labs.eeb.utoronto.ca/dickinson/NABFH-I/ (publications on Crataegus)
