The nature of cladistics [was: Whooper; Canadian geese split; paraphyly]

Richard.Zander at MOBOT.ORG
Thu Nov 18 13:42:52 CST 2004

Well, a kind of algorithm that deals with the total reliability of
phylogenetic analysis is addressed by the Bayesian (true Bayesian, not
phylogenetic Bayesian) statisticians. One wants to make an intelligent bet
(in this case on retrodicted branch arrangements). To make the bet you need
to use, among other things, the Bayesian equation, but the main point of the
Bayesian philosophy is that ALL relevant conditions must be examined and
probabilistic values ascertained or even guessed at. THEN the risk and
subsequent loss if wrong must be evaluated (e.g. losing your house or losing
your science), such that a sufficiently high posterior probability is
required given the possible loss of the bet.
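
To make the "how high a posterior is high enough" idea concrete, here is a
minimal sketch. It assumes a plain expected-value rule for accepting a bet,
which is only one simple stand-in for a full Bayesian decision analysis. The
function names and the stakes are hypothetical, chosen for illustration.

```python
def required_posterior(gain_if_right, loss_if_wrong):
    """Smallest probability at which the bet has non-negative expected value.

    Assumes a simple rule: accept when p * gain >= (1 - p) * loss.
    """
    return loss_if_wrong / (gain_if_right + loss_if_wrong)


def worth_betting(posterior, gain_if_right, loss_if_wrong):
    """True if the posterior meets the threshold implied by the stakes."""
    return posterior >= required_posterior(gain_if_right, loss_if_wrong)


# Hypothetical stakes: being wrong costs 19 times what being right gains.
print(required_posterior(1, 19))   # 0.95
print(worth_betting(0.94, 1, 19))  # False
```

Under this rule the larger the possible loss relative to the gain, the
closer to 1 the required posterior becomes.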

Most phylogenetic analyses ignore this philosophy and assert that their
often highly probable results are "conditional" on some assumptions
(occasionally some few assumptions are listed). Because the assumptions are
actually many, this is like saying "I WIN THE BET conditional on not having
lost it."

An example of an assumption is perfect sequence alignment. The question to
be addressed is: is there a reasonable realignment that affects a branch
arrangement of interest, or, even if unreasonable, is a different alignment
that affects the branch arrangement sufficiently probable that it lowers the
posterior probability below 95%?  Any possible alternative arrangement
introduced by problems with hybridization, sample error, paralogy,
differential lineage sorting, model selection, different results from
different data sets or different software or different iterations, and
whatever else is involved must be addressed by multiplying the posterior
probability by the chance of being correct conditional on the assumption.
Thus, every one-in-a-hundred chance of being wrong because of a wrong
assumption reduces the posterior probability (if high) by 1 percent. We have
only 5 percent to work with if the posterior probability as calculated by
software is 100 percent. Six one-in-a-hundred chances of being incorrect
yield a 94% probability of being correct.
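
The arithmetic above can be checked with a short sketch. It assumes the
assumptions fail independently of one another, so that their reliabilities
simply multiply. The function name is hypothetical.

```python
def discounted_posterior(posterior, assumption_reliabilities):
    """Multiply a reported posterior by the chance that each assumption holds.

    Treats the assumptions as independent, which is itself an assumption.
    """
    result = posterior
    for reliability in assumption_reliabilities:
        result *= reliability
    return result


# Six assumptions, each with a 1-in-100 chance of being wrong,
# applied to a software-reported posterior of 1.00.
p = discounted_posterior(1.00, [0.99] * 6)
print(f"{p:.4f}")  # 0.9415, i.e. about 94%
print(p >= 0.95)   # False
```

With five such assumptions the product is still just above 0.95; the sixth
takes it below.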

Some have pointed out that working with morphological characters, because
they are few and hard to model evolutionarily, is pretty much cluster
analysis, with only general groupings being worthwhile and details very
dubious. Given the stringencies (wrong model means wrong results) and
inadequacies (as per above argument) of model-based molecular analysis, the
molecular results are pretty much cluster analysis, too.

The way to reassure the taxonomic users of the results of phylogenetic
analysis that the results are reliable is to address all the relevant
assumptions and conditional features of the analysis, and say outright when
a particular assumption is so definitely correct that it is essentially
100%, and when an assumption is not and how that affects the probability of
being correct.

Otherwise the onus of guesswork is laid upon the user taxonomist, not the
analytic phylogeneticist.

At 10:16 AM 11/18/2004, Curtis Clark wrote:
>My other post makes it clearer, but basically the idea is this.
>Phylogenies are built of species. As one approaches the species level,
>the tools of phylogenetic reconstruction become less and less useful,
>until at some point "you're not in phylogeny any more".

Well, that's pretty much my original point!  Cladistic methods at those
levels seem inappropriate BECAUSE they are divorced from the reality they
purport to reflect.  When someone can construct an algorithm to accommodate
that, it will be a major breakthrough.

