[Fwd: Re: Probabilities on Phylogenetic Trees]

Tom DiBenedetto tdib at UMICH.EDU
Mon Sep 15 15:19:56 CDT 1997

Richard Zander wrote:

>> >Maximum parsimony, which is a belief-oriented analysis....

>> What do you mean "belief-oriented'?

>Just in the sense that there is no sampling during phylogenetic
>time...no information at each node of the Markov process,

It won't be hard for me to play dumb here... I don't really know what
this means.
A parsimony analysis deals simply with the distribution of character
states (the groupings implied by the state-distributions of each
character), and chooses the pattern which is most consistent with all
of those groupings. It makes no reference to process.
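
As an illustration of what "most consistent with all of those
groupings" can mean in practice, here is a minimal Python sketch that
counts character-state changes (steps) on candidate trees using
Fitch's counting procedure and keeps the shortest tree. The taxa
(Out, A, B, C), the four binary characters, and the function names are
invented for the example; nothing here is taken from a real data set
or from a particular parsimony program.

```python
def fitch_steps(tree, states):
    """Return (possible state set, step count) for one character.

    `tree` is a nested 2-tuple of taxon names; `states` maps each
    taxon name to its observed state for this character.
    """
    if isinstance(tree, str):            # leaf: observed state, no steps
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:                           # children agree: no extra step
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1   # disagreement costs one step


def tree_length(tree, matrix):
    """Sum the Fitch step counts over every character in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, column)[1]
    return total


# Hypothetical data: an outgroup plus three ingroup taxa, four characters.
matrix = {"Out": "0000", "A": "1101", "B": "1110", "C": "0011"}

# The three possible rooted arrangements of A, B and C.
trees = [
    ("Out", (("A", "B"), "C")),
    ("Out", (("A", "C"), "B")),
    ("Out", (("B", "C"), "A")),
]

lengths = [tree_length(t, matrix) for t in trees]
best = min(trees, key=lambda t: tree_length(t, matrix))
print(lengths, best)
```

Note that the sketch only tallies state distributions on each
arrangement; no rate or model of change appears anywhere in it.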

>and therefore
>a large number of assumptions must be made about rates of character
>state change, even and uniform distribution of state changes, what model
>is correct, that sort of thing (like believing that the dice you bet on
>are fair and fairly cast).

No, it makes no such assumptions... It has nothing to do with various
models of how states change.

>> most certainly a
>> method of reconstruction by a discovery process. It is not a
>> statistical method of course, but that is a virtue, in my book...
>Excellent, but what is the method reconstructing?

It is reconstructing the phylogeny under the assumption that the
phylogeny can be inferred through examination of character-state
distributions, and
that there is no process, other than descent, which can pattern
character distributions throughout the various character systems
which one can investigate. That is why cladists emphasize "total
evidence", looking to as many character systems as possible, rather
than relying exclusively on one (very problematical) source of
evidence (as do the statistical phylogeneticists with their reliance
on sequence characters only).

> A phylogenetic tree
>believed to be based on slow and even evolutionary processes,

Parsimony makes no such assumption.

>believed to be dichotomously branching, without introgression,

Neither are these "beliefs". Resolution is sought down to the level
of dichotomous branching because such results will not be discovered
unless they are sought. There are plenty of instances in which
complete resolution is not achieved. There may also be instances of
spurious resolution, but these can be expected to be persistently
weak nodes, and subject to change given new evidence.
Introgression is a problem in lower level analyses where
relationships are both phylogenetic and tokogenetic. But parsimony is
a method of phylogenetic systematics anyway, and should not be
criticized for failing in instances where it is not meant to succeed.

> and using just exactly the characters used so far for what is expected to be total

I don't understand. One cannot use characters that haven't been
developed. The general approach of "total evidence" is to always seek
out more evidence, not restricting oneself to a single class of

> I'm thinking that if a phylogenetic result is supposed to have a
>probabilistic basis, then the probability (given all those mathematical
>figures, there has to be a pony somewhere) that the resultant tree is
>the true should be calculable.

I don't see that. The probabilistic basis you refer to is a question
of the extent to which the result fulfills the demands of the method
from which it is drawn. The ultimate truth of the result would
entail, in addition, the question of the reliability of the method
itself. Since we can never know the ultimate truth of historical
reconstructions, the ultimate reliability of the method is
unknowable; our only recourse is to use methods which are consistent
with those notions we are prepared to accept as assumptions.
A parsimony analysis will yield a result which is most consistent
with the notions that life evolved, and that characters which are
determined (to the best of our biological knowledge) to be the same,
in different taxa (homologous) are present by descent. I don't know of
a better way of approaching the "truth" within the framework of
evolutionary biology.

> I've figured that if a tree with an extra
>step is half as probable, then in the case of a three terminal taxon
>tree (two taxa of which share an advanced character), which has three
>possible topologies, the shortest tree should have a probability of .5,
>and each of the two (1-step) longer trees, should have a probability of
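
The arithmetic behind the quoted proposal can be sketched as follows.
This is only an illustration of the quoted premise (each extra step
halves a tree's probability, and the probabilities of all candidate
trees sum to 1); the function name and the tree labels are invented
for the example.

```python
from fractions import Fraction


def halving_probabilities(lengths):
    """Weight each tree by (1/2)**(extra steps), then normalize to sum to 1."""
    shortest = min(lengths.values())
    weights = {
        name: Fraction(1, 2 ** (steps - shortest))
        for name, steps in lengths.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Three possible topologies for three terminal taxa; A and B share the
# advanced character, so the other two arrangements are one step longer.
lengths = {"((A,B),C)": 1, "((A,C),B)": 2, "((B,C),A)": 2}
probs = halving_probabilities(lengths)
for name, p in probs.items():
    print(name, float(p))
```

The output depends entirely on the halving premise and on the step
counts supplied; change either and the numbers change with them.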

But the question of whether this is the true tree is not at all
addressed by this. What is the probability that the apomorphy you
identify is really the result of an evolutionary transformation event
that actually occurred in the ancestor of those two taxa? How can
such a question be answered? A cladist will say that the sum of all
empirical data on the distribution of characters in these taxa
indicates that this conclusion is tenable; by accepting it, one can
explain the sum of the empirical data in a manner which makes minimal
recourse to special ad hoc explanations. Any other arrangement
necessitates recourse to more ad hoc explanations. Finding general
statements which minimize recourse to special ad hoc hypotheses is a
general characteristic of the scientific method. The distribution of
character states is the empirical data, the presence of derived
states in more than one taxon is the specific data which seeks
explanation, the process of descent from a common ancestor is the
explanation, and the most parsimonious solution is the arrangement
which maximizes the power of the explanation.
An analogous procedure is followed by those who seek to
understand evolutionary processes: a hypothesized process is applied
as generally as possible, then subjected to testing and falsification
(thereby having its true domain specified). Consider the sordid
history of the "molecular clock".
This is the reason why it is important to maintain systematics as an
independent discipline from evolutionary process modelling. Testing
and potential falsification of process models can only be achieved by
comparison of the phylogenies implied by the model (e.g. a
maximum-likelihood phylogeny) to a well supported total evidence
parsimony phylogeny.
To assert that a model derived phylogeny is the "best estimate" of
the REAL phylogeny, and that is the end of the story, is absurd, for
it assumes that the model is an accurate mirror of evolutionary
processes, when in fact the model remains *untested* in the very
domain in which it is supposedly revealing the truth.
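
One simple way such a comparison between two phylogenies could be
carried out is to list the groupings (clades) that each tree implies
and report those found in only one of them. The sketch below assumes
rooted trees written as nested tuples; the two example trees and the
function names are invented, and no particular comparison method is
prescribed in this post.

```python
def clades(tree):
    """Return (leaves under this node, set of clades in this subtree)."""
    if isinstance(tree, str):            # a terminal taxon implies no grouping
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)                    # the grouping defined by this node
    return leaves, found


def clade_conflicts(tree_a, tree_b):
    """Groupings implied by exactly one of the two trees."""
    _, a = clades(tree_a)
    _, b = clades(tree_b)
    return a ^ b


# Hypothetical trees: one from a parsimony analysis, one from a model.
parsimony_tree = ("Out", (("A", "B"), "C"))
model_tree = ("Out", (("A", "C"), "B"))

diff = clade_conflicts(parsimony_tree, model_tree)
print(sorted(sorted(c) for c in diff))
```

An empty result would mean the two trees imply exactly the same
groupings; each grouping listed is a point on which they disagree.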

Proposing such a phylogeny as a *hypothesis* is fine, so long as one
concurrently proposes a testing method, but it must be understood
that the hypothesis entails not only the phylogenetic pattern itself,
but also the underlying extrapolation of the process model. That is
another reason why ancillary methods which test the reliability of
finding the pattern (bootstrapping etc.) fail to even address the key
point: what needs to be tested is the validity of the model.
Given the results of process modelling to date, I sense that there
should probably be an expectation that the model will be different in
just about every different taxon (and for every different sequence).
