More on probabilities of phylogenetic trees

Richard Zander bryo at COMMTECH.NET
Thu Sep 11 09:43:37 CDT 1997

Ted Schultz wrote:
> Having thought a little more about it, I think there are some problems with
> talking about the "posterior probabilities" of phylogenetic trees.
>  First of all, the "law of regularity" referred to by Richard Zander is not

I think I said Principle of Indifference.

> necessarily accepted by most Bayesians.  In fact, some Bayesians contend
> that assigning equal prior probabilities to alternative hypotheses does not
> at all accomplish the goal of "neutrality" toward a priori hypotheses, but

Well, yeah. Bayes' theorem has provision for initial probabilities if
you know them, but if you don't, they conveniently cancel out of the
formula if you set them all equal to the same value. A point I'm trying
to make is that Bayesian-type thinking (belief-oriented, with *lots* of
assumptions and simplistic models) works fine at the beginning of
studying a phenomenon; later one studies the assumptions and
models, and gathers frequency-oriented statistics to continue the study.
Maximum parsimony, which is a belief-oriented analysis, is a first cut
at the problem, not a "method of reconstruction by a discovery process."
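
The cancellation of equal initial probabilities can be checked with a
minimal sketch. The three candidate trees and their likelihood values
are hypothetical numbers chosen for illustration, and posterior() is an
illustrative helper, not part of any phylogenetics package.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a finite set of competing hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical values of P(data | tree) for three candidate trees.
likelihoods = [Fraction(2, 100), Fraction(1, 100), Fraction(1, 100)]

# Principle of Indifference: every tree gets the same initial probability.
n = len(likelihoods)
uniform = [Fraction(1, n)] * n
post_uniform = posterior(uniform, likelihoods)

# With equal priors the prior term cancels, so the posterior is just the
# likelihoods rescaled to sum to 1.
rescaled = [l / sum(likelihoods) for l in likelihoods]
assert post_uniform == rescaled

# An assumed unequal prior does not cancel and shifts the result.
skewed = [Fraction(1, 10), Fraction(1, 10), Fraction(8, 10)]
post_skewed = posterior(skewed, likelihoods)

print(post_uniform)
print(post_skewed)
```

With the equal prior the first tree ends up at 1/2; with the skewed
prior the third tree dominates even though its likelihood is no better
than the second's.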

> instead creates a probability distribution that is just as biased as other
> conceivable ones.
 [cut here]
> To put this into practical terms (and, really, statistics is supposed to be
> a tool in the service of practical thinking), we virtually never expect
> that any tree will be just as likely as any other tree before "viewing the
> data."  First of all, there are usually other hypotheses floating around

You're getting ahead of yourself. If you say that you've informally
narrowed the field to, say, groupings that are much like those of
phenetic clustering results, sure, that's great. But is that "total
evidence"? And if so, what is the chance that your narrowing of the
field to fewer trees than are possible is correct?
Multiply probabilities. The "obvious" is, again, belief-oriented.

> out there that represent our "best guesses" about the phylogeny of a
> particular group prior to a new study/revision.  For example, before

My point is that best guesses are often totally improbable, even if they
are more probable than any one other possible hypothesis. Suppose you had
a dozen best guesses about a phylogeny based on totally different studies.
Wonderful! We should use that as a classification, since the resultant
dendrogram represents in toto our best theory about relationships. But
as data-based evidence of an actual phylogeny? I think not, since there
is more theory going into the result than illumination coming out.

> carrying out a DNA study of a group of organisms, a researcher will
> probably consult the morphological literature and take as his null
> hypothesis the best morphologically-based phylogeny for the group.  If no
> explicit phylogeny exists, there will still be species groups, subgenera,
> genera, tribes, etc., that constitute more-or-less a priori hypotheses of

You are mixing apples and oranges. I'd like to see data sets dealt with
mathematically, *then* compared with more-or-less a priori hypotheses.
If you have the data, then why not evaluate it with available
statistical methods?

> monophyletic groups.  Sometimes our a priori concept of relationships may
> be based on one or a few characters, and the new data may consist of many
> more characters that we haven't yet considered.
> Another indication that we do not assign equal prior probabilities to all
> possible trees is our reaction to the outcome of a new study.  We may look
> at a DNA-based phylogeny for a particular group and say "That's ridiculous.
> I don't believe it.  The gene they used is obviously inappropriate for
> this taxonomic level."  On the other hand, we may look at the results and
> say: "Wow.  Features A, B, and C accord dead-on with what we know about
> this group.  Therefore I will give credence to feature D, which is actually
> quite surprising."
> The problem gets even worse when you start talking about estimating not
> only topology but branch lengths, states at ancestral nodes, etc.  Then the
> set of all possible trees is even more astronomically large.  You are never
> in real life going to divide up the a priori probability among all those
> trees in anything like equal amounts.

Of course not. But if someone said that betting on throwing a pair of
dice is impossible because there are too many ways in which the dice
could roll around after being thrown and before ending their roll, would
that stop a gambler from betting when the odds are very good? Should you
stop your car when you see police with siren blaring and lights flashing,
far enough behind you that maybe they are after someone else or racing to
the scene of a crime? There are so many other possible reasons for
such an event that it is hard to decide. The data set, however (say you
glance at your speedometer and see you are going 70 in a 30 mph zone),
helps you decide among all these previously equal probabilities.

> I think what we are really doing is not so much saying: "Hey, there are all
> these taxa out there and they could be joined in any one of N number of
> bifurcating trees.  I think I'll find out which one of those N trees is the
> real one by gathering a bunch of data."  Instead, we are saying: "Hey,
> there are all these characters out there, and each one suggests groupings
> of taxa (some groupings the same, some different and contradictory, some
> different and non-contradictory).  I think I'll find out what meta-grouping
> they most consistently suggest when considered together."  If I'm right,

Sounds like a modification of phenetics. I'll go with that. Maximum
parsimony is a fine clustering idea; because of its phylogenetic
component, it is theoretically better for classification than other
clustering methods.

> then this means that a whole lot of trees were never contenders in the
> first place, and it also seems to recommend character-based support

You can eliminate never-contenders informally as above or through
Bayesian statistics by giving them very low probabilities based on your
belief about the chance of the same advanced character found in two
terminal taxa being caused by a shared ancestor (1 state change) or
occurring homoplastically (2 state changes). Longer trees are far less

> measures like Bremer support and bootstrapping, which R.Z. has expressed
> doubts about.

Not doubts. I'd just like to have some confidence about what they mean.
Bremer support refers to the first sets of trees of a particular number
of steps longer than the shortest tree. Does the sum of the
probabilities of these trees pass .5? Is there more evidence for or
against? Same with synapomorphies: how large must the
number of synapomorphies be on a branch to give probabilistic
confidence (P>.5) to the subclade? Sanderson has written a nice
criticism (in Syst. Biol., I think) of using bootstrapping in
phylogenetics (the frequencies are modeled, not real, and the data are
not infinite, etc.).
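
The question of whether summed probabilities pass .5 can be made
concrete with a small sketch. Everything in it is assumed for
illustration: the four trees, their extra step counts, and above all the
probabilities assigned to them, which no method discussed here actually
supplies. clade_support() is a hypothetical helper.

```python
from fractions import Fraction

# Each entry: (steps longer than the shortest tree,
#              does the tree contain the subclade of interest?,
#              assumed probability of the tree).
# All numbers are invented for illustration only.
trees = [
    (0, True, Fraction(30, 100)),   # the shortest tree
    (1, True, Fraction(15, 100)),
    (1, False, Fraction(25, 100)),
    (2, False, Fraction(30, 100)),
]


def clade_support(trees):
    """Summed probability of the trees that contain the subclade."""
    total = sum(p for _, _, p in trees)
    with_clade = sum(p for _, has_clade, p in trees if has_clade)
    return with_clade / total


support = clade_support(trees)
passes_half = support > Fraction(1, 2)

print(support, passes_half)
```

Under these assumed numbers the subclade sits on the single most
probable tree, yet the evidence against it (11/20) outweighs the
evidence for it (9/20).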

Richard H. Zander, Buffalo Museum of Science
1020 Humboldt Pkwy, Buffalo, NY 14211 USA bryo at
