Probabilities on Phylogenetic Trees

Richard Zander bryo at COMMTECH.NET
Mon Sep 8 13:55:53 CDT 1997

Ted Schultz wrote:
> I am confused by Richard Zander's recent comments on Taxacom about how
> phylogenetic trees have low "posterior probabilities."
> Three questions:
> 1) How do you calculate the posterior probabilities of phylogenetic trees?

Read carefully the excellent paper by
Mau, B., M. A. Newton & B. Larget. 1996. Bayesian phylogenetic inference
via Markov chain Monte Carlo methods. Tech. Rep. 961, Dept. of
Statistics, University of Wisconsin, Madison, July 1996.
(download the PostScript file). It gives a nice explanation.
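As a rough illustration of the Markov chain Monte Carlo idea (a simplified sketch, not the procedure in Mau et al.): once a chain has been run over tree space, each topology's posterior probability can be estimated as the fraction of the post-burn-in samples in which it appears. The function name, the burn_in parameter, and the sample list are hypothetical; the parenthetical strings serve only as topology labels.

```python
from collections import Counter


def posterior_from_samples(sampled_topologies, burn_in=0):
    """Estimate each topology's posterior probability as its relative
    frequency among the MCMC samples kept after discarding the burn-in."""
    kept = sampled_topologies[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    total = len(kept)
    return {topology: n / total for topology, n in counts.items()}


# Hypothetical chain output, topologies written in parenthetical notation.
samples = ["((A,B),(C,D))"] * 6 + ["((A,C),(B,D))"] * 3 + ["((A,D),(B,C))"]
print(posterior_from_samples(samples))
```

These frequencies only approximate the posterior if the chain has been run long enough and the early, unconverged samples are dropped.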

> Where do you obtain the prior probabilities that are a necessary part of
> the calculation of posterior probability?  (As I understand it,
> maximum-likelihood values are not such probabilities.)

Right. Prior probabilities are the probabilities of the hypotheses
before the data set is known. If three trees are possible, the prior
probability of each is 1/3. The regularity assumption is that all trees
are equally probable before the data set is known.
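A minimal sketch of the uniform prior combined with Bayes' theorem over a finite set of trees, assuming unrooted, fully bifurcating topologies. The count (2n - 5)!! is the standard number of such topologies for n taxa; four taxa give the three possible trees mentioned above. The likelihood values are made up for illustration and the function names are hypothetical.

```python
def num_unrooted_topologies(n_taxa):
    """Number of unrooted, fully bifurcating labelled trees for n_taxa >= 3:
    (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def posterior(likelihoods, priors=None):
    """Bayes' theorem over a finite set of candidate trees.
    With priors=None every tree gets the uniform prior 1/T."""
    T = len(likelihoods)
    if priors is None:
        priors = [1.0 / T] * T
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(data), summed over all candidate trees
    return [j / evidence for j in joint]


# Four taxa: 3 possible unrooted trees, so each has prior 1/3.
print(num_unrooted_topologies(4))  # 3
# Hypothetical likelihoods P(data | tree) for those three trees.
print(posterior([3e-5, 1e-5, 1e-5]))
```

Because the uniform prior is the same for every tree, it cancels in the normalisation: under this assumption the posterior is each tree's likelihood divided by the sum of all likelihoods. For 32 taxa, num_unrooted_topologies(32) is roughly 3 x 10^40, the same order of magnitude as the figure quoted for the cichlid data set below.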

> 2) Where in a reading of Farris 1973 do you find a proof that phylogenetic
> trees have low posterior probability?

That was my point: he left it out for some reason. By extension, his
discussion that a single tree one step longer than the shortest tree has,
in comparison, a 1/4 probability of being correct means that you can sum
the probabilities of all trees one step longer; and since a single tree
two steps longer has a 1/16 probability, you must also sum the
probabilities of all trees two steps longer. The logic is clearly that
each additional state change cuts the probability to a quarter, so two
extra changes give (1/4)^2 = 1/16.
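The summing argument can be made concrete with a small sketch. It assumes, as in the reading above, that each extra step over the shortest tree multiplies a tree's relative probability by 1/4; the tree lengths and function names are hypothetical. Weights are normalised over the trees supplied, which is itself a simplification, since a real analysis would have to cover every possible tree.

```python
def relative_weights(tree_lengths, factor=0.25):
    """Weight each tree by factor ** (extra steps over the shortest tree),
    then normalise so the weights sum to 1 over the trees supplied."""
    shortest = min(tree_lengths)
    raw = [factor ** (length - shortest) for length in tree_lengths]
    total = sum(raw)
    return [w / total for w in raw]


def mass_within(tree_lengths, max_extra_steps, factor=0.25):
    """Summed normalised weight of all trees at most max_extra_steps
    longer than the shortest tree."""
    shortest = min(tree_lengths)
    weights = relative_weights(tree_lengths, factor)
    return sum(w for length, w in zip(tree_lengths, weights)
               if length - shortest <= max_extra_steps)


# Hypothetical search result: 1 shortest tree (100 steps),
# 4 trees one step longer, 16 trees two steps longer.
lengths = [100] + [101] * 4 + [102] * 16
print(mass_within(lengths, 0))  # the single shortest tree
print(mass_within(lengths, 1))  # shortest tree plus all one-step-longer trees
print(mass_within(lengths, 2))  # every tree in this hypothetical set
```

With these numbers each length class carries the same total weight (1, 4 x 1/4, 16 x 1/16), so the single shortest tree holds only about a third of the probability even though it is four times as probable as any individual tree one step longer.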

> 3) If a tree topology is 99% correct and 1% incorrect, does that tree fall
> into the category of "incorrect"?  If so, this might explain why any
> topology has a low probability of being 100% correct, but a high
> probability of being, say, 95% correct.  (I can't say for certain, however,
> since I don't know where these posterior probabilities are coming from.)
> A way of dealing with the fact that any given tree may have a low
> probability of being 100% correct (in parsimony or likelihood methods) is
> to include in your solution set multiple trees such that, together, you are
> confident (at, for instance, the 95% level) that the set includes the
> correct tree.  This "error" range on trees has been advocated for some

Wonderful, if you can do so. Let's take the case of Bremer support.
Suppose you have Bremer support for a subclade indicating that the
branch is stable in all trees 1, 2 and 3 steps longer than the shortest
tree. That's a lot of trees, and they all have a higher posterior
probability of being the same as the true tree than the remaining
possible trees. Now what must be shown, so that you have more evidence
for than against, is that the summed probabilities of all trees up to
three steps longer than the shortest tree exceed 0.5. The cichlid fish
study of 32 terminal taxa cited above deals with 10 to the 40th power
possible hypotheses (trees) (according to Mau et al., p. 14). The sum of
the posterior probabilities of the 14 most likely trees is 0.50. This is
encouraging, and these trees (whose topologies are shown in
parenthetical notation) are somewhat similar, but the subclades within
their subtopologies are not alike. Only two subtopologies occurred in all
trees in the 0.5 cumulative credible region, consisting of 7 taxa and
5 taxa respectively, and these groups meet the requirement of more
evidence for than against. (The paper is dense; I may have misread
something here.)

THEN we come to whether the various assumptions (biological clock, other
regularity impositions) really allow some kind of phylogenetic
reconstruction, or whether we are again just clustering taxa by shared,
presumably advanced traits and merely interpreting this as phylogenetic
reconstruction.
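The cumulative credible region and the check for subtopologies present in every tree of that region can be sketched as follows. This is a simplified illustration, not the code of Mau et al.: the tree names, posterior values, and clade lists are hypothetical, and each tree is represented only by its clades (sets of taxa). The clade_probability helper sums the posterior probabilities of the trees containing a clade, which is one common way to ask whether a group has more evidence for than against (a value above 0.5).

```python
def credible_set(tree_probs, level):
    """Trees taken in decreasing order of posterior probability until their
    cumulative probability reaches `level`; returns (trees, cumulative)."""
    ranked = sorted(tree_probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for tree, p in ranked:
        chosen.append(tree)
        cumulative += p
        if cumulative >= level:
            break
    return chosen, cumulative


def shared_clades(trees, clades_of):
    """Clades present in every tree of `trees`."""
    common = set(clades_of[trees[0]])
    for tree in trees[1:]:
        common &= set(clades_of[tree])
    return common


def clade_probability(clade, tree_probs, clades_of):
    """Summed posterior probability of all trees that contain `clade`."""
    return sum(p for tree, p in tree_probs.items() if clade in clades_of[tree])


# Hypothetical posterior probabilities for four rooted trees on taxa A-E.
tree_probs = {"T1": 0.40, "T2": 0.25, "T3": 0.20, "T4": 0.15}
clades_of = {
    "T1": [frozenset("AB"), frozenset("ABC"), frozenset("DE")],    # (((A,B),C),(D,E))
    "T2": [frozenset("AB"), frozenset("CDE"), frozenset("DE")],    # ((A,B),(C,(D,E)))
    "T3": [frozenset("AC"), frozenset("ABC"), frozenset("DE")],    # (((A,C),B),(D,E))
    "T4": [frozenset("BCDE"), frozenset("CDE"), frozenset("DE")],  # (A,(B,(C,(D,E))))
}

region, mass = credible_set(tree_probs, 0.5)
print(region, mass)                      # trees in the 0.5 credible region
print(shared_clades(region, clades_of))  # clades common to all of them
print(clade_probability(frozenset("AB"), tree_probs, clades_of))
```

Here the 0.5 region contains T1 and T2, whose shared clades are {A,B} and {D,E}; in a real analysis the tree probabilities would come from the MCMC sample rather than being typed in.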


Richard H. Zander, Buffalo Museum of Science
1020 Humboldt Pkwy, Buffalo, NY 14211 USA bryo at
