More on probabilities of phylogenetic trees

Ted Schultz schultz at ONYX.SI.EDU
Thu Sep 11 08:56:36 CDT 1997

Having thought a little more about it, I think there are some problems with
talking about the "posterior probabilities" of phylogenetic trees.

First of all, the "law of regularity" referred to by Richard Zander is not
necessarily accepted by most Bayesians.  In fact, some Bayesians contend
that assigning equal prior probabilities to alternative hypotheses does not
at all accomplish the goal of "neutrality" toward a priori hypotheses, but
instead creates a probability distribution that is just as biased as other
conceivable ones.  If I understand RZ correctly, he is saying that all
possible N bifurcating trees for some set of taxa must be assigned equal
probability prior to "viewing the data," and, if we do assign such an a
priori probability distribution, then we are forced to conclude a
posteriori that the optimal (most-parsimonious/most-likely/
minimum-distance) tree will usually have a very low probability of being
the "true"
tree.  He is also saying that even if we include some "error range"
proportion of suboptimal trees, we still may not achieve an acceptable a
posteriori probability.  This is perhaps not surprising, since the number
of possible alternate "hypotheses" (= topologies) is usually fantastically
large, and if we distribute the a priori probability equally among them,
then a huge amount of evidence will be required to boost the probability on
any given tree to >0.50 (Zander's criterion).
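
To put rough numbers on this, here is a minimal Python sketch. It assumes
rooted, fully bifurcating, labeled trees, for which the standard count is
the double factorial (2n-3)!!; the function names are hypothetical and
chosen only for illustration. With equal priors over N trees, the prior
odds for any single tree are 1/(N-1), so the data must favor that tree over
all the others combined by a factor of more than N-1 before its posterior
probability exceeds 0.50.

```python
from fractions import Fraction


def num_rooted_trees(n_taxa: int) -> int:
    """Count rooted, bifurcating, labeled trees: (2n-3)!! for n >= 2.

    Assumption: rooted trees. For unrooted trees the count is (2n-5)!!.
    """
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):  # multiplies 3 * 5 * ... * (2n-3)
        count *= k
    return count


def posterior(prior: Fraction, bayes_factor: int) -> Fraction:
    """Posterior probability from a prior and a Bayes factor.

    The Bayes factor here is the likelihood of the data under the focal
    tree divided by its prior-weighted average likelihood under all
    other trees combined.
    """
    odds = prior / (1 - prior) * bayes_factor
    return odds / (1 + odds)


if __name__ == "__main__":
    for n in (4, 10, 20, 50):
        n_trees = num_rooted_trees(n)
        flat_prior = Fraction(1, n_trees)
        # Evidence needed to reach exactly 0.50 under a flat prior.
        needed = n_trees - 1
        assert posterior(flat_prior, needed) == Fraction(1, 2)
        print(f"{n:>3} taxa: {n_trees:.3e} trees, "
              f"Bayes factor > {needed:.3e} needed for P > 0.50")
```

For 10 taxa this gives 34,459,425 rooted trees, so under a flat prior the
data would have to favor one topology over all the rest combined by a
factor of more than about 3.4 x 10^7.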

To put this into practical terms (and, really, statistics is supposed to be
a tool in the service of practical thinking), we virtually never expect
that any tree will be just as likely as any other tree before "viewing the
data."  First of all, there are usually other hypotheses floating around
out there that represent our "best guesses" about the phylogeny of a
particular group prior to a new study/revision.  For example, before
carrying out a DNA study of a group of organisms, a researcher will
probably consult the morphological literature and take as his null
hypothesis the best morphologically-based phylogeny for the group.  If no
explicit phylogeny exists, there will still be species groups, subgenera,
genera, tribes, etc., that constitute more-or-less a priori hypotheses of
monophyletic groups.  Sometimes our a priori concept of relationships may
be based on one or a few characters, and the new data may consist of many
more characters that we haven't yet considered.

Another indication that we do not assign equal prior probabilities to all
possible trees is our reaction to the outcome of a new study.  We may look
at a DNA-based phylogeny for a particular group and say "That's ridiculous.
I don't believe it.  The gene they used is obviously inappropriate for
this taxonomic level."  On the other hand, we may look at the results and
say: "Wow.  Features A, B, and C accord dead-on with what we know about
this group.  Therefore I will give credence to feature D, which is actually
quite surprising."

The problem gets even worse when you start talking about estimating not
only topology but branch lengths, states at ancestral nodes, etc.  Then the
set of all possible trees is even more astronomically large.  You are never
in real life going to divide up the a priori probability among all those
trees in anything like equal amounts.

I think what we are really doing is not so much saying: "Hey, there are all
these taxa out there and they could be joined in any one of N possible
bifurcating trees.  I think I'll find out which one of those N trees is the
real one by gathering a bunch of data."  Instead, we are saying: "Hey,
there are all these characters out there, and each one suggests groupings
of taxa (some groupings the same, some different and contradictory, some
different and non-contradictory).  I think I'll find out what meta-grouping
they most consistently suggest when considered together."  If I'm right,
then this means that a whole lot of trees were never contenders in the
first place, and it also seems to recommend character-based support
measures like Bremer support and bootstrapping, which R.Z. has expressed
doubts about.

Ted Schultz, Research Entomologist
Department of Entomology, MRC 165
National Museum of Natural History
Smithsonian Institution
Washington, DC 20560

schultz at
Phone (voice and fax): 202-357-1311
