Corroboration

Mon Aug 24 09:49:39 CDT 1998

```
I see several messages that refer to cladistics, so perhaps the thunder
clouds of past controversy have cleared? Here is a query about what
corroboration means in phylogenetic reconstruction.

Corroboration of phylogenetic hypotheses is commonly assumed to occur
through increased probability. An analogy: a coin with its sides labeled
1 and 2, and a six-sided die, are put in a pot. Both were previously
shown not to be loaded. One of these is selected randomly and fairly tossed.
You are told a "1" appeared. You are asked which unseen item, the coin
or the die, generated the "1"? The likelihood of the data set "1" is
highest for the coin, or 1/2, compared to that of the die, 1/6, so the
coin is the optimum hypothesis for generating the data set. Its
posterior probability using Bayes' theorem is

(1/2)/(1/2 + 1/6), or .5/.67, or .75. This is your best bet.

Suppose the same object is again tossed, unseen by you, and you are told
another "1" was obtained. The likelihood of this occurring twice with the
coin is 1/2 * 1/2, or 1/4, while with the die it is 1/6 * 1/6, or 1/36. The
posterior probability of the coin is now

(1/4)/(1/4 + 1/36), or .25/.278, or .90, a significant increase in
probability.

BUT, in this analogy there is no information in the test AGAINST the
hypothesis that the coin is the source of the data. There is no
contradiction in the data that may be interpreted as support for
suboptimum hypotheses as there is in parsimony or likelihood analyses.

A more appropriate analogy is guessing whether a coin is normal or has
two heads. Both sides of the coin are quickly shown to several observers
viewing it through an obscuring burlap screen. Two thirds say it is a
normal coin, and one third say it is double-headed. Now a different
group does the same viewing, and again two thirds say it is normal and
one third say it is double-headed. With the second data set, is there an
increase in probability of the optimal hypothesis, that it is a normal
coin? No, because the alternative hypothesis is also corroborated at its
level of probability. You have corroboration only that the probability
of the coin being normal is two-thirds.

The same is true with phylogenetic reconstruction. Analysis of a
sequence generates likelihoods for various phylogenetic trees, some more
reasonable than others, but none of them yielding high probability.
Analysis of a second sequence, which generates similar likelihoods, does
not increase the probability that the optimum tree is the true tree.
Corroboration of the sort needed for phylogenetic reconstruction must
involve high probability of a single tree or subclade from two or more
analyses. The same is true of "corroboration" in parsimony analyses
where two independent data sets may yield the same optimum tree, yet
also many suboptimal but reasonable trees.

I feel that strong Bremer support and high posterior probability are the
only measures of reconstruction in phylogenetic analysis, because then
there are no reasonable alternative trees. Results of this nature are,
however, few and far between in the published literature.

--

Richard H. Zander
Curator of Botany, Buffalo Museum of Science
1020 Humboldt Pkwy, Buffalo, NY 14211 USA
bryo at paradox.net   voice: 716-896-5200 ext. 351

```
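
For readers who want to check the coin-versus-die arithmetic in the first analogy, here is a minimal sketch in Python. It assumes, as the post does implicitly, that the coin and the die are equally likely to be drawn from the pot. The function name `posterior_coin` and its parameters are illustrative only and are not part of the original message.

```python
from fractions import Fraction


def posterior_coin(n_ones, prior_coin=Fraction(1, 2)):
    """Posterior probability that the coin was drawn from the pot,
    given that n_ones tosses of the unseen object all showed "1".

    Assumption (not stated in the post): equal prior of 1/2 for coin and die.
    """
    like_coin = Fraction(1, 2) ** n_ones  # P("1" each time | coin)
    like_die = Fraction(1, 6) ** n_ones   # P("1" each time | die)
    prior_die = 1 - prior_coin
    # Bayes' theorem: weight each likelihood by its prior and normalize.
    numerator = like_coin * prior_coin
    return numerator / (numerator + like_die * prior_die)


if __name__ == "__main__":
    for n in (1, 2):
        p = posterior_coin(n)
        print(f'after {n} report(s) of "1": {p} = {float(p):.2f}')
```

With equal priors the priors cancel, so the result matches the bare likelihood ratios in the post: 3/4 after one "1" and 9/10 after two.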