Status of plant systematics
Daniel Barker
sokal at HOLYROOD.ED.AC.UK
Sat Sep 6 21:27:56 CDT 1997
On Fri, 29 Aug 1997, Richard Zander wrote:
> Fear not that molecular systematics will render us classical
> systematists obsolete. The problem is in computerized "reconstruction,"
> which is all pretty much Bayesian. Check Farris' 1973 Dollo parsimony
> paper for a curiously slanted discussion of probability that hides an
> important fact: computerized phylogenetic analysis usually gives
> improbable results.
> If you analyse a simple three-taxon cladogram, the shortest tree with
> two taxa sharing one advanced character has a .67 posterior probability
> of being right. This is better than random .33 but compared to the sum
> of the other two hypotheses (possible trees), .33, it's just fair. With
> two characters, the probability of the shortest tree is .80, better.
> With 4 terminal taxa, however, you have 9 hypotheses, and the likelihood
> of the shortest tree must be far greater than with just 3 hypotheses to
> be "probable." For trees with several terminal taxa, the posterior
> probability of the shortest tree is less than .5, and there is more
> evidence against than for.
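The tree counts behind this argument are easy to check. The number of fully resolved rooted trees on n labelled taxa is (2n - 3)!!, and of unrooted trees (2n - 5)!!. That gives 3 rooted trees for three taxa, presumably the source of the .33 baseline, but 15 rooted trees (or 3 unrooted) for four taxa. The quoted figure of 9 hypotheses matches neither standard count, and the post does not say how it was obtained. The posterior function below is a hypothetical reading, not the model used in the quoted post: it assumes each shared advanced character doubles the odds of the shortest tree against all alternatives pooled, which happens to reproduce the quoted .67 and .80.

```python
from fractions import Fraction


def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ...; taken as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n_taxa: int) -> int:
    """Fully resolved rooted trees on n_taxa labelled tips: (2n - 3)!!."""
    return double_factorial(2 * n_taxa - 3)


def unrooted_trees(n_taxa: int) -> int:
    """Fully resolved unrooted trees on n_taxa labelled tips: (2n - 5)!!."""
    return double_factorial(2 * n_taxa - 5)


def posterior_doubling_odds(n_chars: int) -> Fraction:
    """Hypothetical model: posterior odds of the shortest tree against all
    other trees pooled are 2**n_chars to 1."""
    odds = Fraction(2) ** n_chars
    return odds / (odds + 1)


for n in (3, 4, 5):
    # taxa, rooted trees, unrooted trees, uniform prior on one rooted tree
    print(n, rooted_trees(n), unrooted_trees(n), Fraction(1, rooted_trees(n)))
for k in (1, 2):
    print(k, posterior_doubling_odds(k), float(posterior_doubling_odds(k)))
```

The exact Fraction type is used so that 2/3 and 4/5 can be compared without rounding.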
There are optimization problems in all fields of science, and the methods
for solving them are largely independent of the particular problem and
optimality criterion. It doesn't matter whether you want to minimize the
air resistance of a car, the total wire length in a microchip or the
potential energy of a three-dimensional protein structure. In the case of
parsimony, we try to minimize the length of a phylogenetic tree (its total
number of character-state changes). As with the problems of design and
biophysics, we can find a reasonable solution using standard approaches
such as greedy algorithms, hill climbing, simulated annealing or a
combination of these. For example, PHYLIP's DNAPARS typically uses a greedy
algorithm to propose a tree, which it then improves by hill climbing.
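A toy sketch of the greedy-then-hill-climbing strategy, under several assumptions: unordered characters scored with Fitch parsimony, trees written as nested Python tuples, stepwise addition of taxa as the greedy step, and subtree pruning and regrafting (SPR) as the hill-climbing move. This is not DNAPARS's code, whose rearrangement strategy differs in detail, and the data matrix is invented for illustration.

```python
from typing import Dict, Iterator, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def site_changes(tree: Tree, data: Dict[str, str], site: int) -> Tuple[set, int]:
    """Fitch's algorithm for one site: returns (state set, number of changes)."""
    if isinstance(tree, str):
        return {data[tree][site]}, 0
    left_set, left_n = site_changes(tree[0], data, site)
    right_set, right_n = site_changes(tree[1], data, site)
    common = left_set & right_set
    if common:
        return common, left_n + right_n
    return left_set | right_set, left_n + right_n + 1


def tree_length(tree: Tree, data: Dict[str, str]) -> int:
    """Parsimony length: changes summed over all sites."""
    n_sites = len(next(iter(data.values())))
    return sum(site_changes(tree, data, i)[1] for i in range(n_sites))


def subtrees(tree: Tree) -> Iterator[Tree]:
    """The tree itself, then every subtree below it."""
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[0])
        yield from subtrees(tree[1])


def attach_everywhere(tree: Tree, new: Tree) -> Iterator[Tree]:
    """Every tree made by attaching `new` to one branch of `tree` (or above its root)."""
    yield (tree, new)
    if isinstance(tree, tuple):
        left, right = tree
        for grafted in attach_everywhere(left, new):
            yield (grafted, right)
        for grafted in attach_everywhere(right, new):
            yield (left, grafted)


def prune(tree: Tree, target: Tree) -> Optional[Tree]:
    """Remove subtree `target`; its sibling takes the parent's place."""
    if isinstance(tree, str):
        return None
    left, right = tree
    if left == target:
        return right
    if right == target:
        return left
    pruned = prune(left, target)
    if pruned is not None:
        return (pruned, right)
    pruned = prune(right, target)
    if pruned is not None:
        return (left, pruned)
    return None


def stepwise_addition(data: Dict[str, str]) -> Tree:
    """Greedy step: add taxa one at a time, each at its cheapest position."""
    taxa = list(data)
    tree: Tree = ((taxa[0], taxa[1]), taxa[2])
    for taxon in taxa[3:]:
        tree = min(attach_everywhere(tree, taxon), key=lambda t: tree_length(t, data))
    return tree


def hill_climb(tree: Tree, data: Dict[str, str]) -> Tuple[Tree, int]:
    """Accept any SPR move that shortens the tree; stop when none does."""
    best = tree_length(tree, data)
    improved = True
    while improved:
        improved = False
        for sub in list(subtrees(tree))[1:]:  # skip the whole tree itself
            rest = prune(tree, sub)
            for candidate in attach_everywhere(rest, sub):
                score = tree_length(candidate, data)
                if score < best:
                    tree, best, improved = candidate, score, True
                    break
            if improved:
                break
    return tree, best


data = {"A": "1100", "B": "1100", "C": "0011", "D": "0011", "E": "0001"}
tree, length = hill_climb(stepwise_addition(data), data)
print(tree, length)
```

Hill climbing only ever accepts shorter trees, so it stops at a local optimum; nothing in the search itself certifies that the result is the shortest tree overall.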
It does not matter if there is more evidence against the solution than for
it. I have never heard anyone use this argument to suggest that the
optimality criteria of other fields are irrelevant. For example, it would be
unreasonable to scrap a microchip on the grounds that the probability its
wiring is optimal is just 1%. It would be even less reasonable to abandon
computerized simulation in microchip design because of this figure. All
that matters is that we have a good guess, and computer programs can help
us guess fairly well.
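To make the microchip analogy concrete, here is a minimal simulated annealing sketch in the spirit of Kirkpatrick, Gelatt & Vecchi (1983). The problem is an assumed, made-up one-dimensional placement: cells sit in a row, the wire length of each net is the distance between its outermost cells, and the cooling schedule parameters are arbitrary. Like a parsimony search, it returns a good layout with no guarantee, and no probability, that the layout is optimal.

```python
import math
import random


def wire_length(order, nets):
    """Total wire length: for each net, the span between its leftmost and rightmost cells."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)


def anneal(cells, nets, t_start=5.0, t_end=0.01, cooling=0.995, seed=0):
    """Swap pairs of cells; always accept improvements, sometimes accept worse moves."""
    rng = random.Random(seed)
    order = list(cells)
    cost = wire_length(order, nets)
    best_order, best_cost = order[:], cost
    t = t_start
    while t > t_end:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = wire_length(order, nets)
        delta = new_cost - cost
        # Metropolis rule: worse moves are accepted with probability exp(-delta / t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the rejected swap
        t *= cooling  # geometric cooling schedule
    return best_order, best_cost


chain = list(zip("ABCDEFG", "BCDEFGH"))  # hypothetical nets: A-B, B-C, ..., G-H
print(anneal(list("CHAEBGDF"), chain))
```

Accepting some uphill moves early on lets the search escape local optima that plain hill climbing would get stuck in; as the temperature falls it behaves more and more like hill climbing.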
Some parts of the solution to an optimization problem may be better than
others. This is certainly true of phylogeny: few phylogeneticists would
claim to have EVER published a cladogram that is wholly correct. But few
would claim (even privately) to have ever published one that is wholly
incorrect, either. The hope is that a cladogram is reasonable overall, and
beyond doubt correct in some places. Measures like the consistency index
(CI) and retention index (RI) tell us the overall quality of the tree: if,
after minimizing homoplasy, there is still a lot of homoplasy, we should not
place faith in the reconstruction. We might decide to find some data that
better match the optimality criterion (e.g. characters with a lower rate of
evolution), or change the optimality criterion to better match our data
(e.g. correct for multiple hits). Apart from such measures of overall tree
quality, the quality of specific clades in a tree can be checked using the
bootstrap. If a clade has high bootstrap support, we believe in it, even if
the rest of the tree is rather dire.
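The CI and RI are simple to compute once the number of changes on a tree is known. This sketch assumes unordered characters counted with Fitch's algorithm and uses the ensemble (whole-matrix) forms CI = M / S and RI = (G - S) / (G - M), where S is the observed number of changes, M the minimum possible (number of states minus one, per character) and G the maximum possible (changes needed on an unresolved star tree). The four-taxon matrix is invented.

```python
from collections import Counter


def fitch_changes(tree, states):
    """Fitch change count for one character; states maps taxon -> state."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (left_set, left_n), (right_set, right_n) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_n + right_n
        return left_set | right_set, left_n + right_n + 1
    return visit(tree)[1]


def ci_ri(tree, matrix):
    """Ensemble consistency index and retention index of `tree` for `matrix`."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    m_total = g_total = s_total = 0
    for i in range(n_chars):
        states = {t: matrix[t][i] for t in taxa}
        counts = Counter(states.values())
        m_total += len(counts) - 1                        # minimum possible changes
        g_total += len(taxa) - max(counts.values())       # changes on a star tree
        s_total += fitch_changes(tree, states)            # observed changes
    ci = m_total / s_total if s_total else 1.0
    ri = (g_total - s_total) / (g_total - m_total) if g_total != m_total else 1.0
    return ci, ri


matrix = {"A": "1100", "B": "1100", "C": "0011", "D": "0011"}
print(ci_ri((("A", "B"), ("C", "D")), matrix))  # -> (1.0, 1.0): no homoplasy
print(ci_ri((("A", "C"), ("B", "D")), matrix))  # -> (0.5, 0.0): every character conflicts
```

Characters that vary in only one taxon have G = M, so they add nothing to either side of the RI, whereas they inflate the CI; this is one reason both indices are usually reported.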
> [cut to keep my reply fairly short ...]
> Phylogeneticists of any stripe are getting away with maybe one paper
> in ten having a good chance of a correct result, and which paper that is
> is unknown. (The intuited regularity assumptions of Bayesian analysis
> are another problem which I won't even bring up...) Simplicity is useful
This frequentist (not Bayesian) attitude to confidence is fine for simple
hypothesis testing, for example testing the effect of treatments on
disease, or fertilizer on yield. In such cases we want to know the answers
to basic questions like 'Does this work? Is it worthwhile?' But trees are
not hypotheses that are to be rejected or accepted with absolute
certainty. The probability indicates how much we should believe the tree
overall, and is in this respect similar to the RI and CI.
> in other sciences where the "least wrong" answer(s) out of many can be
> immediately checked for correctness, but we have no such tests in
> systematics.
I think, with the CI, RI, tree probability, bootstrap and at least half a
dozen other statistics, parsimony has more immediate checks for
correctness than most optimization problems. We do have such tests, and we
have so many of them that few people can keep up with developments and
understand them all.
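As an illustration of one such check, a minimal bootstrap sketch. It assumes exactly four taxa named A to D, so that all three unrooted topologies can be scored exhaustively with Fitch parsimony; real programs rebuild each replicate tree with a heuristic search instead. Characters are resampled with replacement, and when topologies tie, that replicate's vote is split equally among them. The seed, the number of replicates and the example matrix are arbitrary.

```python
import random

# The three unrooted topologies for taxa A-D, written as rooted tuples.
QUARTETS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def site_length(tree, matrix, i):
    """Fitch changes for character i on a tree of nested tuples."""
    def visit(node):
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        (left_set, left_n), (right_set, right_n) = visit(node[0]), visit(node[1])
        if left_set & right_set:
            return left_set & right_set, left_n + right_n
        return left_set | right_set, left_n + right_n + 1
    return visit(tree)[1]


def bootstrap_support(matrix, replicates=100, seed=1):
    """Fraction of bootstrap replicates in which each quartet is most parsimonious."""
    rng = random.Random(seed)
    n_sites = len(next(iter(matrix.values())))
    support = dict.fromkeys(QUARTETS, 0.0)
    for _ in range(replicates):
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]  # resample characters
        lengths = {name: sum(site_length(tree, matrix, i) for i in columns)
                   for name, tree in QUARTETS.items()}
        best = min(lengths.values())
        winners = [name for name, length in lengths.items() if length == best]
        for name in winners:
            support[name] += 1 / len(winners)  # split the vote between tied trees
    return {name: count / replicates for name, count in support.items()}


print(bootstrap_support({"A": "11100", "B": "11010", "C": "00110", "D": "00001"}))
```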
> This will be abundantly evident someday as maximum likelihood analyses
> become more prevalent and posterior probabilities are published, and
> then it's back to monography for systematists.
Maybe back to monography, but certainly not for the reasons you give.
Further reading:
Buck, B. & Macaulay, V. A. (eds) 1991. Maximum entropy in action. Oxford &
New York: Clarendon Press.
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. 1983. Optimization by
simulated annealing. Science 220(4598): 671-680.
>
> --
>
> *******************************************************
> Richard H. Zander, Buffalo Museum of Science
> 1020 Humboldt Pkwy, Buffalo, NY 14211 USA bryo at commtech.net
> *******************************************************
>
Daniel Barker,
Institute of Cell and Molecular Biology,
University of Edinburgh,
Daniel Rutherford Building,
King's Buildings,
Mayfield Road,
Edinburgh
EH9 3JH
UK
http://www.icmb.ed.ac.uk/sokal.html
More information about the Taxacom mailing list