Status of plant systematics

Richard Zander bryo at COMMTECH.NET
Sun Sep 7 13:40:06 CDT 1997

Daniel Barker wrote:
> There are optimization problems in all fields of science, and the ways to
> solve them are basically independent of the problem and optimality
> criterion. It doesn't matter whether you want to minimize air resistance
> of a car, minimize total wire length in a microchip or minimize potential
> energy in 3 dimensional protein structure. In the case of parsimony, we
> try and minimize the length of a phylogenetic tree. As with the problems
> of design and biophysics, we can find a reasonable solution using standard
> approaches such as greedy algorithms, hill climbing, simulated annealing
> or a combination of any of these. For example, PHYLIP's DNAPARS typically
> uses a greedy algorithm to propose a tree, which it then optimizes using
> hill climbing.
> It does not matter if there is more evidence against the solution than for
> it. I have never heard anyone use this reason to suggest that other
> optimality criteria are irrelevant. For example, it would be unreasonable
> to scrap a microchip on the grounds that the probability its use of wire
> is optimal is just 1%. It would be even less reasonable to abandon
> computerized simulation in microchip design because of this figure. All
> that matters is that we have a good guess, and computer programs can
> help us guess fairly well.

Again: Occam's Razor and parsimony work fine in producing solutions that
are best of many, though of low probability in themselves. In other
sciences, these "best guesses" can be checked immediately, but there is
no such provision in systematics. Low probability solutions remain just
that, low probability.
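
As an aside on the quoted point about minimizing tree length: the quantity in question can be made concrete with a small sketch. The Python below counts the minimum number of state changes for each character on a fixed rooted binary tree using Fitch's method, then sums over characters. The nested-tuple tree encoding, the function names, and the toy matrix are assumptions made for illustration; this is not a description of how DNAPARS or any other program is implemented.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    tree   -- a taxon name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping taxon name to that taxon's state for this character
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can share a state: no extra change needed at this node.
        return common, left_steps + right_steps
    # Children disagree: one change is required somewhere below this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum Fitch steps over all characters (matrix: taxon -> state string)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
        for i in range(n_chars)
    )


# Toy data (hypothetical): four taxa, two binary characters.
matrix = {"A": "00", "B": "01", "C": "11", "D": "11"}
print(tree_length((("A", "B"), ("C", "D")), matrix))  # 2
print(tree_length((("A", "C"), ("B", "D")), matrix))  # 3
```

A search heuristic such as hill climbing would rearrange the tree and keep any rearrangement that lowers this number; the sketch only scores a given tree and says nothing about the probability that the shortest tree is the true one.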

> Some parts of the solution to an optimization problem may be better than
> others. This is certainly true of phylogeny: few phylogeneticists would
> claim to have EVER published a cladogram that is wholly correct. But few
> would claim (even privately) to have ever published one that is wholly
> incorrect, either. The hope is that a cladogram is reasonable overall, and
> beyond doubt correct in some places.  Measures like the CI and RI tell us
> the overall quality of the tree: if, after minimizing homoplasy, there is
> still a lot of homoplasy, we should not place faith in the reconstruction.

To be partly correct in some acceptable, reasonable way, and to count as
an advance in knowledge, the cladogram has to be, gee, I'd expect, more
than half right. The several likeliest cladograms would have to agree in
more than half their topologies, AND the posterior probabilities that
they are correct must add to >.5. Demonstrate this in any published
phylogenetic analysis of more than a very few terminal taxa, and...well,
I'll have other complaints, I'm sure.
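
The two-part test just described can be sketched in a few lines of Python. Everything here is a hedged illustration: the function names, the representation of a tree as a set of clades, and the particular way of measuring "agreement in more than half their topologies" (clades shared by every tree, divided by the clade count of the largest tree) are assumptions, not an established procedure.

```python
def smallest_set_over(posteriors, threshold=0.5):
    """Take trees in order of decreasing posterior until the sum exceeds threshold.

    posteriors -- dict mapping a tree label to its posterior probability
    Returns (labels, total), or (None, total) if the threshold is never exceeded.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for label, prob in ranked:
        chosen.append(label)
        total += prob
        if total > threshold:
            return chosen, total
    return None, total


def shared_clade_fraction(trees):
    """Fraction of clades found in every tree (assumed agreement measure).

    trees -- list of sets; each set holds the clades of one tree as frozensets
    """
    shared = set.intersection(*trees)
    return len(shared) / max(len(tree) for tree in trees)


def passes_test(posteriors, clades_by_label):
    """True only if both conditions hold for the likeliest trees."""
    chosen, _total = smallest_set_over(posteriors)
    if chosen is None:
        return False
    return shared_clade_fraction([clades_by_label[c] for c in chosen]) > 0.5
```

With hypothetical posteriors of 0.30, 0.25, and 0.10 for three trees, the first two trees are selected because together they exceed 0.5; whether the test passes then depends on how many clades those two trees share.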

> We might decide to find some data that better matches the optimality
> criterion (i.e. characters with a lower rate of evolution), or change the
> optimality criterion to better match our data (i.e. correct for multiple
> hits). Anyway, apart from such measures of overall tree quality, quality
> of specific clades in a tree can be checked using the bootstrap.  If a
> clade has high bootstrap support, we believe in it, even if the rest of
> the tree is rather dire.

Not me. The bootstrap requires the potential of infinite information. No
such thing is possible. Sanderson has published pointed objections to
the bootstrap.
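
For readers who have not seen the procedure under dispute, here is a minimal Python sketch of the nonparametric bootstrap as it is usually applied to a character matrix: characters (columns) are resampled with replacement, a tree is estimated from each pseudoreplicate, and support for a clade is the fraction of replicates in which it appears. The function names and the `find_clades` callback, which stands in for whatever tree search is used, are hypothetical.

```python
import random


def bootstrap_matrix(matrix, rng):
    """Resample characters (columns) with replacement; taxa are unchanged.

    matrix -- dict mapping taxon name to a string of character states
    rng    -- a random.Random instance, so runs can be reproduced
    """
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {
        taxon: "".join(states[c] for c in columns)
        for taxon, states in matrix.items()
    }


def clade_support(matrix, clade, find_clades, replicates=100, seed=0):
    """Fraction of pseudoreplicates whose estimated tree contains `clade`.

    find_clades -- hypothetical callback: takes a matrix and returns the set
                   of clades (frozensets of taxon names) in the estimated tree
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        if clade in find_clades(bootstrap_matrix(matrix, rng)):
            hits += 1
    return hits / replicates
```

Note that every pseudoreplicate is drawn from the same observed characters, so the support value describes how consistently those characters recover a clade, not an independent check against new data.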

> > [cut to keep my reply fairly short ...]
> > Phylogeneticists of any stripe are getting away with maybe one paper
> > in ten having a good chance of a correct result, and which paper that is
> > is unknown. (The intuited regularity assumptions of Bayesian analysis
> > are another problem which I won't even bring up...) Simplicity is useful
> This frequentist (not Bayesian) attitude to confidence is fine for simple
> hypothesis testing, for example testing the effect of treatments on
> disease, or fertilizer on yield. In such cases we want to know the answers
> to basic questions like 'Does this work? Is it worthwhile?' But trees are
> not hypotheses that are to be rejected or accepted with absolute
> certainty. The probability indicates how much we should believe the tree
> overall, and is in this respect similar to the RI and CI.

No, no. I require no absolute certainty. A chance of a cladogram being
right 1 in 10 times means, in the frequentist sense, that if the data
set occurred many times, 1 in 10 of the true trees would be the same as
the max parsimony or max likelihood solution. You don't know which, and
will never know.
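
The arithmetic behind the one-in-ten figure can be sketched as follows, under the added assumption (not part of the argument above) that papers are right or wrong independently and with the same chance. The function names are illustrative only.

```python
def expected_correct(p_correct, n_papers):
    """Expected number of papers with the correct tree."""
    return p_correct * n_papers


def prob_none_correct(p_correct, n_papers):
    """Chance that every one of n independent papers is wrong."""
    return (1.0 - p_correct) ** n_papers


# With a 1-in-10 chance per paper and ten papers:
print(expected_correct(0.1, 10))             # 1.0
print(round(prob_none_correct(0.1, 10), 3))  # 0.349
```

So even under these generous assumptions, roughly a third of the time a batch of ten such papers would contain no correct tree at all, and nothing in the calculation identifies which paper, if any, is the right one.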

> > in other sciences where the "least wrong" answer(s) out of many can be
> > immediately checked for correctness, but we have no such tests in
> > systematics.
> I think, with the CI, RI, tree probability, bootstrap and at least half a
> dozen other statistics, parsimony has more immediate checks for
> correctness than most optimization problems. We do have such tests, and we
> have so many of them that few people can keep up with developments and
> understand them all.

Nope. Farris pointed out long ago that synapomorphies imply homoplasy
(just one more step and a synapomorphy deconstructs into homoplasy).
These tests you mention generally only test for the presence of
information put there already by systematists. The data on the terminal
taxa are data on mental constructs, which reflect apprehensions of
nature "out there." In addition to being necessarily one step removed
from reality, these assume that the shortest tree is ALREADY the best
solution, and largely just compare shortest trees or potentially
shortest trees, whereas the true tree is most probably (almost
certainly) one of the longer trees.
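
For reference, two of the statistics at issue, the ensemble consistency index (CI) and retention index (RI), are simple ratios of step counts. The Python sketch below uses the standard definitions, CI = M/S and RI = (G - S)/(G - M), where M, G, and S are the minimum possible, maximum possible, and observed numbers of steps summed over characters. The function names and the toy numbers are assumptions for illustration.

```python
def consistency_index(min_steps, obs_steps):
    """Ensemble CI: total minimum possible steps over total observed steps."""
    return sum(min_steps) / sum(obs_steps)


def retention_index(min_steps, max_steps, obs_steps):
    """Ensemble RI: (G - S) / (G - M), summed over characters."""
    m, g, s = sum(min_steps), sum(max_steps), sum(obs_steps)
    if g == m:
        # No character can show more or fewer steps on any tree.
        raise ValueError("RI is undefined when max and min steps are equal")
    return (g - s) / (g - m)


# Toy step counts (hypothetical) for three characters on one tree.
min_steps = [1, 1, 1]
max_steps = [3, 2, 2]
obs_steps = [1, 2, 2]
print(consistency_index(min_steps, obs_steps))           # 0.6
print(retention_index(min_steps, max_steps, obs_steps))  # 0.5
```

Both values are computed from the same character matrix and the same tree, which is the sense in which they measure fit of the data to the tree and not correspondence with an independently known history.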
> >   This will be abundantly evident someday as maximum likelihood analyses
> > become more prevalent and posterior probabilities are published, and
> > then it's back to monography for systematists.
> Maybe back to monography, but certainly not for the reasons you give.

Exactly for the reasons I give.

> Further reading:
> Buck, B. & Macaulay, V. A. (eds) 1991. Maximum entropy in action (Oxford &
> New York: Clarendon Press)
> Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. 1983.  Optimization by
> simulated annealing. Science 220(4598): 671-680
> >
> > --
> >
> > *******************************************************
> > Richard H. Zander, Buffalo Museum of Science
> > 1020 Humboldt Pkwy, Buffalo, NY 14211 USA bryo at
> > *******************************************************
> >
> Daniel Barker,
> Institute of Cell and Molecular Biology,
> University of Edinburgh,
> Daniel Rutherford Building,
> King's Buildings,
> Mayfield Road,
> Edinburgh
> EH9 3JH
> UK


Richard H. Zander, Buffalo Museum of Science
1020 Humboldt Pkwy, Buffalo, NY 14211 USA bryo at
