Status of plant systematics
James Francis Lyons-Weiler
weiler at ERS.UNR.EDU
Sat Sep 6 14:06:16 CDT 1997
On Sat, 6 Sep 1997, Daniel Barker wrote (in response to R. Zander):
> There are optimization problems in all fields of science, and the ways to
> solve them are basically independent of the problem and optimality
> criterion. It doesn't matter whether you want to minimize air resistance
> of a car, minimize total wire length in a microchip or minimize potential
> energy in 3 dimensional protein structure. In the case of parsimony, we
> try and minimize the length of a phylogenetic tree. As with the problems
> of design and biophysics, we can find a reasonable solution using standard
> approaches such as greedy algorithms, hill climbing, simulated annealing
> or a combination of any of these. For example, typically Phylip's DNAPARS
> uses a greedy algorithm to propose a tree, which it then optimizes using
> hill climbing.
This is all well and good, but you are referring to ways to
optimize the process of fulfilling a criterion; i.e., there is
a difference between an algorithm and a criterion.
The alternatives you list are ways to go about meeting the
criterion which is in question. Following your analogy,
there may be many ways to increase the speed of a car, but
the question is whether speed is what needs to be
optimized.
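
To make the algorithm/criterion distinction concrete, here is a
minimal Python sketch. Everything in it is illustrative and not
taken from any program named in this thread: the four taxa, the
toy 0/1 matrix, and the function names are assumptions. The
criterion is a function that scores a tree (here, Fitch parsimony
length); the algorithm is whatever procedure looks for the tree
with the best score (here, plain exhaustive search).

```python
def fitch_steps(tree, states):
    """Fitch pass for one character: return (state set, steps) at this node."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    if left_set & right_set:  # the two subtrees can agree: no extra step
        return left_set & right_set, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_length(tree, matrix):
    """The CRITERION: total number of steps the tree needs for the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
        for i in range(n_chars)
    )


def exhaustive_search(trees, matrix, criterion):
    """The ALGORITHM: try every candidate and keep the lowest score."""
    return min(trees, key=lambda tree: criterion(tree, matrix))


# Hypothetical toy data: 4 taxa, 4 binary characters.
matrix = {"A": "1111", "B": "1100", "C": "0010", "D": "0000"}
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
best = exhaustive_search(trees, matrix, parsimony_length)
```

Swapping exhaustive_search for hill climbing changes only how
quickly or reliably the minimum is found. Swapping
parsimony_length for another scoring function changes what is
being asked of the data, and that is the part under dispute.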
>
> It does not matter if there is more evidence against the solution than for
> it. I have never heard anyone use this reason to suggest that other
> optimality criteria are irrelevant. For example, it would be unreasonable
> to scrap a microchip on the grounds that the probability its use of wire
> is optimal is just 1%. It would be even less reasonable to abandon
> computerized simulation in microchip design because of this figure. All
> that matters is that we have a good guess, and computer programs can
> help us guess fairly well.
But this is exactly what the positivists who push parsimony
rely upon most. Again, to take the analogy route, it would
be unreasonable to scrap a monophyletic group on the
grounds that it does not exist on a tree that is one step
shorter than 10,000 trees where it does exist - and such
a conclusion uncovers a shortcoming of the criterion (it
weighs evidence by tree length alone). Parsimony has been to
some degree generalized with the decay index (a.k.a. Bremer
support); PAUP users can go to Joe Felsenstein's web page and
follow the link to AutoDecay, written by Torsten Eriksson
and Niklas Wikström.
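
As a rough illustration of what a decay index reports, here is a
small Python sketch. It is not how AutoDecay or PAUP compute it:
the function name, the signed convention (negative when the group
is absent from the shortest trees), and the tree lengths are all
assumptions made for the example. It takes a list of already
scored trees and compares the shortest tree without a group to
the shortest tree with it.

```python
def bremer_support(clade, scored_trees):
    """Signed decay index for a clade.

    scored_trees is a list of (length, clades) pairs, where clades is a
    set of frozensets of taxon names found on that tree.
    Result = (shortest length without the clade) - (shortest length with it).
    """
    with_clade = [length for length, clades in scored_trees if clade in clades]
    without_clade = [length for length, clades in scored_trees if clade not in clades]
    if not with_clade or not without_clade:
        raise ValueError("need at least one tree with and one without the clade")
    return min(without_clade) - min(with_clade)


# Hypothetical lengths: one shortest tree (100 steps) with group AB,
# and several trees one step longer that contain group CD instead.
AB = frozenset("AB")
CD = frozenset("CD")
scored_trees = [(100, {AB})] + [(101, {CD})] * 3

support_ab = bremer_support(AB, scored_trees)  # group on the shortest tree
support_cd = bremer_support(CD, scored_trees)  # group only on longer trees
```

Under these assumptions the group on the shortest tree scores +1
and the group found only on the longer trees scores -1. A single
step separates the two groups, which the shortest tree on its own
does not show. The number of longer trees containing the group
does not enter the index.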
>
> Some parts of the solution to an optimization problem may be better than
> others. This is certainly true of phylogeny: few phylogeneticists would
> claim to have EVER published a cladogram that is wholly correct. But few
> would claim (even privately) to have ever published one that is wholly
> incorrect, either. The hope is that a cladogram is reasonable overall, and
> beyond doubt correct in some places. Measures like the CI and RI tell us
> the overall quality of the tree: if, after minimizing homoplasy, there is
> still a lot of homoplasy, we should not place faith in the reconstruction.
> We might decide to find some data that better matches the optimality
> criterion (i.e. characters with a lower rate of evolution), or change the
> optimality criterion to better match our data (i.e. correct for multiple
> hits). Anyway, apart from such measures of overall tree quality, quality
> of specific clades in a tree can be checked using the bootstrap. If a
> clade has high bootstrap support, we believe in it, even if the rest of
> the tree is rather dire.
Again, the question of suitability of a measure is cogent -
the homoplasy we measure through CI, HI, and any tree-
based measure of information is dreadfully confounded
when errors exist in the optimal tree - evolutionary homoplasy,
i.e., true homoplasy, can give us the wrong tree, and then
what do these measures mean? They may under these conditions
underestimate the level of homoplasy. I heartily recommend two
chapters in the book Homoplasy (eds. Sanderson and Hufford, 1996):
one by Archie ("Measures of homoplasy"), the second by
Chang and Kim ("The measurement of homoplasy: a stochastic view").
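
The dependence of these indices on the tree can be seen in a
short Python sketch. It uses the usual ensemble definitions for
unordered characters, CI = M/S, HI = 1 - CI, RI = (G - S)/(G - M),
where S is the number of steps observed on the tree, M the
minimum conceivable and G the maximum conceivable. The toy
columns and the two step counts are hypothetical.

```python
from collections import Counter


def homoplasy_indices(columns, observed_steps):
    """Return (CI, HI, RI) for unordered characters.

    columns: one string of states per character, one state per taxon.
    observed_steps: tree length S on the tree being evaluated.
    """
    # M: a character with k states needs at least k - 1 steps.
    m = sum(len(set(col)) - 1 for col in columns)
    # G: at most (number of taxa - count of the commonest state) steps.
    g = sum(len(col) - max(Counter(col).values()) for col in columns)
    ci = m / observed_steps
    hi = 1 - ci
    ri = (g - observed_steps) / (g - m) if g != m else None  # undefined if G == M
    return ci, hi, ri


# Hypothetical data: four binary characters scored for four taxa.
columns = ["1100", "1100", "1010", "1000"]

# Same data, two trees: the shortest tree (5 steps) and a tree one
# step longer, standing in for a true tree that is not the shortest.
on_shortest = homoplasy_indices(columns, observed_steps=5)
on_longer = homoplasy_indices(columns, observed_steps=6)
```

Because the most parsimonious tree has the smallest S by
construction, the HI computed on it is the lowest HI these data
can give on any tree. If the true tree is longer, the reported
figure understates the homoplasy.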
>
> I think, with the CI, RI, tree probability, bootstrap and at least half a
> dozen other statistics, parsimony has more immediate checks for
> correctness than most optimization problems. We do have such tests, and we
> have so many of them that few people can keep up with developments and
> understand them all.
I try, and I agree, it can be overwhelming.
See above comments on CI and HI. The homoplasy excess ratio
comes closest to being free of the confounding, in my humble
estimation. On the bootstrap, re-sampling a character state
matrix that has noise, and say a long branch, might lead to
just this result - and the group with the highest
bs value may be artefactual in every parsimony tree found
during resampling owing to branch attraction. The bs does not
take care of this and other problems. It is, again in my humble
estimation, a method that provides some indication of how decided
the data are about the result we would "likely" get with
exact enumeration; i.e., it reflects more about the tree that
the data appear to support, and as such is more of an indicator
of how well the bs resampling approximates the tree(s) that the
data support, i.e., goodness of fit to the specified criterion,
namely parsimony. This is not the same as saying it is
a good estimate of the phylogeny; for that, the robustness of
parsimony must again be scrutinized.
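
A minimal Python sketch of the resampling shows why. This is an
illustration only, not the procedure of any particular program:
the four taxa, the three candidate trees, the fractional handling
of ties, and the all-congruent toy matrix are assumptions. Each
replicate draws characters with replacement, re-applies the same
criterion, and credits whichever tree is shortest.

```python
import random

# The three resolved topologies for four hypothetical taxa.
TREES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_steps(tree, states):
    """Fitch pass for one character: return (state set, steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    if left_set & right_set:
        return left_set & right_set, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix, columns):
    """Parsimony length over the given (possibly repeated) column indices."""
    return sum(
        fitch_steps(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
        for i in columns
    )


def bootstrap_support(matrix, replicates=200, seed=0):
    """Proportion of replicates in which each tree is (jointly) shortest."""
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    support = {name: 0.0 for name in TREES}
    for _ in range(replicates):
        # Resample characters (columns) with replacement.
        columns = [rng.randrange(n_chars) for _ in range(n_chars)]
        lengths = {name: tree_length(t, matrix, columns) for name, t in TREES.items()}
        shortest = min(lengths.values())
        winners = [name for name, length in lengths.items() if length == shortest]
        for name in winners:  # ties share the credit equally
            support[name] += 1.0 / len(winners)
    return {name: count / replicates for name, count in support.items()}


# Hypothetical matrix in which every character groups A with B.
matrix = {"A": "111111", "B": "111111", "C": "000000", "D": "000000"}
support = bootstrap_support(matrix)
```

Nothing in the loop refers to the true phylogeny. If every
character groups A with B, that group is recovered in every
replicate whether the shared states reflect common ancestry or
parallel change on two long branches.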
James Lyons-Weiler