Status of plant systematics

James Francis Lyons-Weiler weiler at ERS.UNR.EDU
Sat Sep 6 14:06:16 CDT 1997


On Sat, 6 Sep 1997, Daniel Barker wrote (in response to R. Zander):

> There are optimization problems in all fields of science, and the ways to
> solve them are basically independent of the problem and optimality
> criterion. It doesn't matter whether you want to minimize air resistance
> of a car, minimize total wire length in a microchip or minimize potential
> energy in 3 dimensional protein structure. In the case of parsimony, we
> try and minimize the length of a phylogenetic tree. As with the problems
> of design and biophysics, we can find a reasonable solution using standard
> approaches such as greedy algorithms, hill climbing, simulated annealing
> or a combination of any of these. For example, typically Phylip's DNAPARS
> uses a greedy algorithm to propose a tree, which it then optimizes using
> hill climbing.

        This is all well and good, but you are referring to ways to
        optimize the process of satisfying a criterion; i.e., there is
        a difference between an algorithm and a criterion.
        The alternatives you list are ways to go about meeting the
        criterion, and it is the criterion that is in question.
        Following your analogy, there may be many ways to increase
        the speed of a car, but the question is whether speed is
        what needs to be optimized.
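
To make the algorithm/criterion distinction concrete, here is a minimal sketch of the parsimony criterion on its own: Fitch's counting of state changes for unordered characters on a fixed tree. The function names, the nested-tuple tree format and the toy matrix are assumptions made for illustration; they are not taken from Phylip or PAUP.

```python
def fitch_length(tree, states):
    """Minimum number of state changes for one unordered character.

    tree   -- nested 2-tuples with taxon names (str) at the tips
    states -- dict mapping taxon name -> observed state
    """
    def visit(node):
        if isinstance(node, str):            # tip: its own state, no cost
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:                           # children agree: no extra step
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Sum of Fitch lengths over all characters (the parsimony score)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical toy data: three binary characters for four taxa.
matrix = {"A": "110", "B": "110", "C": "001", "D": "001"}
ab_cd = (("A", "B"), ("C", "D"))
ac_bd = (("A", "C"), ("B", "D"))

print(tree_length(ab_cd, matrix))  # 3
print(tree_length(ac_bd, matrix))  # 6
```

Greedy addition, hill climbing or simulated annealing would each call something like `tree_length` to compare candidate trees. Swapping the search method leaves the criterion untouched, and the criterion is what is being questioned here.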
>
> It does not matter if there is more evidence against the solution than for
> it. I have never heard anyone use this reason to suggest that other
> optimality criteria are irrelevant. For example, it would be unreasonable
> to scrap a microchip on the grounds that the probability its use of wire
> is optimal is just 1%. It would be even less reasonable to abandon
> computerized simulation in microchip design because of this figure. All
> that matters is that we have a good guess, and computer programs can
> help us guess fairly well.

        But this is exactly what the positivists who push parsimony
        rely upon most.  Again, to take the analogy route, it would
        be unreasonable to scrap a monophyletic group on the
        grounds that it does not exist on a tree that is one step
        shorter than 10,000 trees where it does exist - and such
        a conclusion uncovers a shortcoming of the criterion (more
        evidence in tree length alone).  Parsimony has been to some
        degree generalized with the decay index (a.k.a. Bremer
        support); PAUP users can go to Joe Felsenstein's web page and
        follow the link to Autodecay, written by Torsten Eriksson
        and Niklas Wikström.
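
As a rough illustration of the decay index, the sketch below computes a signed variant: the length of the shortest tree lacking a group minus the length of the shortest tree containing it. The function name, the input format (a list of tree lengths paired with the groups each tree contains) and the numbers are all hypothetical. In practice the two lengths come from separate tree searches rather than from a hand-written list.

```python
def decay_index(trees, group):
    """Signed decay index (Bremer support) for one group.

    trees -- iterable of (length, groups) pairs, where groups is a set of
             frozensets of taxon names found on that tree
    group -- iterable of taxon names defining the group of interest

    Positive: the group is on the shortest tree(s) and survives that many
    extra steps.  Negative: the group first appears that many steps above
    the minimum length.
    """
    group = frozenset(group)
    with_group = [length for length, groups in trees if group in groups]
    without_group = [length for length, groups in trees if group not in groups]
    if not with_group or not without_group:
        raise ValueError("need at least one tree with and one without the group")
    return min(without_group) - min(with_group)


# Hypothetical lengths echoing the example above: the shortest tree (100
# steps) lacks group AB, while trees one step longer contain it.
trees = [
    (100, {frozenset("CD")}),
    (101, {frozenset("AB")}),
    (101, {frozenset("AB"), frozenset("CD")}),
]

print(decay_index(trees, "AB"))  # -1
print(decay_index(trees, "CD"))  # 1
```

A value of -1 says the group is only one step away from the optimum, information that is lost when only the shortest tree is reported.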
>
> Some parts of the solution to an optimization problem may be better than
> others. This is certainly true of phylogeny: few phylogeneticists would
> claim to have EVER published a cladogram that is wholly correct. But few
> would claim (even privately) to have ever published one that is wholly
> incorrect, either. The hope is that a cladogram is reasonable overall, and
> beyond doubt correct in some places.  Measures like the CI and RI tell us
> the overall quality of the tree: if, after minimizing homoplasy, there is
> still a lot of homoplasy, we should not place faith in the reconstruction.
> We might decide to find some data that better matches the optimality
> criterion (i.e. characters with a lower rate of evolution), or change the
> optimality criterion to better match our data (i.e. correct for multiple
> hits). Anyway, apart from such measures of overall tree quality, quality
> of specific clades in a tree can be checked using the bootstrap.  If a
> clade has high bootstrap support, we believe in it, even if the rest of
> the tree is rather dire.

        Again, the question of suitability of a measure is cogent -
        the homoplasy we measure through CI, HI, and any tree-
        based measure of information is dreadfully confounded
        when errors exist in the optimal tree - evolutionary homoplasy,
        i.e., true homoplasy, can give us the wrong tree, and then
        what do these measures mean?  They may under these conditions
        underestimate the level of homoplasy.  I heartily recommend two
        chapters in the book Homoplasy (eds. Sanderson and Hufford,
        1996): one by Archie ("Measures of homoplasy"), the second by
        Chang and Kim ("The measurement of homoplasy: a stochastic
        view").
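
For reference, a minimal sketch of how the ensemble CI, HI and RI are usually computed for unordered characters: CI = M/S, HI = 1 - CI and RI = (G - S)/(G - M), where M is the minimum conceivable number of steps, G the maximum on any tree, and S the number of steps observed on the tree being evaluated. The function name and the toy columns are assumptions for illustration.

```python
from collections import Counter


def homoplasy_indices(columns, observed_steps):
    """Return (CI, HI, RI) for unordered characters.

    columns        -- one string per character, one state per taxon
    observed_steps -- steps each character takes on the tree being evaluated
    """
    min_steps = max_steps = 0
    for column in columns:
        counts = Counter(column)
        min_steps += len(counts) - 1                    # M: states minus one
        max_steps += len(column) - max(counts.values()) # G: worst case on any tree
    steps = sum(observed_steps)                         # S: depends on the tree
    if steps == 0 or max_steps == min_steps:
        raise ValueError("indices are undefined for these characters")
    ci = min_steps / steps
    ri = (max_steps - steps) / (max_steps - min_steps)
    return ci, 1 - ci, ri


# Hypothetical columns for four taxa (states listed in the order A, B, C, D).
columns = ["1100", "1100", "0011"]

print(homoplasy_indices(columns, [1, 1, 1]))  # (1.0, 0.0, 1.0)
print(homoplasy_indices(columns, [2, 2, 2]))  # (0.5, 0.5, 0.0)
```

Only S depends on the tree. The same characters give CI = 1.0 on one tree and CI = 0.5 on another, so when the optimal tree is itself wrong, the indices describe fit to that tree and not the homoplasy that actually occurred.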
>
> I think, with the CI, RI, tree probability, bootstrap and at least half a
> dozen other statistics, parsimony has more immediate checks for
> correctness than most optimization problems. We do have such tests, and we
> have so many of them that few people can keep up with developments and
> understand them all.

        I try, and I agree, it can be overwhelming.

        See above comments on CI and HI.  The homoplasy excess ratio
        comes closest to being free of the confounding, in my humble
        estimation.  On the bootstrap, re-sampling a character state
        matrix that has noise, and say a long branch, might lead to
        just this result - and the group with the highest
        bs value may be artefactual in every parsimony tree found
        during resampling owing to long-branch attraction.  The bs does
        not take care of this and other problems.  It is, again in my
        humble estimation, a method that provides some indication of
        how decided the data are about the result we would "likely"
        get with exact enumeration.  That is, it reflects the tree
        that the data appear to support, and as such is more of an
        indicator of how well the bs resampling approximates the
        tree(s) that the data support, i.e., goodness of fit to the
        specified criterion, namely parsimony.  This is not the same
        as saying it is a good estimate of the phylogeny; for that,
        the robustness of parsimony must again be scrutinized.
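
The point about the bootstrap can be sketched with a toy character bootstrap under parsimony. Everything here is an assumption for illustration: the function names, the nested-tuple trees, the four-taxon matrix, and the shortcut of scoring all three possible topologies instead of running a heuristic search. A replicate counts for a topology only when that topology is uniquely shortest.

```python
import random


def fitch(tree, states):
    """Return (state set, steps) for one unordered character on a tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (
        fitch(tree[0], states), fitch(tree[1], states))
    if left_set & right_set:
        return left_set & right_set, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, matrix, columns):
    """Parsimony length of a tree over the given (resampled) column indices."""
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in columns
    )


def bootstrap_support(matrix, trees, focal, replicates=200, seed=0):
    """Fraction of replicates in which `focal` is the uniquely shortest tree."""
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    hits = 0
    for _ in range(replicates):
        # Resample characters (columns) with replacement.
        columns = [rng.randrange(n_chars) for _ in range(n_chars)]
        lengths = {name: tree_length(tree, matrix, columns)
                   for name, tree in trees.items()}
        best = min(lengths.values())
        if [name for name, n in lengths.items() if n == best] == [focal]:
            hits += 1
    return hits / replicates


trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
# Hypothetical matrix in which every character groups A with B.
matrix = {"A": "1110", "B": "1110", "C": "0001", "D": "0001"}

print(bootstrap_support(matrix, trees, "AB|CD"))  # 1.0
```

The support is 100% because every resampled matrix fits AB|CD best under parsimony. The number would be identical if the states shared by A and B were convergent changes on two long branches, which is the sense in which the bootstrap measures fit to the criterion and not accuracy of the phylogeny.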

        James Lyons-Weiler



