[Fwd: Re: Probabilities on Phylogenetic Trees]

James Francis Lyons-Weiler weiler at ERS.UNR.EDU
Thu Sep 18 09:16:24 CDT 1997

On Thu, 18 Sep 1997, Tom DiBenedetto wrote:

>  James Francis Lyons-Weiler wrote:
> >        Cladistics being a formal process of inference
> >        does not make it incompatible with probabilistic thinking.
> I'll leave the question of compatibility to you, I'll just say that
> in any case, it is not based on a probabilistic approach (except
> sensu lato).

        How can you say that parsimony is not probabilistic,
        when the entire concept is based on the assumption
        that evolutionary transformations are rare?  You never
        really addressed or responded to the point that
        the "test" itself is influenced by how well this
        assumption is met for a given data set.
> >> The probability that a given tree is a reconstruction of the
> >> phylogenetic
> >> pattern is maximized for the tree which orders homologies
> Yes, we all know about long branches. IF your arguments against
> parsimony rest just on that, then we are home free! What to do about
> long branches? Sample your taxa a bit more carefully (break up the
> long branches),,that will cure many situations (and you will do that
> eventually anyway..). Use different character systems (perhaps some
> in which such spurious patterns are rather difficult to imagine).
> That should take care of the rest. In a
> worst case situation we have what? A taxon which has evolved so much
> in every character system that all information is overwritten and
> spurious patterns emerge? Gee, I guess we will fail,,,and I await
> word of the method which could possibly reconstruct the history of a
> group with no historical information.

        Why not bother to ask if there are observable consequences
        we can detect in patterning of states that would at
        least indicate that long branches are a real possibility
        (next year, MPE Feb/Mar)...

> >
> >        Again, this is not universal.  The probability that the
> >        parsimony tree will be correct is a complex function of
> >        the (1) shape of the true tree, (2) the evolutionary
> >        distance (= rates of evolution, or taxon sampling) among
> >        the taxa sampled, AND (3) the number of characters one has.
> I didnt say universal, did I? As to your points. Does #1 mean that if
> the true tree aint a tree, then we are wrong? OK,,,,2. no absurdly
> long branches? OK,,,,3. as many as possible? OK
> See, I am really quite an agreeable guy!

        I must say I concur.  You're nicer than some, too.

> >> Well, I must admit that I am quite surprised by some of the results
> >> of statistical phylogenetics. However I tend to feel that the results
> >> gain corroboration from the amount of empirical evidence which
> >> supports them, rather than from some quantification of my surprise at
> >> the findings.
> >
> >        Then you mean to CONFIRM, not to test.  Philosophers
> >        distinguish markedly between corroboration and confirmation;
> >        in fact, they are antithetical.
> Well it is good you understand they are different (antithetical is a
> bit strong),, but I dont think you are clear on why they are
> different. Confirmation is what you get when you go out to find
> support for your hypothesis (an approach called "verificationism" by
> some, watch the literature for news on this front).

        Well, Popper's life work was to show that verificationism
        sensu the Vienna Circle was bankrupt, and he succeeded in
        doing so (3 papers with proof of the ABSENCE of probabilistic
        support through inductive reasoning).
> Corroboration is what you get from
> a test of your hypothesis, (trying to make it fail); it is a positive
> result from a powerful test. Homology hypotheses are tested, not
> confirmed. They are subject
> to falsification by other homologies,,,,given the fact that this
> happens in just about every cladogram, I cant see how you can deny
> it.

        You can't at once induce and deduce.  Which is it?
        Your sets of hypotheses include in the background information
        that some of the hypotheses (which through consilience
        or congruence or whatever) provide a test are likely
        to be false.  I don't feel comfortable relying on the
        assumption that a majority of them are not false.

> It is, ironically enough, statistical phylogenetics (and yeah I
> am thinking of max-like especially) which is inherently a
> CONFIRMATIONALIST approach (pass that baloney please). A max-like
> phylogeny is not the result of any sort of a test. Is it capable of
> falsifying the model which underlies it? Of course not. It is merely
> a development of the phylogenetic implications of the model in light
> of the data. There is no test anywhere in sight. A parsimony
> criterion (congruence test) does test each of the individual
> homology hypotheses, and when they pass that test they are
> corroborated.

        You're not alone in that... and recent developments
        by Yang, Chang, Steel, Swofford and Sullivan, and
        others (including a minor point by me on infinitely
        many solutions) are beginning to make these points
        more clear.  You may be surprised to learn that
        I feel rather the same way about presumptions of
        models of evolution... and your polite nods
        in the direction of accomplishments, etc. are
        a welcome respite from the regular whitewash...
        but then that brings us back to the need for Bayesian
        priors, which some max lik people have started to
        play with more recently.  The nice thing about
        maximum likelihood (as opposed to say, I don't
        know, cladistics?) is that the proponents of the
        methods (some) have (tried) to state the limitations
        of the methods clearly, they (some) have tried to
        make all the assumptions of the methods obvious.
        A rather disturbing trend has started recently,
        where (as you point out) students learn it, and
        think it is gospel, and then publish astounding
        discoveries like hypothesis testing in phylogenetics
        in major science magazines, obviously understating
        limitations, and overstating strengths.  But why
        should we expect less?  Cladists and max lik people
        are human... and humans tend to go to war over ideas.
> >      A long string of confirming
> >        instances leads to ZERO probabilistic support for the next
> >        such instance (I've seen lots of grass, but my bet is on
> >        the fact that it's NOT all green).  Measuring the degree of
> >        covariation beyond that which one expects by chance alone
> >        is a good way out of the confirming instances straightjacket.
> Another way out is not to get in in the first place. Parsimony does
> not amount to a string of confirming instances, because each new
> character has the power to falsify the original hypothesis.

        I think you know my position here; it's no good
        to state and restate theses and antitheses.  Show
        me the money.  Where is the empirical proof of
        max pars as a critical test?  Where is a theoretical
        proof?  I've searched the literature far and wide
        for such papers. Consensus among people does not make

> >      the relevant question not answered by (gulp)
> >        cladists is how much corroboration should be attributed to
> >        hypotheses when they pass the test?  For Popper,
> >        c = 1/b, where b is the boldness of the hypothesis...
> >        so the question becomes, how bold is your hypothesis?
> very
> :)
        If your hypothesis is that parsimony is a test, I agree.
> >        That question can be directly answered by addressing the
> >        question, "Well, how often do I expect this level of
> >        (congruence, covariation) by chance alone?", which is
> >        a statistical question applied to a cladistic problem.
> Actually, if this is of interest to you, then go for it. I dont see
> it as having any impact whatsoever on how we go about doing
> systematics. Quantify boldness, tell me how often I can expect
> results like the ones I got, etc.etc. With all due respect, these are
> not burning questions in my mind. They wont influence how I study
> characters, or code them, nor will they cause me to prefer
> less-than-parsimonious solutions, so I can do my work in peace,
> before you even show up.
> So tell me, on a conceptual level, how do you imagine quantifying
> something like the expected congruence of a set of morphological
> characters? The image of them showing distribution patterns caused
> "by chance" is a bit too abstract for my li'l head.

        On a conceptual level?  OK - conceive of this..
        imagine that a majority of your hypotheses of homology are
        wrong.  You will find congruence, nevertheless,
        by chance alone.  That much has been empirically
        demonstrated by Archie and scores of others.
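
        To make "by chance alone" concrete, here is a minimal sketch
        of a randomization test.  It is an illustration only, not
        Archie's published procedure: it assumes binary characters,
        uses the number of pairwise-compatible characters as a
        stand-in for congruence, and the function names are
        hypothetical.  Each character's states are shuffled among
        the taxa to build a null distribution, and the observed
        count is compared against it.

```python
import random
from itertools import combinations


def compatible(a, b):
    """Pairwise compatibility of two binary characters.

    Two binary characters can fit the same tree without homoplasy
    unless all four state combinations (0,0), (0,1), (1,0), (1,1)
    occur among the taxa.
    """
    return len(set(zip(a, b))) < 4


def count_compatible_pairs(chars):
    """Number of character pairs that are mutually compatible."""
    return sum(compatible(a, b) for a, b in combinations(chars, 2))


def permutation_p_value(chars, n_perm=999, seed=0):
    """Estimate how often shuffled data are at least as congruent.

    Each character is permuted independently across taxa, which keeps
    its state frequencies but destroys any shared hierarchical signal.
    Returns (observed_count, p_value).
    """
    rng = random.Random(seed)
    observed = count_compatible_pairs(chars)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(c, len(c)) for c in chars]
        if count_compatible_pairs(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: rows are characters, columns are eight taxa.
example_chars = [
    (1, 1, 1, 1, 0, 0, 0, 0),
    (1, 1, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1, 0, 0),
    (1, 1, 1, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 0),
    (1, 1, 1, 1, 0, 0, 0, 0),
]
observed, p = permutation_p_value(example_chars)
```

        A small p here would say the observed congruence is rarely
        matched by shuffled characters; a large p would say that
        this much congruence is what noise alone tends to produce.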

> Look James,,I have no problem with statistics,,,statistics are
> wonderful,,,I've delved a bit myself,,,my argument is not with
> statistics in general,,it is with the nonsense assertions by folks
> like Felsenstein and Edwards, that parsimony is a statistical method
> (albeit a crude one), is model based (although the model remains
> forever "implicit"),

        Why implicit?  Are not transformation series probability
        statements incarnate?

> and that (at least by implication) one can do a
> better job at reconstructing phylogeny by focussing on the
> development of statistical methods for discerning pattern from
> (effectively) only sequence data.

        I have NEVER heard Joe F. or Edwards, for that matter,
        state publicly that morphological data are not potentially
        informative, or that sequences are the only way to go.
        I think they are a tad more objective than you give them
        credit for.

> This attitude may have been a ploy
> to enable people who are inherently gifted for statistical work to
> play the phylogeny game, but it became a mantra which has infected a
> generation of
> "systematists" who actually believe on faith that it is true, and who
> never bother to learn what systematics is really about (and who also
> often never bother to study the organisms whose phylogeny they are
> supposedly "estimating").

        Have you seen the movie "conspiracy theory?"  You
        might like it.  I don't know about the recently knighted
        maximum likelihood schoolboys, but MY impetus for
        creating statistical tests is biologically-oriented.
        I came to the field to use the trees everyone was
        excited about to help test historical hypotheses for
        explanations of patterns of ecological diversity.. a noble
        cause, yes?  But the trees for the groups I was
        most interested in were not in any way from my
        perspective critically generated... the taxonomists
        followed the methodological paradigm of the day, and
        did the best they could... and they did a lot of
        damn hard work to get those trees.
        But I saw no evidence of testing, no attempts to
        falsify... and the folks publishing the trees
        had low confidence in all trees (even their own)...

> perceive to be the real methods of cladistic systematics.

        This SLAYS me.  Are you saying that cladistics is
        a static, stagnant field that does not adapt and
        grow and improve itself by adopting more stringent
        tests when they eventually (and inevitably) are
        produced?  If so, then cladistics will become,
        over time, like Latin... cladobabble will be
        written, but not spoken.

> Statistics
> are powerful tools, but they are not the only way to do science. In
> fact, they are sometimes simply inappropriate to the question at
> hand. Felsenstein et al. made sure to slam the door on that
> possibility from the get go,,but actually, I think that the problem
> of discerning the singular, unique historical branching pattern is
> inherently NOT a statistical question.

        Statistics is not a way to do science; it is a check on
        science.  Your assumption that there is a singular
        branching pattern is a good example... that is actually
        a hypothesis, and it can be tested with the appropriate
        statistics.  WHEN the data pass such tests, THEN the
        assumption is bolstered.  Stats are just flexible checks
        on these types of assertions.  Everything in science should
        be, at least in principle, assailable... but we need
        critical tests of our data, our preconceived notions,
        and so on.  To this I think you agree.

> I do NOT mean by this any disrespect for the accomplishments of that
> field, nor for the impressive development of the various statistical
> tools, my argument is merely with the scope of applicability of these
> tools. For the purpose of modeling genetic evolution, and perhaps
> even for the purposes of assessing the homology at various sites (in
> the pre-parsimony stages of a systematic analysis), these methods
> either do, or may possibly have enormous value. But I do not feel
> that they are capable of addressing the real problem which
> systematists address,,finding the branching pattern of phylogeny in a
> framework which draws from all the legitimate sources of evidence

        This, too is a statistical question.  Are these data
        legit, or are they noise?

> available.
> At the heart of it, I think the dispute has been centered
> on the statisticians' refusal to acknowledge the legitimacy (and
> sometimes even the existence) of the "traditional" approaches to
> systematics (the approaches which still can be said to have taught us
> just about *everything* we know about phylogeny).

        Really.  That's going too far, Tom.
> The concepts of homology, pattern recognition, and parsimony are not
> going away.

        I should hope not.  They are bloody useful...  but
        they need independent checks.  Otherwise, folks
        will be asserting particular homologies.

        On the notion of ad-hocness, here's a quandary.  Why,
        if the max pars tree results in the fewest ad-hoc
        hypotheses, are the journal pages FILLED with crap
        about why the researcher didn't get the tree they
        expected?  IF they get the wrong tree, they almost
        invariably make up stories to make the tree make
        sense... and they are creative, and sometimes
        interesting... but they are post facto, and ad-hoc.
> When they finally come to terms with all this, perhaps we can move on
> to a rational discussion of how the various conceptual frameworks can
> fit together.

        I am glad to see that you consider this a real possibility.
> I dont consider you a lost
> cause, because I guess you really dont argue passionately for max-like
> type approaches, and some of your uses of statistics seem not to
> necessarily interfere with cladistic practices. But you do seem to
> resist the notion that parsimony approaches may have legitimate
> justifications outside of the world of statistics, and you do seem
> generally uncomfortable with a world in which the statistical horizon
> is not infinite.

        I've outlined my positions, I think, fairly well,
        but I've never intimated that statistical inference
        is unbounded.  In fact, I've tried to call for more
        realism in the representation of statistical methods.

        My golden rule is that ALL methods of inference are limited.
> >        Try disproving your hypotheses with a test that has some
> >        teeth.  You'll be surprised at the increase in your
> >        subjective confidence, which (contra Popper) is all we
> >        really have anyway.
> I tried your test on my morpho characters,,,I was so damn significant
> it crashed my li'l mac,,,,I've not come down from that cloud yet!

        You are indeed significant, Tom (typo) :)

        Did you read the manual????

> I think you should go through all that a bit slower. I formulate a
> set of hypotheses, each of which engenders an expectation of
> congruence. I test these hypotheses contra each other. I prefer the set
> of hypotheses which survive the test, yes. yikes?

        I responded earlier to this; and the point is
        that of high logical probability.  You state that
        you want hypotheses of high probability, and yet
        insist that when you have a set where p is high,
        parsimony of all things will critically test
        these already very reasonable hypotheses.
        I don't see how a
        set of hypotheses with high logical probability
        can be afforded the dose of corroboration you
        give them when by definition they have a
        high logical probability.

> >        Here's an analogy.  I am studying a community of
> >        organisms for evidence of competition.  I formulate
> >        hypotheses of competition under a criterion that
> >        demands that they indicate competition.  Then I
> >        examine my sets of hypotheses for competition...
> >        BINGO!  competition!
> No. competition=homology the first two times you use the word, then
> competition = congruence the third and fourth time you use it.

        OK engenders an expectation for the second time, but
        I see the analogy of looking among sets of hypotheses
        of competition for competition as directly analogous
        to making a list of hypotheses of homology, each of
        which has a high probability (that's how they are
        chosen), and then finding congruence.  Also, the
        point you missed is that the ecologist has an alternative...
        to ask how often one expects to see the level or
        degree of competition in a null world where competition
        does not exist, or is equally probable among all species,
        or whatever interesting hypothesis s/he is testing.  And
        they do it.  They don't rely on mere congruence
> ......and replace "demands" with "expects".
> >        No test.
> Amazing,,,,,look at any cladogram and you will see plenty of homology
> hypotheses brutally splattered  all over the place,,,,falsified,,,but
> not tested, eh? I guess cladistics really is magic!

        They are not really falsified, are they, Tom?
        They are only falsified for the time being.

> >>. And I say that you dont know how nucleotides change in these
> >> taxa, and you wont begin to make coherent
> >> statements about how they do until AFTER you have a phylogeny in
> >> hand.
> >
> >        BOSH and poppycock.  I know enough about the processes
> >        of evolution to say that sometimes it leaves behind decent
> >        evidence of shared genealogy, and sometimes it leaves behind
> >        baloney.

        addressed with statistics, every time.

> >     If I examined the data on a tree FIRST, and the tree is
> >        based on baloney, I'll make coherent statements, but they will
> >        be lunchmeat.  Sandwiches, anyone?

> I dont think it is. In fact, most of our knowledge of these other
> factors comes from examination of homoplasy on parsimonious trees
> (including those less exact trees made before the method was
> formalized in cladistics, and from "obvious" situations).

How much of this goes on that we don't know about???
