# [Fwd: Re: Probabilities on Phylogenetic Trees]

Tom DiBenedetto tdib at UMICH.EDU
Thu Sep 18 07:22:02 CDT 1997

```
James Francis Lyons-Weiler wrote:

>        You determine, however, whether the evidence is congruent
>        or not on a parsimony tree.  Hence statements of congruence
>        are inferences, not observations.

I dont see this as a crucial point,,you can see it either way. Often
I can see the congruence,,I can just look at a matrix. I can even see
the incongruence. I can even see the most parsimonious tree, if the
matrix is small enough. Is it a complex observation, or an inference?
...whatever...

>        This congruence is measured
>        by the degree to which (inferred) character state TRANSFORMATIONS
>        covary on the same branch.  The hypotheses of transformation
>        imply process (you don't acquire vertebrae without process,
>        do you Tom?).

you get stuck on that shift key for a moment there? Try a little more
baloney grease on your fingers,,,that might help! Yes, everything
about phylogeny implies process. Patterns imply processes,,thats how
we managed to figure out that there is such a thing as an
evolutionary process,,,because character state distributions seem to
form a pattern,,one recognized long before an adequate process was
imagined. That is also why phylogenetic pattern reconstruction is the
front door to evolutionary process investigation. No one has ever
imagined that the one doesnt imply the other. That does not mean that
you need to (or can) know the process before you discern the pattern.

>        Cladistics as a formal process of inference
>        does not make it incompatible with probabilistic thinking.

I'll leave the question of compatibility to you, I'll just say that
in any case, it is not based on a probabilistic approach (except in
sensu lato).

>> The probability that a given tree is a reconstruction of the
>> phylogenetic
>> pattern is maximized for the tree which orders homologies
>> parsimoniously; i.e. with the minimal number of steps. Is
>> this really that complicated?
>
>        This is merely an assertion; a hope.  It has been
>        known for a VERY long time (sorry, Joe) since 1978
>        that conditions exist that have perfectly mundane,
>        realistic predicates that will cause the mp tree to
>        have a low probability of being a reconstruction.

parsimony rest just on that, then we are home free! What to do about
long branches? Sample your taxa a bit more carefully (break up the
long branches),,that will cure many situations (and you will do that
eventually anyway..). Use different character systems (perhaps some
in which such spurious patterns are rather difficult to imagine).
That should take care of the rest. In a
worst case situation we have what? A taxon which has evolved so much
in every character system that all information is overwritten and
spurious patterns emerge? Gee, I guess we will fail,,,and I await
word of the method which could possibly reconstruct the history of a
group with no historical information.

>        Long branch attraction can occur simply because of
>        sparse taxon sampling.  If enough time has passed
>        between two nodes in the true tree without speciation,
>        mutations and character evolution can erase the
>        phylogenetically informative acquisitions.  It is
>        a fact that if you don't sample a taxon, it might
>        as well be extinct (for the purposes of your analysis),
>        so long branch attraction can occur because we don't
>        have sufficient funds to sample some extant taxa,
>        to spend time on character analysis of some taxa we
>        have got, etc.  (I've a paper coming out soon that
>        describes how to find long branch taxa... and yes,
>        they mislead morphological data, too).

very good, we see things the same way. Do lousy systematics and get
lousy results. Big news!

>> given the set of homologies, it is highly probable
>> that the shortest tree reflects the pattern of taxic divergence.
>
>        Again, this is not universal.  The probability that the
>        parsimony tree will be correct is a complex function of
>        the (1) shape of the true tree, (2) the evolutionary
>        distance (= rates of evolution, or taxon sampling) among
>        the taxa sampled, AND (3) the number of characters one has.

I didnt say universal, did I? As to your points. Does #1 mean that if
the true tree aint a tree, then we are wrong? OK,,,,2. no absurdly
long branches? OK,,,,3. as many as possible? OK
See, I am really quite an agreeable guy!

>> Well, I must admit that I am quite surprised by some of the results
>> of statistical phylogenetics. However I tend to feel that the results
>> gain corroboration from the amount of empirical evidence which
>> supports them, rather than from some quantification of my surprise at
>> the findings.
>
>        Then you mean to CONFIRM, not to test.  Philosophers
>        distinguish markedly between corroboration and confirmation;
>        in fact, they are antithetical.

Well it is good you understand they are different (antithetical is a
bit strong),, but I dont think you are clear on why they are
different. Confirmation is what you get when you go out to find
support for your hypothesis (an approach called "verificationism" by
some, watch the literature for news on this front). Corroboration is
what you get from a test of your hypothesis (trying to make it fail);
it is a positive result from a powerful test. Homology hypotheses are
tested, not confirmed. They are subject to falsification by other
homologies,,,,given the fact that this happens in just about every
cladogram, I cant see how you can deny
it. It is, ironically enough, statistical phylogenetics (and yeah I
am thinking of max-like especially) which is inherently a
CONFIRMATIONALIST approach (pass that baloney please). A max-like
phylogeny is not the result of any sort of a test. Is it capable of
falsifying the model which underlies it? Of course not. It is merely
a development of the phylogenetic implications of the model in light
of the data. There is no test anywhere in sight. A parsimony
criterion (congruence test) does test each of the individual
homology hypotheses, and when they pass that test they are
corroborated.

>        A long string of confirming
>        instances leads to ZERO probabilistic support for the next
>        such instance (I've seen lots of grass, but my bet is on
>        the fact that it's NOT all green).  Measuring the degree of
>        covariation beyond that which one expects by chance alone
>        is a good way out of the confirming instances straightjacket.

Another way out is not to get in in the first place. Parsimony does
not amount to a string of confirming instances, because each new
character has the power to falsify the original hypothesis.

>        the relevant question not answered by (gulp)
>        cladists is how much corroboration should be attributed to
>        hypotheses when they pass the test?  For Popper,
>        c = 1/b, where b is the boldness of the hypothesis...
>        so the question becomes, how bold is your hypothesis?

very
:)

>        question, "Well, how often do I expect this level of
>        (congruence, covariation) by chance alone?", which is
>        a statistical question applied to a cladistic problem.

Actually, if this is of interest to you, then go for it. I dont see
it as having any impact whatsoever on how we go about doing
systematics. Quantify boldness, tell me how often I can expect
results like the ones I got, etc. etc. With all due respect, these are
not burning questions in my mind. They wont influence how I study
characters, or code them, nor will they cause me to prefer
less-than-parsimonious solutions, so I can do my work in peace,
before you even show up.
So tell me, on a conceptual level, how do you imagine quantifying
something like the expected congruence of a set of morphological
characters? The image of them showing distribution patterns caused
"by chance" is a bit too abstract for my li'l head.

>        Ecology is also a formalized field, yet ecologists
>        far and wide would guffaw at the intimation that
>        because they sat and watched a population or community,
>        and made lots of notes, and checked and counter-checked
>        among various hypotheses, that the theory they then
>        developed has been tested by the mystical properties of
>        the methods of Ecology!  Yet you insist that that's all
>        it takes to do cladistics.

gee, is that what I said? The mystical properties of the methods of
cladistics. I must be getting old,,,the memory, y'know...

>        Look around, Tom.  Biologists
>        all over the place are doing statistics.  On all sorts
>        of things (like morphology, for instance).

wow james, you are right! hey! I never noticed,,,imagine that!
Morphology???no,,,,
really?,,,you mean like in morphometrics,,,like (e.g.) those 5 or 6
papers that my major prof has put out in  the past few years (the
Look James,,I have no problem with statistics,,,statistics are
wonderful,,,I've delved a bit myself,,,my argument is not with
statistics in general,,it is with the nonsense assertions by folks
like Felsenstein and Edwards, that parsimony is a statistical method
(albeit a crude one), is model based (although the model remains
forever "implicit"), and that (at least by implication) one can do a
better job at reconstructing phylogeny by focussing on the
development of statistical methods for discerning pattern from
(effectively) only sequence data. This attitude may have been a ploy
to enable people who are inherently gifted for statistical work to
play the phylogeny game, but it became a mantra which has infected a
generation of "systematists" who actually believe on faith that it is
true, and who never bother to learn what systematics is really about
(and who also often never bother to study the organisms whose
phylogeny they are supposedly "estimating"). This is the root of all
my impassioned pleas for total evidence, and for endless repetition
of what I perceive to be the real methods of cladistic systematics.
Statistics
are powerful tools, but they are not the only way to do science. In
fact, they are sometimes simply inappropriate to the question at
hand. Felsenstein et al. made sure to slam the door on that
possibility from the get go,,but actually, I think that the problem
of discerning the singular, unique historical branching pattern is
inherently NOT a statistical question.
I do NOT mean by this any disrespect for the accomplishments of that
field, nor for the impressive development of the various statistical
tools, my argument is merely with the scope of applicability of these
tools. For the purpose of modeling genetic evolution, and perhaps
even for the purposes of assessing the homology at various sites (in
the pre-parsimony stages of a systematic analysis), these methods
either do, or may possibly have enormous value. But I do not feel
that they are capable of addressing the real problem which
systematists address,,finding the branching pattern of phylogeny in a
framework which draws from all the legitimate sources of evidence
available. At the heart of it, I think the dispute has been centered
on the statisticians' refusal to acknowledge the legitimacy (and
sometimes even the existence) of the "traditional" approaches to
systematics (the approaches which still can be said to have taught us
The concepts of homology, pattern recognition, and parsimony are not
going away.
When they finally come to terms with all this, perhaps we can move on
to a rational discussion of how the various conceptual frameworks can
fit together.
I dont consider you a lost cause, because I guess you really dont
argue passionately for max-like type approaches, and some of your
uses of statistics seem not to necessarily interfere with cladistic
practices. But you do seem to resist the notion that parsimony
approaches may have legitimate justifications outside of the world of
statistics, and you do seem generally uncomfortable with a world in
which the statistical horizon is not infinite.

>        protocol has all the pieces, all the potential, all the
>        good intentions of every other field... but it's STUCK.

you mean, well grounded   :)

>        Try disproving your hypotheses with a test that has some
>        teeth.  You'll be surprised at the increase in your
>        subjective confidence, which (contra Popper) is all we
>        really have anyway.

I tried your test on my morpho characters,,,I was so damn significant
it crashed my li'l mac,,,,I've not come down from that cloud yet!

>        THIS is a great example of where the process of
>        formulating a hypothesis and actually testing it
>        is entirely conflated.  OF COURSE your data will
>        be congruent, because you formulate a set of
>        hypotheses under a criterion that demands that
>        they be congruent... and then the set of hypotheses
>        that survives the test of congruence is preferred.
>        Yikes.

I think you should go through all that a bit slower. I formulate a
set of hypotheses, each of which engenders an expectation of
congruence. I test these hypotheses contra each other. I prefer the set
of hypotheses which survive the test, yes. yikes?

>        Here's an analogy.  I am studying a community of
>        organisms for evidence of competition.  I formulate
>        hypotheses of competition under a criterion that
>        demands that they indicate competition.  Then I
>        examine my sets of hypotheses for competition...
>        BINGO!  competition!

No. competition=homology the first two times you use the word, then
competition = congruence the third and fourth time you use it.
......and replace "demands" with "expects".

>        No test.

Amazing,,,,,look at any cladogram and you will see plenty of homology
hypotheses brutally splattered  all over the place,,,,falsified,,,but
not tested, eh? I guess cladistics really is magic!

>>. And I say that you dont know how nucleotides change in these
>> taxa, and you wont begin to make coherent
>> statements about how they do until AFTER you have a phylogeny in
>> hand.
>
>        BOSH and poppycock.  I know enough about the processes
>        of evolution to say that sometimes it leaves behind decent
>        evidence of shared genealogy, and sometimes it leaves behind
>        baloney.

yeah,,,but WHEN SPECIFICALLY?

>       If I examined the data on a tree FIRST, and the tree is
>        based on baloney, I'll make coherent statements, but they will
>        be lunchmeat.  Sandwiches, anyone?

sure,,a little mustard on mine....If the tree is wrong then you are
correct, your coherent statements will be lunchmeat. That is not an
effective argument however against my assertion that your statements
wont even be coherent sans phylogeny.

>        I'll grant you that parsimony allows us to present
>        hypotheses... and it provides a criterion for choosing
>        among hypotheses... but so does throwing a dart, or
>        musing to myself in the mountains about my organism,
>        and so on... what you keep overlooking is that the
>        validity of parsimony as a tool for constructing
>        hypotheses fluctuates as a function of the effects of
>        processes of all sorts (knowable and unknowable) on
>        the patterns in hypotheses of homology.  It's not
>        all that complicated, really.

I know that James, I am simply stating that you learn about how the
various processes have operated within specific phylogenies, because
(as you stats folks would see it) the branching pattern is a
parameter itself. That is why you need a method independent of these
concerns by which to choose the tree in which you will model the
other parameters. Your model and phylogeny are mutually
dependent, and the complex needs to be tested outside of its own
context.

>        My entire motivation is to improve the trees and
>        hypotheses of homology and character transformation
>        series estimated because I'm mostly interested in
>        the biology.  Is that so wrong?

gee, that almost sounds like you aint so hostile to

>> inference=guess? Is it a guess that some animals have vertebrae,
>
>        It is a guess that they share these features by common
>        descent.

well gee ok, but it is a pretty high level guess (as in basic to
evolutionary biology)

>        Dressing up guesses in fancy clothes like "hypothesis",
>        and "inference" is good for a while, but then it gets
>        tiresome.  Let's admit at least that they are
>        informed guesses?

Well, my dictionary says guess= to make a judgement without
sufficient information. Informed guess means what? Almost sufficient
information? On some level I guess i agree (the dao which can be
spoken is not the eternal dao),,,can I have another sandwich?

>> gee, James, the notions of homology and lineage branching are rather
>> basic in evolutionary biology.
>
>        Yes, but so is the difference between homology and
>        analogy, and so is the distinction between divergent
>        sophisticated) are the notions of introgression,
>        hybridization, reticulation, horizontal gene transfer,
>        mutational saturation, adaptive convergence, random
>        convergence, and so on.  The question is can we
>        detect with any method of reasoned inference the
>        differences between patterns caused by descent with
>        modification and lineage branching and these other,
>        equally real, potential causes?  Not if the
>        method of inference is misled by such patterns,
>        and parsimony is.

I dont think it is. In fact, most of our knowledge of these other
factors comes from examination of homoplasy on parsimonious trees
(including those less exact trees made before the method was
formalized in cladistics, and from "obvious" situations).

>        Granted, there has been some
>        progress on trying to find out what the observable
>        consequences of such sources of noise are, but in the
>        end, you still get a tree, not a test.

You get a tree from the test.

>
>> Failing that, you do not have a legitimate
>> explanation for the pattern. If you succeed, then I would admit
>> that my pattern might have a shaky foundation,,,but what would I do
>> then? It is the best pattern I have,,,,maybe I would have to accept
>> it for now and look for more evidence!
>> (Horrors)
>
>        I think it would be reasonable to present hypotheses of
>        hybridization, for instance, if one could detect such
>        things... I know of some folks who have, but I can't
>        for the life of me recall any cladists doing such things.

Nelson developed an approach to identifying potential reticulation on
cladograms in the late seventies or early eighties (SysZoo I think)

>        I don't see the two alternatives as mutually exclusive.

Me neither. They fit together conceptually in a certain way though,
and that is, I think, an interesting question, and a point of
dispute.

>        Of course we should find out where information has been
>        obliterated or muddled, AND we should look at various
>        different genes, AND various different character systems.
>        Understanding genomic evolution is a worthwhile task, and
>        so is understanding the evolution of dentition in the Equus
>        lineages.  As I've said before, these artificial classes of
>        morphology and molecule, character transformation and
>        models of genetic evolution, are SO harmful to the
>        shared goal of studying biology from an evolutionary
>        standpoint.

good for the most part. But these classes do refer to different
research programs with different needs, standards, limitations etc.

>         No one has anything to lose, for instance, by
>        gathering up all the molecular data they can to study
>        dentition evolution.

now wait,,,,after all that nice warm and fuzzy stuff, why reimpose a
distinction? Why gather up all the molecular data,,,why not just take
it for granted that you gather up all the data period?

```
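
The exchange above turns on two quantities: the number of steps a tree needs to explain a character matrix, and how often that much congruence would turn up "by chance alone". The Python sketch below is one possible reading of both, not a method either correspondent describes. It counts steps with Fitch-style set intersection on rooted binary trees, then shuffles each character's states among the taxa to see how often the shortest tree is at least as short as the observed one. The toy matrix, the three candidate trees, and every function name are assumptions made up for illustration.

```python
import random

def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted
    binary tree given as nested 2-tuples of taxon names."""
    steps = 0

    def post(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left, right = (post(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        steps += 1                         # children disagree: one change
        return left | right

    post(tree)
    return steps

def tree_length(tree, matrix):
    """Total steps over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )

def min_length(trees, matrix):
    """Length of the most parsimonious tree among the candidates."""
    return min(tree_length(tree, matrix) for tree in trees)

def permutation_p(trees, matrix, n_perm=999, seed=0):
    """Share of shuffled matrices whose shortest tree is at least as short
    as the observed shortest tree (observed matrix counted once)."""
    rng = random.Random(seed)
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    observed = min_length(trees, matrix)
    hits = 0
    for _ in range(n_perm):
        columns = []
        for i in range(n_chars):
            column = [matrix[taxon][i] for taxon in taxa]
            rng.shuffle(column)            # break any link between characters
            columns.append(column)
        shuffled = {
            taxon: "".join(column[j] for column in columns)
            for j, taxon in enumerate(taxa)
        }
        if min_length(trees, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: four taxa, three binary characters.
MATRIX = {"A": "111", "B": "110", "C": "001", "D": "000"}
TREES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

if __name__ == "__main__":
    for tree in TREES:
        print(tree, tree_length(tree, MATRIX))   # lengths 4, 5 and 6
    print(permutation_p(TREES, MATRIX))
```

With only three characters the shuffled matrices match the observed length most of the time, so the toy data show no more congruence than chance would produce. A real analysis would need many more characters and a proper tree search instead of a fixed list of candidates.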