Weights, probability

Doug Yanega dyanega at DENR1.IGIS.UIUC.EDU
Tue Mar 4 14:47:12 CST 1997

James Francis Lyons-Weiler wrote:
>       In fact, there is a fundamental difference in the approach I
>       do espouse.  By taking advantage of the known consequences
>       of patterns on the performance of tree-building algorithms,
>       one can prevent the analysis of misleading data.  Some
>       processes have observable consequences, and leave footprints
>       in the data (these processes include phylogeny, homoplasy,
>       differential lineage sorting, long branches); and sometimes
>       character and taxon sampling can result in unfortunate matrices
>       that are error-prone.  The a priori approach cuts the
>       vicious circle, and provides tree-independent inferences
>       on the observations made (the matrix).

But it sounds like you're making an extremely risky assumption there:
that we understand these processes well
enough to recognize them in action before you've even attempted to build a
tree. I, for one, do not feel all that comfortable claiming I (or you)
understand evolution so well that I can trust a program to tell me when
evolution has thrown me a curveball. I also still think it sounds
suspiciously like doing a parsimony analysis without generating a tree -
which is where most people look for "footprints" in their matrices before
they try weighting characters.

>       It's perfectly analogous to testing ecological data for
>       normality prior to the application of parametric statistics;
>       the methods assume normality, so one tests the data
>       FIRST.

I know it sounds good to say that this is analogous, but - as Tom has been
stating - by drawing this analogy you are again apparently treating the
whole thing as if it's one big probabilistic function right from square one.
The existence and properties of things like normal and Poisson distributions
- and ways to sample them, etc. - have been demonstrated to everyone's
satisfaction, certainly, but you can't test for normality (or compare a data
set to any set of expectations) unless you have *defined* a normal
distribution to begin with, as a standard of reference. I'm not sure I see
exactly what the concept of an "expected distribution of character states"
is being defined *as* - i.e., what standard of reference you are proposing -
so as to allow you to make this sort of comparison (or if it has any meaning
at all in the context of phylogenetic reconstruction).

>       Phylogenetic methods assume any number of things;
>       testing the matrices for violations of these assumptions
>       is as desirable as trying to ensure that what you think
>       are homologies are; in fact, it's the same problem.

We may actually agree here about it being the same problem - in fact, it
looks like what you're advocating is basically just plain old character
weighting cast in a different light, but still basically testing one's
matrix TWICE - so not only is it the same problem, but the same approach
people have used in the past, given a different flavor. Let me ask you
again; if your suggested procedure is a method for "testing for assumption
violations" and parsimony is a method of "testing for assumption
violations", what does your procedure accomplish that is so fundamentally
different from simply running a parsimony analysis and then reweighting
characters based on the outcome (i.e., whether one detects homoplasy, long
branches, or whatever) and running it again? As a possible answer to my own
question, it simply sounds like you're advocating using an algorithm for
character weighting based on assumptions about the sorts of patterns
different evolutionary processes are expected to produce, and just looking
for those patterns without using a tree-style output. In other words, the
thing you're trying to do is build a _process model_ into phylogeny, which
parsimony does not. If so, this is not a new idea, nor do I find that it
inspires me to trust in those process/pattern assumptions any more than I
did before.

Still trying to penetrate the haze,
Dr. Douglas Yanega
Depto. Biologia Geral
Univ. Federal de Minas Gerais
Caixa Postal 486
30161-970 Belo Horizonte, MG, BRAZIL
