# Weights, probability

James Francis Lyons-Weiler weiler at ERS.UNR.EDU
Tue Mar 4 09:28:50 CST 1997

```On Tue, 4 Mar 1997, Doug Yanega wrote:

> Let's see if I can anticipate James' response...he is not talking about
> state transformations, but the overall pattern of character states that have
> been assigned in one's matrix, and an _a priori_ analysis (i.e. before one
> applies parsimony to the matrix) that ostensibly will reveal whether there
> is structure in the pattern or not, and which characters are more/less in
> keeping with this overall structure. He said:
>
>         "Sometimes some of the pairwise comparisons (e.g., A in
>         taxon i vs. A in taxon j) represent true homologies, while
>         some of the pairwise comparisons (for the SAME character)
>         do not (e.g., A in taxon i vs. A in taxon Z), due to
>         homoplasy.  Not every character is equally informative
>         owing to this feature, and the degree to which a character
>         is weighted equally belies and ignores this entirely.
>
>         Because a proportion of the among-taxon comparisons (REMINDER:
>         I am discussing comparisons, NOT transformations, which are
>         inferences, not observations) are homologous, and some are not,
>         fairly simple math provides the actual proportion (and therefore
>         realized probability) of a state comparison being informative.
>         What we don't know by looking at the differences is WHICH are
>         informative, and we can't _know_ them by cladistic parsimony
>         alone, but I propose that we can determine at least a relative
>         determination of which comparisons are misleading, and accurately
>         identify which comparisons are not, and thereby IMPROVE upon the
>         performance of cladistic parsimony and other tree-selection
>         criteria."
>
> What I don't quite understand is how this a priori pattern-testing is going
> to be fundamentally different from a circular application of parsimony
> itself; it's like building a tree, then concluding "Aha! Character 33
> appears to have a homoplasious state 1 in taxa 4 and 17, so character 33 is
> only 85% informative! Now let's plug that probability in with the others and
> run the data again." The only difference is that there is no actual tree
> constructed in that first pattern-testing step, just an algorithm (without a
> graphic representation) which gives you estimated probabilities of
> "informativeness" based on some overall comparison of each character to the
> others in the matrix. I find the concept of making a "relative determination
> of which comparisons are misleading" *before* an analysis to be hard to
> swallow, and my gut feeling is that it is equivalent to parsimony analysis
> in itself (i.e., looking for incongruence in a pattern).
> Just trying to make some sense of this,

In fact, there is a fundamental difference in the approach I
espouse.  By taking advantage of the known consequences
of patterns on the performance of tree-building algorithms,
one can prevent the analysis of misleading data.  Some
processes have observable consequences, and leave footprints
in the data (these processes include phylogeny, homoplasy,
differential lineage sorting, long branches); and sometimes
character and taxon sampling can result in unfortunate matrices
that are error-prone.  The a priori approach cuts the
vicious circle, and provides tree-independent inferences
on the observations made (the matrix).

It's perfectly analogous to testing ecological data for
normality prior to the application of parametric statistics;
the methods assume normality, so one tests the data
FIRST.  Phylogenetic methods assume any number of things;
testing the matrices for violations of these assumptions
is as desirable as trying to ensure that what you think
are homologies really are homologies; in fact, it's the same problem.

James Lyons-Weiler

```
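
To make the idea of a tree-independent check on the matrix concrete, here is a minimal sketch in Python. It is not the method described in the post. It swaps in the classic pairwise character compatibility test (the "four-state-pair" check for binary characters) purely as an illustration of scoring characters from the matrix alone, before any tree is built. The function names and the toy matrix are hypothetical, and the sketch assumes binary 0/1 states with no missing data.

```python
from itertools import combinations

def compatible(col_a, col_b):
    """Two binary characters are compatible unless all four
    state pairs (00, 01, 10, 11) occur among the taxa."""
    return len(set(zip(col_a, col_b))) < 4

def incompatibility_counts(matrix):
    """For each character (column), count how many other characters
    it is pairwise incompatible with. No tree is constructed.

    matrix: list of taxon rows, each a sequence of 0/1 states
    (assumption: binary characters, no missing data).
    """
    columns = list(zip(*matrix))
    counts = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if not compatible(columns[i], columns[j]):
            counts[i] += 1
            counts[j] += 1
    return counts

# Hypothetical toy matrix: 5 taxa (rows) x 4 binary characters (columns).
toy_matrix = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
]

counts = incompatibility_counts(toy_matrix)
for index, count in enumerate(counts):
    print(f"character {index}: incompatible with {count} other character(s)")
```

In this toy matrix the last character conflicts with two of the others while the first conflicts with none, so the last one would be the natural candidate for closer scrutiny. Whether such a count should be turned into a weight or a probability is exactly the question under debate in this thread, and the sketch takes no position on it.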