Weights

Stuart G. Poss sgposs at WHALE.ST.USM.EDU
Wed Mar 5 13:18:04 CST 1997


James Francis Lyons-Weiler makes a good point when he writes:

" What's lacking is an epistemological equivalent to a null tree,
and a critical test of the model tree...", even if this is not quite
literally true, since some (e.g., Meacham's work on the probability of
compatibility) have proposed specific null models.

Could those who would argue that such null models are irrelevant to
phylogenetic inference please explain (or perhaps more clearly explain
to some of us), under what circumstances their own methods fail?

I had thought that Joe Felsenstein mathematically characterized Hennig's
dilemma quite well, although evidently some would disagree.  Although I
place myself among those who would disagree with Joe's characterization
of the consequences of his results, it would be for empirical reasons
(relative distributions of potentially misleading characters versus
observed potentially historically informative characters) and not
because the phylogeny question cannot be asked within a statistical
framework or because the statistical framework he has proposed is
incorrect or, worse yet, irrelevant.  In any event, given the direction
of the debate about weights, I am left wondering: if a statistical
paradigm for character evaluation is inappropriate, then under what
circumstances do other methods fail?

Typically, in science one must be concerned with hypothesis testing
rather than "evidence accumulation", although in practice it may, at
times, be difficult to distinguish them.  Nonetheless, phylogenetic
testing procedures that cannot fail under any circumstances are not
particularly useful scientifically.  A theory of methodology that
predicts everything (or nothing specific) is not informative
(information being, at least as Shannon defined it, a special set of
conditional probabilities).  Likewise, cladistic characters that are
ALWAYS regarded as "good" are not particularly informative.  It is only
when they conflict with other cladistic characters that they provide
evidence of evolution/phylogeny.  If some cladistic characters behave as
if they are "better" than other cladistic characters (more often
congruent or compatible when compared to others), then is it not
reasonable to suppose that such "behaviors" may in general differ
among characters and that these differences will generate alternate
distributions of potential outcomes (trees)?  If such distributions
exist, then cannot statistical or quasi-statistical methods be
employed to study them?  If so, would not such study be
informative with respect to phylogeny?
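
One way to sketch such a study, under loudly stated assumptions: count
the compatible character pairs in a binary matrix and compare that count
with its distribution when each character's states are shuffled
independently among taxa.  This is a generic permutation test, not
Meacham's probability-of-compatibility calculation, and the matrix, the
function names, and the number of replicates are all hypothetical.

```python
import random
from itertools import combinations

def n_compatible_pairs(matrix):
    """Count character pairs that pass the pairwise binary compatibility
    test (fewer than four distinct state combinations among the taxa)."""
    return sum(len(set(zip(a, b))) < 4 for a, b in combinations(matrix, 2))

def permutation_null(matrix, replicates=999, seed=0):
    """Null distribution of the compatible-pair count.

    Each replicate shuffles every character's states among taxa
    independently, keeping state frequencies but destroying any
    shared hierarchical signal.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(replicates):
        shuffled = [rng.sample(char, len(char)) for char in matrix]
        counts.append(n_compatible_pairs(shuffled))
    return counts

def p_value(observed, null_counts):
    """One-sided p: fraction of outcomes (observed included) that are
    at least as compatible as the observed matrix."""
    as_extreme = sum(c >= observed for c in null_counts)
    return (as_extreme + 1) / (len(null_counts) + 1)

# Hypothetical matrix: 5 binary characters (rows) scored for 8 taxa (columns)
data = [
    [0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
]
observed = n_compatible_pairs(data)
null = permutation_null(data)
print(observed, p_value(observed, null))
```

A method of this kind can fail in a recognizable way: if the observed
count sits well inside the shuffled distribution, the matrix shows no
more hierarchical structure than chance arrangement would give.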

Some have argued that the answer should be no, because we are speaking
of evolutionary events, which either happened or did not happen and
that, as such, cannot be modeled statistically.  If I understand him
correctly, Tom DiBenedetto makes this argument when he states,

"if we were to know the true answer, we would be able to go back and
reweight our characters such that we could run the matrix and reproduce
the right answer. What would those weights be? In all cases, either 0 or
1, for if we knew what happened, we would know that a hypothesized
transformation either happened or didn't.  The probabilities say nothing
about the reality of the transformation; they are statements about what
we predict was likely to happen, given reference to some knowledge we
think may be relevant."

However, even assuming a Bayesian outlook, such an argument is specious,
since under no circumstances yet available to science would a scientist
be in a position to directly "know the true answer" for any cladistic
character (unless of course we are talking about the most recent of
events).  If we did, we would hardly require scientific arguments at
all.  Our knowledge is inferential, or to put it bluntly in this
context: there is no certainty that we have not included at least some
potentially misleading characters in our analysis (whatever it happens
to be).  Consequently, how we weight our characters (how independent we
think them to be) is important to the appropriateness of the inferences
we make.  Indeed, the pool of potentially misleading characters is so
much larger than the set of potentially non-misleading (compatible)
ones that, in all probability, any set of cladistic characters will
contain at least a few misleading ones and hence will almost always be
at least partially wrong.  This is easy to demonstrate, as all but the
most artificial datasets have numerous incompatible characters, not all
of which can logically be true simultaneously.
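
The demonstration can be sketched in a few lines.  The sketch assumes
binary (0/1) characters with no assumed polarity and uses the standard
pairwise test: two such characters are compatible unless all four state
combinations (00, 01, 10, 11) occur among the taxa.  The matrix and the
function names are made up for illustration and come from no dataset or
program discussed here.

```python
from itertools import combinations

def compatible(char_a, char_b):
    """Pairwise compatibility for two binary characters (no polarity assumed).

    Two binary characters can both fit some tree without homoplasy
    unless all four state combinations occur among the taxa.
    """
    return len(set(zip(char_a, char_b))) < 4

def incompatible_pairs(matrix):
    """Return index pairs of characters that fail the pairwise test.

    matrix: list of characters, each a list of 0/1 states, one per taxon.
    """
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(matrix), 2)
            if not compatible(a, b)]

# Hypothetical data: 4 characters scored for 5 taxa
example = [
    [0, 0, 1, 1, 1],  # character 0
    [0, 0, 0, 1, 1],  # character 1, nested within character 0
    [0, 1, 1, 0, 1],  # character 2, conflicts with all the others
    [1, 1, 0, 0, 0],  # character 3, complement of character 0
]
print(incompatible_pairs(example))  # [(0, 2), (1, 2), (2, 3)]
```

Each listed pair contains at least one character that must have changed
more than once on any tree, so not every character in the matrix can be
taken at face value.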

One could argue that failing to appreciate the importance of
establishing the relative weights that should be placed on characters,
even if they are dealt with only as included or excluded, denies us the
opportunity to test some of the central tenets of Darwinism, as well as
to investigate some of the most interesting questions in biology
relating to the genetic independence of morphological features.  As
Darwin so ably observed (albeit without an explicit statistical model,
or even the choice of a PC, Mac, or UNIX workstation on which to run his
favorite methods of inferring phylogenies), life is largely
probabilistic, but not so completely so as to be deterministic.

Studying relative weights is closely tied to the biology of studying
characters and taxa, both of which are highly probabilistic in nature.
Indeed, Darwin's theory predicts that taxa should not have (any/too
many?) neutral characters, that under some circumstances some will
replace others, and that evolutionary transformations are not all
equally likely, owing largely to probabilistic consequences of natural
selection.  Presumably, such selection occurs even at the very moment
evolutionary "accidents" take place or at least sometime afterward.

I would find it highly ironic that methods to infer phylogeny should be
devoid of probabilistic reasoning, when phylogeny itself is largely the
result of natural selection, a very highly probabilistic process.  Are
we to conclude that the processes primarily responsible for phylogeny
are of no consequence to or independent of the very models that are
meant to infer the outcome of these processes?  If so, how will we know
when our methods fail?

Stuart Poss

--
_____________________________________________________________________
Stuart G. Poss                       E-mail: sgposs at whale.st.usm.edu
Senior Ichthyologist & Curator       Tel: (601)872-4238
Gulf Coast Research Laboratory       FAX: (601)872-4204
P.O. Box 7000
Ocean Springs, MS  39566-7000
_____________________________________________________________________



