Circularity & testing.
aharvey at AMNH.ORG
Sun Oct 20 19:53:53 CDT 1996
As Rodham Tulloss wrote, "The subject being grappled with is fascinating.
As the emails get shorter and closer together, the light being shed gets
less." I'd like to try a different perspective, which I freely admit may
go up in flames.
Which hypotheses, exactly, about cladistic data might we want to interpret
_statistically_? Consider, we've got a data set; that is, a list of taxa, a
list of characters for which two or more homologous states have been
proposed, and for each taxon, data points indicating the state proposed to
be present for each character. Note we've had to make hypotheses about two
kinds of homology, that of characters among taxa (e.g., the feathers of
pheasants and scales of skinks are homologous characters), and that of
states among characters (e.g., the feathers of pheasants and the feathers
of falcons are homologous states). It seems most of the controversy
involves the latter, to which I will restrict discussion here.
Let's say hypothesis A is that all of these state homology hypotheses are
true. If hypothesis A itself is true, then the parsimony algorithm used on
the data set should yield a tree topology for which there is _no_
homoplasy. If there is in fact no homoplasy, then of course hypothesis A is
supported but not proven, and a relevant statistical test should indicate
the probability that this result (a homoplasy-free tree) could obtain if
hypothesis A is false. If there is _any_ homoplasy, hypothesis A as worded
is falsified; that is, at least some of the homology hypotheses are false.
The tree of course suggests which ones are false, namely, those that are
homoplastic on the tree, but I have been unable to come up with a relevant
statistical test, I think because the main hypothesis has already been
falsified (a type II error test seems nonsensical, e.g., what is the
probability that hypothesis A is true given observed homoplasy).
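
A minimal sketch of the check described above, assuming a fixed, rooted, binary tree and unordered (Fitch) parsimony; the post does not say which parsimony variant is meant. The tree, the taxon names, the two characters, and all function names are hypothetical and for illustration only. A character counts as homoplastic when it needs more state changes on the tree than its minimum (number of states minus one), and hypothesis A survives only if no character is homoplastic.

```python
# A tree is either a taxon name (str) or a pair (left_subtree, right_subtree).

def fitch_steps(tree, states):
    """Return (possible states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed here.
        return shared, left_steps + right_steps
    # No state in common: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1

def extra_steps(tree, states):
    """Changes beyond the minimum (number of observed states minus one)."""
    _, steps = fitch_steps(tree, states)
    return steps - (len(set(states.values())) - 1)

def homoplastic_characters(tree, matrix):
    """Names of characters that show any homoplasy on this tree."""
    return [name for name, states in matrix.items()
            if extra_steps(tree, states) > 0]

# Hypothetical five-taxon example.
tree = ((("A", "B"), "C"), ("D", "E"))
matrix = {
    "char1": {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0},  # fits the tree
    "char2": {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0},  # state 1 arises twice
}

offenders = homoplastic_characters(tree, matrix)
print("hypothesis A falsified by:", offenders)  # ['char2']
```

Note that this only scores one given tree; finding the most parsimonious tree is a separate search that this sketch does not attempt.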
The problem obviously lies in that the hypothesis being tested is in fact a
statement about many hypotheses. Many, many hypotheses. The hypothesis "All
occurrences of state b of character B are homologous" is itself a collection
of (correctably non-independent!) pairwise hypotheses ("b in taxon 1 is
homologous with b in taxon 2"; "b in taxon 1 is homologous with b in taxon
3..."). Much more informative than hypothesis A, it seems, would be to
evaluate the support the data gives to each of these hypotheses (e.g.,
"what is the probability that a supported pairwise homology is actually
false", and "what is the probability that an unsupported pairwise homology
is actually true"). In this case, also, I don't think circularity is a
relevant concern, since the hypothesis involves only two data points, and
the test involves the entire data set.
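
To give a sense of how quickly "many, many hypotheses" accumulate, here is a small sketch that unpacks one character into its pairwise state-homology hypotheses. The taxon names, state codes, and function name are hypothetical. For n taxa sharing a state there are n(n-1)/2 pairs, and the pairs overlap in their taxa, which is one way to see why they are not independent.

```python
from itertools import combinations

def pairwise_homology_hypotheses(states):
    """Map each state to the list of taxon pairs proposed to share it."""
    taxa_by_state = {}
    for taxon, state in states.items():
        taxa_by_state.setdefault(state, []).append(taxon)
    return {state: list(combinations(sorted(taxa), 2))
            for state, taxa in taxa_by_state.items()}

# Hypothetical character B scored for five taxa.
character_B = {"taxon1": "b", "taxon2": "b", "taxon3": "b",
               "taxon4": "b", "taxon5": "a"}

pairs = pairwise_homology_hypotheses(character_B)
for state, hypotheses in pairs.items():
    print(state, len(hypotheses), hypotheses)
```

With four taxa scored as state b, the single statement "all occurrences of b are homologous" expands to six pairwise hypotheses; a state seen in only one taxon contributes none.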
However, my impression is that we're a long way off from being able to
statistically address these pairwise homologies. The hypotheses that may
(or may not) be addressable now seem to be of the "bulk" variety, e.g.,
hypothesis A above, or perhaps an even weaker version: "At least some of
these state homology hypotheses are true." At best, maybe there is a test
that really can tell you that a significant proportion of the individual
homology hypotheses are likely to be correct, but without telling you which
ones those are. For morphological data sets, at least, this still seems
like a pointless piece of information, because we do so much filtering of
the available data prior to analysis (e.g., I don't try to establish
homologies for individual setae) that true randomness seems a remote
Well, that's my two cents.
Tangential P.S.: James Lyons-Weiler wrote that "Homoplasy indices show an
across-taxon, among study, very general trend: when more taxa are added,
the congruence goes down..." The problem lies in the indices, not the data;
i.e., it's an artifact (I gave a talk on this at the Toronto Hennig
Alan W. Harvey (aharvey at amnh.org)
Assistant Curator of Invertebrates
American Museum of Natural History
Central Park West at 79th Street
New York, NY 10024
(212) 769-5638; fax (212) 769-5783