Weights

Tom DiBenedetto tdib at UMICH.EDU
Sat Mar 8 12:43:11 CST 1997


James Francis Lyons-Weiler wrote:

>On Fri, 7 Mar 1997, Tom DiBenedetto wrote:

>> The more pointed question is whether random data ever returns single MP
>> trees, or sets that preserve resolution through strict consensus
>> calculations even for large data sets.
>>
>
>       I think a more pertinent question is how often does screened
>       data result in a lot of trees,

Well, I think my question has its own (im)pertinence. You stated that
many systematists might be surprised to learn that random data can
yield sets of shortest trees. Personally I find this *absolutely*
unsurprising. Given that most "normal" sized data sets contain enough
taxa such that the number of possible topologies exceeds the number
of molecules in the universe, I would find it totally surprising that
any finite data set could be mapped onto all of those trees equally.
If there is any difference in the lengths of the various trees, then it
seems to follow that one must necessarily have some set of shortest
trees. But if that set of shortest trees contains indications of
contradictory sets of relationships such that a strict consensus
yields no resolution, then what is the problem? (Actually I'm not
even sure that there is a real concern anyway, I'll discuss that
below). Stu Poss has contributed his concern that were he to be given
a set of random data, he would like a method which returns no
results. Now we might not be able to do that for him in all cases,
but I suspect that would be the usual result. Hence my question.
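For a sense of scale, the number of distinct unrooted, fully bifurcating topologies for n labelled taxa is (2n - 5)!!, i.e. 1 x 3 x 5 x ... x (2n - 5). The minimal sketch below (function names are hypothetical) computes that count and finds the smallest number of taxa at which it passes 10^80. That figure is an order-of-magnitude estimate often quoted for the number of atoms in the observable universe, and it is used here only as an assumed stand-in for "molecules in the universe".

```python
def unrooted_topologies(n):
    """Number of unrooted, fully bifurcating trees on n >= 3 labelled taxa: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def taxa_needed(limit=10**80):
    """Smallest n whose topology count exceeds `limit` (10**80 is an assumed figure)."""
    n = 3
    while unrooted_topologies(n) <= limit:
        n += 1
    return n


print(unrooted_topologies(10))  # 2027025
print(taxa_needed())
```

Python integers have arbitrary precision, so the count stays exact and the comparison with 10**80 involves no rounding.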

>       and how can you tell the
>       difference between a lot of trees generated by random data and
>       a lot of trees generated by screened data.

Personally this has never been a major concern of mine, and I still
don't really understand why it should be.  I don't use random data; I
use "screened" data. My tree calculation, as a logical combination of
hypotheses, is meant to tell me how those hypotheses can be logically
combined. That's all. I believe in using a "total evidence" approach,
such that the tree I end up with is the logical combination of all
the evidence out there for the group. We can't really do much better
than that until we gather more evidence. So the tree stands as the
best logical combination of available evidence no matter how much or
how little it differs from "randomness". I am not saying that I would
be untroubled by a demonstration that my tree is no better supported
than it would be given random data... but in any case, the only thing
I could do, or would do, is go out and gather more evidence. I'll do
that in any case.
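One common way to make the kind of demonstration mentioned above (a tree no better supported than random data would give) is the permutation tail probability (PTP) test of Faith and Cranston (1991). It shuffles each character's states among the taxa, which destroys any hierarchical covariation, and asks how often the shuffled data yield a most-parsimonious length as short as the real data. This is not the author's method. It is a minimal sketch assuming unordered characters scored with Fitch parsimony, taxa numbered 0..n-1, and exhaustive search over all unrooted trees, which is feasible only for a handful of taxa; all function names are hypothetical.

```python
import random


def insertions(tree, leaf):
    """Yield every rooted tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach the new leaf on a new edge above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def unrooted_trees(taxa):
    """Every unrooted binary tree on `taxa`, written as (taxa[0], rooted tree of the rest)."""
    rooted = [taxa[1]]
    for leaf in taxa[2:]:
        rooted = [t for tree in rooted for t in insertions(tree, leaf)]
    return [(taxa[0], t) for t in rooted]


def fitch(tree, states):
    """Fitch pass for one character: (state set at this node, steps below it)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total Fitch steps; matrix[i][j] is the state of character j in taxon i."""
    return sum(fitch(tree, {i: row[j] for i, row in enumerate(matrix)})[1]
               for j in range(len(matrix[0])))


def mp_length(matrix, trees):
    """Length of the shortest tree(s), found by exhaustive search (small n only)."""
    return min(tree_length(t, matrix) for t in trees)


def permute_characters(matrix, rng):
    """Shuffle each character's states among taxa, independently per character."""
    columns = [[row[j] for row in matrix] for j in range(len(matrix[0]))]
    for column in columns:
        rng.shuffle(column)
    return ["".join(column[i] for column in columns) for i in range(len(matrix))]


def ptp(matrix, n_perm=99, seed=0):
    """Return (observed MP length, permutation tail probability)."""
    rng = random.Random(seed)
    trees = unrooted_trees(list(range(len(matrix))))
    observed = mp_length(matrix, trees)
    hits = sum(mp_length(permute_characters(matrix, rng), trees) <= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

For a matrix whose 15 characters all fit a single tree, such as `["110" * 5, "110" * 5, "010" * 5, "000" * 5, "001" * 5, "001" * 5]`, the observed length is 15 and the shuffled matrices should essentially never fit that well, so the tail probability comes out at its minimum of 1 / (n_perm + 1).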
(Note: the following is a general argument, not necessarily directed
at you personally or at your specific concerns.)
As a systematist, I see myself engaged in a centuries long process of
piecing together evidence by which to infer the history of a group.
At certain points along the way, we gather up all the evidence and
present a (usually) revised phylogenetic reconstruction. And then we
go back to work, exploring different potential sources of evidence by
which to better understand the history of that group. We are not
flitting about, visiting one group or another, grabbing a random
length of evidence, and making an "estimate" of the phylogeny, then
flitting off to some other group where our primers might work. Were I
to do that, I could very easily see why I might have concerns about
"random data", and why I might fall into the trap of erecting
criteria derived from my experiences with other groups (rather than
congruence with other evidence from the same group).
My perspective is the traditional perspective of systematics;
focussed on a taxon, and on the organism, in all of its complexity.
The concerns you raise seem to me to come from a perspective which
focusses on genetics and the statistical analysis of oceans of As, T,
G, and Cs. That work is extremely valuable in its own right, but it
seems to me that you will learn things about the subjects you focus
on. In your case, that is generalizations about genetics and genetic
evolution in general. For me, hopefully it will be an accurate
reconstruction of the history of a single group.
I will gladly contribute the data from my group to help you make
generalizations, but I don't find it useful to impose expectations
from your previous generalizations as criteria for assessing my
results. Congruence with other types of data for my group is the
appropriate standard.
Perhaps this is the underlying difference in perspective which
results in our disagreements about method.

>        Moreover,
>       how can one tell the difference in the degree of
>       resolution in a consensus tree (Nelson? Strict? Semi-strict?
>       combinable component? Adams? n-trees?).
>       One approach that has been proposed is whether or not
>       we expect the degree of consensus observed among
>       the set of mpts _by chance alone_, another role for
>       probability in cladistics (Simberloff).

What do you mean another? Have we agreed anywhere yet?   :)
I don't understand the problem here. A strict consensus merely gathers
up a set of topologies and reports only those nodes which appear in
all of them. What is the relevance to a test vs. chance?
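The strict consensus rule described above fits in a few lines. This is a minimal sketch assuming rooted trees written as nested Python tuples of taxon names (a hypothetical representation): collect each tree's clades, then keep only the clades present in every tree.

```python
def clades(tree):
    """Return (taxa under this node, set of clades with two or more taxa)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    members, found = frozenset(), set()
    for child in tree:  # works for bifurcating and multifurcating nodes
        child_taxa, child_clades = clades(child)
        members |= child_taxa
        found |= child_clades
    found.add(members)
    return members, found


def strict_consensus(trees):
    """Clades that appear in every tree of the set."""
    return set.intersection(*(clades(t)[1] for t in trees))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "B"), "D"), ("C", "E"))
print(sorted(sorted(c) for c in strict_consensus([t1, t2])))
# [['A', 'B'], ['A', 'B', 'C', 'D', 'E']]
```

Only the grouping (A, B) and the trivial all-taxa clade survive, because C and D are placed differently in the two trees.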

>       By the way, consensus trees were originally designed to deal
>       with summarizing congruence among trees from different data
>       sets, not among the thousands of equally parsimonious trees that exist
>       for a single data set.

So what?

>       By the way, I'm curious: if you find 1,000 equally parsimonious
>       trees, which one provides the critical test of hypotheses
>       of homology?

The strict consensus, i.e. I will consider the grouping corroborated
if it appears in all 1000 (well, to be honest, I could be persuaded
to go along with the semi-strict: the grouping appears in at least
one of the 1000 and is contradicted by none of them).
(That's why we present the consensus; we do not draw conclusions
about contradictory groupings.)
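The semi-strict (combinable component) rule mentioned above differs from the strict rule only in what it keeps: a grouping survives if it appears in at least one tree and is contradicted by no grouping in any tree. A minimal sketch, assuming each tree has already been reduced to a set of clades (frozensets of taxon names); the function names are hypothetical.

```python
def compatible(a, b):
    """Two clades can coexist on one tree only if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def semi_strict_consensus(clade_sets):
    """Clades found in at least one tree and contradicted by none."""
    every_clade = set().union(*clade_sets)
    return {c for c in every_clade
            if all(compatible(c, other) for s in clade_sets for other in s)}


resolved = {frozenset("AB"), frozenset("ABC"), frozenset("DE"), frozenset("ABCDE")}
unresolved = {frozenset("AB"), frozenset("ABCDE")}  # C, D, E left as a polytomy
print(semi_strict_consensus([resolved, unresolved]) == resolved)  # True
```

Here the strict consensus would keep only (A, B), while the semi-strict consensus also keeps (A, B, C) and (D, E), since the second tree leaves them unresolved rather than contradicting them.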

>       Consensus trees also don't necessarily tell you that your data are
>       uninformative or ambiguous; they may instead tell you that the
>       criterion you're using finds conflicting evidence.
>       They do not, however, provide further info, e.g.,
>       what are the sources of the conflict?

yeah, and they are pretty lousy at opening beer bottles too...:)
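The quoted point about sources of conflict can be made a little more concrete: listing the pairs of groupings, across a set of trees, that cannot both hold shows where the disagreement lies. A minimal sketch, assuming trees already reduced to sets of clades (frozensets of taxon names); the function names are hypothetical.

```python
from itertools import combinations


def compatible(a, b):
    """Two clades can coexist on one tree only if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def conflicting_clades(clade_sets):
    """Pairs of clades, drawn from any of the trees, that contradict each other."""
    every_clade = sorted(set().union(*clade_sets), key=sorted)  # stable order
    return [(a, b) for a, b in combinations(every_clade, 2)
            if not compatible(a, b)]


t1 = {frozenset("AB"), frozenset("ABC"), frozenset("DE"), frozenset("ABCDE")}
t2 = {frozenset("AB"), frozenset("ABD"), frozenset("CE"), frozenset("ABCDE")}
for a, b in conflicting_clades([t1, t2]):
    print("".join(sorted(a)), "vs", "".join(sorted(b)))
```

For these two trees the four conflicts all trace back to the placement of C and D, which is the kind of detail a consensus tree alone does not report.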
