Manufactured data sets - was Re: Felsenstein versus Ockham?

R. Zander bryo at PARADOX.NET
Wed Jan 27 09:48:36 CST 1999

David Orlovich wrote:

> R. Zander wrote:
> >
> >There is no evidence a priori that a data set
> >
> >species A 1 2 3 4 5 6 7 8 9
> >species B 1 2 3 4 5 0 0 0 0
> >species C 0 0 0 0 0 6 7 8 9
> >
> >with species A and B sharing characters 1-5 must be more closely related
> >than species A and C sharing characters 6-9.
> >
> It seems to me that if a data set was not clearly able to "give up" its
> phylogenetic secrets then the simple answer is to get more data.

Or if you have no more data, then don't make an inference when the loss, if you
are wrong, is considerable.
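A minimal sketch of how a parsimony program would score the matrix quoted above, using Fitch's algorithm for unordered characters. With only three taxa there is just one unrooted tree, so nothing can be compared; the sketch therefore assumes a hypothetical all-zero outgroup O, which is not part of the original example. The names MATRIX, TREES, fitch and tree_length are illustrative only.

```python
# Zander's three-taxon matrix; "O" is a hypothetical all-zero outgroup (assumption).
MATRIX = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [1, 2, 3, 4, 5, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 0, 6, 7, 8, 9],
    "O": [0, 0, 0, 0, 0, 0, 0, 0, 0],
}
N_CHARS = len(MATRIX["A"])

# The three possible unrooted trees for four taxa, each rooted on its central edge.
TREES = {
    "AB|CO": (("A", "B"), ("C", "O")),
    "AC|BO": (("A", "C"), ("B", "O")),
    "AO|BC": (("A", "O"), ("B", "C")),
}

def fitch(node, char):
    """Fitch pass for one unordered character: returns (state set, steps)."""
    if isinstance(node, str):  # leaf: its observed state, no changes
        return {MATRIX[node][char]}, 0
    left_set, left_steps = fitch(node[0], char)
    right_set, right_steps = fitch(node[1], char)
    shared = left_set & right_set
    if shared:  # children agree on some state: no extra change needed
        return shared, left_steps + right_steps
    # children disagree: take the union and count one change
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree):
    """Total parsimony steps summed over all characters."""
    return sum(fitch(tree, i)[1] for i in range(N_CHARS))

for name, tree in TREES.items():
    print(name, tree_length(tree))
# prints: AB|CO 13, AC|BO 14, AO|BC 18 (one per line)
```

Under these assumptions the A+B tree (13 steps) beats the A+C tree (14 steps) by a single step, and the preference rests only on five shared characters outnumbering four.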

> Isn't the
> underlying principle that evolutionary history IS going to be reflected in
> the evolution of characters

In morphological studies, evolutionary history is destroyed as well as recorded
in traits. With enough destruction (long branches, anagenetic change, many
extinctions) you get no signal, a false signal, or a much modified, vague
signal. With molecular characters in an ideal world (neutral mutations,
strictly branching lineages of the kind cpDNA and mtDNA are supposed to have, no
introgression, a molecular clock, etc.), you should get nicely branching trees of
the gene's evolution. These may match the species' evolution (given no lineage
sorting or concerted evolution), which can be checked with multiple sequences
from multiple loci. We are still a long way from really decent

> and it's our job to test their homology by
> analysis of as much data as necessary to give that answer?  To manufacture
> a data set like the one above is akin to asking "how would we deal with a
> mammal (sp. B), a bird (sp. C), and a mammal with feathers (sp. A)? - or
> for that matter, a unicellular procaryote, a chicken and a bacterium that
> laid eggs!".

Well, no. Accepted or uncontested groups are seldom studied, precisely because
they are already well supported. The *problematic* groups, with lots of apparent
homoplasy, are the ones that need phylogenetic analysis. *Those* are the groups
with sometimes radically different trees only one step longer, involving
character states that are commonly convergent within or immediately outside the
studied group. Molecular studies are even better examples: a grouping supported
by several synapomorphies can be contradicted by a conflicting tree only one
step longer, with almost as much support. The consistency index of molecular
data sets of problematic taxa is never 1.0. If you have 0.85 you are happy. With
0.5, you really have to search for excuses for your inferences.
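The consistency index mentioned above is conventionally the minimum number of changes the characters could require on any tree divided by the number they actually require on the tree being evaluated, so 1.0 means no homoplasy. A minimal sketch, assuming unordered characters (minimum steps = distinct states minus one); the 34-character data set and the tree lengths in the usage lines are hypothetical, chosen only to reproduce the 0.85 and 0.5 values.

```python
def minimum_steps(rows):
    """Minimum changes per unordered character on any tree: distinct states - 1."""
    return [len(set(column)) - 1 for column in zip(*rows)]

def consistency_index(min_steps, tree_length):
    """Ensemble CI: total minimum possible steps / steps observed on the tree."""
    if tree_length <= 0:
        raise ValueError("tree length must be positive")
    return sum(min_steps) / tree_length

# Hypothetical data: 34 binary characters, each needing at least one change.
mins = [1] * 34
print(consistency_index(mins, 40))  # 0.85: a tree with 6 extra steps
print(consistency_index(mins, 68))  # 0.5: twice the minimum number of steps

# minimum_steps works from a taxa-by-characters matrix (rows are taxa).
print(minimum_steps([[0, 1], [1, 1], [1, 0]]))  # [1, 1]
```

A CI of 0.5 means half of all inferred changes are extra steps beyond the minimum, which is why inferences from such data need so much defending.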

> If the data set above is the entire one for the imaginary
> organisms, then it won't ever exist and therefore doesn't matter, and if
> it's a partial character set for another set of imaginary organisms, then
> someone needs to look for more characters.
> Of course I might have got the wrong end of the stick completely and I'm
> sure someone will tell me!

Computerized phylogenetic analysis is decades old, but no one yet has a really
decent grasp of it. We are all clinging to a stick with only one end.

Richard Zander

> David Orlovich.


Richard H. Zander, Curator of Botany
Patricia M. Eckel, Research Fellow in Botany
Buffalo Museum of Science
1020 Humboldt Pkwy, Buffalo, NY 14211 USA
bryo at   voice: 716-896-5200 ext. 351

More information about the Taxacom mailing list