weights, random data etc.

P. Hovenkamp hovenkamp at RULRHB.LEIDENUNIV.NL
Fri Mar 14 15:48:48 CST 1997


Playing around with random data should be part of the basic cladistic
curriculum. There is no better way to become convinced of the following
facts:
1. The number of MPTs for a dataset is no indication of the amount of "signal"
in it. As Richard Zander wrote, random data may just as well result in 1 tree
as in 1000 trees.
2. Neither CI nor RI is an indication of the amount of "signal" in a dataset,
unless it is somehow related to a "baseline" established on the basis of random
data (as suggested by Klassen, Mooi and Locke, 1991). Depending on the size of
the dataset (and a number of other parameters), CI may be as high as 0.5-0.7
for totally random data.
3. PTP tests measure non-randomness, but not phylogenetic signal. Simply
duplicating a number of characters in a random matrix will result in PTP
values indicating significant non-randomness.
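
For anyone who wants to try points 2 and 3 at home, here is a minimal sketch in Python. It assumes binary characters, Fitch parsimony, and few enough taxa that every tree can be enumerated. The permutation test is only in the spirit of PTP (states are shuffled within each character and the shortest tree length is recomputed). All function names are made up for this illustration and do not come from any existing package.

```python
import random

def insert_leaf(tree, leaf):
    """Yield every rooted tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach on the edge above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def all_trees(n_taxa):
    """All rooted binary trees on taxa 0..n_taxa-1 (only feasible for small n)."""
    trees = [(0, 1)]
    for leaf in range(2, n_taxa):
        trees = [t for old in trees for t in insert_leaf(old, leaf)]
    return trees

def fitch(tree, column):
    """Fitch pass for one character: return (state set, number of steps)."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_states, left_steps = fitch(tree[0], column)
    right_states, right_steps = fitch(tree[1], column)
    common = left_states & right_states
    if common:
        return common, left_steps + right_steps
    return left_states | right_states, left_steps + right_steps + 1

def best_length(matrix, trees):
    """Length of the shortest tree; `matrix` is a list of characters (columns)."""
    return min(sum(fitch(t, col)[1] for col in matrix) for t in trees)

def consistency_index(matrix, trees):
    """Ensemble CI on the shortest tree(s): minimum possible steps / observed steps."""
    min_steps = sum(len(set(col)) - 1 for col in matrix)
    length = best_length(matrix, trees)
    return min_steps / length if length else 1.0

def ptp(matrix, trees, n_perm, rng):
    """Fraction of permuted matrices whose shortest tree is as short as the observed one."""
    observed = best_length(matrix, trees)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in matrix]
        if best_length(shuffled, trees) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def random_matrix(n_taxa, n_chars, rng):
    """Random binary characters, each state drawn with probability 1/2."""
    return [[rng.randint(0, 1) for _ in range(n_taxa)] for _ in range(n_chars)]

# Illustrative sizes only (assumed here, not taken from any real dataset).
rng = random.Random(1997)
n_taxa = 5
trees = all_trees(n_taxa)

purely_random = random_matrix(n_taxa, 12, rng)
duplicated = random_matrix(n_taxa, 4, rng) * 3  # 4 random characters, each present 3 times

print("CI, random matrix:      ", consistency_index(purely_random, trees))
print("PTP, random matrix:     ", ptp(purely_random, trees, 99, rng))
print("PTP, duplicated matrix: ", ptp(duplicated, trees, 99, rng))
```

Repeating the first print over many random matrices of the same size gives the kind of "baseline" CI mentioned under point 2. Comparing the two PTP values over several seeds shows how duplication alone affects the test.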

Why should this be a concern for us?
Most people (perhaps not most, but at least a great many) using cladistic
techniques rely on concordance of characters to test each individual character
as a hypothesis of homology, and use parsimony as the best way to measure this
concordance. That is all very well, and in accordance with cladistic theory.
However, without some indication of the amount of phylogenetic signal in a
dataset, analysing it this way may be as useful as going into a dark room with
a stopwatch, a flashlight and a mirror and establishing the speed of light as
10 m/s ("current best estimate given our data").
P. Hovenkamp
Rijksherbarium/Hortus Botanicus
The Netherlands
hovenkamp at rulrhb.leidenuniv.nl




