[Taxacom] The difference

Richard Zander Richard.Zander at mobot.org
Fri Oct 26 15:58:51 CDT 2007

After 10 years of mulling over the difference between morphological and
molecular phylogenetic analysis, and the great disparity in the sizes
of their data sets, it finally occurred to me that morphological
traits are based on genes, and shared apomorphic morphological traits
imply shared apomorphic DNA base sequences. These could be few, since a
few base changes can affect a gene. But exons are now often used in
molecular phylogenetic analysis because their DNA bases change more
slowly than those in non-coding sequences and so are more informative
about basal tree relationships. Genes are now acceptable in molecular
analysis because the results are similar to those from surrounding
non-coding areas. The reverse might also be true, however: shared exons
may imply that the surrounding promoter regions and introns have many
shared DNA bases.

Thus, given that morphological and molecular data sets are commonly
combined in the same phylogenetic analysis, one shared morphological
trait is not equivalent to one shared molecular trait but represents a
possibly large number of shared apomorphic DNA bases, the number unknown
and dependent on the gene or genes involved.

Let's do a thought experiment: In a morphological data set of 30 traits,
30 genes may be involved. If each gene contributed 5 base pairs of
phylogenetic information, then we would have 150 informative traits.
This compares favorably with information from molecular analyses of one
gene. It could also be that each morphological trait implies the same
number of informative traits as a single gene in molecular analysis, in
which case the above morphological data set would have 5 times the
phylogenetic information of a molecular data set of 6 genes.
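
As a rough check of the arithmetic in this thought experiment, here is a minimal Python sketch. The variable names are illustrative only, and the figure of 5 informative base pairs per gene is the assumption stated above, not a measured value.

```python
# Thought-experiment numbers taken from the text (assumptions, not data).
morphological_traits = 30        # traits in the morphological data set
bases_per_trait = 5              # assumed informative base pairs per trait/gene
molecular_genes = 6              # genes in the comparison molecular data set

# Case 1: each trait stands for a handful of informative base pairs.
implied_informative_sites = morphological_traits * bases_per_trait

# Case 2: each trait carries as much information as one whole gene,
# so the data sets are compared in "gene equivalents".
information_ratio = morphological_traits / molecular_genes

print(implied_informative_sites)  # 150
print(information_ratio)          # 5.0
```
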
For those who reason via statistics, morphological data sets yield
bootstrap scores that are low or impossible to obtain, and a few traits
cannot be significant. For instance, suppose (AB)C was supported by 2
morphological traits, but (BC)A was supported by 1 trait. If the chance
of a trait occurring in two of the three lineages by chance alone were
1/3, then the chance of at least 2 out of 3 synapomorphies occurring in
both A and B by chance alone is 26 percent, too high for statistical
significance. However, if we assume each trait represents 5
synapomorphic base pairs, then the chance of at least 10 molecular
traits shared by A and B out of 15 total synapomorphies at 1/3
probability is about 1 percent, giving a confidence level of about 99%
for (AB)C.
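
The two percentages above can be reproduced as upper-tail binomial probabilities. The following is a minimal Python sketch under the assumption that each shared trait (or base pair) is an independent trial with a 1/3 chance of falling on the A+B pair; the function name binomial_tail is illustrative, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption from the text: a trait is shared by a given pair of the
# three lineages by chance alone with probability 1/3.
p = Fraction(1, 3)

# At least 2 of 3 morphological traits shared by A and B.
morphological = binomial_tail(3, 2, p)

# At least 10 of 15 base pairs shared by A and B, if each
# morphological trait stands for 5 synapomorphic base pairs.
molecular = binomial_tail(15, 10, p)

print(f"{float(morphological):.4f}")  # 0.2593
print(f"{float(molecular):.4f}")      # 0.0085
```

The first value is the 26 percent quoted above; the second, a little under 1 percent, corresponds to the roughly 99% confidence figure.
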
Can we come up with some average or standard value of expected shared
DNA bases per shared morphological trait? Even if we can't, it is clear
that morphological data is given short shrift when combined on an even
basis with molecular data, and therefore morphological and molecular
data sets should always be evaluated separately.

Richard H. Zander
Voice: 314-577-0276
Missouri Botanical Garden
PO Box 299
St. Louis, MO 63166-0299 USA
richard.zander at mobot.org
Web sites: http://www.mobot.org/plantscience/resbot/
and http://www.mobot.org/plantscience/bfna/bfnamenu.htm
For FedEx and UPS use:
Missouri Botanical Garden
4344 Shaw Blvd.
St. Louis, MO 63110
