[Taxacom] What are accurate phylogenies?
Bob Mesibov
mesibov at southcom.com.au
Sun Oct 12 06:05:34 CDT 2008
I'm baffled by the use of the words 'estimate' and 'accuracy' by some
phylogeneticists. Anyone else having this problem?
Both words are commonly used in maths. For example, I might try to
estimate a number. If my estimate is the same as the number, then my
estimate is accurate.
In phylogenetics, I often read that a tree built by one method or
another is an estimate of a phylogeny. I think this means that the tree
is a guess at both the topology and the branch lengths of the true
phylogeny. I also read that a particular method is more likely to give
an accurate estimate of the phylogeny than another method. This is a
very interesting claim, because it suggests that there is some way to
know the true phylogeny, so we can compare it to the estimate.
Some authors trace their use of the word 'accuracy' to this paper:
Hillis, D.M. & Bull, J.J. 1993. An empirical test of bootstrapping as a
method for assessing confidence in phylogenetic analysis. Systematic
Biology 42(2): 182-192.
Hillis & Bull here define 'accuracy' as 'the probability that a
specified group is contained in the true phylogeny' (p. 183).
'Probability?' Well, sort of, because they're doing a bootstrap
analysis, and bootstrapping generates estimates of likelihood.
Nevertheless, Hillis & Bull state clearly that 'knowledge of the true
phylogeny' is necessary to test accuracy in their particular use of the
word (p. 184). In their tests, they used a home-made, purpose-built
'true' phylogeny.
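To make the dependence on a known tree concrete, here is a minimal Python sketch of one way to read that definition. It is not Hillis & Bull's procedure; the function name, the toy taxa A-E and the list of inferred groups are all invented for illustration. The point is only that the calculation cannot be done without the set of true clades.

```python
def estimated_accuracy(inferred_groups, true_clades):
    """Fraction of inferred groups that also occur in the known 'true' tree.

    inferred_groups: iterable of groups (each an iterable of taxon names),
                     e.g. one group recovered per simulation replicate.
    true_clades:     set of frozensets, the clades of the known phylogeny.
    """
    groups = [frozenset(g) for g in inferred_groups]
    if not groups:
        raise ValueError("need at least one inferred group")
    hits = sum(1 for g in groups if g in true_clades)
    return hits / len(groups)


# Hypothetical 'true' phylogeny ((A,B),C),(D,E) written as its clades.
true_clades = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}

# Hypothetical groups recovered in four replicate analyses.
inferred = [{"A", "B"}, {"A", "C"}, {"D", "E"}, {"A", "B"}]

accuracy = estimated_accuracy(inferred, true_clades)
print(accuracy)  # 3 of the 4 groups are in the true tree
```

Without `true_clades` there is nothing to score the inferred groups against, which is exactly the situation with real-world data.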
Something similar is done in
Woolley, S.M., Posada, D. and Crandall, K.A. 2008. A comparison of
phylogenetic network methods using computer simulation. PLoS ONE 3(4):
1-12; e1913.
Here the authors use the word 'accuracy' 15 times in 12 pages, but they
use simulated sequences for their analyses. However, Woolley et al.
caution that
"While simulations can provide general predictions about the behavior of
the models studied, as well as some sense of their robustness (insofar
as differing models are explored in the simulations), it is rarely
possible to simulate the entire universe of relevant models and the
models simulated may represent real data only to a given extent." (p. 2)
So if you know the correct phylogeny in advance, it's possible to find
after a comparison that one method is more likely to give the correct
phylogeny than another. Or you might find that one method yields a tree
which more nearly approaches the correct phylogeny than another, which
is not the same thing, and involves comparing topologies and branch
lengths between trees (no fun at all).
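The difference between 'correct' and 'more nearly correct' can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are rooted and written as nested tuples, the taxa A-E and all three trees are invented, branch lengths are ignored, and the distance used is a simple count of clades found in one tree but not the other (a rooted, Robinson-Foulds-style measure), not any particular package's implementation.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Leaves are strings; each clade is a frozenset of leaf names.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def same_topology(a, b):
    """True only if the two trees contain exactly the same clades."""
    return clades(a) == clades(b)


def clade_distance(a, b):
    """Number of clades present in one tree but not in the other."""
    return len(clades(a) ^ clades(b))


# Hypothetical known phylogeny and two hypothetical estimates of it.
true_tree = ((("A", "B"), "C"), ("D", "E"))
estimate_1 = ((("A", "C"), "B"), ("D", "E"))
estimate_2 = ((("A", "D"), "B"), ("C", "E"))

print(same_topology(true_tree, estimate_1), clade_distance(true_tree, estimate_1))
print(same_topology(true_tree, estimate_2), clade_distance(true_tree, estimate_2))
```

Neither estimate is the correct tree, so on an all-or-nothing test both fail equally. On the distance measure, estimate_1 is closer to the true tree than estimate_2. Either way, both comparisons need `true_tree` to be known.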
But in real-world cases we don't know the phylogeny in advance. When I
see the word 'accurate' applied to a real-world phylogeny in the
systematics literature, should I understand that the author(s) are
reasoning as follows?
"In studies with already-known, synthetic phylogenies, the method we use
gives an accurate tree. By extrapolation, this method applied to
real-world data also gives an accurate estimate of the true phylogeny."
--
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery
and School of Zoology, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph (03) 64371195; 61 3 64371195
Webpage: http://www.qvmag.tas.gov.au/mesibov.html