> What puzzles me most in this and in all such discussions on tree-estimation
> (or reconstruction) versus The Truth is that of all tree estimation
> procedures the only one which regularly causes this level of angst is
> PARSIMONY. What about likelihood trees or distance trees, doesn't the
> exact same argument apply? We have a criterion for preferring one tree
> over another (max likelihood) or at least a criterion for deciding which
> tree is the estimated tree (the one produced by the algorithm: NJ or
> whatever). There still must be what James has called "a bolus of faith" to
> raise that tree to the status of a meaningful phylogenetic estimate.
Yes, indeed. The reason we have focused on parsimony
is that no other methods have come up. I have studied
the maximum likelihood approach - and right away found that
statisticians balk at the simultaneous optimization of more than one
parameter as an "ill-posed problem" - and that's just the
beginning...
> In this regard there is nothing special about parsimony. WHATEVER
> estimation method we use, we still should distinguish "the tree that comes
> from the method" from "a good/valid/meaningful estimate of the phylogeny".
> Knowledge of the conditions under which each given method will and will not
> recover a known hierarchic signal are useful here. Parsimony is good
> sometimes, various likelihood models are good sometimes, NJ and even UPGMA
> will work provided the data (hence, the underlying evolutionary process)
> are appropriate, but EVERY method has its "Felsenstein zone" or equivalent:
> a set of circumstances in which to add more data will lead with increasing
> certainty to the wrong tree.
You are quite right here, as well. Most people don't
recognize that maximum likelihood can be inconsistent -
and I doubt anyone really has a clue as to how often
it is. The conditions for absolute consistency
(the right model, an infinite amount of data, the right
tree topology) may never occur - there, at least, it is
no longer debated whether phylogenetic inference
is a matter of reconstruction or estimation.
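
As a rough illustration of the "Felsenstein zone" idea for parsimony, here is a minimal sketch. It assumes a four-taxon true tree ((A,B),(C,D)), binary characters, and a symmetric two-state model in which each branch has a fixed probability of change. The function names, the branch probabilities (0.4 and 0.05) and the seed are my own choices for illustration, not anything taken from the discussion above. With four taxa and binary characters, parsimony prefers the split supported by the most informative sites, so counting site patterns is enough.

```python
import random

def count_informative_patterns(n_sites, p_long, q_short, seed=0):
    """Simulate binary characters on the true tree ((A,B),(C,D)).

    A and C sit on long branches (change probability p_long); B, D and the
    internal branch are short (change probability q_short).
    """
    rng = random.Random(seed)

    def evolve(state, prob):
        # Flip the 0/1 state with the given probability.
        return 1 - state if rng.random() < prob else state

    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        u = rng.randrange(2)      # state at the A/B end of the internal branch
        v = evolve(u, q_short)    # state at the C/D end of the internal branch
        a, b = evolve(u, p_long), evolve(u, q_short)
        c, d = evolve(v, p_long), evolve(v, q_short)
        # Only sites with a 2/2 split of states are parsimony-informative.
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts

def parsimony_choice(counts):
    """Return the split supported by the most informative sites."""
    return max(counts, key=counts.get)

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        counts = count_informative_patterns(n, p_long=0.4, q_short=0.05, seed=1)
        print(n, counts, "->", parsimony_choice(counts))
```

Under these assumed branch probabilities the wrong split AC|BD is expected to collect more informative sites than the true split AB|CD, so adding sites makes the wrong answer more secure rather than less. Setting p_long equal to q_short removes the effect.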
> Tests on the data can be of some help in picking up problem cases in which
> the key assumptions of one (or more) methods are violated. And if the
> chosen method is appropriate to the data, then tests (eg, bootstraps and
> TPTPs) against the possibility that recovered groups are attributable to
> chance alone can help in distinguishing the bits of the tree which are
> meaningful from the bits which are not.
I concur.
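
For concreteness, a minimal sketch of a nonparametric bootstrap on a single grouping, again restricted to four taxa and binary characters so that parsimony reduces to counting informative sites. The function names and the toy data are hypothetical, and this is only a sketch of the resampling idea, not the procedure of any particular program.

```python
import random

def best_split(sites):
    """Return the four-taxon split supported by the most informative sites.

    Each site is a tuple of 0/1 states for taxa (A, B, C, D).
    Simplification: ties, including the case of no informative sites,
    go to the first split listed.
    """
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return max(counts, key=counts.get)

def bootstrap_support(sites, split, n_reps=1000, seed=0):
    """Fraction of bootstrap replicates in which `split` is recovered."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Resample sites (columns) with replacement, keeping the same length.
        resampled = [rng.choice(sites) for _ in sites]
        if best_split(resampled) == split:
            hits += 1
    return hits / n_reps

if __name__ == "__main__":
    # Toy data: 30 sites favour AB|CD, 5 favour AC|BD, 65 are constant.
    sites = [(0, 0, 1, 1)] * 30 + [(0, 1, 0, 1)] * 5 + [(0, 0, 0, 0)] * 65
    print(best_split(sites), bootstrap_support(sites, "AB|CD"))
```

A high value here only says the grouping is stable under resampling of these data. It does not say the method suits the data, which is the separate point made in the quoted paragraph.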
> But until the data are tested, and until the tests on nodes of the tree
> have been done, a tree estimate is just a tree estimate, it cannot be a
> hypothesis about phylogeny.
Here I differ slightly - we don't need criteria for structuring
hypotheses as badly as we need critical tests of hypotheses.
Although it is a long way off, I predict that a class of statistical
inference will eventually place confidence limits on each
node of a tree that can be universally accepted as
probability estimates of monophyly. So trees published
using the state of the art in tests of data are estimates
of phylogeny - as such I'd call them hypotheses.
