G-Fit index in PAUP

Mark Metz mmetz at LIFE.UIUC.EDU
Fri Jul 19 10:08:41 CDT 2002

The RandTrees function in PAUP generates a random sample of trees from the
set of all possible trees for the data set; in the parameters you specify
how many trees to generate.  The lengths of these trees are plotted as a
frequency distribution.  From this distribution a gamma 1 (g1) statistic can
be calculated (see http://www.xycoon.com/skewness_gamma_1.htm), which is a
measure of the skewness of the distribution and is independent of the
phylogenetic nature of the data, per se.  This statistic can be compared to
a table of critical values to test whether the skewness is significant
(H0 = the tree-length distribution is normal (symmetric) and g1calculated
does not exceed g1critical in magnitude; H1 = the distribution is skewed and
g1calculated exceeds g1critical in magnitude).  The rationale is that if the
lengths of the sampled trees are normally distributed, there is probably no
signal in the data: it behaves like random data.  However, if the
distribution is "critically" skewed (to the left, i.e. g1 is negative, with
relatively few very short trees), then there is phylogenetic signal in the
data and it is significantly different from random data.

One of the problems with this statistic as it is applied to minimum length
tree searching is that the data need very little signal to deviate from
randomness, so a poor signal-to-noise ratio can remain a problem and can
undermine any presumption that the reconstructed tree is accurate.
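For anyone who wants to check the numbers outside PAUP, here is a minimal
Python sketch of the g1 calculation and the significance check described
above.  It assumes the tree lengths from a RandTrees run have already been
exported into a plain list, and it uses the moment-based definition
g1 = m3 / m2^(3/2); PAUP's exact formula (including any small-sample
correction) is not confirmed here, so its reported values may differ
slightly.  The function names are hypothetical, and g1_critical must come
from a published table of critical values; the -0.5 in the usage lines is a
placeholder, not a real table entry.

```python
from statistics import fmean


def g1_skewness(lengths):
    """Moment-based skewness g1 = m3 / m2**1.5 of a list of tree lengths.

    m2 and m3 are the second and third central moments, using divisor n.
    Other software may apply a small-sample correction, so results can
    differ slightly from the values it reports.
    """
    n = len(lengths)
    if n < 3:
        raise ValueError("need at least three tree lengths")
    mean = fmean(lengths)
    m2 = sum((x - mean) ** 2 for x in lengths) / n
    m3 = sum((x - mean) ** 3 for x in lengths) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical; skewness undefined")
    return m3 / m2 ** 1.5


def has_signal(g1, g1_critical):
    """Hypothetical decision rule: reject H0 (random-like data) when the
    distribution is left-skewed (g1 < 0) and |g1| exceeds |g1_critical|.

    g1_critical must be looked up in a published table for the relevant
    number of taxa and characters; it is not computed here.
    """
    return g1 < 0 and abs(g1) > abs(g1_critical)


if __name__ == "__main__":
    symmetric = [10, 11, 12, 13, 14]     # toy lengths, symmetric around 12
    left_skewed = [5, 12, 12, 13, 13]    # toy lengths, one very short tree
    print(g1_skewness(symmetric))        # 0.0
    g1 = g1_skewness(left_skewed)
    print(g1, has_signal(g1, -0.5))      # -0.5 is a placeholder critical value
```

The toy lists only illustrate the arithmetic; a real test would use the
thousands of tree lengths that RandTrees produces.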
I have the reference for applying this statistic to the distribution of
minimum length trees around here somewhere, but I can't find it right now.
I hope this helps.

