James Lyons-Weiler weiler at ERS.UNR.EDU
Fri Aug 23 18:46:40 CDT 1996

On Tue, 20 Aug 1996, Richard Jensen wrote:

> Regarding the effect of input order on the results of various analyses:
> I don't see how the order would have any effect on an eigenanalysis of a
> covariance or correlation matrix, as typically used for PCA, or on
> eigenanalysis of a dissimilarity matrix as typically used for PCOR.  And,
> unless there are programming errors, order of input will have no effect
> on the calculation of pairwise similarities or distances.
> Order can have an effect on the results of cluster analysis of a
> similarity or dissimilarity matrix.  The effect will be a function of how
> ties are resolved.  One way around this is to use a program, such as
> NTSYS-pc, that will allow the user to find all ties and all solutions -
> this can result in many different phenograms for the same data set, just
> as there may be many alternative equally parsimonious trees for a data set.

To kill a dead horse, I offer the following reference, just recently
brought to my attention:

Backeljau, T., L. De Bruyn, H. De Wolf, K. Jordaens, S. Van Dongen and
B. Winnepenninckx. 1996. Multiple UPGMA and Neighbor-joining trees and the
performance of some computer packages.  Molecular Biology and Evolution

This team shows that the most popular programs (PHYLIP, MVSP, SAS, SYN-TAX,
and NTSYS) recognize the problem of ties, but have different "tie
tolerances" and rounding precisions, and therefore differ in how
effectively they deal with the problem.  They also recognize that the
problem of ties can be exacerbated by data entry order.
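
To make the tie problem concrete, here is a minimal sketch in Python.  It
is a toy of my own, not the test data from the paper and not the algorithm
of any of the packages named above.  The function upgma() and the
three-taxon distances are hypothetical, and the tie rule (keep the first
minimal pair encountered) is an assumption chosen because it is the
simplest rule that makes input order matter.

```python
from itertools import combinations

def upgma(labels, dist):
    """Naive UPGMA sketch. Ties are broken by keeping the FIRST minimal
    pair found, so the result can depend on the order of `labels`.
    `dist` maps frozenset({a, b}) -> distance between taxa a and b."""
    # Each cluster is (topology, members); topology is a label or a
    # frozenset of two sub-topologies, so comparison ignores left/right.
    clusters = [(lab, (lab,)) for lab in labels]

    def avg_dist(c1, c2):
        # Unweighted average of all pairwise member distances.
        total = sum(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = avg_dist(clusters[i], clusters[j])
            if best is None or d < best[0]:  # strict '<': first tie wins
                best = (d, i, j)
        _, i, j = best
        merged = (frozenset((clusters[i][0], clusters[j][0])),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical distances with a tie: d(A,B) = d(A,C) = 1, d(B,C) = 2.
dist = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("B", "C")): 2.0,
}

tree_abc = upgma(["A", "B", "C"], dist)  # A joins B first
tree_acb = upgma(["A", "C", "B"], dist)  # A joins C first
print(tree_abc == tree_acb)
```

The same distances yield two different phenograms depending only on the
order in which the taxa are listed.  A program with a different tie rule,
tie tolerance, or rounding precision could return either one, or both.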

Data without ties can also exhibit entry order sensitivity for the reasons
I outlined previously (approximate rather than deterministic solutions).
So the points really are, first, how much confidence can one place in
classifications and ordinations that have not been shown to be robust to
entry order, and second, what can be done for data sets that do exhibit
entry order sensitivity?  There seems to be a fuzzy area between data
description (exploration) and pattern recovery with some of these
procedures, and a dearth of hypothesis testing as well.


