warren_lamboy at QMRELAY.MAIL.CORNELL.EDU
Tue Jun 13 09:25:13 CDT 1995
Mail*Link(r) SMTP FWD>RE>Confidence
.Date: 13/06/1995 5:29
.From: Nicolas Bailly
.On Mon, 12 Jun 1995 11:39:42 -0400, Warren Lamboy wrote
.>In my opinion, a good way to quantify one's confidence in a determination is
.>by comparison to known standards, that is, by statistical comparison to
.>specimens whose identities are known. There are a number of ways to do this,
.>but one way with which I am familiar is that of disjoint principal components
.>analysis (Systematic Botany 1990, Vol. 15:3-12), which was first developed by
.>Dr. Svante Wold, of the University of Umea, Umea, Sweden.
.The problem with many statistical analyses is that the data MUST match some
. distribution, more or less strictly depending on the robustness of the
. method. And the less they match it, the less accurate the interpretations.
.Moreover, the traditional normal and lognormal distributions seem to me
.not often well adjusted to biological problems. Even if PCA is robust with
.respect to the data, I suspect these kinds of data are too far from
.normality.
.I think the problem of confidence is relevant to fuzzy logic (the theory of
.possibilities), not to statistics (the theory of probabilities).
.Museum National d'Histoire Naturelle,
.Laboratoire d'Ichtyologie Generale et Appliquee
.43, rue Cuvier, 75231 Paris Cedex 05, France
.Tel: (33 1) 40 79 37 63 / 40 79 37 49 Fax: 40 79 37 71
.Telex: MUSNAHN 202 641 F E-Mail: bailly at mnhn.fr
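Bailly's suggestion that identification confidence belongs to possibility
theory rather than probability theory might be sketched as follows (a minimal
Python illustration; the triangular membership functions and fin-ray counts
are hypothetical, not taken from any real key):

```python
# Minimal sketch of a fuzzy (possibility-theory) view of identification
# confidence.  The taxa, characters, and ranges below are invented.
def triangular(x, lo, peak, hi):
    """Degree of membership (0..1) of value x in a triangular fuzzy set."""
    if x <= lo or x >= hi:
        return 0.0
    if x <= peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)

# Each taxon described by a fuzzy range for one character (fin-ray count).
taxa = {
    "Taxon A": (10, 13, 16),
    "Taxon B": (14, 18, 22),
}

specimen = 15  # observed character value
for name, (lo, peak, hi) in taxa.items():
    print(name, round(triangular(specimen, lo, peak, hi), 2))

# These are membership grades, not probabilities: unlike probabilities,
# they need not sum to 1 across the candidate taxa.
```

The point of the sketch is the last comment: a specimen can belong somewhat
to two fuzzy taxon concepts at once without the grades summing to one, which
is exactly the behavior probability theory forbids.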
In the above, Nicolas Bailly brings up an interesting issue.
I would like to make two points: 1) In fitting the taxon models mentioned
above, it is NOT the original data that must follow a normal distribution, but
the residuals after fitting the model (those bits of data that are not
explained by the model) that are assumed to follow a normal distribution. 2)
The literature reference cited explains what to do about those data points
whose residuals are outliers (that is, do not reasonably fit a normal
distribution).
In my experience, if one constructs taxon models using characters that
adequately distinguish the taxa, and enough samples of a taxon are used in
constructing each model, then there will be few, if any, data points whose
residuals cannot reasonably be assumed to come from a normal distribution. On
the other hand, if the taxa are not well separated by the characters selected,
or there are too few samples from one or more taxa, then problems of
non-normality will be more likely.
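The distinction between raw data and residuals can be illustrated with a
minimal sketch (Python with numpy; the two "taxa" and their character values
are simulated purely for illustration):

```python
import numpy as np

# Minimal sketch: the pooled character data are bimodal (clearly
# non-normal) because they mix two taxa, yet the residuals left after
# fitting each taxon's own model (here simply the taxon mean) can
# reasonably be assumed to come from a normal distribution.
rng = np.random.default_rng(1)
taxon_a = rng.normal(5.0, 0.3, size=100)   # simulated character values
taxon_b = rng.normal(9.0, 0.3, size=100)
pooled = np.concatenate([taxon_a, taxon_b])

# The spread of the raw pooled data is dominated by the gap between
# the two taxon means, not by any one distribution's shape.
print(pooled.std())                        # roughly 2, far above 0.3

# Residuals after the per-taxon models: centered and unimodal, with
# only the within-taxon noise remaining.
residuals = np.concatenate([taxon_a - taxon_a.mean(),
                            taxon_b - taxon_b.mean()])
print(residuals.std())                     # roughly 0.3
```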
To put the last paragraph another way: if the taxonomist accepts taxa that
cannot be distinguished consistently, persistently, and objectively, then
there will be big problems in trying to make identifications. No fancy
statistics or intricate mathematical analysis will save the taxonomist from
"bad taxa". Statistical and mathematical methods can sometimes alert one to
"bad taxa", however; this is one of the strengths of procedures such as
disjoint principal components analysis.
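The idea behind disjoint principal components classification might be sketched
roughly as follows (Python with numpy; this is a bare-bones SIMCA-style
illustration of the general idea, not Wold's actual procedure, and all the
data are made up):

```python
import numpy as np

def fit_taxon_model(X, n_components=1):
    """Fit a separate (disjoint) PCA model to one taxon's specimens."""
    mean = X.mean(axis=0)
    # Principal axes from the singular value decomposition of the
    # mean-centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def residual_distance(x, model):
    """Distance from specimen x to a taxon's principal-component subspace."""
    mean, components = model
    xc = x - mean
    projection = components.T @ (components @ xc)
    return np.linalg.norm(xc - projection)

# Toy data: two "taxa" measured on three characters.
rng = np.random.default_rng(0)
taxon_a = rng.normal([10.0, 5.0, 2.0], 0.2, size=(20, 3))
taxon_b = rng.normal([4.0, 9.0, 7.0], 0.2, size=(20, 3))
models = {"A": fit_taxon_model(taxon_a), "B": fit_taxon_model(taxon_b)}

# An unknown specimen is assigned to the taxon whose model leaves the
# smallest residual; a large residual to EVERY model is the alert that
# the specimen fits none of the accepted taxa.
unknown = np.array([10.1, 5.0, 2.1])
best = min(models, key=lambda t: residual_distance(unknown, models[t]))
print(best)
```

The residual distances are also where the outlier check mentioned above
lives: a specimen far from every taxon's subspace is flagged rather than
forced into the nearest class.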