George Garrity george_garrity at MERCK.COM
Tue Jun 13 08:43:54 CDT 1995

        Reply to:   RE>>Confidence
Nicolas Bailly makes an important point that statistical theory is often
difficult to apply to taxonomic problems because one must test an hypothesis:
that the observed distribution fits some theoretical distribution (e.g.
normal, Poisson, log-normal, binomial, Chi-square, etc).

Perhaps part of the problem arises from the manner in which taxonomic studies
are conducted.  All too often (at least in bacteriology) one is confronted
with new species that are monotypic and in some cases new genera that are
monospecific.  Obviously, individual members of a species are drawn from an
underlying population of undefined (and unknown) size and each will vary from
the other on all the characteristics that we measure.  The question is by how
much? One must also determine how much variability is attributed to biology
and how much is attributed to test error.  Unless one has a number of samples
of the species, drawn independently from the population, it is unlikely that we
can begin to answer these two questions.  This obviously affects the way in
which we interpret the data as well.  Are the clusters fuzzy, overlapping,
or artificially created as a function of our sampling technique?
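
To make the split between biological variability and test error concrete,
here is a minimal sketch in Python.  It assumes a balanced design that is not
part of the discussion above: several independently isolated strains of one
species, each measured the same number of times on a single character.  The
function name and the numbers are hypothetical, and the estimator is the
textbook one-way random-effects decomposition, offered only as one possible
approach.

```python
from statistics import mean

def variance_components(replicates):
    """Estimate (biological, test_error) variance for one character.

    replicates: one row per strain, each row holding repeated
    measurements of that strain (balanced design assumed).
    """
    k = len(replicates)       # number of strains
    r = len(replicates[0])    # replicate measurements per strain
    if k < 2 or r < 2 or any(len(row) != r for row in replicates):
        raise ValueError("need >= 2 strains with equal replicate counts >= 2")

    strain_means = [mean(row) for row in replicates]
    grand_mean = mean(strain_means)

    # Within-strain mean square: scatter of repeated tests on one strain,
    # i.e. the test error.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(replicates, strain_means) for x in row
    ) / (k * (r - 1))

    # Between-strain mean square: scatter of the strain means.
    ms_between = r * sum((m - grand_mean) ** 2 for m in strain_means) / (k - 1)

    # Biological component; clipped at zero because the estimate can go
    # negative by chance when strains barely differ.
    biological = max(0.0, (ms_between - ms_within) / r)
    return biological, ms_within

# Hypothetical data: three strains, two repeat measurements each.
bio, err = variance_components([[10.1, 9.9], [12.2, 11.8], [14.0, 14.2]])
print("biological variance:", bio, " test-error variance:", err)
```

With a single strain (a monotypic species) the function refuses to run, which
is exactly the difficulty described above: neither component can be estimated
without independent samples.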

One way around this problem might be to apply non-parametric statistical
methods to test hypothetical relationships.  The Kolmogorov-Smirnov goodness
of fit test is one such example.  It allows one to test the relationship
between two or more organisms (expressed in the form of cumulative
distributions of the underlying taxonomic character vectors used to define the
organisms) and to test whether or not those strains belong to the same
population (as defined by the taxonomic measurements).
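
As a rough illustration of the two-sample form of the test, here is a minimal
sketch in plain Python.  The function name and the character values are
hypothetical, and the p-value uses the large-sample Kolmogorov approximation,
which is only approximate for vectors as short as these.

```python
import math

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, p): D is the largest vertical gap between the two
    empirical cumulative distributions, p an asymptotic p-value.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples together, comparing the cumulative
    # fractions i/n and j/m after each distinct value.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))

    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # The series below converges slowly for tiny lam, where the
        # p-value is effectively 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)

# Hypothetical character vectors for two strains.
strain_a = [0.12, 0.35, 0.41, 0.58, 0.63, 0.77, 0.80, 0.91]
strain_b = [0.15, 0.30, 0.44, 0.52, 0.66, 0.70, 0.85, 0.95]
d, p = ks_two_sample(strain_a, strain_b)
print("D =", d, " p =", p)
```

A small p-value would argue against the two strains being drawn from the same
population as defined by these measurements; a large one only means the data
give no grounds to separate them.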

One final point needs to be emphasized.  The taxonomic relationships that we
propose as "experts" are only theories.  Confidence in those theories (i.e.
validation) can only be achieved after others employ our methodologies with
the same reference material and other samples drawn from nature and arrive at
the same "identification".

George Garrity
Natural Products Research
Merck Research Laboratories
Rahway, NJ
george_garrity at

Date: 6/13/95 6:01 AM
To: George Garrity
From: Nicolas Bailly
On Mon, 12 Jun 1995 11:39:42 -0400, Warren Lamboy wrote

>In my opinion, a good way to quantify one's confidence in a determination is
>by comparison to known standards, that is, by statistical comparison to
>specimens whose identities are known.  There are a number of ways to do this,
>but one way, with which I am familiar is that of disjoint principal component
>analysis (Systematic Botany 1990, Vol.15:3-12) which was first developed by
>Dr. Svante Wold, of the University of Umea, Umea, Sweden.
The problem with many statistical analyses is that the data MUST match some
n distribution, more or less following the robustness of the method. And the
less they match it, the less accurate the interpretations are.

Moreover, the traditional normal and lognormal distributions seem to me

not always                <<<-----------              To be
not often                <<<-----------               chosen

suited to biological problems. Even if PCA is robust with respect to the data
ty, I guess these kinds of data are too far from normality.

I think the problem of confidence belongs to fuzzy logic (theory of
possibilities), not to statistics (theory of probabilities).

Nicolas Bailly
Museum National d'Histoire Naturelle,
Laboratoire d'Ichtyologie Generale et Appliquee
43, rue Cuvier, 75231 Paris Cedex 05, France
Tel: (33 1) 40 79 37 63   /   40 79 37 49   Fax: 40 79 37 71
Telex: MUSNAHN 202 641 F           E-Mail:    bailly at
