Renaud Fortuner
Thu Mar 30 11:26:38 CST 1995

In a recent exchange of messages with Mike Dallwitz, a question has come up
which I believe to be important, considering the current interest in
biodiversity: should we attempt to use "all" the characters, or only the
characters that have been "critically evaluated and reconciled"?

First, we should clearly make the distinction between "recording" and "using"
the characters (e.g., for identification). It is obvious that critical
evaluation cannot even begin without a list of the things that need to be
evaluated.

Second, among qualitative characters, we should make a distinction between
ordinal and nominal characters. (Ordinal characters can be ordered, e.g.,
color from red to violet; nominal characters cannot be ordered, e.g., a
structure that exists either as a disc or as a tube.)

What happens when a new state is added to an ordinal character? Obviously,
this new state has to be correctly placed among the existing states: e.g.,
orange will be placed between red and yellow, not between blue and violet.
Now, how can we make sure that similarity is not changed by the greater number
of states? The quick and easy way is to systematically reset all the states on
a 0 to 5 scale, so that the last state always gets the value 5 (if you have 5
states, state #5 is 5x5/5=5; if you have 20 states, state #20 is 20x5/20=5).
This can be done in a second and you can continue your identification without
having to go to a taxonomist and ask him what to do about orange. (A more
elaborate way would be to identify a relation between the qualitative
character and a quantitative one, e.g., matching color with wavelength, and
use fuzzy logic to make the computer understand what "orange" means.)
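The quick rescaling described above can be sketched in a few lines of Python.
This is only an illustration under my own assumptions: the function names
rescale_state and rescale_all are hypothetical, states are numbered from 1 in
their agreed order, and the value of state k among n states is taken to be
k x 5 / n, which is what the two examples in the text give.

```python
def rescale_state(position, n_states, top=5.0):
    """Map the 1-based position of an ordinal state onto a fixed scale.

    Assumption: state k among n states gets the value k * top / n, so the
    last state always lands on `top`, however many states there are.
    """
    if not 1 <= position <= n_states:
        raise ValueError("position must be between 1 and n_states")
    return position * top / n_states


def rescale_all(states, top=5.0):
    """Return a dict mapping each state name to its rescaled value.

    `states` must already be listed in their correct order.
    """
    n = len(states)
    return {name: rescale_state(i, n, top)
            for i, name in enumerate(states, start=1)}


# Hypothetical color character, before and after "orange" is added.
before = ["red", "yellow", "green", "blue", "violet"]
after = ["red", "orange", "yellow", "green", "blue", "violet"]

old_values = rescale_all(before)
new_values = rescale_all(after)

for name in after:
    print(f"{name:7s} {new_values[name]:.2f}")
```

With six states instead of five, violet still sits at 5, and orange ends up
nearer to red than to blue on the rescaled axis.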

Does this mean that an orange specimen does not belong to a red species? Not
at all; let's not confuse identification and taxonomy. It only means that your
specimen is closer in color to a red species than to a blue species.

characters? That'll be for another day).

The other problems raised by Mike
synonyms?) can be solved if the traditional concept of character is
re-evaluated in the light of modern computer science. This is not the place to
would be interested in such an issue. If anybody has a suggestion, I would
appreciate hearing it. Thanks).

The other points, quickly:
- "Number of differences" is one thing, "coefficient of overall similarity" is
  something else. Do I prefer one or the other? No, I want both.
- I have a similarity algorithm for individual specimens and another for
  populations.
- I do consider discriminating power of the characters for the group of taxa
  that remain to be discriminated.
- I also consider reliability based on metadata.
- The way I enter general information about, e.g., color, is based on this new
  character concept.
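To illustrate why one might want both measures from the first point, here is
a small Python sketch. It is not the algorithm I actually use: the two
functions, the 0 to 5 coding of every character, and the specimen and species
values are all assumptions made up for the example. The coefficient shown is
simply one minus the mean scaled difference over the shared characters.

```python
def number_of_differences(a, b):
    """Count the shared characters on which two descriptions disagree."""
    shared = a.keys() & b.keys()
    return sum(1 for c in shared if a[c] != b[c])


def overall_similarity(a, b, top=5.0):
    """Mean per-character similarity, between 0 and 1.

    Assumption: every character is coded on the same 0..top scale, and the
    similarity for one character is 1 - |difference| / top.
    """
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no characters in common")
    return sum(1.0 - abs(a[c] - b[c]) / top for c in shared) / len(shared)


# Hypothetical data: one specimen and two candidate species.
specimen = {"color": 2.0, "length": 3.0}
species_a = {"color": 1.0, "length": 3.0}   # slightly different color
species_b = {"color": 5.0, "length": 3.0}   # very different color

for name, species in (("A", species_a), ("B", species_b)):
    print(name,
          number_of_differences(specimen, species),
          round(overall_similarity(specimen, species), 2))
```

Both species differ from the specimen on exactly one character, so the number
of differences cannot separate them, while the coefficient ranks species A as
the closer match.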

"There may be no point in prolonging this discussion
unless we get some input from other people."

Mike does have a point here
(although I did receive a couple of private messages). On the other hand,
nobody has told us to shut up yet. I guess we can continue until people start
throwing heavy objects at us.

Renaud Fortuner
fortuner at math.u-bordeaux.fr

