Text Extraction Again (from Taxonomic e-text)

Mike Dallwitz mike.dallwitz at NETSPEED.COM.AU
Tue Jan 27 23:58:57 CST 2004

- From: Mary Barkworth <Mary at BIOLOGY.USU.EDU>

> ... I would be somewhat kinder about the quality of descriptions in
> existing treatments, probably because I have written some of them. I
> agree that existing descriptions (including my own) are usually not
> parallel. Humans do not need complete parallelism; they have a brain.

People can make assumptions about missing or ambiguous information, and
this is an essential part of everyday communication; but it's often
unsatisfactory in taxonomic descriptions. For example, if some taxa are
said to have a certain feature, and nothing is said in this regard
about other taxa, do those taxa lack the feature?

Descriptive databases don't necessarily need a high degree of
parallelism, but the information that they contain should be
comparative, i.e. expressed in terms of well-formed characters. This is
often lacking in conventional descriptions. For example, 'petal color'
could be described in terms of large numbers of different words and
phrases, but to form a _character_ for petal color, we need a fairly
small, mutually exclusive set of terms (states).
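
As a rough illustration of the petal-color example, the sketch below
maps free-text color words onto a small set of states. The state list
and the term-to-state table are hypothetical, chosen only for this
example; a real character would need its states chosen by someone who
knows the group.

```python
# Hypothetical character: a small, mutually exclusive set of states.
PETAL_COLOR_STATES = ("white", "yellow", "red", "blue")

# Hypothetical table mapping words found in descriptions to those states.
TERM_TO_STATE = {
    "white": "white",
    "cream": "white",
    "yellow": "yellow",
    "golden": "yellow",
    "red": "red",
    "crimson": "red",
    "scarlet": "red",
    "blue": "blue",
    "azure": "blue",
}


def code_petal_color(description):
    """Return the set of states mentioned in a free-text description."""
    # Treat commas and hyphens as word separators, then look up each word.
    words = description.lower().replace(",", " ").replace("-", " ").split()
    return {TERM_TO_STATE[w] for w in words if w in TERM_TO_STATE}


# Different wordings collapse to the same comparable state.
print(code_petal_color("Petals crimson to scarlet"))
print(code_petal_color("cream or golden-yellow"))
```

A description that uses none of the known terms yields an empty set,
which is itself useful: it flags text that cannot yet be compared.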

For a discussion of the inadequacies of conventional descriptions, see
'Watson, L. (1971). Basic taxonomic data: the need for organization
over presentation and accumulation. Taxon 20, 131-136'.

> I had a class use an interactive key to Utah plants. The success rate
> was no higher than using ordinary keys.  Moreover, at the end of the
> exercise, the students had learned little about the plants they were
> identifying (the program is really irrelevant; it was not Delta
> [Intkey]).

There are a few published comparisons of this type, but they are all
rather unsatisfactory. A comparison should take account of the
following points.

(1) The conventional key should use a subset of the data used by the
interactive key. (Otherwise, any observed difference could be due to
the quality of the data.)

(2) The program used _is_ relevant. Most interactive-key programs lack
many of the features necessary to make the keys work well. See
'Dallwitz, M. J. (2000 onwards). A comparison of interactive
identification programs. http://delta-intkey.com/www/comparison.htm'.

(3) Some training in the use of interactive keys is necessary. See
'Strategies for using interactive keys' in 'Dallwitz, M. J., Paine, T.
A. and Zurcher, E. J. (2000 onwards). Principles of interactive keys.
http://delta-intkey.com/www/interactivekeys.htm'. A very bad strategy
is to use the characters in the order in which they are stored in the
key until a result is obtained. One unpublished comparison of
interactive and conventional keys actually stipulated that this
strategy should be used.
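
To illustrate why stored order is a poor strategy, here is a minimal
sketch that instead picks the character leaving the fewest taxa in the
worst case. The data and the function names are hypothetical, and the
heuristic is a simple stand-in, not the method used by Intkey or any
other program.

```python
# Hypothetical data: taxon -> {character: set of states}.
TAXA = {
    "A": {"habit": {"herb"}, "leaf": {"opposite"}, "petal": {"white"}},
    "B": {"habit": {"herb"}, "leaf": {"opposite"}, "petal": {"yellow"}},
    "C": {"habit": {"herb"}, "leaf": {"alternate"}, "petal": {"red"}},
    "D": {"habit": {"herb"}, "leaf": {"alternate"}, "petal": {"blue"}},
}
STORED_ORDER = ["habit", "leaf", "petal"]


def remaining(taxa, char, state):
    """Taxa still possible after observing `state` for `char`.

    A taxon with no data for the character cannot be eliminated.
    """
    return [t for t, data in taxa.items()
            if char not in data or state in data[char]]


def best_character(taxa, chars):
    """Pick the character whose worst-case answer leaves the fewest taxa."""
    def worst_case(char):
        states = set().union(*(d.get(char, set()) for d in taxa.values()))
        if not states:
            return len(taxa)  # no data at all: the character is useless
        return max(len(remaining(taxa, char, s)) for s in states)
    return min(chars, key=worst_case)


print(STORED_ORDER[0])                     # habit (separates nothing here)
print(best_character(TAXA, STORED_ORDER))  # petal
```

In this toy data the first stored character eliminates no taxa, while
the selected one identifies the taxon in a single step.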

> Granted interactive keys, which require much greater parallelism
> because they are being interpreted by a computer not a human, have
> the potential of permitting identification of fragmentary material.

Interactive keys don't _require_ greater parallelism - you can make an
interactive key using only the data from a conventional key. However,
most of the advantages of interactive keys stem from redundancy in the
data, i.e. the possibility of arriving at the answer via different
paths. This has nothing to do with 'interpretation' of the data by the
computer - it is completely mechanical. In both kinds of key, the main
difficulty is the interpretation, by humans, of the characters.
Interactive keys minimize this difficulty, because they can give a
correct answer in spite of errors by the user (or errors in the key).
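
The following sketch shows one way redundancy can absorb a mistake:
taxa are kept as candidates as long as they conflict with no more than
a given number of the user's observations. The data, the function name
and the `tolerance` parameter are assumptions made for this example,
not a description of any particular program.

```python
# Hypothetical data: taxon -> {character: set of states}.
TAXA = {
    "A": {"petal": {"white"}, "leaf": {"opposite"}, "habit": {"herb"}},
    "B": {"petal": {"yellow"}, "leaf": {"alternate"}, "habit": {"shrub"}},
}


def candidates(taxa, observations, tolerance=0):
    """Return taxa that conflict with at most `tolerance` observations."""
    result = []
    for name, data in taxa.items():
        # Count observed states that contradict the recorded data.
        # Characters not recorded for the taxon never count against it.
        mismatches = sum(
            1 for char, state in observations.items()
            if char in data and state not in data[char]
        )
        if mismatches <= tolerance:
            result.append(name)
    return result


# The specimen is taxon A, but the user misjudges the habit.
observed = {"petal": "white", "leaf": "opposite", "habit": "shrub"}

print(candidates(TAXA, observed, tolerance=0))  # []
print(candidates(TAXA, observed, tolerance=1))  # ['A']
```

With only one character per taxon there would be nothing to outvote
the error; it is the extra, redundant characters that make the
tolerance workable.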

It's certainly not necessary for interactive keys to have complete or
nearly complete data matrices. In general, I'd advise against it,
because the time required to completely fill in the matrix would
usually be better spent on adding more characters for poorly separated
subsets of taxa (Intkey can find these subsets).
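
As a sketch of what finding poorly separated taxa might involve, the
code below counts, for each pair of taxa, the characters recorded for
both whose states do not overlap, and reports pairs that fall below a
threshold. The data, names and threshold are hypothetical, and this is
not a description of how Intkey does it.

```python
from itertools import combinations

# Hypothetical, incomplete data: taxon -> {character: set of states}.
TAXA = {
    "A": {"petal": {"white"}, "leaf": {"opposite"}, "habit": {"herb"}},
    "B": {"petal": {"white"}, "leaf": {"opposite"}},  # habit not recorded
    "C": {"petal": {"red"}, "leaf": {"alternate"}, "habit": {"shrub"}},
}


def differences(a, b):
    """Number of characters, recorded for both taxa, with no shared state."""
    return sum(1 for char in a.keys() & b.keys() if not (a[char] & b[char]))


def poorly_separated(taxa, min_differences=2):
    """Pairs of taxa separated by fewer than `min_differences` characters."""
    return [
        (x, y)
        for x, y in combinations(sorted(taxa), 2)
        if differences(taxa[x], taxa[y]) < min_differences
    ]


print(poorly_separated(TAXA))  # [('A', 'B')]
```

Here the missing habit for taxon B matters little for separating it
from C, but A and B need a new character that actually tells them
apart.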

