[Taxacom] Reproducibility of descriptive data

Stephen Thorpe s.thorpe at auckland.ac.nz
Wed Sep 9 16:03:36 CDT 2009

[Mike Dallwitz wrote] _Whatever_ we want to say about a taxon (e.g. what its boundaries, distribution, abundance, or uses are), we need to define the taxon that we want to talk about. And the only way to do that is to describe it in a reproducible way, so that people can identify individuals as belonging or not belonging to the taxon.

[reply] Species are defined by their name-bearing types (holotypes or lectotypes or neotypes or syntypes). A description of a species is a circumscription of its boundaries, according to the describer. So, we don't describe a taxon in order to define it so that we can then talk about boundaries. Rather, by describing it, we ARE talking about its boundaries, but the species is defined by its type.

>describe it in a reproducible way, so that people can identify individuals as belonging or not belonging to the taxon
No, describing it in a reproducible way only allows people to identify individuals as falling within or outside the boundaries of the species as circumscribed in the description. These boundaries could be wrong, so the description is certainly not a DEFINITION of the species (definitions are true by definition and cannot be wrong!).

What you say applies more to genera and other "subjective" taxa, but not to species, which are objectively defined once a type is designated...


From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Mike Dallwitz [m.j.dallwitz at netspeed.com.au]
Sent: Thursday, 10 September 2009 1:56 a.m.
Subject: [Taxacom] Reproducibility of descriptive data

Stephen Thorpe wrote (under the subject 'Read... and believe...'):

"the objectivity/subjectivity question about species boundaries has little
to do with the reproducibility/reliability of identifications from a given

I think it does. _Whatever_ we want to say about a taxon (e.g. what its
boundaries, distribution, abundance, or uses are), we need to define the
taxon that we want to talk about. And the only way to do that is to
describe it in a reproducible way, so that people can identify individuals
as belonging or not belonging to the taxon.

Bob Mesibov wrote (under the subject 'Response to Dallwitz'):

"Another point of disagreement is that you seem to be taking a
particularly hardline approach. Please correct me if I'm misinterpreting,
but your position seems to be

- unless a character can be explicitly defined as one that can be observed
in mutually exclusive states, it isn't a proper character

- unless a description consists entirely of reports on characters of this
kind, it isn't a proper description

- unless taxonomists produce descriptions of this kind, they aren't doing
proper taxonomy."

Yes, that's more or less correct, except that I would tone down the last
two points a bit! In taxonomy, it _is_ an unusually hardline approach to
require reproducible data, but less so in most other sciences. Don't
bother telling me that it's harder to do in taxonomy - I know! I'm just
suggesting that it's a worthwhile aim, and suggesting methods that might
help to achieve it.

"You go on to criticise people who go down this route but fail, in your
view ("... they generally don't succeed very well...")"

That _is_ my view, but, as I'm not a taxonomist, I base it mainly on
published studies rather than on personal observations - see
'Effectiveness of Identification Methods – References'
(http://delta-intkey.com/www/idtests.htm). As a taxonomist with direct
experience of this subject, do you think (a) that the error rates reported
in these studies are abnormally high; or (b) that they are acceptable, and
there's no point in trying to improve them?

"More reliable, in the Dallwitz view, is to follow a structured key which
has had non-reproducibility 'nipped in the bud' by being constructed ..."

The context from which the original quote was taken was the construction
of descriptive databases. But, in the more general context of our current
discussion, the relevant suggestion is that you make sure that people will
understand your terminology _before_ you record a lot of data (sensu lato)
using that terminology. I don't think that's unreasonable.

"my point was that when an apparently apomorphic *state* is later
recognised as a plesiomorphic *state*, then the *character* may get
dropped from the description."

My points are (1) that if you drop all characters that have a
plesiomorphic state, you will have few if any characters left; and (2) in
some contexts, particularly identification, whether states are
plesiomorphic is irrelevant.

"I didn't get the impression that Watson was talking only about
contemporary taxonomic publications. I don't see how he could have been,
since *comparative* character data in a publication are only *comparative*
when tested against other works"

The concept of 'comparative' data intended by Watson (and by me) is
certainly not restricted to comparisons _between_ works, as I think is
clear even from the small excerpt I gave. Here's part of it again:

"Perusal of the average taxonomic-descriptive work usually reveals that as
a source of comparative data it is hopeless. One genus will be described
in terms of criteria that receive no mention in the next. Even species in
the same genus may be described inconsistently. It is often impossible to
distinguish with any degree of conviction between actual observation and
extrapolation, between absence of a feature and mere failure to seek or
comment on it."

But obviously:

"Given the situation prevailing in individual publications it is not
surprising that scanning across them is even less satisfactory."

"You're happy to have images incorporated holus-bolus into descriptions
and keys, but you want text to be treated differently: restricted to
explicitly defined strings tested in particular ways."

Yes, by and large. The images are mainly (but not entirely) a convenient
substitute for obtaining, preparing, storing, and accessing the actual
specimens. Most (but not all) of the value added by the expert taxonomist
is in the descriptions (or perhaps in the annotation of the images, as you
suggest later). This added value is most useful if it is reproducible,
comparative data. Less formal information can also be useful (DELTA allows
for it in the form of comments and text characters).

"You write ""This was for a phylogenetic analysis." Then it doesn't really
matter whether the data are reproducible; the odds are no one will ever
know or care." Excuse me?"

OK, you got me. That was an exaggeration, and written somewhat tongue in
cheek.

"I and other taxonomists have referred to that structured data for

But are the data reproducible? And have they been sanitized? Sometimes
people say 'These data are not for identification; they are for
classification (or information retrieval)'. To me, this means 'These data
are not reproducible; and I don't care (they'll serve the purpose for
which I intended them)'.

""(a) and (b) imply that characters can change. A properly defined
character can't change." This is hardline to the point of being
nonsensical, but it's at the core of our disagreement. I know of
explicitly definable characters that have changed when observers moved
from fixed to fresh material, or (more often) from optical to electron
microscopy. Sure, in a strictly logical sense the formerly understood
character didn't change. You can still observe the old, incorrect or
misleading states of the structure in question and report on them, if you

Yes, I did mean it in a strictly logical sense. It's _not_ the same
character if its meaning has changed. Obviously, the data recorded against
the two characters will be different. You can't interpret the data
correctly unless you know which character was used. You _need_ to observe
the "old, incorrect or misleading states" if you want to know whether a
specimen belongs or doesn't belong to a taxon as described by the original

"But the taxonomy of those taxa has moved on from there, and so has the
character, in the sense used by taxonomists."

Yes. Excellent. But it's now a different character. If you want to use it
(which doubtless you should), you'll have to go back to your specimens,
and record new data against the new character.

"We agree, I hope, on one thing. It is impossible to completely describe
any specimen. A description *must* consist of a subset of all the
possible, explicitly definable characters and their states. The selection
of this subset is made by a taxonomist. It will be based on what that
taxonomist and previous specialists regard as diagnostic characters, and
will also include (this is important) observations similar to those made
by earlier workers, no matter how non-diagnostic - not just to continue a
silly tradition, but to make the contemporary and earlier descriptions

Some people see this selection process as 'arbitrary' and 'subjective'.
They would prefer to select a different or more restricted subset of
characters and states which is suited to machine analysis and
presentation, as though this approach wasn't arbitrary and subjective as
well (i.e., a matter of preference). Whether you see this difference as
one of slackness vs rigour, or wetware vs software is less important than
whether you think the difference matters. I use both approaches and find
both of them useful. It's disappointing that you reject one of them."

Yes, I do agree with the first paragraph; I'm not sure what you mean by
the second. However, I know that taxonomists mostly can't ensure (by using
their wetware) that their data are diagnostic. One of the first things I
do when I am asked to look at a dataset is to use Intkey to check whether the
taxa are adequately separated; they seldom are. This has little to do with
the quality of the taxonomic work (how well the characters have been
defined, etc.). It simply reflects the fact that computers can do _some_
things better than people - particularly analyzing large amounts of data.
So it's wise to make use of those computer abilities, to supplement the
human ones. Intkey can guide the taxonomist in the efficient selection of
a good subset of characters. I won't go into details here.
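
As a rough illustration of the kind of separation check described above,
here is a minimal Python sketch. It is not Intkey's actual algorithm or
data format. The taxon names, characters, states, and both function names
are hypothetical, and the sketch assumes that two taxa count as separated
only if some character recorded for both has non-overlapping state sets.

```python
from itertools import combinations

# Hypothetical data: taxon -> character -> set of recorded states.
# A character missing from a taxon is treated as "not recorded",
# so it cannot be used to separate that taxon from another.
DESCRIPTIONS = {
    "Taxon A": {"leaf margin": {"entire"}, "petal colour": {"white"}},
    "Taxon B": {"leaf margin": {"toothed"}, "petal colour": {"white", "pink"}},
    "Taxon C": {"leaf margin": {"entire"}, "petal colour": {"white", "pink"}},
}


def separating_characters(desc_a, desc_b):
    """Return the characters recorded for both taxa whose state sets do not overlap."""
    shared = desc_a.keys() & desc_b.keys()
    return sorted(c for c in shared if desc_a[c].isdisjoint(desc_b[c]))


def unseparated_pairs(descriptions):
    """Return the pairs of taxa that no recorded character can tell apart."""
    return [
        (a, b)
        for a, b in combinations(sorted(descriptions), 2)
        if not separating_characters(descriptions[a], descriptions[b])
    ]


if __name__ == "__main__":
    for a, b in unseparated_pairs(DESCRIPTIONS):
        print(f"{a} and {b} are not separated by any recorded character")
```

With these made-up data, Taxon A and Taxon C share a possible state for
every character, so the check flags them as the one pair that the
descriptions cannot distinguish. A real dataset would also need rules for
numeric ranges, inapplicable characters, and unknown values, which this
sketch ignores.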

"I attach a drawing of a millipede gonopod (part of the male genitalia).
The drawing was published in 2003 by a millipede taxonomist (not me).
There is not, so far as I know, another organism on this planet that
carries a structure that could be confused with this one. This drawing is
both necessary and sufficient evidence for both circumscribing a species
and identifying a gonopod-bearing specimen.

As it happens, this species is in a family whose *genera* can only very
rarely be morphologically diagnosed or identified *without* reference to a
drawing like this one. Not only do most species look much the same ..."

Yes, I understand your point. However, it probably requires a specialist
to identify by matching images in this way, particularly if there are
large numbers of possible taxa (e.g. hundreds or thousands). If that's the
only way it can be done in some cases, then so be it. But taxonomists do
usually try to communicate identification methods by means of characters,
preferably defined with the help of images. Also, the extraction of
something like characters is essential for classification, even if the
extraction is done informally by wetware (e.g. 'I think these ones look

Mike Dallwitz
Contact information: http://delta-intkey.com/contact/dallwitz.htm
DELTA home page: http://delta-intkey.com


Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu

The Taxacom archive going back to 1992 may be searched with either of these methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
