Comprehensible descriptions (x Delta-l)
miked at ENTO.CSIRO.AU
Mon Mar 27 16:42:36 CST 1995
27 March 1995
> From: Mike Dallwitz
> To: Taxacom
> Natural-language descriptions generated by CONFOR from DELTA data are
> comprehensible, unambiguous, and comparative, provided the DELTA data are
> well constructed. ... I have always been surprised at the vehemence with
> which some people oppose the use of the semicolon to separate characters ...
> From: Curtis Clark <jcclark at csupomona.edu>
> To: Mike Dallwitz
> My point was more that nonsense almost never makes sense in botanical Latin,
> but often "seems okay" in English. One of the strengths of English as a
> "natural language" is that parts of speech are not rigid, so that nouns can
> be adjectives and verbs be nouns. From the standpoint of description,
> however, this is a weakness.
> I didn't realize [that the use of semicolons] was an issue. If I
> understand you correctly, I have always used semicolons in this sense even
> in Latin diagnoses (or at least intended to).
> From: "Diana G. Horton" <dhorton at vaxa.weeg.uiowa.edu>
> To: Mike Dallwitz
> Your comment on people's reluctance, opposition, to the use of semicolons in
> descriptions motivates me to write (amazing what motivates some people,
> eh?!). I have always used semicolons & was unaware that it isn't considered
> appropriate, but perhaps I'm not using them in the way that you are
> referring to. Would you indulge me and take a look at the following portion
> of one of my descriptions & tell me if my usage is the type that people
> object to?
> Upper laminal cells 7-12(14) um wide, 7-12(16) um long, with 2-5 papillae,
> each +- "c"-shaped; upper marginal cells (9)12-14 um wide, 7-12 um long;
> transitional cells with walls smooth above basal cells on abaxial surface,
> on adaxial surface walls smooth well above basal cells; basal laminal cells
> 40-90 um long, 9-23 um wide, prominent, transverse walls dark-orange,
> longitudinal walls dark-orange, superficial walls smooth, irregularly +-
> perforated; basal marginal cells yellowish, distinctly differentiated in 3-5
> rows.
> If this is what they object to, I don't know how one could make sense of
> such a sentence w/out the semicolons.
Other powerful but potentially dangerous features of English are: the
dependence of meaning on word order; the dependence of meaning on punctuation;
and the ability to omit words. Natural-language descriptions are generated
from character lists and coded descriptions by well-defined rules, which can
potentially eliminate all ambiguity provided that care is taken with the
construction of the character list and with the coding of the data
(particularly the inclusion of qualifying comments). Simpler rules tend to
give less ambiguity, but more stilted and wordy descriptions. There is strong
pressure from users for more flexibility in generating natural-language
descriptions, and we are going to provide this in the new CONFOR. However, it
must be realized that the use of this flexibility is going to require more
care if ambiguity is to be avoided.
The objection to which I referred is to the use of semicolons BETWEEN
CHARACTERS. As produced by CONFOR, Diana's description might look something
like this.

Upper laminal cells 7-12(14) um wide; 7-12(16) um long; with 2-5 papillae; each
+- "c"-shaped. Upper marginal cells (9)12-14 um wide; 7-12 um long.
Transitional cells with walls smooth above basal cells on abaxial surface;
smooth well above basal cells on adaxial surface. Basal laminal cells 40-90 um
long; 9-23 um wide; prominent; transverse walls dark-orange; longitudinal
walls dark-orange; superficial walls smooth; irregularly +- perforated. Basal
marginal cells yellowish; distinctly differentiated in 3-5 rows.
By default, CONFOR places each character in a separate sentence, and places a
semicolon between states of the same character (when there is intra-taxon
variability). The comma is not used, and is therefore available for internal
use in character states.
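
The default rules just described can be illustrated with a minimal Python
sketch. This is not CONFOR's code: the function name `default_sentences`, the
(feature, states) data layout, and the sample taxon data are all assumptions
made up for the example.

```python
def default_sentences(characters):
    """Render each character as its own sentence.

    `characters` is a list of (feature, states) pairs, where `states` is a
    list of state strings. This layout is hypothetical, not the DELTA format.
    """
    sentences = []
    for feature, states in characters:
        # A semicolon separates states of the same character, so commas
        # remain free for use inside a state's own wording.
        text = f"{feature} {'; '.join(states)}."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)


# Invented sample data: the first character is variable within the taxon.
example = [
    ("leaves", ["ovate", "lanceolate"]),
    ("petals", ["white"]),
]
print(default_sentences(example))
# Leaves ovate; lanceolate. Petals white.
```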
The user may specify that characters are to be combined in sentences, as in
the above example. In that case, a semicolon is used between characters, and a
comma between states of the same character. In the simplest case, the
`feature', a noun plus possible qualification, is the same in each character
of a sentence, and is omitted from the second and subsequent characters. (E.g.
in the above example `with 2-5 papillae' means `upper laminal cells with 2-5
papillae'.) If the features are not the same, the words at the start of the
second and subsequent features that match the words at the start of the first
feature are omitted. Even this simple `improvement' leads to
potential ambiguity, as it is impossible to mechanically reconstruct the
`feature' of the second and subsequent characters. In practice, however, it is
not too difficult to produce unambiguous descriptions in this way.
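
The combining rule can be sketched along the following lines. Again this is
only an illustration under stated assumptions, not CONFOR's implementation:
`strip_common_prefix` and `combined_sentence` are hypothetical names, and the
way Diana's description is split into features and states is a guess at how
the characters might have been defined.

```python
def strip_common_prefix(first, other):
    """Drop the leading words of `other` that match the start of `first`."""
    first_words, other_words = first.split(), other.split()
    i = 0
    while (i < len(first_words) and i < len(other_words)
           and first_words[i] == other_words[i]):
        i += 1
    return " ".join(other_words[i:])


def combined_sentence(characters):
    """Join several characters into one sentence.

    Semicolons separate characters and commas separate states of the same
    character. `characters` is a list of (feature, states) pairs
    (a hypothetical layout, not the DELTA format).
    """
    first_feature = characters[0][0]
    parts = []
    for index, (feature, states) in enumerate(characters):
        # The first feature is printed in full; later ones lose the words
        # they share with the start of the first feature.
        shown = feature if index == 0 else strip_common_prefix(first_feature, feature)
        parts.append(f"{shown} {', '.join(states)}".strip())
    text = "; ".join(parts) + "."
    return text[0].upper() + text[1:]


upper = [
    ("upper laminal cells", ["7-12(14) um wide"]),
    ("upper laminal cells", ["7-12(16) um long"]),
    ("upper laminal cells", ["with 2-5 papillae"]),
]
basal = [
    ("basal laminal cells", ["40-90 um long"]),
    ("basal laminal cells transverse walls", ["dark-orange"]),
]
print(combined_sentence(upper))
# Upper laminal cells 7-12(14) um wide; 7-12(16) um long; with 2-5 papillae.
print(combined_sentence(basal))
# Basal laminal cells 40-90 um long; transverse walls dark-orange.
```

The output does not record how many words were dropped from each later
feature, which is why the full feature cannot be recovered mechanically from
the finished sentence.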
The user may also specify that some semicolons are to be replaced by commas.
This leaves no punctuation mark available for use between or within states.
The DELTA User's Guide contains the following warning. `This directive can
lead to poor wording when used with complex character descriptions or complex
Mike Dallwitz Internet md at ento.csiro.au
CSIRO Division of Entomology Fax +61 6 246 4000
GPO Box 1700, Canberra ACT 2601, Australia Phone +61 6 246 4075