[Taxacom] Reproducibility of descriptive data
m.j.dallwitz at netspeed.com.au
Sat Aug 8 00:42:02 CDT 2009
Bob Mesibov wrote:
> what we need is ... structured character data in perpetually interpretable
> digital form
DELTA provides such a form. It's fairly easily interpretable without a
computer program, unlike most other formats (e.g. Nexus, SDD). See, for example,
Britton, E.B. 1986. A revision of the Australian chafers
(Coleoptera: Scarabaeidae: Melolonthinae). Vol. 4. Tribe Liparetrini:
genus Colpochila. Aust. J. Zool., Suppl. ser. 118: 1-135.
in which descriptions are published (on paper) in DELTA format.
> Structuring of character data too iffy because of interpretation errors ...?
Yes - taxonomists don't pay enough attention to reproducibility of data.
This issue should be addressed in the earliest stages of a new project - see
below. As far as I know, no one has ever followed these suggestions.
To my knowledge, the only tests of data reproducibility have been in the
context of testing keys, where the emphasis has been on the key methodology
rather than on the data per se. See http://delta-intkey.com/www/idtests.htm.
(And, incidentally, the methods used for assessing the key methodologies
were often poor.)
DELTA-L: 'Starting a new DELTA dataset', M. Dallwitz, 25 Feb 09
Pay particular attention to this paragraph [from the 'User’s guide to the
DELTA Editor' (http://delta-intkey.com/www/delta-ed.htm)]:
'If you are starting a data set from scratch, begin by entering only a few
characters and a few taxa (say about 5 of each), and recording the data for
these taxa. Then test all the applications you intend to use — for example,
produce natural-language descriptions, a conventional key, an interactive
key, and a cladistic tree. Then add a few more characters and taxa, and
repeat the testing. This iterative procedure helps you detect any problems,
particularly poor character definitions, before you have recorded much data.'
Many users ignore this advice. For example, they may put together a large
character list before starting to enter data for the taxa.
The computing aspects are fairly easy, but the biological aspects are
difficult for most people - though they don't necessarily realise it. It's
difficult to define characters in ways that will achieve your goals, e.g.
easy identification, readable descriptions, classification. This is true no
matter what software you are using.
Ideally, a character list should be tested by having several people
independently record descriptions of about 10 disparate taxa. The resulting
descriptions will inevitably differ, i.e. the results will not be
reproducible. You need to know about this very early in the process.
The Differences option in Intkey can easily pinpoint the discrepancies. The
reasons for them can then be discussed, and the character list refined on
this basis. This process should then be iterated, adding a few more taxa
each time, until the character list seems satisfactory.
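The comparison step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Differences option in Intkey is implemented. The data layout (recorder -> taxon -> character -> set of states), the function names, and the example data are all assumptions made for the sketch; real data would first have to be read from the DELTA files.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: taxon -> character -> set of recorded states.
Description = Dict[str, Dict[str, FrozenSet[str]]]
Discrepancy = Tuple[str, str, Dict[str, FrozenSet[str]]]


def find_discrepancies(records: Dict[str, Description]) -> List[Discrepancy]:
    """List (taxon, character, states per recorder) where recorders disagree.

    A character that one recorder left unscored is treated as an empty
    set of states, so it is reported as a disagreement too.
    """
    found: List[Discrepancy] = []
    taxa = sorted({t for desc in records.values() for t in desc})
    for taxon in taxa:
        characters = sorted(
            {c for desc in records.values() for c in desc.get(taxon, {})}
        )
        for character in characters:
            states = {
                recorder: desc.get(taxon, {}).get(character, frozenset())
                for recorder, desc in records.items()
            }
            if len(set(states.values())) > 1:
                found.append((taxon, character, states))
    return found


def disagreements_per_character(found: List[Discrepancy]) -> Counter:
    """Count disagreements per character; high counts suggest a poor definition."""
    return Counter(character for _, character, _ in found)


if __name__ == "__main__":
    # Invented example data: two recorders, two taxa, two characters.
    records = {
        "recorder_1": {
            "taxon_A": {"char_1": frozenset({"1"}), "char_2": frozenset({"2"})},
            "taxon_B": {"char_1": frozenset({"1"}), "char_2": frozenset({"3"})},
        },
        "recorder_2": {
            "taxon_A": {"char_1": frozenset({"1"}), "char_2": frozenset({"2", "3"})},
            "taxon_B": {"char_1": frozenset({"1"}), "char_2": frozenset({"1"})},
        },
    }
    for taxon, character, states in find_discrepancies(records):
        print(taxon, character, {r: sorted(s) for r, s in states.items()})
```

Characters that turn up repeatedly in the per-character count are the first candidates for discussion and redefinition before more taxa are added.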
Contact information: http://delta-intkey.com/contact/dallwitz.htm
DELTA home page: http://delta-intkey.com