[Taxacom] (Fwd) Dichotomous key software

Kevin Thiele K.Thiele at cbit.uq.edu.au
Wed Aug 30 19:02:13 CDT 2006


This thread on the relative advantages of dichotomous vs "interactive" keys
is good. I'd like to summarise a few things in order to be clear.

1. We often make the mistaken equation "dichotomous key" = "paper key" and
"interactive key" = "computer key". Neither of these is strictly correct, as
"dichotomous" keys can be on computer and "interactive" keys can be on
paper. Even "dichotomous" is a bad term, as such keys may be polytomous. In
the following I'll use "pathway keys" and "matrix-based keys" - dichotomous
and similar keys use a hard-coded pathway through a set of statements or
questions to get to the answer, while matrix-based keys use an underlying
matrix of score data and a filtering or matching algorithm to get to the
answer.
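To make the distinction concrete, here is a minimal sketch of the two structures. It assumes a nested-dict representation for the pathway key and a taxon-by-character table of state sets for the matrix. The taxa, characters and function names (`run_pathway`, `run_matrix`) are hypothetical and are not how Lucid, Intkey or any other package stores its data.

```python
# Hypothetical toy data: three taxa and two characters, for illustration only.

# A pathway key: each couplet asks one question and maps each answer either to
# a taxon name (a leaf) or to the next couplet (a nested dict).
PATHWAY_KEY = {
    "question": "leaves",
    "answers": {
        "opposite": "Taxon A",
        "alternate": {
            "question": "flower colour",
            "answers": {"red": "Taxon B", "blue": "Taxon C"},
        },
    },
}

# A matrix: taxon -> character -> set of states scored for that taxon.
# Taxon A is scored polymorphically for flower colour.
MATRIX = {
    "Taxon A": {"leaves": {"opposite"}, "flower colour": {"red", "blue"}},
    "Taxon B": {"leaves": {"alternate"}, "flower colour": {"red"}},
    "Taxon C": {"leaves": {"alternate"}, "flower colour": {"blue"}},
}


def run_pathway(key, specimen):
    """Follow the hard-coded path; raises KeyError if a couplet can't be answered."""
    node = key
    while isinstance(node, dict):
        answer = specimen[node["question"]]
        node = node["answers"][answer]
    return node


def run_matrix(matrix, specimen):
    """Filter: keep every taxon whose scores include each observed state."""
    return sorted(
        taxon
        for taxon, scores in matrix.items()
        if all(state in scores[char] for char, state in specimen.items())
    )
```

With only flower colour known, `run_matrix(MATRIX, {"flower colour": "blue"})` narrows the list to Taxon A and Taxon C, whereas `run_pathway` on the same partial answer stops at the very first couplet (leaves), because the path is fixed.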

Examples are:
Computer matrix-based keys: 
Lucid (www.lucidcentral.org) 
DELTA's Intkey (http://delta-intkey.com/)
SLIKS (www.stingersplace.com/SLIKS)
(there are many others)

Paper matrix-based keys:
Early published polyclave keys, printed on file cards and using a knitting
needle as their filtering algorithm! There are also early printed tabular
matrix keys (these never really took off).

Computer pathway keys:
Phoenix (www.lucidcentral.org)
EFG (http://efg.cs.umb.edu/)
There are also many hand-built static HTML pathway keys on the web

Paper pathway keys:
Virtually every flora and monograph has one

2. We need to distinguish two classes of advantage/disadvantage for these
keys - platform and algorithmic. Platform properties reflect
advantages/disadvantages of the platform on which the key operates (paper or
computer); algorithmic properties reflect differences in the ways the keys
(pathway or matrix) actually work, independent of the platform.

So - with that out of the way, here are some comments on the preceding
thread.

Advantages of a paper platform to me are:
1. You don't need to turn the bloody thing on!
2. It can go in the back pocket and if well-made is fairly waterproof
3. You flip through it instead of clicking through it

Disadvantages of the paper platform are:
1. It's expensive to produce (you can get much more content much more
cheaply onto a computer-based key than a paper-based key, as the cost is
basically the cost of assembling the material with no or negligible printing
costs)
2. They are impossible to maintain in realtime (that's why we have editions
of paper products, and why many paper products are out of print)
3. You can't store the results (every time you open a book you're
essentially doing a cold boot in computer terms. With some computer
platforms you can save the results for later.)
4. What you see is absolutely what you get (short of physically rearranging
it, there are few ways to filter or customize a paper key on the fly,
whereas some computer keys can be endlessly customized by the user. In
Phoenix, for example, you could have a pathway key to the trees of the
United States, and ask the computer to filter out all trees that don't
occur in Nebraska, thus creating on the fly a key to the trees of Nebraska)
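How Phoenix does this internally isn't described here. As one possible approach, sketched under the assumption of a nested-dict pathway key, filtering can be done by pruning. Remove every lead that no longer reaches a retained taxon, then collapse any couplet left with a single lead, since it no longer separates anything. The key, the taxon names and the regional list below are hypothetical placeholders, not real distribution data.

```python
# Hypothetical pathway key; "Tree 1".."Tree 4" stand in for real taxa.
KEY = {
    "question": "leaf form",
    "answers": {
        "needle-like": {
            "question": "needles in bundles",
            "answers": {"yes": "Tree 1", "no": "Tree 2"},
        },
        "broad": {
            "question": "leaves compound",
            "answers": {"yes": "Tree 3", "no": "Tree 4"},
        },
    },
}


def prune(node, keep):
    """Return a copy of the key restricted to taxa in `keep`, or None if none remain."""
    if isinstance(node, str):  # a leaf: a taxon name
        return node if node in keep else None
    answers = {}
    for answer, child in node["answers"].items():
        pruned = prune(child, keep)
        if pruned is not None:  # drop leads that no longer reach any kept taxon
            answers[answer] = pruned
    if not answers:
        return None
    if len(answers) == 1:
        # Only one lead survives, so this couplet no longer separates anything:
        # splice its single child into the parent instead.
        return next(iter(answers.values()))
    return {"question": node["question"], "answers": answers}


# Hypothetical regional list: suppose Tree 2 does not occur in the region.
REGIONAL = {"Tree 1", "Tree 3", "Tree 4"}
REGIONAL_KEY = prune(KEY, REGIONAL)
```

In the regional key, the "needles in bundles" couplet disappears and the "needle-like" lead goes straight to Tree 1. The original key is left untouched, so it can be re-filtered for another region.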

Advantages and disadvantages of the computer platform are basically the
inverse of those for paper.

Advantages of matrix-based keys:
1. They are usually random-access - you can answer features in any order and
can't get stuck on an unanswerable couplet
2. They can produce an answer very quickly - a very few features addressed
can give an identification
3. A good "Best" algorithm (see e.g. Lucid and Intkey) can suggest the next
best feature to address to find the shortest path to the identification
4. They are infinitely filterable (you can use subsets of features to
restrict the key to only those that are available or easy to use)
5. They are often more fun to use
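The "Best" algorithms in Lucid and Intkey are not reproduced here, since their details aren't given in this thread. For a rough idea of how a key can suggest the next feature, the following sketch uses a simple minimax heuristic. It picks the unused character whose worst-case answer leaves the fewest taxa remaining. It assumes a complete matrix in which every taxon is scored with at least one state for every character. The matrix and the function name `best_character` are hypothetical.

```python
# Hypothetical four-taxon matrix: taxon -> character -> set of scored states.
MATRIX = {
    "A": {"leaves": {"opposite"}, "flowers": {"red"}, "fruit": {"dry"}},
    "B": {"leaves": {"opposite"}, "flowers": {"blue"}, "fruit": {"dry"}},
    "C": {"leaves": {"alternate"}, "flowers": {"red"}, "fruit": {"dry"}},
    "D": {"leaves": {"alternate"}, "flowers": {"yellow"}, "fruit": {"fleshy"}},
}


def best_character(matrix, remaining, used=()):
    """Return the unused character whose worst-case answer leaves the fewest taxa.

    A simple minimax heuristic, assumed for illustration; ties go to the
    alphabetically first character.
    """
    best, best_worst = None, None
    characters = sorted({c for t in remaining for c in matrix[t]} - set(used))
    for char in characters:
        states = set().union(*(matrix[t][char] for t in remaining))
        # For each possible answer, count the taxa that would survive the filter.
        worst = max(sum(1 for t in remaining if s in matrix[t][char]) for s in states)
        if best_worst is None or worst < best_worst:
            best, best_worst = char, worst
    return best
```

Here "fruit" is a poor first choice, because answering "dry" still leaves three of the four taxa. After "flowers" has been answered as red (leaving A and C), the heuristic moves on to "leaves", which separates them. Real implementations typically weigh further factors as well (for example, how easy or reliable a feature is to observe), and this sketch ignores those.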

Advantages of pathway keys:
1. They are not random-access (this may seem odd at first, but providing a
pathway through the data will often help the user)
2. For highly distinctive taxa, they can produce an identification quickly
(of course the bulk of taxa are not highly distinctive)
3. They are highly "efficient" even with high levels of polymorphism (see
the discussion on the contingency problem below)

Disadvantages of matrix-based keys:
1. They are random-access (naive users sometimes flounder around until they
work out the best way to use a Best algorithm)
2. They are relatively inefficient when the matrices have high levels of
polymorphism (the contingency problem)
3. Absences are the most important scores (especially if the key uses a
filtering algorithm), but absences are also the hardest scores to be certain
about

Disadvantages of pathway keys:
1. They constrain the user to a single pathway (you can see that random
access is a blessing and a curse)
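2. The Unanswerable Couplet Problem (if the specimen lacks the feature a
couplet asks about, e.g. it has no flowers or fruit, the user gets stuck)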

The contingency problem is an interesting problem with matrix-based keys
that is rarely addressed. Consider a key to Families of Flowering Plants.
Consider a family F in which some members have opposite and some have
alternate leaves, some have red and some have blue flowers. Leaf arrangement
and flower colour will both be scored polymorphically for the family. Now it
may be that all the species in that family that have alternate leaves also
have red flowers - there are no blue-flowered alternate-leaved species. So
if I have a specimen with blue flowers and alternate leaves, it would be nice
(and efficient) if the key were to exclude that family from the
identification. Unfortunately, it logically can't: the matrix scores each
character for the family independently, so it holds no record of which
combinations of states actually occur together. Hence, matrix-based keys
that have polymorphic scoring will be relatively inefficient - you need to
answer more characters before you can discard that family. The flip side is
that some families which are highly polymorphic (in plants, think Rutaceae
and Scrophulariaceae) are often *very* hard to get rid of in matrix-based
keys.
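The information loss can be shown in a few lines. The sketch below assumes hypothetical species-level data for family F, with the combinations described above (no blue-flowered alternate-leaved species). It collapses that data into per-character family scores, as a polymorphically scored matrix would, and compares the two tests. All names are illustrative.

```python
# Hypothetical species in family F: the combinations of states that actually occur.
SPECIES = [
    {"leaves": "opposite", "flowers": "red"},
    {"leaves": "opposite", "flowers": "blue"},
    {"leaves": "alternate", "flowers": "red"},
]


def collapse(species):
    """Score the family polymorphically: the union of states for each character."""
    scores = {}
    for sp in species:
        for char, state in sp.items():
            scores.setdefault(char, set()).add(state)
    return scores


def matches_family_scores(scores, specimen):
    """Matrix-style test: each observed state is checked against its own character only."""
    return all(state in scores[char] for char, state in specimen.items())


def matches_some_species(species, specimen):
    """Combination-aware test: some species must show all observed states at once."""
    return any(all(sp[c] == s for c, s in specimen.items()) for sp in species)


FAMILY_SCORES = collapse(SPECIES)
SPECIMEN = {"leaves": "alternate", "flowers": "blue"}
```

`matches_family_scores(FAMILY_SCORES, SPECIMEN)` is True, so the family stays in contention. `matches_some_species(SPECIES, SPECIMEN)` is False, because the combination never occurs. Once the per-character unions have been taken, the combination information cannot be recovered from `FAMILY_SCORES`.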

Disadvantage 3 of matrix-based keys above is another interesting one. Most
matrix-based keys use a filtering algorithm, in which taxa are thrown away
from the Remaining list if they don't match a chosen state. They are thrown
away on the basis of their "absent" scores - if taxon A is scored as not
having yellow flowers and the user has said that their specimen has yellow
flowers, then A will be thrown out of contention. But - it's theoretically
impossible to make an absolute assertion that A never has yellow flowers. Of
course, pathway keys work in the same way, except that they use a
restricted set of characters, and the key builder can usually be fairly
certain of all the absences. Matrix-based keys potentially use a broader
range of characters for the identification, and hence the likelihood of a
false-negative (Type II?) error is greater. In my experience, this is far
and away the most common reason for errors with matrix-based keys.
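A small illustration of this failure mode, assuming a filtering algorithm that discards a taxon on any mismatch. In the hypothetical matrix below, taxon A has an unrecorded yellow-flowered form, so it is wrongly scored as "white" only. Using only the well-understood characters, the key gets A right. Adding the poorly known character throws A out, and in this case nothing is left. The data and function names are made up for the example.

```python
# Hypothetical matrix; A's rare yellow-flowered form was never recorded,
# so "yellow" is (wrongly) an absence for A.
MATRIX = {
    "A": {"leaves": {"opposite"}, "fruit": {"capsule"}, "flower colour": {"white"}},
    "B": {"leaves": {"alternate"}, "fruit": {"capsule"}, "flower colour": {"yellow"}},
    "C": {"leaves": {"opposite"}, "fruit": {"berry"}, "flower colour": {"yellow"}},
}


def filter_taxa(matrix, specimen):
    """Discard any taxon whose scores lack one of the observed states."""
    return sorted(
        taxon
        for taxon, scores in matrix.items()
        if all(state in scores[char] for char, state in specimen.items())
    )


def mismatches(matrix, specimen):
    """List, for each taxon, the characters on which it was discarded."""
    return {
        taxon: [c for c, s in specimen.items() if s not in scores[c]]
        for taxon, scores in matrix.items()
    }


# A specimen that really is taxon A, in its yellow-flowered form.
RELIABLE = {"leaves": "opposite", "fruit": "capsule"}
FULL = {**RELIABLE, "flower colour": "yellow"}
```

`filter_taxa(MATRIX, RELIABLE)` returns only A, while `filter_taxa(MATRIX, FULL)` returns an empty list. `mismatches(MATRIX, FULL)` shows that A was discarded solely on flower colour, the one absence nobody could be certain about.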

Cheers - Kevin Thiele (Lucid)




_______________________________________________
Taxacom mailing list
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom


