[Taxacom] Towards a consensus higher classification of organisms (was: List of Orders of the world), misspellings, etc...

Wed Jun 11 22:14:56 CDT 2008

Dear Taxacom-ers,

Back in February there was some discussion on this list, under the
thread "List of Orders of the world", regarding what resources exist
that compile existing higher taxon names (presumably for all life, i.e.
extinct and extant); see, e.g., David Remsen's reply to a post by Joel
Hannan. I have only just become aware of that exchange (having not
previously subscribed to Taxacom, a situation I have now rectified!!).
There was also some relevant (serious and not-so-serious) discussion
under the related thread "complete list of all species". I think that
both topics are good ones and deserve a considered response (of course
I would already class David's reply as one).

With regard to a complete listing of higher taxa, my ideal request would
be for some sort of "benchmark" publication along the lines of Parker
(ed.)'s "Synopsis and Classification of Living Organisms" (publ. 1982),
for extant taxa, and Benton (ed.)'s "The Fossil Record 2" (publ. 1993),
combined into a single unified classification, by acknowledged experts
for different groups, as a "snapshot" of best achievable consensus as at
time YYYY, that could be updated periodically. The first of these was
commercially funded by the publisher I believe, the second was sponsored
by learned societies (the Palaeontological Association, the Royal
Society, and the Linnean Society), and both were huge tasks beyond the
scope of any one author.

In the absence of such a benchmark list, various individual or small
group efforts do, as we know, provide some steps along the road
(hopefully each at least internally consistent): namely Catalogue of
Life for 50%-60% of extant taxa, The Paleobiology Database for fossils,
and other initiatives such as CuStar
(http://starcentral.mbl.edu/custar/portal.php) and the Taxonomicon
(http://www.taxonomy.nl/taxonomicon/) which already contain quite a
range of content (their various authors can advise as to how "complete"
these might be). For example (to answer Joel's original question), a lot
of what he is seeking can be obtained by going (e.g.) to the
Taxonomicon, selecting the option "Taxon by Scientific name (above
family group)", and typing (e.g.) the letter "A" in the search box, to
view all higher taxa currently treated in that system beginning with
that letter, and so on. As pointed out by others, quite a lot of
supporting information is also available (in variable quality, but often
surprisingly useful) on Wikipedia, at least once you have a list of
terms to search for...

My own IRMNG database (search entry point
http://www.cmar.csiro.au/datacentre/irmng/ ) was up to now
web-searchable only at the level of species, genus, and family, although
internally there is a "working" hierarchy to ease the data management.
With some trepidation (since it is not the principal focus of my
project, and also has not been scrutinized for inconsistencies as yet),
I have made a set of higher taxon entry points, available via the
(non-advertised, hopefully non-crawlable) link xxxxx/index2.html, where
xxxxx = http://www.cmar.csiro.au/datacentre/irmng/ : look for the
section "Browse the hierarchy / generate taxon lists" near the top. The
present range of classes, orders, families and genera will also be
expanded in the near future as more data are added. As I said, compiling
these higher taxa is not the "main game" for this activity, but some
may find them useful.

Finally, a comment on the question of misspellings and variant
spellings, which, as many on this list will know, is not a trivial one
for compilations of taxonomic data (or, in reality, any biological
data). My efforts in this respect are currently implemented via
the IRMNG "search" box at the level of both genus and species. For
example if you enter "Apseudes latreillei" and then press "Check species
name", you will get not only the exact match (if it exists) but also
some near matches that are candidate misspellings. The near match
algorithm is my own design (i.e. not available elsewhere), but will
hopefully be published in due course. Of course it is a separate
question as to which (if any) is the "correct" spelling, but
identifying candidate near matches is obviously a good first step.
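
For anyone wanting to experiment with the general idea of returning an
exact match plus candidate near matches, here is a minimal sketch in
Python. It is NOT the IRMNG algorithm (which is unpublished); it simply
uses plain Levenshtein edit distance as a stand-in. The function names,
the max_distance threshold, and the reference list (including the
variant spellings in it) are all assumptions made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def near_matches(query, names, max_distance=2):
    """Return (exact, candidates). exact is the matching name or None;
    candidates are the other names within max_distance edits,
    closest first (ties broken alphabetically)."""
    q = query.strip().lower()
    exact = None
    candidates = []
    for name in names:
        distance = edit_distance(q, name.lower())
        if distance == 0:
            exact = name
        elif distance <= max_distance:
            candidates.append((distance, name))
    candidates.sort()
    return exact, [name for _, name in candidates]


# Hypothetical reference list: the variant spellings are invented for
# this example and are not taken from IRMNG.
REFERENCE_NAMES = [
    "Apseudes latreillei",
    "Apseudes latreilli",
    "Apseudes latreillii",
    "Apseudes spinosus",
]

exact, candidates = near_matches("Apseudes latreillei", REFERENCE_NAMES)
print("exact match:", exact)
print("candidate near matches:", candidates)
```

A fixed edit-distance threshold is the simplest possible choice; it
takes no account of name length, of genus versus species epithet, or of
phonetic similarity, all of which a purpose-built taxonomic matcher
would probably want to consider.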

I would be very happy to receive any feedback on any of the above via
this list.

