[Taxacom] Language tags for scientific names
andy at pigsonthewing.org.uk
Sat Jun 28 02:54:32 CDT 2008
<CFE4C8AA9C89E744B4D0FE50DC5AE303E0C97F at exactn2-cbr.nexus.csiro.au>,
Donald.Hobern at csiro.au writes
>I agree it is important to consider how we should represent scientific
>names in XML data,
I think this debate is about how to represent the language in which
taxonomic names are rendered (in other words: "Under what IETF-language
code are taxonomic names classified?"). The debate should be
technologically neutral (in other words, forget XML, HTML, Dublin Core,
>We need to ask:
>1. What do we (as the interested community) really want to represent by
>such codes? When we want to give additional information about a
>particular scientific name (including e.g. code), why don't we just use
>proprietary tags (like the TDWG RDF TaxonName properties)?
We must remember that this is one interested community, not THE total
interested community. For example, a news site such as the BBC may wish
to mark up taxonomic names in its news reports:
Between 1977 and 2000, house sparrow (Passer domesticus) numbers
in the UK declined by 65%.
to improve their accessibility for people using assistive technology,
without caring which code is used, or even what a code is.
>2. Is this compatible with the intended use of the language tag? If
>something is a proper name with no translation into different languages,
>what would ISO expect? Is the language code in any way appropriate in
>such a case?
Conversely, in what language are taxonomic names written? In the above
example, is "Passer domesticus" written in English? Does it become
German on a page written in German, and Taiwanese likewise? What
pronunciation rules apply, in each case?
>A single code, (null, "zxx", "tax" or something) would encourage
>consistent interpretation better
Better than the status quo, certainly; but not better than a more
>but I still think we are
>misunderstanding the point of the language tag. Surely its primary role
>is in situations in which there may be different versions of some
>content in different languages and software may choose the most
>appropriate for a given user.
No; language tags have no defined "primary" role, other than to indicate
the language (or "non-language") used. In fact, RFC 4646 states:
This document describes the structure, content, construction,
and semantics of language tags for use in cases where it is
desirable to indicate the language used in an information object.
There are many reasons why one would want to identify the
language used when presenting or requesting information.
In addition, knowledge about the particular language used by
some piece of information content might be useful or even
required by some types of processing; for example,
spell-checking, computer-synthesized speech, Braille
transcription, or high-quality print renderings.
>This is not the case here and I suspect
>the ISO recommendation would be to use "zxx" or nothing at all.
I'm not sure that the ISO would make any such recommendation; the
relevant body is the IETF-languages group, and we may well need to ask
them, having first clarified our collective preference, and outlined our
One such use-case would be spell checking, where "zxx" is unhelpful, but
"zxx-tx" (or whatever) can tell a user agent to spell check against a
dictionary of taxonomic terms.
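
To illustrate the spell-checking case, here is a minimal sketch in Python of how a user agent might choose a dictionary from a language tag. The tag "zxx-tx" is only the placeholder used in this thread, not a registered tag, and the dictionary names and the function pick_dictionary are hypothetical. The fallback loosely follows the truncation idea of RFC 4647 lookup and is not a complete implementation of it.

```python
# Hypothetical mapping from language tags to spelling dictionaries.
# "zxx-tx" is the placeholder tag from this discussion, not a registered tag.
DICTIONARIES = {
    "en": "english-words",
    "de": "german-words",
    "zxx-tx": "taxonomic-names",
}


def pick_dictionary(tag, dictionaries=DICTIONARIES):
    """Return the dictionary for the most specific matching tag, or None."""
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in dictionaries:
            return dictionaries[candidate]
        # Fall back to a shorter tag by dropping the last subtag.
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


print(pick_dictionary("zxx-tx"))  # a taxonomic dictionary is found
print(pick_dictionary("zxx"))     # plain "zxx" matches no dictionary
```

With plain "zxx" the user agent has nothing to check against, whereas the more specific tag selects the taxonomic dictionary.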
Likewise, "zxx" says nothing about how to pronounce the relevant text
(is the "ph" pronounced like an "f" in every language? How would a
culture with no knowledge of that practice pronounce "Drosophila"?
"Dro-sop-hi-la"?), but "zxx-tx" may indicate the pronunciation to be
It may well be that, absent other arguments, the IETF-languages group
would suggest "la-x-taxonomy" or nothing at all.
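
For anyone unfamiliar with the form of a tag such as "la-x-taxonomy": in RFC 4646 the single-letter subtag "x" introduces private-use subtags, so software that does not know "taxonomy" can still read the tag as Latin. A rough Python sketch, in which the function name split_private_use is my own invention:

```python
def split_private_use(tag):
    """Split a language tag into (public part, list of private-use subtags).

    Everything after the singleton subtag "x" is treated as private use.
    """
    subtags = tag.lower().split("-")
    if "x" in subtags:
        i = subtags.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return "-".join(subtags), []


public, private = split_private_use("la-x-taxonomy")
print(public, private)
```

A user agent that ignores the private-use part would simply apply its Latin rules, which may or may not be what we want.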