Language tags for scientific names

Andy Mabbett andy at pigsonthewing.org.uk
Sat Jun 28 02:54:32 CDT 2008

>I agree it is important to consider how we should represent scientific
>names in XML data,

I think this debate is about how to represent the language in which
taxonomic names are rendered (in other words: "Under what IETF-language
code are taxonomic names classified?"). The debate should be
technologically neutral (in other words, forget XML, HTML, Dublin Core,


>We need to ask:
>1. What do we (as the interested community) really want to represent by
>such codes?  When we want to give additional information about a
>particular scientific name (including e.g. code), why don't we just use
>proprietary tags (like the TDWG RDF TaxonName properties)?

We must remember that this is one interested community, not THE total
interested community. For example, a news site such as the BBC may wish
to mark up taxonomic names in its news reports:


        Between 1977 and 2000, house sparrow (Passer domesticus) numbers
        in the UK declined by 65%.

to improve their accessibility for people using assistive technology,
without caring which code is used, or even what a code is.

>2. Is this compatible with the intended use of the language tag?  If
>something is a proper name with no translation into different languages,
>what would ISO expect?  Is the language code in any way appropriate in
>such a case?

Conversely, in what language are taxonomic names written? In the above
example, is "Passer domesticus" written in English? Does it become
German on a page written in German, and Taiwanese likewise? What
pronunciation rules apply, in each case?


>A single code, (null, "zxx", "tax" or something) would encourage
>consistent interpretation better

Better than the status quo, certainly; but not better than a more
specific code.

>but I still think we are
>misunderstanding the point of the language tag.  Surely its primary role
>is in situations in which there may be different versions of some
>content in different languages and software may choose the most
>appropriate for a given user.

No; language tags have no defined "primary" role, other than to indicate
the language (or "non-language") used. In fact, RFC 4646 states:

        This document describes the structure, content, construction,
        and semantics of language tags for use in cases where it is
        desirable to indicate the language used in an information
        object.


        There are many reasons why one would want to identify the
        language used when presenting or requesting information.


        In addition, knowledge about the particular language used by
        some piece of information content might be useful or even
        required by some types of processing; for example,
        spell-checking, computer-synthesized speech, Braille
        transcription, or high-quality print renderings.

>This is not the case here and I suspect
>the ISO recommendation would be to use "zxx" or nothing at all.

I'm not sure that the ISO would make any such recommendation; the
relevant body is the IETF-languages group, and we may well need to ask
them, having first clarified our collective preference, and outlined our
use-cases.

One such use-case would be spell checking, where "zxx" is unhelpful, but
"zxx-tx" (or whatever) can tell a user agent to spell check against a
dictionary of taxonomic terms.
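
As a rough illustration of that use-case, here is a minimal Python sketch
of how a user agent might pick a spell-check dictionary from a language
tag. The dictionary names and the choose_dictionary helper are invented
for the example, and "zxx-tx" is only the placeholder tag used in this
thread, not a registered tag. The fallback step loosely follows the
truncation idea in RFC 4647 lookup.

```python
# Hypothetical mapping from language tags to spell-check dictionaries.
# The dictionary names are made up for illustration.
DICTIONARIES = {
    "en": "english-words",
    "de": "german-words",
    "zxx-tx": "taxonomic-names",  # placeholder tag from this thread
}


def choose_dictionary(tag):
    """Return a dictionary name for tag, or None to skip spell checking."""
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in DICTIONARIES:
            return DICTIONARIES[candidate]
        subtags.pop()  # drop the last subtag and try the shorter tag
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()  # a trailing singleton such as "x" cannot end a tag
    return None


print(choose_dictionary("zxx-tx"))  # the more specific tag finds a dictionary
print(choose_dictionary("zxx"))     # plain "zxx" gives the checker nothing to use
```

With plain "zxx" the lookup finds nothing, so the text can only be
skipped; the more specific tag is what lets the user agent do something
useful.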

Likewise, "zxx" says nothing about how to pronounce the relevant text
(is the "ph" pronounced like an "f" in every language? How would a
culture with no knowledge of that practice pronounce "Drosophila"?
"Dro-sop-hi-la"?), but "zxx-tx" may indicate the pronunciation to be

It may well be that, absent other arguments, the IETF-languages group
would suggest "la-x-taxonomy" or nothing at all.
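
For what it is worth, a tag such as "la-x-taxonomy" uses the private-use
mechanism of RFC 4646, in which the singleton "x" introduces subtags
whose meaning is defined only by private agreement. The sketch below
(split_private_use is a name made up for this example) shows what
software with no knowledge of that agreement would be left with: the
text would simply be treated as Latin.

```python
def split_private_use(tag):
    """Split a tag into its ordinary part and its private-use subtags."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")  # everything after "x" is private use
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []


language, private = split_private_use("la-x-taxonomy")
print(language)  # the part any generic processor would act on
print(private)   # meaningful only to parties who have agreed on it
```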

Andy Mabbett

