[Taxacom] Language tags for scientific names

Andy Mabbett andy at pigsonthewing.org.uk
Sat Jun 28 02:54:32 CDT 2008


In message
<CFE4C8AA9C89E744B4D0FE50DC5AE303E0C97F at exactn2-cbr.nexus.csiro.au>,
Donald.Hobern at csiro.au writes

>I agree it is important to consider how we should represent scientific
>names in XML data,

I think this debate is about how to represent the language in which
taxonomic names are rendered (in other words: "Under what IETF-language
code are taxonomic names classified?"). The debate should be
technologically neutral (in other words, forget XML, HTML, Dublin Core,
etc.).

[...]

>We need to ask:
>
>1. What do we (as the interested community) really want to represent by
>such codes?  When we want to give additional information about a
>particular scientific name (including e.g. code), why don't we just use
>proprietary tags (like the TDWG RDF TaxonName properties)?

We must remember that this is one interested community, not THE total
interested community. For example, a news site such as the BBC may wish
to mark up taxonomic names in its news reports:

        <http://news.bbc.co.uk/1/hi/england/london/4743250.stm>

        Between 1977 and 2000, house sparrow (Passer domesticus) numbers
        in the UK declined by 65%.

to improve their accessibility for people using assistive technology,
without caring which code is used, or even what a code is.

>2. Is this compatible with the intended use of the language tag?  If
>something is a proper name with no translation into different languages,
>what would ISO expect?  Is the language code in any way appropriate in
>such a case?

Conversely, in what language are taxonomic names written? In the above
example, is "Passer domesticus" written in English? Does it become
German on a page written in German, and Taiwanese likewise? What
pronunciation rules apply, in each case?

[...]

>A single code, (null, "zxx", "tax" or something) would encourage
>consistent interpretation better

Better than the status quo, certainly; but not better than a more
specific code.

>but I still think we are
>misunderstanding the point of the language tag.  Surely its primary role
>is in situations in which there may be different versions of some
>content in different languages and software may choose the most
>appropriate for a given user.

No; language tags have no defined "primary" role, other than to indicate
the language (or "non-language") used. In fact, RFC 4646 states:

        This document describes the structure, content, construction,
        and semantics of language tags for use in cases where it is
        desirable to indicate the language used in an information
        object.

        [...]

        There are many reasons why one would want to identify the
        language used when presenting or requesting information.

        [...]

        In addition, knowledge about the particular language used by
        some piece of information content might be useful or even
        required by some types of processing; for example,
        spell-checking, computer-synthesized speech, Braille
        transcription, or high-quality print renderings.

>This is not the case here and I suspect
>the ISO recommendation would be to use "zxx" or nothing at all.

I'm not sure that the ISO would make any such recommendation; the
relevant body is the IETF-languages group, and we may well need to ask
them, having first clarified our collective preference, and outlined our
use-cases.

One such use-case would be spell checking, where "zxx" is unhelpful, but
"zxx-tx" (or whatever) can tell a user agent to spell check against a
dictionary of taxonomic terms.
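
As a rough illustration of that use-case, here is a minimal Python sketch of how a user agent might choose a spell-check dictionary from a language tag, falling back by truncating subtags from the right (in the spirit of the lookup scheme in RFC 4647). The word lists, the function names and the "zxx-tx" tag itself are assumptions made for illustration only; no such tag is registered.

```python
# Hypothetical word lists keyed by language tag. "zxx-tx" is only the
# placeholder tag used in this message, not a registered tag.
DICTIONARIES = {
    "en": {"house", "sparrow", "numbers"},
    "zxx-tx": {"passer", "domesticus", "drosophila"},
}


def pick_dictionary(tag):
    """Return (matched_tag, words), truncating subtags from the right
    until a known tag is found; (None, empty set) if nothing matches."""
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in DICTIONARIES:
            return candidate, DICTIONARIES[candidate]
        subtags.pop()
        # Do not leave a dangling single-character subtag such as "x".
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None, set()


def misspelled(text, tag):
    """List words not found in the dictionary chosen for the tag.
    With no dictionary available, nothing can be checked."""
    _, words = pick_dictionary(tag)
    if not words:
        return []
    return [w for w in text.lower().split() if w not in words]
```

With these toy word lists, text tagged "zxx-tx" is checked against the taxonomic list, whereas text tagged plain "zxx" matches no dictionary and is not checked at all, which is the sense in which "zxx" alone is unhelpful here.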

Likewise, "zxx" says nothing about how to pronounce the relevant text
(is "ph" pronounced like an "f" in every language? How would a
culture with no knowledge of that practice pronounce "Drosophila"?
"Dro-sop-hi-la"?), but "zxx-tx" may indicate the pronunciation to be
used.

It may well be that, absent other arguments, the IETF-languages group
would suggest "la-x-taxonomy" or nothing at all.
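
For a tag of that shape, the part after the "x" singleton is a private-use sequence in RFC 4646 terms. A minimal sketch of how software might separate the two parts follows; the function name is hypothetical, and the code does not validate the tag against the full grammar.

```python
def split_private_use(tag):
    """Split a language tag at the first "x" singleton.

    Returns (language_part, private_use_subtags). This is a simplified
    sketch: it does not check the tag against the RFC 4646 grammar.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []


language, private = split_private_use("la-x-taxonomy")
```

An agent that does not recognise the private-use part could still fall back on the language part ("la") alone.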

-- 
Andy Mabbett



