[Taxacom] Language tags for scientific names

Andy Mabbett andy at pigsonthewing.org.uk
Fri Jun 27 18:46:58 CDT 2008

In message
<5ebbead70806271530o4f73358do204a7179445ec51e at mail.gmail.com>, Gregor
Hagedorn <g.m.hagedorn at gmail.com> writes

>The point of the proposal to use xml:lang is to offer a general way to
>denote certain passages of free-form text or structured text elements
>as general as possible.

xml:lang is one use for an IETF-language tag; but the same tags can also
be used in other cases, outside XML, such as the "lang" attribute in

>It is possible to alternatively agree on a microformat (using the class
>attribute) for xhtml,

For clarity, microformats can be used in HTML, not just XHTML.

>to quote from rfc 4646:
>    de-CH-1901 (German as used in Switzerland using the 1901 variant
>    sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

The difference between those two language tags and recent proposals for
including ICBN and the like is that they "degrade gracefully", through a
hierarchy. For instance, if a parser does not understand "de-CH-1901",
it will fall back to using "de-CH", and if it doesn't understand that,
to "de".

This will work for, say, tx-QQ-ICBN ("tx" instead of "tax"; whatever
"QQ" might be), and for zxx-TX-ICBN, but only to a very limited degree
for zxx-x-ICBN/ zxx-x-TAX or zxx-ICBN/ zxx-TAX.
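
To illustrate the fallback, here is a minimal Python sketch of the
truncation step, loosely following the lookup scheme in RFC 4647. The
function name fallback_chain is my own invention, and the sketch does
no validation against the subtag registry; it only shows which shorter
tags a parser could fall back to.

```python
def fallback_chain(tag):
    """Return the tag followed by its progressively truncated forms."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        # Drop the last subtag to get the next, more general, candidate.
        subtags.pop()
        # A single-character subtag (such as the private-use "x") cannot
        # end a tag, so RFC 4647 lookup removes it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


if __name__ == "__main__":
    for candidate in ("de-CH-1901", "tx-QQ-ICBN", "zxx-TX-ICBN", "zxx-x-ICBN"):
        print(candidate, "->", fallback_chain(candidate))
```

With the -x- form the chain goes straight from "zxx-x-ICBN" to "zxx",
so there is no intermediate, generic "taxonomy" level to fall back to.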

>It would be a community decision to use language tags as one of several

Indeed - isn't this the community debate about which tag(s) should be
used? Though the IETF-languages community is also part of the wider
community, which will make the final choice.

>Using the private use areas (everything after an "-x-" subtag) we could
>start without even registering. But it would be painful if everyone
>uses different private notations.

Quite - that's exactly why I decided against using -x-, after due
deliberation. I think taxonomic names are used too widely, and by too
many publishers, to take that road.

Perhaps we are now ready to start to draw up a list of requirements,
which will no doubt need further tweaking:

   1    Valid according to RFC 4646

   2    Acceptable to IETF-languages community

   3    hierarchical (there must be a parent, generic "taxonomy" level
        above any levels for specific codes)

   4    sub-tags for individual codes (if deemed appropriate)

   5    using -x- only as a last resort

My personal view is that (4) is unnecessary, and unlikely to succeed in
meeting (2), but I have worded this list as neutrally as possible. Would
anyone like to add to, or change, it?

[Apologies if I've misconstrued any of your points; it would be easier
to avoid doing so if you would kindly quote some of the post to which
you are replying, to give context. Thank you.]

Andy Mabbett
