[Taxacom] Language tags for scientific names

Andy Mabbett andy at pigsonthewing.org.uk
Thu Jun 26 14:07:22 CDT 2008

In message
<5ebbead70806260533h3b6afe8ai11cd1d387549a7b5 at mail.gmail.com>, Gregor
Hagedorn <g.m.hagedorn at gmail.com> writes

>Scientific names not only occur as labels of an
>object, but also in comments and other free-form text. Most projects
>devise special markup for these, but we believe that many more
>purposes can be served (including when using scientific names in
>DublinCore based metadata schema) if a language tag were available. In
>practice "la" for latin is frequently abused for this purpose, but a
>more precise and correct tag seems desirable.

I agree. Thank you for re-raising the issue.

>In general, it seems desirable to have a generic form as well as forms
>specific to the codes of nomenclature (BC, ICVCN, ICBN, ICNCP, ICZN).

What is your reasoning for this (especially given that it would be
desirable for the new code(s) to be usable by non-taxonomists, who may
not be familiar with such bodies)?

I think the relevant authorities are more likely to be persuaded to
allow a single code, than a set.

>The name for the generic form is perhaps most difficult to agree, my
>proposal would be to use TAX (for taxonomic community), but I welcome
>other proposals.

So long as nothing ridiculous is proposed, the actual term used is not
as important as the need to have one in the first place. That said, we
need to encourage understanding amongst people who speak many languages,
and perhaps not English. Possibilities might include:

        TAX     Taxonomy
        BIO     Biology (or BLG, BLY, BGY)
        BTA     Biota (or BIA, BOA)
        SCI     Scientific name
        LIN     Linnaean

Two letter equivalents might be needed; BT and TX are available, for
instance, but these are not: BI, BO, SC, SI, LI, LN, TA. All the above
three-letter options are free (per

>Available options for denoting scientific names using standard
>xml:lang or html:lang attributes are thus:
>* Register a new basic language code like sc / stn (scientific taxon name)
>** Such a proposal was made in 2003 by Andy Mabbett
>   (see http://www.alvestrand.no/pipermail/ietf-languages/2003-February
>   and refuted on reasons that do not convince me.

Nor me!

Perhaps we should return there? It might be that my proposal suffered by
being made from an amateur (not that that's a bad thing!) individual,
rather than someone - or a group - with the backing of an organisation
or academic institution(s). Also, personal circumstances meant that I
was unfortunately unable to participate in the cited debate, which
followed my initial proposal.

Though that debate discussed pronunciation, the issue of translation is
more crucial ("Circus cyaneus" in an English page should not be
translated to "Zirkus cyaneus" if the page is translated into German!).

>** Example name codes: sc-TAX, sc-ICVCN, sc-BC, sc-ICZN

The first of those is redundant, but the others are good, if there must
be multiple codes, because publishers and parsers which do not
understand the sub-codes fall back to a taxonomy-specific parent ("sc").
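
To make the fallback point concrete, here is a minimal Python sketch of
fallback by truncation, loosely modelled on the lookup scheme described
in RFC 4647. It is an illustration only: "sc" and the sub-codes are the
proposed codes from this thread and are not registered, and the function
name fallback_chain is my own invention.

```python
def fallback_chain(tag):
    """Return the tag and each shorter prefix, most specific first."""
    subtags = tag.lower().split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so it is dropped together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


# Hypothetical codes from this thread, not registered language tags.
for example in ("sc-ICZN", "zxx-ICZN", "x-ICZN"):
    print(example, "->", fallback_chain(example))
```

Under these assumptions, only the "sc-" form ends its chain on a parent
that still says "this is a scientific name".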

>* Use zxx- for "non-language dependent"
>** Example name codes: zxx-TAX, zxx-ICVCN, zxx-BC, zxx-ICZN
>* Use x- for "experimental or extension range" (IETF only)
>** Example name codes: x-TAX, x-ICVCN, x-BC, x-ICZN

I don't think those are as good as the first option, if there is more
than one code, because they do not fall back to a common
taxonomy-specific parent (which is "sc" in your above example).

>My preference would be to use the zxx- range because it probably
>informs processors not knowing the specific codes best how to handle
>this information (i.e. that it would be appropriate in any linguistic

That's a fair point, if there's only one taxonomic sub-code ("zxx-TAX",
say); but otherwise it means that "zxx-ICVCN" is not a child of "zxx-TAX".
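
The parent/child relationship can be sketched as simple prefix matching,
roughly in the spirit of the basic filtering scheme in RFC 4647 (the "*"
wildcard is ignored here). The function name is_child_of is hypothetical,
and the codes are the unregistered examples from this thread.

```python
def is_child_of(tag, parent):
    """True if tag equals parent or extends it at a subtag boundary."""
    tag = tag.lower()
    parent = parent.lower()
    return tag == parent or tag.startswith(parent + "-")


# Hypothetical codes from this thread, not registered language tags.
print(is_child_of("sc-ICVCN", "sc"))        # shares the taxonomic parent
print(is_child_of("zxx-ICVCN", "zxx-TAX"))  # siblings, not parent and child
print(is_child_of("zxx-ICVCN", "zxx"))      # parent is not taxonomy-specific
```

A processor that selects on "zxx" would therefore also pick up any
non-taxonomic content tagged "zxx", whereas one that selects on "sc"
would pick up scientific names only.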

>I look forward for a good discussion, including the pointer to where
>someone else has already solved the problem and I have not found it

Another option is to retain the Latin code as a fallback, but with a
taxonomy sub-code ("la-TAX"). I see this as less desirable, but perhaps
more easily achieved. Again, one issue would be that "la-ICVCN" is not a
child of "la-TAX".

You might also be interested in the draft "Species" microformat:


that offers an alternative method for marking up (X)HTML content so as
to indicate which parts are taxonomic names:

        <p>Yesterday, I saw a
        <span class="biota">
        <span class="vernacular">House sparrow</span>
        (<i class="binominal">Passer domesticus</i>)
        </span></p>

Of course, the microformat could be used in conjunction with a language
code:

        <p>Yesterday, I saw a
        <span class="biota">
        <span class="vernacular">House sparrow</span>
        (<i class="binominal" lang="SC">Passer domesticus</i>)
        </span></p>

Much work remains to be done to develop that microformat, but the draft
is already widely used on Wikipedia.
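
As a rough illustration of the translation issue raised earlier, the
Python sketch below shows how a processor might use such a language code
to decide which text to pass to a translator. It assumes a hypothetical
"sc" code (not registered), uses only the standard-library html.parser
module, and the names TranslatableText and segments are my own.

```python
from html.parser import HTMLParser

# Hypothetical primary codes meaning "scientific name, do not translate".
UNTRANSLATED_LANGS = {"sc"}
# Elements with no end tag, which must not affect the nesting stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TranslatableText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.protected = []  # one flag per open element
        self.segments = []   # (text, should_translate) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        if lang:
            # An explicit lang attribute decides for this element.
            flag = lang.lower().split("-")[0] in UNTRANSLATED_LANGS
        else:
            # Otherwise the setting is inherited from the parent.
            flag = bool(self.protected) and self.protected[-1]
        self.protected.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.protected:
            self.protected.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            keep = bool(self.protected) and self.protected[-1]
            self.segments.append((text, not keep))


def segments(html):
    """Split markup into (text, should_translate) pairs."""
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.segments


for text, translate in segments(
    '<p>I saw a <i lang="SC">Circus cyaneus</i> today.</p>'
):
    print(repr(text), "translate" if translate else "leave as is")
```

The sketch assumes well-formed markup with matching end tags; real pages
would need a more forgiving parser.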

Please let me know if I can assist further.

Andy Mabbett
