[Taxacom] Language tags for scientific names

Gregor Hagedorn g.m.hagedorn at gmail.com
Sun Jun 29 04:03:10 CDT 2008

Jim Croft proposes:

  <Label xml:lang="en">Japanese Maple</Label>
  <Label xml:lang="de">Fächer-Ahorn</Label>
  <Label name_lsid="yadda.yadda.yadda">Acer palmatum</Label>
  <Label concept_lsid="bhaj.blah.blah">Acer palmatum</Label>

Yes, that would be possible. Let's assume name_lsid and concept_lsid
are in some unique namespace. And let's not discuss whether the Mozilla
Foundation or Microsoft will add LSID resolvers to Firefox and Internet
Explorer, or whether Wikipedia will add name_lsid to the MediaWiki
software or others to their content management systems.

The problem I see is that general software would need to know the XML
attribute names for TDWG concepts, NIH concepts, publication LSIDs,
geoname classes according to a few dozen standards, person name
classes, etc. (representation here is meant not as definition, but as
method enabling software that does not fully understand all parts of
the object definition to present human users with a representation). I
therefore believe that using a general method **in addition** to what
you propose may be helpful:

  <Label xml:lang="en">Japanese Maple</Label>
  <Label xml:lang="de">Fächer-Ahorn</Label>
  <Label xml:lang="zxx-scitaxname" name_lsid="yadda.yadda.yadda">Acer palmatum</Label>

(xml:)lang is supported in RDF as well as in XML and HTML. It is a very
general solution that many systems would fall back to in the absence of
understanding more detailed markup.
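
To illustrate the fallback idea, here is a minimal Python sketch of how
generic software that understands nothing but language tags might pick
labels for a user. It assumes that any tag whose primary subtag is "zxx"
(the ISO 639 code for "no linguistic content") is treated as
language-neutral. The function names are hypothetical, and
"zxx-scitaxname" is only the tag proposed in this thread, not a
registered one.

```python
def primary_subtag(tag):
    """Return the lower-cased primary subtag, e.g. 'zxx' for 'zxx-scitaxname'."""
    return tag.split("-")[0].lower()


def select_labels(labels, user_lang):
    """Keep labels in the user's language plus language-neutral ('zxx') ones.

    labels is a list of (language_tag, text) pairs.
    """
    wanted = primary_subtag(user_lang)
    return [text for tag, text in labels
            if primary_subtag(tag) in (wanted, "zxx")]


labels = [
    ("en", "Japanese Maple"),
    ("de", "Fächer-Ahorn"),
    ("zxx-scitaxname", "Acer palmatum"),  # proposed tag, an assumption
]

english = select_labels(labels, "en")
german = select_labels(labels, "de")
```

Under that assumption the scientific name is kept for both English and
German users, without the software knowing anything about name_lsid.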

I therefore think that Rich is wrong in assuming the proposal is meant
to be a solution to all problems; it is only meant to solve some
problems in a way that is very widely usable rather than a special
purpose solution.

However, we have a more serious problem in our current implementation
of SDD identification keys (and you may call that a design problem of
SDD, but then SDD is the first truly multilingual TDWG XML standard I
know of). The value of xml:lang is inherited by child elements in any
XML processor. Originally SDD required language tags throughout, but
since most datasets are and will be monolingual, we changed this to
allow a single language to be declared at one point as well. Reality
thus looks like:

<Dataset xml:lang="de">
  <Label xml:lang="en">Japanese Maple</Label>
  <Label name_lsid="yadda.yadda.yadda">Acer palmatum</Label>

that is, Acer palmatum is German (any XML parser will return "de" when
asked what the value of xml:lang on <Label
name_lsid="yadda.yadda.yadda">Acer palmatum</Label> is).
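
The inheritance can be checked with a short Python sketch using the
standard library's ElementTree. ElementTree does not resolve inherited
xml:lang values by itself, so the helper effective_langs below is a
hypothetical function that walks the tree and passes the language down;
the closing Dataset tag is added only to make the fragment parseable.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(elem, inherited=None):
    """Yield (element, effective language) pairs, applying xml:lang inheritance."""
    lang = elem.get(XML_LANG, inherited)
    yield elem, lang
    for child in elem:
        yield from effective_langs(child, lang)


doc = """<Dataset xml:lang="de">
  <Label xml:lang="en">Japanese Maple</Label>
  <Label name_lsid="yadda.yadda.yadda">Acer palmatum</Label>
</Dataset>"""

root = ET.fromstring(doc)
result = {elem.text: lang
          for elem, lang in effective_langs(root)
          if elem.tag == "Label"}

for text, lang in result.items():
    print(lang, text)
```

The label without its own xml:lang picks up the language of the
enclosing Dataset element.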

A processor would thus show it as German (a moderate problem when
dealing with grammar, spelling, pronunciation) but it would NEVER
show it when the user selects English content (Big Problem). This is
the background reason that brings me to propose this here. Thanks
are due to Edwin from ETI for having brought this to the point where
we need to deal with it.

For the Key to Nature identification keys we have to find a solution
for this, and I had hoped we could find a general agreement. From the
responses so far, with only Andy and Thomas supporting the proposal
(thanks!), it seems we will have to go for a private solution
(zxx-x-scitax) instead. I would have liked to support Andy in
attempting to register either a tag or subtag, but this has no chance
of success without wider support from the community.


