[Taxacom] Language tags for scientific names

Donald.Hobern at csiro.au Donald.Hobern at csiro.au
Sat Jun 28 07:25:39 CDT 2008


Thanks, Gregor,

Some clarifications:

>> 1. What do we (as the interested community) really want to represent by
>> such codes?  When we want to give additional information about a
>> particular scientific name (including e.g. code), why don't we just use
>> proprietary tags (like the TDWG RDF TaxonName properties)?
>
> a) Because current technology of RDF does not seem to support markup
> of text in publications (xhtml, PDF, etc.) very well. This issue is
> not primarily about cases where the scientific name is an atomized
> piece of data.

Actually this seems precisely to be about when we want to indicate that
a string within a longer section of text is to be treated as an atomised
piece of data.

> b) RDF is limited to RDF-based applications like XMP, but cannot be
> extended to more general xml-schema based data exchange, including
> EML, SDD or TaxonX.
> 
> Please correct me if I am wrong, I may not be up to date!

I actually simply intended that we should consider adopting the
vocabularies from TaxonName (or the equivalent TCS, if you prefer) and
find safe ways (i.e. ways that will not disrupt unsuspecting parsers) to
wrap scientific names in XHTML, PDF or whatever else.  As I have
explained before, I see the RDF vocabularies as currently the most
appropriate way for us to represent our entity-relationship modelling,
but I expect that we will reuse the terms and implied structures in
other contexts (XHTML micro-tags, XML Schema structures, etc.).
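
To make the XHTML case concrete, here is a minimal Python sketch of what I
mean by wrapping a name without disturbing parsers that know nothing about
it.  The "tn-" class prefix and the use of "nameComplete" as the term are
assumptions made for illustration only, not an agreed convention.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_scientific_name(name, term="nameComplete"):
    """Wrap a scientific name in an XHTML span.

    The class token "tn-<term>" is a hypothetical convention that reuses a
    vocabulary term as a plain class name. Consumers that do not recognise
    the class simply render the text unchanged.
    """
    class_attr = quoteattr("tn-" + term)  # yields a quoted attribute value
    return "<span class=%s>%s</span>" % (class_attr, escape(name))


print(wrap_scientific_name("Apis mellifera"))
```

The same term could equally be carried in an XML Schema element name; the
point is only that the vocabulary is reused, not the RDF serialisation.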

>> 2. Is this compatible with the intended use of the language tag?  If
>> something is a proper name with no translation into different languages,
>> what would ISO expect?  Is the language code in any way appropriate in
>> such a case?
>
> zxx is a standard ISO and IETF code for this case; nothing new is being
> proposed here.
> 
> zxx is especially important for binary media objects: many images are
> zxx, but some may be language specific.
>
> About processing: since xml:lang is part of the XML specification, any
> XML processor knows how to behave at a low level.

This is what I want to be sure about.  If we are doing this, we should
be sure that the most popular processors and (more importantly)
applications using xml:lang do indeed do what we expect.  If an
application user has indicated that their language is French and the
application accordingly displays only French text and perhaps English as
the default when French is not available, will it display text marked
"zxx" or will it treat this the same way it might treat Chinese or
Arabic text?  The software probably does work as we expect, but this is
the kind of real world example that xml:lang is intended to make
possible and we should make sure we understand it correctly.
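
The concern can be sketched in a few lines of Python.  This is a
hypothetical, deliberately naive language filter, not the behaviour of any
particular processor; the function name, its parameters and the sample
segments are all invented for illustration.

```python
def select_for_display(segments, preferred="fr", fallback="en", always_show=()):
    """Return the texts a simple language filter would display.

    segments is a list of (language tag, text) pairs. The filter shows the
    preferred language if any segment uses it, otherwise the fallback.
    Tags listed in always_show are displayed regardless of preference.
    """
    available = {lang for lang, _ in segments}
    wanted = preferred if preferred in available else fallback
    return [text for lang, text in segments
            if lang == wanted or lang in always_show]


segments = [
    ("fr", "Le frelon d'Europe"),
    ("zxx", "Vespa crabro"),
    ("zh", "(Chinese text)"),
]

# A filter with no special case treats "zxx" like any other foreign language.
print(select_for_display(segments))
# Only an application that knows about "zxx" keeps the scientific name.
print(select_for_display(segments, always_show=("zxx",)))
```

If real applications behave like the first call, names tagged "zxx" would
silently disappear for a French user, which is exactly what we need to
check before recommending the tag.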

>> I believe it is an abuse of the language code to use it to identify the
>> nomenclatural code, and something which may alienate non-taxonomic
>> users.  A single code, (null, "zxx", "tax" or something) would encourage
>> consistent interpretation better, but I still think we are
>> misunderstanding the point of the language tag.  Surely its primary role
>> is in situations in which there may be different versions of some
>> content in different languages and software may choose the most
>> appropriate for a given user.  This is not the case here and I suspect
>> the ISO recommendation would be to use "zxx" or nothing at all.
>
> I have no problem if a consensus emerges that the nomenclatural codes
> should not be considered here.
>
> I disagree with your last point. Language tags are important for many
> purposes, including presentation on screen or for screen readers, and
> spell checking of documents with mixed language use (most people have
> to use several languages in parallel rather than choosing one). When
> using scientific organism names in Chinese, Arabic, or Hebrew texts,
> appropriate language tags would inform software on script, reading
> direction, etc.

Surely it is the use of different Unicode characters which should
control which script, reading direction, etc. is used, not the xml:lang
tag?  
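
As a small illustration of what I mean, each Unicode character carries its
own bidirectional class, which can be read without any markup at all.  The
sketch below uses Python's standard unicodedata module; the helper name
bidi_classes is my own.

```python
import unicodedata


def bidi_classes(text):
    """Return the set of Unicode bidirectional classes found in text."""
    return {unicodedata.bidirectional(ch) for ch in text}


print(bidi_classes("Apis"))          # Latin letters: class "L" (left-to-right)
print(bidi_classes("\u05d3\u05d1"))  # Hebrew letters: class "R" (right-to-left)
print(bidi_classes("\u0646\u062d"))  # Arabic letters: class "AL" (Arabic letter)
```

A Latin scientific name embedded in Hebrew or Arabic text therefore already
identifies itself as left-to-right at the character level, whatever
xml:lang says.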

Best wishes,

Donald

Donald Hobern, Director, Atlas of Living Australia
CSIRO Entomology, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 
Email: Donald.Hobern at csiro.au




