[Taxacom] Wanted: a page on marking up

Weitzman, Anna WEITZMAN at si.edu
Tue May 15 08:32:24 CDT 2007

Dear Mary,
The way in which we mark up taxonomic work is being addressed by several groups at the moment, and is the subject of a TDWG group's work.  Currently we do not have an agreed standard, but some generalities and possibilities are beginning to appear.
Two key questions are: "What do we want to put into XML?" and "How finely do we need to atomise the content of the paper?"
We can take a broad-brush approach and identify the different components of a paper.  This is much the way that most schemas have developed, and is the overall approach of TaxonX.  This is clearly a baseline, upon which we might build further as needed.
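To make the broad-brush idea concrete, here is a minimal Python sketch that wraps each component of a treatment in its own element without atomising the text inside.  The element names (treatment, name, section) and the placeholder text are invented for illustration; they are not TaxonX or taXMLit elements.

```python
import xml.etree.ElementTree as ET


def mark_up_treatment(name, sections):
    """Wrap the broad components of one treatment in XML elements.

    Element names are hypothetical, not taken from TaxonX or taXMLit.
    """
    treatment = ET.Element("treatment")
    ET.SubElement(treatment, "name").text = name
    for kind, text in sections:
        # One element per component; the text inside is left unatomised.
        section = ET.SubElement(treatment, "section", {"type": kind})
        section.text = text
    return treatment


# Placeholder content standing in for the real text of a paper.
treatment = mark_up_treatment(
    "Glyceria",
    [
        ("discussion", "General remarks on the genus."),
        ("key", "Text of the key."),
        ("excluded", "List of excluded species."),
    ],
)
xml_text = ET.tostring(treatment, encoding="unicode")
print(xml_text)
```

Finer mark-up (characters, names, specimens) could later be added inside each section without disturbing this outer layer.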
Within that, we might want to focus on characters, in order to be able to extract them and use them in conjunction with the species / taxa / concepts to build, for example, keys, descriptions drawn from several sources, or field guides.  Some examples of automating this have been produced at the University of Illinois with Bryan Heidorn, especially by Hong Cui, one of his former students.
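As a rough illustration of why character-level mark-up is useful, the following Python sketch gathers character states for the same taxon from two marked-up descriptions.  The element and attribute names (description, character, taxon, name, unit) and all of the data are invented for this example; it is not the Illinois work, only a sketch of the general idea.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical marked-up descriptions from different sources.
SOURCES = {
    "source_a": """<description taxon="Taxon one">
        <character name="leaf length" unit="mm">3-5</character>
        <character name="flower colour">white</character>
    </description>""",
    "source_b": """<description taxon="Taxon one">
        <character name="leaf length" unit="mm">3.5-6</character>
    </description>""",
}


def collect_characters(sources):
    """Group character states by taxon and character name across sources."""
    merged = defaultdict(lambda: defaultdict(list))
    for source_id, xml_text in sources.items():
        description = ET.fromstring(xml_text)
        taxon = description.get("taxon")
        for character in description.iter("character"):
            state = character.text.strip()
            merged[taxon][character.get("name")].append((source_id, state))
    return merged


characters = collect_characters(SOURCES)
for name, states in characters["Taxon one"].items():
    print(name, states)
```

Once the states are grouped like this, a combined description or a key could be assembled from them, with each state still traceable to its source.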
We might also wish to focus on the nomenclatural, taxonomic, citation and specimen side of the data.  This is the approach we have taken with taXMLit (http://www.sil.si.edu/digitalcollections/bca/status.cfm).  The implementation of this will include interoperability with TDWG standards for names and specimen data, and allow simultaneous access to literature elements and to specimen data, catalogues and other distributed resources.  Currently we have marked up a set of taxonomic publications, including a volume of the Biologia Centrali-Americana, and are developing the implementation, a preliminary version of which we plan to show at the TDWG meeting in Bratislava.
We are also working on instructions for using taXMLit to mark up taxonomic documents.  That said, the long-term solution is to develop tools (which we and others are working on) that parse these documents into such a format automatically, using simple logic but also artificial intelligence ('machine learning').  The latter has been used in other areas, including molecular bioinformatics, and there are developments that we should be able to use to our advantage in taxonomy.
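By "simple logic" we mean something like the following Python sketch, which uses one rule to recognise a line of the form "Genus species Author, Year" and split it into fields.  The pattern, the function name and the example name are all invented for illustration and are not part of any existing tool; real literature would need many more rules, which is where machine learning becomes attractive.

```python
import re

# Hypothetical rule: "Genus species Author, Year" on a single line.
NAME_LINE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<species>[a-z]+)\s+"
    r"(?P<author>[^,]+),\s*"
    r"(?P<year>\d{4})$"
)


def parse_name_line(line):
    """Return the fields of a name citation, or None if the rule does not fit."""
    match = NAME_LINE.match(line.strip())
    return match.groupdict() if match else None


# A made-up name that fits the rule, and an ordinary sentence that does not.
print(parse_name_line("Examplus fictus Smith, 1900"))
print(parse_name_line("Found under bark of fallen trees."))
```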
Anna & Chris
Anna L. Weitzman, PhD
Informatics, Botany and Biodiversity Research
National Museum of Natural History
Smithsonian Institution
weitzman at si.edu
Christopher H.C. Lyal, PhD
Beetle Diversity and Evolution Programme,
Department of Entomology,
The Natural History Museum,
Cromwell Road,
London SW7 5BD
tel: +44 (0) 207 942 5113
fax: +44 (0) 207 942 5661
e-mail c.lyal at nhm.ac.uk


From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Donat Agosti
Sent: Tue 15-May-07 8:30 AM
To: 'Mary Barkworth'; Taxacom at mailman.nhm.ku.edu
Cc: 'Terry Catapano'; 'Christiana Klingenberg'; 'Guido Sautter'
Subject: Re: [Taxacom] Wanted: a page on marking up

Dear Mary

We developed a mark-up schema for taxonomic work (TaxonX,
http://taxonx.org), started to mark up literature (e.g. the ant literature
of Madagascar, http://antbase.org/databases/madagascar.htm), and plan to
build a dedicated treatment server so the descriptions can easily be
accessed. For the mark-up process we developed a semiautomatic program,
which we are still refining (GoldenGATE,
http://idaho.ipd.uka.de/GoldenGATE/). You can download it there, along
with a manual.

TaxonX is a lightweight
(http://wiki.cs.umb.edu/twiki/bin/view/Ants/WebHome) schema which can be
integrated into publishers' schemas, and which builds upon existing schemas
and standards.

If you have any questions, please contact us at any time.

Good luck and welcome to the party


-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Mary Barkworth
Sent: Tuesday, May 15, 2007 1:51 PM
To: Taxacom at mailman.nhm.ku.edu
Subject: [Taxacom] Wanted: a page on marking up

This arises indirectly from the EoL discussion but it is not that.
There have been many posts about marking things up so they can be found.
Is there an abc level page on how to do this? Written for someone so old
she can remember electric typewriters as being new? I have servers with
a fair amount of taxonomic content on the Web (see
http://herbarium.usu.edu/webmanual/ and http://utc.usu.edu/keys/ )  With
Marina Olanova's help I have a translation of Tsvelev's 2006 global
treatment of Glyceria almost ready to post.  It contains a discussion,
listing of species with publication information and some comments, and a
key, and a list of excluded species. It would be nice if this could be
found a year from now because it was marked up correctly.

I would be delighted to mark these up so it would be easier for people
to find them. I am working on putting key words in the headers, but
reading about the semantic Web and following these discussions, I have
the impression there are better ways to do this. So, those of you who
know - please - is there a Web page that explains how in words of one
syllable?  Or do I simply define my own mark up language at the top of
the document?

My suspicion is that there are others like myself who would make their
work more accessible if they knew how - hence my public decision to
admit ignorance. I have already seen some of the grass descriptions
appear, with minor changes (rounding of limits to the nearest 0.5 mm) on
other pages with a somewhat different format. The print publication that
they come from was cited, slightly inaccurately but it was a reasonable


Taxacom mailing list
Taxacom at mailman.nhm.ku.edu
