[Taxacom] Wanted: a page on marking up
WEITZMAN at si.edu
Tue May 15 09:12:01 CDT 2007
The versions of taxonX I have seen have indeed identified those elements, but they do not allow for further breakdown into indexed components that are fully compatible with other TDWG standards, or for the level of atomization needed to create a fully useful database of literature & specimen citations.
Anna L. Weitzman, PhD
Informatics, Botany and Biodiversity Research
National Museum of Natural History
weitzman at si.edu
From: Terry Catapano [mailto:thc4ster at gmail.com]
Sent: Tue 15-May-07 10:04 AM
To: Weitzman, Anna
Cc: Donat Agosti; Mary Barkworth; Taxacom at mailman.nhm.ku.edu; Christiana Klingenberg; Guido Sautter; Hong Cui; P. Bryan Heidorn
Subject: Re: [Taxacom] Wanted: a page on marking up
Taxonx does in fact address characters, and the "nomenclatural,
taxonomic, citation and specimen side of the data". It can be used
both in a "broad brush" way, to provide a suitable "baseline", and
to encode information at a finer degree of
On 5/15/07, Weitzman, Anna <WEITZMAN at si.edu> wrote:
> Dear Mary,
> The way in which we mark up taxonomic work is being addressed by several groups at the moment, and is the subject of a TDWG group's work. Currently we do not have an agreed standard, but some generalities and possibilities are beginning to appear.
> Two key questions are: "what do we want to put into XML?" and "how detailed does the atomisation of the content of the paper need to be?"
> We can take a broad brush approach, and identify the different components of a paper. This is much the way that most schemas have developed, and is the overall approach of taxonX. This clearly is a baseline, upon which we might build further as needed.
> Within that, we might want to focus on characters, in order to be able to extract them and use them in conjunction with the species/taxa/concepts to build, for example, keys, descriptions coming from several sources, or field guides. Some examples of automating this have been developed at the University of Illinois by Bryan Heidorn's group, especially by Hong Cui, one of his former students.
> We might also wish to focus on the nomenclatural, taxonomic, citation and specimen side of the data. This is the approach we have taken with taXMLit (http://www.sil.si.edu/digitalcollections/bca/status.cfm). The implementation of this will include interoperability with TDWG standards for names and specimen data, and allow simultaneous access to literature elements and to specimen data, catalogues and other distributed resources. Currently we have marked up a set of taxonomic publications, including a volume of the Biologia Centrali-Americana, and are developing the implementation, a preliminary version of which we plan to show at the TDWG meeting in Bratislava.
> We are also working on instructions for using taXMLit to mark up taxonomic documents. That said, the long-term solution is to develop tools (which we and others are working on) to parse these documents into such a format using computer capabilities, including simple logic but also artificial intelligence ('machine learning'). The latter has been used in other areas, including molecular bioinformatics, and there are developments there that we should be able to use to our advantage in taxonomy.
> Anna & Chris
> Anna L. Weitzman, PhD
> Informatics, Botany and Biodiversity Research
> National Museum of Natural History
> Smithsonian Institution
> weitzman at si.edu
> Christopher H.C. Lyal, PhD
> Beetle Diversity and Evolution Programme,
> Department of Entomology,
> The Natural History Museum,
> Cromwell Road,
> London SW7 5BD
> tel: +44 (0) 207 942 5113
> fax: +44 (0) 207 942 5661
> e-mail c.lyal at nhm.ac.uk
> From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Donat Agosti
> Sent: Tue 15-May-07 8:30 AM
> To: 'Mary Barkworth'; Taxacom at mailman.nhm.ku.edu
> Cc: 'Terry Catapano'; 'Christiana Klingenberg'; 'Guido Sautter'
> Subject: Re: [Taxacom] Wanted: a page on marking up
> Dear Mary
> We developed a mark-up schema for taxonomic work (taxonx,
> http://taxonx.org), started to mark up literature (e.g. the ant literature
> of Madagascar (http://antbase.org/databases/madagascar.htm) and plan to
> build a dedicated treatment server so the descriptions can easily be
> accessed. For the mark-up process we developed (and are still
> refining) a semi-automatic program (GoldenGATE,
> http://idaho.ipd.uka.de/GoldenGATE/). You can download it there and also
> find a manual.
> Taxonx is a lightweight
> (http://wiki.cs.umb.edu/twiki/bin/view/Ants/WebHome) schema which can be
> integrated into publisher's schema, and which builds upon existing schemas
> and standards.
> If you have any questions, please contact us at any time.
> Good luck and welcome to the party.
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Mary Barkworth
> Sent: Tuesday, May 15, 2007 1:51 PM
> To: Taxacom at mailman.nhm.ku.edu
> Subject: [Taxacom] Wanted: a page on marking up
> This arises indirectly from the EoL discussion but it is not that.
> There have been many posts about marking things up so they can be found.
> Is there an abc level page on how to do this? Written for someone so old
> she can remember electric typewriters as being new? I have servers with
> a fair amount of taxonomic content on the Web (see
> http://herbarium.usu.edu/webmanual/ and http://utc.usu.edu/keys/ ). With
> Marina Olanova's help I have a translation of Tsvelev's 2006 global
> treatment of Glyceria almost ready to post. It contains a discussion, a
> listing of species with publication information and some comments, a
> key, and a list of excluded species. It would be nice if this could be
> found a year from now because it was marked up correctly.
> I would be delighted to mark these up so it would be easier for people
> to find them. I am working on putting key words in the headers, but
> reading about the semantic Web and following these discussions, I have
> the impression there are better ways to do this. So, those of you who
> know - please - is there a Web page that explains how in words of one
> syllable? Or do I simply define my own mark up language at the top of
> the document?
> My suspicion is that there are others like myself who would make their
> work more accessible if they knew how - hence my public decision to
> admit ignorance. I have already seen some of the grass descriptions
> appear, with minor changes (rounding of limits to the nearest 0.5 mm) on
> other pages with a somewhat different format. The print publication that
> they come from was cited, slightly inaccurately, but it was a reasonable
> Taxacom mailing list
> Taxacom at mailman.nhm.ku.edu