[Taxacom] Consider using the draft "species" microformat...

Shorthouse, David dps1 at ualberta.ca
Sat Nov 3 17:19:45 CDT 2007

> The microformat aims to make taxonomic names within the content of
> published web pages discoverable to parsing tools.
Rather than convincing a developer or web page author to mark up HTML so
that taxonomic names stand out from the sea of other text, in the hope that
there might someday be a parser, I'd much rather see an organization like
uBio first index all names and expose these as LSIDs. With names parsed and
exposed as LSIDs, the door is open to cross-domain querying and aggregation.
uBio's FindIT already parses published web pages and OCR'd PDFs for
taxonomic names and does so quite well without the need for additional
mark-up (e.g. http://www.biodiversitylibrary.org/). Microformats have uses
elsewhere for previously unstructured content, and I think they would
probably be quite useful for common names on web pages. But we already
have some structure with taxonomic names, an ever-growing index of all
names, and a parsing algorithm in production (parsing PDFs is more
important at this stage), so let's use these to our advantage.
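
To make the index-first idea concrete, here is a minimal sketch in Python.
It is not how FindIT works (that algorithm is not described here); it only
illustrates looking up candidate binomials in a name index and exposing each
hit as an LSID-style URN. The index contents, the identifiers, the
"example.org" authority and the "names" namespace are all hypothetical
placeholders.

```python
import re

# Hypothetical name index: canonical name -> placeholder identifier.
# A real index (e.g. one maintained by a names provider) would be far larger.
NAME_INDEX = {
    "Homo sapiens": "1",
    "Pardosa moesta": "2",
}

# Candidate binomials: a capitalized word followed by a lowercase word.
# This deliberately over-matches; the index lookup does the real filtering.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def to_lsid(object_id, authority="example.org", namespace="names"):
    """Build an LSID-style URN: urn:lsid:<authority>:<namespace>:<object>."""
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


def find_names(text):
    """Return (name, lsid) pairs for indexed names found in unmarked text."""
    hits = []
    for match in CANDIDATE.finditer(text):
        name = f"{match.group(1)} {match.group(2)}"
        if name in NAME_INDEX:
            hits.append((name, to_lsid(NAME_INDEX[name])))
    return hits


if __name__ == "__main__":
    sample = "Specimens of Pardosa moesta were collected by Homo sapiens."
    for name, lsid in find_names(sample):
        print(name, lsid)
```

The point of the sketch is that the page author does nothing: the text
carries no extra mark-up, and the identifiers come from the index.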

Microformats would be so much more attractive if we could point users to
existing parsers. It would also help to give providers a clear reason for
expending effort marking up their content. At this stage of microformat
development, a content provider doesn't get anything in return for keeping
up with a standard in flux. A possible solution to this can be seen here:
http://ispiders.blogspot.com/2007/08/json-is-kewl.html (& my comment

David P. Shorthouse
Department of Biological Sciences
CW-403, Biological Sciences Centre
University of Alberta
Edmonton, AB   T6G 2E9
mailto:dps1 at ualberta.ca
