[Taxacom] Consider using the draft "species" microformat...

Andy Mabbett andy at pigsonthewing.org.uk
Sat Nov 3 19:19:47 CDT 2007

In message <001801c81e67$a3ada4b0$eb08ee10$@ca>, "Shorthouse, David"
<dps1 at ualberta.ca> writes

>> The microformat aims to make taxonomic names within the content of
>> published web pages discoverable to parsing tools.
>Rather than convincing a developer or web page author to mark-up HTML that
>differentiates taxonomic names from the sea of other text in the hope that
>there might someday be a parser,

You appear to be under a misapprehension. There is already a parser.

Furthermore, by the end of the year, that parsing ability should be
built into the second most popular browser on the Internet, allowing
people to write their own parsers more easily, in the same way that the
tens of thousands of extensions to Firefox currently available have been
written.

> I'd much rather see an organization like
>uBio first index all names and expose these as LSIDs. With names parsed and
>exposed as LSIDs, the door is open to cross-domain querying and aggregation.

I don't doubt that that's a useful thing to happen; but I don't see it
as a binary choice. Nor will it resolve many of the issues addressed by
the species microformat.

>uBio's FindIT already does parse published web pages and OCR'd PDFs for
>taxonomic names and does so quite well without the need for additional
>mark-up (e.g. http://www.biodiversitylibrary.org/).

And then what does it do with them? Does it address the use-cases
outlined for the species microformat? Does it cater for vernacular
names? What happens when that service hits a false positive, such as
"Baracus" in:

        B. A. Baracus, a character on the television series The A-Team

        <http://en.wikipedia.org/wiki/Ba> ?

The 'species' microformat directly addresses, and can prevent, such
false positives.
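To make the false-positive point concrete, here is a minimal Python sketch that contrasts two approaches. The first is a dictionary-based name finder. The second is a parser that only trusts explicit markup. It assumes the draft's root class name is "biota". The name index containing "Baracus" is hypothetical and stands in for any list of known genus names. Neither function is FindIT's code or the code of any existing microformat parser.

```python
from html.parser import HTMLParser

# Hypothetical name index, for illustration only: "Baracus" is treated as a
# known genus name, so a dictionary lookup will match it anywhere in a page.
NAME_INDEX = {"Baracus", "Parus"}

# Void elements never get an end tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "wbr"}


def dictionary_find(text):
    """Naive finder: flag every word that appears in the name index."""
    words = [w.strip(".,;:()") for w in text.split()]
    return [w for w in words if w in NAME_INDEX]


class BiotaParser(HTMLParser):
    """Collect the text of elements whose class list contains ROOT_CLASS."""

    ROOT_CLASS = "biota"  # assumed root class name from the draft

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside a marked-up element
        self.current = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested element inside a marked-up name
        elif self.ROOT_CLASS in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.names.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


text = "B. A. Baracus, a character on the television series The A-Team. Seen today: Parus major."
page = (
    "<p>B. A. Baracus, a character on the television series The A-Team.</p>"
    '<p>Seen today: <span class="biota"><i>Parus major</i></span>.</p>'
)

print(dictionary_find(text))  # ['Baracus', 'Parus']

parser = BiotaParser()
parser.feed(page)
parser.close()
print(parser.names)  # ['Parus major']
```

The dictionary approach flags "Baracus" simply because the word is in its index. The markup-based parser reports only what the page author has explicitly labelled as a name.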

Unless it has changed since we last discussed it, FindIT does not act
upon the page currently viewed in the browser (which is what happens
with the microformat), but requires the page's URL to be manually
submitted to it. Nor does it allow the user to select a service or
services to which the parsed name is submitted.
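Once a name has been parsed out of the page, letting the user choose where it goes can be quite simple. Here is a sketch of that idea, assuming a hypothetical registry of lookup URL templates. The example.org entry is a placeholder. Only the Wikipedia search URL is a real endpoint.

```python
from urllib.parse import quote_plus

# Hypothetical registry of lookup services; the user picks which ones to use.
# The example.org entry is a placeholder, not a real service.
SERVICES = {
    "wikipedia": "https://en.wikipedia.org/wiki/Special:Search?search={q}",
    "checklist": "https://checklist.example.org/lookup?name={q}",
}


def lookup_urls(name, chosen):
    """Build one lookup URL per user-selected service, skipping unknown ones."""
    return [SERVICES[s].format(q=quote_plus(name)) for s in chosen if s in SERVICES]


print(lookup_urls("Parus major", ["wikipedia"]))
# ['https://en.wikipedia.org/wiki/Special:Search?search=Parus+major']
```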

> Microformats have uses
>elsewhere for previously unstructured content and I think it probably would
>be quite useful for common names on web pages. But, we already have some
>structure with taxonomic names, an ever-growing index of all names, and a
>parsing algorithm in production (parsing PDFs is more important at this
>stage), so let's use these to our advantage.

Again, why is this a binary choice?

>Microformats would be so much more attractive if we could point users to
>existing parsers.

I already did that, just today.

> It would also help to give providers a clear reason for
>expending effort marking up their content. At this stage of microformat
>development, a content provider doesn't get anything in return for keeping
>up with a standard in flux.

They get to contribute to the development and solidification of an
emerging standard. I realise that not everyone may wish, or be able, to
devote resources to this, but the invitation is made; and I've made it
without suggesting, much less on a false premise, that people turn away
from any other initiative.

Your post seems to be a rehash of many issues I thought we'd resolved
nearly a year ago:




Andy Mabbett

            *  Are you using Microformats, yet: <http://microformats.org/> ?
