[Taxacom] Biodiv Informatics Challenges
faunaplan at aol.com
Tue Feb 17 04:31:35 CST 2009
While following the interesting TAXACOM thread on 'species pages', I was waiting for someone to point out the enormous current problems in retrieving relevant information on the internet. I didn't see any such postings, so here are some of my thoughts (& dreams; I'm not IT-savvy enough, if at all...) on that issue:
Most descriptions are published along with nomenclatural content, which must be printed on paper since the internet is not (yet) accepted as a Code-compliant publication medium. I'm not sure it will be accepted in the near future, but even if the change comes, printed information will continue to play a major role. Therefore, scanning, digitization and markup of literature seem to be what is needed first of all. We have tools for extracting at least some taxonomic names from PDFs, but are we really able to work with that and combine it with info from other web resources in an "industrialized" manner?
It is probably GBIF that could play a pioneering role in developing new strategies, simply because it's in the nature of the project that it has to cope with more unresolved misspellings, synonyms, homonyms, in litteris names, etc. than any other major web project (just take a look into GBIF's "kingdom unknown" basket!). I believe we urgently need what David Remsen has alluded to: an Electronic Catalogue of Names that can pick out at least the available names from the salad.
As for animal names, a first step could be an official ZooBank list of all nomina that are available in principle, knowing that these are not simply extractable from printed and electronic resources (e.g., the available nomen <Cicindela trisignata> was published by the name string "C.Tri-Signata"; the printed string "N. Kratteri valonensis" is available as <Nebria valonensis> even if it has never been used in that form; etc.).
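To make the two examples above concrete, here is a minimal Python sketch of how a printed name string might be turned into a candidate nomen. It is only a toy heuristic: the function name candidate_nomen and the genus_abbreviations lookup table are assumptions made for illustration, and the rule it applies (expand the genus abbreviation, keep the last epithet, lowercase it and drop hyphens) covers just these two cases. Anything it produces would still need vetting by a taxon expert.

```python
import re


def candidate_nomen(verbatim: str, genus_abbreviations: dict[str, str]) -> str:
    """Derive a candidate nomen from a printed name string (toy heuristic).

    genus_abbreviations is a hypothetical lookup supplied by the caller,
    e.g. built from the context of the publication being processed.
    """
    # Split on whitespace and on the period that follows an abbreviated genus.
    tokens = [t for t in re.split(r"[.\s]+", verbatim.strip()) if t]
    if len(tokens) < 2:
        raise ValueError(f"cannot find genus and epithet in {verbatim!r}")

    # First token is the (possibly abbreviated) genus, last token the epithet.
    genus_token, epithet = tokens[0], tokens[-1]
    genus = genus_abbreviations.get(genus_token, genus_token)

    # Normalise the epithet: no hyphen, all lowercase.
    epithet = epithet.replace("-", "").lower()
    return f"{genus} {epithet}"


if __name__ == "__main__":
    abbreviations = {"C": "Cicindela", "N": "Nebria"}
    for printed in ("C.Tri-Signata", "N. Kratteri valonensis"):
        print(printed, "->", candidate_nomen(printed, abbreviations))
```

The genus abbreviation cannot be resolved from the string alone, which is why the lookup is passed in. That is exactly the kind of context an automated extractor tends not to have.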
Then, what can we do with available nomina strings? Are they not useful for assigning unique identifiers to ascertained names?
For example, how about adding more information to LSIDs? In my imagination, a vetted (ascertained) name could be recognized as such by a human- & computer-readable LSID substring carrying information on its ascertained nomenclatural basis (and, in case of misspellings etc., pointing to its correct name basis). The benefits would be, in my mind, that vetted and unvetted names could be distinguished easily, and that the involvement of taxon experts would be facilitated (& attracted) much better than with the *cryptic* LSIDs that are in use so far. Once a name is ascertained, you can see it and the computer can know it from the composition of its LSID; unvetted names would stay with computer-readable numbers only.
Could an LSID for an ascertained taxonomic name in a GBIF occurrence record be something like:
... building upon a basic LSID nomen substring issued by ZooBank:
The human- & computer(?)-readable substrings in that imaginary example would be:
"Z" = animal name governed by ICZN
"S" = name of species-group ("G" for genus-, "F" for family-group)
"Carabus depressus" original generic combination (linking to type material)
"=Licinus depressus" subsequent generic combination (up to here all metadata incl. author, date, name history, etc. would be resolvable via ZooBank)
"19020147" unique GBIF ID for a name associated with a dataset (where the name could be a misspelling that should not be deleted from the record if it is a verbatim citation e.g. from a specimen label).
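As a rough illustration of the idea, the substrings listed above could be composed into one identifier string and parsed back, as in the Python sketch below. The exact string layout is not specified here, so everything about the format is an assumption: the ";" separator, the field order, and the names NameId, compose and parse are hypothetical and are not part of any LSID, ZooBank or GBIF specification. The sketch only shows that a vetted name can be told apart from an unvetted one by the composition of its identifier.

```python
from dataclasses import dataclass
from typing import Optional

SEP = ";"  # hypothetical field separator, not taken from any LSID specification


@dataclass(frozen=True)
class NameId:
    gbif_id: str                     # e.g. "19020147", always present
    code: Optional[str] = None       # "Z" = governed by ICZN
    rank: Optional[str] = None       # "S", "G" or "F" (species/genus/family group)
    original: Optional[str] = None   # original generic combination
    current: Optional[str] = None    # subsequent generic combination, if any

    @property
    def vetted(self) -> bool:
        # A name counts as vetted once it carries its nomenclatural basis.
        return self.original is not None


def compose(name: NameId) -> str:
    """Build the identifier string; unvetted names stay a bare number."""
    if not name.vetted:
        return name.gbif_id
    parts = [name.code or "", name.rank or "", name.original or ""]
    if name.current:
        parts.append("=" + name.current)
    parts.append(name.gbif_id)
    return SEP.join(parts)


def parse(identifier: str) -> NameId:
    """Inverse of compose() for the hypothetical layout above."""
    parts = identifier.split(SEP)
    if len(parts) == 1:
        return NameId(gbif_id=parts[0])
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected number of fields in {identifier!r}")
    code, rank, original, *rest = parts
    current = rest[0][1:] if len(rest) == 2 and rest[0].startswith("=") else None
    return NameId(rest[-1], code, rank, original, current)


if __name__ == "__main__":
    vetted = NameId("19020147", "Z", "S", "Carabus depressus", "Licinus depressus")
    print(compose(vetted))
    print(parse(compose(vetted)).vetted, parse("19020147").vetted)
```

Keeping the GBIF number as the last field means the dataset-level ID survives unchanged whether or not the name has been vetted, so a verbatim (possibly misspelled) record is never lost.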
Not thinkable? I know the devil is always in the details..., but where would IT-savvy Taxacomers say 'impossible in principle'?
In my mind, one of THE challenges in biodiversity informatics is how to distinguish unfinished (manuscript) content from finished content (reviewed, reliable publications). As for the issue of taxonomic names, maybe a difference in the LSID could be part of the solution?
Wolfgang Lorenz, Tutzing, Germany