dipteryx at freeler.nl
Thu Sep 17 03:28:04 CDT 2009
From: David Remsen (GBIF) [mailto:dremsen at gbif.org]
Sent: Wed 16-9-2009 21:52
> You might try [...]
FWIW I would like to point out that the example at the top has
the basionymAuthor and the combinationAuthor interchanged;
the same example in the code below that is correct in that respect
(although iffy in a different respect).
* * *
> You said there are likely 2-3 million names, at most. In what
> sense of the word "name" since I get different answers from
> botanists than zoologists as to what they mean and it affects
> the cardinality of the estimate.
I am using the term "scientific name" in what looks to me to be the
accepted way (as indicated at http://www.globalnames.org/about);
that is, taking "biological" in a fairly strict sense
(excluding many formalized ways to indicate organisms). This
looks to me to exclude names of viruses (which follow a different
In the ICBN the definition is in Art. 6.3.:
"In this Code, unless otherwise indicated, the word "name" means
a name that has been validly published, whether it is legitimate
or illegitimate (see Art. 12)." and in Art. 12.1:
"12.1. A name of a taxon has no status under this Code unless it
is validly published [...]."
Obviously, this presents a problem to such projects as GNI in
that strings like 'Faba faba' are not validly published, nor are
many 'manuscript names' scattered through the literature, although
by form they are indistinguishable from actual scientific names.
Leaving this aside, authorship is emphatically not part of the
scientific name. This is of more importance for botanical names
than for zoological names as a zoological name has only one kind
of author (for a zoological name variations in author
representation will stop at something between half a dozen and
a dozen?), while the authorship in a botanical name can include
up to five kinds of "authors" (authors in the sense of the ICBN).
If one goes by the recommended form of at most two authors per
kind-of-author that leads to a maximum of ten authors per name.
Most author-names have two fairly commonly used forms (some have
more), which means that without anything out of the ordinary quite
a few different representations of authorship will be possible.
This is per author attribution; with new research or a change
in the Rules the attributed authorship itself may well change (with
the publication of the 2006 Vienna Code a number of family names
instantly became attributed to different authors). All in all,
there are many possible ways to represent the authorship of
one particular scientific name, without any change to the
scientific name itself, or to what it applies to.
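
As a rough illustration of the arithmetic above, the sketch below counts
the authorship representations for a single attribution. The figures
(five kinds of author, two authors per kind, two common forms per
author name) come from the text; the function name is hypothetical, and
treating each author's form as an independent choice is a simplifying
assumption.

```python
def authorship_variants(kinds_of_author: int,
                        authors_per_kind: int,
                        forms_per_author: int) -> int:
    """Count the ways to write the authorship of one name.

    Assumption: every author's form is chosen independently of the others.
    """
    authors = kinds_of_author * authors_per_kind
    return forms_per_author ** authors


# Botanical name at the recommended maximum: 5 kinds x 2 authors = 10 authors,
# each with 2 commonly used forms.
print(authorship_variants(5, 2, 2))  # 1024

# A plain new combination: basionym author plus combining author, one each.
print(authorship_variants(2, 1, 2))  # 4
```

Even before punctuation, spacing or a changed attribution is taken into
account, one name can thus appear under a large number of strings.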
If more of the literature were to be scanned and processed the
number of 18 million text strings could be expanded enormously,
without this adding one itty bit of information, or adding one
single scientific name.
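
A minimal sketch of that point: several name strings that differ only
in how the authorship is written collapse to a single scientific name.
The strings are illustrative, and the rule used (keep the first two
words of a binomial) is a deliberately naive assumption, not a
description of how the GNI parser works.

```python
from collections import defaultdict


def scientific_name(name_string: str) -> str:
    """Naive assumption: a binomial is the first two whitespace-separated words."""
    return " ".join(name_string.split()[:2])


# Illustrative strings: one name, four ways of writing the authorship.
name_strings = [
    "Bellis perennis L.",
    "Bellis perennis Linn.",
    "Bellis perennis Linnaeus",
    "Bellis perennis Linnaeus, 1753",
]

by_name = defaultdict(list)
for s in name_strings:
    by_name[scientific_name(s)].append(s)

print(len(name_strings), "strings,", len(by_name), "scientific name")
# 4 strings, 1 scientific name
```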
* * *
> As to the interface and ordering of the GNI index I am not
> sure how they should be organised and I suppose it's based on
> what the index is for. I want to use it to access links to things
> that I want dynamically updated.
To me it looks as if the interface should be by scientific name
(which after all is what it claims to be indexing). The entries
will need to be 'disambiguated' anyway (for homonyms), no matter
* * *
> I want the orthographic matching to ensure that a link is made
> regardless of the specific orthography.
That would be nice, but likely will take quite a bit of further
I guess the parser algorithms work as well as could be expected
(the horrendous output of the "genus epitheton" aside), and that
many of the silly errors are in the underlying databases (see the
amusing Bos taurus Skotsk hoejlandskvaeg caused by