[Taxacom] FW: formation of zoological names with Mc, Mac, et
deepreef at bishopmuseum.org
Sat Sep 5 23:10:25 CDT 2009
Jim: The term "nameString" has evolved a bit over time. In the context of
LinneanCore (which was wrapped into TCS), it was intended to specifically
exclude authorship stuff.
More recently, in the context of discussions about GNA/GNI, it has come to
include the full string including name bits and authorship bits.
In either case, the reason for qualifying the word "name" with "string" was
to emphasize that it's a taxon name represented by nothing more than a text
string (in contrast to a taxon name represented as a data object unto
itself, with rich metadata). There are many, many, many data records that
represent taxon names as nothing more than a string of text characters. The
prime example would be uBio/NameBank; but also text strings harvested from
literature scanning/OCRing efforts such as BHL. In many ways, the DwC names
data would fall into this category.
To answer your question, I would say they exist in the sense that we need to
deal with them because of your option "a" (that is, many datasets use them
because they only have a single unparsed field for taxon name, and/or even a
parsed taxon name with no qualifying nomenclatural metadata other than
the name[+authorship] itself). They also exist as your "b" inside the GNI,
as a way of building bridges between datasets with fully parsed name
"objects" (e.g., from nomenclators) to those namestring-only datasets (via
"wizardry", aka clever parsers and text-matching services). So I guess the
answer is "c".
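To make the "text-matching" idea concrete, here is a deliberately crude sketch in Python of reducing a namestring to a comparison key, so that strings differing only in authorship land in the same bucket. It is not how GNI or any real parser works; the function name crude_match_key and the rule "keep the first token plus any following all-lowercase tokens" are assumptions made up for illustration. It will mis-handle lowercase author particles ("de", "van") and stops at rank markers such as "var.".

```python
import re


def crude_match_key(namestring: str) -> str:
    """Return the leading name elements of a namestring, dropping authorship.

    Hypothetical helper for illustration only: keeps the first token
    (assumed to be the uninomial or genus) and any immediately following
    tokens made up solely of lowercase letters and hyphens (assumed epithets).
    """
    # Treat parentheses as whitespace so "(Linnaeus, 1758)" splits cleanly.
    tokens = namestring.replace("(", " ").replace(")", " ").split()
    if not tokens:
        return ""
    key = [tokens[0].capitalize()]
    for tok in tokens[1:]:
        if re.fullmatch(r"[a-z][a-z-]*", tok):
            key.append(tok)
        else:
            # First token that does not look like an epithet: assume
            # authorship (or a rank marker) starts here and stop.
            break
    return " ".join(key)


variants = [
    "Homo sapiens Linnaeus, 1758",
    "Homo sapiens L.",
    "Homo sapiens",
]
keys = {crude_match_key(v) for v in variants}
```

With these three variants, keys ends up holding a single entry, which is the grouping behaviour a real matching service would need to achieve far more robustly.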
As for which elements should be concatenated, I think that's still an open
question. Certainly Uninomial/binomial/trinomial/etc. + author (botanical
style or zoological style) + year. Not sure about page numbers and "sec"
authorship (I tossed all my "sec"-formatted strings into GNI, just to see
what would happen).
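As a rough sketch of that concatenation (name elements + author + year), something like the following Python would do. The ParsedName record and its field names are hypothetical, not taken from TCS, DwC or GNI, and the question of page numbers and "sec" authorship is deliberately left out because it is still open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedName:
    """Hypothetical atomized name record; field names are illustrative only."""
    elements: List[str]             # uninomial, or genus + epithet(s)
    author: Optional[str] = None
    year: Optional[str] = None
    author_in_parens: bool = False  # e.g. zoological changed combination


def build_namestring(name: ParsedName) -> str:
    """Concatenate name elements, then author and year when present."""
    parts = [" ".join(name.elements)]
    authorship = name.author or ""
    # A year is only appended when there is an author to attach it to.
    if authorship and name.year:
        authorship = f"{authorship}, {name.year}"
    if authorship and name.author_in_parens:
        authorship = f"({authorship})"
    if authorship:
        parts.append(authorship)
    return " ".join(parts)


example = build_namestring(
    ParsedName(["Homo", "sapiens"], author="Linnaeus", year="1758")
)
```

Here example is "Homo sapiens Linnaeus, 1758". Botanical-style authorship would simply pass the abbreviated author string and omit the year.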
There's some good information on the GNI website (www.globalnames.org --
check the links to "Help" and "API" in the upper right), and there is more
information coming soon about GNA/GNI/GNUB.
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Jim Croft
> Sent: Saturday, September 05, 2009 7:19 AM
> To: David Remsen (GBIF)
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] FW: formation of zoological names with
> Mc, Mac, et
> A quick point of clarification... These days many (most?)
> databases enable/require atomized entry of name data and name
> Is the design intent of the 'namestring' thingy to: a) act as
> a placeholder for an unparsed string until someone can get
> around to doing something with it; b) act as a repository
> for a concatenation of atomized elements in anticipation of
> some mysterious operational wizardry; or c) both?
> and, if b), what elements could or should be concatenated?
> just wondering is all....
> On Thu, Sep 3, 2009 at 7:07 PM, David Remsen
> (GBIF)<dremsen at gbif.org> wrote:
> > Just a couple of points regarding this thread.
> > We use the term 'namestring' when referring to the literal
> > of a name as it has been used in a particular instance because the
> > same name generally has many distinct orthographies. A name may or
> > may not include authorship. Authorship may or may not include a
> > publication year. Authors are abbreviated, etc. Processing
> > namestrings is necessary in this digital age and the reality of this
> > wide latitude in orthography presents special difficulties in
> > effectively grouping the right sets of namestrings together and
> > excluding the wrong ones. This is less an issue when working with
> > single datasets but becomes significant when integrating data from
> > many sources. It's amazing how many different ways a name can be
> > written and all are essentially correct, just more or less complete.
> > Regarding name atomisation, we are quite close to being able to
> > effectively atomise 99.99% of all namestrings into distinct and
> > identified components so that the use of a parsed or unparsed
> > scientific name in data management practices will be less an issue
> > than it might be now. The development of name parsing tools and
> > services is quite active at the moment and is currently testing
> > against rather esoteric orthographies. In fact, we are interested in
> > finding cases that either break the parsers or are simply too
> > ambiguous to effectively parse.
> > A reference implementation can be tried at
> > http://globalnames.org/parsers/new
> > Cheers,
> > David Remsen
> > ------ David Remsen, Senior Programme Officer Electronic Catalog of
> > Names of Known Organisms Global Biodiversity Information Facility
> > Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark
> > Tel: +45-35321472 Fax: +45-35321480
> > Mobile +45 27201472
> > Skype: dremsen
> > ------
> Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
> ... in pursuit of the meaning of leaf ...
> ... 'All is leaf' ('Alles ist Blatt') - Goethe