[Taxacom] GenBank (was The economics of biodiversity database initiatives)

Martin Wiemers martin.wiemers at univie.ac.at
Mon Oct 28 09:33:55 CDT 2013

Hi Adam,

The "country" field in GenBank can actually also be used to provide more
accurate information:

From the GenBank Submissions Handbook:

*The /country modifier can also include province, state, region, oceans, 
or other locality names*. The name of the country (or ocean) must be 
provided first, followed by a colon (:) before the additional location 

For example:
/country=USA: Lancaster County, PA
/country=Canada: SW coast of Newfoundland
/country=USA: Syracuse State Park in upstate New York
/country=Atlantic Ocean: 24.5 miles east of Bermuda
/country=Pacific Ocean: Stubing Marine Station
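
To illustrate the convention quoted above, here is a minimal Python sketch that splits such a value at the first colon into the country (or ocean) and the optional locality. The function name split_country is hypothetical and not part of any GenBank tool; the sketch assumes the value has already been extracted from the record.

```python
def split_country(value):
    """Split a /country value into (country_or_ocean, locality).

    Assumes the convention quoted above: the country or ocean comes
    first, optionally followed by a colon and a free-text locality.
    The locality is None when no colon part is present.
    """
    country, _, locality = value.partition(":")
    locality = locality.strip()
    return country.strip(), (locality or None)


if __name__ == "__main__":
    for value in (
        "USA: Lancaster County, PA",
        "Atlantic Ocean: 24.5 miles east of Bermuda",
        "Russia",
    ):
        print(split_country(value))
```

Only the first colon is treated as the separator, so a locality that itself contains a colon stays intact.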

However, I agree that this is not a perfect solution.

GenBank now also includes a field for geographical coordinates (lat_lon) 
in the format "39.7 N 42.1 W".
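
As a rough sketch of how a value in that format could be turned into signed decimal degrees, assuming it always has the shape "<latitude> N|S <longitude> E|W" as in the example above (the function name parse_lat_lon and the regular expression are my own, not taken from GenBank):

```python
import re

# Assumed shape: "<decimal> N|S <decimal> E|W", e.g. "39.7 N 42.1 W".
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(value):
    """Return (latitude, longitude) in signed decimal degrees.

    South and West are returned as negative numbers. Raises ValueError
    if the string does not match the assumed shape or is out of range.
    """
    match = _LAT_LON.match(value)
    if match is None:
        raise ValueError(f"unrecognised lat_lon value: {value!r}")
    lat_text, ns, lon_text, ew = match.groups()
    lat = float(lat_text) * (1 if ns == "N" else -1)
    lon = float(lon_text) * (1 if ew == "E" else -1)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


if __name__ == "__main__":
    print(parse_lat_lon("39.7 N 42.1 W"))
```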

Although GenBank does not store voucher images, these can be stored in
linked databases such as MorphBank via "LinkOut to external resources",
see e.g. http://www.ncbi.nlm.nih.gov/nuccore/AY556892.1

Best wishes,

On 28.10.2013 15:10, Adam Cotton wrote:
> ----- Original Message -----
> From: "Roderic Page" <r.page at bio.gla.ac.uk>
> To: "Adam Cotton" <adamcot at cscoms.com>
> Cc: <taxacom at mailman.nhm.ku.edu>
> Sent: Monday, October 28, 2013 8:16 PM
> Subject: Re: [SPAM?] Re: [Taxacom] GenBank (was The economics of
> biodiversity database initiatives)
> Hi Adam,
> Locality data in GenBank is variable, but increasingly sequences (especially
> barcodes) are appearing with GPS-derived coordinates. Some sequences are
> linked to voucher specimens (admittedly a lot fewer than would be
> desirable), in other cases you can go to the publication that made the
> sequences available and get the data there (again, less than desirable, it
> would be nice to automate this).
> I disagree that
>> There is absolutely no way to verify that a sequence belongs to taxon A
>> rather than taxon B, other than the say-so of the researcher submitting
>> the
>> sequence to GenBank.
> One of the advantages of sequences is that we can build a tree and discover
> potentially misidentified sequences, indeed this is one of the quickest ways
> to discover potential problems. It won't always work, but it is more
> testable than a simple assertion that a sequence belongs to a given taxon.
> Regards
> Rod
> Rod,
> Sorry for the lack of clarity. Of course it is possible to spot erroneous
> sequences by comparing them with verified ones, and erroneous sequences
> often stand out like sore thumbs in a tree, so I should have said:
> "There is absolutely no way to verify that a sequence belongs to taxon A
> rather than taxon B from information on the GenBank webpage for that
> sequence."
> The point is there will always be errors, but providing useful information
> about the specimen from which the sequence originated would actually make
> the data much more useful to more people, and mean users would have to spend
> less time verifying the identity of the actual taxon that the sequence came
> from.
> Just as an example, assume a sequence is added with locality "Russia". That
> could be anywhere from Europe almost to Alaska, which is not very helpful.
> Of course including a GPS location is ideal, but even an approximate
> locality is more useful than just country.
> Adam.
> PS. I have had private communication with other researchers who agree that it
> would be much more useful if GenBank entries were inputted using end-taxon
> names for identification purposes. This also works the other way, for
> samples that could only be positively identified to genus or species group
> for whatever reason.

Dr. Martin Wiemers
Department of Community Ecology
Helmholtz Centre for Environmental Research - UFZ
Theodor-Lieser-Str. 4
06120 Halle
Tel. +49 345 558-5322
e-mail: martin.wiemers at ufz.de

Wielandstr. 8
06114 Halle
Tel. +49 345 27950187
Mobile +49 157 85401271
Fax +49 3212 6968883
e-mail: martin at wiemers1.de
