[Taxacom] GenBank (was The economics of biodiversity database initiatives)

Adam Cotton adamcot at cscoms.com
Mon Oct 28 09:10:32 CDT 2013

----- Original Message ----- 
From: "Roderic Page" <r.page at bio.gla.ac.uk>
To: "Adam Cotton" <adamcot at cscoms.com>
Cc: <taxacom at mailman.nhm.ku.edu>
Sent: Monday, October 28, 2013 8:16 PM
Subject: Re: [SPAM?] Re: [Taxacom] GenBank (was The economics of 
biodiversity database initiatives)

Hi Adam,

Locality data in GenBank is variable, but increasingly sequences (especially 
barcodes) are appearing with GPS-derived coordinates. Some sequences are 
linked to voucher specimens (admittedly a lot fewer than would be 
desirable), in other cases you can go to the publication that made the 
sequences available and get the data there (again, less than desirable, it 
would be nice to automate this).

I disagree that

> There is absolutely no way to verify that a sequence belongs to taxon A
> rather than taxon B, other than the say-so of the researcher submitting 
> the
> sequence to GenBank.

One of the advantages of sequences is that we can build a tree and discover 
potentially misidentified sequences, indeed this is one of the quickest ways 
to discover potential problems. It won't always work, but it is more 
testable than a simple assertion that a sequence belongs to a given taxon.
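
As a rough illustration of that kind of check, here is a minimal sketch in Python. It does not build a real tree; it swaps in a simpler stand-in, comparing each sequence with its nearest neighbour by uncorrected p-distance and flagging it when the neighbour carries a different name. The record ids, the "taxon A"/"taxon B" labels and the short aligned sequences are all made up for the example, and the function names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def flag_suspects(records):
    """Return ids whose nearest neighbour carries a different taxon label.

    records: dict mapping id -> (taxon_label, aligned_sequence)
    """
    suspects = []
    for rid, (taxon, seq) in records.items():
        # Distance from this sequence to every other one, with the other's label.
        others = [
            (p_distance(seq, other_seq), other_taxon)
            for oid, (other_taxon, other_seq) in records.items()
            if oid != rid
        ]
        _, nearest_taxon = min(others)
        if nearest_taxon != taxon:
            suspects.append(rid)
    return suspects


# Invented toy data: X is labelled "taxon B" but sits with the "taxon A" sequences.
records = {
    "A1": ("taxon A", "ACGTACGTAC"),
    "A2": ("taxon A", "ACGTACGTAT"),
    "B1": ("taxon B", "TTGTCCGGAC"),
    "B2": ("taxon B", "TTGTCCGGAT"),
    "X": ("taxon B", "ACGTACGAAA"),
}

print(flag_suspects(records))  # ['X']
```

A flag only says that the label and the sequence similarity disagree. It does not say which record is wrong, so the voucher or the publication still has to be checked.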




Sorry for the lack of clarity. Of course it is possible to spot erroneous 
sequences by comparing them with verified ones, and erroneous sequences 
often stand out like sore thumbs in a tree, so I should have said:

"There is absolutely no way to verify that a sequence belongs to taxon A 
rather than taxon B from information on the GenBank webpage for that 
sequence."

The point is there will always be errors, but providing useful information 
about the specimen from which the sequence originated would actually make 
the data much more useful to more people, and mean users would have to spend 
less time verifying the identity of the actual taxon that the sequence came 
from.

Just as an example, assume a sequence is added with locality "Russia". That 
could be anywhere from Europe almost to Alaska, which is not very helpful. 
Of course including a GPS location is ideal, but even an approximate 
locality is more useful than just country.
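
To make the difference concrete, here is a small Python sketch that grades how precise a record's locality is. It assumes the locality has already been pulled out of the record into a plain dict with "country" and "lat_lon" keys, loosely modelled on the GenBank source qualifiers of the same names, and that anything below country level follows a colon in the country string. The function name, the grade labels and the example values are my own invention for illustration.

```python
def locality_precision(qualifiers):
    """Grade the locality of one record as coordinates, sub-country, country-only or none.

    qualifiers: dict that may contain "lat_lon" and "country" strings
    (an assumed, simplified stand-in for a parsed GenBank source feature).
    """
    if qualifiers.get("lat_lon"):
        return "coordinates"
    country = qualifiers.get("country") or ""
    # Assumed convention: "Country: region, locality"
    name, _, detail = country.partition(":")
    if detail.strip():
        return "sub-country"
    if name.strip():
        return "country-only"
    return "none"


examples = [
    {"country": "Russia"},
    {"country": "Russia: Primorsky Krai"},
    {"country": "Russia", "lat_lon": "43.12 N 131.89 E"},
    {},
]
for q in examples:
    print(q, "->", locality_precision(q))
```

Run over a batch of records, a grading like this would show at a glance how many entries give nothing more than "Russia".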


PS. I have had private communication with other researchers who agree that it 
would be much more useful if GenBank entries were inputted using end-taxon 
names for identification purposes. This also works the other way, for 
samples that could only be positively identified to genus or species group 
for whatever reason. 
