[Taxacom] inapplicability of mtDNA barcoding to insects

Amanda Roe roexx068 at umn.edu
Mon Oct 1 09:56:10 CDT 2007

As an extension to Karl's concerns, I would like to point out that
the barcoding literature did not explicitly discuss why the first
600 bp of COI were used rather than any of the other regions in the
1500 bp gene.  I recently examined this question in a paper published
in Molecular Phylogenetics and Evolution (Roe and Sperling 2007,
Patterns of evolution of mitochondrial cytochrome c oxidase I and II
DNA and implications for DNA barcoding, 44: 325-345).  I examined the
choice of the barcode region relative to similar sized fragments in
COI and COII, as well as smaller and larger fragments.

The take-home message was that variation within the gene appears to
be concentrated into peaks of variability, although these peaks were
not considered different from random variation.  This suggests that
longer fragments are going to be much better than shorter ones at
capturing this variation.  Restricting sequencing to 100 bp, or even
600 bp, is likely to miss regions of variation, and therefore
misrepresent species boundaries, particularly in closely related,
recently diverged taxa.  It is also important to note that our
current sequencing capabilities allow us to obtain 800-900 bp
of data in a single pass, so restricting to 600 bps seems very  

As always, more data is better than less.
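
A toy illustration of the point about fragment length, not taken from the paper: the sketch below counts variable sites in every contiguous window of an alignment, so one can see how many of them a short fragment captures depending on where it falls. The function names (variable_sites, window_counts) and the two-sequence alignment are hypothetical and made up for the example; real data would be aligned COI sequences.

```python
def variable_sites(alignment):
    """Return the 0-based column indices where the aligned sequences differ."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return [i for i in range(length) if len({seq[i] for seq in alignment}) > 1]


def window_counts(alignment, window):
    """Count variable sites in every contiguous window of the given width."""
    sites = set(variable_sites(alignment))
    length = len(alignment[0])
    return [
        sum(1 for i in range(start, start + window) if i in sites)
        for start in range(length - window + 1)
    ]


# Hypothetical toy alignment (20 columns), with variation clustered near one end.
toy = [
    "AAAAAAAAAAAAAAAAAAAA",
    "AAAAAAAAAAAAGTAAAACA",
]

short = window_counts(toy, 5)   # every possible 5-column fragment
full = window_counts(toy, 20)   # the whole alignment as a single fragment

print("variable columns:", variable_sites(toy))
print("5-column fragments capture between", min(short), "and", max(short), "sites")
print("full-length fragment captures", full[0], "sites")
```

In this made-up case some short windows contain no variable sites at all, while the full-length fragment necessarily contains all of them; how strongly this applies to real COI data depends on the actual distribution of variation.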


Amanda Roe
Postdoctoral Researcher
1980 Folwell Ave,
Rm 219 Hodson Hall
Dept. Entomology
University of Minnesota
St Paul MN 55108
(office) 612-624-1710
roexx068 at umn.edu

On Sep 30, 2007, at 12:17 AM, Karl Magnacca wrote:

> On Fri, September 28, 2007 10:42 am, Doug Yanega wrote:
>>> Why imply through nay-saying that barcoding should be
>>> dismissed outright because it is not 100% accurate 100% of the time?
>> Because if we rely exclusively on COI to make species IDs, and are
>> sinking huge amounts of funding into it (whether or not it is to the
>> exclusion of morphological taxonomy), then ~70% accuracy is certainly
>> not much of a selling point for that MASSIVE investment.
> That depends on how the 70% is distributed.  If it works on 70% of
> families, including some where identification is currently difficult,
> then that's useful, since you can define where it will work and where
> it won't.  If it works on an average of 70% of the species in any
> given family or genus, then that's not very useful.
>> The failure of COI barcoding *is* a problem if COI is being treated
>> as the only viable source of taxonomic data, and - if what Karl says
>> is true - there are apparently a number of barcoders who are NOT
>> using COI as an adjunct to traditional taxonomy, but as a replacement
>> for it (or treating taxonomy as an adjunct to COI).
> My point was more that there are people who have jumped on the
> bandwagon and gotten excited about finding things that *might* be
> cryptic species, without really doing the grunt work biology to
> determine if they really *are* - in the process, overlooking
> alternative explanations as to why the morphology and genes conflict.
> There are also places where having a greater knowledge of other
> methods would vastly improve the usefulness of even the current
> barcoding methods.  For example, there were several talks at the
> conference where people were only able to obtain short (100-200 bp)
> sequences from degraded specimens.  Unsurprisingly, these often failed
> to reliably separate species using the standard neighbor-joining
> method.  Yet I don't recall any of them trying parsimony or other
> phylogenetic algorithms, which would give much better results when
> fewer variable sites are available or sequences only partially
> overlap.  As with using COI as opposed to other sites (see below), NJ
> is useful because it's fast and the phylogeny is unimportant, but that
> doesn't mean you stick with it when you run up against its weaknesses.
>> At least several
>> MAJOR papers on the topic I have read concluded that the
>> discrepancies between what the taxonomists called species versus what
>> COI grouped as species meant that the taxonomy was wrong (i.e., there
>> were cryptic species the morphological taxonomists had completely
>> overlooked).
> Again, this is part of what I see as the problem of overpublicity
> causing problems down the road.  Finding populations with widely
> divergent sequences indicates places where there might be cryptic
> species, but you need to investigate further to see what else
> correlates with it before you can make the claim formally (AFAIK, no
> one has actually published new species based solely on barcodes).
> Incidentally, the existence of cryptic species illustrates an area of
> failure for traditional taxonomy (often in the assumption that minute
> morphological differences represent intraspecies variation), but
> certainly nobody's giving up on that yet.
>> It leaves us with a dilemma as far as any debate here: if everyone
>> here claims to be a "good" barcoder, for whom COI is just one of an
>> array of character sets used to determine species, but if those folks
>> acknowledge that there are "bad" barcoders that believe in using (or
>> trusting) COI alone, then we may actually be condemning a common
>> enemy, and should be acting as allies rather than adversaries.
>> So, let's sit back one second, take a breather, and ask two questions:
>> (1) Can we agree upon a definition of "barcoding"? As far as I knew,
>> it meant using COI, and ONLY COI in order to make species assignments.
> It means doing that *once you have an established database of
> sequences where it's known to work*.  So far that exists for a
> relatively small number of taxa.  If it doesn't work, then other data
> sets need to be investigated, including other genes.  But (despite my
> concerns in the previous message) I think it is a good idea to have a
> consistent default data set that can be used as widely as any one data
> set can.
>> (2) Can we agree that there ARE researchers who use COI only, or give
>> COI results *greater weight* than taxonomy?
> Yes, but those people seem to be mainly making bold pronouncements
> about the existence of thousands of cryptic species rather than
> actually describing species based on COI.  Perhaps I'm being too
> trusting in the scientific process, but I think that as it evolves and
> we understand more about how well and where it works and fails, those
> people will be weeded out.  As Rich Pyle pointed out, even some of the
> people who made those early bold pronouncements have realized that
> it's only a piece in the puzzle.
>> If so, can we then treat the debate here as a matter of whether THAT
>> approach is worthy of support, or should be put aside in favor of
>> genuinely *integrated* studies?
> This seems to be a somewhat simplistic way of framing it.  I think
> it's worth investigating whether COI works, but that absolutely
> doesn't mean we should close ourselves off to other alternatives.  And
> part of that is integrating morphology and multiple DNA data sets to
> make sure we've delineated species boundaries correctly, at least to
> the extent that it's possible (that of course gets into a whole other
> issue).
> Karl
> =====================
> Karl Magnacca, UC-Berkeley
> ESPM Dept., 137 Mulford Hall #3114
> 510-642-4148
> http://nature.berkeley.edu/~magnacca
> _______________________________________________
> Taxacom mailing list
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
