[Taxacom] Molecular issues

John Grehan jgrehan at sciencebuff.org
Fri Nov 7 06:41:09 CST 2008

Alan meant to send the following to the list. I have inserted my
comments in bold. John Grehan

-----Original Message-----
From: Alan Forrest [mailto:aforrest at rjb.csic.es] 
Sent: Thursday, November 06, 2008 3:36 PM
To: John Grehan
Subject: Re: [Taxacom] molecular nonsense?


Thanks for your comments also. I am glad that there is some common
ground here - and I speak as someone who tries to use all forms of data
appropriate for answering specific questions. My comments were intended
in a general sense, as I have not read the papers on the human relatives

Regarding 'good quality' molecular data, there are clearly quality
control issues. Rigorous protocols are a start, but I am sure they are
not always adhered to: many sequences on GenBank are not what they
appear to be and there is no way to check them directly (which is why I
always try to include additional seqs to check against GenBank!). With
sequence data the scoring is quite objective - more so than with other
types of molecular markers - but of course the analyses are something
else. I find that people often use the analysis that tells the story
they want to tell, and I am aware of papers that tell one story when an
alternate analysis of the same data gives a different picture. When all
is said and done, peer review can only work on the data presented - and
often there is a lot more available that is never assessed.

I see these issues as complications that compound the underlying problem. If
there were not problems with the molecular data itself in terms of their
phylogenetic meaning (e.g. with respect to their cladistic or
non-cladistic qualities) one could argue that as long as one used
original data the result would be unproblematic. I am arguing that the
molecular sequence approach is fundamentally flawed by failing to meet
cladistic criteria. This is not to say that there may not be aspects of
molecular structure that are cladistically meaningful. 

Perhaps a little worrying is that, for young researchers like me,
cladistics vs phenetics is something we were taught about but didn't
really experience directly - and now everything is Bayesian. There is
perhaps a lack of
awareness of the full range of analyses available and how appropriate
they are.

I think it's not so much the range of analyses as having a
'clear' understanding of what is cladistic and what is not.

Regarding our work, we included seqs of over 80 taxa in 29 genera in the
Antirrhineae, and around 15 taxa outside the tribe. As well as
uncovering an unexpected relationship, we also detected mis-labelled
GenBank sequences supporting the traditional relationships, and we
re-sequenced and cloned to verify GenBank accessions. The
Gambelia-Galvezia relationship was supported by a chloroplast gene
published in 2000, but we re-sequenced that gene and another one and
found something very different. This also highlights the farcical
nature of single-gene or single-data analyses.

Which raises the question of how many genes one must analyze to be
confident of the result - if one had confidence in molecular similarity
as a direct phylogenetic measure in the first place. The problem in
human origins analysis is that people have studied quite a few genes and,
because they get the same answer, they think it must be right, even
though it clashes with robust morphological evidence.

The inheritance question is more straightforward to apply to
historically more recent analyses (for example in hybrid zones etc,
where complex genetic control of morphological traits can generate novel
forms that would simply be additive in comparative molecular studies).
At a deeper scale things can be a little more woolly in my opinion -
maybe as woolly as an orang.

Perhaps one lesson molecular systematists have not learned from their
morphological predecessors is that extensive sampling is always a good
thing. I hope that, having proceeded from a science based on morphology
to one where molecular studies, as you say, are quick and easy (lab
work, then press buttons on a software program) it is time to use all
data with no a priori assumptions. This has held back barcoding work
(which I consider to be a technical exercise which could have great
value) because people seem intent on fighting for corners that should
not exist.

Best wishes,


