[Taxacom] Why character-tracking doesn't happen?

Mario Blanco mblanco at flmnh.ufl.edu
Sat Sep 13 19:52:46 CDT 2008

Bob Mesibov wrote:
"I used to think that each position in a nucleic acid sequence was a
character. In fact, this is not how molecular phylogeny programs work. A
multiple sequence alignment program first does its best to find matching
patterns around indels. The five character states then analysed (A,T,G,C
and missing) are states of an artificially created character - a
vertical column in a multiple sequence alignment. This 'column'
character varies from analysis to analysis, depending on which sequences
are used. How can you test the independence of such things? Aren't all
the columns covering an indel really a single entity, so that the A,T,G
and C's in the 'non-indel' sequences at this point are non-independent?"

This is analogous to the situation in which you score morphological data 
(e.g., color of petals), and then you have a species with no petals.
For this species, the character state here becomes "missing", and you 
effectively create a gap in the matrix (and indels are simply gaps in an 
aligned matrix).  The use of gaps as characters has been discussed at 
length [for example, by Simmons & Ochoterena (2000) Gaps as characters 
in sequence-based phylogenetic analyses. Systematic Biology 49: 369-381].
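One of the schemes Simmons & Ochoterena describe is "simple indel coding": gap positions are treated as missing data in the nucleotide matrix, and each gap with a distinct pair of start/end termini is scored as one extra presence/absence character, however many columns it spans. That bears directly on the "single entity" concern above. Below is a minimal Python sketch of the idea, not their reference implementation. It assumes an already-aligned set of equal-length sequences with "-" as the gap symbol. The function names are hypothetical. It also simplifies their rules: a taxon whose own longer gap completely covers a given indel is scored "?", and any other overlap is scored "0".

```python
from typing import Dict, List, Tuple

GAP = "-"  # assumed gap symbol in the aligned sequences


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) column indices (inclusive) of each maximal run of gaps."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(seq):
        if ch == GAP and start is None:
            start = i                      # a gap run begins
        elif ch != GAP and start is not None:
            runs.append((start, i - 1))    # the run ended on the previous column
            start = None
    if start is not None:                  # run reaches the end of the alignment
        runs.append((start, len(seq) - 1))
    return runs


def simple_indel_coding(alignment: Dict[str, str]):
    """Hypothetical helper: gaps become '?' in the nucleotide matrix, and each
    distinct (start, end) gap becomes one binary indel character."""
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    # One character per distinct pair of gap termini, in column order.
    indels = sorted({r for rs in runs.values() for r in rs})
    indel_matrix: Dict[str, str] = {}
    for name, rs in runs.items():
        states = []
        for s, e in indels:
            if (s, e) in rs:
                states.append("1")         # taxon has exactly this gap
            elif any(rs_s <= s and e <= rs_e for rs_s, rs_e in rs):
                states.append("?")         # a different, longer gap covers it
            else:
                states.append("0")         # gap absent
        indel_matrix[name] = "".join(states)
    nucleotides = {name: seq.replace(GAP, "?") for name, seq in alignment.items()}
    return nucleotides, indels, indel_matrix


if __name__ == "__main__":
    aln = {
        "taxonA": "ACGT--ACGT",
        "taxonB": "ACGTTTACGT",
        "taxonC": "AC------GT",
        "taxonD": "ACGT-TACGT",
    }
    nuc, indels, ind = simple_indel_coding(aln)
    print(indels)          # [(2, 7), (4, 4), (4, 5)]
    for name in aln:
        print(name, nuc[name], ind[name])
    # taxonA ACGT??ACGT 0?1
    # taxonB ACGTTTACGT 000
    # taxonC AC??????GT 1??
    # taxonD ACGT?TACGT 010
```

Here the six-column gap in taxonC contributes a single indel character instead of six columns carrying the same event. Whether partially overlapping gaps should be scored "0" or "?" is one of the judgment calls the gap-coding literature argues about.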

Now, the secondary structure of both coding and non-coding DNA (and also 
that of transcripts and protein sequences) probably renders many 
adjacent nucleic acid positions non-independent, and this is a whole 
different can of worms...

Taxacom mailing list
Taxacom at mailman.nhm.ku.edu
