More on the 'cladistics' of sequences

B.J.Tindall bti at DSMZ.DE
Fri Jun 11 10:23:33 CDT 2004

I have to agree with Dick 100% here. I think that there is far too much
confusion surrounding some of the terms we are using:
a) when Cain & Harrison coined the term phenetic they explicitly stated that
it was based on "overall similarity" and that the data set on which it was
based may include both phenotypic and genotypic data (the latter when it
becomes available). In their original paper, back at the beginning of the
1960s, they also link phenetic principles with "natural classifications" in
the sense of Gilmour (see Huxley, The New Systematics). They contrast
phenetic with phyletic and patristic relationships (remember that
"cladistics" wasn't referred to as such then!!)
b) a couple of years later Harrison clearly stated that it was a mistake to
coin the term phenetic, because people were mistakenly reading
"phenotypic = phenetic" by analogy with "genotypic = genetic"
c) it is interesting to note that Hennig used the term "phylogenetic
systematics", and this has become equivalent to "cladistics". However, if I
recall correctly, the term "clade" was coined by Huxley (in about 1959), who
made no reference to methodology and simply defined a clade as a
"monophyletic group". In some of the literature I read I am seeing the term
"monophyletic clade" (a tautology, by that definition) with increasing
frequency!!
d) "numerical taxonomy" is often equated with phenetic approaches, but the
emphasis on phenetic principles (natural classifications in the sense of
Gilmour) is largely associated with the viewpoint of Sokal and Sneath.
Strangely, it also escapes the attention of most people that Sokal and
Sneath deal with both "phenetic principles" and "cladistics" in their book
"Numerical Taxonomy"; in other words, numerical taxonomy is simply the
application of computer-based methods to the evaluation of data. We can of
course extend this to concepts such as "information content", and I think
the counter-argument is that the principle of "information content" can
also be applied to cladistic methods.

So what are we left with? As I pointed out in a recent article, it all boils
down to how you handle the data set. You can take phenotypic data
(morphology, for example) and either try to evaluate the significance of
each individual character (character analysis) or generate an overall
similarity matrix. You can, of course, do exactly the same with gene or
protein sequence data. At no point do the data themselves magically become
"phenetic" or "cladistic". In "character analysis" (I suppose what most of
you on this list would call "cladistics") you end up with a cladogram
showing branching order, but no branch lengths. "Overall similarity"
measures give you both branching order and branch lengths. The third
alternative is to use methods that analyse the individual characters while
at the same time taking into account the number of characters that
contribute to each branch. In the case of morphological data it is unclear
how you would measure that "degree of change", but when applied to gene or
protein sequences the degree of change may (and I emphasise MAY!!) be
linked to mutation in the gene sequence over geological time. A number of
articles now show that some of the simple assumptions about sequence change
over geological time may not hold true.
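To make the point concrete, here is a minimal Python sketch, using a toy
alignment invented purely for illustration, in which the same aligned
sequences are handled both ways: collapsed into a pairwise p-distance
("overall similarity") matrix, or analysed character by character. For the
character route I have assumed Fitch parsimony on a fixed tree as one
example of character analysis; the data, function names and that choice of
method are all assumptions, not anything prescribed above.

```python
from itertools import combinations

# Hypothetical toy alignment: four taxa, eight aligned sites (illustrative only).
ALIGNMENT = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}


def p_distance(s1, s2):
    """Overall dissimilarity: proportion of aligned sites that differ."""
    diffs = sum(1 for a, b in zip(s1, s2) if a != b)
    return diffs / len(s1)


def distance_matrix(aln):
    """Route 1: collapse all characters into one number per pair of taxa."""
    return {(x, y): p_distance(aln[x], aln[y])
            for x, y in combinations(sorted(aln), 2)}


def fitch_site(tree, aln, site):
    """Route 2: Fitch parsimony for ONE character on a fixed binary tree.

    tree is a nested tuple of taxon names, e.g. (("A", "B"), ("C", "D")).
    Returns (candidate state set at this node, changes required below it).
    """
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {aln[tree][site]}, 0
    left_set, left_cost = fitch_site(tree[0], aln, site)
    right_set, right_cost = fitch_site(tree[1], aln, site)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed here
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, aln):
    """Sum the per-character Fitch costs over every site."""
    n_sites = len(next(iter(aln.values())))
    return sum(fitch_site(tree, aln, i)[1] for i in range(n_sites))


if __name__ == "__main__":
    print(distance_matrix(ALIGNMENT))
    print(parsimony_length((("A", "B"), ("C", "D")), ALIGNMENT))
    print(parsimony_length((("A", "C"), ("B", "D")), ALIGNMENT))
```

The data never change between the two routes. The distance route yields one
number per pair of taxa, whereas the character route yields a count of
changes per candidate tree (4 for ((A,B),(C,D)) versus 6 for ((A,C),(B,D))
with this toy alignment).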

The next "problem" would seem to be the "phylogenetic" approach, and here I
am never sure exactly what people mean. Depending on the area one is working
in, this may mean dendrograms generated by Neighbor-Joining (which, with its
similarity matrix based on the "overall similarity" of the gene or protein
sequences, looks very "phenetic" to me, and I don't mean that as a negative
evaluation) or it may mean maximum parsimony, maximum likelihood, or even
Bayesian methods. The general trend I see at present is that if it's a gene
or protein sequence, then it's "phylogenetic", irrespective of what you do
to the data. The strange thing is that it usually escapes people's attention
that under certain circumstances a "phenetic" interpretation may also
reflect the true phylogeny, that is, in the absence of effects such as
parallelism, convergent evolution, gene transfer, etc. Strangely, EXACTLY
the same problems apply to ANY other form of evaluation where you have to
work out the significance of the individual characters. If you look at what
is happening you will see that it is very easy to go around in circles,
something which I think Don Colless alluded to in an e-mail some time ago.
As a matter of interest, many automatic sequence alignment programmes make
use of phenetic principles to generate the alignment!!!!
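For anyone less familiar with how Neighbor-Joining works, here is a minimal
Python sketch of the standard Saitou and Nei procedure. What it is meant to
show is that, once the pairwise distance matrix has been built, the
individual characters play no further part: every join is decided from the
matrix alone. The function name, the edge-list output, the internal node
labels n1, n2, ... and the toy matrix are illustrative assumptions, not
taken from any particular program.

```python
from itertools import combinations

# Hypothetical toy distances between four taxa (illustrative only).
TOY_DISTANCES = {
    frozenset(("A", "B")): 0.125, frozenset(("A", "C")): 0.375,
    frozenset(("A", "D")): 0.5,   frozenset(("B", "C")): 0.25,
    frozenset(("B", "D")): 0.375, frozenset(("C", "D")): 0.125,
}


def neighbor_joining(dist):
    """Neighbor-joining on a symmetric distance matrix.

    dist maps frozenset({x, y}) -> distance for every pair of taxa.
    Returns a list of (node, node, branch_length) edges of an unrooted tree;
    internal nodes get hypothetical labels "n1", "n2", ...
    """
    d = dict(dist)  # work on a copy; distances to new nodes are added below
    nodes = sorted({taxon for pair in d for taxon in pair})
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i)
             for i in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]
        # Branch lengths from a and b to their new common node u.
        la = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dab - la
        counter += 1
        u = f"n{counter}"
        edges += [(a, u, la), (b, u, lb)]
        # Distance from u to every remaining node.
        rest = [k for k in nodes if k not in (a, b)]
        for k in rest:
            d[frozenset((u, k))] = (d[frozenset((a, k))]
                                    + d[frozenset((b, k))] - dab) / 2
        nodes = rest + [u]
    # Connect the last two nodes directly.
    x, y = nodes
    edges.append((x, y, d[frozenset((x, y))]))
    return edges


if __name__ == "__main__":
    for edge in neighbor_joining(TOY_DISTANCES):
        print(edge)
```

With this toy matrix, which happens to fit a tree exactly, the sketch joins
A with B, then C with D, with an internal branch of 0.25 between the two
pairs. Real distance matrices rarely fit any tree perfectly.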

Some time ago we had a meeting to which I invited Peter Sneath and John
McNeill, both of whom were active in the early 1960s. Both expressed a
feeling of déjà vu when looking at some of the current debate. If you go
back 20-40 years and look at some of the discussion then, you will see
similar trends. The phenetic viewpoint was often that if you had enough
characters you would get "natural relationships"; this has now changed to
"when we have access to whole genomes (i.e. the total information) we will
get the natural relationships". However, I am not sure what one means by
"natural" (which has taken on three different meanings over the last 2000
years...). One should also consider that this is another aspect of
"information content". In the case of microbiology, access to increasing
numbers of genomes hasn't provided us with an "absolute" answer, at least
not yet. However, it HAS significantly expanded our appreciation of the
complexity of the problem.

These words are written without the intent of causing offense or advocating
one method over another. I simply think that if we do not get our
definitions and concepts right, we are back to the Tower of Babel. Anyone
got a Babel Fish!?

At 13:30 10.6.2004 -0500, Richard Jensen wrote:
>John Grehan wrote:
>>I've been to cladistic courses so I've been exposed to the principles.
>>However, such courses focused on the algorithms, not the biological (rather
>>than logical) issue of whether a posteriori rooting of phenetic characters
>>is really cladistics or just a version of phenetics dressed up in cladistic
>You're still ignoring what has been explained here before.  Characters, in
>and of themselves, are not phenetic or cladistic.  It's the method of
>analysis of the data matrix that determines whether the result may be
>described as phenetic or cladistic.
>Richard J. Jensen              | tel: 574-284-4674
>Department of Biology      | fax: 574-284-4716
>Saint Mary's College         | e-mail: rjensen at
>Notre Dame, IN 46556    |

* Dr.B.J.Tindall      E-MAIL bti at                           *
* DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH *
* Mascheroder Weg 1b, D-38124 Braunschweig, Germany                *
* Tel.: ++ 531 2616 0 (general)                                    *
* Tel.: ++ 531 2616 224 (direct)                                   *
* Fax:  ++ 531 2616 418                                            *
*                                                                  *
* Homepage:                          *
* E-MAIL: contact at (general enquiries)                      *
*         sales at (sales)                                    *

More information about the Taxacom mailing list