# [Taxacom] cladistics (was: clique analysis in textbooks)

Herbert Jacobson jakejudy at hotmail.com
Mon Aug 22 21:24:42 CDT 2011

I don't think clustering is "...grouping by a data matrix." Quite the opposite: it is grouping by the "coefficient matrix", which is the result of some sort of data matrix manipulation.

Herb
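
A minimal sketch of the two-step distinction drawn above. It assumes a small binary character matrix, the simple matching coefficient as the resemblance measure, and UPGMA (average linkage) as the clustering method. The taxa, characters and function names are hypothetical. The point it illustrates is that the clustering step receives only the coefficient matrix, never the data matrix itself.

```python
from itertools import combinations

# Hypothetical data matrix: rows are taxa, columns are binary characters.
DATA = {
    "A": [0, 0, 1, 1, 0],
    "B": [0, 0, 1, 1, 1],
    "C": [1, 1, 0, 0, 1],
    "D": [1, 1, 0, 1, 1],
}


def coefficient_matrix(data):
    """Data-matrix manipulation: pairwise simple-matching distances between taxa."""
    dist = {}
    for a, b in combinations(sorted(data), 2):
        mismatches = sum(x != y for x, y in zip(data[a], data[b]))
        dist[(a, b)] = dist[(b, a)] = mismatches / len(data[a])
    return dist


def upgma(taxa, dist):
    """Average-linkage (UPGMA) clustering that reads only the coefficient matrix."""
    clusters = {t: frozenset([t]) for t in taxa}  # label -> member taxa

    def average(c1, c2):
        m1, m2 = clusters[c1], clusters[c2]
        return sum(dist[(x, y)] for x in m1 for y in m2) / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Join the closest pair of clusters; ties go to the first pair found.
        c1, c2 = min(combinations(sorted(clusters, key=str), 2),
                     key=lambda pair: average(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) | clusters.pop(c2)
    return next(iter(clusters))  # nested tuple describing the dendrogram


if __name__ == "__main__":
    print(upgma(list(DATA), coefficient_matrix(DATA)))  # (('A', 'B'), ('C', 'D'))
```

With this toy matrix, A+B and C+D are each joined at distance 0.2 before the two pairs are joined. Swapping in a different coefficient changes the input to the clustering step, not the clustering algorithm.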

>
> I think taxacomers who lack decisive training in phenetic analysis, which is most of us, figure that clustering is grouping by a data matrix of taxa against variables, followed by some similarity algorithm. Thus, Sergio is correct that an instant similarity or distance tree is different from a parsimony tree, in terms of what we have been told, i.e., that phenetics and parsimony are different.
>
> On the other hand, I took a tutorial course (3 days) in clustering techniques (didn't learn much, of course) at a meeting of the Classification Society from the then president of the Society and Pierre Legendre. I asked, ahem, if parsimony was a clustering technique. The two glanced at each other furtively, then opined that indeed parsimony is a clustering technique. Thus, authority says it is.
>
> Yes, parsimony does calculate a bunch of distance trees and selects recursively (I think) the shortest tree, because the problem is NP-complete (NP-hard), i.e., no known algorithm can find an exact solution in polynomial time. So... does the fact that we have to do heuristic sampling to get any sort of tree make parsimony not clustering? I think this is what this thread is about.
>
> Surely the product is a distance tree based on the shortest transformation set?
>
>
>
> On Fri, Aug 19, 2011 at 2:32 PM, Sergio Vargas <sevragorgia at gmail.com> wrote:
> "...because clustering can be done (computationally) efficiently
> whereas searching for an optimal tree using phylogenetic methods
> cannot."
>
> It's fair enough that some or even all biologists might have a usage
> of "clustering" that meets all of your explanation, and perhaps even
> that this should be agreed to by all of the readership of taxacom. I
> wouldn't know. But in statistical pattern recognition and data mining,
> not everything called clustering can be done computationally
> efficiently. Many techniques those disciplines call clustering are
> intractable in the sense that they are NP-hard. Informally, this means
> that (with presently understood computational complexity theory),
> they fundamentally scale at least exponentially with the size of the data
> and no known algorithm can circumvent that, just as for optimal tree
> induction problems. So I can only understand your text as meaning
> "...because clustering as meant by all practicing phylogeneticists can
> be done (computationally) efficiently...", and that is why you are
> prepared to subsequently say that the rest of your explanation "[...]
> is so basic I cannot believe I am explaining it".
>
> I do wonder a little whether in fact all practicing phylogeneticist
> readers of taxacom understand by "clustering" only tractable
> algorithms.
>
> On Fri, Aug 19, 2011 at 2:32 PM, Sergio Vargas <sevragorgia at gmail.com> wrote:
> > Hi,
> >
> > > Clustering is clustering is clustering. Group some things together and
> > > you are clustering, however it is done.
> >
> > No, you are not. Grouping is not clustering; there are many ways to group
> > things together not involving clustering. Maximum parsimony, maximum
> > likelihood and Bayesian analysis are not clustering. It is simply
> > incorrect to call these methods clustering. When you run any of
> > the above analyses, you are not clustering, despite the result being
> > something similar to a cluster. If you could reduce phylogenetic
> > inference to clustering, everything would be so easy (computationally
> > speaking) because clustering can be done (computationally) efficiently
> > whereas searching for an optimal tree using phylogenetic methods cannot.
> > Taxa are only "clustered" (randomly or sequentially) together to build
> > the first tree; afterwards, entire topologies are evaluated and taxa are
> > not clustered. This is so basic I cannot believe I am explaining it.
> >
> > sergio
> >
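
A minimal sketch of the distinction Sergio draws. It assumes unordered binary characters, rooted binary trees written as nested tuples, and Fitch's algorithm as the tree-length criterion. The data and function names are hypothetical. Taxa are added one at a time only to build a starting tree (stepwise addition); after that, whole topologies are scored. Here the scoring is exhaustive, which is feasible only for a handful of taxa because the number of rooted binary topologies, (2n-3)!!, grows faster than exponentially. Real parsimony programs instead rearrange the starting tree (branch swapping), which is a heuristic and is not shown.

```python
from math import prod

# Hypothetical data: each taxon has five binary characters (unordered states).
DATA = {"A": "00110", "B": "00111", "C": "11001", "D": "11011", "E": "10011"}


def n_rooted_trees(n):
    """Number of rooted binary topologies for n taxa, (2n-3)!!."""
    return prod(range(1, 2 * n - 2, 2))


def _fitch(node, data, i):
    """Fitch post-order pass for character i: (state set, steps) of a subtree."""
    if isinstance(node, str):  # a leaf is a taxon name
        return {data[node][i]}, 0
    (s1, c1), (s2, c2) = _fitch(node[0], data, i), _fitch(node[1], data, i)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # empty intersection costs one step


def fitch_length(tree, data):
    """Parsimony length of a whole topology, summed over all characters."""
    n_chars = len(next(iter(data.values())))
    return sum(_fitch(tree, data, i)[1] for i in range(n_chars))


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)  # attach on the branch above the current root
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def stepwise_addition(data):
    """Greedy starting tree: add taxa one by one at their shortest placement."""
    taxa = list(data)
    tree = (taxa[0], taxa[1])
    for taxon in taxa[2:]:
        tree = min(insertions(tree, taxon), key=lambda t: fitch_length(t, data))
    return tree


def exhaustive(data):
    """Exact search: build and score every rooted topology (tiny n only)."""
    taxa = list(data)
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [t for old in trees for t in insertions(old, taxon)]
    best = min(trees, key=lambda t: fitch_length(t, data))
    return best, len(trees)


if __name__ == "__main__":
    start = stepwise_addition(DATA)
    best, count = exhaustive(DATA)
    print("stepwise start:", start, fitch_length(start, DATA))
    print("exhaustive best:", best, fitch_length(best, DATA), "of", count, "trees")
```

For these five hypothetical taxa the exhaustive search scores all 105 rooted topologies; n_rooted_trees(20) already exceeds 10^21, which is why tree search beyond toy sizes relies on heuristics.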
