# [Taxacom] cladistics (was: clique analysis in textbooks)

Herbert Jacobson jakejudy at hotmail.com
Mon Aug 22 21:24:42 CDT 2011

```I don't think clustering is "...grouping by a data matrix." Quite the opposite: it is grouping by the "coefficient matrix," which is the result of some sort of data matrix manipulation.

Herb

> Date: Sat, 20 Aug 2011 12:36:30 -0500
> From: Richard.Zander at mobot.org
> To: morris.bob at gmail.com; sevragorgia at gmail.com
> CC: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] cladistics (was: clique analysis in textbooks)
>
> I think taxacomers who lack decisive training in phenetic analysis, which is most of us, figure clustering is grouping by a data matrix that compares one taxon and one variable and then some similarity algorithm. Thus, Sergio is correct that an instant similarity or distance tree is different from a parsimony tree, in terms of what we have been told: i.e. that phenetics and parsimony are different.
>
> On the other hand, I took a tutorial course (3 days) in clustering techniques (didn't learn much, of course) at a meeting of the Classification Society from the then president of the Society and Pierre Legendre. I asked, ahem, if parsimony was a clustering technique. The two glanced at each other furtively, then opined that indeed parsimony is a clustering technique. Thus, authority says it is.
>
> Yes, parsimony does calculate a bunch of distance trees and selects recursively (I think) the shortest tree because it is NP-complete (NP-hard), i.e., can't complete an exact solution in polynomial time. So...does the fact that we have to do heuristic sampling to get any sort of tree make parsimony not clustering? I think this is what this thread is about.
>
> Surely the product is a distance tree based on shortest transformation set?
>
>
>
> * * * * * * * * * * * *
> Richard H. Zander
> Missouri Botanical Garden, PO Box 299, St. Louis, MO 63166-0299 USA
> Web sites: http://www.mobot.org/plantscience/resbot/ and http://www.mobot.org/plantscience/bfna/bfnamenu.htm
> Modern Evolutionary Systematics Web site: http://www.mobot.org/plantscience/resbot/21EvSy.htm
>
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Bob Morris
> Sent: Friday, August 19, 2011 10:53 PM
> To: Sergio Vargas
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] cladistics (was: clique analysis in textbooks)
>
> On Fri, Aug 19, 2011 at 2:32 PM, Sergio Vargas <sevragorgia at gmail.com> wrote:
> "...because clustering can be done (computationally) efficiently
> whereas searching for an optimal tree using phylogenetic methods
> cannot."
>
> It's fair enough that some or even all biologists might have a usage
> of "clustering" that meets all of your explanation, and perhaps even
> that this should be agreed to by all of the readership of taxacom. I
> wouldn't know. But in statistical pattern recognition and datamining,
> not everything called clustering can be done computationally
> efficiently. Many techniques those disciplines call clustering are
> intractable in the sense that they are NP-hard. Informally, this means
> that (with presently understood computational complexity theory),
> they fundamentally scale at least exponentially with size of the data
> and no algorithm can circumvent that, just as for optimal tree
> induction problems.  So I can only understand your text as meaning
> "...because clustering as meant by all practicing phylogeneticists can
> be done (computationally) efficiently...",  and that is why you are
> prepared to subsequently say that the rest of your explanation  "[...]
> is so basic I cannot believe I am explaining it".
>
> I do wonder a little whether in fact all practicing phylogeneticist
> readers of taxacom understand by "clustering" only tractable
> algorithms.
>
> Bob Morris
>
> Robert A. Morris
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> IT Staff
> Filtered Push Project
> Harvard University Herbaria
>
>
>
> email: morris.bob at gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
>
>
>
> On Fri, Aug 19, 2011 at 2:32 PM, Sergio Vargas <sevragorgia at gmail.com> wrote:
> > Hi,
> >
> >  >Clustering is clustering is clustering. Group some things together and
> > you are clustering - however it is done.
> >
> > no you are not. Grouping is not clustering, there are many ways to group
> > things together not involving clustering. Maximum parsimony, maximum
> > likelihood and Bayesian analysis are not clustering. It is simply
> > incorrect to call these methods clustering. When you run either of
> > the above analyses you are not clustering, despite the result being
> > something similar to a cluster. If you could reduce phylogenetic
> > inference to clustering everything would be so easy (computationally
> > speaking) because clustering can be done (computationally) efficiently
> > whereas searching for an optimal tree using phylogenetic methods cannot.
> > Taxa are only "clustered" (randomly or sequentially) together to build
> > the first tree, afterwards entire topologies are evaluated, taxa are not
> > clustered. This is so basic I cannot believe I am explaining it.
> >
> > sergio
> >
> > --
> > Sergio Vargas R., M.Sc.
> > Dept. of Earth & Environmental Sciences
> > Palaeontology & Geobiology
> > Ludwig-Maximilians-Universität München
> > Richard-Wagner-Str. 10
> > 80333 München
> > Germany
> > tel. +49 89 2180 17929
> > s.vargas at lrz.uni-muenchen.de
> > sevra at marinemolecularevolution.org
> >
> > check my webpage:
> > http://www.marinemolecularevolution.org
> >
> > check my research ID:
> > http://www.researcherid.com/rid/A-5678-2011
> >
> >
> > _______________________________________________
> >
> > Taxacom Mailing List
> > Taxacom at mailman.nhm.ku.edu
> > http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> >
> > The Taxacom archive going back to 1992 may be searched with either of these methods:
> >
> > (1) by visiting http://taxacom.markmail.org
> >
> > (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
> >
>
>
>
> --
> Robert A. Morris
>
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> IT Staff
> Filtered Push Project
> Department of Organismal and Evolutionary Biology
> Harvard University
>
>
> email: morris.bob at gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)
>

```
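To make Herb's distinction concrete, here is a minimal Python sketch, not anyone's published method. It assumes a made-up binary data matrix for four hypothetical taxa (A to D), a simple mismatch proportion as the coefficient, and single-linkage as the clustering rule; the function names `distance_matrix` and `single_linkage` are invented for illustration. The point it shows is that the clustering step only ever reads the coefficient matrix, never the data matrix itself.

```python
from itertools import combinations

def distance_matrix(data):
    """Build the coefficient matrix: the proportion of mismatched
    characters for every pair of taxa (assumed coefficient)."""
    dist = {}
    for a, b in combinations(sorted(data), 2):
        mismatches = sum(x != y for x, y in zip(data[a], data[b]))
        dist[frozenset((a, b))] = mismatches / len(data[a])
    return dist

def single_linkage(taxa, dist):
    """Agglomerative clustering that consults only the coefficient matrix."""
    clusters = [frozenset([t]) for t in sorted(taxa)]
    merges = []
    while len(clusters) > 1:
        def link(pair):
            # Single linkage: cluster distance is the smallest member distance.
            c1, c2 = pair
            return min(dist[frozenset((a, b))] for a in c1 for b in c2)
        c1, c2 = min(combinations(clusters, 2), key=link)
        merges.append((sorted(c1), sorted(c2), link((c1, c2))))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

# Hypothetical data matrix: rows are taxa, columns are binary characters.
data = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 1],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, 0],
}

dist = distance_matrix(data)          # data matrix -> coefficient matrix
merges = single_linkage(data, dist)   # clustering reads only `dist`

for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

Each merge here is a cheap greedy step over pairwise coefficients, which is why this kind of clustering runs in polynomial time. A parsimony search, by contrast, scores whole topologies against the character data, which is the difference Sergio and Bob are arguing over in the quoted messages.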