[Taxacom] Procedure request

David Campbell pleuronaia at gmail.com
Mon Feb 4 09:47:41 CST 2013

> What bootstrap value can be considered acceptable for a matrix of
> 61 characters for 40 species? I feel that the bootstrap is very
> sensitive to the size of the data matrix, so for morphological traits,
> generally much less numerous than molecular ones, one cannot expect an
> equally high support. Is my statement correct? Are there any references
> dealing with this specific issue?

The relative values will be complicated by the fact that morphological
traits are typically selected for analysis because they are thought to
be reasonably informative, whereas molecular data tends to get
analyzed as a complete sequence, with much less exclusion of
homoplastic characters.  This tends to lower support values for
molecular data.

I don't immediately know of a paper exactly on the relation of
bootstrap values to matrix size and character type, but would not be
at all surprised if a search through back issues of Systematic Biology
or the like turned one up; certainly there would be many studies on
the general sensitivity of bootstrapping to various factors.
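As a rough back-of-the-envelope illustration, take an idealized case (an assumption, not real data): a clade is supported by k characters and contradicted by none, and a bootstrap replicate recovers it whenever at least one of those k characters is drawn. Resampling n characters with replacement, that happens with probability 1 - (1 - k/n)^n, which approaches 1 - e^(-k) for large n. The Python sketch below computes this for the 61-character matrix in the question and checks it with a small simulation. The function names are made up for this illustration.

```python
import random


def analytic_support(k, n):
    """P(at least one of k supporting characters is drawn when n characters
    are resampled with replacement from a matrix of n characters)."""
    return 1.0 - (1.0 - k / n) ** n


def simulated_support(k, n, replicates=5000, seed=1):
    """Monte Carlo check of analytic_support under the same idealization:
    characters 0..k-1 support the clade, nothing contradicts it."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # any() stops drawing once a supporting character turns up;
        # the outcome of the replicate is already decided at that point.
        if any(rng.randrange(n) < k for _ in range(n)):
            hits += 1
    return hits / replicates


for k in (1, 2, 3):
    print(k, round(analytic_support(k, 61), 3), simulated_support(k, 61))
```

Under these assumptions one uncontradicted character gives about 64% support, two about 87% and three about 95%, and the figures change little with n once the matrix has more than a few dozen characters. So the number of characters backing a particular clade matters more than the overall matrix size. Real data with conflicting characters will generally do worse than this idealized case.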

> Is it correct to insert more than one species with an identical set of
> characters, that is, the same state for each character in the matrix? Could
> this introduce biases in the results? Is there any published work
> concerning this case?

This can cause problems.  A previous version of the MrBayes manual
mentioned this as potentially creating errors, though I didn't find it
in the most recent (and not yet complete) version.  It also will give
you poorer resolution in parsimony.  Although common sense says that
two identical taxa ought to group together, computers have no common
sense.  When you tell the computer to measure how parsimonious the
various trees are, the computer will find, for example, that a subset
of two identical taxa plus a very similar third one is equally
parsimonious under any arrangement of the three.
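To make the parsimony point concrete, here is a minimal Fitch-parsimony sketch in Python. The taxa A, B, C, D and the four-character matrix are hypothetical, made up for illustration. Trees are written as nested tuples and rooted on the outgroup D, which gives the same length as the unrooted tree. A and B are identical, and C differs from them only by a unique state in the last character.

```python
# Hypothetical 4-taxon matrix: A and B are identical, C differs only by a
# unique (autapomorphic) state in the last character, D is the outgroup.
matrix = {
    "A": "0110",
    "B": "0110",
    "C": "0111",
    "D": "1000",
}


def fitch(tree, states):
    """Fitch pass for one character on a rooted binary tree.

    `tree` is a taxon name or a (left, right) tuple; `states` maps taxon
    names to their state. Returns (possible states at this node, steps).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No shared state: take the union and count one extra change.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum Fitch steps over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


# The three possible arrangements of A, B and C, each rooted on D.
trees = {
    "((A,B),C),D": ((("A", "B"), "C"), "D"),
    "((A,C),B),D": ((("A", "C"), "B"), "D"),
    "((B,C),A),D": ((("B", "C"), "A"), "D"),
}

for name, tree in trees.items():
    print(name, tree_length(tree, matrix))
```

All three trees come out at length 4, so parsimony has no basis for preferring ((A,B),C) even though A and B are identical. Dropping B leaves a single possible tree for A, C and D.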

Thus, eliminating duplicates is probably generally advisable unless
the analytical technique explicitly says that it can handle them.  Of
course, this is easier said than done: checking a large data set for
duplicates can be rather tedious, especially if you also may have
varying amounts of missing data and want to select the one with the
most data.
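A short script can take some of the tedium out of that check. The sketch below is a hypothetical Python helper. It assumes the matrix is stored as a dict mapping taxon names to strings of single-character states, with '?' as the only missing-data symbol. It treats two taxa as duplicates if they agree at every character scored in both, and keeps whichever has more scored characters. This is a simple greedy pass, not an established reduction method. Agreeing wherever both taxa are scored is looser than strict identity, so dropped taxa are worth checking by eye.

```python
MISSING = "?"  # assumed missing-data symbol


def n_scored(row):
    """Number of characters that are not missing."""
    return sum(state != MISSING for state in row)


def compatible(row1, row2):
    """True if the two rows agree wherever both taxa are scored."""
    return all(a == b or MISSING in (a, b) for a, b in zip(row1, row2))


def reduce_duplicates(matrix):
    """Greedy duplicate removal (hypothetical helper).

    Taxa are visited from most to fewest scored characters; a taxon is
    dropped if it is compatible with one already kept. Returns the kept
    matrix and a dict mapping each dropped taxon to the taxon kept instead.
    """
    kept, dropped = {}, {}
    # sorted() stays stable with reverse=True, so ties keep input order.
    for taxon in sorted(matrix, key=lambda t: n_scored(matrix[t]), reverse=True):
        row = matrix[taxon]
        match = next((k for k, r in kept.items() if compatible(r, row)), None)
        if match is None:
            kept[taxon] = row
        else:
            dropped[taxon] = match
    return kept, dropped


example = {"sp1": "0110?1", "sp2": "011001", "sp3": "0?1001", "sp4": "111001"}
kept, dropped = reduce_duplicates(example)
print(sorted(kept), dropped)
```

Here sp1 and sp3 are folded into the fully scored sp2, while sp4 differs at the first character and is kept. Agreement in the presence of missing data is not transitive, so with messier matrices the result can depend on the order in which taxa are visited.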

Dr. David Campbell
Visiting Professor
Department of Natural Sciences
Gardner-Webb University
Boiling Springs NC 28017
