# [Taxacom] Random taxonomy

Peter Rauch peterar at berkeley.edu
Fri Nov 29 14:31:59 CST 2013

```
Making no assumptions about the likely relative abundance of the 50 species
in nature, and about the relative likelihood of those species being
collected, then ...

The probability that each specimen would be correctly determined by
randomly assigning one of the fifty names to each specimen (i.e., pick
one specimen from the 43 and randomly assign it a name from among the
fifty names) is 1 in 50.

The probability of correctly naming the second specimen is exactly the
same: 1 in 50.

Etc. through all 43 specimens.

(Note that, because it ignores these assumptions as stated above, the
1-in-50 probability holds little credibility.)

For the robot (i.e., the random process of assigning a name to a
specimen) to "do the job as well as the human", the robot would need only
to identify one specimen correctly, i.e., name a specimen correctly just
once among the 43 specimens.

The probability of the robot doing that is actually high: about 58%, i.e.
1 - (49/50)^43 for at least one correct name in 43 independent 1-in-50
tries (esp. relative to the notion that it is infinitely small, as some
have suggested).

However, because the question relates to a real situation (actual
blowflies collected in a particular country), the assumption that each of
the fifty species known to occur in that country is equally likely to be
collected is probably a very weak assumption.  More likely, some of the
fifty species will be collected repeatedly, and other species will be
collected only rarely (this also assumes that blowfly collectors are not
out collecting blowflies with a biased focus on obtaining particular
species, or collecting in specific "habitats", etc.).

So, assuming that the frequency distribution of species represented in
the collection of 43 specimens is more like the one found in nature, the
game of simple random assignment of species names to a specimen is a
worst-case model. The model could be made more realistic if the species
names were pulled randomly from a bucket in which each name occurs with
the same frequency as that species is encountered in nature.

To look at it another way, this person could have named every one of the 43
specimens with the name of the most commonly occurring species in the
country. Unless the collection of 43 specimens was built in a very biased
manner, it is highly likely that at least ONE specimen would have been
correctly identified by the person.

All in all, the answer to the problem is going to be quite suspect because
these various sources of bias are likely to come into play (making the
worst-case model a very poor representation of reality).

Peter

On Fri, Nov 29, 2013 at 7:55 AM, Knut Rognes <knut at rognes.no> wrote:

> Thanks to all for replying outside and within the list.
>
> My raising the question of random taxonomy was inspired by a real case
> study. 43 specimens of blowflies were identified by a certain person. In
> the person's country there are about 50 species of blowflies. All his
> identifications were erroneous, except for one. My thought was then: Would
> a robot have done better, given a label dispenser?
>
> Some replies I have got suggest that the robot might have done a job as
> good as the human.
>
> Knut
>
> On 29 November 2013 11:24, Knut Rognes <knut at rognes.no> wrote:
> > Dear Taxacomers,
> >
> > I have a statistical problem.
> >
> > Consider 50 black boxes, each containing a fly specimen. Each fly has
> > been identified by someone, its name written on the inside of the box,
> > but this is invisible to you. You cannot peek inside. Each fly belongs
> > to one of 50 possible species.
> >
> > You have at your disposal the 50 possible species names for these
> > flies, each name printed on an adhesive label; the supply of printed
> > labels for each name is limitless.
> >
> > Here is the game: you affix a random label to the outside of a random
> > box.
> >
> > Now the problem: What is the likelihood that you put a correct label
> > on the box, i.e. that the name on the label matches the identity of
> > the fly within?
> >
> > Knut Rognes
> >
> > Oslo, Norway
>
>

```
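Under Peter's no-assumptions model there are fifty names, and each of the 43 specimens gets one of them independently and uniformly at random. The labels are limitless, as in Knut's setup. Under that model the number of correct names is binomial. The following is a minimal Python sketch that is not taken from the thread, and the function name `uniform_robot_odds` is hypothetical. It computes the exact odds with `fractions.Fraction`.

```python
from fractions import Fraction

N_SPECIES = 50    # names on the label dispenser (from the thread)
N_SPECIMENS = 43  # blowfly specimens in the real case (from the thread)


def uniform_robot_odds(n_species: int, n_specimens: int) -> dict:
    """Exact odds for a robot that gives every specimen a uniformly random name.

    Each specimen is correct with probability 1/n_species, independently of
    the others (labels are limitless), so the number of correct names is
    Binomial(n_specimens, 1/n_species).
    """
    p = Fraction(1, n_species)  # chance a single specimen is named correctly
    q = 1 - p                   # chance a single specimen is named wrongly
    return {
        "per_specimen": p,
        "expected_correct": n_specimens * p,
        "at_least_one": 1 - q ** n_specimens,
        "exactly_one": n_specimens * p * q ** (n_specimens - 1),
    }


if __name__ == "__main__":
    odds = uniform_robot_odds(N_SPECIES, N_SPECIMENS)
    for name, value in odds.items():
        print(f"{name:>16}: {float(value):.4f}")
```

Run as-is, the sketch should give a per-specimen chance of 0.02 and about 0.86 correct names expected. It puts the chance of at least one correct name at roughly 0.58 and the chance of exactly one at roughly 0.37. So the robot matching the human's single correct identification is closer to a coin flip than to an impossibility.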
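The random robot can also be checked by simulation, again as a minimal Python sketch under the same assumptions. The names `simulate_robot` and `truth_weights` are hypothetical. The skewed weights in the usage lines are invented for illustration and are not real blowfly abundances. The robot's labels ignore what is inside the box, so skewing the true identities should not move the answer away from 1 - (49/50)^43 ≈ 0.58. This is also why Knut's random-label, random-box game gives 1 in 50 whatever the boxes contain.

```python
import random


def simulate_robot(n_species=50, n_specimens=43, trials=20_000,
                   truth_weights=None, seed=1):
    """Estimate the chance that a uniformly random labeller names at least
    one of n_specimens correctly.

    truth_weights (hypothetical) skews which species the specimens really
    are; None means every species is equally likely.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    species = range(n_species)
    hits = 0
    for _ in range(trials):
        # True identities of the specimens (hidden inside the "black boxes").
        truth = rng.choices(species, weights=truth_weights, k=n_specimens)
        # The robot's labels: one of the fifty names, uniformly, per specimen.
        labels = [rng.randrange(n_species) for _ in range(n_specimens)]
        if any(t == lab for t, lab in zip(truth, labels)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("uniform specimens:", simulate_robot())
    # Invented Zipf-like skew: the k-th commonest species is 1/k as common.
    zipf = [1 / k for k in range(1, 51)]
    print("skewed specimens: ", simulate_robot(truth_weights=zipf))
```

Both estimates should land near 0.58, give or take sampling noise of about half a percentage point at 20,000 trials.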
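Peter's "bucket of names" refinement can be made concrete. Assume the specimens are drawn in proportion to natural abundances `p_i`, summing to 1 over the fifty species. Under uniform labels a specimen is named correctly with probability the sum of `p_i / 50`, which is always exactly 1/50. Under the bucket scheme that probability is the sum of `p_i * p_i`. By the Cauchy-Schwarz inequality that sum is at least 1/50, with equality only when all fifty species are equally common. The Python sketch below is illustrative only. The abundance curve is invented, and `normalise` and `hit_rates` are hypothetical helper names.

```python
from fractions import Fraction


def normalise(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def hit_rates(abundance):
    """Per-specimen chance of a correct name under two labelling schemes,
    assuming specimens occur in proportion to `abundance` (hypothetical)."""
    n = len(abundance)
    uniform_labels = sum(p * Fraction(1, n) for p in abundance)  # always 1/n
    bucket_labels = sum(p * p for p in abundance)  # names drawn at natural frequency
    return uniform_labels, bucket_labels


# Invented Zipf-like abundances for 50 species, NOT real blowfly data:
# the k-th commonest species is 1/k as common as the commonest one.
abundance = normalise([Fraction(1, k) for k in range(1, 51)])
uniform_rate, bucket_rate = hit_rates(abundance)

if __name__ == "__main__":
    print("uniform labels, per specimen:  ", float(uniform_rate))
    print("bucket labels, per specimen:   ", float(bucket_rate))
    print("bucket labels, >=1 right of 43:", 1 - (1 - float(bucket_rate)) ** 43)
```

With these made-up numbers the bucket scheme names roughly 8% of specimens correctly, about four times the 1-in-50 baseline. Its chance of getting at least one of 43 right comes out around 0.97. Real abundance data would be needed to say anything about the actual blowfly case.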
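Peter's other suggestion is to label every specimen with the name of the commonest species. Suppose that species makes up a share `p_max` of what gets collected and the specimens are independent draws. Then each specimen is right with probability `p_max`, and the chance of at least one correct name among 43 is 1 - (1 - p_max)^43. With fifty species `p_max` can never be below 1/50, so under these assumptions this strategy never does worse than the random robot. The Python sketch below assumes this model. The shares it loops over are hypothetical, since the thread gives no abundance figures, and `most_common_strategy` is an illustrative name.

```python
def most_common_strategy(p_max: float, n_specimens: int = 43) -> float:
    """Chance that labelling every specimen with the commonest species' name
    gets at least one right, assuming specimens are independent draws in
    which the commonest species makes up a fraction p_max (hypothetical)."""
    if not 0.0 <= p_max <= 1.0:
        raise ValueError("p_max must be a probability")
    return 1.0 - (1.0 - p_max) ** n_specimens


if __name__ == "__main__":
    # Hypothetical shares of the commonest species among collected specimens.
    for share in (0.02, 0.05, 0.10, 0.20):
        print(f"commonest species = {share:.0%} of specimens -> "
              f"P(at least one right) = {most_common_strategy(share):.3f}")
```

A 2% share reproduces the random robot's roughly 0.58. Shares of 5% and 10% push the chance to about 0.89 and about 0.99, which supports Peter's point that a single correct identification is very likely unless the collection is strongly biased.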