[Taxacom] Random taxonomy

Knut Rognes knut at rognes.no
Fri Nov 29 15:05:26 CST 2013

Both cases were meant to be the same. The supply of labels is infinite for
each species name (that is, sampling with replacement).

Knut R

-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of JF Mate
Sent: 29 November 2013 21:47
To: Taxacom
Subject: Re: [Taxacom] Random taxonomy

Are we talking about the hypothetical example of 50 boxes and 50 labels, or
the real-life example? They are completely different: one involves sampling
with replacement and the other doesn't.

Jason
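Jason's distinction can be made concrete with a small exact calculation. This is a sketch under an assumption not stated in the thread: "without replacement" is read here as 50 boxes holding 50 different species, with exactly one label per name shuffled onto them (a random permutation), while "with replacement" is Knut's unlimited label supply. The function names are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from math import factorial

N = 50  # boxes and species names in the hypothetical game


def p_any_match_with_replacement(n: int) -> Fraction:
    """Each of n boxes independently gets one of n labels (unlimited supply),
    so each box is wrong with probability (n - 1) / n."""
    return 1 - Fraction(n - 1, n) ** n


def p_any_match_without_replacement(n: int) -> Fraction:
    """ASSUMED reading: n boxes hold n different species and one label per
    name is shuffled onto them. P(no match) is the derangement probability
    sum_{k=0}^{n} (-1)**k / k!."""
    p_none = sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))
    return 1 - p_none


print(f"with replacement:    {float(p_any_match_with_replacement(N)):.4f}")
print(f"without replacement: {float(p_any_match_without_replacement(N)):.4f}")
```

In both readings the expected number of correctly labelled boxes is exactly 1; the chance of at least one match comes out at roughly 0.636 with replacement versus about 0.632 (close to 1 - 1/e) without, so for this particular question the distinction matters less than one might expect.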

On 29 November 2013 21:31, Peter Rauch <peterar at berkeley.edu> wrote:
> Making no assumptions about the likely relative abundance of the 50
> species in nature, and about the relative likelihood of those species
> being collected, then ...
>
> The probability that each specimen would be correctly determined by
> randomly assigning one of the fifty names to each specimen (i.e., pick
> one specimen from the 43 and randomly assign it a name from among the
> fifty names) is 1 in 50.
>
> The probability of correctly naming the second specimen is exactly the
> same: 1 in 50.
>
> Etc. through all 43 specimens.
>
> (Note that because these assumptions are ignored, as stated above, the
> 1-in-50 probability has little credibility.)
>
>
> For the robot (i.e., the random process of assigning a name to a
> specimen) to "do the job as well as the human", the robot would need
> to identify only one specimen correctly, i.e., name a specimen
> correctly just once among the 43 specimens.
>
> The probability of the robot doing that is actually quite high (about
> 58%, since 1 - (49/50)^43 ≈ 0.58), especially compared with the notion
> that it is infinitesimally small, as some have suggested.
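Peter's figures can be checked directly. Under his worst-case assumptions (each of the 43 names drawn independently and uniformly from 50), the number of correct names follows a binomial distribution. This is a minimal sketch of that idealized model only, not of the real collection; the constant and function names are illustrative.

```python
from fractions import Fraction
from math import comb

N_SPECIMENS = 43  # specimens in the real blowfly case
N_NAMES = 50      # species names; assumed equally likely (the worst-case model)
P_CORRECT = Fraction(1, N_NAMES)  # chance one randomly assigned name is right


def prob_exactly(k: int, n: int = N_SPECIMENS, p: Fraction = P_CORRECT) -> Fraction:
    """Binomial probability of exactly k correct names among n specimens,
    treating each assignment as an independent draw with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_at_least_one = 1 - prob_exactly(0)
expected_correct = N_SPECIMENS * P_CORRECT  # 43/50

print(f"P(at least one correct) ~ {float(p_at_least_one):.2f}")
print(f"P(exactly one correct)  ~ {float(prob_exactly(1)):.2f}")
print(f"expected number correct = {float(expected_correct):.2f}")
```

Reading "as well as the human" as "at least one correct name", the model gives roughly 0.58, with about 0.37 for exactly one correct and 0.86 correct names expected on average. Exact fractions are used so the binomial terms sum to 1 without rounding error.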
>
> However, because the question relates to a real situation (actual
> blowflies collected in a particular country), the assumption that each
> of the fifty species known to occur in that country is equally likely
> to be collected is probably unrealistic. More likely, some of the fifty
> species will be collected repeatedly and others only rarely (this also
> assumes that blowfly collectors are not deliberately targeting
> particular species or collecting in specific "habitats", etc.).
>
> So, assuming that the frequency distribution of species represented in
> the collection of 43 specimens resembles the one found in nature,
> simple random assignment of species names to specimens is a worst-case
> model. The model could be made more realistic if the species names
> were drawn randomly from a bucket in which each name occurs with the
> same frequency as that species is encountered in nature.
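Peter's "bucket" refinement can be sketched as follows. If a specimen belongs to species i with probability p_i and its name is drawn independently from a bucket with the same frequencies, the chance of a match is the sum of p_i squared. By the Cauchy-Schwarz inequality this is never below 1/50, and equals 1/50 only when all species are equally common, which is why the uniform draw is the worst case. The Zipf-like abundance profile below is entirely hypothetical; the thread gives no real blowfly frequencies.

```python
N_NAMES = 50


def zipf_abundances(n: int, s: float = 1.0) -> list[float]:
    """HYPOTHETICAL abundance profile: the species of rank i is encountered
    in proportion to 1 / i**s. Illustrative only, not real blowfly data."""
    weights = [1.0 / i**s for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def p_match_bucket(abundances: list[float]) -> float:
    """Specimen is species i with probability p_i; the name is drawn
    independently from a bucket with the same frequencies, so
    P(match) = sum_i p_i * p_i."""
    return sum(p * p for p in abundances)


flat = [1.0 / N_NAMES] * N_NAMES      # every species equally common
skewed = zipf_abundances(N_NAMES)     # a few common species, many rare ones

print(f"flat profile:   {p_match_bucket(flat):.3f}")    # the 1-in-50 worst case
print(f"skewed profile: {p_match_bucket(skewed):.3f}")
```

With these made-up weights the per-specimen hit rate rises from 0.02 to roughly 0.08; a steeper or flatter profile would move it accordingly.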
>
> To look at it another way, this person could have named every one of
> the 43 specimens with the name of the most commonly occurring species
> in the country. Unless the collection of 43 specimens was built in a
> very biased manner, it is highly likely that ONE specimen would then
> have been correctly identified by the person.
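The "name everything after the commonest species" strategy is just as easy to quantify. If the 43 specimens are an unbiased, independent sample and the commonest species makes up a fraction p_max of what is collected, each specimen is correct with probability p_max, so the chance of at least one correct name is 1 - (1 - p_max)^43. The 20% share below is an invented example; the flat case (p_max = 1/50) is included for comparison.

```python
N_SPECIMENS = 43


def p_at_least_one_correct(p_max: float, n_specimens: int = N_SPECIMENS) -> float:
    """Every specimen receives the commonest species' name. Assuming the
    specimens are an unbiased, independent sample of what occurs in nature,
    each is correct with probability p_max."""
    return 1.0 - (1.0 - p_max) ** n_specimens


for label, p_max in [("flat, p_max = 1/50", 1 / 50),
                     ("HYPOTHETICAL, p_max = 0.20", 0.20)]:
    expected = N_SPECIMENS * p_max
    print(f"{label}: P(>=1 correct) = {p_at_least_one_correct(p_max):.4f}, "
          f"expected correct = {expected:.1f}")
```

In the flat case this strategy does no better than random labels (about 0.58), but with the invented 20% share the chance of at least one correct name exceeds 0.999, which is the intuition behind Peter's "highly likely".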
>
> All in all, any answer to the problem is going to be quite suspect,
> because these various biases are likely to come into play (making the
> worst-case model a very poor representation of reality).
>
> Peter
>
> On Fri, Nov 29, 2013 at 7:55 AM, Knut Rognes <knut at rognes.no> wrote:
>
>> Thanks to all who replied, both off-list and on the list.
>>
>> My raising the question of random taxonomy was inspired by a real
>> case study. Forty-three specimens of blowflies were identified by a
>> certain person. In the person's country there are about 50 species of
>> blowflies. All his identifications were erroneous, except for one. My
>> thought was then: would a robot, given a label dispenser, have done
>> better?
>>
>> Some replies I have received suggest that the robot might have done as
>> good a job as the human.
>>
>> Knut
>>
>>
>>
>>
>> On 29 November 2013 11:24, Knut Rognes <knut at rognes.no> wrote:
>> > Dear Taxacomers,
>> >
>> > I have a statistical problem.
>> >
>> > Consider 50 black boxes, each containing a specimen of a fly. Each fly
>> > has been identified by someone and its name written on the inside of
>> > the box, but this is invisible to you. You cannot peek inside. Each
>> > fly belongs to one of 50 possible species.
>> >
>> > You have at your disposal the 50 possible species names for these
>> > flies, each name printed on an adhesive label; the supply of printed
>> > labels for each name is limitless.
>> >
>> > Here is the game: you affix a random label to the outside of a
>> > random box.
>> >
>> > Now the problem: What is the probability that you put a correct
>> > label on the box, i.e., that the name on the label matches the
>> > identity of the fly within?
>> >
>> > Knut Rognes
>> >
>> > Oslo, Norway
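Knut's game can also be checked by brute-force simulation. This is a Monte Carlo sketch under stated assumptions: the hidden box contents are fixed (two hypothetical layouts are tried), and each trial picks a box and a label uniformly at random, with labels drawn with replacement from the 50 names. Because the label is chosen independently of the box, the estimate should sit near 1/50 = 0.02 whatever the boxes contain. The function name and seed are illustrative choices.

```python
import random

N_SPECIES = 50


def estimate_match_rate(box_contents: list[int], n_species: int,
                        trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(label matches fly) for the box game.

    box_contents[i] is the hidden species index of the fly in box i.
    Each trial picks a random box and a random label; labels are drawn
    uniformly and with replacement (the supply is unlimited).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        box = rng.randrange(len(box_contents))
        label = rng.randrange(n_species)
        hits += label == box_contents[box]  # True counts as 1
    return hits / trials


# HYPOTHETICAL hidden contents: the answer should not depend on them.
all_distinct = list(range(N_SPECIES))   # one fly of each species
all_same = [7] * N_SPECIES              # every box holds the same species
for contents in (all_distinct, all_same):
    print(f"{estimate_match_rate(contents, N_SPECIES, trials=200_000):.4f}")
```

Changing the seed or the hidden contents should only move the estimate by sampling noise (a standard error of about 0.0003 at 200,000 trials).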
>>
>>
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The Taxacom Archive back to 1992 may be searched with either of these
> methods:
>
> (1) by visiting http://taxacom.markmail.org
>
> (2) a Google search specified as:
> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>
> Celebrating 26 years of Taxacom in 2013.