# [Taxacom] Random taxonomy

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Fri Nov 29 15:32:36 CST 2013

```
Can someone who understands probability explain the following to me?

Suppose that there is a symmetrical fork in the road leading to two equivalent residential areas. A car approaches the fork. What is the probability that it will go left?

There seem to me to be two quite different notions of probability involved, and I don't know which of them is what people mean by 'probability':

(1) For a large enough sample of randomly selected cars, the probability will be 50% (i.e. for 1000 cars, approx. 500 will turn left); but

(2) For any particular car, the probability of it turning left will depend on such factors as where the driver lives, etc., and is therefore extremely unlikely to be 50%. Though it is unlikely to be either 100% or 0% either, since they may be going to visit a friend down the opposite fork, or any number of other factors might cause them to turn down the other fork occasionally, and all possible probabilities from 0-100% are possible!

So, what is the probability of the car going left?

Stephen

From: JF Mate <aphodiinaemate at gmail.com>
To: Taxacom <taxacom at mailman.nhm.ku.edu>
Sent: Saturday, 30 November 2013 10:13 AM
Subject: Re: [Taxacom] Random taxonomy

My mistake. I didn't read your initial post carefully.

Jason

On 29 November 2013 22:05, Knut Rognes <knut at rognes.no> wrote:
> Both cases were meant to be the same. The supply of labels is infinite for
> each species name. (Which is sampling with replacement).
>
> Knut R
>
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of JF Mate
> Sent: 29 November 2013 21:47
> To: Taxacom
> Subject: Re: [Taxacom] Random taxonomy
>
> Are we talking about the hypothetical 50 boxes and 50 labels example or the
> real life example? Because they are completely different. One has sampling
> with replacement and the other doesn't.
>
> Jason
>
> On 29 November 2013 21:31, Peter Rauch <peterar at berkeley.edu> wrote:
>> Making no assumptions about the likely relative abundance of the 50
>> species in nature, and about the relative likelihood of those species
>> being collected, then ...
>>
>> The probability that each specimen would be correctly determined by
>> randomly assigning one of the fifty names to each specimen --i.e.,
>> pick one specimen from the 43 and randomly assign it a name from among
>> the fifty names-- is 1 in 50.
>>
>> The probability of correctly naming the second specimen is exactly the
>> same: 1 in 50.
>>
>> Etc. through all 43 specimens.
>>
>> [Note that by ignoring any assumptions, as stated above, the 1-in-50
>> probability holds little credibility.]
>>
>>
>> For the robot (i.e., the random process of assigning the name to the
>> specimen) to "do the job as well as the human", the robot would need
>> only identify one specimen correctly --i.e., a correctly named
>> specimen only once among the 43 specimens.
>>
>> The probability of the robot doing that is actually very high (esp.
>> relative to the notion that it is infinitely small, as some have
>> suggested).
>>
>> However, because the question relates to a real situation --about
>> actual blowflies collected in a particular country-- the assumption
>> that each of the fifty species known to occur in that country is
>> equally likely to be collected is probably a very weak assumption.
>> More likely, some of the fifty species are very likely to be collected
>> repeatedly, and other species will be rarely collected (this also
>> assumes that blowfly collectors are not out collecting blowflies with
>> a biased focus on obtaining particular species, or collecting in specific
>> "habitats", etc).
>>
>> So, assuming that the frequency distribution of species represented
>> in the collection of 43 specimens is more like the one found in
>> nature, the game of simple random assignment of species names to a
>> specimen is a worst-case model; the model could be improved
>> --to be more realistic-- if the species names were being pulled
>> randomly from a bucket of names that were found in the bucket with the
>> same frequency as those species are encountered in nature.
>>
>> To look at it another way, this person could have named every one of
>> the 43 specimens with the name of the most commonly occurring species in the
>> country.
>> Unless the collection of 43 specimens was built in a very biased
>> manner, it is highly likely that ONE specimen would be correctly
>> identified by the person.
>>
>> All in all, the answer to the problem is going to be quite suspect
>> because of these various factors of biases being likely to come into
>> play (making the worst case model a very poor representation of the
>> reality).
>>
>> Peter
>>
>> On Fri, Nov 29, 2013 at 7:55 AM, Knut Rognes <knut at rognes.no> wrote:
>>
>>> Thanks to all for replying outside and within the list.
>>>
>>> My raising the question of random taxonomy was inspired by a real
>>> case study. 43 specimens of blowflies were identified by a certain
>>> person. In the person's country there are about 50 species of
>>> blowflies. All his identifications were erroneous, except for one. My
>>> thought was then: Would a robot have done better, given a label
>>> dispenser?
>>>
>>> Some replies I have got suggest that the robot might have done as
>>> good a job as the human.
>>>
>>> Knut
>>>
>>>
>>>
>>>
>>> On 29 November 2013 11:24, Knut Rognes <knut at rognes.no> wrote:
>>> > Dear Taxacomers,
>>> >
>>> >
>>> >
>>> > I have a statistical problem.
>>> >
>>> >
>>> >
>>> > Consider 50 black boxes, within each is a specimen of fly. Each fly
>>> > has been identified by someone, its name written on the inside of
>>> > the box, but this is invisible to you. You cannot peek inside. Each
>>> > fly belongs to one of 50 possible species.
>>> >
>>> >
>>> >
>>> > You have at your disposal the 50 possible species names for these
>>> > flies, each name printed on an adhesive label, the supply of
>>> > printed labels for each name is limitless.
>>> >
>>> >
>>> >
>>> > Here is the game: you affix a random label on the outside of a
>>> > random box.
>>> >
>>> >
>>> >
>>> > Now the problem: What is the likelihood that you put a correct
>>> > label on the box, i.e. that the name on the label matches the
>>> > identity of the fly within?
>>> >
>>> >
>>> >
>>> > Knut Rognes
>>> >
>>> > Oslo, Norway
>>>
>>>
>> _______________________________________________
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>>
>> The Taxacom Archive back to 1992 may be searched with either of these
>> methods:
>>
>> (1) by visiting http://taxacom.markmail.org/
>>
>> (2) a Google search specified as:
>> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>>
>> Celebrating 26 years of Taxacom in 2013.

_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom Archive back to 1992 may be searched with either of these methods:

(1) by visiting http://taxacom.markmail.org/