Symbolic Math + A Probability Monad

You can do some fun things if you replace the floats in a probability monad with symbolic representations. As long as the computation is exact over the description, there is no need to resort to a numeric representation (as there would be for a sampling-based approach). This yields something of real pedagogical and even exploratory interest: since running the computation builds a mathematical formula, we get to observe how probabilities accumulate down each branch into a node, without explicitly instantiating a tree visualization. The result is something more general than a simulation but less general than an equation.
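
To make the idea concrete, here is a minimal self-contained sketch of the core mechanism (hypothetical types for illustration, not the actual library used in the rest of this article): a distribution is a list of values paired with symbolic weights, and monadic bind multiplies weights down each branch.

//probabilities are symbolic expressions rather than floats
type Expr =
    | Sym of string
    | Const of int
    | Add of Expr * Expr   //Add comes in when consolidating duplicate outcomes, as Reify does
    | Sub of Expr * Expr
    | Mul of Expr * Expr

//a distribution pairs each value with a symbolic weight
type Dist<'a> = (Expr * 'a) list

let returnD x : Dist<'a> = [Const 1, x]

//bind multiplies each branch's weight into the weights of its continuation
let bindD (d : Dist<'a>) (f : 'a -> Dist<'b>) : Dist<'b> =
    [ for (w, x) in d do
        for (w2, y) in f x do
            yield Mul(w, w2), y ]

//'a' with symbolic probability p, 'b' with probability 1 - p
let bernoulliChoiceD p (a, b) = [p, a; Sub(Const 1, p), b]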

Simple Coin Flipping Examples

In this example we look at 3 flips of a coin that's heads with probability $p$.

In [2]:
let rec coinflip flips n = cont {
    if n = 0 then return flips
    else
        //p is a symbolic probability defined earlier; flips accumulates outcomes, most recent first
        let! flip = bernoulliChoice p ("H","T")
        return! coinflip (flip + flips) (n-1) }

Model(coinflip "" 3).Reify() 
|> latexTable
Out[2]:
$$\begin{array} \hline Probability & Value \\ \hline {p}^{3} & HHH \\ \hline {p}^{2}\left(1 - p\right) & HHT \\ \hline {p}^{2}\left(1 - p\right) & HTH \\ \hline p{\left(1 - p\right)}^{2} & HTT \\ \hline {p}^{2}\left(1 - p\right) & THH \\ \hline p{\left(1 - p\right)}^{2} & THT \\ \hline p{\left(1 - p\right)}^{2} & TTH \\ \hline {\left(1 - p\right)}^{3} & TTT \\ \hline\end{array}$$
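
As a sanity check, grouping rows by their number of heads recovers the binomial expansion, and the table as a whole sums to one:

$$\left(p + (1-p)\right)^3 = p^3 + 3p^2(1-p) + 3p(1-p)^2 + (1-p)^3 = 1$$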

The next example asks: what is the chance of at least n heads in m flips of a coin whose heads probability is $p$? Sound familiar? It's essentially the binomial distribution, except we ask for at least n successes rather than exactly n.

In [3]:
//at least `count` heads: count <= (number of "H"s); note the partial application of (<=)
let hasHeads count = List.filter ((=) "H") >> List.length >> ((<=) count)

let rec atleast_n_in_m_flips count flips n = cont{
    if n = 0 then
     if hasHeads count flips then return true 
     else return false
    else let! flip = bernoulliChoice p ("H","T")  
         return! atleast_n_in_m_flips count (flip::flips) (n-1) }
        
Model(atleast_n_in_m_flips 2 [] 2).Reify()   
|> latexTable
Out[3]:
$$\begin{array} \hline Probability & Value \\ \hline 2p\left(1 - p\right) + {\left(1 - p\right)}^{2} & False \\ \hline {p}^{2} & True \\ \hline\end{array}$$

One can try to work out whether there is some kind of general pattern, or at least get some idea of the model's general behavior:

In [4]:
Model(atleast_n_in_m_flips 2 [] 3).Reify()  
|> latexTable
Out[4]:
$$\begin{array} \hline Probability & Value \\ \hline 3p{\left(1 - p\right)}^{2} + {\left(1 - p\right)}^{3} & False \\ \hline {p}^{3} + 3{p}^{2}\left(1 - p\right) & True \\ \hline\end{array}$$
In [5]:
Model(atleast_n_in_m_flips 2 [] 4).Reify()  
|> latexTable
Out[5]:
$$\begin{array} \hline Probability & Value \\ \hline 4p{\left(1 - p\right)}^{3} + {\left(1 - p\right)}^{4} & False \\ \hline {p}^{4} + 4{p}^{3}\left(1 - p\right) + 6{p}^{2}{\left(1 - p\right)}^{2} & True \\ \hline\end{array}$$
In [6]:
Model(atleast_n_in_m_flips 3 [] 3).Reify()  
|> latexTable
Out[6]:
$$\begin{array} \hline Probability & Value \\ \hline 3{p}^{2}\left(1 - p\right) + 3p{\left(1 - p\right)}^{2} + {\left(1 - p\right)}^{3} & False \\ \hline {p}^{3} & True \\ \hline\end{array}$$
In [7]:
Model(atleast_n_in_m_flips 3 [] 4).Reify()  
|> latexTable
Out[7]:
$$\begin{array} \hline Probability & Value \\ \hline 6{p}^{2}{\left(1 - p\right)}^{2} + 4p{\left(1 - p\right)}^{3} + {\left(1 - p\right)}^{4} & False \\ \hline {p}^{4} + 4{p}^{3}\left(1 - p\right) & True \\ \hline\end{array}$$
In [8]:
Model(atleast_n_in_m_flips 3 [] 5).Reify()  
|> latexTable
Out[8]:
$$\begin{array} \hline Probability & Value \\ \hline 10{p}^{2}{\left(1 - p\right)}^{3} + 5p{\left(1 - p\right)}^{4} + {\left(1 - p\right)}^{5} & False \\ \hline {p}^{5} + 5{p}^{4}\left(1 - p\right) + 10{p}^{3}{\left(1 - p\right)}^{2} & True \\ \hline\end{array}$$
In [9]:
Model(atleast_n_in_m_flips 3 [] 6).Reify()  
|> latexTable
Out[9]:
$$\begin{array} \hline Probability & Value \\ \hline 15{p}^{2}{\left(1 - p\right)}^{4} + 6p{\left(1 - p\right)}^{5} + {\left(1 - p\right)}^{6} & False \\ \hline {p}^{6} + 6{p}^{5}\left(1 - p\right) + 15{p}^{4}{\left(1 - p\right)}^{2} + 20{p}^{3}{\left(1 - p\right)}^{3} & True \\ \hline\end{array}$$

Notice how the probabilities represent paths, but unless you are exceptionally good at maths, the generalization might be hard to deduce (for instance, that the pattern of numeric coefficients is the same as found in Pascal's triangle). Nonetheless, the more concrete description, paired with some expanded examples, might help one get comfortable with the more general but quite abstract binomial formula. It's about the number of paths through a tree-like structure to our destination, where each plus can be read as an or.
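
In general, these tables are snapshots of the binomial upper tail, where the coefficients $\binom{m}{k}$ count the paths:

$$P(\text{at least}\;n\;\text{heads in}\;m\;\text{flips}) = \sum_{k=n}^{m} \binom{m}{k} {p}^{k} {\left(1-p\right)}^{m-k}$$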

Next, we will try to get a better understanding by listing the possibilities.

In [10]:
let rec atleast_n_in_m_flips_list count flips n = cont {
    if n = 0 then
        //discard paths that never reached `count` heads
        do! observe (hasHeads count flips)
        return List.rev flips
    else
        let! flip = bernoulliChoice p ("H","T")
        let flips' = flip :: flips
        //stop as soon as `count` heads have been seen
        if hasHeads count flips' then return List.rev flips'
        else return! atleast_n_in_m_flips_list count flips' (n-1)
}

Model(atleast_n_in_m_flips_list 2 [] 4).Reify()  
|> latexTable
Out[10]:
$$\begin{array} \hline Probability & Value \\ \hline {p}^{2} & [H,H] \\ \hline {p}^{2}\left(1 - p\right) & [H,T,H] \\ \hline {p}^{2}{\left(1 - p\right)}^{2} & [H,T,T,H] \\ \hline {p}^{2}\left(1 - p\right) & [T,H,H] \\ \hline {p}^{2}{\left(1 - p\right)}^{2} & [T,H,T,H] \\ \hline {p}^{2}{\left(1 - p\right)}^{2} & [T,T,H,H] \\ \hline\end{array}$$
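
Summing these six path probabilities recovers the True row of Out[5]:

$$p^2 + 2p^2\left(1 - p\right) + 3p^2{\left(1 - p\right)}^{2} = p^4 + 4p^3\left(1 - p\right) + 6p^2{\left(1 - p\right)}^{2}$$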

Geometric Distribution

The geometric distribution is recursively defined, so exact enumeration of it would not terminate. We therefore limit exploration to a depth of 10.

In [11]:
Model(cont{ return! geometric 18 p }).Reify(10) |> latexTable
Out[11]:
$$\begin{array} \hline Probability & Value \\ \hline p{\left(1 - p\right)}^{11} & thunk \\ \hline {\left(1 - p\right)}^{12} & thunk \\ \hline p & 18 \\ \hline p\left(1 - p\right) & 19 \\ \hline p{\left(1 - p\right)}^{2} & 20 \\ \hline p{\left(1 - p\right)}^{3} & 21 \\ \hline p{\left(1 - p\right)}^{4} & 22 \\ \hline p{\left(1 - p\right)}^{5} & 23 \\ \hline p{\left(1 - p\right)}^{6} & 24 \\ \hline p{\left(1 - p\right)}^{7} & 25 \\ \hline p{\left(1 - p\right)}^{8} & 26 \\ \hline p{\left(1 - p\right)}^{9} & 27 \\ \hline p{\left(1 - p\right)}^{10} & 28 \\ \hline\end{array}$$

Compared to the previous examples, the meaning of the paths here is easy to read off: $N$ failures and then a success. A thunk signifies an unexpanded/unexplored path.
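
The table is consistent with a recursive definition along the following lines (a sketch of what geometric presumably looks like, reusing this library's bernoulli; the actual definition may differ):

let rec geometric n p = cont {
    //with probability p stop at n, otherwise recurse on n + 1
    let! success = bernoulli p
    if success then return n
    else return! geometric (n + 1) p }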

Fair coin

John von Neumann is said to have invented a technique to make any coin fair: flip it twice; if both values are the same, try again, otherwise take the first flip. Notice that this algorithm has a non-zero chance of never terminating. Nonetheless, exploring to only a shallow depth is a large aid in building an intuition for the method; too high a max depth can actually obscure what's going on. The essence of the technique is that it contrives an equal number of equally probable paths to either a T or an H. One can also notice something about even numbers of flips: the possibilities only change at odd numbers of max flips.
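
The identity behind the trick: within any round, the two stopping outcomes are equally probable whatever $p$ is, so conditioned on stopping, the reported flip is fair:

$$P(HT) = p(1-p) = (1-p)p = P(TH) \;\Rightarrow\; P(H \mid \mathrm{stop}) = \frac{p(1-p)}{p(1-p) + (1-p)p} = \frac{1}{2}$$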

In [12]:
let rec faircoin path p = cont {
    let! flip1 = bernoulliChoice p ("H", "T")
    let! flip2 = bernoulliChoice p ("H", "T")
    //a tied round is rejected: record it and try again
    if flip1 = flip2 then return! faircoin ((sprintf "%s\&%s\\;" flip1 flip2) + path) p
    //an untied round: accept flip1 (bracketed); rejected rounds print to its right
    else return (sprintf "[%s]\&%s\\;;\\;" flip1 flip2) + path
}
 
Model(faircoin "" p).Reify(limit = 5) 
|> List.filter notThunk 
|> latexTable
Out[12]:
$$\begin{array} \hline Probability & Value \\ \hline p\left(1 - p\right) & [H]\&T\;;\; \\ \hline {p}^{3}\left(1 - p\right) & [H]\&T\;;\;H\&H\; \\ \hline {p}^{5}\left(1 - p\right) & [H]\&T\;;\;H\&H\;H\&H\; \\ \hline {p}^{3}{\left(1 - p\right)}^{3} & [H]\&T\;;\;H\&H\;T\&T\; \\ \hline p{\left(1 - p\right)}^{3} & [H]\&T\;;\;T\&T\; \\ \hline {p}^{3}{\left(1 - p\right)}^{3} & [H]\&T\;;\;T\&T\;H\&H\; \\ \hline p{\left(1 - p\right)}^{5} & [H]\&T\;;\;T\&T\;T\&T\; \\ \hline p\left(1 - p\right) & [T]\&H\;;\; \\ \hline {p}^{3}\left(1 - p\right) & [T]\&H\;;\;H\&H\; \\ \hline {p}^{5}\left(1 - p\right) & [T]\&H\;;\;H\&H\;H\&H\; \\ \hline {p}^{3}{\left(1 - p\right)}^{3} & [T]\&H\;;\;H\&H\;T\&T\; \\ \hline p{\left(1 - p\right)}^{3} & [T]\&H\;;\;T\&T\; \\ \hline {p}^{3}{\left(1 - p\right)}^{3} & [T]\&H\;;\;T\&T\;H\&H\; \\ \hline p{\left(1 - p\right)}^{5} & [T]\&H\;;\;T\&T\;T\&T\; \\ \hline\end{array}$$

Boy or Girl Paradox

The boy or girl paradox is a seeming paradox of probability in which multiple interpretations of a sentence lead to ambiguity about what the correct answer is. The problem, as stated by Martin Gardner in 1959, goes as follows:

  • Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?
  • Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

The first problem is straightforward:

In [13]:
let firstproblem p =
    cont{
        let! child1 = bernoulliChoice p ("boy", "girl")
        let! child2 = bernoulliChoice p ("boy", "girl")
        do! observe(child1 = "girl")
        return (child1,child2) }
    
Model(firstproblem p).Reify() 
|> normalize  
|> latexTable
Out[13]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{p\left(1 - p\right)}{p\left(1 - p\right) + {\left(1 - p\right)}^{2}} & (girl, boy) \\ \hline \frac{{\left(1 - p\right)}^{2}}{p\left(1 - p\right) + {\left(1 - p\right)}^{2}} & (girl, girl) \\ \hline\end{array}$$
In [14]:
Model(firstproblem (1/2Q)).Reify() 
|> normalize 
|> latexTable
Out[14]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{2} & (girl, boy) \\ \hline \frac{1}{2} & (girl, girl) \\ \hline\end{array}$$

Now, the second problem is trickier because there seem to be two valid interpretations. Below is how I interpreted it; at first, I was confused as to how one might consider it any other way:

In [15]:
let paradox p =
    cont{
        let! child1 = bernoulliChoice p ("boy", "girl")
        let! child2 = bernoulliChoice p ("boy", "girl")
        do! observe(child1 = "boy" || child2 = "boy")
        return (child1,child2)
    }
Model(paradox p).Reify() 
|> normalize 
|> List.map (keepRight Rational.expand)
|> latexTable
Out[15]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{p}{2 - p} & (boy, boy) \\ \hline \frac{1 - p}{2 - p} & (boy, girl) \\ \hline \frac{1 - p}{2 - p} & (girl, boy) \\ \hline\end{array}$$
In [16]:
Model(paradox (1/2Q)).Reify() 
|> normalize 
|> latexTable
Out[16]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{3} & (boy, boy) \\ \hline \frac{1}{3} & (boy, girl) \\ \hline \frac{1}{3} & (girl, boy) \\ \hline\end{array}$$

With some thought, I was able to realize that the second interpretation is about whether the first of the two children you happen to see is a boy.

In [17]:
let paradoxform2 p = cont{
    let! child1 = bernoulliChoice p ("boy", "girl")
    let! child2 = bernoulliChoice p ("boy", "girl") 
    let! seefirst = uniform [child1; child2]
    do! observe(seefirst = "boy")
    return (child1,child2) 
}
Model(paradoxform2 p).Reify()  
|> normalize  
|> latexTable      
Out[17]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{{p}^{2}}{{p}^{2} + p\left(1 - p\right)} & (boy, boy) \\ \hline \frac{p\left(1 - p\right)}{2\left({p}^{2} + p\left(1 - p\right)\right)} & (boy, girl) \\ \hline \frac{p\left(1 - p\right)}{2\left({p}^{2} + p\left(1 - p\right)\right)} & (girl, boy) \\ \hline\end{array}$$
In [18]:
Model(paradoxform2 (1Q/2Q)).Reify() 
|> normalize 
|> latexTable   
Out[18]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{2} & (boy, boy) \\ \hline \frac{1}{4} & (boy, girl) \\ \hline \frac{1}{4} & (girl, boy) \\ \hline\end{array}$$
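
Note how these entries simplify: the (boy, boy) probability above reduces to $p$, just as the (girl, girl) probability in the first problem reduces to $1 - p$:

$$\frac{p^2}{p^2 + p\left(1 - p\right)} = p \qquad \frac{{\left(1 - p\right)}^{2}}{p\left(1 - p\right) + {\left(1 - p\right)}^{2}} = 1 - p$$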

It's also noticeable, by comparing the generated equations, that even though the answer to both questions can be $\frac{1}{2}$, the manner or path there is quite different. I'll posit that there is yet another way to approach the problem, and that this approach is the most common (and incorrect). It goes: if one of them is a boy, then the other is either a boy or a girl. That is, it's given the same structure as the first problem.

In [19]:
latexTable [1Q/2Q, Value("boy","boy");
            1Q/2Q, Value("boy","girl") ] 
Out[19]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{2} & (boy, girl) \\ \hline \frac{1}{2} & (boy, boy) \\ \hline\end{array}$$

This is not a valid inference to make, as it does not properly account for the probability of which child was seen first. I've seen this problem argued over at length on the internet; even the Wikipedia treatment is quite long. I think, however, that this treatment is clear on the matter while remaining compact and manipulable.

Streaks

The following is a puzzle I found on Twitter:

Imagine three basketball players & hoops all lined up in a row, with each player attempting a single shot at the exact same time. Assume each player is a 50% shooter and not influenced by the other players.

After all three players have taken their shot, tell each player:

  1. Put on a red shirt if the player to your left hit their shot
  2. Put on a blue shirt if the player to your left missed their shot
  3. Put on a grey shirt otherwise

You ask for a player with a blue shirt to step forward. What is the probability his/her shot was successful?

My straightforward interpretation of how it is written yields 50% as the answer.

In [20]:
type Shot = Score | Miss | NoShot

let assignShirtWith = function Some Score -> "red" | Some Miss -> "blue" | _ -> "grey"

cont {
 //Sample Shots
 let! shotResult1 = bernoulliChoice (1Q/2Q) (Score, Miss)
 let! shotResult2 = bernoulliChoice (1Q/2Q) (Score, Miss)
 let! shotResult3 = bernoulliChoice (1Q/2Q) (Score, Miss)
 //Get Colors
 let color1 = assignShirtWith None 
 let color2 = assignShirtWith (Some shotResult1)
 let color3 = assignShirtWith (Some shotResult2) 
 //Sample random person
 let! shotresult,color = uniform [shotResult1,color1; shotResult2,color2; shotResult3,color3]
 //condition on blue
 do! (observe (color = "blue"))
 return shotresult 
} |> exact_reify 
  |> normalize 
  |> latexTable   
Out[20]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{2} & Score \\ \hline \frac{1}{2} & Miss \\ \hline\end{array}$$

This is incorrect, however; the issue is a failure to account for streaks (in the model above, a player's shirt color depends only on a neighbor's shot, so conditioning on a blue shirt says nothing about the wearer's own shot). I also think the occluding way the problem is described is itself part of the difficulty. Stripped of extraneous detail, the problem is really about collecting streaks of misses of at least some length. Then, considering all the ways the shots could have gone, what is the chance that a randomly selected shot from those following such a streak is a score? I then normalize, as we've filtered away some paths. Here I will skip using variables and work directly with $p=0.5$. For a low number of shots, working with rational numbers is still clearer than with floats (at higher numbers, rational numbers become essentially ungraspable). For clarity, I also show the possible collected shots.

In [21]:
let prettify (shot,coll) = sprintf "%A\;\mathrm{from}\;[%s]" shot (Strings.joinToStringWith "\;\&\;" coll)

let rec shootStreak streaklen streakcount collected prevShooterResult n = cont {
    if n = 0 then
        //pick uniformly among the collected shots; paths where nothing was collected are pruned
        let! shot = uniform collected
        return shot, collected
    else
        let! shotResult = bernoulliChoice (1/2Q) (Score, Miss)
        //extend or reset the running streak of misses
        let streakcount' = if prevShooterResult = Miss then streakcount + 1 else 0
        //collect any shot that follows a streak of at least `streaklen` misses
        let collected' = if streakcount' >= streaklen then shotResult::collected else collected
        return! shootStreak streaklen streakcount' collected' shotResult (n-1)
}
shootStreak 1 0 [] NoShot 3
|> exact_reify 
|> ProbVal.map prettify
|> normalize
|> latexTable   
Out[21]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{2} & Score\;\mathrm{from}\;[Score] \\ \hline \frac{1}{12} & Score\;\mathrm{from}\;[Score\;\&\;Miss] \\ \hline \frac{1}{12} & Miss\;\mathrm{from}\;[Score\;\&\;Miss] \\ \hline \frac{1}{6} & Miss\;\mathrm{from}\;[Miss] \\ \hline \frac{1}{6} & Miss\;\mathrm{from}\;[Miss\;\&\;Miss] \\ \hline\end{array}$$
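
Summing the Score rows gives the answer under this interpretation: $\frac{1}{2} + \frac{1}{12} = \frac{7}{12}$. The blue-shirted player's shot went in with probability $\frac{7}{12}$, not $\frac{1}{2}$.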

Here is an example with a higher number of shots and a longer required streak, to show more clearly what's going on.

In [22]:
shootStreak 3 0 [] NoShot 7
|> exact_reify 
|> normalize
|> ProbVal.map prettify
|> latexTable
Out[22]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{2} & Score\;\mathrm{from}\;[Score] \\ \hline \frac{1}{10} & Score\;\mathrm{from}\;[Score\;\&\;Miss] \\ \hline \frac{1}{40} & Score\;\mathrm{from}\;[Score\;\&\;Miss\;\&\;Miss] \\ \hline \frac{1}{160} & Score\;\mathrm{from}\;[Score\;\&\;Miss\;\&\;Miss\;\&\;Miss] \\ \hline \frac{1}{10} & Miss\;\mathrm{from}\;[Score\;\&\;Miss] \\ \hline \frac{1}{20} & Miss\;\mathrm{from}\;[Score\;\&\;Miss\;\&\;Miss] \\ \hline \frac{3}{160} & Miss\;\mathrm{from}\;[Score\;\&\;Miss\;\&\;Miss\;\&\;Miss] \\ \hline \frac{1}{10} & Miss\;\mathrm{from}\;[Miss] \\ \hline \frac{1}{20} & Miss\;\mathrm{from}\;[Miss\;\&\;Miss] \\ \hline \frac{1}{40} & Miss\;\mathrm{from}\;[Miss\;\&\;Miss\;\&\;Miss] \\ \hline \frac{1}{40} & Miss\;\mathrm{from}\;[Miss\;\&\;Miss\;\&\;Miss\;\&\;Miss] \\ \hline\end{array}$$

Differential Privacy

This differential privacy example is, I think, the neatest, in that we are not computing a truncated slice of some more general equation: the specification by code fully generates the corresponding equation. It's rare to run into such an example, where so natural a specification of a problem also yields a detailed mathematical description of it. By the way, I wrote an intro to differential privacy in this article, and you can read the wiki article too. The essence of differential privacy is that it is a way of gathering data while offering plausible deniability to the respondent.

In [24]:
Model(cont{
    let! belief = bernoulliChoice p_antibacon (@"Hate\;bacon🛑🥓",@"Love\;bacon❤🥓")
    let! flip = bernoulli p_coin
    if flip then return belief
    else return "Hate\;bacon🛑🥓"  
}).Reify() 
|> latexTable
Out[24]:
$$\begin{array} \hline Probability & Value \\ \hline {p_{antibacon}}{p_{coin}} + {p_{antibacon}}\left(1 - {p_{coin}}\right) + \left(1 - {p_{antibacon}}\right)\left(1 - {p_{coin}}\right) & Hate\;bacon🛑🥓 \\ \hline \left(1 - {p_{antibacon}}\right){p_{coin}} & Love\;bacon❤🥓 \\ \hline\end{array}$$

In the above example, an unethical bacon-loving society polls people on their thoughts on bacon, with differential privacy used to avoid shaming alleged heretic respondents (the coin-flip scheme is the classic randomized response technique). Notice that the hate-bacon answer is ambiguous, since there are multiple paths to it. Below is a slightly more complex example: a survey with 3 options for favorite sandwich type. This version offers a bit more privacy, since every endpoint now has multiple paths to it. Once again, it's remarkable how a straightforward declarative specification yields the mathematical formulas below.

In [25]:
Model(cont{
    let! sandwich = categorical [p_hotdog,"hotdog🌭";p_sw,"Sandwich🥖";p_vburg,@"Vegan\;Hamburger🍔"] 
    let! flip = bernoulli p_coin
    if flip then return sandwich
    else return! uniform ["hotdog🌭";"Sandwich🥖";"Vegan\\;Hamburger🍔"]  
}).Reify() 
|> List.map (keepRight Rational.reduce) 
|> latexTable
Out[25]:
$$\begin{array} \hline Probability & Value \\ \hline \frac{1}{3}\left(1 - {p_{coin}}\right){p_{hotdog}} + {p_{coin}}{p_{sandwich}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{sandwich}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{vburger}} & Sandwich🥖 \\ \hline \frac{1}{3}\left(1 - {p_{coin}}\right){p_{hotdog}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{sandwich}} + {p_{coin}}{p_{vburger}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{vburger}} & Vegan\;Hamburger🍔 \\ \hline {p_{coin}}{p_{hotdog}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{hotdog}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{sandwich}} + \frac{1}{3}\left(1 - {p_{coin}}\right){p_{vburger}} & hotdog🌭 \\ \hline\end{array}$$
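
As an aside, the standard randomized-response algebra shows how a pollster recovers the true rate from the bacon poll: the Hate row simplifies to ${p_{antibacon}}{p_{coin}} + \left(1 - {p_{coin}}\right)$, so writing $q$ for the observed fraction of Hate answers (a symbol introduced here, not in the output above) and solving gives

$$p_{antibacon} = \frac{q - \left(1 - p_{coin}\right)}{p_{coin}}$$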

Conclusion

In this fun little article, I've combined an exact discrete probability monad with a symbolic algebra system. In so doing, running (unfolding) the computation also generates a mathematical description. Due to the limits imposed by tractability, that description is usually (though not always!) just a snapshot, yet it can often be more illuminating and more general than a full simulation. I believe this has high potential as a pedagogical tool or aid, especially for looking at causality examples and phenomena like Berkson's or Simpson's paradox (which I plan to do someday, hopefully soonish).

One surprising takeaway is that mathematics is (probably) not declarative. In the descriptions above, the style of thinking that gets you to the equations is much more a detailed investigation of what must be or is occurring than it is a specification. This is not the first time I've wondered whether everyday math is really a simple imperative assembly language meant to run on a powerful computer capable of incredible deductive inference.

2018-11-04 - Deen Abiola