The previous post ended on this note:

/// Joint probability of an (attitude, action) pair, looked up in the action tables
let MaiaJointProb attitude action =
    let probOf actions = actions |> List.find (fun (a, _) -> a = action) |> snd
    match attitude with
    | Happy     -> probOf happyActions
    | UnHappy   -> probOf unHappyActions
    | Quiet     -> probOf quietActions
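
For completeness, here are the definitions this function relies on (in the full program they come before it). The types are from the previous post; the probability tables are assumed values, chosen to be consistent with the results printed later in this post.

type Attitude = Happy | UnHappy | Quiet
type Action = Smile | Cry | LookSilly

// Probability of each action given an attitude (assumed values)
let happyActions   = [ Smile, 0.6; Cry, 0.2; LookSilly, 0.2 ]
let unHappyActions = [ Smile, 0.2; Cry, 0.6; LookSilly, 0.2 ]
let quietActions   = [ Smile, 0.4; Cry, 0.3; LookSilly, 0.3 ]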

MaiaJointProb is just a matrix indexed by (attitude, action) pairs: three attitudes by three actions in our case. It simply gives the probability associated with each (attitude, action) pair. It is useful to think about it in these terms, because it makes it easier to grasp the following function:

/// Conditional probability of a mental state, given a particular observed action
let MaiaLikelihood action = fun attitude -> MaiaJointProb attitude action

This is simply a slice of that matrix: the action is fixed and the attitude varies. It answers the question: given that I observe a particular action, how plausible is each attitude for Maia? This is called the “likelihood function” in statistics. Its general form is: given that I observe an outcome, what is the likelihood that it was generated by a process with a particular parameter?
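
With the assumed tables, you can query the likelihood function directly. For a single observed Smile:

MaiaLikelihood Smile Happy      // 0.6
MaiaLikelihood Smile UnHappy    // 0.2
MaiaLikelihood Smile Quiet      // 0.4

Note that the three values sum to 1.2, not 1: a likelihood function is not a probability distribution over attitudes.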

A related question is then: what if I observe a sequence of independent actions? What is the probability that the baby has a certain attitude then? This is answered by the following:

/// Likelihood of a series of independent actions: the product of the individual likelihoods
let MaiaLikelihoods actions =
    let composeLikelihoods previousLikelihood action =
        fun attitude -> previousLikelihood attitude * MaiaLikelihood action attitude
    actions |> Seq.fold composeLikelihoods (fun attitude -> 1.)

It is a simple extension of the previous function: because the actions are assumed to be independent, you combine their likelihoods by multiplying them.
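
For example, with the assumed tables, observing two Smiles in a row gives:

MaiaLikelihoods [Smile; Smile] Happy     // 0.6 * 0.6 = 0.36
MaiaLikelihoods [Smile; Smile] UnHappy   // 0.2 * 0.2 = 0.04
MaiaLikelihoods [Smile; Smile] Quiet     // 0.4 * 0.4 = 0.16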

We now need to describe our prior. A prior is our preconceived notion about a particular parameter (in this case, the baby’s attitude). You might be tempted to express that notion with a single value, but that would be inaccurate: you also need to indicate how confident you are in it. In statistics you do that by choosing a distribution for your belief. This is one of the beauties of Bayesian statistics: everything is a probability distribution. In this case we really don’t have any prior belief, so we pick the uniform distribution.

/// Uniform prior: every attitude is equally plausible a priori
let MaiaUniformPrior attitude = 1. / 3.

Think of this as: you haven’t read any baby-attitude-specific study or received any external information about the likely attitude of Maia, so you cannot prefer one attitude over another.
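
If you did have prior information, say a study claiming that most babies are happy most of the time, you would encode it the same way. A hypothetical example (the numbers are made up, purely for illustration):

/// A hypothetical informed prior (made-up numbers)
let MaiaInformedPrior attitude =
    match attitude with
    | Happy   -> 0.5
    | UnHappy -> 0.2
    | Quiet   -> 0.3

Everything that follows would work unchanged with this prior in place of the uniform one.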

We are almost done. Now we apply Bayes’ theorem and get the un-normalized posterior distribution. Ignore the “un-normalized” part for a moment. What is a posterior distribution? It is your output, your return value. It says: given my prior belief about the value of a parameter, and given the outcomes I observed, this is what I now believe the parameter to be. In this case it goes like this: I had no opinion on Maia’s attitude to start with, but after observing her behavior for a while, I now think she is Happy with probability X, UnHappy with probability Y and Quiet with probability Z.
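
In symbols, this is just Bayes’ theorem with the normalizing denominator dropped:

posterior(attitude) ∝ prior(attitude) * likelihood(attitude)

The code is a direct transcription: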

/// Calculates the unNormalized posterior given prior and likelihood
let unNormalizedPosterior (prior:'a -> float) likelihood =
    fun theta -> prior theta * likelihood theta

We then need to normalize this thing (it doesn’t sum to one). The way to do it is to divide each value by the sum of the values over all possible attitudes (the support of the parameter).

/// All possible values for the unobservable parameter (mental state)
let support = [Happy; UnHappy; Quiet]
/// Normalize the posterior (it sums to 1.)
let posterior prior likelihood =
    let post = unNormalizedPosterior prior likelihood
    let sum = support |> List.sumBy post
    fun attitude -> post attitude / sum

We are done, and we can start modeling scenarios. Let’s say that you observe [Smile;Smile;Cry;Smile;LookSilly]. What could the underlying attitude of Maia be?

let maiaIsANormalBaby = posterior MaiaUniformPrior (MaiaLikelihoods [Smile;Smile;Cry;Smile;LookSilly])

We can then execute our little model:

maiaIsANormalBaby Happy
maiaIsANormalBaby UnHappy
maiaIsANormalBaby Quiet

And we get (0.5625, 0.0625, 0.375). So Maia is likely to be happy and unlikely to be unhappy.
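
As a sanity check, you can reproduce these numbers by hand with the assumed tables. The uniform prior scales all three values equally, so it cancels in the normalization, and the un-normalized posteriors are just the likelihoods:

Happy:   0.6 * 0.6 * 0.2 * 0.6 * 0.2 = 0.00864
UnHappy: 0.2 * 0.2 * 0.6 * 0.2 * 0.2 = 0.00096
Quiet:   0.4 * 0.4 * 0.3 * 0.4 * 0.3 = 0.00576

Their sum is 0.01536, and dividing each value by it gives exactly (0.5625, 0.0625, 0.375). Let’s now model an extreme case: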

/// Extreme cases
let maiaIsLikelyHappyDist = posterior MaiaUniformPrior (MaiaLikelihoods [Smile;Smile;Smile;Smile;Smile;Smile;Smile])
maiaIsLikelyHappyDist Happy
maiaIsLikelyHappyDist UnHappy
maiaIsLikelyHappyDist Quiet

And we get (0.944, 0.000431, 0.05). Now Maia is almost certainly Happy. Notice that I can make this statement confidently because the end result is exactly what I was looking for when I started: the probability of each attitude, given the observed actions. With classical statistics, that wouldn’t be the case.

A related question I might want to ask is: given the posterior distribution for attitude that I just found, what is the probability of observing a particular action? In other words, given the model that I built, what does it predict?

/// Probability of observing each action, given the posterior distribution over attitudes
let posteriorPredictive jointProb posterior =
    let composeProbs previousProbs attitude =
        fun action -> previousProbs action + jointProb attitude action * posterior attitude
    support |> Seq.fold composeProbs (fun action -> 0.)
let nextLikelyUnknownActionDist = posteriorPredictive MaiaJointProb maiaIsLikelyHappyDist

I don’t have the strength right now to explain the mathematical underpinning of this. In words, it says: considering that Maia can have one of three attitudes, each with the probability calculated above, what is the probability of observing a particular action? Notice that its signature is Action -> float, which is the compiler’s way of saying the same thing.
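
In symbols, it is a weighted average over the attitudes:

P(action) = Σ over attitudes of P(action | attitude) * posterior(attitude)

With the assumed tables, the prediction for Smile works out to 0.6 * 0.944 + 0.2 * 0.0004 + 0.4 * 0.055 ≈ 0.588, which is the first number printed below.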

Now we can run the thing.

nextLikelyUnknownActionDist Smile
nextLikelyUnknownActionDist Cry
nextLikelyUnknownActionDist LookSilly

And we get (0.588, 0.2056, 0.2055). Why is that? We’ll talk about it in the next post.