Bayesian inference in F# - Part I - Background
Luca Bolognese
My interest in Bayesian inference comes from my dissatisfaction with ‘classical’ statistics. Whenever I want to know something, for example the probability that an unknown parameter is between two values, ‘classical’ statistics seems to answer a different and more convoluted question.
Try asking someone what it means that the 95% confidence interval for X is (x1, x2). Very likely he will tell you that it means there is a 95% probability that X lies between x1 and x2. That is not the case in classical statistics; it is the case in Bayesian statistics. Also, all the funny business of defining a null hypothesis for the sake of proving its falseness always made my head spin. You don’t need any of that in Bayesian statistics. More recently, my discovery that statistical significance is a harmful concept, instead of the bedrock of knowledge I always thought it to be, shook my confidence in ‘classical’ statistics even more.
Admittedly, I’m not that smart. If I have a hard time getting an intuitive understanding of something, it tends to slip from my mind a couple of days after I’ve learned it. This happens all the time with ‘classical’ statistics. I feel like I have learned the same thing ten times, because I continuously forget it. This doesn’t happen with Bayesian statistics. It just makes intuitive sense.
At this point you might be wondering what ‘classical’ statistics is. I use the term classical, but I really shouldn’t. Classical statistics is normally just called ‘statistics’, and it is all you learn if you pick up almost any book on the topic (for example the otherwise excellent Introduction to the Practice of Statistics). Bayesian statistics is just a footnote in such books. This is a shame.
Bayesian statistics provides a much clearer and more elegant framework for understanding the process of inferring knowledge from data. The underlying question it answers is: if I hold an opinion about something and I receive additional data on it, how should I rationally change my opinion? This question of how to update your knowledge is at the very foundation of human learning and progress in general (the scientific method, for example, is based on it). We had better be sure that the way we answer it is sound.
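To make the updating question concrete, here is a minimal F# sketch of the idea (not the model from the upcoming posts; the coin hypotheses and the likelihood numbers are made up for illustration): hold a prior over hypotheses, weigh each hypothesis by how well it explains the observed data, and renormalize.

```fsharp
// Two made-up hypotheses about a coin: it is fair, or it is biased towards heads.
type Coin = Fair | Biased

// Prior belief before seeing any data.
let prior = [ Fair, 0.5; Biased, 0.5 ]

// Likelihood of observing a head under each hypothesis (illustrative numbers).
let likelihoodOfHead coin =
    match coin with
    | Fair   -> 0.5
    | Biased -> 0.9

// Bayes' rule: weigh the prior by the likelihood, then renormalize.
let update (prior: ('a * float) list) (likelihood: 'a -> float) =
    let weighted = prior |> List.map (fun (h, p) -> h, p * likelihood h)
    let total = weighted |> List.sumBy snd
    weighted |> List.map (fun (h, p) -> h, p / total)

// Posterior after observing a single head: roughly [ Fair, 0.36; Biased, 0.64 ].
let posterior = update prior likelihoodOfHead
```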
You might wonder how it is possible to go against something that is as widely accepted and taught as ‘classical’ statistics. Well, very many things that most people believe are wrong. I always like to cite old Ben on this: “The fact that other people agree or disagree with you makes you neither right nor wrong. You will be right if your facts and your reasoning are correct…” This little rule has always served me well.
In this series of posts I will give examples of Bayesian statistics in F#. I am not a statistician, which puts me in the very dangerous category of ‘people who are not statisticians but talk about statistics’. To try to mitigate the problem I enlisted the help of Ralf Herbrich, who is a statistician and can catch my most blatant errors. Obviously I’ll manage to hide my errors so cleverly that not even Ralf will spot them, in which case the fault is mine alone.
In the next post we’ll look at some F# code to model the Bayesian inference process.
Tags
- FSHARP
- STATISTICS
Comments
barrkel
2008-11-07T15:09:46Z
Can you recommend some reading for Bayesian statistics?
lucabol
2008-11-07T15:30:44Z
One of the reasons it is not that popular is the lack of good introductory books on it. The one I use is “Bayesian Data Analysis (2nd edition)” by Gelman, Carlin, Stern and Rubin.
It is a good book with plenty of practical examples, but it is not exactly easy. In this series of posts I’ll try to keep things easy. We’ll see.
configurator
2008-11-09T08:25:07Z
So what does “the 95% confidence interval for X is (x1, x2)” mean in ‘classical’ statistics?
lucabol
2008-11-09T09:16:32Z
A confidence interval describes what happens under repeated sampling: if you repeated the experiment many (how many?) times and recomputed the interval each time, a given proportion of those intervals would contain the true parameter. It is a statement about the sampling procedure, not about the distribution of the underlying parameter you set out to discover. As I said in the blog post, things get pretty convoluted in classical statistics.
From Wikipedia (http://en.wikipedia.org/wiki/Confidence_interval): “For a given proportion p (where p is the confidence level), a confidence interval for a population parameter is an interval that is calculated from a random sample of an underlying population such that, if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question …
Confidence intervals play a similar role in frequentist statistics to the credibility interval in Bayesian statistics. However, confidence intervals and credibility intervals are not only mathematically different; they have radically different interpretations.
A Bayesian interval estimate is called a credible interval. Using much of the same notation as above, the definition of a credible interval for the unknown true value of θ is, for a given α,
Pr(u(x) < Θ < v(x) | X = x) = 1 - α
Here Θ is used to emphasize that the unknown value of θ is being treated as a random variable. The definitions of the two types of intervals may be compared as follows.
The definition of a confidence interval involves probabilities calculated from the distribution of X for given (θ,φ) (or conditional on these values) and the condition needs to hold for all values of (θ,φ).
The definition of a credible interval involves probabilities calculated from the distribution of Θ conditional on the observed values of X=x and marginalised (or averaged) over the values of Φ, where this last quantity is the random variable corresponding to the uncertainty about the nuisance parameters in φ.
Note that the treatment of the nuisance parameters above is often omitted from discussions comparing confidence and credible intervals but it is markedly different between the two cases.
In some simple standard cases, the intervals produced as confidence and credible intervals from the same data set can be identical. They are always very different if moderate or strong prior information is included in the Bayesian analysis.
Meaning and interpretation
For users of frequentist methods, various interpretations of a confidence interval can be given.
- The confidence interval can be expressed in terms of samples (or repeated samples): “Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time.” [5] Note that this need not be repeated sampling from the same population, just repeated sampling [6].
- The explanation of a confidence interval can amount to something like: “The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level”[7]. In fact, this relates to one particular way in which a confidence interval may be constructed.
- The probability associated with a confidence interval may also be considered from a pre-experiment point of view, in the same context in which arguments for the random allocation of treatments to study items are made. Here the experimenter sets out the way in which they intend to calculate a confidence interval and know, before they do the actual experiment, that the interval they will end up calculating has a certain chance of covering the true but unknown value. This is very similar to the “repeated sample” interpretation above, except that it avoids relying on considering hypothetical repeats of a sampling procedure that may not be repeatable in any meaningful sense.
In each of the above, the following applies. If the true value of the parameter lies outside the 90% confidence interval once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance.
Users of Bayesian methods, if they produced an interval estimate, would by contrast want to say “My degree of belief that the parameter is in fact in this interval is 90%” [8]. See Credible interval. Disagreements about these issues are not disagreements about solutions to mathematical problems. Rather they are disagreements about the ways in which mathematics is to be applied.
”
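As an aside, here is what the Bayesian statement looks like operationally, as a rough F# sketch. The Beta(3, 9) posterior and the sample size below are arbitrary illustrative choices, not taken from any particular data set: draw from the posterior of the unknown parameter and report the central 90% of the draws as the credible interval.

```fsharp
open System

let rng = Random(42)

// Sample from Beta(a, b) for integer a and b, using the fact that the a-th
// smallest of (a + b - 1) independent uniform draws has a Beta(a, b) distribution.
let sampleBeta a b =
    Array.init (a + b - 1) (fun _ -> rng.NextDouble())
    |> Array.sort
    |> fun xs -> xs.[a - 1]

// 10,000 draws from an (arbitrary, illustrative) Beta(3, 9) posterior for theta.
let draws = Array.init 10000 (fun _ -> sampleBeta 3 9) |> Array.sort

// Central 90% credible interval: the 5th and 95th percentiles of the draws.
let lower = draws.[draws.Length * 5 / 100]
let upper = draws.[draws.Length * 95 / 100]
// Bayesian reading: "my degree of belief that theta lies in (lower, upper) is 90%".
```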
Eber Irigoyen
2008-11-09T23:19:52Z
“Admittedly, I’m not that smart.”
where does that leave us, mere mortals?