Beta Distribution: an Intuitive Explanation

Biostatistics Tutorial Toolkit for Bayesian Methods

Intuitively explain the Beta Distribution and its applications.

Hai Nguyen
April 11, 2021

Motivation

I learned the beta distribution in UIC’s Bayesian methods course and later tutored it, for example setting it up as the prior in a conjugate-distribution context. Even so, it was easy to forget because the material is dry and abstract. Here I try to combine the rigorous theory (UC coursework’s content) with intuitive thinking. That way, I was able to ‘permanently stamp’ the concept into my brain.

Definition

A continuous random variable \(X_B \sim Beta(\alpha, \beta)\) has a Beta distribution if its probability density function (PDF) is

\[ f_{X_B} (x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \ \ \text{for} \ 0 < x < 1, \]

where \(B(\cdot, \cdot)\) is the Beta function and the shape parameters satisfy \(\alpha, \beta > 0\).
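
As a quick sanity check, the density formula can be typed in by hand and compared with R's built-in `dbeta()`. This is only a sketch: the helper name `beta_pdf`, the argument name `beta_par`, and the choice \(\alpha = 2\), \(\beta = 5\) are illustrative assumptions, not part of the original post.

```r
# Hand-written Beta density that follows the definition term by term.
# `beta_pdf` and `beta_par` are hypothetical names chosen for this sketch.
beta_pdf <- function(x, alpha, beta_par) {
  x^(alpha - 1) * (1 - x)^(beta_par - 1) / beta(alpha, beta_par)
}

x <- c(0.1, 0.25, 0.5, 0.9)
manual  <- beta_pdf(x, alpha = 2, beta_par = 5)
builtin <- dbeta(x, shape1 = 2, shape2 = 5)

# Should be TRUE if the formula matches R's implementation
print(all.equal(manual, builtin))

# A density must integrate to 1 over (0, 1)
area <- integrate(beta_pdf, lower = 0, upper = 1, alpha = 2, beta_par = 5)$value
print(area)
```

The division by `beta(alpha, beta_par)` is what normalizes the curve so that the total area is 1.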

Intuitive interpretation

| Distribution | PDF | Probability as a … |
| --- | --- | --- |
| Binomial | \(f(x) = {n \choose x} p^x (1-p)^{n-x}\) | parameter \(\rightarrow\) the function of \(x\) |
| Beta | \(f(p) = \frac{1}{B(\alpha,\beta)} p^{\alpha - 1} (1-p)^{\beta - 1}\) | random variable \(\rightarrow\) the function of \(p\) |
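
The link between the two PDFs can be seen numerically. The sketch below fixes hypothetical data (\(x = 7\) successes in \(n = 10\) trials, values chosen only for illustration) and evaluates the binomial PDF as a function of \(p\). Up to the constant \(n + 1\), it coincides with the density of \(Beta(x + 1, n - x + 1)\).

```r
# Illustrative values, not taken from the original post
n <- 10
x <- 7
p_grid <- seq(0.05, 0.95, by = 0.05)

# Binomial PDF with x fixed, read as a function of p
binom_in_p <- dbinom(x, size = n, prob = p_grid)

# Beta density in p with alpha = x + 1 and beta = n - x + 1
beta_in_p <- dbeta(p_grid, shape1 = x + 1, shape2 = n - x + 1)

# The two curves have the same shape: their ratio is the constant n + 1
ratio <- beta_in_p / binom_in_p
print(range(ratio))
```

The exponents \(x\) and \(n - x\) of the binomial play the roles of \(\alpha - 1\) and \(\beta - 1\) in the Beta density, which is why the shapes match.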

The flexibility of the Beta distribution

Beta function

The beta function is

\[ B(x,y) = \int_0^1 t^{x-1} (1-t)^{y-1} \, dt = \frac{\Gamma(x) \Gamma(y)}{\Gamma(x+y)}, \]

where \(\Gamma(\cdot)\) is the Gamma function.
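
Both forms of the Beta function can be compared numerically in R. This is a minimal sketch with arbitrarily chosen arguments \(x = 2.5\), \(y = 4\); `integrate()`, `gamma()` and `beta()` are base R functions.

```r
# Arbitrary illustrative arguments
x <- 2.5
y <- 4

# Integral form of the Beta function
integrand <- function(t) t^(x - 1) * (1 - t)^(y - 1)
by_integral <- integrate(integrand, lower = 0, upper = 1, rel.tol = 1e-10)$value

# Gamma-ratio form
by_gamma <- gamma(x) * gamma(y) / gamma(x + y)

# Built-in Beta function for reference
builtin <- beta(x, y)

print(c(integral = by_integral, gamma_ratio = by_gamma, builtin = builtin))
```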

Gamma function

The Gamma function \(\Gamma\) is an extension of the factorial function, with its argument shifted down by 1, to real and complex numbers.

For positive integer \(n\):

\[ \Gamma (n) = (n-1)! = 1 \times 2 \times 3 \times \cdots \times (n-1) \]

For complex numbers with positive real part, the gamma function is defined by the integral below; by analytic continuation it extends to all complex numbers except the non-positive integers:

\[ \Gamma (t) = \int_0^{\infty} x^{t-1} e^{-x} dx \]
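
A short sketch in R illustrates both facts: the factorial identity for positive integers and the integral for a positive real argument. The helper name `gamma_by_integral` and the test value \(t = 4.5\) are assumptions made for this example.

```r
# Gamma(n) equals (n - 1)! for positive integers
n <- 1:6
gamma_values     <- gamma(n)
factorial_values <- factorial(n - 1)
print(rbind(gamma_values, factorial_values))

# Numerical evaluation of the integral for a positive real t
# (`gamma_by_integral` is a hypothetical helper for this sketch)
gamma_by_integral <- function(t) {
  integrate(function(x) x^(t - 1) * exp(-x), lower = 0, upper = Inf)$value
}

t_value <- 4.5
print(c(integral = gamma_by_integral(t_value), builtin = gamma(t_value)))
```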

Rewriting the Beta function in terms of the Gamma function lets us write the PDF of the Beta distribution with Gamma functions as well. The Beta function equals the product of the Gamma functions of the two parameters divided by the Gamma function of their sum (for the proof, see the Further reading section).

Main facts

\[ E[X_B] = \mu = \frac{\alpha}{\alpha + \beta}; \ \ V[X_B] = \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} \]

The standard uniform distribution \(\text{Unif} \ (0,1)\) is a special case of the beta distribution \(Beta \ (1,1)\), when \(\alpha = \beta = 1\).
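
These facts are easy to verify numerically. The sketch below uses an arbitrary example, \(Beta(4, 2)\), computes the mean and variance by numerical integration, and compares them with the closed-form expressions. It also checks that \(Beta(1, 1)\) has the same density as \(\text{Unif}(0,1)\) on a few interior points.

```r
# Arbitrary illustrative shape parameters
alpha    <- 4
beta_par <- 2

# Closed-form mean and variance
mean_formula <- alpha / (alpha + beta_par)
var_formula  <- alpha * beta_par / ((alpha + beta_par)^2 * (alpha + beta_par + 1))

# The same quantities by numerical integration of the density
mean_numeric  <- integrate(function(x) x * dbeta(x, alpha, beta_par), 0, 1)$value
second_moment <- integrate(function(x) x^2 * dbeta(x, alpha, beta_par), 0, 1)$value
var_numeric   <- second_moment - mean_numeric^2

print(c(mean_formula = mean_formula, mean_numeric = mean_numeric))
print(c(var_formula = var_formula, var_numeric = var_numeric))

# Beta(1, 1) and Unif(0, 1) share the same density
grid <- seq(0.1, 0.9, by = 0.1)
same_as_uniform <- isTRUE(all.equal(dbeta(grid, 1, 1), dunif(grid)))
print(same_as_uniform)
```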

The mode is \(\omega = \frac{\alpha - 1}{\alpha + \beta - 2}\) for \(\alpha, \beta > 1\).
The concentration is \(\kappa = \alpha + \beta\).
Definitions of \(\mu, \omega\) and \(\kappa\) can be inverted:

\[ \alpha = \mu\kappa, \ \ \beta = (1 - \mu)\kappa \]

\[ \alpha = \omega(\kappa - 2) + 1, \ \ \beta = (1 - \omega)(\kappa - 2) + 1, \ \ \kappa > 2. \]
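
The two inversions translate directly into small helper functions. The names `shape_from_mean` and `shape_from_mode` are hypothetical, chosen for this sketch, and the input values are only examples.

```r
# Shape parameters from mean mu and concentration kappa
shape_from_mean <- function(mu, kappa) {
  c(alpha = mu * kappa, beta = (1 - mu) * kappa)
}

# Shape parameters from mode omega and concentration kappa (requires kappa > 2)
shape_from_mode <- function(omega, kappa) {
  stopifnot(kappa > 2)
  c(alpha = omega * (kappa - 2) + 1, beta = (1 - omega) * (kappa - 2) + 1)
}

from_mean <- shape_from_mean(mu = 0.5, kappa = 8)
from_mode <- shape_from_mode(omega = 0.8, kappa = 12)
print(from_mean)
print(from_mode)

# Round trip: recover the mode from the shape parameters
mode_back <- unname((from_mode["alpha"] - 1) / (sum(from_mode) - 2))
print(mode_back)
```

The mode-based version is often more convenient when a prior belief is stated as "the most likely value is about \(\omega\)".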

The parameter \(\kappa\) measures how many observations are needed to change our prior belief about \(\mu\).
If \(\kappa\) is small, only a few new observations are needed.

Example. Concentration \(\kappa = 8\) around \(\mu = 0.5\) corresponds to \(\alpha = \mu \kappa = 4\) and \(\beta = (1 - \mu) \kappa = 4\).
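
One way to see what \(\kappa\) means is the conjugate Beta-Binomial update, a standard result that this post does not derive: with a \(Beta(\alpha, \beta)\) prior and \(s\) successes in \(n\) Bernoulli trials, the posterior is \(Beta(\alpha + s, \beta + n - s)\). The sketch below compares a weak prior (\(\kappa = 8\)) with a strong one (\(\kappa = 80\)), both centered at \(\mu = 0.5\). The data (9 successes in 10 trials) and the helper name `posterior_mean` are made up for illustration.

```r
# Posterior mean after a conjugate Beta-Binomial update.
# `posterior_mean` is a hypothetical helper for this sketch.
posterior_mean <- function(mu, kappa, successes, trials) {
  alpha_post <- mu * kappa + successes
  beta_post  <- (1 - mu) * kappa + (trials - successes)
  alpha_post / (alpha_post + beta_post)
}

# Made-up data: 9 successes in 10 trials
weak_prior   <- posterior_mean(mu = 0.5, kappa = 8,  successes = 9, trials = 10)
strong_prior <- posterior_mean(mu = 0.5, kappa = 80, successes = 9, trials = 10)

# The small-kappa prior moves much further from 0.5 towards the observed 0.9
print(c(kappa_8 = weak_prior, kappa_80 = strong_prior))
```

With \(\kappa = 8\), ten observations already outweigh the prior. With \(\kappa = 80\), the same data barely shift the mean.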

The parameterization in terms of the mean and standard deviation, valid when \(\sigma^2 < \mu (1 - \mu)\), is:

\[ \alpha = \mu \left[ \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right]; \ \ \beta = (1 - \mu) \left[ \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right] \]

When \(\alpha, \beta \geq 1\), the standard deviation is at most that of the uniform distribution on \([0,1]\), i.e. \(1/\sqrt{12} \approx 0.28867\).

Examples.

  1. For \(\mu = 0.5\), \(\sigma = 0.28867\) the shape parameters are \(\alpha = 1\), \(\beta = 1\).
  2. Find shape parameters of beta distribution with \(\mu = 0.5\), \(\sigma = 0.1\).
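
Both examples can be worked through with a small helper that implements the mean and standard deviation parameterization. The function name `shape_from_mean_sd` is a hypothetical choice for this sketch.

```r
# Shape parameters from mean mu and standard deviation sigma.
# Requires sigma^2 < mu * (1 - mu), otherwise no Beta distribution exists.
shape_from_mean_sd <- function(mu, sigma) {
  stopifnot(mu > 0, mu < 1, sigma > 0, sigma^2 < mu * (1 - mu))
  common <- mu * (1 - mu) / sigma^2 - 1
  c(alpha = mu * common, beta = (1 - mu) * common)
}

# Example 1: the uniform case, sigma = 1 / sqrt(12)
example_1 <- shape_from_mean_sd(mu = 0.5, sigma = sqrt(1 / 12))

# Example 2: mu = 0.5, sigma = 0.1
example_2 <- shape_from_mean_sd(mu = 0.5, sigma = 0.1)

print(example_1)
print(example_2)
```

For Example 2 the bracket evaluates to \(0.25 / 0.01 - 1 = 24\), so \(\alpha = \beta = 12\) up to floating-point rounding.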

In action

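library(ggplot2)

# Grid of probabilities; it only sets the x-range (0 to 1) over which stat_function() draws each curve
p <- seq(0,1,by=0.2)

df <- data.frame(p)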
# Vary alpha (1, 2, 4, 6, 8) with beta fixed at 2
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=1, shape2=2), aes(colour = "alpha=1,beta=2")) + 
  stat_function(fun=dbeta, args=list(shape1=2, shape2=2), aes(colour = "alpha=2,beta=2")) +
  stat_function(fun=dbeta, args=list(shape1=4, shape2=2), aes(colour = "alpha=4,beta=2")) +
  stat_function(fun=dbeta, args=list(shape1=6, shape2=2), aes(colour = "alpha=6,beta=2")) +
  stat_function(fun=dbeta, args=list(shape1=8, shape2=2), aes(colour = "alpha=8,beta=2")) +
  scale_y_continuous(limits=c(0,3.6)) +
  scale_colour_manual("", values = c("palegreen", "orange", "olivedrab", "blue", "black")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

# Vary beta (1, 2, 5, 6, 8) with alpha fixed at 2
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=2, shape2=1), aes(colour = "alpha=2,beta=1")) + 
  stat_function(fun=dbeta, args=list(shape1=2, shape2=2), aes(colour = "alpha=2,beta=2")) +
  stat_function(fun=dbeta, args=list(shape1=2, shape2=5), aes(colour = "alpha=2,beta=5")) +
  stat_function(fun=dbeta, args=list(shape1=2, shape2=6), aes(colour = "alpha=2,beta=6")) +
  stat_function(fun=dbeta, args=list(shape1=2, shape2=8), aes(colour = "alpha=2,beta=8")) +
  scale_y_continuous(limits=c(0,3.6)) +
  scale_colour_manual("", values = c("palegreen", "orange", "olivedrab", "blue", "black")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

# Beta(1, 1): the standard uniform distribution
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=1, shape2=1), aes(colour = "alpha=1,beta=1")) +
  scale_y_continuous(limits=c(0,3.6)) +
  scale_colour_manual("", values = c("green")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

# Symmetric cases (alpha = beta), from 0.5 up to 6
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=0.5, shape2=0.5), aes(colour = "alpha=0.5,beta=0.5")) + 
  stat_function(fun=dbeta, args=list(shape1=1, shape2=1), aes(colour = "alpha=1,beta=1")) +
  stat_function(fun=dbeta, args=list(shape1=2, shape2=2), aes(colour = "alpha=2,beta=2")) +
  stat_function(fun=dbeta, args=list(shape1=4, shape2=4), aes(colour = "alpha=4,beta=4")) +
  stat_function(fun=dbeta, args=list(shape1=6, shape2=6), aes(colour = "alpha=6,beta=6")) +
  scale_y_continuous(limits=c(0,3.6)) +
  scale_colour_manual("", values = c("palegreen", "orange", "olivedrab", "blue", "black")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

# Same mean, different concentration: Beta(400, 80) vs Beta(40, 8), and Beta(30, 70) vs Beta(3, 7)
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=400, shape2=80), aes(colour = "alpha=400,beta=80")) + 
  stat_function(fun=dbeta, args=list(shape1=40, shape2=8), aes(colour = "alpha=40,beta=8")) +
  stat_function(fun=dbeta, args=list(shape1=30, shape2=70), aes(colour = "alpha=30,beta=70")) +
  stat_function(fun=dbeta, args=list(shape1=3, shape2=7), aes(colour = "alpha=3,beta=7")) +
  scale_y_continuous(limits=c(0,25)) +
  scale_colour_manual("", values = c("blue", "green", "orange", "black")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

# Beta(1, 1) and Beta(3, 3) compared with a Bernoulli with prob = 0.5
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=1, shape2=1), aes(colour = "alpha=1,beta=1")) +
  stat_function(fun=dbeta, args=list(shape1=3, shape2=3), aes(colour = "alpha=3,beta=3")) +
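  # Bernoulli(0.5) is discrete, so plot its two point masses (at 0 and 1) rather than a curve;
  # passing dbinom to stat_function would evaluate it at non-integer x and raise warnings
  geom_point(data = data.frame(p = c(0, 1), mass = dbinom(c(0, 1), size = 1, prob = 0.5)),
             aes(y = mass, colour = "Bernoulli w/ prob=0.5")) +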
  scale_y_continuous(limits=c(0,3.6)) +
  scale_colour_manual("", values = c("red","green","black")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

# Beta(9, 3): a single curve with its mode at 0.8
ggplot(data=df, aes(x=p))+
  stat_function(fun=dbeta, args=list(shape1=9, shape2=3), aes(colour = "alpha=9,beta=3")) +
  scale_y_continuous(limits=c(0,3.4)) +
  scale_colour_manual("", values = c("blue")) + 
  ylab("Density") +
  ggtitle("PDF of Beta Distribution") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5))

From the plots above we notice that:

Plots in Shiny

I plan to build a Shiny app that plots the beta distribution for user-specified shape parameters (still in progress).

Further reading

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hai-mn/hai-mn.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Nguyen (2021, April 11). HaiBiostat: Beta Distribution: an Intuitive Explanation. Retrieved from https://hai-mn.github.io/posts/2021-04-11-beta-distribution-in-intuitive-explanation/

BibTeX citation

@misc{nguyen2021beta,
  author = {Nguyen, Hai},
  title = {HaiBiostat: Beta Distribution: an Intuitive Explanation},
  url = {https://hai-mn.github.io/posts/2021-04-11-beta-distribution-in-intuitive-explanation/},
  year = {2021}
}