This document is a discussion and implementation of the second extended example from Horton (2013): "I Hear, I Forget. I Do, I Understand: A Modified Moore-Method Mathematical Statistics Course", The American Statistician 67(4), 219-228 (although the actual example is originally from Evans and Rosenthal (2004): Probability and Statistics: The Science of Uncertainty, New York: W. H. Freeman and Company). The purpose of this document is to illustrate some of the challenges that instructors might face when teaching topics that require computation to actually implement, but where computational skills may or may not be part of the learning objectives for the course. This is in slight contrast to the other tutorial covering an example from the Horton (2013) paper, which focusses more on how computation can be used to enhance student understanding of topics that would traditionally be taught in a purely analytical manner.
This document contains the fully worked example, discussion, and several implementations of algorithms for sampling from a probability distribution. Both base R and, where appropriate, more modern implementations are given side-by-side, and instructors can choose which to use depending on the learning outcomes for the course. In particular, the base R versions are usually better for a course where computation is used but is not necessarily a main learning objective; in contrast, the more advanced versions might be better if R programming is a main learning objective.
The computational concepts here are appropriate for students with one introductory programming course, and ideally one applied course that uses R. If they don’t have the latter, a brief introduction to R programming may be required.
The learning objectives of the example covered here may include:
- R skills, and the ability to use them fluently in the above; for example, understand functions well enough to be able to use integrate without difficulty
- ggplot, and for this to be the default choice for plotting
The question for students is as follows (this is paraphrased but otherwise taken directly from Horton (2013)): consider a random variable with density \(f(y) = c(1+|y|)^3 e^{-y^4}\) for \(y \in \mathbb{R}\), where \(c\) is an unknown normalizing constant. First, compute the CDF of this distribution. Second, write a program that generates a sample from it.
The first part could be done analytically (although it’s difficult and messy); we will show how this part of the question can be solved computationally. The second part literally asks students to write a program.
The CDF is the integral of the density, \[ F(y) = \int_{-\infty}^{y} f(s)ds \] Students might approach this problem by trying to brute-force evaluate this integral using their knowledge of calculus. This is difficult, but it’s a difficult calculus problem, not a difficult statistical problem. If calculus skills are what’s meant to be tested here then that’s one thing, but if the point of the problem is to have students develop practical skills in working with complicated probability densities, then working through this problem analytically can remove focus from this goal.
Whatever the learning objectives, computation can be used to aid understanding of this problem, and to give students a way to check whether their answers are reasonable. R has a built-in integrate function that can be used to numerically evaluate improper integrals of reasonably well-behaved functions. Students can use this right away to find the normalizing constant \(c\):
# Define the function of interest
fx <- function(x) (1+abs(x))^3 * exp(-x^4)
# Integrate it over the entire domain
fx_int <- integrate(fx,-Inf,Inf)$value
fx_int
## [1] 6.809611
# The normalizing constant is the reciprocal of this
norm_const <- 1 / fx_int
norm_const
## [1] 0.1468513
Students should then double-check that their normalized density integrates to 1:
fxc <- function(x) norm_const * fx(x)
integrate(fxc,-Inf,Inf)$value
## [1] 1
The code here looks simple, but students with little knowledge of R can easily get trapped in trying to write it. In particular, it uses some features and concepts that might be unfamiliar to students with little programming background:
- Defining functions with function, which is itself a function: you give it an expression that takes one or more variables and returns a result. This is simple enough, but students will find their first functions hard to debug, in particular because the argument-passing semantics that R uses (pass-by-value, but with lazy evaluation) are somewhat unusual, and make it hard for first-time users to read the stack trace and figure out what went wrong
- Passing a function to integrate. integrate takes an argument which is a function, and applies an algorithm that uses that function, returning a value (actually, returning a list from which you have to extract that value, another concept to explain)
- R uses environments (analogous to namespaces in languages like Python and C++) to store and search for data. We used this here in the definition of fxc: the variable norm_const and the function fx are stored in the global environment; the function fxc will first look for these values in its own execution environment and, not finding them there, will search its enclosing environment (here the global environment, where fxc was defined), where it will find them. It then passes its own argument x to fx, evaluates that, multiplies the result by norm_const, and returns this value. An R programmer might think to write the code this way, but then to explain to students what it’s doing and why it works requires the introduction of several new concepts.
An alternative to introducing some of these new concepts is to, well, not introduce them, and write the code in a different way:
# Print out the value of norm_const
norm_const
## [1] 0.1468513
# Copy it and fx directly
fxc2 <- function(x) 0.1468513 * (1+abs(x))^3 * exp(-x^4)
integrate(fxc2,-Inf,Inf)$value
## [1] 1
The central tradeoff for the instructor seems to be time spent teaching programming vs. the quality of the resulting program. The second implementation of fxc gives exactly the same result, but requires copying and pasting of previously used code and output, which is tedious and introduces further possibility of human error. However, one might argue that blindly pulling variables from the global environment also introduces the possibility of error and should be considered bad practice! Every instructor will probably have different opinions about technical points like this; the point is, these are difficulties that should be thought through ahead of showing these examples to students, in order to ensure that the example helps achieve your desired learning objectives and to avoid falling too far down a rabbit hole.
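If an instructor does decide to explain the environment lookup rather than avoid it, a tiny standalone example (not part of the original exercise; the names a and f are arbitrary) can make the rule concrete before returning to the statistics:
# a is defined in the global environment
a <- 2
# f has no argument or local variable named a, so when f runs, R looks for a in f's
# execution environment, fails, and then searches the enclosing (global) environment
f <- function(x) a * x
f(3)   # 6
# The lookup happens when f is called, so changing a changes the result
a <- 5
f(3)   # 15
This is exactly the lookup that fxc relies on to find norm_const and fx.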
Moving on, we can also write an R function to evaluate the CDF. Whether this is used as the actual answer to the problem, or to aid students’ understanding and let them check their answer numerically, it is a useful exercise.
fx_cdf <- function(x) integrate(fxc,-Inf,x)$value
# Test it out
fx_cdf(-1) # Arbitrary
## [1] 0.08834757
fx_cdf(0) # Should equal fx_cdf(Inf) / 2, as density is symmetric
## [1] 0.5
fx_cdf(1) # Arbitrary, should equal 1 - fx_cdf(-1) again by symmetry
## [1] 0.9116524
fx_cdf(Inf) # Should equal 1
## [1] 1
This is a good time to mention to students strategies for checking their work. We checked two properties of the CDF that we knew from its definition: that \(F(0) = 1/2\), since the density is symmetric about zero, and that \(F(\infty) = 1\).
We probably want to check two more properties of \(F(x)\): that it is non-decreasing in \(x\), and that its shape actually matches that of \(f(x)\). Both of these can be accomplished using a plot. If students aren’t familiar with ggplot, it would have to be introduced at this point, and indeed this is a good example of using ggplot to plot functions:
# The plotting below assumes the tidyverse (ggplot2, dplyr, purrr) is loaded
library(tidyverse)
# Create a dataframe that holds the x-axis range that we want to plot
# Pass this to ggplot, and add a stat_function layer. ggplot does the rest
pdf_plt <- data_frame(x = c(-2,2)) %>%
ggplot(aes(x = x)) +
theme_light() +
stat_function(fun = fxc,colour = "purple") +
labs(title = "PDF",
x = "y",
y = "f(y)") +
scale_x_continuous(breaks = seq(-2,2,by=0.5))
# Need a vectorized version of fx_cdf
fx_cdf_v <- function(x) {
out <- numeric(length(x))
for(i in 1:length(x)) {
out[i] <- fx_cdf(x[i])
}
out
}
cdf_plt <- data_frame(x = c(-2,2)) %>%
ggplot(aes(x = x)) +
theme_light() +
stat_function(fun = fx_cdf_v,colour = "purple") +
labs(title = "CDF",
x = "y",
y = "F(y)") +
scale_x_continuous(breaks = seq(-2,2,by=0.5))
cowplot::plot_grid(pdf_plt,cdf_plt,nrow = 1)
We can see from these plots that the CDF follows the shape of the PDF, increasing sharply where the PDF is high and slowly where the PDF is low.
Even just plotting these functions uncovered another important concept in R programming, namely that of vectorization. ggplot (specifically stat_function) requires the functions it is given to accept vector arguments and return vector values. If students aren’t familiar with this yet, the instructor can dig a bit deeper into what happened here:
- The fxc function uses operations that are all already vectorized, meaning they take a vector as an argument and return the operation evaluated on each element of that vector. The operations +, *, abs, exp and ^ all work this way, among many others
- The fx_cdf function passes its argument to the upper named argument of the integrate function, which accepts only a single scalar value, not a vector. As such, the function fx_cdf can’t automatically accept and return vectors. We can fix this somewhat easily as above, by wrapping the function in a for loop.
If computation itself is a learning objective of the course, this is a great place to introduce the concept of a higher-order function, one which takes another function as an argument, and possibly returns a function as its return value. We can write a higher-order function that will vectorize its input as follows:
vectorize <- function(f) {
# f is a function that accepts scalar arguments and returns scalar values
# vectorize(f) will return a function, which accepts vector arguments and returns vector values
function(x) {
out <- numeric(length(x))
for (i in 1:length(x)) {
out[i] <- f(x[i])
}
out
}
}
fx_cdf_v2 <- vectorize(fx_cdf)
fx_cdf_v2(c(-1,0,1))
## [1] 0.08834757 0.50000000 0.91165243
There is a lot going on up there; this is a useful exercise only, again, if R programming is part of the learning objectives for the course.
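As an aside, base R already provides a higher-order function that does essentially this job: Vectorize() wraps a scalar function so that it accepts vector arguments. Whether to show it instead of, or alongside, the hand-written version above is another instructor judgment call; a minimal sketch:
# Base R's Vectorize() wraps fx_cdf in machinery that loops over vector input
fx_cdf_v3 <- Vectorize(fx_cdf)
fx_cdf_v3(c(-1, 0, 1))   # should match fx_cdf_v2(c(-1, 0, 1))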
The second part of the problem is to write a program that will produce a sample from the density \(f(y)\). We’ll go over the practical aspects of doing this three ways: Inverse Transform Sampling, Rejection Sampling, and MCMC. We won’t do the theory or make any claims about which one is better suited to this problem; we’ll just focus on potential challenges students might face trying to implement each.
If \(X \sim F\) with CDF \(F(x)\), and \(U \sim Unif(0,1)\), then \(F^{-1}(U) \overset{d}{=} X\). This can be used to generate a single point from \(F\) according to the following algorithm:
- Draw \(u\) from a \(Unif(0,1)\) distribution
- Compute \(x = F^{-1}(u)\); then \(x\) is a draw from \(F\)
This can be repeated using many independent random draws from \(U \sim Unif(0,1)\) to get a random sample from \(F\). The implementation is very straightforward… except we don’t know \(F(x)\) here, so we certainly don’t know \(F^{-1}(x)\).
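To see how easy the algorithm is when the inverse CDF is available in closed form, here is a quick illustration (not part of the original problem) for the Exponential(1) distribution, where \(F^{-1}(u) = -\log(1-u)\):
u <- runif(10000)          # Step 1: uniform draws
x <- -log(1 - u)           # Step 2: apply the closed-form inverse CDF
mean(x)                    # Should be close to 1, the Exponential(1) mean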
We can proceed using R by noting that if \(y = F^{-1}(x)\) then \(F(y) - x = 0\). R has a built-in function uniroot for finding roots of functions, i.e. solving equations of the form \(g(y) = 0\) for \(y\). We can use this to find \(y = F^{-1}(x)\) for any \(x\) as follows:
cdf_inv <- function(x) {
# Function returns F^-1(x)
# Define the function of y to pass to uniroot. y is now the variable, and x is considered a fixed constant
g <- function(y) fx_cdf(y) - x
# Pass it to uniroot; search for the root on a large interval (this practicality can be further
# discussed with students)
uniroot(g,lower = -10,upper = 10)$root
}
# Test it out
cdf_inv(fx_cdf(-1))
## [1] -1.000004
cdf_inv(fx_cdf(1))
## [1] 1.000008
cdf_inv(fx_cdf(-.5))
## [1] -0.4999956
cdf_inv(fx_cdf(.5))
## [1] 0.499996
cdf_inv(fx_cdf(-2))
## [1] -2.000024
cdf_inv(fx_cdf(2)) # Yikes!
## [1] 5.003514
cdf_inv(fx_cdf(1.5))
## [1] 1.500017
cdf_inv(fx_cdf(1.6))
## [1] 1.596018
cdf_inv(fx_cdf(1.8))
## [1] 1.829467
We see that our method works okay on points around the range of the density that contains most of the mass (about -2 to 2, although by 2 it gets pretty bad); out in the tails it can be very inaccurate. It remains to be seen whether this would affect the sampling. This is a good time to have students reflect on how cool this is: we couldn’t write down expressions for the CDF or its inverse, but we can still compute with them using R. It’s also a good time to point out the limitations; the naive numerical integration and root-finding methods we used are good, but aren’t always accurate, and understanding that this is something that needs to be checked in applications is important.
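One thing students might experiment with, as a sketch rather than a guaranteed fix, is tightening uniroot's convergence tolerance and widening the search interval, then checking how much of the tail inaccuracy remains; the deeper limitation is that \(F(y)\) is itself computed by numerical integration and is extremely flat in the tails:
# A variant of cdf_inv with a tighter tolerance and a wider search interval.
# Whether this meaningfully improves the tail behaviour is worth checking empirically.
cdf_inv_tight <- function(x) {
g <- function(y) fx_cdf(y) - x
uniroot(g,lower = -20,upper = 20,tol = 1e-12)$root
}
cdf_inv_tight(fx_cdf(2))   # compare with the value obtained above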
Now we have to write an algorithm to sample from the original density. This can be accomplished with a simple loop:
set.seed(34879)
B <- 10000 # Size of sample to generate
thesample <- numeric(B)
for (b in 1:B) {
# Generate a uniform random deviate
u <- runif(1)
# Compute the inverse CDF at it
thesample[b] <- cdf_inv(u)
# Print some progress. Don't print it every iteration- printing stuff to the screen
# is actually much more computationally intensive than actually doing an iteration of
# this algorithm!
if (b %% 500 == 0) {
cat("Produced",b,"samples\n")
}
}
## Produced 500 samples
## Produced 1000 samples
## Produced 1500 samples
## Produced 2000 samples
## Produced 2500 samples
## Produced 3000 samples
## Produced 3500 samples
## Produced 4000 samples
## Produced 4500 samples
## Produced 5000 samples
## Produced 5500 samples
## Produced 6000 samples
## Produced 6500 samples
## Produced 7000 samples
## Produced 7500 samples
## Produced 8000 samples
## Produced 8500 samples
## Produced 9000 samples
## Produced 9500 samples
## Produced 10000 samples
# Basic summary statistics; by symmetry we know the distribution we're sampling from has expected value 0,
# so let's check if we got that part right:
mean(thesample) # pretty close
## [1] 0.01076533
sd(thesample)
## [1] 0.7599331
range(thesample)
## [1] -1.650879 4.128738
# ...and we make a plot to see for sure. We'll cut it off at x = 2, even though our sample goes past that
data_frame(x = thesample) %>%
ggplot(aes(x = x)) +
theme_classic() +
geom_histogram(aes(y = ..density..),bins = 100,colour = "black",fill = "orange") +
stat_function(fun = fxc,colour = "purple") +
coord_cartesian(xlim = c(-2,2)) +
labs(title = "Sample from our density",
subtitle = "Inverse Transform Sampling",
x = "y",
y = "f(y)")
To summarize, implementing this simple sampling algorithm covered the following computational concepts:
- uniroot, and a discussion about the accuracy of numerical computations
- ggplot
This sampling can also be introduced using more advanced programming constructs, namely the map and reduce functions from the purrr package (part of the tidyverse). The above loop is simple enough that this probably isn’t necessary, but it is potentially a good example to help illustrate the use of map/reduce and working with data in lists in R. These concepts certainly aren’t new, and are implemented in base R through apply, lapply, sapply, vapply, and so on; the implementations in the purrr package provide an API that is consistent in its inputs and outputs, and much simpler to use and learn.
set.seed(34879)
B <- 10000
thesample <- runif(B) %>% # Generate the B unif(0,1) realizations in one line
map(~c(x = cdf_inv(.x))) %>% # Apply the cdf_inv function to each element of the above vector, returning a list of named vectors
reduce(bind_rows) # Concatenate the named vectors into a data_frame
thesample
## # A tibble: 10,000 x 1
## x
## <dbl>
## 1 0.9558366
## 2 1.1489711
## 3 -0.9452934
## 4 -0.2579777
## 5 0.7454143
## 6 0.8171312
## 7 0.8668995
## 8 0.3511792
## 9 -0.9762473
## 10 -0.8888388
## # ... with 9,990 more rows
Doing it this way introduces map/reduce in R: take data, apply a function to each datapoint individually (whether they be elements in a vector, rows in a data_frame, or arbitrary items in a list), then collapse the results together somehow. Specifically, we
- Generate B independent realizations of a Unif(0,1) random variable
- Apply cdf_inv to each of these using the purrr::map function, which takes a vector or list and a function of one variable as arguments, and applies that function to each element of the input. This is accomplished in base R using the apply family (a base R version of this pipeline is sketched after this list). Two advantages of using purrr are a consistent set of functions that are compatible with each other, take the same inputs in the same order and return predictable output, and the compact formula (~) syntax for defining anonymous functions. This makes this type of functional programming easier for students to learn, as there is much less time spent focussing on incidental details (which apply function do I use? What are the arguments? What is it going to return? Will it always return a list or sometimes a vector? And so on)
- reduce the list of named vectors returned by map into a single data_frame using dplyr::bind_rows
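For comparison, here is one way the same pipeline might look in base R; this is a sketch, and it returns a plain numeric vector rather than a tibble:
set.seed(34879)
u <- runif(B)
# vapply is like sapply, but we declare the type and length of each result,
# so the output is guaranteed to be a numeric vector
thesample_base <- vapply(u, cdf_inv, numeric(1))
head(thesample_base)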
We’ll implement another sampling algorithm, Rejection Sampling, which is still simple but slightly less so. The Rejection Sampling algorithm is basically:
- Draw a candidate point \(u\) uniformly from an interval \([x_{min}, x_{max}]\) that contains essentially all of the mass of \(f\)
- Draw a second value \(v\) uniformly from \([0, y_{max}]\), where \(y_{max}\) is at least as large as the maximum of \(f\)
- If \(v < f(u)\), accept \(u\) as a draw from \(f\); otherwise reject it and repeat
This algorithm is straightforward to state. To implement it requires considering a few details:
- How to choose the interval \([x_{min}, x_{max}]\) and the bound \(y_{max}\)
- Whether to loop for a fixed number of iterations, or until a fixed number of points have been accepted
- How to monitor the algorithm as it runs, for example by tracking the acceptance rate
With those decisions made we can implement our algorithm:
set.seed(29432)
xmin <- -3
xmax <- 3
ymax <- 0.7
N <- 10000 # Number of samples we want, NOT number of iterations that will be performed
numiter <- 0 # A counter to count the number of iterations performed
acceptancerate <- numeric() # Track the acceptance rate over iterations
thesample <- numeric()
numaccepted <- 0 # Counter of number accepted
while(numaccepted < N) {
u <- runif(1,xmin,xmax)
if (runif(1,0,ymax) < fxc(u)) {
numaccepted <- numaccepted + 1
thesample <- c(thesample,u)
}
numiter <- numiter + 1
acceptancerate <- c(acceptancerate,numaccepted / numiter)
if (numiter %% 5000 == 0) {
cat("Iterations:",numiter,"Accepted:",numaccepted,"Acceptance Rate:",scales::percent(numaccepted / numiter),"\n")
}
}
## Iterations: 5000 Accepted: 1232 Acceptance Rate: 24.6%
## Iterations: 10000 Accepted: 2408 Acceptance Rate: 24.1%
## Iterations: 15000 Accepted: 3603 Acceptance Rate: 24%
## Iterations: 20000 Accepted: 4841 Acceptance Rate: 24.2%
## Iterations: 25000 Accepted: 5993 Acceptance Rate: 24%
## Iterations: 30000 Accepted: 7188 Acceptance Rate: 24%
## Iterations: 35000 Accepted: 8424 Acceptance Rate: 24.1%
## Iterations: 40000 Accepted: 9580 Acceptance Rate: 23.9%
cat("Final Iterations:",numiter,"\nFinal Accepted:",numaccepted,"\nFinal Acceptance Rate:",scales::percent(numaccepted / numiter),"\n")
## Final Iterations: 41787
## Final Accepted: 10000
## Final Acceptance Rate: 23.9%
# Summary statistics and plot
mean(thesample) # pretty close
## [1] 0.001582394
sd(thesample)
## [1] 0.7591176
range(thesample)
## [1] -1.667341 1.696078
# ...and we make a plot to see for sure. We'll cut it off at x = 2, even though our sample goes past that
data_frame(x = thesample) %>%
ggplot(aes(x = x)) +
theme_classic() +
geom_histogram(aes(y = ..density..),bins = 100,colour = "black",fill = "orange") +
stat_function(fun = fxc,colour = "purple") +
coord_cartesian(xlim = c(-2,2)) +
labs(title = "Sample from our density",
subtitle = "Rejection Sampling",
x = "y",
y = "f(y)")
# Also plot the acceptance rate over iterations
data_frame(y = acceptancerate,
x = 1:length(acceptancerate)) %>%
ggplot(aes(x = x,y = y,group = 1)) +
theme_classic() +
geom_line() +
scale_x_continuous(labels = scales::comma_format()) +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "Acceptance Rate by Iteration",
subtitle = "Rejection Sampling Algorithm",
x = "Iteration",
y = "Acceptance Rate")
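A quick sanity check on these numbers (not in the original write-up): since the density integrates to 1, the long-run acceptance rate should be roughly the area under the density divided by the area of the proposal box, assuming the box really does contain essentially all of the mass:
# Area under fxc is 1 (it's a normalized density); the box has area (xmax - xmin) * ymax
1 / ((xmax - xmin) * ymax)
# = 1 / (6 * 0.7), approximately 0.238, consistent with the ~24% acceptance rate observed above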
Rejection sampling involves computational concepts similar to those in the previous example on inverse transform sampling (loops, etc). There are some new concepts introduced:
- A while loop, which runs until a condition is met rather than for a fixed, known number of iterations
- Growing result vectors with c() when the number of iterations isn’t known in advance
- Tracking a diagnostic (here, the acceptance rate) across iterations and plotting it
We will implement one final sampling algorithm, namely Metropolis-Hastings MCMC. What we need to get the computer to do is:
- Start the chain at some initial point
- At each iteration, propose a new point from a proposal distribution that depends on the current point
- Accept the proposed point with the Metropolis-Hastings acceptance probability, and otherwise keep the current point
- Repeat for many iterations, then discard an initial burn-in period and thin the remaining draws
There are several details to be worked out:
- What proposal distribution to use, and how spread out it should be
- Where to start the chain
- How many iterations to run, how long a burn-in to discard, and how much to thin
These represent a mix of theoretical and practical considerations. For our application here we’ll make some choices, which you of course are free to change:
- The proposal is a normal distribution centered at the current point with standard deviation 0.25
- The chain starts at 0 and runs for 1000 iterations
- The first 100 iterations are discarded as burn-in, and every 9th remaining draw is kept
Let’s see how we might implement this:
set.seed(24389)
niter <- 1000
burnin <- 100
thinning <- 9
xstart <- 0
iterations <- numeric(niter)
curpoint <- xstart
for (i in 1:niter) {
# Propose a new point
newpoint <- rnorm(1,curpoint,.25)
# Decide whether to accept
if (runif(1) < (fx(newpoint) * dnorm(curpoint,newpoint,.25)) / (fx(curpoint) * dnorm(newpoint,curpoint,.25))) {
iterations[i] <- newpoint
} else {
iterations[i] <- curpoint
}
curpoint <- iterations[i]
}
# Burn in and thin
iterations_out <- iterations[(burnin+1):niter][1:(niter-burnin) %% thinning == 0]
# Check out the trace plot, and histogram of samples
traceplot <- data_frame(y = iterations,
x = 1:length(iterations)) %>%
ggplot(aes(x = x,y = y,group = 1)) +
theme_classic() +
geom_path() +
labs(title = "MCMC Trace Plot",
x = "Iteration",
y = "Accepted Value") +
coord_flip()
histogramofsample <- data_frame(x = iterations_out) %>%
ggplot(aes(x = x)) +
theme_classic() +
geom_histogram(aes(y = ..density..),bins = 30,colour = "black",fill = "lightblue") +
stat_function(fun = fxc,colour = "purple") +
labs(title = "Histogram of MCMC Samples",
subtitle = "Purple curve: desired density",
x = "Sampled Value",
y = "Density")
cowplot::plot_grid(traceplot,histogramofsample,nrow = 1)
This implementation required some computational concepts:
- for loops
- ggplot, for the trace plot and the histogram of samples
Looking at the results, we see some immediate problems that can be discussed in a deeper treatment of MCMC. From a computational perspective, the natural thing to do is to change the inputs to the algorithm. We can make a principled choice here, based on our knowledge of the behaviour of the algorithm, to try a bimodal proposal density and lower the proposal standard deviation; but nonetheless, this requires rerunning the code with changed parameters. This introduces a potential integrity issue into our analysis: if we just start changing things willy-nilly, we lose track of which experiments we performed (you could keep track in a notebook, but most students probably won’t). If we mitigate this by just copying and pasting the whole algorithm for every new set of inputs, our code becomes cluttered and we increase the chances of human error. Best practice dictates that we write a function.
This is another layer of computational complexity to what is supposed to be a practical problem. Discussion of why we’re doing this, as above, should help the students appreciate what problem we’re solving by adding this extra layer of work to our procedure.
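One simple, concrete diagnostic students can run themselves before any re-tuning (a suggestion, not part of the original analysis) is the autocorrelation of the chain, using base R's acf(); successive draws from a random-walk proposal are typically highly autocorrelated, and comparing the raw chain with the thinned draws shows what thinning buys us:
# Autocorrelation of the raw chain vs. the burned-in, thinned draws
acf(iterations, main = "Raw MCMC chain")
acf(iterations_out, main = "After burn-in and thinning")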
Let’s write the function. First we’ll write a simple pair of proposal distribution functions, for computing the density of and sampling from an equal mixture of two normals, with low standard deviation. This should help our algorithm visit both modes of the distribution, without having to jump around the whole sample space, increasing the chances of visiting low-density areas. We’ll then implement a function to perform the above MCMC algorithm.
# Functions for computing density and sampling from a mixture of 2 normals,
# centered at mu and -mu
m2n_density <- function(x,mu,sig) {
.5 * dnorm(x,mu,sig) + .5 * dnorm(x,-mu,sig)
}
m2n_sample <- function(mu,sig) {
# mu is a scalar; the proposal is an equal mixture of N(mu, sig) and N(-mu, sig)
if (rbinom(1,1,.5) == 0) {
rnorm(1,mu,sig)
} else {
rnorm(1,-mu,sig)
}
}
mcmc_sample1 <- function(niter,propsd,burnin,thinning,xstart = 0) {
iterations <- numeric(niter)
curpoint <- xstart
for (i in 1:niter) {
# Propose a new point
newpoint <- m2n_sample(curpoint,propsd)
# Compute the acceptance probability
accprob <- (fx(newpoint) * m2n_density(curpoint,newpoint,propsd)) / (fx(curpoint) * m2n_density(newpoint,curpoint,propsd))
# Decide whether to accept
if (runif(1) < accprob) {
iterations[i] <- newpoint
} else {
iterations[i] <- curpoint
}
curpoint <- iterations[i]
}
# Burn in and thin
iterations_out <- iterations[(burnin+1):niter][1:(niter-burnin) %% thinning == 0]
# Check out the trace plot, and histogram of samples
traceplot <- data_frame(y = iterations,
x = 1:length(iterations)) %>%
ggplot(aes(x = x,y = y,group = 1)) +
theme_classic() +
geom_path() +
labs(title = "MCMC Trace Plot",
x = "Iteration",
y = "Accepted Value") +
scale_x_continuous(labels = scales::comma_format()) +
coord_flip()
histogramofsample <- data_frame(x = iterations_out) %>%
ggplot(aes(x = x)) +
theme_classic() +
geom_histogram(aes(y = ..density..),bins = 30,colour = "black",fill = "lightblue") +
stat_function(fun = fxc,colour = "purple") +
labs(title = "Histogram of MCMC Samples",
subtitle = "Purple curve: desired density",
x = "Sampled Value",
y = "Density")
# Return the sample and the plots
list(
sample = iterations_out,
traceplot = traceplot,
histogram = histogramofsample
)
}
# Try it out
testsample <- mcmc_sample1(niter = 1000,propsd = .25,burnin = 100,thinning = 9)
cowplot::plot_grid(testsample$traceplot,testsample$histogram,nrow = 1)
Not bad! The major advantage of keeping the code in a function is now if we want to assess the effect of changing an input, we can just call the function again. Our notebook contains a record of everything we’ve done, and the chance of human error remains the same as if we only called the function once.
testsample <- mcmc_sample1(niter = 1000,propsd = .1,burnin = 100,thinning = 9)
cowplot::plot_grid(testsample$traceplot,testsample$histogram,nrow = 1)
testsample <- mcmc_sample1(niter = 10000,propsd = .1,burnin = 1000,thinning = 9)
cowplot::plot_grid(testsample$traceplot,testsample$histogram,nrow = 1)
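Because the sampler now lives in a function, it is also easy to run a small, recorded batch of experiments in one go. As a sketch (the grid of standard deviations here is arbitrary), purrr::map can call mcmc_sample1 over several proposal standard deviations and keep everything in a named list:
# Run the sampler for a few proposal standard deviations and keep all results together
prop_sds <- c(0.05, 0.1, 0.25, 0.5)
mcmc_experiments <- map(prop_sds, ~mcmc_sample1(niter = 1000,propsd = .x,burnin = 100,thinning = 9))
names(mcmc_experiments) <- paste0("propsd_", prop_sds)
# e.g. look at the trace plot from one of the runs
mcmc_experiments[["propsd_0.25"]]$traceplot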
If computation is a focus of the course itself, we can turn to the question of how to improve the above code. One improvement might be to pass the proposal distribution itself as an argument to the sampling function, rather than just its standard deviation. This allows rapid testing of different proposals. This is a good point to introduce the different function syntaxes available in R, as provided by the rlang::as_function function. This lets us pass a function as an argument using any of the following forms:
- an anonymous function, e.g. function(x) x + 1
- the name of an existing function as a string, e.g. "rnorm"
- the function object itself, e.g. rnorm
- a one-sided formula, e.g. ~ .x + 1
These variants are becoming more and more popular in R programming.
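Here is a quick standalone illustration of what rlang::as_function does (the example functions are arbitrary):
# A formula becomes a function of .x (and .y, for two arguments)
add_one <- rlang::as_function(~ .x + 1)
add_one(2)                                  # 3
# An ordinary function is passed through unchanged
rlang::as_function(function(x) x + 1)(2)    # 3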
Adding this to our function can be done as follows:
mcmc_sample2 <- function(niter,proposal_sample,proposal_density,burnin,thinning,xstart = 0) {
# Convert proposal to functions
proposal_sample <- rlang::as_function(proposal_sample)
proposal_density <- rlang::as_function(proposal_density)
# The rest of the code remains mostly unchanged
iterations <- numeric(niter)
curpoint <- xstart
for (i in 1:niter) {
# Propose a new point
newpoint <- proposal_sample(curpoint)
# Compute the acceptance probability
accprob <- (fx(newpoint) * proposal_density(curpoint,newpoint)) / (fx(curpoint) * proposal_density(newpoint,curpoint))
# Decide whether to accept
if (runif(1) < accprob) {
iterations[i] <- newpoint
} else {
iterations[i] <- curpoint
}
curpoint <- iterations[i]
}
# Burn in and thin
iterations_out <- iterations[(burnin+1):niter][1:(niter-burnin) %% thinning == 0]
# Check out the trace plot, and histogram of samples
traceplot <- data_frame(y = iterations,
x = 1:length(iterations)) %>%
ggplot(aes(x = x,y = y,group = 1)) +
theme_classic() +
geom_path() +
labs(title = "MCMC Trace Plot",
x = "Iteration",
y = "Accepted Value") +
scale_x_continuous(labels = scales::comma_format()) +
coord_flip()
histogramofsample <- data_frame(x = iterations_out) %>%
ggplot(aes(x = x)) +
theme_classic() +
geom_histogram(aes(y = ..density..),bins = 30,colour = "black",fill = "lightblue") +
stat_function(fun = fxc,colour = "purple") +
labs(title = "Histogram of MCMC Samples",
subtitle = "Purple curve: desired density",
x = "Sampled Value",
y = "Density")
# Return the sample and the plots
list(
sample = iterations_out,
traceplot = traceplot,
histogram = histogramofsample
)
}
# Try it out with the same proposal as before
# Use the formula syntax to pass fixed arguments to the passed functions
testsample <- mcmc_sample2(niter = 1000,
proposal_sample = ~m2n_sample(mu = .x,sig = .1),
proposal_density = ~m2n_density(x = .x,mu = .y,sig = .1),
burnin = 100,
thinning = 9)
cowplot::plot_grid(testsample$traceplot,testsample$histogram,nrow = 1)
testsample <- mcmc_sample2(niter = 10000,
proposal_sample = ~m2n_sample(mu = .x,sig = .1),
proposal_density = ~m2n_density(x = .x,mu = .y,sig = .1),
burnin = 1000,
thinning = 9)
cowplot::plot_grid(testsample$traceplot,testsample$histogram,nrow = 1)
# Try it with a different proposal. Now you can do anything you want, like a noncentral t:
testsample <- mcmc_sample2(niter = 10000,
proposal_sample = ~rt(n=1,df=5,ncp=.x),
proposal_density = ~dt(x = .x,df=5,ncp = .y),
burnin = 1000,
thinning = 9)
## Warning in dt(x = .x, df = 5, ncp = .y): full precision may not have been
## achieved in 'pnt{final}'
## Warning in dt(x = .x, df = 5, ncp = .y): full precision may not have been
## achieved in 'pnt{final}'
## Warning in dt(x = .x, df = 5, ncp = .y): full precision may not have been
## achieved in 'pnt{final}'
## Warning in dt(x = .x, df = 5, ncp = .y): full precision may not have been
## achieved in 'pnt{final}'
cowplot::plot_grid(testsample$traceplot,testsample$histogram,nrow = 1)