Given the ability of stochastic sampling, we may do the following things: (i) approximate p(x), (ii) integrate p(x), and (iii) optimize p(x).

A difficulty for stochastic sampling is that the form of p(x) or f(x) is usually complex, however, we can only sample directly from some easy-formed distributions, like discrete distribution, uniform distribution or unit Gaussian.

So, usually we sample from an easy-to-sample distribution q(x) instead of from p(x) or f(x), then we either assign each sample point a "weight", or an "acceptance probability" which decides whether we accept this point as a sample from p(x) or f(x).

Some basic sampling methods include:

- Monte Carlo: Sample from q(x). No weight or acceptance probability is calculated. Just pretends that samples from q(x) is from p(x), and use the samples to compute the integration of p(x).
- Weighted Sampling: Sample from q(x). Assign each sample x a weight w=p(x)/q(x). Use the weighted samples to compute the integration of p(x).
- Rejection Sampling: Sample from q(x), where q(x) > p(x) in the whole domain of p(x). Accept each sample x by an acceptance probability a=p(x)/q(x).

- Metropolis Sampling: Sample xt from a transition proposal q(xt|xt-1). Accept the sample by the probability

The intuition is to accept xt with high probability if f(xt) > f(xt-1). This requires that q(a|b)=q(b|a). - Metropolis-Hasting Sampling: Change the acceptance probability to

This trick removes the requirement of q(a|b)=q(b|a). - Gibbs Sampling: In case that f(x) is very high-dimensional, it is expansive to compute f(x). So we avoid computing acceptance probability by assuming a=1, which means always accept the sample.
- Simulated Annealing: Change the acceptance probability to

where T(t) decreases from a large value to 1. The intuition is that initially, a large T increases the probability to accept an xt which does not increase the value of f(xt).

References:

- B. Walsh. Markov Chain Monte Carlo and Gibbs Sampling. Lecture notes. 2004.
- David J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Chapter 29 and Chapter 30.