Draft:Bayesian model comparison

Bayesian Model Comparison
Figure (BayesTheorem.png): Bayes' theorem, a fundamental concept in Bayesian statistics, is used to update the probability of a hypothesis as more evidence or information becomes available.
Key Concepts
Bayes Factors	A ratio of marginal likelihoods that quantifies the evidence in favor of one model compared to another.
Marginal Likelihood (Evidence)	The probability of the observed data given a model, integrated over all possible parameter values.
Posterior Model Probability	The probability that a model is true given the observed data and prior information.
Information Criteria	Approximations to Bayes Factors, e.g., BIC, AIC, DIC.
Predictive Accuracy	How well a model predicts new or unseen data, often assessed through cross-validation or WAIC.
Model Averaging	Combining predictions from multiple models, weighted by their posterior probabilities or predictive performance.
Methods
Separate Estimation	Comparing models based on posterior predictive distributions, Bayes factors, and information criteria.
Comparative Estimation	Assessing the 'distance' between posterior distributions using measures like Kullback-Leibler divergence.
Simultaneous Estimation	Exploring the model space using techniques like reversible jump MCMC (RJMCMC) or birth-and-death MCMC (BDMCMC).


Bayesian model comparison is the use of Bayesian statistics to compare how well competing statistical models are supported by data. It is used for diverse tasks such as variable selection in regression, determining the number of components in a mixture model, and choosing among parametric families. The goal may be to select a single "best" model, or to improve estimation through model averaging, in which expectation values from different models are averaged with weights given by their posterior probabilities.

Common methods for Bayesian model comparison include:

Separate estimation: Comparing models through posterior predictive distributions, Bayes factors, and approximations like BIC and DIC.
Comparative estimation: Assessing the "distance" between posterior distributions using measures like Kullback-Leibler divergence.
Simultaneous estimation: Exploring the model space using techniques like RJMCMC or BDMCMC.

Setup

The Bayesian evidence, or marginal likelihood, of a model $M$ is the likelihood of the observed data $y$ averaged over the prior distribution of the model parameters $\theta$:

$p(y|M)=\int p(y|\theta ,M)\,\pi (\theta |M)\,d\theta$

When comparing two models, $M_{0}$ and $M_{1}$, the Bayes factor is the ratio of their evidences:

$B_{01}={\frac {p(y|M_{0})}{p(y|M_{1})}}$

A Bayes factor greater than 1 favors $M_{0}$, while a value less than 1 favors $M_{1}$. The magnitude of the Bayes factor reflects the strength of evidence and is often interpreted using Jeffreys' scale.
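As a minimal sketch of these definitions, consider a coin-flip example that is not part of the sources above: $M_{0}$ fixes the success probability at 0.5, while $M_{1}$ places a uniform prior on it. Both evidences then have closed forms. The function names and the counts (100 trials) are illustrative assumptions.

```python
from math import comb, log


def evidence_fixed(k: int, n: int, theta: float = 0.5) -> float:
    """p(y | M0): binomial likelihood at a fixed theta (no free parameter)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def evidence_uniform(k: int, n: int) -> float:
    """p(y | M1): binomial likelihood averaged over a Uniform(0, 1) prior.

    The integral of C(n, k) * theta^k * (1 - theta)^(n - k) over [0, 1]
    equals 1 / (n + 1) for every k.
    """
    return 1.0 / (n + 1)


def bayes_factor_01(k: int, n: int) -> float:
    """B_01 = p(y | M0) / p(y | M1)."""
    return evidence_fixed(k, n) / evidence_uniform(k, n)


if __name__ == "__main__":
    # Illustrative data: k successes out of 100 trials.
    for k in (50, 60, 70):
        b01 = bayes_factor_01(k, 100)
        print(f"k={k}: B_01={b01:.4g}, ln B_01={log(b01):.3f}")
```

In this toy setting, counts near 50 give a Bayes factor above 1 (favoring the fixed-probability model) and counts far from 50 give a value below 1.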

The prior probability assigned to each model is often chosen to encode Occam's razor. A model with many free parameters will generally fit the observed data better, but it may overfit and perform poorly on new, unseen data. This preference for simplicity can be made explicit by choosing a model prior that decreases as the number of parameters grows.

Bayesian complexity measures the effective number of parameters that the data can support, accounting for parameters that are unconstrained by the data.^[1]
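One common way to compute an effective number of parameters, assumed here to follow the definition of Spiegelhalter et al. (2002), is the mean posterior deviance minus the deviance at the posterior mean. The sketch below applies it to a hypothetical model with a single unknown normal mean, known unit variance, and a flat prior, using synthetic data, so the result should be close to 1.

```python
import math
import random


def deviance(mu: float, data: list[float]) -> float:
    """D(mu) = -2 * log-likelihood for y_i ~ Normal(mu, 1)."""
    n = len(data)
    return sum((y - mu) ** 2 for y in data) + n * math.log(2 * math.pi)


def effective_parameters(draws: list[float], data: list[float]) -> float:
    """p_D = mean posterior deviance - deviance at the posterior mean."""
    mean_deviance = sum(deviance(mu, data) for mu in draws) / len(draws)
    posterior_mean = sum(draws) / len(draws)
    return mean_deviance - deviance(posterior_mean, data)


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(50)]  # synthetic observations
y_bar = sum(data) / len(data)
# With a flat prior, the posterior of mu is Normal(y_bar, 1 / n).
draws = [rng.gauss(y_bar, 1 / math.sqrt(len(data))) for _ in range(5000)]
p_d = effective_parameters(draws, data)
print(f"effective number of parameters: {p_d:.3f}")
```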

Instead of choosing a single "best" model, Bayesian model averaging (BMA) combines predictions from multiple models, weighted by their posterior probabilities. This approach acknowledges uncertainty about the true model, incorporating it into the final inference.
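The weighting step can be sketched as follows. The log evidences and per-model predictions are made-up numbers for illustration, and the helper names are hypothetical. Working with log evidences and subtracting the maximum before exponentiating avoids numerical underflow.

```python
import math


def posterior_model_probs(log_evidences, prior_probs=None):
    """Return p(M_i | y) from log p(y | M_i) and prior model probabilities."""
    m = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / m] * m  # equal prior probability for each model
    log_post = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def model_average(predictions, probs):
    """Average per-model predictions, weighted by posterior model probability."""
    return sum(pred * w for pred, w in zip(predictions, probs))


if __name__ == "__main__":
    log_evidences = [-10.0, -11.0, -14.0]  # illustrative values only
    predictions = [1.0, 2.0, 5.0]          # illustrative per-model estimates
    probs = posterior_model_probs(log_evidences)
    print("posterior model probabilities:", [round(p, 3) for p in probs])
    print("model-averaged prediction:", round(model_average(predictions, probs), 3))
```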

Bayesian stacking, a more recent technique, weights models based on their out-of-sample predictive performance, using the entire dataset for model fitting. This method relaxes the assumption that the true model is within the set of candidate models.

Approximations

Calculating the Bayesian evidence involves multi-dimensional integration, often computationally demanding. Several approximation methods exist, including:

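Laplace approximation: Approximates the posterior as a Gaussian centered on its mode, which reduces the evidence integral to a closed-form expression.
Thermodynamic integration: A numerical technique, related to simulated annealing, that obtains the log evidence by integrating the expected log-likelihood over a sequence of tempered posteriors between the prior and the posterior.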
Nested sampling: Recasts the multi-dimensional integral into a simpler one-dimensional form.
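As a rough illustration of the first method, the sketch below applies a Laplace approximation in its common form (a Gaussian fitted at the posterior mode) to a hypothetical binomial model with a uniform prior, where the exact evidence is known to be 1/(n + 1). The example model and function names are assumptions chosen so that the approximation can be checked against the exact answer.

```python
from math import comb, exp, log, pi


def log_evidence_laplace(k: int, n: int) -> float:
    """Laplace estimate of log p(y) for binomial data with a Uniform(0, 1) prior.

    Requires 0 < k < n so that the posterior mode lies inside (0, 1).
    """
    theta_hat = k / n  # posterior mode (equals the MLE under a uniform prior)
    # Log of likelihood times prior at the mode; the uniform log prior is 0.
    log_joint = log(comb(n, k)) + k * log(theta_hat) + (n - k) * log(1 - theta_hat)
    # Negative second derivative of the log posterior at the mode.
    curvature = k / theta_hat**2 + (n - k) / (1 - theta_hat) ** 2
    # One-dimensional Laplace formula: log_joint + (1/2) log(2 pi / curvature).
    return log_joint + 0.5 * log(2 * pi) - 0.5 * log(curvature)


def log_evidence_exact(n: int) -> float:
    """Exact log evidence for the same model: log(1 / (n + 1))."""
    return -log(n + 1)


if __name__ == "__main__":
    k, n = 60, 100  # illustrative counts
    print(f"Laplace: {exp(log_evidence_laplace(k, n)):.6f}")
    print(f"Exact:   {exp(log_evidence_exact(n)):.6f}")
```

The approximation improves as the number of observations grows and the posterior becomes closer to Gaussian. It can be poor for small samples or for modes near the boundary of the parameter space.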

Information criteria

A family of approximations to the Bayes factor has been derived from information-theoretic arguments, collectively known as "information criteria". They rely on simplifying assumptions that may or may not hold in practice.^[2] The most popular ones are:

Akaike information criterion (AIC): Penalizes models based on the number of parameters.
Bayesian information criterion (BIC): Similar to AIC, but with a stronger penalty for complexity.
Deviance information criterion (DIC): Generalizes AIC to hierarchical modeling, using the effective number of parameters.
Widely Applicable Information Criterion (WAIC): Generalizes AIC to singular statistical models, based on pointwise predictive densities.
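The first two criteria have simple standard forms: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where L is the maximized likelihood, k the number of free parameters, and n the number of observations. Lower values are preferred. The sketch below evaluates both for a hypothetical coin-flip comparison (fixed probability 0.5 versus a freely fitted probability). The data and names are illustrative assumptions.

```python
from math import comb, log


def aic(log_lik_max: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik_max


def bic(log_lik_max: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * log(n_obs) - 2 * log_lik_max


def binomial_log_lik(k: int, n: int, theta: float) -> float:
    """Log-likelihood of k successes in n trials at success probability theta."""
    return log(comb(n, k)) + k * log(theta) + (n - k) * log(1 - theta)


k, n = 60, 100  # illustrative counts
log_lik_fixed = binomial_log_lik(k, n, 0.5)   # no free parameter
log_lik_free = binomial_log_lik(k, n, k / n)  # one parameter, set to its MLE

print(f"fixed: AIC={aic(log_lik_fixed, 0):.3f}  BIC={bic(log_lik_fixed, 0, n):.3f}")
print(f"free:  AIC={aic(log_lik_free, 1):.3f}  BIC={bic(log_lik_free, 1, n):.3f}")
```

For these particular counts the two criteria disagree: AIC is lower for the one-parameter model, while BIC, with its penalty of ln n per parameter, is lower for the fixed model.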

Predictive accuracy

Model evaluation can instead focus on a model's predictive capacity rather than its fit to the observed data. Cross-validation, including leave-one-out cross-validation (LOO-CV), partitions the data to assess a model's performance on held-out observations, which guards against rewarding overfitting.

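Pareto smoothed importance sampling LOO-CV (PSIS-LOO-CV) approximates LOO-CV from a single fit to the full data, avoiding the cost of refitting the model once per observation, and stabilizes the importance weights involved, which is particularly useful for complex models.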
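For a conjugate model, exact LOO-CV can be computed in closed form, which makes the idea easy to see without importance sampling. The sketch below, an illustrative assumption rather than a method from the sources, scores a Bernoulli model with a Beta(1, 1) prior against a model with the probability fixed at 0.5, using the summed log predictive density of each held-out observation.

```python
import math


def elpd_loo_beta_bernoulli(data, a: float = 1.0, b: float = 1.0) -> float:
    """Sum over i of log p(y_i | y_{-i}) for Bernoulli data with a Beta(a, b) prior."""
    n = len(data)
    k = sum(data)
    total = 0.0
    for y in data:
        k_rest = k - y  # successes among the other n - 1 observations
        # Posterior predictive probability of a success given the rest.
        p_one = (a + k_rest) / (a + b + n - 1)
        total += math.log(p_one if y == 1 else 1.0 - p_one)
    return total


def elpd_loo_fixed(data, theta: float = 0.5) -> float:
    """Same score for a model with no free parameter: nothing is refit."""
    return sum(math.log(theta if y == 1 else 1.0 - theta) for y in data)


data = [1] * 60 + [0] * 40  # illustrative data: 60 successes, 40 failures
score_free = elpd_loo_beta_bernoulli(data)
score_fixed = elpd_loo_fixed(data)
print(f"free-probability model:  {score_free:.3f}")
print(f"fixed-probability model: {score_fixed:.3f}")
```

Higher scores indicate better out-of-sample prediction. PSIS-LOO-CV targets the same quantity when no closed form is available.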

Separate estimation

Consider two models, $M_{1}$ and $M_{2}$. For prediction, a natural Bayesian approach compares models based on their posterior predictive distributions. Another approach compares models using their posterior probabilities given the data. By Bayes' rule, the choice between models can be based on the posterior odds:

${\frac {p(M_{2}|y)}{p(M_{1}|y)}}={\frac {p(M_{2})}{p(M_{1})}}\times {\frac {p(y|M_{2})}{p(y|M_{1})}}$

The first factor is the prior odds. The second factor, the ratio of marginal likelihoods, is the Bayes factor (BF). It is obtained by integrating over all parameter values, not by maximizing over them as in a likelihood ratio. While theoretically attractive, Bayes factors can be difficult to calculate, especially for complex models, and are sensitive to the choice of prior.
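The sensitivity to prior choice can be illustrated with a hypothetical coin-flip comparison in which $M_{1}$ fixes the success probability at 0.5 and $M_{2}$ gives it a symmetric Beta(a, a) prior. The same data produce noticeably different Bayes factors as the prior parameter a changes. The model pair, the data, and the function names are illustrative assumptions.

```python
from math import comb, exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence_beta_binomial(k: int, n: int, a: float, b: float) -> float:
    """log p(y | M2): binomial likelihood integrated over a Beta(a, b) prior."""
    return log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)


def bayes_factor_21(k: int, n: int, a: float) -> float:
    """B_21 = p(y | M2) / p(y | M1), with M1: theta = 0.5, M2: theta ~ Beta(a, a)."""
    log_m1 = log(comb(n, k)) + n * log(0.5)
    log_m2 = log_evidence_beta_binomial(k, n, a, a)
    return exp(log_m2 - log_m1)


if __name__ == "__main__":
    k, n = 60, 100  # illustrative counts
    prior_odds = 1.0  # p(M2) / p(M1); equal prior probabilities assumed
    for a in (0.1, 1.0, 10.0):
        bf = bayes_factor_21(k, n, a)
        print(f"a={a}: B_21={bf:.3f}, posterior odds={prior_odds * bf:.3f}")
```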

Approximations to the Bayes factor, such as BIC and DIC, provide computationally efficient alternatives. These criteria penalize models with greater complexity, favoring parsimonious models that adequately explain the data. However, these approximations rely on specific assumptions and may not be appropriate for all model types.

Comparative estimation

Models can be compared by assessing the "distance" between their posterior (or posterior predictive) distributions. If the distance is small, the more parsimonious model might be preferred. Examples include the Kullback-Leibler divergence and entropy distance measures.
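As a small sketch of such a distance, suppose the posteriors of a shared parameter under two models are each summarized by a normal distribution (an assumption made here for illustration). The Kullback-Leibler divergence between two univariate normals then has a closed form.

```python
import math


def kl_normal(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(P || Q) for P = Normal(mu_p, sigma_p^2) and Q = Normal(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q) ** 2) / (2 * sigma_q**2)
        - 0.5
    )


if __name__ == "__main__":
    # Illustrative posterior summaries (mean, standard deviation) under two models.
    simple = (0.00, 1.00)
    complex_ = (0.10, 0.95)
    print(f"KL(simple || complex) = {kl_normal(*simple, *complex_):.4f}")
    print(f"KL(complex || simple) = {kl_normal(*complex_, *simple):.4f}")
```

The divergence is not symmetric, so the direction of comparison has to be stated. What counts as a "small" distance is a modeling judgment rather than something the formula provides.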

MCMC methods

Markov chain Monte Carlo (MCMC) can be used to perform Bayesian model selection. The idea is to construct a Markov chain over the space of possible models $M$ whose stationary distribution is the model posterior distribution, or some other target distribution, so that the chain visits each model with the corresponding frequency.

Reversible jump MCMC (also called trans-dimensional MCMC)^[3] allows "jumps" between models of different dimensions. Birth-and-death MCMC^[4]^[5] is an alternative that treats the time between jumps as a random variable, so that the posterior probability of each model is estimated by the proportion of time the chain spends in it.
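The sketch below is a deliberately simplified model-space sampler, not reversible jump MCMC itself. It assumes that each model's parameters have already been integrated out analytically, so the chain moves only between model labels and no dimension-matching step is needed. It uses a hypothetical coin-flip pair (probability fixed at 0.5 versus a uniform prior) with equal prior model probabilities, and estimates posterior model probabilities from the fraction of steps spent in each model.

```python
import random
from math import comb


def model_space_sampler(evidences, n_steps: int, seed: int = 0):
    """Metropolis sampler over model indices with stationary distribution
    proportional to the given evidences (equal prior model probabilities)."""
    rng = random.Random(seed)
    n_models = len(evidences)
    current = 0
    visits = [0] * n_models
    for _ in range(n_steps):
        proposal = rng.randrange(n_models)  # symmetric, uniform proposal
        # Accept with probability min(1, evidence ratio).
        if rng.random() < evidences[proposal] / evidences[current]:
            current = proposal
        visits[current] += 1
    return [v / n_steps for v in visits]


k, n = 60, 100  # illustrative counts
evidences = [
    comb(n, k) * 0.5**n,  # M0: theta fixed at 0.5
    1.0 / (n + 1),        # M1: theta ~ Uniform(0, 1), integrated analytically
]
frequencies = model_space_sampler(evidences, n_steps=20000)
exact = [e / sum(evidences) for e in evidences]
print("visit frequencies:", [round(f, 3) for f in frequencies])
print("exact posterior:  ", [round(p, 3) for p in exact])
```

In realistic problems the evidences are not available in closed form, which is the situation that reversible jump and birth-and-death samplers are designed for.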

Applications

Mixture models

Mixture models are widely used for data exhibiting heterogeneity. Several techniques exist for comparing mixture models. For instance, the DIC can be used when the mixture model is well defined. In other cases, alternative DIC estimators tailored for mixture models can be employed. Bayes factors, posterior predictive checks, and visual inspection of model fits also aid in selecting appropriate mixture models.

References

General references

Gelman, Andrew (2014). Bayesian data analysis. Chapman & Hall/CRC texts in statistical science (Third ed.). Boca Raton: CRC Press. ISBN 978-1-4398-4095-5.
Congdon, P. (2007). Bayesian Statistical Modelling. Wiley Series in Probability and Statistics. Wiley. ISBN 978-0-470-03593-1.
Robert, Christian P.; Casella, George (2004). "Monte Carlo Statistical Methods". Springer Texts in Statistics. New York, NY: Springer New York. doi:10.1007/978-1-4757-4145-2. ISBN 978-1-4419-1939-7. ISSN 1431-875X.
Kruschke, John K. (2015). "Model Comparison and Hierarchical Modeling". Doing Bayesian Data Analysis. Elsevier. pp. 265–296. doi:10.1016/b978-0-12-405888-0.00010-6. ISBN 978-0-12-405888-0.

K. P. Burnham and D. R. Anderson, Model Selection and Multi-model Inference: A Practical Information-theoretic Approach, 2nd edn (Springer, New York, 2002).
D. MacKay, Information theory, inference, and learning algorithms (Cambridge University Press, Cambridge, UK, 2003).
Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with discussion). Statistics and Computing 7, 253-272.
Celeux, G., Forbes, F., Robert, C.P. and Titterington, D.M. (2003). Deviance information criteria for missing data models. Cahiers du Ceremade 0325.
Congdon, P. (2001). Bayesian Statistical Modelling. Wiley, England.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (1995). Bayesian Data Analysis. Chapman and Hall, London.
George, E. and McCulloch, R. (1993). Variable selection via Gibbs sampling. J. American Statist. Association 88(423), 881-889.
Green, P. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika 82(4), 711-732.
Kass, R. and Raftery, A. (1995). Bayes factors. J. American Statist. Assoc. 90, 773-795.
Perez, J.M. and Berger, J. (2002). Expected posterior prior distributions for model selection. Biometrika 89, 491-512.
Richardson, S. and Green, P. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. Royal Statist. Soc. Series B 59 731-792.
Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York, second edition.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2002). Bayesian measures of model complexity and fit. J. Royal Statist. Society Series B 64(3), 583-639.