Talk:Bayesian inference in phylogeny

From Wikipedia, the free encyclopedia

Untitled

I've added this entry. Will take a few days to flesh it out. Any suggestions will be helpful. Stiwari 00:23, 17 September 2006 (UTC)


copy/paste from Maximum parsimony; move into article later

Maximum parsimony has more about this topic than this page does. Almost all of it needs to be moved to this article; am pasting it here for now...

Bayesian phylogenetics uses the likelihood function and is normally implemented using the same models of evolutionary change used in maximum likelihood (ML). It is very different, however, in both theory and application. Bayesian statistics explicitly takes into account one's a priori beliefs about the expected results of a test (the prior probability) and gives revised probabilities in light of the results of that test (the posterior probabilities). This is quite different from frequentist statistics, but rather similar to the way in which people ordinarily reason about questions.
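
A minimal Python sketch of how a prior is revised into a posterior. It assumes three hypothetical candidate trees ("T1", "T2", "T3") with invented prior and likelihood values; the names and numbers are illustrative only and do not come from any real analysis.

```python
# Hypothetical candidate trees with made-up priors and likelihoods (illustration only).
priors = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}          # uniform ("uninformative") prior
likelihoods = {"T1": 0.020, "T2": 0.005, "T3": 0.001}  # P(data | tree), invented values


def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a finite set of hypotheses."""
    # Numerator of Bayes' theorem: likelihood x prior for each hypothesis.
    unnormalized = {t: priors[t] * likelihoods[t] for t in priors}
    # P(data): the normalizing constant, summed over all hypotheses.
    evidence = sum(unnormalized.values())
    return {t: p / evidence for t, p in unnormalized.items()}


post = posterior(priors, likelihoods)
for tree, p in post.items():
    print(f"{tree}: prior {priors[tree]:.3f} -> posterior {p:.3f}")
```

With a uniform prior the posterior is simply the normalized likelihood (T1 ends up near 0.77); making the priors unequal shows how prior beliefs shift the result.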

Bayesian phylogenetic analysis uses Bayes' theorem, which relates the posterior probability of a tree to the likelihood of the data and the prior probability of the tree and model of evolution. However, unlike parsimony and likelihood methods, Bayesian analysis does not produce a single tree or a set of equally optimal trees. Instead, it uses the likelihoods and priors of trees in a Markov chain Monte Carlo (MCMC) simulation to sample trees in proportion to their posterior probability, thereby producing a credible sample of trees. Consequently, particular relationships (usually particular branches or clades) occur within this sample in proportion to their posterior probability. Thus, if a particular grouping appears in 759 of 1000 trees resulting from a Bayesian analysis, that group has an estimated posterior probability of 75.9%. Unlike other measures of support (such as bootstrap percentages), this value can be interpreted directly as the probability that the relationship is present in the true phylogeny of the organisms, given the data, the model, and the prior probabilities.
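
The clade-frequency calculation (759 of 1000 trees giving 75.9%) can be sketched in Python. This assumes, for simplicity, that each sampled tree is already represented as a set of clades, each clade a frozenset of taxon names; real programs read trees from Newick output and extract their bipartitions, which is not shown here. The taxa "A", "B", "C" and the sample are hypothetical.

```python
from collections import Counter


def clade_posteriors(sampled_trees):
    """Estimate each clade's posterior probability as the fraction of sampled trees containing it."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))  # count each clade at most once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}


# Hypothetical post-burn-in sample: 759 trees group A with B, 241 group A with C.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
sample = [{AB}] * 759 + [{AC}] * 241

support = clade_posteriors(sample)
print(f"P(A+B | data) ~= {support[AB]:.3f}")
```

The same counting applied to a real MCMC sample yields the clade credibility values usually drawn on a consensus tree.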

The straightforward interpretation of Bayesian posterior probabilities, the automatic production of a credible set of trees, and the relative computational ease of the Markov chain Monte Carlo approach (broadly comparable in computational time to a single ML analysis) are rapidly bringing Bayesian analysis into the mainstream. Much work is going into making Bayesian analyses more flexible; an especially promising line of inquiry, shared with ML analysis, is integrating likelihood estimates over nuisance parameters (branch lengths, model parameters), which should improve estimates of the variables of interest (usually the tree).
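
Integrating over nuisance parameters can be written as a marginalization. The notation here is assumed for illustration: tau is the tree topology, nu the branch lengths, theta the substitution-model parameters, and D the data.

```latex
P(\tau \mid D) = \int\!\!\int P(\tau, \nu, \theta \mid D)\, d\nu\, d\theta
  = \frac{\int\!\!\int P(D \mid \tau, \nu, \theta)\, P(\tau, \nu, \theta)\, d\nu\, d\theta}{P(D)},
\qquad
P(D) = \sum_{\tau} \int\!\!\int P(D \mid \tau, \nu, \theta)\, P(\tau, \nu, \theta)\, d\nu\, d\theta
```

In an MCMC run this integral is never computed explicitly: recording only the topology of each sampled state, and ignoring the sampled values of nu and theta, approximates the marginal posterior P(tau | D).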

In the contractor analogy above, there is no easy equivalent for the set-up of a Bayesian analysis. If the choice of the lowest bidder is used as a prior for the analysis, the result will be couched in terms of whether or not that bid should be rejected in favor of another. The result would be similar to the results of a likelihood analysis (see above), but it would include frequency distributions for the expected contractor costs. Two contractors may have the same average expected cost, but one may have a narrower distribution of costs, and thus be more likely to deliver the job close to the projected cost. Some contractors may have such a broad distribution of costs that they could exceed the maximum you are willing to pay, while others may be expected to cost more on average but are very unlikely to exceed that maximum. Thus, if the model, the data, and the priors are good, the Bayesian estimate provides considerably more information and a much better framework for selecting a contractor.

One commonly cited drawback of Bayesian analysis is the need to explicitly specify prior probabilities for the range of potential outcomes. Incorporating prior probabilities into an analysis has been suggested as a potential source of bias. This is, in fact, a misunderstanding of the point of Bayesian analysis, which is to assess the support for changing an a priori hypothesis. Still, it is possible to specify uninformative priors, which do not favor any particular hypothesis. Arguably, some hypotheses are more likely than others (e.g., it is unlikely that mollusks will be found to be vertebrates), and a reasonable analysis should probably reflect this. Bayesian methods involve other potential issues, such as the evaluation of "convergence," the point at which the MCMC process stops searching for the "space" of credible solutions and begins to build the credible sample. At present, there is no objective way to evaluate convergence, and it remains to be seen whether subjective methods are effective.
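
The convergence issue can be illustrated with a toy Metropolis sampler in Python. This is a minimal sketch under stated assumptions, not how phylogenetic programs work internally: the three "trees" and their weights (likelihood x prior) are hypothetical, the proposal simply jumps between them instead of rearranging real trees, and the burn-in of 20,000 steps is an arbitrary choice, which is exactly the kind of subjective judgement described above.

```python
import random
from collections import Counter

# Hypothetical unnormalized posterior weights (likelihood x prior) for three toy "trees".
WEIGHTS = {"T1": 0.020, "T2": 0.005, "T3": 0.001}


def metropolis(weights, n_steps, burn_in, start, seed=1):
    """Minimal Metropolis sampler over a finite state space with a symmetric proposal."""
    rng = random.Random(seed)
    states = list(weights)
    current = start
    kept = []
    for step in range(n_steps):
        # Symmetric proposal: pick one of the other states uniformly at random.
        proposal = rng.choice([s for s in states if s != current])
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        # Discard the burn-in portion; keep only later states as the sample.
        if step >= burn_in:
            kept.append(current)
    return kept


# Start deliberately in the least probable state so the burn-in matters.
chain = metropolis(WEIGHTS, n_steps=200_000, burn_in=20_000, start="T3")
freqs = Counter(chain)
for tree in WEIGHTS:
    print(tree, round(freqs[tree] / len(chain), 3))
```

After the burn-in is discarded, the sampled frequencies should approach the normalized weights (roughly 0.77, 0.19 and 0.04). Running several chains from different starting states and comparing them is a common informal check on convergence.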

-- Ling.Nut 08:59, 26 August 2007 (UTC)

A long paragraph about parsimony: is it necessary?

This being a page about Bayesian inference in phylogenetics, do you reckon it's that relevant to add a section about parsimony? Shouldn't we maybe just put a link and dedicate the whole page to Bayesian methods? - Arteteco (talk) 10:39, 10 September 2019 (UTC)