Talk:Central limit theorem

Mathematics High‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-priority on the project's priority scale.

Statistics Top‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Top	This article has been rated as Top-importance on the importance scale.

This article has been mentioned by a media organization:

This Article needs a new beginning

Again Wikipedia has the most difficult, longest, unclear, and possibly wrong explanation. I was browsing the internet for the fundamental theorem in statistics and found many interesting answers. Most of them named the central limit theorem, and every place explained it well. Try it yourself in Google: "most fundamental theorem in statistics". — Preceding unsigned comment added by 2001:4643:E6E3:0:ED59:485C:BA1B:3F50 (talk) 12:14, 9 September 2018 (UTC)

It certainly needs a new opening. The assertion in the first paragraph, "The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions." is an almost classical example of the fallacy of assertion of the consequent.

My own Rule is quite the opposite: "The outliers are the information. The bell curve is the random noise telling you it's there."

David Lloyd-Jones (talk) 01:10, 1 May 2020 (UTC)

This is to bring contents to the top

I removed this:

An interesting illustration of the central tendency, or Central Limit Theorem, is to compare, for a number of lifts (elevators for those on the left-hand side of the Atlantic), the maximum load and the maximum number of people. For small lifts holding only a few people, the maximum load divided by maximum number of people is usually greater than it is in large lifts holding a larger number of people. This is necessary because some small groups of people who fill the lift may well have several people who are above average weight (just as, on other occasions, other small groups may have several who are well below average weight), whereas the larger the sample (the number of people in the large lift) the nearer the proportion of overweight people will be to the norm for the whole population.

While it is a nice example, it doesn't illustrate the Central limit theorem, whose gist is that the sum is normally distributed. I don't quite know where to put this example though. Maybe in standard deviation or normal distribution? AxelBoldt 21:02 Oct 14, 2002 (UTC)

I've encountered another definition of "the" central limit theorem.

My statistics textbook (Mathematical Statistics with Applications, 6th edition, by Wackerly, Mendenhall III, and Scheaffer) defines it in this way:

If $Y_1, Y_2, \ldots, Y_n$ are iid with mean $\mu$ and standard deviation $\sigma$, then $\sqrt{n}\,(\bar{Y}-\mu)/\sigma$ converges in distribution to a standard normal distribution as $n$ goes to infinity. (my paraphrase)

The HyperStat on-line basic statistics text says

The central limit theorem states that given a distribution with a mean m and variance s², the sampling distribution of the mean approaches a normal distribution with a mean (m) and a variance s²/N as N, the sample size, increases. (quoted directly)

I suppose this follows from the definition given in this article. Nonetheless, it is not identical to the one given in the article.

Is there a general trend for more basic/applied statistics books to use this mean-centric definition, while more advanced/theoretical ones use the definition given in the article? Is the definition given in the article better somehow? (I assume the mean-centric definition can be derived from it, but not vice versa.) Should the article also mention the mean-centric definition, since it seems to be somewhat popular?

--Ryguasu 10:52 Dec 2, 2002 (UTC)
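A quick Monte Carlo check of the mean-centric textbook statement quoted above. This is a minimal sketch under assumptions not taken from the discussion: Uniform(0, 1) summands (so μ = 1/2 and σ = √(1/12)), n = 30, 20000 replications and a fixed seed. It compares the empirical CDF of √n(Ȳ − μ)/σ with the standard normal CDF Φ at a few points.

```python
import math
import random

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_means(n, reps, seed=0):
    """Return `reps` draws of sqrt(n) * (Ybar - mu) / sigma with Y_i iid Uniform(0, 1)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of Uniform(0, 1)
    draws = []
    for _ in range(reps):
        ybar = sum(rng.random() for _ in range(n)) / n
        draws.append(math.sqrt(n) * (ybar - mu) / sigma)
    return draws

zs = standardized_means(n=30, reps=20000)
for z in (-1.96, 0.0, 1.0, 1.96):
    empirical = sum(v <= z for v in zs) / len(zs)
    print(f"z={z:+.2f}  empirical={empirical:.4f}  Phi(z)={std_normal_cdf(z):.4f}")
```

The two columns should agree to roughly two decimal places; the remaining differences are mostly Monte Carlo noise.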

No --- the "mean-centric" version and the "sum-centric" version are trivially exactly the same thing; either can be derived from the other, and it's completely trivial: Just multiply both the numerator and the denominator by the same thing; you need to figure out which thing. Michael Hardy 04:34 Feb 21, 2003 (UTC)
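For readers who want the step spelled out, assuming $S_n = Y_1 + \cdots + Y_n$ and $\bar{Y} = S_n/n$, the factor is $\sqrt{n}$:

$$\frac{\sqrt{n}\,(\bar{Y}-\mu)}{\sigma} = \frac{\sqrt{n}\cdot\sqrt{n}\,(\bar{Y}-\mu)}{\sqrt{n}\cdot\sigma} = \frac{n\bar{Y}-n\mu}{\sigma\sqrt{n}} = \frac{S_n-n\mu}{\sigma\sqrt{n}}.$$

The two standardized quantities are the same random variable, so they have the same limit. Equivalently, saying $\bar{Y}$ is approximately $N(\mu, \sigma^2/n)$ is the same as saying $S_n$ is approximately $N(n\mu, n\sigma^2)$, which matches the HyperStat wording.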

Right. This became obvious to me sometime after posting the question. Nonetheless, I think I'm going to stick in the mean-based formulation at some point; I've found more books using only the mean-based definition, and I imagine that some not so mathematically inclined people who nonetheless have to brush up against the CLT (certain social scientists come to mind) might like having what is not trivial to them pointed out. I agree, however, that unless proofs of the CLT typically involve the mean-based formulation, the one currently given on this page should be presented as more fundamental. --Ryguasu

Maybe I'm getting in over my head here, but do you really need to normalize S_n to say anything precise here? Can't we clarify the first "informal" claim of convergence of S_n by saying, parallel to what AxelBoldt has said for the normalized (i.e. Z_n) case:

The distribution of S_n converges towards the normal distribution N(nμ,σ²n) as n approaches ∞. This means: if F(z) is the cumulative distribution function of N(nμ,σ²n), then for every real number z, we have

$\lim_{n\to\infty} \Pr(S_n \le z) = F(z).$

Is there a lurking desire here to state the non-standard normal part as a corollary, rather than as central to the CLT? That might be ok, although the general-purpose version looks more useful to me.

--Ryguasu 01:18 Dec 11, 2002 (UTC)

The problem is that on one side of your equality you have a limit as n approaches infinity, so that the value of that side does not depend on anything called n, and which CDF you've got on the other side does depend on the value of n. -- Mike Hardy

Actually, the CDF on the right-hand side depends on z, not on n. There are no free n's anywhere. --Ryguasu

It does depend on n, but your notation inappropriately suppresses that dependency. You defined F(z) as the cumulative distribution function of N(nμ,σ²n). AxelBoldt 02:23 Dec 14, 2002 (UTC)

Excellent point. Nonetheless, I find it suspicious that someone with more mathematical experience than me can't express the "informal" claim in a rigorous manner. At Talk:Normal distribution, you mentioned "goodness of fit" tests. Couldn't you express the informal version formally, through some limit statement about the results of such a test as the number of samples/trials goes to infinity? --Ryguasu 02:11 Jan 30, 2003 (UTC)

Probably, I don't know. But the version given in the article is also a rigorous statement of the "informal" claim you have in mind. AxelBoldt 00:55 Jan 31, 2003 (UTC)
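One way to state the "informal" version precisely, without taking a limit of an n-dependent CDF, is to compare the two CDFs at each fixed n and require the largest gap to vanish: $\sup_z |\Pr(S_n \le z) - F_n(z)| \to 0$, where $F_n$ is the CDF of $N(n\mu, n\sigma^2)$. The supremum over z is unchanged by the change of variable $z \mapsto (z-n\mu)/(\sigma\sqrt{n})$, and convergence to a continuous limiting CDF is automatically uniform (Pólya's theorem), so this is equivalent to the standardized statement in the article. The sketch below assumes Bernoulli(p) summands, for which $S_n$ is exactly binomial, so the gap can be computed without simulation.

```python
import math

def normal_cdf(x, mean, var):
    """CDF of N(mean, var) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def binom_pmf(n, k, p):
    """Pr(S_n = k) for S_n ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def kolmogorov_gap(n, p=0.5):
    """sup over z of |Pr(S_n <= z) - F_n(z)|, with F_n the CDF of N(n*mu, n*sigma^2).

    S_n is a sum of n iid Bernoulli(p) variables, so mu = p and sigma^2 = p*(1 - p).
    The binomial CDF is flat between integers while F_n increases, so it suffices
    to compare F_n(k) with the binomial CDF just before and at each integer k.
    """
    mean, var = n * p, n * p * (1.0 - p)
    cdf_left = 0.0  # Pr(S_n <= k - 1), the left limit of the binomial CDF at k
    gap = 0.0
    for k in range(n + 1):
        f_k = normal_cdf(k, mean, var)
        cdf_k = cdf_left + binom_pmf(n, k, p)
        gap = max(gap, abs(cdf_left - f_k), abs(cdf_k - f_k))
        cdf_left = cdf_k
    return gap

for n in (10, 100, 1000):
    print(n, round(kolmogorov_gap(n), 4))
```

For p = 1/2 the printed gap should shrink roughly in proportion to $1/\sqrt{n}$, in line with the Berry–Esseen bound.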

How about adding some examples? (This is something most of the math pages are lacking.) How about an illustration involving coin flips? I.e., X_n is defined on the probability space [0, 1] so that X_n is 1 with probability 1/2 and -1 with probability 1/2. A series of graphs and equations could be given.

In the article, there is a comment reading, "picture of a distribution being "smoothed out" by summation would be nice". I've created an animated gif to address this comment. Since animated gifs are considered questionable, I am posting it to the talk page to see if others think it's a good idea. (The image has a rather large footprint on the screen. If anyone can easily shrink it, that would be good. With the rather rudimentary image manipulation tools at my disposal, it would be a moderately involved undertaking for me, so I'm not going to do it unless it's a worthwhile effort.)

I also propose the following explanatory text:

The figure below demonstrates the central limit theorem in action. It shows the distribution of the random variable $Y = n^{-1/2}S_n$ for values of $n$ from 1 to 7. (In this particular case, the random variables $X_i$ have variance equal to 1, so the variance of $S_n$ is equal to $n$. The factor $n^{-1/2}$ scales $Y$ so that its variance is equal to 1 independent of $n$.)

Any and all comments appreciated. -- Cyan 22:15, 2 Feb 2004 (UTC)
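The variance bookkeeping in the proposed caption can be checked exactly for the coin-flip example suggested above. This is an assumption: the distribution of the $X_i$ used in the figure is not stated here. The sketch builds the distribution of $S_n$ by repeated convolution of the ±1 distribution and rescales by $n^{-1/2}$; for every n, Y should have n + 1 support points, mean 0 and variance 1.

```python
import math
from collections import defaultdict

def convolve_pmf(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (dicts mapping value -> probability)."""
    out = defaultdict(float)
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] += px * qy
    return dict(out)

def scaled_sum_pmf(n):
    """PMF of Y = S_n / sqrt(n), where S_n is the sum of n fair +/-1 coin flips."""
    step = {-1: 0.5, 1: 0.5}  # X_i: mean 0, variance 1
    s = step
    for _ in range(n - 1):
        s = convolve_pmf(s, step)
    return {k / math.sqrt(n): prob for k, prob in s.items()}

def mean_var(pmf):
    """Mean and variance of a finite PMF."""
    m = sum(y * prob for y, prob in pmf.items())
    v = sum((y - m) ** 2 * prob for y, prob in pmf.items())
    return m, v

for n in range(1, 8):
    pmf = scaled_sum_pmf(n)
    m, v = mean_var(pmf)
    print(n, len(pmf), round(m, 12), round(v, 12))
```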


Yes, using the thumbnail feature would be a quick work-around. I don't know anything about this, but the diagram seems useful to me (it's particularly useful that it pauses between repetitions). You can count along in your head 1 to 7 as the shape of the graph changes, it doesn't rely on captions you need to read at the same time as observing the graph. I give it my uninformed support. :) (Plus, if this is replicating information already included in the text then that's even better; relying on an animated gif to impart key information rather than to give an example of it would be a bad thing). fabiform | talk 04:32, 4 Feb 2004 (UTC)

An animation can't be printed, and I've always found animated diagrams to be very frustrating, particularly in a case like this. I have to wait for it to come around again if I'm trying to wrap my head around some individual part of it. There's no pause button, no frame forward, no rewind, at least in most browsers. I'd rather see such images side by side in most cases. Perhaps an animation in addition might be neat, but forcing it on readers is to me not friendly.

Here's a quick vertically flattened version (which could float to the side of the body text, for instance). A horizontal version might be better, or break it on two lines. --Brion 09:15, 4 Feb 2004 (UTC)

My $0.02:

a) This is, indeed, an example of an appropriate use of an animated GIF. There's no actual need to change it. However...

b) I actually think that in this particular case the separate pictures are really just as good. I find the animation irritatingly jumpy, and, of course, the constant-time steps are too fast for the early steps (where you might even want to take a moment to visualize the convolution in your head, and notice that you go from two sharp peaks to three blunt peaks to a single broad peak with four bumps), and too slow for the later steps (which all look alike). This is a nit-pick, though.

c) Footprint of the animated version is OK. Note, however, that you could easily reduce the extent of the X axis to +/- 3.5. Maybe by the last iteration there is some data outside those limits and maybe you know it's there, but visually it doesn't matter.

d) The individual thumbnails in Brion's version need a bit of work. They're currently too small and the vertical arrangement isn't very good. You're going to get a million "try this, try that" suggestions, each of which would be a couple of hours' work to try... mine is that you use a table and put them into some kind of comic strip format, maybe two rows of four, maybe four rows of two... yes, you'd need to provide an eighth image but since it would look just the same as the seventh that wouldn't be a problem... you'd need to tinker with the axis labelling, slightly bigger type, perhaps slightly fewer divisions... the axis labels (numbers) do NOT need to be TRULY legible, they should be reduced with antialiased smoothing, it's OK if they look blurry when you enlarge them, but they need to be just legible enough that you think you're seeing 1, 2, 3...

Very appropriate to the subject matter, by the way, and a nice illustration. Good stuff! Dpbsmith 11:37, 4 Feb 2004 (UTC)

Thanks for all the comments, folks! Here's what I'm going to do. As Dpbsmith and Brion suggest, I'm going to create a static image in 2 strips of 4 graphs. I'll play around with the x-axis limits for aesthetic effect, and I'll include a link to the animated gif for those of our readers who want to click on it. The reason to include it at all is that the last few panels will be indistinguishable as static images, but small changes will be apparent in the animated version, thus giving the viewer a sense of the scale of changes in distribution that occur past a certain value of n. -- Cyan 16:04, 4 Feb 2004 (UTC)

I looked at the different proposed diagrams, and I think I prefer the 2 strips of 4 graphs idea. I like the static images better than the animated image. -- It occurs to me that the illustration of the central limit theorem could be expanded by showing two or more different initial distributions, or adding a different distribution each time (not identical). After all the whole point of the theorem is that for a large class of distributions, adding them together brings you to the same limiting distribution. Thoughts? Happy editing, Wile E. Heresiarch 02:47, 18 Mar 2004 (UTC)

Oh, just a minor followup -- maybe it would help if the example shown on the main central limit theorem page were the same as one of the (hopefully several) examples shown in illustration of the central limit theorem. I'm thinking the main page could just show the phenomenon, and the illustration page could go into more detail. Thinking out loud, Wile E. Heresiarch 14:08, 18 Mar 2004 (UTC)

Yet another half-baked idea -- maybe the effect of the animation can be sort-of imitated by leaving each plotted line in the succeeding figures, but grayed-out or something like that. So you could see just how much the line is changing, and the old lines won't block out the new ones if we use a lighter/grayer color. Wile E. Heresiarch 14:15, 18 Mar 2004 (UTC)

I have to agree with the no-animation camp. While it does show the progression nicely, having to watch it repeat a few times isn't ideal, and it distracts from the article. The images are great though, and as shown above they work nicely in a line. One other problem with animation is that it can show effects that are not there - the line looks to move which kinda hides the fact that it is a convolution. There might be a case to argue for a link to the animated version, but I would argue it is unnecessary. Good work folks. Mat-C 00:41, 18 Apr 2004 (UTC)

Mat-C, maybe you can look at the figures in Student's t-distribution and tell me what you think -- I attempted to show the progression of the t distribution to the normal distribution by using different colors. How successful was that, do you think? Thanks for any comments, Wile E. Heresiarch 02:53, 19 Apr 2004 (UTC)

Just for those who are wondering, the reason I haven't followed up on producing a set of images is because I discovered that the numerical convolution method I'm using isn't actually converging to a Gaussian. The images above look like Gaussians, but in fact are flatter and have wider tails than a Gaussian actually has. In fact, if I start with a Gaussian, the convolution moves it away from Gaussianity, flattening it and widening the tails. I haven't the time to devote to correcting this problem right now... I may get to it at some less busy time in the future. -- Cyan 05:53, 18 Apr 2004 (UTC)

Hmm, can you tell me a little about how you're going about the convolution, then? The reason that I ask is that I have also computed a numerical convolution (via FFT) for the figures on the illustration of the central limit theorem page, and I'd like to try to make sure those figures don't have the same problem. Thanks for any info. Wile E. Heresiarch 02:53, 19 Apr 2004 (UTC)

I used a two-sided filter algorithm based on MATLAB's built-in one-sided "filter" function. I convolved a vector containing discrete samples of the distribution with the original distribution, and then rescaled it back to standard deviation 1, which involves resampling the distribution so that the discrete grid matches that of the original distribution. Apparently this quick and dirty procedure is affected by some kind of numerical error, because the distribution it converges to is not Gaussian. If you want to check the convergence, why not just plot a Gaussian over your filter-derived distribution? -- Cyan 04:54, 19 Apr 2004 (UTC)

Thanks for your comments. Just a thought -- the problem that you describe might be caused by the discretization effects -- I ran into that when working on another convolution problem and found the convolution result slowly drifting away from the correct result. I think it might be possible to solve the problem without resampling, which could reduce the discretization error. I think I'll post the Octave code which I used to construct the figures -- then it can be inspected and compared, as well as making it possible to "try this at home". Happy editing, Wile E. Heresiarch 02:22, 20 Apr 2004 (UTC)
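A minimal sketch of the fixed-grid approach suggested here, assuming a uniform starting density on $[-\sqrt{3}, \sqrt{3}]$ (variance 1) and grid spacing h = 0.05, both chosen only for illustration; this is not a reconstruction of the MATLAB code described above. The density is sampled once and treated as point masses on the grid. Each convolution keeps the same spacing and only lengthens the array, so there is no resampling step. With this bookkeeping variances add exactly and the excess kurtosis of the n-fold sum equals that of the starting distribution divided by n, so the result moves toward the Gaussian (excess kurtosis 0) rather than away from it.

```python
import math

def sampled_uniform(a, h):
    """Point masses approximating Uniform(-a, a) on the symmetric grid i*h, |i*h| <= a.

    Returns (x0, weights): the leftmost grid point and weights summing to 1.
    """
    k = int(a / h)
    weights = [1.0 / (2 * k + 1)] * (2 * k + 1)  # constant density -> equal masses
    return -k * h, weights

def convolve(a, b):
    """Direct discrete convolution; both inputs live on grids with the same spacing."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def n_fold_sum(x0, w, n):
    """Distribution of the sum of n iid copies: leftmost point moves to n*x0, spacing unchanged."""
    acc = w
    for _ in range(n - 1):
        acc = convolve(acc, w)
    return n * x0, acc

def moments(x0, h, w):
    """Mean, variance and excess kurtosis of the point masses w at x0 + i*h."""
    xs = [x0 + i * h for i in range(len(w))]
    m = sum(x * p for x, p in zip(xs, w))
    v = sum((x - m) ** 2 * p for x, p in zip(xs, w))
    m4 = sum((x - m) ** 4 * p for x, p in zip(xs, w))
    return m, v, m4 / v ** 2 - 3.0

def max_cdf_gap(x0, h, w):
    """Largest gap between the standardized lattice CDF and Phi, checking both sides of each jump."""
    m, v, _ = moments(x0, h, w)
    s = math.sqrt(v)
    cdf_left, gap = 0.0, 0.0
    for i, p in enumerate(w):
        z = (x0 + i * h - m) / s
        phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        cdf = cdf_left + p
        gap = max(gap, abs(cdf_left - phi), abs(cdf - phi))
        cdf_left = cdf
    return gap

h = 0.05
x0, base = sampled_uniform(math.sqrt(3.0), h)
_, v1, k1 = moments(x0, h, base)
for n in (1, 2, 4, 8):
    xn, wn = n_fold_sum(x0, base, n)
    _, vn, kn = moments(xn, h, wn)
    # variance ratio and kurtosis ratio should both be 1 up to rounding
    print(n, round(vn / (n * v1), 10), round(kn * n / k1, 10), round(max_cdf_gap(xn, h, wn), 4))
```

Plotting a Gaussian over the result, as suggested above, is still the most direct visual check; the moment ratios just make the drift (or its absence) easy to see numerically.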

o(t²)

Just a note: o(t²), t → 0, refers to a function which goes to zero more quickly than t² (like t³), and not a function 'like' t², which would be O(t²). Hence, I have reverted the recent edits that changed o(t²) to o(t³). Notably, the article on Big-O notation does not discuss limits other than the limit as t → ∞. However, it should do so! Ben Cairns 06:56, 14 Feb 2005 (UTC).

o(t²) Reply

Sorry, I did not see your message (Bjcairns) in the discussion entry. I confused big O with small o: I thought this o referred to the higher-order corrections in Taylor's expansion formula. I suppose you are right, so I changed the article back to its previous version with o(t²), without being logged in. That IP, 143.233.xxx.xxx etc., is mine :) My version would perhaps be correct if we considered big O rather than small o. Theofilatos 17:07, 17 Feb 2005 (UTC)
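For reference, here are the two definitions, and where the distinction enters the usual characteristic-function argument. This assumes the article's proof is the standard one, with summands standardized to mean 0 and variance 1 and characteristic function $\varphi$:

$$f(t) = o(t^2)\ (t \to 0) \iff \lim_{t \to 0} \frac{f(t)}{t^2} = 0, \qquad f(t) = O(|t|^3)\ (t \to 0) \iff |f(t)| \le C|t|^3 \text{ near } 0.$$

$$\varphi(t) = 1 - \frac{t^2}{2} + o(t^2), \qquad \varphi\!\left(\frac{t}{\sqrt{n}}\right)^{n} = \left(1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right)^{n} \longrightarrow e^{-t^2/2}.$$

Finite variance is enough for the $o(t^2)$ form. Writing the remainder as $O(|t|^3)$ is also valid, but only under the stronger assumption that the third absolute moment is finite.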

Needs layman's language too

This article seems to be very mathematically complex. It could benefit from some simple layman's language. Ian Howlett 13:24, 30 June 2005 (UTC).

Quotation marks de-emphasize

Quotation marks around a word often mean something like: that's what some people are often heard to call it, but I don't want to commit myself to agreeing. Thus they de-emphasize. If you write "John has a 'degree' from the University of the Ozarks", the quotation marks enclosing the word "degree" mean that maybe John and some others call it a "degree", but you don't necessarily agree. Often quotation marks mean "don't take this word literally." That is the meaning of the quotation marks around "the" in the section heading that says "The" central limit theorem. The word "the" in this context implies uniqueness: that there is only one central limit theorem. In fact there are many, with varying assumptions: sometimes independence is relaxed; sometimes identical distribution is relaxed; sometimes the random variables live in some space besides the real line, etc. The quotation marks mean that often people call this one "the" central limit theorem, but the word "the" should not be taken too literally. Michael Hardy 18:16, 16 September 2005 (UTC)

I am aware of the use of quotes in this way, I use them like that "every" day. :) However, I find it strange that somebody would quote the word the. Oleg Alexandrov 18:42, 16 September 2005 (UTC)

Ironic emphasis of "the" is common enough in informal American English (dunno if the Brits use it too). I don't think we want an easily misunderstood wordplay here. I've replaced "The" central limit theorem with Classical central limit theorem. Feel free to find a different adjective. There are other uses of "scare quotes" in the article which should be reviewed. Regards & happy editing, Wile E. Heresiarch 03:08, 19 September 2005 (UTC)

Link to polymers

Hi guys, I'm an editor who delves a lot into physics (statistical mechanics especially) and a bit into statistics. That means I use this theorem a lot. I'll have other things to say, but for now I just need to share something that sprang to mind (not that it's original work; someone probably thought of it before me):

There is very probably a link between the non-independent case and polymer physics. A real-world polymer is basically a correlated random walk, although this correlation tends to decrease exponentially along the chain. Yet the object follows the Central Limit Theorem. On this, see the ideal chain and worm-like chain articles, especially the parts about the Kuhn segment.

Either mathematicians have a version of the non-independent CLT corresponding to this, in which case as a polymer and random walk editor I need to know, or this should probably be added as another case of the non-independent CLT, in some form or other. (ThorinMuglindir 23:40, 25 October 2005 (UTC))
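A simple way to make "correlations that decay exponentially" concrete is to assume a stationary sequence of steps with $\operatorname{Var}(X_t) = \sigma^2$ and $\operatorname{Corr}(X_t, X_{t+k}) = \rho^k$ (an AR(1)-type assumption, chosen for illustration). Then

$\operatorname{Var}(S_n)/n = \sigma^2\left[1 + 2\sum_{k=1}^{n-1}(1 - k/n)\rho^k\right] \to \sigma^2(1+\rho)/(1-\rho)$,

so the walk spreads like an uncorrelated walk with an inflated step variance. This is loosely analogous to replacing the chain with fewer, longer, effectively independent segments. With bond-vector correlations $\cos^k\theta$, the same algebra gives the freely rotating chain's characteristic ratio $(1+\cos\theta)/(1-\cos\theta)$. Central limit theorems for weakly dependent (mixing) sequences give $S_n/\sqrt{n}$ a normal limit with this "long-run" variance under suitable conditions. The sketch below only checks the variance limit, not the CLT itself.

```python
def var_sum_over_n(n, rho, sigma2=1.0):
    """Var(S_n) / n for stationary steps with Var(X_t) = sigma2 and Corr(X_t, X_{t+k}) = rho**k."""
    total = n * sigma2  # diagonal terms Var(X_t)
    for k in range(1, n):
        total += 2 * (n - k) * sigma2 * rho ** k  # lag-k covariance appears n - k times, twice
    return total / n

def long_run_variance(rho, sigma2=1.0):
    """Limit of Var(S_n) / n as n -> infinity, valid for |rho| < 1."""
    return sigma2 * (1 + rho) / (1 - rho)

for n in (10, 100, 1000, 10000):
    print(n, round(var_sum_over_n(n, 0.9), 4), round(long_run_variance(0.9), 4))
```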

A few thoughts

From the article: "The density of the sum of two or more independent variables is the convolution of their densities (if these densities exist)."

This should also appear (and probably be explained in detail) in probability density function.

A section dedicated to the CLT and the Fourier transform wouldn't be superfluous either, as the CLT is quite easy to demonstrate in Fourier space. Such considerations are currently mentioned, but in very little detail. That would let us say that the convergence in the CLT is faster in the low Fourier modes and slower in the high Fourier modes (if you don't renormalize the sum, there can even be no convergence at all in the high Fourier modes in some cases; see below). I wouldn't attempt to formalize that in a clean mathematical way myself, though.

Some singular cases that might be worth explaining

As I said yesterday, I am very much into editing polymer and random walk articles for physics, which leads me to link to the CLT a lot. There is a case where convergence in the CLT is singular, yet it arises quite often in random walks: namely, random walks on a lattice.

Take for instance independent variables that take the values -1 or 1 with probability 1/2 each, and sum N of them.

If you look at the density function you obtain, it is not strictly a Gaussian. It is a series of successive Dirac delta functions. Now the CLT is not that far off, because the amplitudes of the Dirac peaks of the resulting sum vary according to a Gaussian curve. So if you look at the function in the low Fourier modes, it will correspond to the Gaussian curve predicted by the CLT. For high Fourier modes (k > 2πN, or k > 2π if you don't renormalize the sum), the density of the resulting sum has nothing to do with a Gaussian.

The situation is different if you consider a sum of continuous variables, or a lattice-free random walk: the same problem does not arise.

For example, consider the continuous variable that is uniformly distributed on $[-\sqrt{3},\sqrt{3}]$, and sum a large number of independent realisations of this variable. This variable has the same mean (0) and variance (1) as the previous one, yet you won't obtain a series of Dirac peaks as in the previous case. The resulting density will look like a Gaussian at pretty much any scale, including in the high Fourier range...

All this will probably be clearer with a formula: for the latter variable (variance 1, mean 0), the (normalized) sum converges toward the density function P(X), corresponding to N(0,1/N). Now, strictly speaking, the density of the sum of the former variable does not converge toward P(X), but rather toward:

$P'(X) = P(X)\sum_{k=-N}^{+N}\delta(X - k/N)$

, where delta is the Dirac delta function

About this, here are my questions: is the above somehow related to what you say about the third moment of the variable controlling the speed of convergence? Can this difference in convergence between the high and low Fourier modes be formalized mathematically? (192.54.193.37 08:58, 26 October 2005 (UTC))

Oh, I again forgot to log in... Well, the section above is from ThorinMuglindir 09:00, 26 October 2005 (UTC)
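One way to formalize the question above is through characteristic functions. For the standardized sum $S_n/\sqrt{n}$ with ±1 steps, the characteristic function is $\cos(t/\sqrt{n})^n$. For steps uniform on $[-\sqrt{3},\sqrt{3}]$, it is $(\sin u/u)^n$ with $u = \sqrt{3}\,t/\sqrt{n}$; note that variance 1 requires this interval, since Uniform$[-a,a]$ has variance $a^2/3$. Both converge to $e^{-t^2/2}$ for each fixed t, which is all that convergence in distribution requires (Lévy's continuity theorem). The lattice one, however, has modulus 1 again at $t = \pi\sqrt{n}$, reflecting the lattice spacing $2/\sqrt{n}$ of $S_n/\sqrt{n}$, so its agreement with the Gaussian cannot be uniform over all frequencies; the continuous one stays tiny there. A minimal sketch:

```python
import math

def cf_lattice(t, n):
    """Characteristic function of S_n / sqrt(n) for fair +/-1 steps."""
    return math.cos(t / math.sqrt(n)) ** n

def cf_uniform(t, n):
    """Same for steps uniform on [-sqrt(3), sqrt(3)]: (sin(u) / u) ** n with u = sqrt(3) * t / sqrt(n)."""
    u = math.sqrt(3.0) * t / math.sqrt(n)
    base = 1.0 if u == 0.0 else math.sin(u) / u
    return base ** n

def cf_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2.0)

n = 100
for t in (0.5, 1.0, 2.0, math.pi * math.sqrt(n)):
    print(f"t={t:7.3f}  lattice={cf_lattice(t, n):+.4f}  "
          f"uniform={cf_uniform(t, n):+.4f}  normal={cf_normal(t):+.4f}")
```

In this language, "low Fourier modes" are t fixed as n grows, and "high Fourier modes" are t growing like $\sqrt{n}$.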

Not all of what you are saying is immediately clear to me, but you seem to be talking about discrete random variables compared with continuous random variables. Discrete random variables do not have probability density functions, but the central limit theorem is not about densities anyway. It is about convergence in distribution, i.e. about cumulative distribution functions. So there is less of a problem comparing discrete and continuous random variables. The classic example is the normal approximation to the binomial distribution; even here the approximation can be misleading in the tails, as it often is when applying the central limit theorem. --Henrygb 23:55, 26 October 2005 (UTC)

Thanks, my question was indeed related to the binomial distribution. Just as a remark, it is often possible and useful to define a probability density function for a discrete variable using the Dirac delta function. Mathematically speaking, the Dirac delta function is not a function, but it is still a distribution (distribution not in the sense of statistics, but in the sense of generalized functions, that is to say, an object in the closure of the space of functions). Of course, when you do physics you couldn't care less about what exactly is a function and what is a distribution... What I wrote above is just a reformulation, based on Dirac delta functions, of the meaning of the graph that compares the curve and the histogram in the binomial distribution article. (ThorinMuglindir 10:03, 27 October 2005 (UTC))

I'll add a very short bit to the article, explaining that the CLT can also be adapted to sums of discrete variables, although in a slightly different form, and link to binomial distribution as an example, if only to avoid confusing a reader who comes here from, say, the random walk article, where the CLT is applied to a sum of discrete variables. ThorinMuglindir 10:04, 27 October 2005 (UTC)

if we build a histogram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the histogram converges toward a Gaussian curve as n approaches infinity

The above is wrong. Counterexample: a discrete variable that takes only even values. It's wrong to talk about "centers of the upper faces of the rectangles"; one should talk about some smoothed histogram, or, much better (though less intuitive), about distribution functions. — Preceding unsigned comment added by Leonbloy (talk • contribs) 10:40, 15 February 2011 (UTC)

I don't see that it's necessary to talk about 'some smoothed histogram': one can simply use a histogram with a larger bin width, e.g. 2 for your example. Qwfp (talk) 11:26, 15 February 2011 (UTC)

sum has finite variance, or the random numbers themselves?

The first paragraph states: The most important and famous result is called simply The Central Limit Theorem which states that if the sum of the variables has a finite variance, then it will be approximately normally distributed.

The random variables must have finite variance, right? This was the impression I got from http://mathworld.wolfram.com/CentralLimitTheorem.html. I am not skilled at mathematics so I do not know if saying the sum has a finite variance is correct. Thank you. Jason Katz-Brown 05:32, 11 February 2006 (UTC)

They say the same thing: variances are non-negative so the sum of a finite number of them will be finite if and only if each of them is. --Henrygb 16:31, 11 February 2006 (UTC)
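The step behind this reply, written out for independent $X_1, \ldots, X_n$. Independence, or at least uncorrelatedness, is what removes the covariance terms:

$$\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \operatorname{Var}(X_i).$$

The right-hand side is a finite sum of non-negative terms, so it is finite exactly when every term is finite.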

Organigram?!

From the article:

This means that if we build an organigram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the organigram converges toward a gaussian curve as n approaches $\infty$ . The binomial distribution article details such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.

Huh? We draw an organizational chart of what, and how? I suppose "independent identical" is meant to be iid (as independent and identical is a contradiction), but what about the rest of it? Given the binomial distribution article reference, "organigram" is probably meant to be "histogram", though I don't see how the curve would join the "centers" of the upper faces more than any other points on them (a description which makes more sense for organigrams, even though as a whole using organigrams to depict distributions would be a strange idea). The histogram, then, is presumably of the probability distribution of a random variable that is the sum of n iid discrete random variables (or an approximation of the same by frequencies in a finite sample of the random variable, but then we need to take the limit at infinite sample size or the other limit won't converge). Is this interpretation correct? I'm not sure. How to explain it in good encyclopedia style? I don't know. As it stands this part of the article is very confusing and should be fixed, preferably by someone who knows something about probability theory (I don't, so I'm not touching it). 82.103.195.147 20:51, 12 August 2006 (UTC)

What about the sum of non-identically distributed random variables?

This is a personal question, but probably more people coming to this page will have it. Is there any result related to the CLT that says anything about the sum of random variables in general? For example, in my problem I have 40 Beta variables, each with its own mean and variance. I think there is some result saying their sum is a normal variable with mean and variance equal to the respective sums. Is that right? Arauzo 10:15, 17 September 2006 (UTC)

See the sections on Lyapunov condition and Lindeberg condition. --Henrygb 17:54, 17 September 2006 (UTC)
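A minimal Monte Carlo sketch for the 40-Beta question. The Beta parameters below are hypothetical, since the actual ones are not given. Every term lies in [0, 1], and for bounded independent terms whose variances sum to infinity the Lindeberg condition holds. Forty fixed terms are of course not a limit, though, so the simulation only shows how well $N(\sum\mu_i, \sum\sigma_i^2)$ approximates the sum at n = 40. It uses the standard Beta(a, b) moment formulas and Python's standard-library sampler `random.Random.betavariate`.

```python
import math
import random

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) random variable."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameters: 40 Beta variables, each with its own (a, b).
params = [(1.0 + 0.25 * i, 2.0 + 0.1 * (i % 7)) for i in range(40)]
mu = sum(beta_mean_var(a, b)[0] for a, b in params)    # mean of the sum
var = sum(beta_mean_var(a, b)[1] for a, b in params)   # variance of the sum (independence)

rng = random.Random(1)
reps = 10000
sums = [sum(rng.betavariate(a, b) for a, b in params) for _ in range(reps)]

for z in (-1.645, 0.0, 1.645):
    threshold = mu + z * math.sqrt(var)
    empirical = sum(s <= threshold for s in sums) / reps
    print(f"z={z:+.3f}  simulated={empirical:.4f}  normal={std_normal_cdf(z):.4f}")
```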

I read the article and understand that whether the random variables are identical or not, their sum will be normally distributed. I disagree with the condition that the random variables must be identical.--Piyatad 09:56, 28 November 2006 (UTC)

It doesn't say they must be identical. It says that IF their distributions are identical (the distributions, not the random variables!) THEN etc. etc. But it also says:

Several generalizations for finite variance exist which do not require identical distribution but incorporate some condition which guarantees that none of the variables exert a much larger influence than the others.

So there you have it: the article says the distributions do not need to be identical if "some other condition" holds. Just which other condition depends on which version of the theorem you're talking about. I think equal variances may be more than strong enough; and if I weren't writing this comment in some haste I just might say that's obvious.... Michael Hardy 01:45, 4 December 2006 (UTC)

I think we should state very explicitly in the first paragraph that the CLT applies to the sum of arbitrary distributions, not only identical distributions. The current version is causing misunderstanding in standard deviation, where they state that "... [the classical central limit theorem] says that sums of many independent, identically-distributed random variables tend towards the normal distribution as a limit." My suggestion is replace the last sentence in the first paragraph with the following: "The most important and famous result is called The Central Limit Theorem which states that if the sum of independent and arbitrarily-distributed variables has a finite variance, then it will be approximately normally distributed (i.e., following a normal or Gaussian distribution)." Please comment.

It's a long time since the above, but anyway... I think that equal variances are not enough and that it would be incorrect to have "The most important and famous result is called The Central Limit Theorem which states that if the sum of independent and arbitrarily-distributed variables has a finite variance, then it will be approximately normally distributed (i.e., following a normal or Gaussian distribution)." The "proof" in the article might be simplified by working with cumulant generating functions (cgf), and this would make the point here clearer. Just write the expansion to include the skewness, and work out the effect on the cgf for the average. You get something involving the skewnesses of the individual components, and for the CLT to hold this term must converge to zero as the number of samples increases. To defeat the CLT assumption you just need to find a sequence of skewnesses which increases fast enough. Thus a CLT result does need some caveats: (quote)

Several generalizations for finite variance exist which do not require identical distribution but incorporate some condition which guarantees that none of the variables exert a much larger influence than the others.

... where the condition would need to apply to all aspects of the distributions of the variables. Melcombe (talk) 15:05, 2 April 2008 (UTC)[reply]
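Melcombe's cumulant argument can be sketched numerically. Cumulants of independent summands add, so the skewness of S_n is the sum of the third cumulants divided by (sum of variances)^(3/2). The sketch below assumes exponential summands, and the two scale sequences are hypothetical choices for illustration: with equal scales the skewness of the sum shrinks like 2/√n, whereas with geometrically growing scales the last few terms dominate and the skewness does not tend to zero, which is the symptom described above.

```python
def exponential_cumulants(scale):
    """Variance and third cumulant of an exponential variable with mean `scale`."""
    return scale ** 2, 2.0 * scale ** 3

def skewness_of_sum(scales):
    """Skewness of S_n = X_1 + ... + X_n for independent exponential X_k with the given means.

    Cumulants of independent variables add, so
    skew(S_n) = (sum of third cumulants) / (sum of variances) ** 1.5.
    """
    total_var = 0.0
    total_k3 = 0.0
    for s in scales:
        var, k3 = exponential_cumulants(s)
        total_var += var
        total_k3 += k3
    return total_k3 / total_var ** 1.5

for n in (1, 10, 50, 200):
    equal = skewness_of_sum([1.0] * n)                              # i.i.d. case: 2 / sqrt(n)
    growing = skewness_of_sum([2.0 ** k for k in range(1, n + 1)])  # hypothetical fast-growing scales
    print(n, round(equal, 4), round(growing, 4))
```

Only skewness is tracked here; it is a diagnostic, not a proof that the CLT fails for the growing sequence.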

Large sample size

The need for a large sample size should be included. n >= 30 to 70 for it to be large. 70.111.238.17 14:11, 1 October 2006 (UTC)[reply]

It says as n approaches ∞, and that is certainly quite large. But some "rules of thumb" could be added too. In many cases, "≥ 30" is quite conservative. Michael Hardy 21:29, 2 October 2006 (UTC)[reply]

central {limit theorem} or {central limit} theorem?

The article now says:

This is a limit theorem and is about central limits.

I've long thought it was the central theorem on limits, not the theorem on central limits. Can someone explain just what "central limits" are? The article's present comment seems to confuse rather than to clarify. Michael Hardy 02:51, 6 November 2006 (UTC)[reply]

I removed a bit that said that the theorem was NOT a 'central' theorem, but was a theorem about 'central limits'. The text now says what it is, not what it isn't. I hope that clears it up. 8_)--Light current 03:24, 6 November 2006 (UTC)[reply]

I don't see how it clears up what a "central limit" is. What is a "central limit"? Michael Hardy 03:42, 6 November 2006 (UTC)[reply]

...and now I've edited it to say it's a central theorem about limits, not a theorem about central limits. Can no one explain what a "central limit" is? I suspect no one can, because I suspect there's no such thing. Michael Hardy 03:45, 6 November 2006 (UTC)[reply]

I don't think you are correct. You should read the whole thing; then you'll see. Excerpt from page:

Note the following apparent "paradox": by adding many independent identically distributed positive variables, one gets approximately a normal distribution. But for every normally distributed variable, the probability that it is negative is non-zero! How is it possible to get negative numbers from adding only positives? The reason is simple: the theorem applies to terms centered about the mean. Without that standardization, the distribution would, as intuition suggests, escape away to infinity.

My itals, bolding--Light current 03:52, 6 November 2006 (UTC)[reply]

So a central limit is one that is evenly distributed about zero.--Light current 03:54, 6 November 2006 (UTC)[reply]

Hey, I've just noticed you are a statistician!! Why are you asking me about stats? 8-)--Light current 03:55, 6 November 2006 (UTC)[reply]

I've removed the controversial statement until we can get the proper dope on it 8-)--Light current 04:01, 6 November 2006 (UTC)[reply]

I agree with M.Hardy. Historically, Polya introduced the German name "zentraler Grenzwertsatz" in 1920, which means central theorem-about-the-limit (George Polya, "Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentenproblem," Mathematische Zeitschrift, 8 (1920), 171-181). Dangauthier 14:55, 5 April 2007 (UTC)[reply]

I just (two years later!) added this ref to the history section of the article. Qwfp (talk) 14:58, 15 May 2009 (UTC)[reply]

Why don't you simply write "central limit-value-theorem" or "central limit-theorem"? Chezistari (talk) 11:12, 11 January 2011 (UTC)[reply]

Making it Easier to Understand

I reckon the section on Classical CLT should start by stating the theorem. Justification should come after this. So the passage might read: " The central limit theorem says that the means of samples are normally distributed." - comments please —The preceding unsigned comment was added by 212.159.75.167 (talk) 20:18, 3 January 2007 (UTC).[reply]

Another suggestion is provide an "every day" example with simulated graphs, like color of adjacent cars that stop at an intersection or something. -unsigned-

30 Individual Samples

I have heard that 30 individual samples will meet the requirements of the C.L.T. and therefore be considered a "statistically" valid sample, assuming they were randomly selected using proper statistical procedure. This seems wrong to me, but I'd like to know what others say. Gautam ^Discuss 06:45, 8 June 2007 (UTC)[reply]

Often far less than 30 is enough; sometimes many more is not. It depends on what distribution you're sampling from. But don't call them samples; call them observations within a sample. Michael Hardy 06:48, 8 June 2007 (UTC)[reply]
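The point that the required sample size depends on the sampled distribution can be checked exactly for one skewed case. For i.i.d. Exp(1) observations the sum S_n has a Gamma(n, 1) distribution whose CDF has a closed form via the Poisson identity, so the sketch below computes the largest gap between the CDF of the standardized sum and the standard normal CDF. The choice of the exponential distribution and of the evaluation grid are assumptions made for illustration.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sum_of_exponentials_cdf(x, n):
    """P(S_n <= x) for S_n a sum of n i.i.d. Exp(1) variables, i.e. Gamma(n, 1).

    Uses P(S_n <= x) = P(N >= n) with N ~ Poisson(x); the Poisson terms are
    computed in log space so that large x does not underflow exp(-x).
    """
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    lower = sum(math.exp(k * log_x - x - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - lower

def max_cdf_gap(n, z_min=-4.0, z_max=4.0, steps=801):
    """Largest |P(Z_n <= z) - Phi(z)| over a grid of z, with Z_n = (S_n - n) / sqrt(n)."""
    gap = 0.0
    for i in range(steps):
        z = z_min + (z_max - z_min) * i / (steps - 1)
        x = n + z * math.sqrt(n)  # S_n has mean n and standard deviation sqrt(n)
        gap = max(gap, abs(sum_of_exponentials_cdf(x, n) - normal_cdf(z)))
    return gap

for n in (5, 30, 100, 300):
    print(n, round(max_cdf_gap(n), 4))
```

For this skewed case the gap at n = 30 is still a few hundredths and shrinks only roughly like 1/√n; a symmetric distribution would typically give a much smaller gap at the same n.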

Additive mean and variance

The CLT indicates for large sample size (n>29 or 100),[1] that the sampling distribution will have the same mean as the population, but variance divided by sample size

The CLT doesn't say that, and it doesn't depend on large sample size. The expected value of the sum is the sum of the expected values, always. For independent random variables the variance of the sum is always the sum of the variances. 72.75.76.121 (talk) 23:43, 5 December 2007 (UTC)[reply]

Standard error of sum is σ n^1/2

In the following:

Consider the sum S_n = X₁ + ... + X_n. Then the expected value of S_n is nμ and its standard error is σ n^−1/2. Furthermore, informally speaking, the distribution of S_n approaches the normal distribution N(nμ,σ²n) as n approaches ∞.

σ n^−1/2 should be σ n^1/2, as this refers to the standard error of the sum, S_n, not of the sample mean. I made this change.tom fisher-york (talk) 16:30, 14 December 2007 (UTC)[reply]
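Both corrections above (expected values and, for independent variables, variances of a sum add for every n, with no appeal to a large sample; and the standard deviation of the sum is σ√n while that of the sample mean is σ/√n) can be verified exactly by enumerating all outcomes of a few fair dice. The dice are an illustrative assumption; exact fractions avoid any rounding.

```python
import math
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # one fair six-sided die (illustrative choice)

def moments_of_sum(n):
    """Exact mean and variance of S_n, the sum of n independent fair dice."""
    outcomes = [sum(roll) for roll in product(FACES, repeat=n)]
    count = len(outcomes)
    mean = Fraction(sum(outcomes), count)
    var = Fraction(sum((s - mean) ** 2 for s in outcomes), count)
    return mean, var

mu, sigma2 = moments_of_sum(1)           # mu = 7/2, sigma^2 = 35/12
for n in (2, 3, 4):
    mean_n, var_n = moments_of_sum(n)
    assert mean_n == n * mu               # expected values add
    assert var_n == n * sigma2            # variances add (independence)
    sd_sum = math.sqrt(var_n)             # equals sigma * sqrt(n)
    sd_mean = sd_sum / n                  # equals sigma / sqrt(n)
    print(n, mean_n, var_n, round(sd_sum, 4), round(sd_mean, 4))
```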

Uses of CLT?

It would be great if this article included a bit about why the CLT is so cool.

For those of us who are getting into some real math through Wikipedia, it's really great when there is an example application of a theorem like the CLT. For people who already know all about the CLT, it may seem like an impossible task to select an example (like giving an example use of addition), but I think the CLT is just arcane enough to warrant a bit of practical material so people can see why it is so cool. In my case, I was trying to figure out the probability of something with a distinctly non-normal distribution. I ran 1,000 simulations and saw (naturally) that they were generating a very familiar (normal) distribution. Wow, I thought, that's the Central Limit Theorem at work! So I was able to avoid further simulations and use a plain old Z test to estimate the final result: p < e-26. Even with a very fast computer, you could not do enough simulations of my problem to establish this probability. With the CLT, there is no need.

Just my 2 cents. I could add this myself if folks agree... Tombadog (talk) 12:45, 13 March 2008 (UTC)[reply]
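The workflow described here (simulate a moderate number of times, check the shape, then use a normal approximation for a probability far too small to simulate) can be sketched as follows. The simulated statistic (a sum of 50 exponential variables) and the observed value are hypothetical stand-ins for the unspecified problem. Two cautions apply: the tail probability should be computed with the complementary error function, because 1 − Φ(z) computed as 1 minus a CDF rounds to zero in double precision once z exceeds about 8; and the CLT controls the absolute error of the approximation, not the relative error, so a normal-tail value of order 10⁻²⁶ can be off by orders of magnitude for a skewed statistic.

```python
import math
import random

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z; erfc keeps tiny tail probabilities accurate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def simulate_statistic(rng, n_terms=50):
    """Hypothetical non-normal statistic: a sum of n_terms i.i.d. Exp(1) variables."""
    return sum(rng.expovariate(1.0) for _ in range(n_terms))

rng = random.Random(12345)
sims = [simulate_statistic(rng) for _ in range(1000)]    # 1,000 simulations
mean = sum(sims) / len(sims)
sd = math.sqrt(sum((s - mean) ** 2 for s in sims) / (len(sims) - 1))

observed = 130.0                                         # hypothetical observed value
z = (observed - mean) / sd
print(f"z = {z:.2f}, normal-approximation p = {normal_upper_tail(z):.3g}")
print("naive 1 - Phi(z):", 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))  # rounds to 0.0
```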

iid?

Do the random variables really have to be iid? Does the CLT work if the variables are independent but have different distributions? Let's say I build a house in 10 (or more) consecutive steps, from the foundation to the roof. Every step takes some time (length = random variable). The steps are independent of each other but have different distributions (normal, exponential, uniform, whatever...). The total time (the sum of the 10 random variables) should then be approximately normally distributed regardless of the distribution of each single variable (or is it not)? --217.83.60.191 (talk) 16:25, 18 March 2008 (UTC)[reply]

If you simply drop the assumption of identical distribution, then the resulting statement is not true. But the assumption of identical distribution can be replaced by any of various other assumptions with the result that the theorem is still true. I don't have any of the details at the tip of my tongue, but probably such things should be added to the article. Michael Hardy (talk) 17:02, 18 March 2008 (UTC)[reply]

See discussion above under "What about the sum of non-identically distributed random variables". Melcombe (talk) 15:07, 2 April 2008 (UTC)[reply]

CLT in real life?

I remember back in the old days at university hearing an explanation of the CLT in real life from a friend of a friend. Most things in nature are normally distributed, like the length of some kind of plant, the lifespan of some kind of animal, the temperature on a particular day, or whatever. All these random variables are influenced by many other variables. So the normal distribution is so important because everything (every variable) that is the result of many, many other variables is approximately normally distributed. It is not really the sum of other variables in a mathematical way, as we don't know exactly the relationship between the variables, but it's close to that. --Unify (talk) 23:46, 18 March 2008 (UTC)[reply]

Correction needed

Under "Proof of the central limit theorem", the article has

For any random variable, Y, with zero mean and unit variance (var(Y) = 1), the characteristic function of Y is, by Taylor's theorem,

\varphi _{Y}(t)=1-{t^{2} \over 2}+o(t^{2}),\quad t\rightarrow 0

The correction required relates to "by Taylor's theorem" ... the expansion might be a Taylor expansion, but the result that the expansion is valid in this case is not (i.e. assuming that the variance exists and not assuming higher order moments). Anyone have a proper reference for the result, or should the sentence just be rephrased? Melcombe (talk) 15:16, 2 April 2008 (UTC)[reply]

Proof of the central limit theorem

I think the statement about the "remarkably simple proof" is not appropriate. The version of Taylor's theorem that is used here is probably known only to a small fraction of readers (since $\varphi_{Y}(t)$ has complex values), the "simple properties of characteristic functions" (referring to linear transformations) are not explained, and the convergence to an exponential would deserve a reference. Above all, the heavy tool applied here, the Lévy continuity theorem, is not trivial at all. In my opinion, expressing a proof in terms of another difficult theorem doesn't make it simple. It would therefore be more appropriate to state that rigorous proofs of the central limit theorem tend to be cumbersome, but that a non-rigorous argument is easily obtained from the Taylor expansion of the characteristic functions. —Preceding unsigned comment added by 84.56.2.132 (talk) 06:59, 17 July 2008 (UTC)[reply]

I agree with the previous comment that the proof is not rigorous. I actually think it is wrong. If you replace $o\left({t^{2} \over n}\right)$ by $t^{3} \over n$ , then one can clearly see that the limit in that case would be $e^{-t^{2}/2+t^{3}}$ , which is false.

I think the proof needs to use the fact that $\varphi _{Y}(t)$ is a characteristic function and not just any function.

(Audetto (talk) 20:52, 30 May 2010 (UTC))[reply]

The “o” symbol here refers to the 1/n part, that is, to the limit n → ∞. Thus you could replace o(t²/n) by t²/n^(3/2), but not by t³/n. As for the Lévy theorem, it is certainly both crucial for the proof and nontrivial. However, that theorem is much more intuitive (that is, easier to believe in) than the CLT. If a sequence of characteristic functions converges, then so must the random variables: this might be difficult to prove, but the statement seems so obvious that an elementary-level proof could reasonably skip it. // stpasha » 22:01, 30 May 2010 (UTC)[reply]
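The limit the proof relies on, φ_Y(t/√n)^n → e^(−t²/2), can be checked numerically for a concrete Y. The sketch assumes Y is uniform on [−√3, √3], which has mean 0, variance 1 and characteristic function sin(√3 t)/(√3 t); it only illustrates the limit and does not replace the continuity-theorem step discussed in this thread.

```python
import math

A = math.sqrt(3.0)  # Y ~ Uniform(-A, A) has mean 0 and variance A**2 / 3 = 1

def phi_uniform(t):
    """Characteristic function of Y ~ Uniform(-sqrt(3), sqrt(3)); real because Y is symmetric."""
    if t == 0.0:
        return 1.0
    return math.sin(A * t) / (A * t)

def phi_standardized_sum(t, n):
    """Characteristic function of S_n / sqrt(n) for n i.i.d. copies of Y."""
    return phi_uniform(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-t * t / 2.0)  # characteristic function of N(0, 1) at t
for n in (1, 5, 50, 500):
    print(n, phi_standardized_sum(t, n), target)
```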

Here is a partial list of principal approaches to proving CLT:

using Stirling's formula;
using the moment generating function (or just moments);
using the characteristic function (Fourier transform);
Lindeberg's method;
Stein's method;
using Brownian motion;
using quantile functions;
using entropy, Fisher information etc.

Surely it is not an easy matter. When there is one simple proof, people do not seek many others. Boris Tsirelson (talk) 10:30, 31 May 2010 (UTC)[reply]

Lack of independence: "Expert attention"

I have added a ref about the CLT for convex bodies and removed the tag. Unfortunately, the tag "expert attention needed" was inserted with no explanation of what it was for. Thus, I am not sure: is it still needed, or not? Boris Tsirelson (talk) 19:52, 18 October 2008 (UTC)[reply]

Well, other than adding reference(s) to this section, nothing much has changed. While I didn't add the tag, there seem to be two points needing attention. Firstly, for the first three items in the list, it would be good to have an outline of the sort of conditions/assumptions being imposed for the CLT to hold: possibly a single summary might do for all three? Secondly, for the 4th item (convex bodies) it seems necessary to say how a CLT can apply to "bodies" or "sets" when all the discussion above is about random things which are numerically-valued. I guess there would be the question of whether such cases should be dealt with under the heading of "dependence", or might be better with their own slot, depending on what is being meant. Melcombe (talk) 14:15, 22 October 2008 (UTC)[reply]

There is also the question of the heading "lack of independence". At least for some of the topics indicated, it seems that both "non-identical" and "non-independent" are allowed, whereas there is at least an implication that what is covered is the case of "identical" and "non-independent". Melcombe (talk) 14:27, 22 October 2008 (UTC)[reply]

I see. Well, maybe I'll do some more. Why "lack of independence"? For two reasons. First, some time ago the main part of the article covered the non-identical case. Second (and more important), "non-identical" is a relatively small problem, while "non-independent" is relatively hard. About convex bodies: it is meant that a point is chosen from a given convex body (uniformly); its coordinates are "random things which are numerically-valued", but dependent in a way not typical for probability theory (till now); to cover this case is considered an important progress. Boris Tsirelson (talk) 15:21, 22 October 2008 (UTC)[reply]

OK, some improvements would be good. Regarding "convex bodies", it is not obvious how this might differ from a non-independent version of what is in the subsection titled "Multidimensional central limit theorem" ... what role does the "convex" bit have? Is it just to ensure that the mean is within the "body"? The abstract and first page of the cited article are not particularly informative to me. Melcombe (talk) 09:12, 23 October 2008 (UTC)[reply]

I am reading some sources. About convex bodies: it is completely different, since the large parameter is not the number of summands (this is just 1: no sum at all, just a single random vector) but rather the dimension of the body (and of the random vector), that is, the number of (random, dependent) coordinates. But wait, I'll try to write it down some day. Boris Tsirelson (talk) 13:53, 23 October 2008 (UTC)[reply]

I did something, and shall continue. Boris Tsirelson (talk) 20:15, 23 October 2008 (UTC)[reply]

Error in "Lindeberg condition"?

I suspect the Lindeberg condition in the section discussing non-identically distributed random variables might be incorrect. Possibly there should be an average instead of a plain sum, or something like that. I suspect that in its current form the condition is essentially never satisfied. Not sure about this, though. I'll check this if I find the original reference. --130.231.89.82 (talk) 09:19, 22 October 2008 (UTC)[reply]

Why incorrect? The sum in it corresponds to the sum in the definition of

s_{n}.

It means that deviated values do not contribute to the variance (in the limit). I believe it is correct. Boris Tsirelson (talk) 20:21, 23 October 2008 (UTC)[reply]
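The reading above (the sum matches the sum defining s_n², and the condition says that large deviations contribute a vanishing share of the variance) can be made concrete for bounded summands. The sketch assumes X_k uniform on [−a_k, a_k], for which E[X_k² ; |X_k| > c] = (a_k³ − c³)/(3a_k) when c < a_k and 0 otherwise, and Var X_k = a_k²/3. The two scale sequences are hypothetical: with equal scales the Lindeberg ratio drops to exactly 0 once ε s_n exceeds 1, while with geometrically growing scales it stays bounded away from 0.

```python
import math

def lindeberg_ratio(scales, eps):
    """(1 / s_n^2) * sum_k E[X_k^2 ; |X_k| > eps * s_n] for X_k ~ Uniform(-a_k, a_k)."""
    s_n2 = sum(a * a / 3.0 for a in scales)  # s_n^2 = sum of variances
    c = eps * math.sqrt(s_n2)                # truncation level eps * s_n
    tail = 0.0
    for a in scales:
        if c < a:
            tail += (a ** 3 - c ** 3) / (3.0 * a)
    return tail / s_n2

eps = 0.5
for n in (10, 50, 200):
    equal = lindeberg_ratio([1.0] * n, eps)                           # condition holds
    geometric = lindeberg_ratio([2.0 ** k for k in range(1, n + 1)], eps)  # hypothetical scales
    print(n, round(equal, 3), round(geometric, 3))
```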

Error in Central limit theorem for Gaussian polytopes

In this new (sub)section, there is "for all t in R" ... but t only appears as a dummy integration variable. Melcombe (talk) 16:26, 28 October 2008 (UTC)[reply]

Oops... You are right, thank you. I'll correct it soon. In fact, I see that the formulation is clumsy; we can just say "converge in distribution". (In contrast, "CLT for convex bodies" includes some uniformity over all bodies or densities.) Strangely, the clumsy formulation is used by the authors. Boris Tsirelson (talk) 18:55, 28 October 2008 (UTC)[reply]

Asymptotic normality for statistical estimators

To User:Melcombe (and maybe someone else): Probably you could add some results about asymptotic normality for statistical estimators. Boris Tsirelson (talk) 07:08, 4 November 2008 (UTC)[reply]

Re-averaged?

Back on 26 October 2008, an anonymous editor inserted the word "re-averaged" into the definition of the CLT in the first sentence of the article. It has been there ever since, and has been echoed in countless internet websites. But what is a "re-averaged sum"? I submit that this edit was well-intended, but meaningless and very confusing.

Further, in Theorems 1 and 2 in the article by Le Cam, there is no reference to identically distributed random variables. Indeed, the point of many modern versions of the CLT is that the sequence of independent random variables need not be identically distributed. I submit that the phrase "identically distributed" does not belong in the opening paragraph at all.

Accordingly, I have tried to rewrite the lead paragraph so that it is both correct and generally understandable. I hope it is acceptable to all of you. —Aetheling (talk) 20:01, 11 March 2009 (UTC)[reply]

I think the term was meant to be linked to the sentence (which you have left) "They all express the fact that a sum of many independent random variables will tend to be distributed according to one of a small set of "attractor" distributions." ...presumably this is a nod in the direction of stable distributions. Given this, it may be that "re-averaged" was meant to be something like "rescaled", referring to the fact that a divisor other than n is required for a non-degenerate limit distribution.

Or did it mean, subtract the expectation? Boris Tsirelson (talk) 10:07, 12 March 2009 (UTC)[reply]

Turing and the CLT

Surely the reference to Turing's fellowship dissertation should refer not to Cambridge University but to King's College Cambridge! 86.22.72.56 (talk) 18:13, 10 November 2009 (UTC)[reply]

Help, clarification

Can we be careful to define "sum of distribution functions"? Clearly this is not, say, the process SUM(U,V) = (U + V)/2, which indeed yields a distribution function with all the nice properties and looks like a "sum". The use of convolution needs to be well explained and tied to the notion of the "distribution of the means". If we have a random process which, say, is constrained to output values in the range [0,1], then clearly the means will never reach a limit which is a normal distribution.
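The distinction raised here, between averaging distribution functions and taking the distribution of a sum, can be illustrated with exact probability mass functions. Averaging two pmfs (a mixture) leaves a fair die's distribution unchanged, whereas the distribution of the sum of independent dice is the convolution of their pmfs, and it is repeated convolution that produces the bell shape. The fair die and the helper names are illustrative assumptions.

```python
from fractions import Fraction

DIE = {face: Fraction(1, 6) for face in range(1, 7)}  # pmf of one fair die

def mixture(p, q, w=Fraction(1, 2)):
    """Pointwise average w*p + (1-w)*q of two pmfs (a 'sum' of distribution functions)."""
    support = set(p) | set(q)
    return {x: w * p.get(x, 0) + (1 - w) * q.get(x, 0) for x in support}

def convolve(p, q):
    """pmf of X + Y for independent X ~ p and Y ~ q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

assert mixture(DIE, DIE) == DIE  # mixing a die with itself changes nothing

dist = DIE
for n in range(2, 6):
    dist = convolve(dist, DIE)   # pmf of the sum of n dice
    mode = max(dist, key=dist.get)
    print(n, "mode:", mode, "P(mode) =", dist[mode])
```

Dividing the sum by n only relabels the support, so the pmf of the mean of n dice has the same shape as the pmf of their sum.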

   Z_n = \frac{S_n - n \mu}{\sigma \sqrt{n}}\,,

the above appears on the page missing the forward slash for division between sigma and sqrt(n) —Preceding unsigned comment added by 76.76.233.148 (talk) 23:31, 14 November 2009 (UTC)[reply]

Correction to correction above: Never mind...neglected to note that the numerator contains just the sum and not the mean so formula is correct (but arcane) as it stands... —Preceding unsigned comment added by 76.76.233.148 (talk) 23:52, 14 November 2009 (UTC)[reply]
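To see what the formula for Z_n does, including the "negative numbers from adding only positives" point quoted earlier on this page, one can simulate it. The sketch assumes Exp(1) summands (so μ = σ = 1), which are strictly positive; after centering by nμ and scaling by σ√n, the standardized sums have mean near 0, standard deviation near 1, and are negative roughly half the time. The sample sizes and seed are arbitrary.

```python
import math
import random

def standardized_sums(n, reps, rng, mu=1.0, sigma=1.0):
    """Draw reps values of Z_n = (S_n - n*mu) / (sigma*sqrt(n)), S_n a sum of n Exp(1) terms."""
    zs = []
    for _ in range(reps):
        s_n = sum(rng.expovariate(1.0) for _ in range(n))  # every summand is positive
        zs.append((s_n - n * mu) / (sigma * math.sqrt(n)))
    return zs

rng = random.Random(2009)
zs = standardized_sums(n=50, reps=5000, rng=rng)
mean_z = sum(zs) / len(zs)
sd_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / (len(zs) - 1))
share_negative = sum(z < 0 for z in zs) / len(zs)
print(round(mean_z, 3), round(sd_z, 3), round(share_negative, 3))
```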

Lede Shortcomings

This appears to be an excellent article, particularly for those intelligent and informed enough to understand it or who are already aware of the subject's import. However, the Lede fails to impart that it is a fact (since some equate "theorem" with "theory" and hence with something that might be only conditionally true), its relation to the quotidian use of statistical sampling and inference, or the fact that it's one of the most important results of mathematics. 72.228.177.92 (talk) 20:01, 12 February 2010 (UTC)[reply]

It also gives the impression it's a recent result. 72.228.177.92 (talk) 16:52, 13 February 2010 (UTC)[reply]

Possible copyright infringement, or possible violation of WP:CC-BY-SA?

Stpasha has spotted that parts of this article bear a close resemblance to the entry in the Global Encyclopedia of Welfare Economics and tagged it for Copyright violation. I had a quick look and started wondering if it could be their entry that's copied from ours. The book was published in 2008 or 2009 by Global Vision Publishing House. I notice that the 2nd and 3rd sentences of the 2nd paragraph on p25 should be a single sentence, as in our article. Also the figure on p25 has no caption, is not referred to in the text, and has axis labels that are much too small to read, as does File:Central limit thm.png. Qwfp (talk) 07:10, 1 June 2010 (UTC)[reply]

Indeed, I changed the {{copyvio}} to {{backwardscopyvio}}. Apparently the author of the book plainly copied the material, without even trying to understand it... As a result many formulas in the book contain small errors: on p.26 φ_Y turned into φY, 2n into 2_n, 1/n^1/2 into 1/n^{1 2}, and so on. The book is really strange, at least the ISBN published on its title page corresponds to some other entry in the global catalog.

Now apparently that book reused the material without attribution (at least I could not find any), and it claims copyright to its text, which is a violation of the WP:CC-BY-SA. Logically, this means that Wikipedia should file a copyright infringement case against the publisher, but I'm not sure how and who is supposed to do that. // stpasha » 08:03, 1 June 2010 (UTC)[reply]

Me neither. Maybe if you leave the listing at Wikipedia:Copyright problems/2010 June 1 some knowledgeable admin will take it forward if appropriate? Qwfp (talk) 08:12, 1 June 2010 (UTC)[reply]

It is absolutely clear that the book copied WP. Indeed, it contains the (rather long and complicated) "Beyond the classical framework" section, written by me solely, just for WP, and never sent by me to any other place. Boris Tsirelson (talk) 11:49, 1 June 2010 (UTC)[reply]

Hi. The listing came current for review today. I'm afraid that Wikipedia cannot file a copyright infringement case, as the Foundation does not own the text on Wikipedia. You do. This would be something that would have to be undertaken by one of the contributors of substance to the article. The process for complaining about websites is listed at Wikipedia:Mirrors and forks, under Wikipedia:Mirrors and forks#Non-compliance process. Book publishers are a different matter; you can contact them, but as it is not quite so simple for them to resolve as it is for a webmaster, you may find them less responsive. Their e-mail address is listed on the back cover of the book. --Moonriddengirl ^(talk) 14:22, 9 June 2010 (UTC)[reply]

“Beyond the classical framework”

It appears to me that some of the material in that section is way too beyond, even beyond the scope of the article. The major “feature” of the CLT is not the normal distributional limit, but the fact that this result holds irrespective of the distribution shapes of each individual term. (This is by the way why de Moivre had nothing to do with the CLT, regardless of what Tijms might have written.) Now if we look at the “beyond...” section, we’ll find that

“Under weak dependence” section is basically ok, although I'm somewhat suspicious of the requirement for the twelfth moment of X; it could have been a typo, especially since later we replace that with the (2+δ)-th moment, where δ is presumably small.
“Martingale CLT” is also ok, although the proper name is “martingale difference CLT”. This theorem, as well as the one in the previous section, should be properly attributed to whoever first proved it, not to the popular textbooks by Billingsley or Durrett.
“Convex bodies” section looks ok, only it has to be paraphrased in terms of convergence in distribution (even if that is weaker than convergence in total variation), since we don't usually mix the CLT with Berry–Esseen-type results.
“Lacunary trigonometric series” is not ok, because it lacks both the “individuality” of the r.v.'s X_k and the distributionlessness property. This is more of a theorem stating that the sine function may serve as a randomization device, but it is definitely not a CLT-type result.
“Gaussian polytopes” is also not ok. The variables A_k are standard normal, and the conclusion of the theorem is probably sensitive to this assumption. Also, the statement clearly does not hold in one dimension, since in that case it is governed by the Fisher–Tippett–Gnedenko (FTG) theorem. In any case, this result about polytopes is more suitable for the extreme value theory article than for the CLT article.
“Linear function of orthogonal matrices” is not ok either: it deals with uniformly distributed matrices M.
“Subsequences” is too confusing: it is not clear what Ω is, what 1 is, and how we can have X_n→0 while at the same time X_n²→1.

// stpasha » 19:11, 1 June 2010 (UTC)[reply]

Well, your last question is easy to answer: yes, it may happen, and happens often; for example, this happens for any i.i.d random variables with mean 0 and variance 1. Note the weak convergence (a link is given for it). What is Ω? A probability space, of course. Boris Tsirelson (talk) 20:05, 1 June 2010 (UTC)[reply]

I'm probably being dumb today, but I'm still not getting it. Usually a random variable is defined as a measurable function X_n:Ω→R, and there seems to be no point in considering the L²-norm on the underlying probability space? Especially since we almost never even have to consider that space, thanks to the Kolmogorov’s theorem. Now, the notion of weak convergence is defined only for the Hilbert spaces, and there is no indication which Hilbert space is being considered, since usually the Lp space refers to a Banach space (with norm), not to Hilbert space (with inner product). And the inner product on space R is equivalent to multiplication by scalar, so that weak convergence is the same as the usual convergence. Now, the intent of the author could be interpreted as the L²(Ω) being the space of all square-integrable random variables defined on Ω, but in that case the sequence X_n could not converge to zero because “zero” is not a proper density function. Thus, I remain confused here. What is L²(Ω)? // stpasha »

L²(Ω) is the Hilbert space of all square-integrable random variables (real-valued) on the probability space Ω. This is evidently a special case of L² space in general; the only tiny change is that the measure of the whole given measure space must be 1. Otherwise it is just the usual Hilbert space L². There is no "L²-norm on the underlying probability space", but there is such norm on random variables. "The inner product on space R" is also irrelevant.

"the L²(Ω) being the space of all square-integrable random variables defined on Ω, but in that case the sequence X_n could not converge to zero because “zero” is not a proper density function" — thus you DO understand what is L²(Ω)! But why do you mention density function? Random variables in general need not have it; and even when they do, this is irrelevant.

"we almost never even have to consider that space, thanks to the Kolmogorov’s theorem" — this is your POV; and if so, then, why not expel from WP any mention of prob space? Or at least de-emphasize them? (Starting probably with Standard probability space.) My POV is opposite, and is emphasized in my article "probability space" in Citizendium. Boris Tsirelson (talk) 05:39, 2 June 2010 (UTC)[reply]

And if you do not need probability space then I do not understand why do you need the Kolmogorov’s theorem. Its goal is just to provide you with a probability space! If you like to use only this probability space — OK, no problem; the weak convergence applies; the final results depend only on the joint distribution. Boris Tsirelson (talk) 06:42, 2 June 2010 (UTC)[reply]
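The example in this thread (X_n → 0 weakly in L²(Ω) while X_n² → 1) can be made concrete with the Rademacher functions on Ω = [0, 1] with Lebesgue measure: r_n(ω) = (−1)^floor(2ⁿω) are i.i.d. ±1 with mean 0 and variance 1, so r_n² = 1 identically, yet the inner product ⟨r_n, f⟩ = ∫ r_n f dω tends to 0 for every square-integrable f, because the r_n form an orthonormal system. The sketch assumes the test function f(ω) = ω, for which the exact value is −2^(−n−1), and uses a midpoint rule on a dyadic grid; that rule is exact here for n ≤ 16 because r_n is constant on each grid cell and f is linear.

```python
import math

GRID_BITS = 16            # midpoint grid with 2**16 cells on [0, 1]
N_CELLS = 2 ** GRID_BITS

def rademacher(n, omega):
    """r_n(omega) = (-1) ** floor(2**n * omega) on [0, 1)."""
    return -1.0 if math.floor(2 ** n * omega) % 2 else 1.0

def inner_product(n, f):
    """Midpoint-rule value of <r_n, f> = integral of r_n(w) * f(w) over [0, 1]."""
    h = 1.0 / N_CELLS
    return sum(rademacher(n, (i + 0.5) * h) * f((i + 0.5) * h) for i in range(N_CELLS)) * h

def mean_square(n):
    """Midpoint-rule value of E[r_n^2], which is 1 for every n."""
    h = 1.0 / N_CELLS
    return sum(rademacher(n, (i + 0.5) * h) ** 2 for i in range(N_CELLS)) * h

for n in (1, 4, 8, 12):
    print(n, inner_product(n, lambda w: w), -2.0 ** (-n - 1), mean_square(n))
```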

Another "beyond"

Can I prompt a look at Subindependence and its references? Those references imply that the CLT can be proved using subindependence rather than independence and, if so, it may be worth fitting this into the article at some appropriate place. Melcombe (talk) 10:25, 2 June 2010 (UTC)[reply]

Rather a curiosity. With some effort it is possible to construct examples of subindependence (in the lack of independence). But I doubt that it is useful at least once. Boris Tsirelson (talk) 14:50, 2 June 2010 (UTC)[reply]

But why does it have to be useful? Someone looking for good information about the CLT might either expect, or be happy to find, such information. After all, a lot of effort in maths is placed into finding weaker/weakest conditions where a known result holds. And I would say "I doubt that it is useful at least once" about many of the "extensions" already noted in the article. Still, unless someone else thinks otherwise, I won't push harder for its inclusion. Melcombe (talk) 09:34, 4 June 2010 (UTC)[reply]

How does it converge?

The standard result is that √n times the centered average converges in distribution to a normal with mean zero. The definition of convergence in distribution uses the cumulative distribution function, and it is known that in general, when a sequence converges in distribution, the corresponding densities need not converge. So the question is: do the densities converge under the conditions of the CLT, or not?

Another question is about the speed of convergence. If you look at the picture in the article, it seems like even such “pathological density” already becomes close to normal in a sample of only 4. There are probably cases when the convergence is slower, say some heavily skewed distributions. I know there is the Berry–Esseen theorem, and there is a lower bound on the value of constant C in that inequality, which probably means it gives an example of the most “pathological distribution”. Also, what is the upper bound (if any) on the quantity E[|X|³] / E[X²]^3/2, which also participates in that inequality? In other words, which density function would provide the “worst possible” convergence rate towards the normal limit? // stpasha » 22:48, 9 June 2010 (UTC)[reply]

About densities: first of all, they need not exist; for binomial distributions, for example. But if they exist then they probably converge; it seems I saw it, but for now I am not sure. Boris Tsirelson (talk) 05:31, 10 June 2010 (UTC)[reply]

Might the “worst possible” convergence rate occur when E[|X|³] doesn't exist but E[X²] does, so the Berry-Esseen theorem doesn't apply but the classical central limit theorem still holds? That suggests a t₃ distribution, or a half-t₃ distribution, would be pretty bad. Qwfp (talk) 08:02, 10 June 2010 (UTC)[reply]

Even if the given distribution of X₁ has a density, it may happen that the densities f_n of

S_{n}/{\sqrt {n}}

are unbounded on every interval, and fail to converge at x for every rational x. See W. Feller, "An introduction to probability theory and its applications", vol. 2 (1966), Sect. 15.5. Boris Tsirelson (talk) 13:52, 10 June 2010 (UTC)[reply]

On the other hand, if the characteristic function of X₁ is integrable then the densities exist and converge uniformly to the normal density. (Ibid.) Boris Tsirelson (talk) 13:56, 10 June 2010 (UTC)[reply]

"Also, what is the upper bound (if any) on the quantity E[|X|³] / E[X²]^3/2" — surely no upper bound; finiteness of the second moment does not imply finiteness of the third moment; cutting off large values we get finite but large third moment. Boris Tsirelson (talk) 14:02, 10 June 2010 (UTC)[reply]

Here is what I see in R. Durrett, "Probability: theory and examples" (second edition), Sect. 2.4(d):

Heyde (1967) has shown that for

0<\delta <1

\sum _{n}n^{-1+\delta /2}\sup _{x}|F_{n}(x)-N(x)|<\infty

if and only if

E|X|^{2+\delta }<\infty .

For this and more on rates of convergence see Hall (1982).

P. Hall (1982) "Rates of convergence in the central limit theorem", Pitman Pub. Co., Boston, MA

C.C. Heyde (1967) "On the influence of moments on the rate of convergence to the normal distribution", Z. Warsch. verw. Gebiete 8, 12-18.

Boris Tsirelson (talk) 14:16, 10 June 2010 (UTC)[reply]
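The Berry–Esseen bound discussed in this thread, sup_x |F_n(x) − Φ(x)| ≤ C ρ / (σ³ √n) with ρ = E|X − μ|³, can be evaluated for a simple family to illustrate both the question about an upper bound on the moment ratio and the answer given above: the ratio ρ/σ³ has no upper bound, and for heavily skewed distributions the guaranteed accuracy at a given n is correspondingly weak. The sketch uses Bernoulli(p) variables, for which ρ/σ³ = (p² + (1 − p)²)/√(p(1 − p)), and an assumed constant C = 0.5; the sharp value of C is not known, and published upper bounds are somewhat below 0.5.

```python
import math

C_ASSUMED = 0.5  # assumed Berry-Esseen constant; published upper bounds are slightly below this

def bernoulli_third_moment_ratio(p):
    """rho / sigma**3 for X ~ Bernoulli(p), where rho = E|X - p|**3."""
    q = 1.0 - p
    return (p * p + q * q) / math.sqrt(p * q)

def berry_esseen_bound(ratio, n, c=C_ASSUMED):
    """Upper bound on sup_x |F_n(x) - Phi(x)| for the standardized sum of n i.i.d. terms."""
    return c * ratio / math.sqrt(n)

for p in (0.5, 0.1, 0.01, 0.001):
    r = bernoulli_third_moment_ratio(p)
    print(p, round(r, 3), round(berry_esseen_bound(r, 100), 4))
```

Under these assumptions the bound at n = 100 is 0.05 for p = 0.5 but exceeds 1 (and so says nothing) for p = 0.001.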

Diagram in introduction

Is the diagram in the introduction about tossing one fair coin really an example of the CLT? There is only one random variable here, the outcome of the coin toss. It looks more an illustration that the binomial distribution tends to become the normal distribution for large n. Additionally, the diagram is confusing because it looks like it is an illustration of the sentence in the introduction about how the distributions of the outcomes of a large number of UNFAIR coins (not just one coin) taken together become normally distributed. By the way, I found that a very helpful example. —Preceding unsigned comment added by 86.85.210.34 (talk) 19:04, 18 January 2011 (UTC)[reply]

If a coin is tossed twice, we get two (independent) random variables (and their sum, if we add them). --Boris Tsirelson (talk) 06:08, 19 January 2011 (UTC)

See de Moivre–Laplace theorem: "In probability theory, the de Moivre–Laplace theorem is a normal approximation to the binomial distribution. It is a special case of the central limit theorem." --Qwfp (talk) 10:42, 19 January 2011 (UTC)

Strange unnecessary theorem hypothesis

For the classical CLT, the article currently says, "Suppose we are interested in the behavior of the sample average of these random variables [blah]" in the assumptions. I bet the result also holds if we're not interested in the sample average at all. I think this should be fixed. How about "Let S_n = ... be the sample average..." and then just go from there? — Preceding unsigned comment added by 192.150.186.248 (talk) 18:02, 1 June 2011 (UTC)

The article is too big already

Don't you think that the section "history of CLT" could be an article itself? There's so much to tell about it! I expanded the section "multivariate central limit theorem" myself, but I also think it could be a separate article, due to the natural difficulties involving random vectors rather than random variables. Luizabpr (talk) 15:26, 25 June 2011 (UTC)

Yes to both ideas! A separate article on the history of the CLT would be an excellent idea, as would a separate article on the multivariate CLT. I will add this to my list of desirable projects, but realistically I can't get to it for at least three months. Maybe someone else should take a crack at it.—Aetheling (talk) 17:17, 25 June 2011 (UTC)

Sum of discrete yields Poisson

I added the clarification that the discrete variables must be closed under summation. That means we can consider the discrete points to be natural numbers (0,1,2,...). The sum of two natural numbers is a natural number, so there is no way a normal distribution will yield the probabilities, since the normal distribution is defined on the reals, not on the natural numbers. The mean of a (possibly infinite) number of discrete variables will yield a real number, but the statement was not about the mean, it was about the sum. PAR (talk) 02:36, 15 August 2011 (UTC)

The CLT holds regardless of whether the summands are continuous or discrete. In either case, the sum has to be rescaled before its distribution will converge to normal. If the summand is discrete, successive rescaled sums will be concentrated on a finer and finer grid and so their distribution becomes continuous in the limit. Spacepotato (talk) 03:08, 16 August 2011 (UTC)

Yes. For example, toss the fair coin n times... It is the de Moivre–Laplace theorem, of course. Poisson is also a limit of binomials, but here p must depend on n (so that np converges to some λ). Boris Tsirelson (talk) 20:03, 16 August 2011 (UTC)
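The remarks above about the fair coin can be checked exactly. The sketch below (a hypothetical helper `kolmogorov_distance_fair_coin`, standard library only) computes sup over x of |P(Z_n ≤ x) − Φ(x)| for the standardized number of heads Z_n = (S_n − n/2)/(√n/2). The binomial CDF is a step function and Φ is increasing, so the supremum is reached at, or just to the left of, one of the jump points; the code therefore checks both sides of every jump.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kolmogorov_distance_fair_coin(n: int) -> float:
    """sup over x of |P(Z_n <= x) - Phi(x)|, where Z_n = (S_n - n/2) / (sqrt(n)/2)
    and S_n ~ Binomial(n, 1/2) is the number of heads in n fair tosses."""
    mean, sd = n / 2, math.sqrt(n) / 2
    cdf = 0.0    # P(S_n <= k - 1): the CDF just to the left of the jump at k
    worst = 0.0
    for k in range(n + 1):
        z = (k - mean) / sd
        worst = max(worst, abs(cdf - normal_cdf(z)))  # left limit at the jump
        cdf += math.comb(n, k) / 2 ** n
        worst = max(worst, abs(cdf - normal_cdf(z)))  # value at the jump
    return worst


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, kolmogorov_distance_fair_coin(n))
```

For the fair coin the Berry–Esseen bound gives a distance of order 1/√n, and the printed values shrink accordingly even though every Z_n is discrete.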

But the question here is whether or how much of this belongs in an article on the central limit theorem and how much might be better said in another article. I'm not sure that it would fit within convergence of random variables, but it might be possible to create an article "convergence of distributions". Alternatively it might be reasonable to place relevant stuff in asymptotic distribution. Melcombe (talk) 08:53, 18 August 2011 (UTC)

Well, I guess I am guilty of one of the things I dislike most: editors making contributions without at least an 80% understanding of what they are writing. I am interested in continuous stable distributions, and came across a paper by Lee [1] which deals with both continuous and discrete stable distributions. He makes the following statement:

The Central Limit Theorem for discrete variables states that the limiting distribution of the sums of i.i.d. discrete variables with finite mean is Poisson.

and

The Poisson distribution can be thought of as the discrete analogue of the continuous Gaussian distribution, as they are both the limiting distributions for sums of i.i.d. variables without power-law tails, and they are the only stable distributions for which all moments exist.

Well I thought, cool, it seems right, and I put it in. After two reverts, I decided to dig into it, especially since it seems (as mentioned above) that the binomial distribution is discrete, yet I cannot see how it ever becomes more Poisson-like than Gaussian as the number of samples increases. Lee mentions a paper by Steutel and Van Harn [2] which I cannot access, but there is a Google Books entry by them at [3] which may be of use. The bottom line is, rather than delve into Steutel and Van Harn, etc. and go off on discrete stable distributions, let me ask: what are they talking about? Is Lee simply wrong, or am I misinterpreting his correct statement? PAR (talk) 04:13, 19 August 2011 (UTC)

Lee appears to think that if you sum many "discrete [random?] variables" that the "limiting distribution [...] is Poisson." Meaning, I guess, that if I summed the number of heads from 50 coin tosses I might get Siméon Denis Poisson. This is not true. 0¹⁸ (talk) 04:51, 19 August 2011 (UTC)

Well, you do get approximately Poisson, if your coin is very unfair, say, gives "heads" with probability 1/50 (rather than 1/2). Boris Tsirelson (talk) 08:06, 19 August 2011 (UTC)

WP:RS#Scholarship bullet 3 notwithstanding, I wouldn't rely on PhD dissertations as sources for articles on high-profile well-studied longstanding topics such as the central limit theorem. If something's important enough to be in the Wikipedia article for such a topic it's bound to appear in secondary sources such as books and review articles. Lee appears to be misinterpreting Steutel and Van Harn. The paper you say you can't access (though it's stated to be open-access) concludes "Corollary 3.3 seems to suggest that the distribution of a sum of i.i.d. random variables with only a first moment should be approximated by a discrete stable Poisson distribution rather than by a stable degenerate distribution. If higher moments exist, a normal approximation would, of course, be preferable". Qwfp (talk) 13:15, 19 August 2011 (UTC)

I agree with this sentiment. The next (and final) sentence says that the paper does not develop theory of the limiting distribution of discrete random variables. 0¹⁸ (talk) 14:41, 19 August 2011 (UTC)

I'm sticking to my guns: you might find that a random variable Y that is set equal to the sum of many random variables is Poisson distributed, but it will never be the man the distribution is named for. Seriously though, there is no reason you can't use the CLT as stated for discrete variables. It is an approximation, and it works (very well) for summed binomials and others. The concept of a CLT having a Poisson distribution as its limit is nonsensical; it would have to be a Poisson divided by some factor so that the mean could approach the population mean. 0¹⁸ (talk) 14:34, 19 August 2011 (UTC)

The limiting behavior depends on what sort of limit you take. If you take the limit of the sequence
   Heads in 1 flip of a fair coin (Pr(heads)=1/2)
   Heads in 10 flips of a coin with Pr(heads)=1/20
   Heads in 100 flips of a coin with Pr(heads)=1/200
   ⋮
then the distribution of the number of heads will become Poisson in the limit, with parameter 1/2.

On the other hand, if you take the limit of the sequence
   Heads in 1 flip of a fair coin (Pr(heads)=1/2)
   Heads in 10 flips of a fair coin
   Heads in 100 flips of a fair coin
   ⋮
   Heads in 10ⁱ flips of a fair coin
   ⋮
then the distribution of the number of heads will become normal, but of course, only after rescaling: the number of heads itself goes to infinity a.s., and what converges to normal is the Z-value,

$\frac{(\#\text{ of heads}) - 10^{i}/2}{10^{i/2}/2}$.

So, the limit of a sum of i.i.d. discrete random variables may be Poisson(λ), but a necessary condition is that we use different random variables in each step in our approach to the limit in such a way that the total mean of the sum approaches λ. Also, this is not the Central Limit Theorem.

If we take the same random variable and add more and more i.i.d. copies of it together, then the result does not depend on whether the variable is continuous, discrete, or a mixture of continuous and discrete.

If the variance of the variable exists and is positive, then, after rescaling, the limit distribution will be normal.
If the variance of the variable does not exist, then we may get a non-normal stable distribution after rescaling.
If the variance of the variable does not exist, it is also possible that there will be no way to rescale the sum so as to get a limiting distribution.

Finally, the so-called discrete stable distributions discussed by Steutel and van Harn may occur as limiting distributions of sequences of sums in certain cases, but it will require either changing the random variable summed as we go along (as we did to get a Poisson limit), or rescaling not in the usual way, but using a stochastic rescaling given by Steutel and van Harn. Stochastic rescaling also provides another way to make the limiting distribution Poisson.

Spacepotato (talk) 23:32, 19 August 2011 (UTC)
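Spacepotato's first sequence (n flips of a coin with Pr(heads) = 1/(2n)) can also be checked numerically. The sketch below computes the total variation distance between Binomial(n, λ/n) and Poisson(λ) with λ = 1/2. By Le Cam's inequality this distance is at most n·(λ/n)² = λ²/n, so it shrinks as n grows. The function names are hypothetical, and the probabilities are built with the usual pmf recurrences to avoid huge binomial coefficients.

```python
import math


def binomial_pmf(n: int, p: float) -> list[float]:
    """P(S = k) for k = 0..n, S ~ Binomial(n, p), via the ratio recurrence."""
    pmf = [(1.0 - p) ** n]
    for k in range(n):
        pmf.append(pmf[-1] * (n - k) / (k + 1) * p / (1.0 - p))
    return pmf


def poisson_pmf(lam: float, kmax: int) -> list[float]:
    """P(N = k) for k = 0..kmax, N ~ Poisson(lam)."""
    pmf = [math.exp(-lam)]
    for k in range(kmax):
        pmf.append(pmf[-1] * lam / (k + 1))
    return pmf


def tv_binomial_poisson(n: int, lam: float) -> float:
    """Total variation distance between Binomial(n, lam/n) and Poisson(lam)."""
    b = binomial_pmf(n, lam / n)
    q = poisson_pmf(lam, n)
    # Poisson mass above n has no binomial counterpart; it counts fully.
    tail = max(0.0, 1.0 - math.fsum(q))
    return 0.5 * (math.fsum(abs(x - y) for x, y in zip(b, q)) + tail)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, tv_binomial_poisson(n, 0.5))
```

Note that the mean of every term in this sequence is fixed at 1/2, which is exactly why no rescaling is needed here, in contrast to the fair-coin sequence.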

~~reread that quoted sentence by Qwfp.~~ 0¹⁸ (talk) 00:26, 20 August 2011 (UTC)

Why reread it? Spacepotato is right. Boris Tsirelson (talk) 12:49, 20 August 2011 (UTC)

Upon rereading Spacepotato's comment, I agree. 0¹⁸ (talk) 19:14, 20 August 2011 (UTC)

Rubbish in Multivariate CLT

Why does the string ?UNIQ79b599716edd6743-math-0000007B-QINU appear in the statement of the multivariate version of the CLT? Can anyone clear it? I can't edit the article. — Preceding unsigned comment added by Reempe123 (talk • contribs) 01:58, 19 October 2011 (UTC)

I do not see anything like that in the article (neither in wikitext nor in rendered text). Boris Tsirelson (talk) 07:02, 19 October 2011 (UTC)

I was seeing something very similar (different numbers in the middle) just now. I checked the wikitext and removed an empty <math></math> pair, which seems to have fixed it. Qwfp (talk) 10:04, 19 October 2011 (UTC)

Ugly Article

For such an incredibly important topic this is a pretty ugly and unwieldy article. There is talk of the effect of repeated adding but one should talk about how the distribution of a sum of two r.v.'s involves a convolution and it is the repeated re-convolution of densities that attenuates the non-Normal characteristics of every density with finite variance. There should be discussion of stable distributions and the fact that the normal is the only stable distribution with finite variance and that this is why it is the end point of the unlimited convolution of any set of distributions of finite variance with themselves. Simply put, the higher order features are progressively diminished in the repeated convolution operation. This is actually a very easy thing to understand and removes the feeling of "magic" from the operation of the CLT. It works for a reason and that is the reason. If you limit yourself to finite variances, you end up with the normal. If you had other constraints, then you might end up with one of the other stable distributions, such as the Levy flight.

Also how about a little discussion of those pathological distributions that fail to satisfy the CLT? It's not hard to find examples, after all. Cauchy distribution? — Preceding unsigned comment added by 69.142.244.49 (talk) 01:52, 28 March 2012 (UTC)
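The comment's point that repeated convolution wears away non-normal features can be made concrete. Cumulants add under independent summation, so for a sum of n i.i.d. copies the skewness scales like 1/√n and the excess kurtosis like 1/n. The sketch below (hypothetical helpers, assuming a Bernoulli(0.1) summand chosen only for illustration) convolves a probability mass function with itself and recomputes both quantities. It relies on finite moments, so it says nothing about cases such as the Cauchy distribution, where these moments do not exist.

```python
def convolve(p: list[float], q: list[float]) -> list[float]:
    """pmf of X + Y for independent X ~ p, Y ~ q, both supported on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def convolution_power(pmf: list[float], n: int) -> list[float]:
    """pmf of the sum of n i.i.d. copies (n >= 1), by repeated convolution."""
    result = pmf
    for _ in range(n - 1):
        result = convolve(result, pmf)
    return result


def skew_and_excess_kurtosis(pmf: list[float]) -> tuple[float, float]:
    """Standardized third moment and excess kurtosis of a pmf on 0, 1, 2, ..."""
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    m3 = sum((k - mean) ** 3 * w for k, w in enumerate(pmf))
    m4 = sum((k - mean) ** 4 * w for k, w in enumerate(pmf))
    return m3 / var ** 1.5, m4 / var ** 2 - 3.0


if __name__ == "__main__":
    base = [0.9, 0.1]  # Bernoulli(0.1): strongly skewed to the right
    for n in (1, 4, 16, 64):
        s, k = skew_and_excess_kurtosis(convolution_power(base, n))
        print(f"n = {n:>2}  skewness = {s:.4f}  excess kurtosis = {k:.4f}")
```

Both statistics are zero for the normal distribution, so their steady decay is one concrete sense in which the higher-order features are diminished.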

WP:Be bold. Qwfp (talk) 15:29, 28 March 2012 (UTC)

But those (normalized) iid sums converge to a stable distribution. These are not really pathological, just heavy tailed, and have many important applications. Mathstat (talk) 00:08, 29 March 2012 (UTC)

"Convex body" - more general case

The phrase "The condition ƒ(x₁, …, x_n) = ƒ(|x₁|, …, |x_n|) is replaced with much weaker conditions: E(X_k) = 0, E(X_k²) = 1, E(X_kX_ℓ) = 0 for 1 ≤ k < ℓ ≤ n." was removed by User:Jura05 with the explanation that "logconcave is needed anyway". Of course, logconcavity is required, but what of it? Now the reader does not know what "more general" means in this case. As for me, it was better before this edit. Boris Tsirelson (talk) 08:10, 4 June 2012 (UTC)

Why parameter of location?

Recently, User:Rphirsch replaced "mean" (in the lead) with "parameters of location". True, many natural statistical estimators are asymptotically normal (for large samples). In particular, location; but not just location. However, what is called the "Central Limit Theorem"? Is it about the sum, or equivalently, the mean (as I always believed)? Or about location parameters? Or about arbitrary parameters? Or about all cases of asymptotic normality? Boris Tsirelson (talk) 16:50, 19 March 2013 (UTC)

Ref 19 does not seem to support sentence

In Extensions to the theorem: Products of positive random variables

there is a ref 19 that does not seem to support the phrase (at least, the paper's text does not mention any square integrability). I'm not a specialist in the field, so can someone double-check it? Jorgecarleitao (talk) 12:10, 13 June 2013 (UTC)

Ref 19 begins with a strange phrase (and by the way, it does mention square integrability): "It is well known that the products of independent, identically distributed (iid), positive, square integrable random variables (rv’s) are asymptotically lognormal. This fact is an immediate consequence of the classical central limit theorem (clt)." Square integrability of a random variable is more than enough for ensuring square integrability of the positive part of its logarithm; but it cannot ensure square integrability of the negative part of the logarithm (since the random variable may take small values too often). Probably the quoted phrase should not be taken too seriously; it is not the formulation of a result. Results of the paper are on a different matter. Not a relevant source. Boris Tsirelson (talk) 14:24, 13 June 2013 (UTC)

The central limit theorem and Wikipedia

With certain reservations, one can paraphrase for Wikipedia a conclusion in the spirit of the central limit theorem: if the material of a Wikipedia article is the sum of many independent texts by participants, each of which makes a small contribution relative to the overall result, then as the number of participants increases, the objectivity and completeness of coverage of the resulting material approach the ideal. Vladimir Baykov (talk) 19:22, 21 July 2013 (UTC)

normal distribution with a mean equal to m and a variance equal to s² / n

All the tutorials, e.g. Khan and lss academies, teach that the normal distribution will have the population mean and 1/n times the population variance. Why does this article keep silent about these key facts? Why does it not say more than that the distribution is normal? --Javalenok (talk) 11:36, 17 November 2013 (UTC)

From Central limit theorem#Classical CLT: "For large enough n, the distribution of S_n is close to the normal distribution with mean µ and variance σ²/n". Qwfp (talk) 11:52, 17 November 2013 (UTC)
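As a check on the σ²/n statement quoted above, the following sketch computes the exact distribution of the average of n fair six-sided dice with rational arithmetic (the die is an illustrative assumption, not taken from the thread; the function names are hypothetical). A single die has mean 7/2 and variance 35/12, so the average of n dice has mean 7/2 and variance 35/(12n). These two facts hold exactly for every n; only the normal shape of the distribution is asymptotic.

```python
from fractions import Fraction


def sum_distribution(face_probs: dict[int, Fraction], n: int) -> dict[int, Fraction]:
    """Exact distribution of the sum of n i.i.d. draws from face_probs."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new: dict[int, Fraction] = {}
        for total, p in dist.items():
            for face, q in face_probs.items():
                new[total + face] = new.get(total + face, Fraction(0)) + p * q
        dist = new
    return dist


def mean_and_variance_of_average(
    face_probs: dict[int, Fraction], n: int
) -> tuple[Fraction, Fraction]:
    """Mean and variance of (X_1 + ... + X_n) / n, computed exactly."""
    dist = sum_distribution(face_probs, n)
    mean = sum(Fraction(s, n) * p for s, p in dist.items())
    var = sum((Fraction(s, n) - mean) ** 2 * p for s, p in dist.items())
    return mean, var


die = {face: Fraction(1, 6) for face in range(1, 7)}

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, *mean_and_variance_of_average(die, n))
```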

Galton's quote

It doesn't appear to me that this quote is dealing with the central limit theorem. Although I have not checked the actual reference, Galton generally preferred the median to the mean, which is why he wants his group arranged in order so that the median can easily be identified. — Preceding unsigned comment added by Jonnano (talk • contribs) 13:47, 23 November 2013 (UTC)

Recent incompetent edits

Edits of 37.100.122.230 of Jan 9 introduce a number of evident errors.

In particular: multiplies by sigma in order to get standard (instead of dividing by sigma). (Before the formulation was correct: convergence to $N(0,\sigma^2)$.)

Random variables converge to the normal density. (Before the formulation was correct: convergence in distribution.)

Uniform convergence of cumulative probabilities to the normal density. (Before the formulation was correct: to the normal cumulative function.)

I revert it all. Boris Tsirelson (talk) 20:36, 9 January 2014 (UTC)

Mistranslation of Polya

The article contains a translation of a passage from a paper by George Polya, which begins as follows:

"The occurrence of the Gaussian probability density 1 = e^−x² in repeated experiments, in errors of measurements, . . . "

But the mathematical expression 1 = e^−x² makes no sense in this context. Instead, the Gaussian probability density (with mean μ = 0) is as follows:

         $d(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^{2}/(2\sigma^{2})}$

for some positive standard deviation σ.

In the simplest case, which is the "standard" case with μ = 0 and σ = 1, this becomes

         $d(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}$

Daqu (talk) 20:16, 4 June 2015 (UTC)

Indeed. But someone should take the source and look at what is really written there. Boris Tsirelson (talk) 04:51, 5 June 2015 (UTC)
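Daqu's formula can be sanity-checked numerically. The sketch below (hypothetical helper names; a plain midpoint rule is assumed rather than any particular library) integrates the density for one value of σ. It also shows that e^{−x²} by itself integrates to √π rather than 1, so a normalized density of that shape needs the factor 1/√π, which is Daqu's formula with σ² = 1/2. This does not settle what Pólya actually wrote.

```python
import math


def gaussian_density(x: float, sigma: float = 1.0) -> float:
    """d(x) = exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma), with mean 0."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def midpoint_integral(f, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * math.fsum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    # Normalized density with sigma = 2, integrated over +-20 sigma.
    print(midpoint_integral(lambda x: gaussian_density(x, 2.0), -40.0, 40.0))
    # Unnormalized e^{-x^2} compared with sqrt(pi).
    print(midpoint_integral(lambda x: math.exp(-x * x), -10.0, 10.0), math.sqrt(math.pi))
```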

Generalised theorem

Someone added this section by just copying Stable distribution#A generalized central limit theorem, not even bothered by technical incompatibilities. I've corrected the refs as needed (but still, their style is different from others). I am not sure, however, whether it was a good idea to copy the whole section hereto, or not. Boris Tsirelson (talk) 14:57, 12 June 2015 (UTC)

iterates

Understanding the article's very first sentence depends crucially on understanding the word iterates. I tried to find a Wikipedia article on what an iterate is, but I got redirected to Iteration, where the term is not mentioned. --217.226.68.242 (talk) 22:16, 1 June 2016 (UTC)

Yes. I can guess what is meant, but I have never seen this term. I guess it is used by statisticians; really? Boris Tsirelson (talk) 05:22, 2 June 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Central limit theorem. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 11:03, 18 November 2016 (UTC)

Twelfth moment

In the (first) theorem in Section "CLT under weak dependence" it may seem that X_n¹² is an evident typo and should be X_n². But no, it is not! The twelfth moment appears indeed in the theorem according to the source (Billingsley). Moreover, the stronger version of the theorem (given in the same subsection) shows the price (in terms of α_n) of using the moment 2 + δ (but not 2). Boris Tsirelson (talk) 11:49, 8 December 2016 (UTC)

There is a mix-up here between an observation and a random variable

From the article:

Let {X₁, …, Xₙ} be a random sample of size n—that is, a sequence of independent and identically distributed (i.i.d.) random variables drawn from a distribution of expected value given by µ and finite variance given by σ².

First, I am not a specialist in this field. But first it says that {X₁, …, Xₙ} is a random sample, and then it says that every X is a random variable. But a random variable is something else; it is defined here: https://wiki.riteme.site/wiki/Random_variable.

And the same confusion appears at the beginning of the article. The definition seems good to me, but the example is about computing the mean of a number of observations, while the definition says that "independent random variables are added". This is very confusing from my point of view. Even if the example is correct, an example that helps the reader understand the definition should be provided. Costinb7 (talk) 17:00, 29 April 2019 (UTC)

Require a section on Feller's theorem

As stated on page 348 of 'Measure Theory and Probability Theory' by K. B. Athreya, S. N. Lahiri (2006), Feller's theorem is an alternative method to show that Lindeberg's condition holds, and it can also be used to disprove convergence to a normal distribution using proof by contradiction. I think it is a good idea to state Feller's theorem in the article. --Beat of the tapan (talk) 05:28, 11 August 2019 (UTC)

Which one of Feller's theorems is meant? Boris Tsirelson (talk) 06:49, 11 August 2019 (UTC)

If, for every $\epsilon>0$,

$\lim_{n\to\infty}\max_{1\le j\le n} P(|X_{j}|>\epsilon s_{n})=0$

and $S_{n}/s_{n}\xrightarrow{d}N(0,1)$, then $\{X_{j}\}_{j}$ satisfies Lindeberg's condition, where $S_{n}=\sum_{j=1}^{n}X_{j}$ and $s_{n}^{2}=\operatorname{Var}(S_{n})$ (rough notation).

This is typically used to disprove the CLT for a given sequence by contradiction. Thinking about it, this may be a bit off the radar for this article; I think it should go in the Lindeberg's condition article instead.--Beat of the tapan (talk) 10:35, 12 August 2019 (UTC)

Meaning of limit arrow superscripts

At various points in the article, the notations ${\xrightarrow {a}}$ and ${\xrightarrow {d}}$ are used, but I don't see anything that specifies what they mean. How are they defined? Can we add clarification of their meaning to the article? The-erinaceous-one (talk) 23:25, 19 April 2024 (UTC)

Hi @The-erinaceous-one,

There is no "$\xrightarrow{a}$" in the article, only "$\xrightarrow{d}$"; but for some reason the top of the formula gets a bit truncated in some displays, resulting in something that looks like $\xrightarrow{a}$.

I noticed that a few weeks ago, but since the code is correct and this seems to be more of a rendering issue I thought that changing the code here was not the best way to go about it, and I wasn't sure what to do... It is a problem, though, so we should probably try to do something about it.

The arrow $\xrightarrow{d}$ is the standard notation for convergence in distribution. I think that its use in the article is relatively OK, because the first times that it is used its meaning is recalled: see for instance the statement of the Lindeberg–Lévy CLT and of the Lyapunov CLT, which are stated in words and then in symbols.

Best, Malparti (talk) 17:02, 21 April 2024 (UTC)

Thanks for the clarification. I created a bug report in Phabricator for the math rendering problem.

Tracked in Phabricator: Task T363081

The-erinaceous-one (talk) 08:00, 22 April 2024 (UTC)

@The-erinaceous-one: great, thanks! Malparti (talk) 13:37, 22 April 2024 (UTC)