Talk:Normal distribution/Archive 2

This is an archive of past discussions about Normal distribution. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

IQ discussion

I propose that the large IQ section be replaced by a link to somewhere else. It is barely relevant to this page. Comments? --Zero 11:59, 5 Feb 2005 (UTC)

OK, if you have an objection please state it now as I'm planning to come back in a couple of days to delete that whole section. --Zero 09:05, 22 Mar 2005 (UTC)

Just for the record: no objection if what you're proposing is to slice out the IQ section and merge it into a more appropriate article. Make sure you also move the references that apply to the IQ section and remove those that are no longer needed from this page. --MarkSweep 09:59, 22 Mar 2005 (UTC)

No objections if it's moved elsewhere, but it is somewhat relevant to the normal distribution. It's often used as an example of what is a normal distribution, and unlike the other examples, the entire test is formed/graded/whatever you want to call it to give a normal distribution. It'd be nice if you could leave a summary of the IQ section in this article though. --jag123 01:55, 25 Mar 2005 (UTC)

I agree. Mentioning that IQ tests are constructed to have a normal distribution might be fine. The gory details really belong somewhere else, though, as they digress from the topic. --Spin2cool 03:46, 07 Feb 2006 (UTC)

Peer review

I've listed this article for peer review. It's pretty good already and could perhaps soon become a featured article. If it improves significantly, someone should nominate it as a featured article candidate. --MarkSweep 05:56, 22 Mar 2005 (UTC)

In use

You jumped the gun way too fast there, MarkSweep. :) Cburnett 08:57, 22 Mar 2005 (UTC)

I guess we both were editing it before the servers went dead, and then when they came back – boom. Anyway, I'm done with the merges now.

About my changes: There is some general disagreement about how to handle math inside regular text, using either <math> or low-level formatting and HTML entities. I prefer the former, since it leaves more choice to the user. Still, the consensus seems to be that mixing images and text should be avoided (readers can still ask for it by setting their preferences). My math rendering preferences are set to "HTML if possible or else PNG", and I tend to format things in such a way that <math> in regular paragraphs will render as HTML. For displayed formulas, I still prefer PNG for anything "large" (sums, integrals, fractions, etc., which look ugly and/or plain confusing in pure HTML).

Another thing: Is there a consensus about what the second parameter of the normal distribution is? We seem to be going back and forth between "variance" and "standard deviation". I've encountered three options: consistently use variance, consistently use sd, or inconsistently use both. Given a choice, I'd prefer being consistent. I've changed the text so that it says "variance" almost everywhere, except when talking about the "scale parameter", which is the standard deviation. --MarkSweep 09:39, 22 Mar 2005 (UTC)

I prefer PNG always so I change everything to math tags so that I get my PNG. :)

I've always seen variance as the second parameter, and it seems more natural because the multivariate normal has the covariance matrix (I've *never* heard of a standard deviation matrix). Cburnett 15:49, 22 Mar 2005 (UTC)

You can get PNG by setting your math rendering preference to "Always render PNG". I only set my preference to "HTML if possible" for purposes of editing.

About standard deviation as the second parameter, Gelman et al. describe σ as the scale parameter, but then define N in terms of σ² and (of course) use the variance in most cases. The normal distribution functions of R are parameterized in terms of the standard deviation. In the GNU Scientific Library, the bivariate Gaussian pdf is parameterized in terms of the two standard deviations and the correlation coefficient. But your point about the general multivariate case is very convincing, so I guess we should use "variance" throughout. --MarkSweep 20:37, 22 Mar 2005 (UTC)
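
As one more illustration of the parameterization question, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. It is not one of the packages mentioned above and is brought in here only as an example. Like the R functions, its constructor takes the standard deviation, and the variance is available as a derived attribute.

```python
from statistics import NormalDist

# NormalDist is parameterized by the mean and the standard deviation (sigma),
# not by the variance.
d = NormalDist(mu=0.0, sigma=2.0)

print(d.stdev)     # the scale parameter that was passed in
print(d.variance)  # derived quantity: sigma squared
```

Whichever parameter a library chooses, the other one is a square or a square root away, so the choice mostly affects notation and the risk of passing the wrong number.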

Correct, except I can't get PNG rendered formulas if some are entered as raw HTML like numerous articles do. So I convert them all to tex and go from there.

I don't have it with me at the moment, but I'm fairly sure that Statistical Inference by Casella & Berger primarily uses variance. But, still, the univariate can be derived from the multivariate, and the multivariate uses a covariance matrix. I don't know that a square root matrix of a symmetric matrix always exists, where A is a square root matrix of B if A*A=B. I know it will always hold for a Hermitian matrix, but you can't generalize that to a generic symmetric matrix. Anywho... Cburnett 00:15, 23 Mar 2005 (UTC)

Certainly such a square root exists when the matrix is symmetric and the entries are real; see spectral theorem. But I've always seen variance rather than SD used as the parameter. Michael Hardy 01:25, 23 Mar 2005 (UTC)
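
For the record, the spectral-theorem argument can be sketched as follows, under the extra assumption (which holds for any covariance matrix) that B is positive semidefinite. The symbols Q and Λ are introduced here only for illustration.

```latex
B = Q \Lambda Q^{\mathsf T}, \qquad Q^{\mathsf T} Q = I, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad \lambda_i \ge 0

A = Q \Lambda^{1/2} Q^{\mathsf T}
\;\Longrightarrow\;
A A = Q \Lambda^{1/2} Q^{\mathsf T} Q \Lambda^{1/2} Q^{\mathsf T}
    = Q \Lambda Q^{\mathsf T} = B
```

If some eigenvalue is negative, the same construction still gives a square root, but it has complex entries, so it is no longer a real "standard deviation matrix".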

Uncorrelated implies independent?

From various discussions and searches (including one at Talk:Kalman filter) it appears to be widely believed and even published that if X and Y are both normally distributed and uncorrelated, then they are independent. I'm sure this is false and have created a simple example in which |X| = |Y| but they are uncorrelated because the sign is random. The misconception seems to be widespread enough that it would be worth adding something to this page clarifying the mistake. — ciphergoth 13:43, 2005 Apr 28 (UTC)

Well the answer from correlation is no. That article notes:

If the variables are independent then the correlation is 0, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables. Here is an example: Suppose the random variable X is uniformly distributed on the interval from −1 to 1, and Y = X². Then Y is completely determined by X, so that X and Y are as far from being independent as two random variables can be, but their correlation is zero; they are uncorrelated. However, in the special case when X and Y are jointly normal, independence is equivalent to uncorrelatedness.

Which is similar to what you said, but even easier to explain, I think. Though my mastery of the mechanics of this has left me, so I can't recall how to show why X and Y have 0 correlation. - Taxman 14:42, Apr 28, 2005 (UTC)

Thanks for this. Unfortunately it almost perfectly fails to answer the question. It asserts that in general, uncorrelated =!=> independent, and then states a special case involving normally distributed variables in which uncorrelated => independent. The variables in the example are not normally distributed either. It thus doesn't do anything to answer the question posed, which is: if X and Y are normally distributed and uncorrelated, are they necessarily independent?

Jointly normal is in this case a very important difference. It would have been clearer had I simply removed the sentence about the special case, which was a bit off track for the question. The rest is a clear example to answer your original question. The only difference is that the example happened to discuss uniformly distributed RVs instead of normal ones. In any case, I think the answer to your original question is now clear to you, so this is moot. - Taxman 21:33, Apr 28, 2005 (UTC)
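
For anyone wondering why the correlation is zero in the uniform example quoted above, here is a short worked calculation. It assumes only what the quote states: X is uniform on [−1, 1] (so its density is 1/2 there) and Y = X².

```latex
E[X] = \int_{-1}^{1} \tfrac{1}{2}\, x \, dx = 0, \qquad
E[XY] = E[X^{3}] = \int_{-1}^{1} \tfrac{1}{2}\, x^{3} \, dx = 0

\operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y] = 0 - 0 \cdot E[X^{2}] = 0
```

Both integrals vanish because the integrands are odd functions on a symmetric interval, so the correlation is zero even though Y is a function of X.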

I think the answer is that just because both X and Y are normal does not imply that they are jointly normal, and that uncorrelated => independent only where they are jointly normal. In fact, I think I have constructed an example in which X and Y are both normal, and uncorrelated, but not jointly normal and not independent. It would be good if this were spelled out somewhere, since it would answer a common misconception. — ciphergoth 16:02, 2005 Apr 28 (UTC)

uncorrelated + normal ---/---> independent.
uncorrelated + jointly normal ------> independent.

This is widely known. Michael Hardy 17:36, 28 Apr 2005 (UTC)

The misconception arises from the subtle difference between the marginal distributions being normal and the joint distribution being a bivariate normal. See multivariate normal distribution for details, where it is explained that a joint normal distribution requires that every linear combination of X and Y be normal, not just X and Y. This is not the case when |X|=|Y|. — Miguel 21:02, 2005 Apr 28 (UTC)

As I thought. It seems to be quite a widespread misconception, judging by the references User:Orderud found. It's good to have this clarified - thanks! — ciphergoth 21:15, 2005 Apr 28 (UTC)
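
The |X| = |Y| counterexample discussed in this thread can be checked by simulation. The sketch below is one possible reading of it, not necessarily the exact construction ciphergoth had in mind: it assumes X is standard normal and Y = S·X, where S is an independent random sign. The names `xs`, `ys`, `signs`, `correlation` and `frac_zero` are made up for this example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000

# X ~ N(0, 1); S = +1 or -1 with equal probability, independent of X; Y = S * X.
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]
ys = [s * x for s, x in zip(signs, xs)]

def correlation(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

corr = correlation(xs, ys)

# X + Y is exactly 0 whenever S = -1, so the sum has an atom at 0
# and cannot be normal; hence (X, Y) is not jointly normal.
frac_zero = sum(1 for x, y in zip(xs, ys) if x + y == 0.0) / n

# Y is far from independent of X: its absolute value is always |X|.
same_abs = all(abs(x) == abs(y) for x, y in zip(xs, ys))

print(corr, frac_zero, same_abs)
```

The sample correlation should come out close to zero and about half of the sums X + Y should be exactly zero, which matches Miguel's point that not every linear combination of X and Y is normal here.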

Test scores

Cut from intro to IQ tests:

While for most practical purposes the distributions of IQ and intelligence (or at least psychometric g) can be seen as the same thing, it is important to distinguish between the two terms when discussing whether they are normally distributed.

I agree that the distinction is important, but the remainder of this section is entirely about the construction of IQ tests and how that affects test scores. If there's a distinction being made between any of the following terms, I guess I missed it:

It's not fair to the reader to say that something is important but then go on to neglect it. Also, why is this discussion hidden in a long, arcane math page? Some merging may be in order here. Uncle Ed July 6, 2005 10:50 (UTC)

Numerical calculation of normal cdf

Does anyone know what is meant by the graphical methods/means for numerically calculating the cdf? I would have thought these were not very accurate, and that numerical integration followed by interpolation would be better. --JahJah 02:44, 22 August 2005 (UTC)

I don't think numerical integration is particularly efficient or accurate. The GNU Scientific Library, for example, calculates values of the standard normal CDF using piecewise approximations by rational functions. There's no numerical quadrature involved. --MarkSweep 06:11, 22 August 2005 (UTC)

I understand about rational interpolation given the values at the knot points. After having said that exp(-x^2) has no antiderivative in terms of elementary functions, one wants to show that the normal cdf can still be calculated by elementary methods. Finding the normal cdf by numerical integration is a standard exercise in calculus courses. I presume Taylor series are there to make the same point. I still have no idea what is meant by geometric methods. --JahJah 10:33, 22 August 2005 (UTC)

Me neither, but it's gone now. --MarkSweep 20:31, 22 August 2005 (UTC)
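
To make the two approaches in this thread concrete, here is a minimal Python sketch. It does not reproduce the GSL's piecewise rational approximations. Instead it uses the standard library's `math.erf` as the library-style route and composite Simpson's rule as the calculus-exercise route. The function names `phi_erf` and `phi_simpson` are made up for this example.

```python
import math

def phi_erf(x):
    """Standard normal cdf via the error function: 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_simpson(x, n=1000):
    """Standard normal cdf as 0.5 plus the integral of the pdf from 0 to x,
    computed with composite Simpson's rule on n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = x / n  # negative for x < 0, which gives a negative integral as needed

    def pdf(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        # Simpson weights: 4 at odd interior nodes, 2 at even interior nodes
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3.0

for x in (-2.0, 0.0, 1.0, 1.96):
    print(x, phi_erf(x), phi_simpson(x))
```

For moderate x the two agree to many digits, so quadrature is accurate enough here. It costs a thousand function evaluations per point, though, which is the efficiency concern raised above.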

error in the cdf?

You need to be more specific about what exactly you think might be wrong. --MarkSweep✍ 00:19, 8 September 2005 (UTC)

Can anyone tell me what the integral of (2pi)^-0.5 * e^(-0.5x^2) is? I tried integrating by parts and other methods but had no luck. Can someone help?

The antiderivative does not have a closed-form expression. The definite integral can be found:

\int_{-\infty}^{\infty} e^{-x^{2}/2}\,dx = \sqrt{2\pi}.

See Gaussian function for a derivation of this. Michael Hardy 20:40, 22 May 2006 (UTC)
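
For convenience, here is a sketch of the usual polar-coordinates argument for that definite integral. The symbol I is introduced here only as shorthand for the integral.

```latex
I = \int_{-\infty}^{\infty} e^{-x^{2}/2}\,dx
\quad\Longrightarrow\quad
I^{2} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^{2}+y^{2})/2}\,dx\,dy
      = \int_{0}^{2\pi}\!\int_{0}^{\infty} e^{-r^{2}/2}\, r\,dr\,d\theta
      = 2\pi \left[-e^{-r^{2}/2}\right]_{0}^{\infty} = 2\pi

I = \sqrt{2\pi}
\quad\Longrightarrow\quad
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}\,dx = 1
```

So the function asked about integrates to 1 over the whole real line, even though its antiderivative has no closed form.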

Mixed variables?

Just under the line, "The standard normal cdf, conventionally denoted Φ, is just the general cdf evaluated with μ = 0 and σ = 1," there's a function that has an integral from -infinity to z, but the differential is dx. Could this be a simple typo?

It said Φ(z), with z, not x, but then it set it equal to F(x; 0,1) with x, not z. I've changed it. Those two and the upper bound of integration are now all x. The bound variable of integration is now u. Michael Hardy 19:21, 13 October 2005 (UTC)
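
With those substitutions, the definition would presumably read something like the following. This is a reconstruction from the description above, not a copy of the article text.

```latex
\Phi(x) = F(x; 0, 1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^{2}/2}\,du
```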

I have a website which has worked solutions and a formula that is supposed to calculate the probabilities. Could someone please check whether they are correct? The website is http://www.vibrationdata.com/math/int_pdf.pdf

Color diagram

The color diagram showing percentages is somewhat misleading. The abscissa s=1 should correspond to the point of steepest slope on the curve, but it does not. Nor are the percentages drawn correctly. Can anybody make a correct drawing? Bo Jacoby 10:34, 21 October 2005 (UTC)
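
For whoever redraws the diagram, here is a minimal Python sketch of the two facts a correct drawing should reflect. It assumes the standard normal curve (mean 0, standard deviation 1), and the names `pdf`, `slope`, `within` and `steepest` are made up for this example.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def slope(x):
    """Derivative of the standard normal density: f'(x) = -x * f(x)."""
    return -x * pdf(x)

# Probability of falling within k standard deviations of the mean:
# P(|Z| < k) = erf(k / sqrt(2)).
within = {k: math.erf(k / math.sqrt(2.0)) for k in (1, 2, 3)}

# Locate the steepest point of the curve on [0, 4] by a simple grid search.
grid = [i / 1000 for i in range(4001)]
steepest = max(grid, key=lambda x: abs(slope(x)))

for k, p in within.items():
    print(k, round(100 * p, 2))
print(steepest)
```

The grid search lands at one standard deviation from the mean, which is the inflection point of the curve, and the percentages come out at roughly 68.27, 95.45 and 99.73.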

Proof of the properties of the normal distribution

I've started the sum of normal distributions article to prove the additative property of Gaussian distributions. I did, however, get stuck in an integration step, so the proof is missing a couple of steps.

Could someone please help complete the proof? --Fredrik Orderud 19:58, 23 November 2005 (UTC)

Answer

Hi Fredrik Orderud.

I think the word is additive - not 'additative'.

The first two parts of the property: that the mean of a sum is the sum of the means, and that the variance of the sum is the sum of the variances, apply to any old distribution, not just the normal ones.

I prefer to transform the probability density function f(x) into the cumulant-generating function

Z(t) = \frac{d}{dt} \log\left(\int e^{xt} f(x)\,dx\right) = \mu + \sigma^{2} t + \cdots

This function is additive, so that the Z of a sum is the sum of the Zs.

The Z-function of the normal distribution has no higher-order terms but is simply

Z(t) = \mu + \sigma^{2} t

Then it is obvious that the sum of normally distributed variables is normally distributed.

Bo Jacoby 09:57, 2 December 2005 (UTC)
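
Spelled out, the argument above might run as follows. This sketch assumes the two variables are independent, which is what makes the moment-generating function of the sum factorize, and the subscripts 1 and 2 are notation added here.

```latex
X_i \sim N(\mu_i, \sigma_i^{2}): \qquad
\log E\!\left[e^{tX_i}\right] = \mu_i t + \tfrac{1}{2}\sigma_i^{2} t^{2},
\qquad Z_{X_i}(t) = \mu_i + \sigma_i^{2} t

E\!\left[e^{t(X_1 + X_2)}\right] = E\!\left[e^{tX_1}\right] E\!\left[e^{tX_2}\right]
\;\Longrightarrow\;
Z_{X_1 + X_2}(t) = Z_{X_1}(t) + Z_{X_2}(t)
= (\mu_1 + \mu_2) + (\sigma_1^{2} + \sigma_2^{2})\, t
```

The last expression is the Z-function of N(μ₁ + μ₂, σ₁² + σ₂²). The step from "same Z-function" to "same distribution" relies on the moment-generating function determining the distribution, which is taken for granted here.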
