
Wikipedia:Reference desk/Archives/Mathematics/2008 March 9

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


March 9

Effects of small sample size in ANOVA

I'm using a repeated measures ANOVA to establish whether there are differences in the density of neurons in particular columns of a brain structure. I have three animals, with seven sections per animal and three columns, and I'm treating the 21 sections as the subjects and the columns as the repeated measures.

As far as I can see, this is a small sample size, but I've found significant differences in density between different depths and different columns (though not for the interaction). As far as I understand, a small sample size increases the chance of a type II error (accepting the null hypothesis when it should be rejected), but I can't find many references to any other effects it has. So my question is: given that I've rejected the null hypothesis that neuron density is the same across sections and columns, and so have avoided a type II error, what other issues is the small sample size likely to cause? I'm having difficulty finding a clear answer anywhere.

Thanks for any help, and sorry for the long-windedness. Jasonisme (talk) 19:41, 9 March 2008 (UTC)[reply]

The major thing I can think of is that it becomes more difficult to do diagnostics: checking for constant variance, autocorrelation, etc. OTOH, those techniques are often misused anyway, especially a priori to inform what kind of analysis to do, which then messes up the type I and II error rates. If you wanted to do some really rigorous diagnostics (cross validation, etc.), a small sample size makes the ecdf quite unsmooth. This is less of a "bad" thing and more just "unattractive", however. Baccyak4H (Yak!) 14:09, 10 March 2008 (UTC)[reply]
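
Purely as an illustration of the design described in the question (and not the poster's actual analysis), here is a minimal sketch, assuming Python with pandas and statsmodels, of a repeated-measures ANOVA with 3 animals x 7 sections = 21 subjects and column as the within-subject factor. All numbers are made up, and the depth/section factor is left out for brevity.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for animal in range(3):
    for section in range(7):
        subject = f"a{animal}_s{section}"
        for col, effect in zip(["col1", "col2", "col3"], [0.0, 5.0, 10.0]):
            # density = baseline + column effect + noise (all numbers made up)
            rows.append({"subject": subject,
                         "column": col,
                         "density": 100.0 + effect + rng.normal(scale=8.0)})
df = pd.DataFrame(rows)

# One observation per subject and within-subject level, as AnovaRM expects.
res = AnovaRM(df, depvar="density", subject="subject", within=["column"]).fit()
print(res)
```

Repeating the fit over many simulated data sets, with the column effect set to zero or to a plausible size, would give a rough estimate of the type I error rate and the power of a design this small, which is where a small sample mostly bites.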

Entropy maximizing distribution

It is well known that the normal distribution maximizes the entropy for a given mean and variance. If I'm not mistaken, it is easy to generalize this to the claim that, given the first 2n moments of a distribution, the one that maximizes the entropy has a density of the form $e^{P(x)}$, where P is a polynomial of degree 2n. But what if the number of given moments is odd (say, we constrain the mean, variance and skewness)? Is there a distribution which maximizes the entropy, and does it have a closed form? -- Meni Rosenfeld (talk) 21:16, 9 March 2008 (UTC)[reply]
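
As a numerical sketch of the even-moment claim above (everything here, the grid, the target moments and the starting point, is chosen purely for illustration), one can search for a density of the form exp(P(x)) with deg P = 4 whose first four raw moments match prescribed values; the leading coefficient is forced to be negative so that exp(P) is integrable.

```python
# Illustrative sketch: look for a density of the form f(x) = exp(P(x)),
# deg P = 4, matching four prescribed raw moments (values made up here).
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import fsolve

x = np.linspace(-10.0, 10.0, 4001)                   # truncated integration grid
target = np.array([0.0, 1.0, 0.0, 2.5])              # E[X], E[X^2], E[X^3], E[X^4]

def raw_moments(params):
    a1, a2, a3, b4 = params
    # Leading coefficient is -exp(b4) < 0, so exp(P) is integrable.
    P = a1 * x + a2 * x**2 + a3 * x**3 - np.exp(b4) * x**4
    w = np.exp(P - P.max())                           # unnormalized, overflow-safe
    Z = trapezoid(w, x)
    return np.array([trapezoid(x**k * w, x) / Z for k in (1, 2, 3, 4)])

sol = fsolve(lambda p: raw_moments(p) - target, x0=[0.0, -0.5, 0.0, -3.0])
a1, a2, a3, b4 = sol
print("P(x) coefficients:", a1, a2, a3, -np.exp(b4))
print("achieved moments: ", raw_moments(sol))
```

The same ansatz visibly fails for an odd number of moments, since exp of an odd-degree polynomial is not integrable over the whole real line, which is exactly the case asked about.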

Just to make sure I'm on the same page: Is the distribution maximizing entropy for a given mean the uniform distribution centered at the mean, or the point mass at the mean? JackSchmidt (talk) 21:24, 9 March 2008 (UTC)[reply]
Uniform. Remember, high entropy = uncertainty = spread. In this context I am interested only in distributions with a proper pdf, and since the entropy can be arbitrarily high when given only the mean, I am excluding this case. -- Meni Rosenfeld (talk) 22:02, 9 March 2008 (UTC)[reply]
Cool. One reason I asked is that (assuming you meant entropy the way you do, and not its negative, which is common in some areas) it seemed like there was no solution for the first possible case, n=1, without some other hypothesis. When I look at the differential entropy page and compare your suggested method, something doesn't seem quite right. Analysis is not my strong suit, and statistics even less, so I'll assume I am just confused. If you sketch out the even moment case, I can see if I can make sense of it. JackSchmidt (talk) 22:18, 9 March 2008 (UTC)[reply]
This paper gives the proof for the normal distribution; my proof is a simple extension. Assuming that there is some $g(x) = e^{P(x)}$ with the correct moments, let $f$ be a pdf for any distribution with the same moments. Then
$$-\int f \log f \;\le\; -\int f \log g \;=\; -\int f P \;=\; -\int g P \;=\; -\int g \log g.$$
Thus g has higher entropy. What is it that didn't seem right? -- Meni Rosenfeld (talk) 23:58, 9 March 2008 (UTC)[reply]
Why is int(f*log(f)) >= int(f*log(g))? JackSchmidt (talk) 00:17, 10 March 2008 (UTC)[reply]
This is true for any two distributions, and the proof is on page 3 of the linked paper. -- Meni Rosenfeld (talk) 00:20, 10 March 2008 (UTC)[reply]
It is false for most pairs of distributions. The statement on page 2 has hypotheses. Why should your distributions satisfy the hypotheses? Where have you made use of the even-ness of the number of moments specified? In other words, you haven't really explained anything. I am trying to help, but I do require a little bit of detail. JackSchmidt (talk) 00:25, 10 March 2008 (UTC)[reply]
Hm? The hypotheses are satisfied by any distribution, except for one which I'll throw in (that is, I'll maximize over all distributions satisfying it, which I'm pretty sure alters nothing). The evenness is used in the existence of g, which I have not proved; for an odd number of moments (that is, P of odd degree), $e^{P(x)}$ is obviously not a distribution. My argument is not symmetric in f and g, as the log of f is not a polynomial. I have provided only a sketch because I thought the details were clear, but you are of course welcome to ask about any detail which isn't. -- Meni Rosenfeld (talk) 00:37, 10 March 2008 (UTC)[reply]
←"For any two distributions f,g, int(f*log(f)) >= int(f*log(g))" is complete nonsense. "The hypothesis int(f-g)>=0 is satisfied for any pair of distributions" is also complete nonsense. JackSchmidt (talk) 00:47, 10 March 2008 (UTC)[reply]
Ah, I think I see the problem. We might be using "distribution" in two different ways. If g is a point mass and f is its derivative, then int(f-g) = -1 clearly fails that hypothesis, but f and g are very well behaved distributions. However, I think you want int(f) = int(g) = 1, so that they are probability distributions. Maximizing over a set of functions is often not possible, since many nice function spaces are not complete, so I figured you wanted to include distributions as well. Not all distributions have all moments (not even all probability distributions), but I believe there is a class of distributions, either called Schwartz class functions or Schwartz distributions, that have all their moments and, I believe, are determined by them. I think I see what you meant in your responses.
Why does int(fP) = int(gP)? This is clearly false for general distributions (take f and g to be distinct point masses), but perhaps it is true here? Is this because int(fP) is a linear combination of moments of f, moments that were required to be equal to the moments of g? This might actually be basically the proof I was talking about below for moment matching. JackSchmidt (talk) 02:36, 10 March 2008 (UTC)[reply]
Yes, I did mean probability distributions - sorry for not making this clear; it escaped me that "distribution" can be interpreted more generally. I did mention I am interested in proper pdfs, excluding point masses and the like (unless, of course, the maximum cannot be attained this way for odd m).
Indeed, int(fP) = int(gP) because P is a polynomial of degree m, thus these are linear combinations of the first m moments, which are assumed to exist and be equal for the two functions. -- Meni Rosenfeld (talk) 14:16, 10 March 2008 (UTC)[reply]
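For concreteness, a small numerical check of the two facts this sketch rests on, with f and g chosen here arbitrarily rather than taken from the thread: Gibbs' inequality int(f*log(f)) >= int(f*log(g)) for probability densities, and int(fP) = int(gP) when f and g share the moments appearing in P. Below, g is the standard normal, so log g is a degree-2 polynomial P, and f is a Laplace density with the same mean and variance.

```python
# g = standard normal = exp(P(x)) with P(x) = -x^2/2 - log(sqrt(2*pi)), deg P = 2.
# f = Laplace(0, 1/sqrt(2)), which also has mean 0 and variance 1.
import numpy as np
from scipy.integrate import quad
from scipy.stats import laplace, norm

f = laplace(0, 1 / np.sqrt(2))
g = norm(0, 1)

def P(t):
    return -t**2 / 2 - 0.5 * np.log(2 * np.pi)   # log of g's density

lim = 30.0   # tails beyond this are numerically negligible here
int_f_log_f = quad(lambda t: f.pdf(t) * np.log(f.pdf(t)), -lim, lim)[0]
int_f_log_g = quad(lambda t: f.pdf(t) * np.log(g.pdf(t)), -lim, lim)[0]
int_g_log_g = quad(lambda t: g.pdf(t) * np.log(g.pdf(t)), -lim, lim)[0]
int_f_P = quad(lambda t: f.pdf(t) * P(t), -lim, lim)[0]
int_g_P = quad(lambda t: g.pdf(t) * P(t), -lim, lim)[0]

print(int_f_log_f >= int_f_log_g)      # Gibbs' inequality
print(np.isclose(int_f_P, int_g_P))    # equal first two moments => equal integrals
print(-int_f_log_f <= -int_g_log_g)    # hence f's entropy is at most g's
```

All three checks should print True for proper pdfs like these; the point-mass counterexamples discussed above are exactly what this setting excludes.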
There is some technique called moment matching. There is some simple formula which, given a sequence of moments corresponding to the moments of a nice (Schwartz distribution maybe?) function, gives back the function. It's like a Fourier transform or a Laplace transform or something. Does that ring a bell at all? I had some book that discussed this in very clear language, but I don't recall which book. It was some sort of formal statistics book, so really an analysis book that focussed on finite measure spaces. JackSchmidt (talk) 22:56, 9 March 2008 (UTC)[reply]
It seems like we need all moments for that, and I don't see how we would find the entropy-maximizing moments based on the first few. -- Meni Rosenfeld (talk) 23:58, 9 March 2008 (UTC)[reply]
My recollection is that the formula is so nice that you can show that your choice of all moments (subject to choosing the first few) maximizes entropy amongst all distributions which both have all their moments and are determined by them (which I think includes Schwartz distributions, so should be general enough). I think it was something along the lines of giving the log of the distribution as a rapidly converging power series where estimates would be easy to make. JackSchmidt (talk) 00:17, 10 March 2008 (UTC)[reply]

There is an example on page 4 (of the paper Meni provided) on exponential distributions. Given the mean, the exponential distribution maximizes the entropy. For odd n, $e^{P(x)}$ can be a distribution in the same way the exponential distribution is, by restricting the support to the positive reals. (Igny (talk) 00:53, 10 March 2008 (UTC))[reply]

But the exponential only maximizes the entropy given the mean and the restriction that the distribution is supported on the positives. I do not desire that restriction. -- Meni Rosenfeld (talk) 10:33, 10 March 2008 (UTC)[reply]
But for distributions with unrestricted support and a fixed mean, there is no maximum of the entropy. (Igny (talk) 13:09, 10 March 2008 (UTC))[reply]
Yes, as I have said before, I have excluded the case where only the mean is given for this reason. -- Meni Rosenfeld (talk) 14:02, 10 March 2008 (UTC)[reply]
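
To make this last exchange concrete, a short sketch with distributions and numbers chosen purely for illustration: among a few densities supported on the positive reals with the same mean, the exponential has the largest differential entropy, while on the whole real line a normal with that mean and ever larger variance pushes the entropy past any bound, so no maximizer exists when only the mean is constrained.

```python
# 1) Positive support, mean fixed at m: the exponential beats two other choices.
# 2) Unrestricted support, mean fixed at m: entropy grows without bound.
import numpy as np
from scipy.stats import expon, gamma, lognorm, norm

m = 2.0

print("exponential:", expon(scale=m).entropy())                    # 1 + ln(m)
print("gamma(k=3): ", gamma(a=3, scale=m / 3).entropy())           # same mean m
s = 0.75
print("lognormal:  ", lognorm(s=s, scale=m * np.exp(-s**2 / 2)).entropy())

for sd in (1.0, 10.0, 1000.0):                                     # N(m, sd^2)
    print("normal, sd =", sd, ":", norm(loc=m, scale=sd).entropy())
```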