anonymous
2007-09-13 15:21:05 UTC
The sample variance of a sample {x1, ..., xn} with average A(x) is defined as
s^2 = (1/(n - 1)) [(x1 - A(x))^2 + (x2 - A(x))^2 + ... + (xn - A(x))^2].
This is meant to estimate the "true variance" σ^2 of the population {y1, ..., ym}, with average A(y), from which the sample is drawn. The true variance is
σ^2 = (1/m) [(y1 - A(y))^2 + (y2 - A(y))^2 + ... + (ym - A(y))^2].
I am told that the "n - 1" is used in s^2 because it makes s^2 an unbiased estimator of σ^2; that is, the expected value of s^2 equals σ^2. I understand what this means.
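A small exact check of that claim, as a sketch: the Python below enumerates every possible sample of size n from a tiny population and averages both versions of the sample variance. It assumes the sample is drawn with replacement (each xi chosen independently and uniformly from {y1, ..., ym}), which is the setting where the n - 1 result holds exactly. The population [1, 2, 3, 4] and the function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Arithmetic average A(.) of a sequence, kept exact with Fraction."""
    return Fraction(sum(values), len(values))

def population_variance(ys):
    """sigma^2: average squared deviation from A(y), denominator m."""
    a = mean(ys)
    return sum((y - a) ** 2 for y in ys) / len(ys)

def sample_variance(xs, ddof=1):
    """Sum of squared deviations from A(x), divided by n - ddof.

    ddof=1 gives s^2 (denominator n - 1); ddof=0 gives the denominator-n version.
    """
    a = mean(xs)
    return sum((x - a) ** 2 for x in xs) / (len(xs) - ddof)

def expected_estimates(population, n):
    """Exact expected values of both estimators, averaging over every
    ordered n-sample drawn with replacement (all equally likely)."""
    samples = list(product(population, repeat=n))
    e_unbiased = sum(sample_variance(s, ddof=1) for s in samples) / len(samples)
    e_biased = sum(sample_variance(s, ddof=0) for s in samples) / len(samples)
    return e_unbiased, e_biased

population = [1, 2, 3, 4]
print(population_variance(population))      # 5/4
print(expected_estimates(population, n=2))  # (Fraction(5, 4), Fraction(5, 8))
```

Here the denominator-n version averages 5/8, which is (n - 1)/n times σ^2 = 5/4. If the sample were instead drawn without replacement from a finite population, the expected value of s^2 would be m/(m - 1) times the σ^2 defined above, so the sampling assumption matters.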
What I do not understand is *why* the denominator has to change to make it an unbiased estimator. It seems strange to use a different formula for the sample than for the population quantity we are trying to estimate.
Can anyone explain why s^2 (as given above) is an unbiased estimator of σ^2, while the version with denominator n is not?
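One standard way to see where the n - 1 comes from, sketched under the assumption that x1, ..., xn are drawn independently from the population (so each xi has mean A(y) and variance σ^2): split each deviation from A(x) into a deviation from A(y) minus the error in A(x).

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - A(x))^2 &= \sum_{i=1}^{n} (x_i - A(y))^2 - n\,(A(x) - A(y))^2 \\
E\Big[\sum_{i=1}^{n} (x_i - A(x))^2\Big] &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n - 1)\,\sigma^2
\end{aligned}
```

The second line uses E[(xi - A(y))^2] = σ^2 and Var(A(x)) = σ^2/n. Dividing by n - 1 gives E[s^2] = σ^2, while dividing by n gives (n - 1)σ^2/n, which is too small on average. Intuitively, A(x) is computed from the same points, so it sits closer to them than A(y) does (it minimizes the sum of squared deviations), and the n - 1 compensates for that.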