Unbiased pooled estimator of variance
I'm not sure I'm calculating the unbiased pooled estimator for the variance correctly.
Assuming 2 samples where $\sigma_1 = \sigma_2 = \sigma$ and $\sigma$ is unknown, these are my definitions:
Sample variance: $S^2 = \frac{1}{n}\sum (X_i - \bar X)^2$
Unbiased estimator: $\hat S^2 = \frac{n}{n-1}S^2 = \frac{1}{n-1}\sum (X_i - \bar X)^2$
Unbiased pooled variance: $\frac{(n_1 - 1)\hat S_1^2 + (n_2 - 1)\hat S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$
The last equation, which should give the unbiased pooled estimate, reduces to:
$$\frac{\sum (X_{1i} - \bar X_1)^2 + \sum (X_{2i} - \bar X_2)^2}{n_1 + n_2 - 2}$$
Is that correct? Should I expect that the unbiased pooled estimate's variance will be lower than the estimated variance of each individual data set ($\underline X_1$ or $\underline X_2$)?
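A quick numerical check in R (a minimal sketch; the sample sizes, means, and seed below are arbitrary choices) confirms that the weighted-average form and the reduced form agree:
set.seed(1); x1 = rnorm(8, 0, 2); x2 = rnorm(12, 0, 2)  # two samples, common sigma
n1 = length(x1); n2 = length(x2)
p.a = ((n1-1)*var(x1) + (n2-1)*var(x2))/(n1 + n2 - 2)   # var() uses denom n-1
p.b = (sum((x1-mean(x1))^2) + sum((x2-mean(x2))^2))/(n1 + n2 - 2)
all.equal(p.a, p.b)  # TRUE: the two forms agree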
statistics estimation hypothesis-testing estimation-theory
asked Aug 5 at 23:51 by s5s
Shouldn't it be $n_1+n_2-1$ in the denominator?
– joriki
Aug 6 at 4:33
@joriki No, -2 is correct.
– s5s
Aug 6 at 10:05
1 Answer
First, your notation for the sample variance seems to be muddled. The sample variance is ordinarily defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ which makes it an unbiased estimator of the population variance $\sigma^2.$
Perhaps the most common context for an 'unbiased pooled estimator' of variance is the 'pooled t test': Suppose you have two random samples $X_i$ of size $n$ and $Y_i$ of size $m$ from populations with the same variance $\sigma^2.$ Then
the pooled estimator of $\sigma^2$ is
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{m+n-2}.$$
This estimator is unbiased.
Because the samples have respective 'degrees of freedom' $n-1$ and $m-1,$ one sometimes says that $S_p^2$ is a 'degrees-of-freedom' weighted average of
the two sample variances. If $n = m,$ then $S_p^2 = 0.5S_X^2 + 0.5S_Y^2.$
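A brief simulation is consistent with unbiasedness (a sketch only; the sizes $n = 5,$ $m = 8,$ the group means, and the seed are arbitrary choices, with $\sigma = 10$):
set.seed(2025); B = 10^5; n = 5; m = 8; sigma = 10
sp.sq = replicate(B, {
  x = rnorm(n, 0, sigma); y = rnorm(m, 50, sigma)  # means may differ
  ((n-1)*var(x) + (m-1)*var(y))/(n + m - 2) })     # pooled estimate
mean(sp.sq)  # close to 100 = sigma^2, consistent with unbiasedness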
Note: Some authors do define the sample variance as $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2,$ but then the sample variance is not an unbiased estimator of $\sigma^2,$ even though it might have other properties desirable for the author's task at hand. However, most agree that the notation $S^2$ is reserved for the version with $n-1$ in the denominator, unless a specific warning is given otherwise.
Example: One common measure of the 'goodness' of an estimator is that it have a small
'root mean squared error'. If $T$ is an estimate of $\tau$ then
$\text{MSE}_T(\tau) = E[(T-\tau)^2]$ and RMSE is its square root.
The simulation below illustrates, for normal data with $n = 5$ and $\sigma^2 = 10^2 = 100,$ that
the version of the sample variance with $n$ in the denominator has smaller
RMSE than the version with $n-1$ in the denominator. (A formal proof for
$n > 1$ is not difficult.)
set.seed(1888); m = 10^6; n = 5; sigma = 10; sg.sq = 100
v.a = replicate(m, var(rnorm(n, 100, sigma))) # denom n-1
v.b = (n-1)*v.a/n # denom n
mean(v.a); RMS.a = sqrt(mean((v.a-sg.sq)^2)); RMS.a
[1] 100.0564 # unbiased
[1] 70.81563 # larger RMSE
mean(v.b); RMS.b = sqrt(mean((v.b-sg.sq)^2)); RMS.b
[1] 80.0451 # biased
[1] 60.06415 # smaller RMSE
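For reference, an outline of that proof for normal data: since $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1},$ we have $\operatorname{Var}(S^2) = 2\sigma^4/(n-1),$ and the biased version $V = \frac{n-1}{n}S^2$ has bias $-\sigma^2/n.$ Then
$$\operatorname{MSE}(S^2) = \frac{2\sigma^4}{n-1}, \qquad \operatorname{MSE}(V) = \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2} = \frac{(2n-1)\sigma^4}{n^2} < \frac{2\sigma^4}{n-1} \quad (n > 1).$$
With $n = 5$ and $\sigma = 10$ these give RMSEs $\sqrt{5000} \approx 70.7$ and $\sqrt{3600} = 60,$ matching the simulated values above.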
edited Aug 9 at 15:44
answered Aug 8 at 19:34 by BruceET