Unbiased pooled estimator of variance

I'm not sure I'm calculating the unbiased pooled estimator for the variance correctly.



Assuming two samples where $\sigma_1 = \sigma_2 = \sigma$ and $\sigma$ is unknown, these are my definitions:



Sample variance: $S^2 = \frac{1}{n} \sum (X_i - \bar X)^2$



Unbiased estimator: $\hat S^2 = \frac{n}{n-1} S^2 = \frac{1}{n-1} \sum (X_i - \bar X)^2$



Unbiased pooled variance: $\frac{(n_1 - 1)\hat S_1^2 + (n_2 - 1)\hat S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$



The last equation, which should give the unbiased pooled estimate, reduces to:



$\frac{\sum (X_{1i} - \bar X_1)^2 + \sum (X_{2i} - \bar X_2)^2}{n_1 + n_2 - 2}$



Is that correct? Should I expect the unbiased pooled estimate's variance to be lower than the estimated variance of each individual data set ($\underline{X}_1$ or $\underline{X}_2$)?
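As a sanity check, here is a quick numerical verification of that reduction in R; the values of x1 and x2 are made up purely for illustration:

x1 = c(4.1, 5.3, 6.0, 5.5, 4.8); n1 = length(x1)   # sample 1, n1 = 5
x2 = c(7.2, 6.1, 6.8, 7.5);      n2 = length(x2)   # sample 2, n2 = 4
# var() in R already uses the n-1 denominator (the unbiased version)
pooled  = ((n1-1)*var(x1) + (n2-1)*var(x2)) / (n1 + n2 - 2)
# reduced form: squared deviations about each sample's own mean
reduced = (sum((x1-mean(x1))^2) + sum((x2-mean(x2))^2)) / (n1 + n2 - 2)
all.equal(pooled, reduced)   # TRUE -- the two expressions agree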


























  • Shouldn't it be $n_1+n_2-1$ in the denominator?
    – joriki
    Aug 6 at 4:33










  • @joriki No, -2 is correct.
    – s5s
    Aug 6 at 10:05














asked Aug 5 at 23:51 by s5s


1 Answer

First, your notation for the sample variance seems to be muddled. The sample variance is ordinarily defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ which makes it an unbiased estimator of the population variance $\sigma^2.$



Perhaps the most common context for an 'unbiased pooled estimator' of variance is the 'pooled t test': Suppose you have two random samples $X_i$ of size $n$ and $Y_i$ of size $m$ from populations with the same variance $\sigma^2.$ Then the pooled estimator of $\sigma^2$ is



$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{m+n-2}.$$



This estimator is unbiased.
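A small simulation sketch (with arbitrary illustrative choices $n = 5,$ $m = 8,$ $\sigma = 10$) confirms the unbiasedness numerically:

set.seed(42); B = 10^5            # B replications (arbitrary)
n = 5; m = 8; sigma = 10          # illustrative values, sigma^2 = 100
sp2 = replicate(B, {
  x = rnorm(n, 0, sigma); y = rnorm(m, 0, sigma)
  ((n-1)*var(x) + (m-1)*var(y)) / (n + m - 2)
})
mean(sp2)   # close to sigma^2 = 100, consistent with unbiasedness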



Because the samples have respective 'degrees of freedom' $n-1$ and $m-1,$ one sometimes says that $S_p^2$ is a 'degrees-of-freedom' weighted average of the two sample variances. If $n = m,$ then $S_p^2 = 0.5S_X^2 + 0.5S_Y^2.$
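A throwaway check of the equal-sample-size case (again with made-up sizes):

x = rnorm(6); y = rnorm(6)                  # n = m = 6
sp2 = (5*var(x) + 5*var(y)) / 10            # df-weighted pooled estimate
all.equal(sp2, 0.5*var(x) + 0.5*var(y))     # TRUE -- simple average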




Note: Some authors do define the sample variance as $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2,$ but then the sample variance is not an unbiased estimator of $\sigma^2,$ even though it might have other properties desirable for the author's task at hand. However, most agree that the notation $S^2$ is reserved for the version with $n-1$ in the denominator, unless a specific warning is given otherwise.
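For example, R's built-in var already uses the $n-1$ denominator; the $n$ version must be computed by hand:

x = rnorm(10)
var(x)                  # n-1 denominator (unbiased)
mean((x - mean(x))^2)   # n denominator, equals (n-1)/n * var(x)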



Example: One common measure of the 'goodness' of an estimator is that it have a small 'root mean squared error' (RMSE). If $T$ is an estimator of $\tau,$ then $\text{MSE}_T(\tau) = E[(T-\tau)^2],$ and the RMSE is its square root.



The simulation below illustrates, for normal data with $n = 5$ and $\sigma^2 = 10^2 = 100,$ that the version of the sample variance with $n$ in the denominator has smaller RMSE than the version with $n-1$ in the denominator. (A formal proof for $n > 1$ is not difficult.)



set.seed(1888); m = 10^6; n = 5; sigma = 10; sg.sq = 100
v.a = replicate(m, var(rnorm(n, 100, sigma)))  # denominator n-1
v.b = (n-1)*v.a/n                              # denominator n
mean(v.a); RMS.a = sqrt(mean((v.a - sg.sq)^2)); RMS.a
[1] 100.0564  # approximately unbiased
[1] 70.81563  # larger RMSE
mean(v.b); RMS.b = sqrt(mean((v.b - sg.sq)^2)); RMS.b
[1] 80.0451   # biased
[1] 60.06415  # smaller RMSE





answered Aug 8 at 19:34 by BruceET (accepted), edited Aug 9 at 15:44