How to test significance of a change in standard deviation?
























Suppose I have an (expensive-to-run, mechanical) system that, as a decent approximation, involves two independent, random sources of error. One of these is something I can tinker with in the hope of reducing the error. I want to determine whether a change I make may be claimed to be an improvement or not.



Typically, when I run the system, I take some samples (I want to take as few as permit a sufficient confidence level; remember, the system is expensive to run) and calculate a standard deviation for the aggregated error. I then tinker with the system, take some more samples, and compute a new SD value. But of course, for a small sample, it's not safe to assume that an improved SD in this sample reflects an actual improvement rather than luck.



I'm familiar, though not particularly competent, with the idea of confidence as it is taught in high school, but that has always been presented in terms of testing whether a change of mean is significant. That formula doesn't make sense (to me at least!) for determining whether a change of SD is significant.



So, I'd like to understand two things:



1) How can I determine if a change in SD between two samples of particular sizes is significant at a given level?
2) Can I estimate the necessary sample size to obtain a result that's significant at a particular level (and if so, how)?







asked Jul 27 at 9:02









Toby Eggitt

  • There is an F-test for equality of variances between two samples, but it is said to be extremely sensitive to non-normality, so it may not be a good choice. The Wikipedia article on the F-test lists some other tests that are not so sensitive to non-normality (see the "Properties" section), so you might check those out. See en.wikipedia.org/wiki/F-test_of_equality_of_variances. I don't have any personal experience with these tests.
    – awkward
    Jul 27 at 13:26













1 Answer



























There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose the $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m,$ respectively, from normal populations.



Hypothesis and test. Under $H_0: \sigma_x^2/\sigma_y^2 = 1$ (population variances $\sigma_x^2$ and $\sigma_y^2$ are equal), the statistic $F = S_x^2/S_y^2 \sim \mathsf{F}(n-1, m-1).$
This fact can be used to test $H_0$ against $H_a: \sigma_x^2/\sigma_y^2 > 1.$



In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 12$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$



set.seed(727)                                    # for reproducibility
x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)    # n = 12 (SD 20), m = 8 (SD 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)


[Figure: side-by-side boxplots of x and y]



The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:



var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773


The test below rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$



var.test(x, y, alt="g")

F test to compare two variances

data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773


The P-value of this 1-sided test is computed as the area under
the density curve of $\mathsf{F}(11,7)$ to the right of $F = 6.0918.$



1 - pf(6.0918, 11, 7)
## 0.01225055


If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.05 from the upper tail of $\mathsf{F}(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).



qf(.95, 11, 7)
## 3.603037


The figure below shows the density function of $\mathsf{F}(11, 7)$ with the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.



[Figure: density of $\mathsf{F}(11, 7),$ with the critical value 3.603 dashed and the observed F-statistic 6.0918 solid]



Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless the data are from normal populations, as noted in the Comment by @awkward. Various tests for differences in variances ('heteroscedasticity')
that are less sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the Levene test; a sketch of its use follows.
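A hedged sketch (not part of the original Answer): the Levene test is not in base R, but is available as leveneTest in the add-on package car; its default center = median gives the Brown-Forsythe variant, which is relatively robust to non-normality. Applied to the simulated samples above:

library(car)                         # assumes install.packages("car") has been run
vals = c(x, y)                       # pool the observations
grp  = factor(rep(c("x","y"), times = c(length(x), length(y))))  # group labels
leveneTest(vals, grp)                # small P-value suggests unequal variances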



Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.



The power of this F-test depends on the ratio $\sigma_x^2/\sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.



Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.



set.seed(728)      # for reproducibility
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1), alt="g")$p.value)
mean(pv <= .05)    # proportion of rejections approximates the power
## 0.631423


The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.
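For reference (this check comes from the Comments below), these power values can also be computed exactly, without simulation, from the F quantile and distribution functions:

1 - pf(qf(.95, 9, 9)/4, 9, 9)         # exact power for n = m = 10
## 0.6311355
1 - pf(qf(.95, 24, 24)/4, 24, 24)     # exact power for n = m = 25
## 0.9538375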



Note: If you can determine a base-level variance $\sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion, but a brief sketch follows.
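A minimal sketch of that idea, under the assumption of normal data and a hypothetical baseline SD $\sigma_0 = 20$: because $(n-1)S^2/\sigma_0^2 \sim \mathsf{Chisq}(n-1)$ when $\sigma = \sigma_0,$ a lower-tail chi-square probability tests whether the variance has decreased:

sigma0 = 20                            # hypothetical baseline SD (assumed known)
n = length(y)
stat = (n - 1) * var(y) / sigma0^2     # one-sample chi-square statistic
pchisq(stat, df = n - 1)               # small P-value favors sigma < sigma0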






edited Jul 28 at 8:50

answered Jul 28 at 2:11

BruceET
  • Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
    – Toby Eggitt
    Jul 28 at 11:42










  • You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
    – BruceET
    Jul 28 at 16:22







  • Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
    – Toby Eggitt
    Jul 28 at 18:32






  • Not trying to force you to read the link, but it explains how you can get the power value for a test at the 5% level against the alternative $\sigma_x^2/\sigma_y^2 = 4$ with $m=n=10$ using the code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with the simulations at the end of my Answer. // This method can be implemented in any software into which the CDF and inverse CDF (quantile fcn) of the F-dist'n (pf and qf, respectively, in code) have been programmed. // Quicker, and probably more readily portable, than simulations.
    – BruceET
    Jul 28 at 19:57











  • Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
    – Toby Eggitt
    Jul 29 at 4:41










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "69"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2864204%2fhow-to-test-significance-of-a-change-in-standard-deviation%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.



Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$



In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$



set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)


enter image description here



The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:



var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773


The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$



var.test(x, y, alt="g")

F test to compare two variances

data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773


The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$



1 - pf(6.0918, 11, 7)
## 0.01225055


If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).



qf(.95, 11, 7)
## 3.603037


The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.



enter image description here



Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.



Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.



The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.



Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.



set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423


The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.



Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.






share|cite|improve this answer























  • Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
    – Toby Eggitt
    Jul 28 at 11:42










  • You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
    – BruceET
    Jul 28 at 16:22







  • 1




    Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
    – Toby Eggitt
    Jul 28 at 18:32






  • 1




    Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf and qf, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
    – BruceET
    Jul 28 at 19:57











  • Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
    – Toby Eggitt
    Jul 29 at 4:41














up vote
1
down vote



accepted










There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.



Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$



In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$



set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)


enter image description here



The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:



var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773


The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$



var.test(x, y, alt="g")

F test to compare two variances

data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773


The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$



1 - pf(6.0918, 11, 7)
## 0.01225055


If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).



qf(.95, 11, 7)
## 3.603037


The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.



enter image description here



Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.



Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.



The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.



Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.



set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423


The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.



Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.






share|cite|improve this answer























  • Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
    – Toby Eggitt
    Jul 28 at 11:42










  • You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
    – BruceET
    Jul 28 at 16:22







  • 1




    Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
    – Toby Eggitt
    Jul 28 at 18:32






  • 1




    Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf and qf, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
    – BruceET
    Jul 28 at 19:57











  • Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
    – Toby Eggitt
    Jul 29 at 4:41












up vote
1
down vote



accepted







up vote
1
down vote



accepted






There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.



Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$



In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$



set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)


enter image description here



The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:



var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773


The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$



var.test(x, y, alt="g")

F test to compare two variances

data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773


The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$



1 - pf(6.0918, 11, 7)
## 0.01225055


If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).



qf(.95, 11, 7)
## 3.603037


The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.



enter image description here



Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.



Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.



The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.



Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.



set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423


The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.



Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.






share|cite|improve this answer















There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.



Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$



In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$



set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)


enter image description here



The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:



var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773


The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$



var.test(x, y, alt="g")

F test to compare two variances

data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773


The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$



1 - pf(6.0918, 11, 7)
## 0.01225055


If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).



qf(.95, 11, 7)
## 3.603037


The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.



enter image description here



Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.



Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.



The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.



Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.



set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423


The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.



Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.







share|cite|improve this answer















share|cite|improve this answer



share|cite|improve this answer








edited Jul 28 at 8:50


























answered Jul 28 at 2:11









BruceET

33.1k61440




33.1k61440











  • Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
    – Toby Eggitt
    Jul 28 at 11:42










  • You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
    – BruceET
    Jul 28 at 16:22







  • 1




    Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
    – Toby Eggitt
    Jul 28 at 18:32






  • 1




    Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf and qf, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
    – BruceET
    Jul 28 at 19:57











  • Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
    – Toby Eggitt
    Jul 29 at 4:41
















  • Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
    – Toby Eggitt
    Jul 28 at 11:42










  • You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
    – BruceET
    Jul 28 at 16:22







  • 1




    Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
    – Toby Eggitt
    Jul 28 at 18:32






  • 1




    Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf and qf, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
    – BruceET
    Jul 28 at 19:57











  • Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
    – Toby Eggitt
    Jul 29 at 4:41















Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
– Toby Eggitt
Jul 28 at 11:42




Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
– Toby Eggitt
Jul 28 at 11:42












You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
– BruceET
Jul 28 at 16:22





You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
– BruceET
Jul 28 at 16:22





1




1




Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
– Toby Eggitt
Jul 28 at 18:32




Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
– Toby Eggitt
Jul 28 at 18:32




1




1




Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf and qf, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
– BruceET
Jul 28 at 19:57





Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf and qf, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
– BruceET
Jul 28 at 19:57













Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
– Toby Eggitt
Jul 29 at 4:41




Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
– Toby Eggitt
Jul 29 at 4:41












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2864204%2fhow-to-test-significance-of-a-change-in-standard-deviation%23new-answer', 'question_page');

);

Post as a guest













































































Comments

Popular posts from this blog

What is the equation of a 3D cone with generalised tilt?

Color the edges and diagonals of a regular polygon

Relationship between determinant of matrix and determinant of adjoint?