How to test significance of a change in standard deviation?
Suppose I have an (expensive-to-run, mechanical) system that, as a decent approximation, involves two independent, random sources of error. One of these is something I can tinker with in the hope of reducing the error. I want to determine whether a change I make can be claimed to be an improvement or not.
Typically, when I run the system, I take some samples (I want to take as few as permit a sufficient confidence level--remember, the system is expensive to run). I then calculate a standard deviation for the aggregated error. I then tinker with the system, take some more samples, and compute a new SD value. But of course, for a small sample, it's not safe to assume that an improved SD in this sample reflects an actual improvement rather than luck.
I'm familiar, though not particularly competent, with the idea of confidence as it is taught in high schools, but that has always been presented in terms of testing whether a change of mean is significant. That formula doesn't make sense (to me at least!) for determining whether a change of SD is significant.
So, I'd like to understand two things:
1) How can I determine if a change in SD between two samples of particular sizes is significant at a given level?
2) Can I estimate the necessary sample size to obtain a result that's significant at a particular level (and if so, how)?
statistics standard-deviation confidence-interval
There is an F-test for equality of variances between two samples, but it is said to be extremely sensitive to non-normality, so it may not be a good choice. The Wikipedia article on the F-test lists some other tests that are not so sensitive to non-normality (see the "Properties" section), so you might check those out. See en.wikipedia.org/wiki/F-test_of_equality_of_variances. I don't have any personal experience with these tests.
– awkward
Jul 27 at 13:26
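The Brown-Forsythe variant of the Levene test mentioned among the Wikipedia article's alternatives can be tried without R. A hedged sketch in Python (assuming SciPy is available; the two arrays are made-up illustrative data, where y has five times the spread of x):

```python
# Hedged sketch: Brown-Forsythe/Levene test for equal spread,
# less sensitive to non-normality than the classical F-test.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]  # smaller spread
y = [5 * v for v in x]                                    # same shape, 5x the spread

# center="median" gives the Brown-Forsythe version, the more robust choice
stat, p = stats.levene(x, y, center="median")
print(stat, p)  # a small p-value suggests the spreads really differ
```

A small p-value here plays the same role as in the F-test below, without the strong normality assumption.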
asked Jul 27 at 9:02
Toby Eggitt
1 Answer
There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose the $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m,$ respectively, from normal populations.
Hypothesis and test. If the population variances are equal, then the statistic $F = S_x^2/S_y^2 \sim \mathsf{F}(n-1, m-1).$
This fact can be used to test $H_0: \sigma_x^2/\sigma_y^2 = 1$ (population variances $\sigma_x^2$ and $\sigma_y^2$ are equal) against $H_a: \sigma_x^2/\sigma_y^2 > 1.$
In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 12$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$
set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)
The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:
var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773
The test below rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$
var.test(x, y, alt="g")
F test to compare two variances
data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773
The P-value of this 1-sided test is computed as the area under
the density curve of $\mathsf{F}(11, 7)$ to the right of $F = 6.0918.$
1 - pf(6.0918, 11, 7)
## 0.01225055
If you are doing this test without software, using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.05 from the upper tail of $\mathsf{F}(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).
qf(.95, 11, 7)
## 3.603037
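Since the question mentions writing one's own software, note that pf and qf are just the CDF and quantile function of the F-distribution, so the R computations above can be reproduced in other environments. A hedged sketch in Python (assuming SciPy; the numbers plugged in are the summary values from the R output):

```python
# Hedged sketch: one-sided F-test for Var(x) > Var(y), reproducing
# the R results from the reported summary numbers.
from scipy.stats import f

F = 6.0918        # observed ratio of sample variances, var(x)/var(y)
dfn, dfd = 11, 7  # degrees of freedom: n - 1 = 11, m - 1 = 7

p_value = f.sf(F, dfn, dfd)       # upper-tail area, same as 1 - pf(F, 11, 7) in R
critical = f.ppf(0.95, dfn, dfd)  # 5% critical value, same as qf(.95, 11, 7) in R

print(p_value)   # ~0.0122505, agreeing with R's 0.01225055
print(critical)  # ~3.6030, agreeing with R's 3.603037
```

In practice you would compute F from the two sample variances of your own data rather than hard-coding it.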
The figure below shows the density function of $\mathsf{F}(11, 7)$ with the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid vertical line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.
Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless the data are from normal populations, as noted in the Comment by @awkward. Various tests for differences in variances ('heteroscedasticity')
that are less sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the Levene test.
Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.
The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.
Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.
set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1), alt="gr")$p.val)
mean(pv <= .05)
## 0.631423
The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.
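The simulated power can also be checked in closed form: under the alternative $\sigma_x^2/\sigma_y^2 = 4,$ the statistic $F/4 \sim \mathsf{F}(n-1, m-1),$ so the power is the probability that an $\mathsf{F}(n-1, m-1)$ variable exceeds (critical value)/4. A hedged Python sketch (assuming SciPy), which can also be used to search for the sample size needed for a target power:

```python
# Hedged sketch: exact power of the one-sided 5%-level F-test against
# a given true variance ratio, with equal sample sizes n = m.
from scipy.stats import f

def power_f_test(n, ratio=4.0, alpha=0.05):
    """P(reject H0) when the true variance ratio is `ratio` (normal data)."""
    df = n - 1
    crit = f.ppf(1 - alpha, df, df)    # critical value under H0
    return f.sf(crit / ratio, df, df)  # since F/ratio ~ F(df, df) under Ha

print(power_f_test(10))  # ~0.6311, matching the simulation
print(power_f_test(25))  # ~0.9538
```

For sample-size planning, increase n until power_f_test(n) reaches the desired level.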
Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.
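As a bare sketch of that single-sample idea: for normal data, $(n-1)S^2/\sigma_0^2$ has a chi-squared distribution with $n-1$ degrees of freedom under $H_0: \sigma^2 = \sigma_0^2,$ and a small lower-tail probability is evidence the variance shrank. A hedged Python illustration (assuming SciPy; sigma0 and the sample values are made up):

```python
# Hedged sketch: one-sample chi-squared test of H0: sigma^2 = sigma0^2
# against Ha: sigma^2 < sigma0^2 (i.e., tinkering reduced the variance).
# Assumes roughly normal data; sigma0 and xs below are illustrative.
from statistics import variance
from scipy.stats import chi2

def var_reduced_pvalue(xs, sigma0):
    n = len(xs)
    stat = (n - 1) * variance(xs) / sigma0**2  # ~ chi2(n-1) under H0
    return chi2.cdf(stat, n - 1)               # lower tail: small if variance shrank

xs = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.95, 10.05]  # tight post-tinkering sample
p = var_reduced_pvalue(xs, sigma0=1.0)
print(p)  # a small p-value suggests the variance is now below sigma0^2
```

This needs fewer runs than a two-sample comparison because the baseline variance is treated as known.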
Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
– Toby Eggitt
Jul 28 at 11:42
You can have R for free in a flash by going to https://www.r-project.org . Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
– BruceET
Jul 28 at 16:22
Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
– Toby Eggitt
Jul 28 at 18:32
Not trying to force you to read the link, but it explains how you can get the power value for a test at the 5% level against the alternative $\sigma_x^2/\sigma_y^2 = 4$ with $m = n = 10$ using the code 1 - pf(qf(.95,9,9)/4,9,9), which returns power 0.6311355. Also, for $m = n = 25,$ the code 1 - pf(qf(.95,24,24)/4,24,24) returns power 0.9538375. Both agree with the simulations at the end of my Answer. // This method can be implemented in any software into which the CDF and inverse CDF (quantile function) of the F-distribution (pf and qf, respectively, in the code) have been programmed. // Quicker, and probably more readily portable, than simulations.
– BruceET
Jul 28 at 19:57
Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
– Toby Eggitt
Jul 29 at 4:41
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.
Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$
In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$
set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)
The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:
var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773
The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$
var.test(x, y, alt="g")
F test to compare two variances
data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773
The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$
1 - pf(6.0918, 11, 7)
## 0.01225055
If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).
qf(.95, 11, 7)
## 3.603037
The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.
Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.
Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.
The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.
Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.
set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423
The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.
Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.
Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
– Toby Eggitt
Jul 28 at 11:42
You can have R for free in a flash by going tohttps://www.r-project.org
. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
– BruceET
Jul 28 at 16:22
1
Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
– Toby Eggitt
Jul 28 at 18:32
1
Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code1 - pf(qf(.95,9,9)/4,9,9)
, which returns pwr 0.6311355. Also, for $m = n = 25,$1 - pf(qf(.95,24,24)/4,24,24)
returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf
andqf
, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
– BruceET
Jul 28 at 19:57
Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
– Toby Eggitt
Jul 29 at 4:41
add a comment |Â
up vote
1
down vote
accepted
There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.
Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$
In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$
set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)
The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:
var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773
The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$
var.test(x, y, alt="g")
F test to compare two variances
data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773
The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$
1 - pf(6.0918, 11, 7)
## 0.01225055
If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).
qf(.95, 11, 7)
## 3.603037
The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.
Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.
Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.
The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.
Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.
set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423
The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.
Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.
Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
– Toby Eggitt
Jul 28 at 11:42
You can have R for free in a flash by going tohttps://www.r-project.org
. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java ?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
– BruceET
Jul 28 at 16:22
1
Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
– Toby Eggitt
Jul 28 at 18:32
1
Not trying to force you to read the link, but it explains how you can get power value for test at 5% level against alternative $sigma_x^2/sigma_y^2 = 4$ with $m=n=10$ using code1 - pf(qf(.95,9,9)/4,9,9)
, which returns pwr 0.6311355. Also, for $m = n = 25,$1 - pf(qf(.95,24,24)/4,24,24)
returns pwr 0.9538375. Both agree with simulations at end of my Answer. // This method can be implemented in any software into which PDF and inverse CDF (quantile fcn) of F-dist'n (pf
andqf
, respectively in code) have been programmed. // Quicker, probably more readily portable, than simulations.
– BruceET
Jul 28 at 19:57
Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
– Toby Eggitt
Jul 29 at 4:41
add a comment |Â
up vote
1
down vote
accepted
up vote
1
down vote
accepted
There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.
Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$
In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$
set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)
The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:
var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773
The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$
var.test(x, y, alt="g")
F test to compare two variances
data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773
The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$
1 - pf(6.0918, 11, 7)
## 0.01225055
If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).
qf(.95, 11, 7)
## 3.603037
The figure below shows the density function of $mathsfF(11, 7)$ showing the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.
Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless data are from normal populations, as Commented by @awkward. Various tests for difference in variances ('heteroscedasticity')
that are less-sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is the 'Levene Test'.
Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.
The power of this F-test depends on the ratio $sigma_x^2/sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.
Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.
set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1). alt="gr")$p.val)
mean(pv <= .05)
## 0.631423
The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.
Note: If you can determine a base-level variance $sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single
sample (after 'tinkering' and perhaps improvement) has a smaller variance.
Details of that would be for another discussion.
There are several issues in your Question (and in the Comments). I will try to deal with some of them.
Suppose $X_i$'s and $Y_i$'s are independent random samples of sizes $n$ and $m$ respectively.
Hypothesis and test. Then the statistic $F = S_x^2/S_y^2 sim mathsfF(n-1, m-1).$
This fact can be used to test $H_0: sigma_x^2/sigma_y^2 = 1$ (population variances $sigma_x^2$ and $sigma_y^2$ are equal) against $H_a: sigma_x^2/sigma_y^2 > 1.$
In R statistical software, the test can be performed as shown below. I begin
by generating two normal samples of sizes $n = 10$ and $m = 8$ with a 4:1 ratio of population variances (a
2:1 ratio of population standard deviations), so we hope to reject $H_0.$
set.seed(727); x = rnorm(12, 100, 20); y = rnorm(8, 100, 10)
boxplot(x, y, col="skyblue2", names=c("x","y"), pch=19)
The boxplots clearly show that the $X_i$'s are more variable than the $Y_i$'s.
The sample variances and their ratio $F$ are as follows:
var(x); var(y); var(x)/var(y)
## 414.3551
## 68.01881
## 6.091773
The test below, rejects $H_0$ at the 5% level because the P-value is $0.01225 < 0.05.$
var.test(x, y, alt="g")
F test to compare two variances
data: x and y
F = 6.0918, num df = 11, denom df = 7, p-value = 0.01225
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
1.690733 Inf
sample estimates:
ratio of variances
6.091773
The P-value of this 1-sided test is computed as the area under
the density curve of $mathsfF(11,7)$ to the right of $F = 6.0918.$
1 - pf(6.0918, 11, 7)
## 0.01225055
If you are doing this test without software and using printed tables of the F-distribution,
the 5% critical value is found by cutting
area 0.0275 from the upper tail of $mathsfF(11, 7),$ which the printed
table will show as something like 3.60 (perhaps by interpolation).
qf(.95, 11, 7)
## 3.603037
The figure below shows the density function of $\mathsf{F}(11, 7),$ with the
critical value 3.603 (dashed vertical line) and the F-statistic 6.0918 (solid line).
The area beneath the curve to the right of 6.0918 is the P-value 0.01225.
Power of the F-test. There are two difficulties with the F-test described just above. First, it may not give reliable answers unless the data are from normal populations, as noted in the comment by @awkward. Various tests for differences in variances ('heteroscedasticity')
that are less sensitive to non-normal data are discussed in intermediate-level applied statistics books and implemented in
software packages such as R. One of them is Levene's test.
Second, the F-test and its competitors (for non-normal data) have notoriously bad
power. That is, they may fail to identify real differences in population variances, as reflected in sample variances.
The power of this F-test depends on the ratio $\sigma_x^2/\sigma_y^2$ and the sizes of the samples.
For a reasonably complete discussion of the power of this F-test, see
this Q & A.
Here is a simulation that approximates the power for a test at the 5% level, against an alternative with a 4:1 ratio of population variances (2:1 for SDs) and for sample sizes $n = m = 10$ (population means are irrelevant). The idea is to run a large number of tests on data simulated to these
specifications and see how often the null hypothesis is rejected.
set.seed(728)
pv = replicate(10^6, var.test(rnorm(10, 0, 2), rnorm(10, 0, 1), alt="g")$p.val)
mean(pv <= .05)
## 0.631423
The power is about 63%. (About 37% of 4:1 differences in variances will go
undetected. The sample sizes in the example above are similar, so it was not a 'sure thing' that we would reject there.) However, with larger sample sizes $n = m = 25,$ the power is slightly above 95%.
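The simulated power can also be checked against the closed-form expression (see the linked Q & A): under the alternative ratio $\rho,$ the statistic $F/\rho$ again has an F-distribution, so the power is one F-tail probability. A sketch in Python, with SciPy assumed available and f_test_power a made-up name:

```python
# Closed-form power of the one-sided F-test (scipy assumed available).
# If the true ratio is rho = var_x/var_y, then F/rho ~ F(n-1, m-1), so
# power = P(F > crit) = 1 - CDF_{F(n-1, m-1)}(crit / rho).
from scipy import stats

def f_test_power(rho, n, m, alpha=0.05):
    """Power against Ha: var_x/var_y = rho (> 1) at level alpha."""
    crit = stats.f.ppf(1 - alpha, n - 1, m - 1)   # critical value at level alpha
    return stats.f.sf(crit / rho, n - 1, m - 1)   # P(reject H0 | true ratio = rho)

print(round(f_test_power(4, 10, 10), 4))   # about 0.6311, matching the simulation
print(round(f_test_power(4, 25, 25), 4))   # about 0.9538
```

This is much quicker than simulation and answers question 2) directly: increase $n = m$ until the returned power reaches the level you require.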
Note: If you can determine a base-level variance $\sigma_0^2$ for the process in its current state, then it will be easier to detect whether a single sample (after 'tinkering' and perhaps improvement) has a smaller variance. Details of that would be for another discussion.
edited Jul 28 at 8:50
answered Jul 28 at 2:11
BruceET
Oh, this looks significantly more complex than I was hoping! I don't have R, but I can write my own software if I know the formula/algorithm. But this seems to give me a solid starting point, so I'll see if I can make sense of it over the next few weeks. Meanwhile, it looks convincing, so I'll mark it as answered. Appreciate the time you spent on this!
– Toby Eggitt
Jul 28 at 11:42
You can have R for free in a flash by going to https://www.r-project.org. Just learn the bits you need. When you need more, there are lots of user help pages online. // You can look at the link for the formula for power. // There are some (Java?) on-line calculators for power, which you should trust only after checking a few computations against ones known to be reliable.
– BruceET
Jul 28 at 16:22
Interesting possibility; I really want to integrate this algorithm with a piece of Scala code I've written for measuring the points that result from this system. But thanks, I'll give it a go and see if I can maybe make sense of, and/or verify, some of the Java options, since they're compatible with Scala too.
– Toby Eggitt
Jul 28 at 18:32
Not trying to force you to read the link, but it explains how you can get the power value for a test at the 5% level against the alternative $\sigma_x^2/\sigma_y^2 = 4$ with $m = n = 10$ using the code 1 - pf(qf(.95,9,9)/4,9,9), which returns pwr 0.6311355. Also, for $m = n = 25,$ 1 - pf(qf(.95,24,24)/4,24,24) returns pwr 0.9538375. Both agree with the simulations at the end of my Answer. // This method can be implemented in any software into which the CDF and inverse CDF (quantile fcn) of the F-dist'n (pf and qf, respectively, in the code) have been programmed. // Quicker, probably more readily portable, than simulations.
– BruceET
Jul 28 at 19:57
Great, thank you! I would like to find the time to read and learn properly, so I will try to read the whole thing, but the quick "do this" will be great initially. Again, much appreciated!
– Toby Eggitt
Jul 29 at 4:41
There is an F-test for equality of variances between two samples, but it is said to be extremely sensitive to non-normality, so it may not be a good choice. The Wikipedia article on the F-test lists some other tests that are not so sensitive to non-normality (see the "Properties" section), so you might check those out. See en.wikipedia.org/wiki/F-test_of_equality_of_variances. I don't have any personal experience with these tests.
– awkward
Jul 27 at 13:26