ANOVA analysis to compare mean values
According to my findings, we can use ANOVA to compare a set of mean values. ANOVA rests on three main assumptions: normality, homogeneity of variance, and independent observations.
According to the central limit theorem, when the sample size is large, the sample mean $\bar{x}$ has an approximately normal distribution, even if the distribution of $x$ itself is not normal.
My question is: can we use ANOVA to compare means even if the original distributions of the data sets are not normal, given that each data set has more than 1000 observations?
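To see the central-limit-theorem claim at work, here is a minimal R sketch that simulates many samples of size 1000 from an Exp(1) distribution (an illustrative choice of non-normal data, with mean 1 and SD 1) and compares the skewness of the raw data with the skewness of the sample means. The seed and the number of replications are arbitrary assumptions, and `skew` is a hypothetical helper defined here, not a base R function.

```r
# Sketch (illustrative assumptions): Exp(1) data, samples of size 1000,
# 5000 simulated samples, arbitrary seed.
set.seed(1)
n    <- 1000                      # observations per sample
reps <- 5000                      # number of simulated samples

# Each entry is the mean of one Exp(1) sample of size n
xbar <- replicate(reps, mean(rexp(n)))

# Hypothetical helper: simple moment estimator of skewness
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew(rexp(reps))   # raw Exp(1) data: theoretical skewness is 2
skew(xbar)         # sample means: theoretical skewness is 2 / sqrt(n), about 0.063

# CLT prediction: xbar is approximately Normal(mean 1, SD 1 / sqrt(n))
c(mean = mean(xbar), sd = sd(xbar), clt_sd = 1 / sqrt(n))
```

The means are far less skewed than the raw data, as the CLT predicts; note that this concerns the distribution of the sample means, not of the individual observations.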
statistics normal-distribution
asked Aug 6 at 7:33 by Pasindu (edited Aug 6 at 7:41)
I think this depends on the size of your dataset.
– pointguard0
Aug 6 at 7:39
@pointguard0 Size is greater than 1000.
– Pasindu
Aug 6 at 7:42
Are the conditions for a Kruskal-Wallis test met? K-W is for differences in population medians, but if your distributions are symmetrical that wouldn't matter.
– BruceET
Aug 8 at 15:31
1 Answer
Shifted-exponential data: Here is a demonstration, for particular data sets, that ANOVA can have relatively poor power to distinguish among samples of size 1000 from slightly shifted exponential distributions, all with population SD $\sigma = 1$.
I'm not saying that ANOVA never works on exponential data; I am saying that there is good reason for the normality assumption. (Under $H_0$, the F-statistic has exactly the expected $F$ distribution only when the data are normal.)
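The parenthetical remark about the null distribution of $F$ can be checked empirically. Below is a minimal R sketch, assuming three groups of Exp(1) data with $H_0$ true (all groups share one distribution); the group size `n`, the replication count, and the seed are arbitrary illustrative choices. It uses `oneway.test(..., var.equal = TRUE)`, which performs the same classical one-way ANOVA F test as `anova(lm(x ~ g))`, just more quickly inside a loop.

```r
# Sketch (illustrative assumptions): 3 groups of Exp(1) data with H0 TRUE,
# n per group chosen by the user, 2000 replications, nominal level 5%.
set.seed(2)                          # arbitrary seed
k     <- 3                           # number of groups
n     <- 10                          # observations per group; try 1000 as well
reps  <- 2000                        # number of simulated data sets
alpha <- 0.05                        # nominal significance level

g <- factor(rep(1:k, each = n))      # group labels

# p-value of the classical F test for each simulated data set
pvals <- replicate(reps, {
  x <- rexp(k * n)                   # every group has the same Exp(1) law
  oneway.test(x ~ g, var.equal = TRUE)$p.value
})

# Empirical type I error rate; for normal data it would equal alpha
# up to Monte Carlo error
type1 <- mean(pvals < alpha)
type1
```

The rejection rate is guaranteed to be `alpha` only for normal data; how far it drifts for other distributions depends on the group sizes and the shape of the data.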
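```r
set.seed(1888)
x = rexp(3000)                   # Exp(rate = 1): mean 1, SD 1
d = rep((1:3)/20, each=1000)     # shifts 1/20, 2/20, 3/20 for the three groups
x = x + d                        # shifted-exponential samples
g = as.factor(d*20)              # group labels 1, 2, 3
```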
Sample means are slightly different:
```r
mean(x[1:1000]); mean(x[1001:2000]); mean(x[2001:3000])
[1] 1.088035
[1] 1.089778
[1] 1.166204
```
The ANOVA is not significant:
```r
anova(lm(x~g))
Analysis of Variance Table

Response: x
            Df Sum Sq Mean Sq F value Pr(>F)
g            2    4.0  1.9924  1.8201 0.1622
Residuals 2997 3280.6  1.0946
```
The Kruskal-Wallis test does detect the different shifts:
```r
kruskal.test(x~g)

    Kruskal-Wallis rank sum test

data:  x by g
Kruskal-Wallis chi-squared = 7.4152, df = 2, p-value = 0.02454
```
The boxplots at the left below show the three shifted-exponential samples, each of size 1000.
```r
par(mfrow=c(1,2))                # two panels side by side
boxplot(x~g, col="skyblue2")     # left panel: shifted-exponential samples
```
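A single seed shows only one outcome. A minimal R sketch that repeats the shifted-exponential experiment many times and records how often each test rejects at the 5% level gives a Monte Carlo estimate of power. The seed and the 1000 replications are illustrative assumptions; `oneway.test(..., var.equal = TRUE)` is used as a faster equivalent of the classical F test from `anova(lm(x ~ g))`.

```r
# Sketch (illustrative assumptions): the shifted-exponential design above,
# repeated 1000 times, both tests at the 5% level.
set.seed(3)                                # arbitrary seed
n     <- 1000                              # observations per group
reps  <- 1000                              # number of simulated data sets
alpha <- 0.05
d <- rep((1:3) / 20, each = n)             # shifts 1/20, 2/20, 3/20
g <- as.factor(d * 20)                     # group labels 1, 2, 3

# Each column records whether each test rejected H0 for one data set
rejections <- replicate(reps, {
  x <- rexp(3 * n) + d                     # shifted-exponential samples
  c(anova   = oneway.test(x ~ g, var.equal = TRUE)$p.value < alpha,
    kruskal = kruskal.test(x ~ g)$p.value < alpha)
})

est_power <- rowMeans(rejections)          # estimated power of each test
est_power
```

Replacing `rexp(3 * n)` with `rnorm(3 * n)` gives the corresponding comparison for normal data, where the two tests would be expected to perform much more alike.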
Shifted-normal data: By contrast, both tests detect the same shifts when the populations are normal (again with $\sigma = 1$).
```r
set.seed(1888)
x = rnorm(3000)                  # standard normal: SD 1
d = rep((1:3)/20, each=1000)     # same shifts as before
g = as.factor(d*20)
x = x + d

anova(lm(x~g))
Analysis of Variance Table

Response: x
            Df  Sum Sq Mean Sq F value  Pr(>F)  
g            2    8.27  4.1346  4.1808 0.01538 *
Residuals 2997 2963.87  0.9889                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

kruskal.test(x~g)

    Kruskal-Wallis rank sum test

data:  x by g
Kruskal-Wallis chi-squared = 8.7825, df = 2, p-value = 0.01239

boxplot(x~g, col="skyblue2")     # right panel: shifted-normal samples
par(mfrow=c(1,1))                # restore the single-panel layout
```
The boxplots at the right below show the three normal samples.
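For normal data the power of the F test has a closed form, so the single significant normal example can be put in context. In a balanced one-way design with $k$ groups of size $n$, the F-statistic follows a noncentral $F(k-1,\,k(n-1))$ distribution with noncentrality $\lambda = n\sum_{i=1}^{k}(\mu_i-\bar\mu)^2/\sigma^2$. A minimal R sketch using base R's `qf` and `pf`, assuming the design used above (three groups of 1000, shifts $1/20, 2/20, 3/20$, $\sigma = 1$); the variable names are illustrative.

```r
# Sketch: exact power of the classical one-way ANOVA F test for normal data,
# assuming 3 groups of 1000, shifts 1/20, 2/20, 3/20, common sigma = 1.
n     <- 1000                              # observations per group
mu    <- (1:3) / 20                        # group means
sigma <- 1                                 # common population SD
alpha <- 0.05                              # significance level

df1    <- length(mu) - 1                   # numerator df: 2
df2    <- length(mu) * (n - 1)             # denominator df: 2997, as in the tables
lambda <- n * sum((mu - mean(mu))^2) / sigma^2   # noncentrality parameter: 5
crit   <- qf(1 - alpha, df1, df2)          # 5% critical value under H0

# P(F > crit) when F has the noncentral F(df1, df2, lambda) distribution
f_power <- pf(crit, df1, df2, ncp = lambda, lower.tail = FALSE)
f_power
```

This calculation is exact only under normality; for the exponential samples the simulation approach is the safer way to estimate power.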
answered Aug 8 at 17:26 by BruceET