ANOVA analysis to compare mean values

From what I have read, ANOVA can be used to compare a set of mean values. ANOVA depends on three main assumptions: normality, homogeneity of variance, and independent observations.



According to the central limit theorem, when the sample size is large, the sample mean of x is approximately normally distributed, even if the distribution of x itself is not normal.
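A quick simulation can illustrate this claim. The sketch below is only an illustration under assumptions not in the original post: the data are taken to be Exp(1) (a strongly skewed distribution), and the seed, the number of replications `reps`, and the helper `skew` are arbitrary choices.

```r
# Sketch: sampling distribution of the mean for skewed (exponential) data.
set.seed(2024)                          # arbitrary seed, for reproducibility only
n    <- 1000                            # sample size, as in the question
reps <- 5000                            # number of simulated samples (assumption)
xbar <- replicate(reps, mean(rexp(n)))  # Exp(1) has mean 1 and SD 1

# Theory: xbar is approximately Normal(1, 1/sqrt(n)).
c(mean = mean(xbar), sd = sd(xbar), theory_sd = 1 / sqrt(n))

# Standardized third moment; a normal distribution has skewness 0.
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew(xbar)

qqnorm(xbar); qqline(xbar)              # points near the line suggest normality
```

Note that this only concerns the distribution of the sample means, not the distribution of the individual observations.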



My question is: can we use ANOVA to compare means even if the original distribution of each data set is not normal, given that the size of each data set is greater than 1000?







  • I think this depends on the size of your dataset.
    – pointguard0
    Aug 6 at 7:39










  • @pointguard0 Size is greater than 1000.
    – Pasindu
    Aug 6 at 7:42










  • Are the conditions for a Kruskal-Wallis test met? K-W is for differences in population medians, but if your distributions are symmetrical that wouldn't matter.
    – BruceET
    Aug 8 at 15:31























edited Aug 6 at 7:41
























asked Aug 6 at 7:33









Pasindu












1 Answer
Shifted-exponential data: Here is a demonstration for particular datasets, showing that ANOVA
can have relatively poor power distinguishing among samples of size 1000
from slightly-shifted exponential distributions, all with population SD
$\sigma = 1.$



I'm not saying an ANOVA never works on exponential data;
I am saying that there is good reason for the normality assumption.
(The distribution of the F-statistic under $H_0$ is not as expected
unless data are normal.)
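One way to examine the remark about the null distribution is to simulate many datasets for which $H_0$ is true and record how often the F-test rejects at the 5% level. This is a minimal sketch, not part of the original demonstration: the seed and the number of replications `reps` are arbitrary assumptions, and all three groups are drawn from the same Exp(1) distribution.

```r
# Sketch: empirical rejection rate of the ANOVA F-test when H0 is true
# and the data are exponential (no shift between groups).
set.seed(101)                       # arbitrary seed (assumption)
reps <- 1000                        # number of simulated datasets (assumption)
g <- factor(rep(1:3, each = 1000))  # three groups of size 1000

p_anova <- replicate(reps, {
  x <- rexp(3000)                   # same Exp(1) distribution in every group
  anova(lm(x ~ g))[["Pr(>F)"]][1]   # p-value for the group effect
})

mean(p_anova < 0.05)                # compare with the nominal level 0.05
hist(p_anova)                       # roughly flat if the F reference fits well
```

A rejection rate far from 0.05, or a clearly non-uniform histogram of p-values, would indicate that the F reference distribution is a poor fit for data of this kind.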



set.seed(1888); x = rexp(3000)
d = rep((1:3)/20, each=1000); x = x + d  # shift by 1/20, 2/20, 3/20
g = as.factor(d*20)                      # group labels 1, 2, 3


Sample means are slightly different:



mean(x[1:1000]); mean(x[1001:2000]); mean(x[2001:3000])
[1] 1.088035
[1] 1.089778
[1] 1.166204


ANOVA not significant:



anova(lm(x~g))
Analysis of Variance Table

Response: x
            Df Sum Sq Mean Sq F value Pr(>F)
g            2    4.0  1.9924  1.8201 0.1622
Residuals 2997 3280.6  1.0946


Kruskal-Wallis detects different shifts:



kruskal.test(x~g)

Kruskal-Wallis rank sum test

data: x by g
Kruskal-Wallis chi-squared = 7.4152, df = 2, p-value = 0.02454
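The results above come from a single simulated dataset. To compare the two tests over many datasets, one could estimate their rejection rates by repeated simulation. The following is a hedged sketch using the same shifts and sample sizes as above; the seed and the number of replications `reps` are arbitrary assumptions.

```r
# Sketch: estimated power of ANOVA and Kruskal-Wallis for shifted exponentials.
set.seed(202)                       # arbitrary seed (assumption)
reps <- 500                         # number of simulated datasets (assumption)
d <- rep((1:3) / 20, each = 1000)   # shifts 1/20, 2/20, 3/20
g <- factor(d * 20)                 # group labels 1, 2, 3

pvals <- replicate(reps, {
  x <- rexp(3000) + d               # shifted-exponential data
  c(anova   = anova(lm(x ~ g))[["Pr(>F)"]][1],
    kruskal = kruskal.test(x ~ g)$p.value)
})

rowMeans(pvals < 0.05)              # proportion of rejections at the 5% level
```

Each entry of the result estimates how often the corresponding test detects the shifts at the 5% level, which is a more stable basis for comparison than one pair of p-values.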


The boxplots at the left below show the three shifted-exponential samples, each of size 1000.



par(mfrow=c(1,2))
boxplot(x~g, col="skyblue2")


Shifted-normal data: By contrast, both tests detect similar shifts in normal populations.



set.seed(1888); x = rnorm(3000); d = rep((1:3)/20, each=1000); 
g=as.factor(d*20); x =x+d
anova(lm(x~g))
Analysis of Variance Table

Response: x
            Df  Sum Sq Mean Sq F value  Pr(>F)
g            2    8.27  4.1346  4.1808 0.01538 *
Residuals 2997 2963.87  0.9889
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

kruskal.test(x~g)

Kruskal-Wallis rank sum test

data: x by g
Kruskal-Wallis chi-squared = 8.7825, df = 2, p-value = 0.01239

boxplot(x~g, col="skyblue2")
par(mfrow=c(1,1))


Boxplots at right below show the three normal samples.



[Figure: two panels of boxplots. Left: the three shifted-exponential samples. Right: the three shifted-normal samples.]

















        answered Aug 8 at 17:26









        BruceET
