Does comparing two p-values make sense?


Does comparing two p-values make sense?



For example, the p-value for the association between willingness to pay and the number of owned cars is 0.3.



The p-value for willingness to pay and the number of owned pets is 0.6.



Can I claim that the number of owned cars has a stronger relationship with willingness to pay, and that the number of owned cars explains willingness to pay more than the number of owned pets does?



I know that a p-value less than 0.05 is considered significant, but I am not sure whether two p-values can be compared when both are larger than 0.05.






















  • When you write "p-value" between two factors, do you mean the correlation $\rho$? If so, your final comment about $0.05$ is something else.
    – Henry
    Jul 28 at 23:27











  • I’m talking something like this: medcalc.org/manual/chi-square-table.php
    – Marcus Thornton
    Jul 29 at 1:11










  • Pondering your question. I guess I know what 'number of cars owned' looks like: (0, 1, 3, 1, 2, 1, 0, 0, 1, ...). And similarly for pets. But what do data for 'willingness to pay' look like? Likert scale (ordinal) or some sort of numeric scale? // And how are the chi-squared statistics computed? // I don't think P-values should be used in the way you propose, but I'd like to give meaningful examples why not. And maybe suggest an alternative that would work.
    – BruceET
    Jul 31 at 6:42










  • Willingness to pay is a class having high, medium, and low.
    – Marcus Thornton
    Jul 31 at 23:32










  • Are numbers of cars and pets also expressed as high, medium, and low? If so, you can do chi-sq tests of independence to compare 'Nr. Cars' and 'Willingness', etc. // You can't "prove" that one connection "explains" another, but you might collect evidence to make speculation worthwhile. // As an alternative to looking at P-values (not a good idea, as I hope I've explained in my Answer), you may want to look at correlations as measured by Kendall's $\tau$ or Spearman's $\rho$.
    – BruceET
    Aug 1 at 1:16














asked Jul 28 at 22:53
Marcus Thornton







1 Answer
Absent requested clarifications, I can only make generic comments on
the proper uses of P-values.



If a chi-squared goodness-of-fit test or test for independence has a
statistic $Q$ that is approximately distributed as $\mathsf{Chisq}(\text{df} = 5),$
then the critical values for tests at the 5% and 1% levels, respectively, are $c = 11.07$ and $c = 15.09.$ You can find these values
in row 5 of the table to which you linked; below I have computed them using R:



qchisq(c(.95, .99), 5)
[1] 11.07050 15.08627


So if your computed value of the test statistic is $Q = 12.33,$ you can
reject the null hypothesis at the 5% level, but not at the 1% level.



Nowadays, most statistical software gives P-values instead of dealing
with specific fixed levels of significance. Software can do that because it
can find more detailed information about a particular distribution
(for example, $\mathsf{Chisq}(\text{df} = 5)$) than is convenient to print
in a published table.



Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under
the density function of $\mathsf{Chisq}(\text{df} = 5)$ to the right
of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not
at the 1% level because $0.0305 > 0.01.$



1 - pchisq(12.33, 5)
[1] 0.03053538


Thus given the P-value, a person can choose their own significance level, and
make a determination whether the test shows a significant result at that level.
So it is fair to say that small P-values are useful to determine the result
of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence
against $H_0$ than does a larger one such as 0.045--even though both P-values lead
to rejection at the 5% level.
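As a small supplementary sketch of this point (in Python for illustration; the P-value 0.0305 is taken from the chi-squared example above), the same P-value leads to different decisions at different significance levels:

```python
# Given a fixed P-value, each reader can apply his or her own significance level.
p = 0.0305  # P-value for Q = 12.33 from the chi-squared example above

for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
# rejects H0 at the 0.10 and 0.05 levels; fails to reject at the 0.01 level
```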



However, it is not generally useful to make distinctions between the
'information contained' in larger P-values such as 0.3 and 0.6. That is
because, assuming $H_0$ to be true, the P-value is a random variable
that is approximately uniform on the interval $(0,1).$ For a continuous
test statistic, such as $Z$ in a normal test or $T$ in a t test, one can
prove that P-values are precisely $\mathsf{Unif}(0,1).$ For most discrete
test statistics P-values are roughly, but not exactly uniform. (One
usually explores the distributions of such P-values through simulation.)
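As a supplement to the answer's R simulation below (this is a sketch in Python, not part of the original answer), one can check the continuous case directly: simulate many Z-tests with the null hypothesis true and verify that the two-sided P-values behave like draws from $\mathsf{Unif}(0,1)$, so that about 5% of them fall below 0.05:

```python
import math
import random

random.seed(1)
n, m = 30, 20_000          # sample size per test, number of simulated tests
pvals = []
for _ in range(m):
    xs = [random.gauss(0, 1) for _ in range(n)]  # H0 is true: mean 0, sd 1
    z = sum(xs) / math.sqrt(n)                   # Z statistic for H0: mu = 0
    p = 1 - math.erf(abs(z) / math.sqrt(2))      # two-sided P-value = 2*(1 - Phi(|z|))
    pvals.append(p)

# Under H0 the P-values are Unif(0,1): roughly 5% fall below 0.05
print(sum(p <= 0.05 for p in pvals) / m)
```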



The test statistic $Q$ for a chi-squared goodness-of-fit statistic is discrete,
because its values are based on integer counts. A simple example is to
see what happens in repeated tests of whether a die is fair. If a die is rolled
$n = 600$ times, then we ought to see each of the six faces "about 100" times.
The purpose of the chi-squared statistic is to assess whether the actual
face counts are sufficiently close to the expected 100 to say results are
consistent with a fair die.



The R code below simulates 100,000 such 600-roll experiments and finds the test
statistic
$Q = \sum_{i=1}^{6} \frac{(X_i - 100)^2}{100}$ for each experiment. Then we can
make a histogram of the 100,000 values of $Q$ and also a histogram of the
corresponding 100,000 P-values.



set.seed(1234)
m = 10^5; n = 600; E = n/6; die = 1:6; q = numeric(m)
for (i in 1:m) {
  faces = sample(die, n, rep=T)    # simulate n rolls of a fair die
  X = rle(sort(faces))$lengths     # count of each face
  q[i] = sum((X-E)^2/E)            # chi-squared statistic
}

mean(q >= 11.07)
[1] 0.04864

pv = 1 - pchisq(q, 5)
mean(pv <= .05)
[1] 0.04864


Because rolls of fair dice are simulated, it is not surprising to see that
$Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.



From the histogram we can see that $Q$ has approximately the target chi-squared
distribution, rejecting for values to the right of the vertical broken line.
Also, the P-values are approximately uniformly distributed, rejecting for
values to the left of the vertical line.



[Figure: histograms of the 100,000 simulated values of $Q$ and of the corresponding P-values]



The point of this demonstration is that the uniform distribution of P-values
makes it difficult to say that particular P-values such as .3 and .6 are
more remarkable or meaningful than others. Ordinarily, we only care about whether P-values
are small enough to lead to rejection at our chosen significance level.
































    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    1
    down vote













    Absent requested clarifications, I can only make generic comments on
    the proper uses of P-values.



    If a chi-squared goodness-of-fit test or test for independence has a
    statistic $Q$ that is approximately distributed as $mathsfChisq(textdf = 5),$
    then the critical critical values for tests at the 5% and 1% levels, respectively, are $c = 11.07$ and $c = 15.07.$ You can find these values
    on row 5 of the table to which you linked; I have found them using R statistical
    software below:



    qchisq(c(.95, .99), 5)
    [1] 11.07050 15.08627


    So if your computed value of the test statistic is $Q = 12.33,$ you can
    reject the null hypothesis at the 5% level, but not at the 1% level.



    Nowadays, most statistical software gives P-values instead of dealing
    with specific fixed levels of significance. Software can do that because it
    can find more detailed information about a particular distribution
    (for example, $mathsfChisq(textdf = 5)$) than is convenient to print
    in a published table.



    Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under
    the density function for $mathsfChisq(textdf = 5)$ to the right of
    of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not
    at the 1% level because $0.0305 > 0.01.$



    1 - pchisq(12.33, 5)
    [1] 0.03053538


    Thus given the P-value, a person can choose their own significance level, and
    make a determination whether the test shows a significant result at that level.
    So it is fair to say that small P-values are useful to determine the result
    of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence
    against $H_0$ than does a larger one such as 0.045--even though both P-values lead
    to rejection at the 5% level.



    However, it is not generally useful to make distinctions between the
    'information contained' in larger P-values such as 0.3 and 0.6. That is
    because, assuming $H_0$ to be true, the P-value is a random variable
    that is approximately uniform on the interval $(0,1).$ For a continuous
    test statistic, such as $Z$ in a normal test or $T$ in a t test, one can
    prove that P-values are precisely $mathsfUnif(0,1).$ For most discrete
    test statistics P-values are roughly, but not exactly uniform. (One
    usually explores the distributions of such P-values through simulation.)



    The test statistic $Q$ for a chi-squared goodness-of-fit statistic is discrete,
    because its values are based on integer counts. A simple example is to
    see what happens in repeated tests whether a die is fair. If a die is rolled
    $n = 600$ times, then we ought to see each of the six faces "about 100" times.
    The purpose of the chi-squared statistic is to assess whether the actual
    face counts are sufficiently close to the expected 100 to say results are
    consistent with a fair die.



    The R code below simulates 100,000 such 600-roll experiments and finds the test
    statistic
    $Q = sum_i=1^6 frac(X_i-100)^2100$ for each experiment. Then we can
    make a histogram of the 100,000 values of $Q$ and also a histogram of the
    corresponding 100,000 P-values.



    set.seed(1234)
    m = 10^5; n = 600; E = n/6; die = 1:6; q = numeric(m)
    for (i in 1:m)
    faces = sample(die, 600, rep=T)
    X = rle(sort(faces))$lengths
    q[i] = sum((X-E)^2/E)

    mean(q >= 11.07)
    [1] 0.04864

    pv = 1 - pchisq(q, 5)
    mean(pv <= .05)
    [1] 0.04864


    Because rolls of fair dice are simulated, it is not surprising to see that
    $Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.



    From the histogram we can see that $Q$ has approximately the target chi-squared
    distribution, rejecting for values to the right of the vertical broken line.
    Also, the P-values are approximately normally distributed, rejecting for
    values to the left of the vertical line.



    enter image description here



    The point of this demonstration is that the uniform distribution of P-values
    makes it difficult to say that particular P-values such as .3 and .6 are
    more remarkable or meaningful than others. Ordinarily, we only care about whether P-values
    are small enough to lead to rejection at our chosen significance level.






    share|cite|improve this answer



























      up vote
      1
      down vote













      Absent requested clarifications, I can only make generic comments on
      the proper uses of P-values.



      If a chi-squared goodness-of-fit test or test for independence has a
      statistic $Q$ that is approximately distributed as $mathsfChisq(textdf = 5),$
      then the critical critical values for tests at the 5% and 1% levels, respectively, are $c = 11.07$ and $c = 15.07.$ You can find these values
      on row 5 of the table to which you linked; I have found them using R statistical
      software below:



      qchisq(c(.95, .99), 5)
      [1] 11.07050 15.08627


      So if your computed value of the test statistic is $Q = 12.33,$ you can
      reject the null hypothesis at the 5% level, but not at the 1% level.



      Nowadays, most statistical software gives P-values instead of dealing
      with specific fixed levels of significance. Software can do that because it
      can find more detailed information about a particular distribution
      (for example, $mathsfChisq(textdf = 5)$) than is convenient to print
      in a published table.



      Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under
      the density function for $mathsfChisq(textdf = 5)$ to the right of
      of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not
      at the 1% level because $0.0305 > 0.01.$



      1 - pchisq(12.33, 5)
      [1] 0.03053538


      Thus given the P-value, a person can choose their own significance level, and
      make a determination whether the test shows a significant result at that level.
      So it is fair to say that small P-values are useful to determine the result
      of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence
      against $H_0$ than does a larger one such as 0.045--even though both P-values lead
      to rejection at the 5% level.



      However, it is not generally useful to make distinctions between the
      'information contained' in larger P-values such as 0.3 and 0.6. That is
      because, assuming $H_0$ to be true, the P-value is a random variable
      that is approximately uniform on the interval $(0,1).$ For a continuous
      test statistic, such as $Z$ in a normal test or $T$ in a t test, one can
      prove that P-values are precisely $mathsfUnif(0,1).$ For most discrete
      test statistics P-values are roughly, but not exactly uniform. (One
      usually explores the distributions of such P-values through simulation.)



      The test statistic $Q$ for a chi-squared goodness-of-fit statistic is discrete,
      because its values are based on integer counts. A simple example is to
      see what happens in repeated tests whether a die is fair. If a die is rolled
      $n = 600$ times, then we ought to see each of the six faces "about 100" times.
      The purpose of the chi-squared statistic is to assess whether the actual
      face counts are sufficiently close to the expected 100 to say results are
      consistent with a fair die.



      The R code below simulates 100,000 such 600-roll experiments and finds the test
      statistic
      $Q = sum_i=1^6 frac(X_i-100)^2100$ for each experiment. Then we can
      make a histogram of the 100,000 values of $Q$ and also a histogram of the
      corresponding 100,000 P-values.



      set.seed(1234)
      m = 10^5; n = 600; E = n/6; die = 1:6; q = numeric(m)
      for (i in 1:m)
      faces = sample(die, 600, rep=T)
      X = rle(sort(faces))$lengths
      q[i] = sum((X-E)^2/E)

      mean(q >= 11.07)
      [1] 0.04864

      pv = 1 - pchisq(q, 5)
      mean(pv <= .05)
      [1] 0.04864


      Because rolls of fair dice are simulated, it is not surprising to see that
      $Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.



      From the histogram we can see that $Q$ has approximately the target chi-squared
      distribution, rejecting for values to the right of the vertical broken line.
      Also, the P-values are approximately normally distributed, rejecting for
      values to the left of the vertical line.



      enter image description here



      The point of this demonstration is that the uniform distribution of P-values
      makes it difficult to say that particular P-values such as .3 and .6 are
      more remarkable or meaningful than others. Ordinarily, we only care about whether P-values
      are small enough to lead to rejection at our chosen significance level.






      share|cite|improve this answer

























        up vote
        1
        down vote










        up vote
        1
        down vote









        Absent requested clarifications, I can only make generic comments on
        the proper uses of P-values.



        If a chi-squared goodness-of-fit test or test for independence has a
        statistic $Q$ that is approximately distributed as $mathsfChisq(textdf = 5),$
        then the critical critical values for tests at the 5% and 1% levels, respectively, are $c = 11.07$ and $c = 15.07.$ You can find these values
        on row 5 of the table to which you linked; I have found them using R statistical
        software below:



        qchisq(c(.95, .99), 5)
        [1] 11.07050 15.08627


        So if your computed value of the test statistic is $Q = 12.33,$ you can
        reject the null hypothesis at the 5% level, but not at the 1% level.



        Nowadays, most statistical software gives P-values instead of dealing
        with specific fixed levels of significance. Software can do that because it
        can find more detailed information about a particular distribution
        (for example, $mathsfChisq(textdf = 5)$) than is convenient to print
        in a published table.



        Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under
        the density function for $mathsfChisq(textdf = 5)$ to the right of
        of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not
        at the 1% level because $0.0305 > 0.01.$



        1 - pchisq(12.33, 5)
        [1] 0.03053538


        Thus given the P-value, a person can choose their own significance level, and
        make a determination whether the test shows a significant result at that level.
        So it is fair to say that small P-values are useful to determine the result
        of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence
        against $H_0$ than does a larger one such as 0.045--even though both P-values lead
        to rejection at the 5% level.



        However, it is not generally useful to make distinctions between the
        'information contained' in larger P-values such as 0.3 and 0.6. That is
        because, assuming $H_0$ to be true, the P-value is a random variable
        that is approximately uniform on the interval $(0,1).$ For a continuous
        test statistic, such as $Z$ in a normal test or $T$ in a t test, one can
        prove that P-values are precisely $mathsfUnif(0,1).$ For most discrete
        test statistics P-values are roughly, but not exactly uniform. (One
        usually explores the distributions of such P-values through simulation.)



        The test statistic $Q$ for a chi-squared goodness-of-fit statistic is discrete,
        because its values are based on integer counts. A simple example is to
        see what happens in repeated tests whether a die is fair. If a die is rolled
        $n = 600$ times, then we ought to see each of the six faces "about 100" times.
        The purpose of the chi-squared statistic is to assess whether the actual
        face counts are sufficiently close to the expected 100 to say results are
        consistent with a fair die.



        The R code below simulates 100,000 such 600-roll experiments and finds the test
        statistic
        $Q = sum_i=1^6 frac(X_i-100)^2100$ for each experiment. Then we can
        make a histogram of the 100,000 values of $Q$ and also a histogram of the
        corresponding 100,000 P-values.



        set.seed(1234)
        m = 10^5; n = 600; E = n/6; die = 1:6; q = numeric(m)
        for (i in 1:m)
        faces = sample(die, 600, rep=T)
        X = rle(sort(faces))$lengths
        q[i] = sum((X-E)^2/E)

        mean(q >= 11.07)
        [1] 0.04864

        pv = 1 - pchisq(q, 5)
        mean(pv <= .05)
        [1] 0.04864


        Because rolls of fair dice are simulated, it is not surprising to see that
        $Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.



        From the histogram we can see that $Q$ has approximately the target chi-squared
        distribution, rejecting for values to the right of the vertical broken line.
        Also, the P-values are approximately normally distributed, rejecting for
        values to the left of the vertical line.



        enter image description here



        The point of this demonstration is that the uniform distribution of P-values
        makes it difficult to say that particular P-values such as .3 and .6 are
        more remarkable or meaningful than others. Ordinarily, we only care about whether P-values
        are small enough to lead to rejection at our chosen significance level.






        share|cite|improve this answer















        Absent requested clarifications, I can only make generic comments on
        the proper uses of P-values.



        If a chi-squared goodness-of-fit test or test for independence has a
        statistic $Q$ that is approximately distributed as $\mathsf{Chisq}(\text{df} = 5),$
        then the critical values for tests at the 5% and 1% levels, respectively, are $c = 11.07$ and $c = 15.09.$ You can find these values
        on row 5 of the table to which you linked; I have found them using R statistical
        software below:



        qchisq(c(.95, .99), 5)
        [1] 11.07050 15.08627


        So if your computed value of the test statistic is $Q = 12.33,$ you can
        reject the null hypothesis at the 5% level, but not at the 1% level.
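
        As a sketch (the value $Q = 12.33$ is the hypothetical one above), the
        decision rule at each level can be checked directly in R:

```r
# Compare the assumed test statistic against the chi-squared critical values
Q = 12.33
Q > qchisq(.95, 5)   # TRUE: Q exceeds 11.07, so reject at the 5% level
Q > qchisq(.99, 5)   # FALSE: Q is below 15.09, so do not reject at the 1% level
```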



        Nowadays, most statistical software gives P-values instead of dealing
        with specific fixed levels of significance. Software can do that because it
        can find more detailed information about a particular distribution
        (for example, $\mathsf{Chisq}(\text{df} = 5)$) than is convenient to print
        in a published table.



        Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under
        the density function of $\mathsf{Chisq}(\text{df} = 5)$ to the right
        of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not
        at the 1% level because $0.0305 > 0.01.$



        1 - pchisq(12.33, 5)
        [1] 0.03053538


        Thus, given the P-value, a person can choose their own significance level and
        determine whether the test shows a significant result at that level.
        So it is fair to say that small P-values are useful for determining the result
        of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence
        against $H_0$ than does a larger one such as 0.045, even though both P-values lead
        to rejection at the 5% level.



        However, it is not generally useful to make distinctions between the
        'information contained' in larger P-values such as 0.3 and 0.6. That is
        because, assuming $H_0$ to be true, the P-value is a random variable
        that is approximately uniform on the interval $(0,1).$ For a continuous
        test statistic, such as $Z$ in a normal test or $T$ in a t test, one can
        prove that P-values are precisely $\mathsf{Unif}(0,1).$ For most discrete
        test statistics, P-values are roughly, but not exactly, uniform. (One
        usually explores the distributions of such P-values through simulation.)
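
        This uniformity is easy to see by simulation. The sketch below (not part of
        the original answer) simulates many two-sided normal tests with $H_0$ true and
        checks that the resulting P-values are spread evenly over $(0,1)$:

```r
# Simulate m two-sided z-tests under H0 and inspect the P-value distribution
set.seed(2021)
m = 10^5
z = rnorm(m)                 # m standard normal test statistics under H0
pv = 2 * pnorm(-abs(z))      # two-sided P-values
mean(pv <= 0.05)             # about 0.05: 5% of P-values fall below 0.05
hist(pv, prob = TRUE)        # flat histogram: P-values are Unif(0,1) under H0
```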



        The test statistic $Q$ for a chi-squared goodness-of-fit test is discrete,
        because its values are based on integer counts. A simple example is to
        see what happens in repeated tests of whether a die is fair. If a die is rolled
        $n = 600$ times, then we ought to see each of the six faces "about 100" times.
        The purpose of the chi-squared statistic is to assess whether the actual
        face counts are sufficiently close to the expected 100 to say the results are
        consistent with a fair die.



        The R code below simulates 100,000 such 600-roll experiments and finds the test
        statistic
        $Q = \sum_{i=1}^6 \frac{(X_i-100)^2}{100}$ for each experiment. Then we can
        make a histogram of the 100,000 values of $Q$ and also a histogram of the
        corresponding 100,000 P-values.



        set.seed(1234)
        m = 10^5; n = 600; E = n/6; die = 1:6; q = numeric(m)
        for (i in 1:m) {
          faces = sample(die, n, rep=T)     # 600 rolls of a fair die
          X = rle(sort(faces))$lengths      # counts of the six faces
          q[i] = sum((X - E)^2/E)
        }

        mean(q >= 11.07)
        [1] 0.04864

        pv = 1 - pchisq(q, 5)
        mean(pv <= .05)
        [1] 0.04864


        Because rolls of fair dice are simulated, it is not surprising to see that
        $Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.



        From the histogram we can see that $Q$ has approximately the target chi-squared
        distribution, rejecting for values to the right of the vertical broken line.
        Also, the P-values are approximately uniformly distributed, rejecting for
        values to the left of the vertical line.
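
        The original plotting code was not shown; the following sketch would produce
        histograms like those described (it regenerates the simulated statistics with
        a smaller $m$ so the block stands alone and runs quickly):

```r
# Regenerate simulated chi-squared statistics for 600-roll fair-die experiments
set.seed(1234)
m = 10^4; n = 600; E = n/6
q = replicate(m, {
  X = tabulate(sample(1:6, n, replace = TRUE), nbins = 6)  # face counts
  sum((X - E)^2 / E)                                       # chi-squared statistic
})
pv = 1 - pchisq(q, 5)                                      # corresponding P-values

par(mfrow = c(1, 2))
hist(q, prob = TRUE, breaks = 30, main = "Simulated Q", xlab = "Q")
curve(dchisq(x, 5), add = TRUE, lwd = 2)   # target Chisq(df = 5) density
abline(v = qchisq(.95, 5), lty = 2)        # reject to the right of this line
hist(pv, prob = TRUE, breaks = 20, main = "P-values", xlab = "P-value")
abline(v = .05, lty = 2)                   # reject to the left of this line
par(mfrow = c(1, 1))
```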



        [Figure: side-by-side histograms of the 100,000 simulated values of $Q$ and of the corresponding P-values, with vertical broken lines marking the rejection regions]



        The point of this demonstration is that the uniform distribution of P-values
        makes it difficult to say that particular P-values such as .3 and .6 are
        more remarkable or meaningful than others. Ordinarily, we only care about whether P-values
        are small enough to lead to rejection at our chosen significance level.







        edited Aug 1 at 1:05

        answered Jul 31 at 19:33

        BruceET