Does comparing two p-values make sense?
For example, suppose the p-value for the association between willingness to pay and the number of cars owned is 0.3, while the p-value for willingness to pay and the number of pets owned is 0.6. Can I claim that the number of cars owned has a stronger relationship with willingness to pay, and that it explains willingness to pay better than the number of pets owned does?
I know that a p-value less than 0.05 is significant, but I am not sure whether two p-values can be compared when both are larger than 0.05.
probability statistics statistical-inference
When you write "p-value" between two factors do you mean the correlation $\rho$? If so, your final comment about $0.05$ is something else.
– Henry, Jul 28 at 23:27
I'm talking about something like this: medcalc.org/manual/chi-square-table.php
– Marcus Thornton, Jul 29 at 1:11
Pondering your question. I guess I know what 'number of cars owned' looks like: (0, 1, 3, 1, 2, 1, 0, 0, 1, ...). And similarly for pets. But what do data for 'willingness to pay' look like? Likert scale (ordinal) or some sort of numeric scale? And how are the chi-squared statistics computed? I don't think P-values should be used in the way you propose, but I'd like to give meaningful examples why not. And maybe suggest an alternative that would work.
– BruceET, Jul 31 at 6:42
Willingness to pay is a class having high, medium, and low levels.
– Marcus Thornton, Jul 31 at 23:32
Are numbers of cars and pets also expressed as high, medium, and low? If so, you can do chi-squared tests of independence to compare 'Nr. Cars' and 'Willingness', etc. You can't "prove" that one connection "explains" another, but you might collect evidence to make speculation worthwhile. As an alternative to looking at P-values (not a good idea, as I hope I've explained in my Answer), you may want to look at correlations as measured by Kendall's $\tau$ or Spearman's $\rho$.
– BruceET, Aug 1 at 1:16
asked Jul 28 at 22:53
Marcus Thornton
1 Answer
Absent requested clarifications, I can only make generic comments on the proper uses of P-values.
If a chi-squared goodness-of-fit test or test of independence has a statistic $Q$ that is approximately distributed as $\mathsf{Chisq}(\text{df} = 5),$ then the critical values for tests at the 5% and 1% levels are $c = 11.07$ and $c = 15.09,$ respectively. You can find these values in row 5 of the table to which you linked; I have found them using R statistical software below:
qchisq(c(.95, .99), 5)
[1] 11.07050 15.08627
So if your computed value of the test statistic is $Q = 12.33,$ you can
reject the null hypothesis at the 5% level, but not at the 1% level.
Nowadays, most statistical software gives P-values instead of dealing with specific fixed levels of significance. Software can do that because it can find more detailed information about a particular distribution (for example, $\mathsf{Chisq}(\text{df} = 5)$) than is convenient to print in a published table.
Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under the density function of $\mathsf{Chisq}(\text{df} = 5)$ to the right of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not at the 1% level because $0.0305 > 0.01.$
1 - pchisq(12.33, 5)
[1] 0.03053538
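For comparison, the same quantities can be reproduced in Python with SciPy (a sketch, assuming SciPy is available; `ppf` is the quantile function and `sf` is the upper-tail probability):

```python
from scipy.stats import chi2

# Critical values for 5% and 1% tests with df = 5
print(chi2.ppf([0.95, 0.99], 5))   # approx [11.0705, 15.0863], matching qchisq

# Upper-tail P-value for the observed statistic Q = 12.33
print(chi2.sf(12.33, 5))           # approx 0.03054, matching 1 - pchisq
```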
Thus, given the P-value, a person can choose their own significance level and determine whether the test is significant at that level. So it is fair to say that small P-values are useful for determining the result of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence against $H_0$ than does a larger one such as 0.045--even though both P-values lead to rejection at the 5% level.
However, it is not generally useful to make distinctions between the 'information contained' in larger P-values such as 0.3 and 0.6. That is because, assuming $H_0$ to be true, the P-value is a random variable that is approximately uniform on the interval $(0,1).$ For a continuous test statistic, such as $Z$ in a normal test or $T$ in a t test, one can prove that P-values are exactly $\mathsf{Unif}(0,1).$ For most discrete test statistics, P-values are roughly, but not exactly, uniform. (One usually explores the distributions of such P-values through simulation.)
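That uniformity is easy to check by simulation. A quick sketch in Python with NumPy/SciPy (not part of the original answer): simulate many samples with $H_0$ true for a known-variance z test, and verify that the resulting P-values behave like $\mathsf{Unif}(0,1)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)
m, n = 10_000, 50
x = rng.normal(0.0, 1.0, size=(m, n))   # H0 true: mean 0, sd 1 known

z = x.mean(axis=1) * np.sqrt(n)         # exact z statistic since sigma = 1
pv = 2 * norm.sf(np.abs(z))             # two-sided P-values

# Unif(0,1) behavior: mean near 0.5, and about 5% of P-values below 0.05
print(round(pv.mean(), 2), round((pv < 0.05).mean(), 3))
```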
The test statistic $Q$ for a chi-squared goodness-of-fit test is discrete, because its values are based on integer counts. A simple example is to see what happens in repeated tests of whether a die is fair. If a die is rolled $n = 600$ times, then we ought to see each of the six faces "about 100" times. The purpose of the chi-squared statistic is to assess whether the actual face counts are sufficiently close to the expected 100 to say the results are consistent with a fair die.
The R code below simulates 100,000 such 600-roll experiments and finds the test statistic $Q = \sum_{i=1}^6 \frac{(X_i - 100)^2}{100}$ for each experiment. Then we can make a histogram of the 100,000 values of $Q$ and also a histogram of the corresponding 100,000 P-values.
set.seed(1234)
m = 10^5; n = 600; E = n/6; die = 1:6; q = numeric(m)
for (i in 1:m) {
  faces = sample(die, n, rep=T)
  X = rle(sort(faces))$lengths   # counts of each face
  q[i] = sum((X-E)^2/E)
}
mean(q >= 11.07)
[1] 0.04864
pv = 1 - pchisq(q, 5)
mean(pv <= .05)
[1] 0.04864
Because rolls of fair dice are simulated, it is not surprising to see that
$Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.
From the histogram we can see that $Q$ has approximately the target chi-squared distribution, rejecting for values to the right of the vertical broken line. Also, the P-values are approximately uniformly distributed, rejecting for values to the left of the vertical line.
The point of this demonstration is that the uniform distribution of P-values
makes it difficult to say that particular P-values such as .3 and .6 are
more remarkable or meaningful than others. Ordinarily, we only care about whether P-values
are small enough to lead to rejection at our chosen significance level.
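Following up on the comment thread, effect-size measures such as Spearman's $\rho$ are better suited than P-values for comparing strengths of association. A hedged sketch in Python on synthetic data (the variable names and the data-generating model are invented for illustration only):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n = 200
cars = rng.integers(0, 4, size=n)          # hypothetical car counts
pets = rng.integers(0, 4, size=n)          # hypothetical pet counts
pay  = cars + rng.normal(0, 1.5, size=n)   # "willingness" depends on cars only

r_cars, _ = spearmanr(cars, pay)   # rank correlation, cars vs. willingness
r_pets, _ = spearmanr(pets, pay)   # rank correlation, pets vs. willingness
# Compare the correlations (effect sizes), not the P-values
print(round(r_cars, 2), round(r_pets, 2))
```

Here the two rank correlations are directly comparable measures of association strength, which the two P-values are not.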
edited Aug 1 at 1:05
answered Jul 31 at 19:33
BruceET