How to pick sample size?
I have a requirement to test 1000 systems, but I want to limit the testing to a sample, since the rest are assumed to have the same configuration.
What criteria should I use to pick the sample size? I want a high confidence level: e.g., if I pick 300 systems, I want to be able to say, with a margin of error of 5% or less, that the rest will have the same errors or issues.
How can math help me?
statistics sampling
asked Aug 3 at 13:01
asadz
What does your performance scale look like? Is it binary, as in bad systems and good systems?
– Arnaud Mortier
Aug 3 at 13:10
Yes, 1 and 0, as in pass or fail.
– asadz
Aug 3 at 14:33
You want to look at the statistical notion of confidence intervals. To answer your question precisely, one would need more information on what you mean by $5\%$ here.
– Arnaud Mortier
Aug 3 at 14:47
By 5% I mean a deviation in the results: e.g., 5% of the remaining assets might fail the same test (i.e., show a different result).
– asadz
Aug 3 at 15:01
1 Answer
Your question is a little vague as it stands. But I hope I can show you
how to think about this productively.
Assuming the process is stable so that the probability $p$ of success is
constant over time, you could formulate this as a problem about
confidence intervals.
First, you have to decide how close (within some distance $d$) your estimate needs to come to the true value of $p$.
Suppose you will test $n$ items and estimate $p$ as $\hat p = X/n,$ where
$X$ is the number of Successes among the $n.$ Then a 95% CI for the true
$p$ is of the form
$$\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$
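A minimal Python sketch of the interval above, assuming $X$ passes out of $n$ tested systems. The function name `wald_ci` and the counts in the example are hypothetical, chosen only for illustration.

```python
import math

def wald_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval p_hat +/- z*sqrt(p_hat*(1 - p_hat)/n)."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p_hat = x / n  # estimated pass rate
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical example: 360 passes out of 400 tested systems.
low, high = wald_ci(360, 400)
print(f"{low:.3f} to {high:.3f}")
```

Here $\hat p = 0.9$ and the half-width is $1.96 \times 0.015 = 0.0294$.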
Then suppose you want $d = .05,$ so that $1.96\sqrt{\frac{\hat p(1-\hat p)}{n}} = 0.05.$ Because you can't know in advance what $\hat p$ will be, it is customary to
take the 'worst-case scenario' where $\hat p = 1/2.$ (It's the worst case
because $\hat p(1-\hat p)$ is largest for $\hat p = 1/2.$) Then you can find
$n$ by solving $1.96\sqrt{1/(4n)} = .05.$ Since $1.96\sqrt{1/(4n)} = 0.98/\sqrt{n},$ this is very nearly $1/\sqrt{n} = .05,$
so that $n \approx 400.$
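The worst-case calculation can be checked numerically. In this sketch, `required_n` is a hypothetical helper that returns the smallest integer $n$ with $z\sqrt{p(1-p)/n} \le d$, i.e. $n \ge z^2 p(1-p)/d^2$. With the exact $z = 1.96$ it gives 385, slightly below the rounded figure of 400 from the $1/\sqrt{n}$ approximation.

```python
import math

def required_n(d: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest n such that z*sqrt(p*(1 - p)/n) <= d."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

print(required_n(0.05))      # worst case, p = 1/2
print(round(1 / 0.05**2))    # rough rule 1/sqrt(n) = d, i.e. n = 1/d^2
```

Passing a different `p` shows how much smaller $n$ can be when the pass rate is far from $1/2$.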
In reporting results of public opinion polls, it is common to say that the 'margin of
sampling error' is $1/\sqrt{n},$ where $n$ is the number of respondents.
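A short sketch comparing the poll rule of thumb $1/\sqrt{n}$ with the worst-case 95% half-width $1.96\sqrt{1/(4n)} = 0.98/\sqrt{n}$; the two differ by only 2%. The helper names are hypothetical.

```python
import math

def poll_margin(n: int) -> float:
    """Rule-of-thumb 'margin of sampling error' 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Half-width z*sqrt(1/(4n)) of the 95% CI when p_hat = 1/2."""
    return z * math.sqrt(1 / (4 * n))

for n in (100, 400, 1000):
    print(n, round(poll_margin(n), 4), round(worst_case_margin(n), 4))
```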
In your problem, you might stop after a hundred tests and re-compute $n$ based
on your value of $\hat p_{100},$ and again after two hundred tests. So if
the true $p$ is far from $1/2,$ you might not need to do all of the $400$ projected tests. [If the true $p = .9,$ you may need $n < 200.$]
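A rough sketch of this stop-and-recompute idea, assuming the same $d = .05$ and $z = 1.96$. The helper `recompute_n` and the interim counts (90 passes in the first 100 tests, 181 in the first 200) are hypothetical. This informal check is not a formal sequential procedure: repeatedly looking at the data can change the actual coverage, which is what the sequential analysis mentioned in the notes handles properly.

```python
import math

def recompute_n(passes: int, tested: int, d: float = 0.05, z: float = 1.96) -> int:
    """Re-estimate the total sample size needed, plugging in the interim p_hat."""
    p_hat = passes / tested
    # Note: if p_hat is 0 or 1 this returns 0, a known weakness of this simple formula.
    return math.ceil(z**2 * p_hat * (1 - p_hat) / d**2)

# Hypothetical interim results after 100 and after 200 tests.
for passes, tested in ((90, 100), (181, 200)):
    n_needed = recompute_n(passes, tested)
    status = "stop" if tested >= n_needed else "continue"
    print(tested, n_needed, status)
```

With these made-up counts the first check asks for 139 tests (so testing continues) and the second asks for 133 (so 200 is already enough).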
Notes: (a) A CI of the form $\check p \pm 1.96\sqrt{\frac{\check p(1-\check p)}{\check n}},$ where $\check n = n+4$ and $\check p = (X+2)/\check n,$ is known to come closer to achieving a true 95% coverage than the CI shown above. But for
purposes of planning the sample size $n,$ it is simpler and usually OK to use the CI shown above. [Perhaps see this Q&A and its references.] (b) The method (in my last paragraph) of revising $n$ based on intermediate results is
formalized as 'sequential analysis', which you can google if you like.
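A sketch of the adjusted interval from note (a), sometimes called the 'plus four' interval, compared with the simple interval for a hypothetical case in which all 20 tested systems pass. The simple interval collapses to the single point 1, while the adjusted one keeps a sensible width. The function names are hypothetical, and clipping the bounds to [0, 1] is an added convenience, not part of the formula in the note.

```python
import math

def plus_four_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Interval from note (a): n_check = n + 4, p_check = (x + 2) / n_check."""
    n_check = n + 4
    p_check = (x + 2) / n_check
    half_width = z * math.sqrt(p_check * (1 - p_check) / n_check)
    # Clip to [0, 1], since a proportion cannot lie outside that range.
    return max(0.0, p_check - half_width), min(1.0, p_check + half_width)

def simple_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """The simple interval p_hat +/- z*sqrt(p_hat*(1 - p_hat)/n), for comparison."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical case: all 20 tested systems pass.
print(simple_ci(20, 20))     # collapses to a single point
print(plus_four_ci(20, 20))
```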
edited Aug 3 at 20:52
answered Aug 3 at 20:36
BruceET