How to pick sample size?

I need to test 1000 systems, but I want to test only a sample of them, since the rest are assumed to have the same configuration.

What are the criteria for picking the sample size? I want a high confidence level: for example, if I pick 300 systems, can I say that, with a margin of error of 5% or less, the rest will show the same errors or issues?

How can math help me?







  • What does your performance scale look like? Is it binary, as in bad systems and good systems?
    – Arnaud Mortier
    Aug 3 at 13:10

  • Yes, 1 and 0 as pass or fail.
    – asadz
    Aug 3 at 14:33

  • You want to look at the statistical notion of confidence intervals. To answer your question precisely, one would need more information on what you mean by $5\%$ here.
    – Arnaud Mortier
    Aug 3 at 14:47

  • By 5% I mean a deviation in results: for example, 5% of the remaining assets might fail the same test (show a different result).
    – asadz
    Aug 3 at 15:01














asked Aug 3 at 13:01
asadz

1 Answer













Your question is a little vague as it stands. But I hope I can show you
how to think about this productively.



Assuming the process is stable so that the probability $p$ of success is
constant over time, you could formulate this as a problem about
confidence intervals.



First, you have to decide how close to the true value of $p$ you need to come; call this margin $d.$
Suppose you will test $n$ items and estimate $p$ as $\hat p = X/n,$ where
$X$ is the number of Successes among the $n.$ Then a 95% CI for the true
$p$ is of the form
$$\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$
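As a rough illustration of the interval above, here is a minimal Python sketch. The function name `wald_interval` and the counts in the example (270 passes out of 300 tests) are made up for illustration; they are not from your data.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Approximate 95% CI for p: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n                      # point estimate X / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical example: 270 of 300 tested systems pass.
lo, hi = wald_interval(270, 300)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```

The half-width of the interval is the quantity you would like to be no larger than your chosen margin $d.$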



Then suppose you want $d = .05$ so that $1.96\sqrt{\frac{\hat p(1-\hat p)}{n}} = 0.05.$ Because you can't know in advance what $\hat p$ will be, it is customary to
take the 'worst-case scenario' where $\hat p = 1/2.$ (It's the worst case
because $\hat p(1-\hat p)$ is largest for $\hat p = 1/2.)$ Then you can find
$n$ by solving $1.96\sqrt{\frac{1}{4n}} = .05.$ This is very nearly $1/\sqrt{n} = .05,$
so that $n \approx 400.$
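If you would rather compute this than solve it by hand, the planning step can be sketched in Python as follows. The function name `required_n` is my own; it simply solves $z\sqrt{p(1-p)/n} = d$ for $n$ and rounds up.

```python
import math

def required_n(d, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= d (worst case is p = 0.5)."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

d = 0.05
print(required_n(d))        # exact worst-case solution with z = 1.96: 385
print(round(1 / d**2))      # rule of thumb 1/sqrt(n) = d: 400
```

The exact solution with 1.96 is a little under 400; the rule of thumb rounds 1.96 up to 2, which is where the figure of about 400 comes from.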



In reporting results of public opinion polls, it is common to say that the 'margin of
sampling error' is $1/\sqrt{n},$ where the number of respondents is $n.$



In your problem, you might stop after a hundred tests and re-compute $n$ based
on your value of $\hat p_{100}.$ And again after two hundred tests. So if
the true $p$ is far from $1/2,$ you might not need to do all of the $400$ projected tests. [If the true $p = .9,$ you may need $n < 200.$]



Notes: (a) A CI of the form $\check p \pm 1.96\sqrt{\frac{\check p(1-\check p)}{\check n}},$ where $\check n = n+4$ and $\check p = (X+2)/\check n,$ is known to be better in terms of achieving a true 95% coverage than the CI shown above. But for
purposes of planning sample size $n,$ it is simpler and usually OK to use the CI shown above. [Perhaps see this Q&A and its references.] (b) The method (in my last paragraph) of revising $n$ based on intermediate results is
formalized as 'sequential analysis', which you can google if you like.
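For note (a), the adjusted interval can be sketched in Python like this. The function name `plus_four_interval` is my own label, and the example counts are made up.

```python
import math

def plus_four_interval(successes, n, z=1.96):
    """Adjusted 95% CI: add 2 successes and 2 failures before computing."""
    n_check = n + 4
    p_check = (successes + 2) / n_check
    half_width = z * math.sqrt(p_check * (1 - p_check) / n_check)
    # The upper end can exceed 1 (or the lower end fall below 0);
    # in practice one would usually truncate to [0, 1].
    return p_check - half_width, p_check + half_width

# Hypothetical example: all 10 of 10 tested systems pass.
lo, hi = plus_four_interval(10, 10)
print(f"({lo:.3f}, {hi:.3f})")
```

With 10 passes out of 10, the unadjusted interval would have zero width, which is one reason the adjusted form behaves better when $\hat p$ is near 0 or 1.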






edited Aug 3 at 20:52

answered Aug 3 at 20:36
BruceET
