Magic 8 Ball Problem

This problem is probably simple enough to have a well-known analogue; I just don't know its name, so I'm going to describe it and hopefully somebody can point me in the right direction.



The problem is this: estimating the number of sides of a Magic 8 Ball.



Let's say you perform the following process:



  1. Shake the "Magic 8 Ball" and mark down the result

  2. If the result has not previously been seen, add it to the set of possible results and label the trial "N" (for "new")

  3. If the result has been previously seen, label the trial "O" (for "old")

  4. Repeat all steps

I imagine the following kind of sequence occurring:
NNNNONNOONOONOOONOOOONONNOONOOONNOONOOOOOONOOONOOOOOOONOOOOOOOONOOOOOOOO...



Now imagine we don't know there are twenty sides to a Magic 8 Ball. Or imagine that you have a Magic 8 Ball with 1000 sides. As the number of "new" trials approaches the actual number of sides, we'll see increasingly many "old" trials in the mix. Once we've seen all of the possible results, we'll always get "old" results. But we're never 100% sure we've seen every single possible result.
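
For concreteness, here's a minimal simulation of the process above (a Python sketch; the 20 sides, the uniform-shake model, the fixed seed and the helper name are just assumptions for the demo):

```python
import random

def shake_sequence(n_sides, k_shakes, seed=0):
    """Shake an n_sides-sided ball k_shakes times; label each trial N (new) or O (old)."""
    rng = random.Random(seed)
    seen = set()
    labels = []
    for _ in range(k_shakes):
        side = rng.randrange(n_sides)
        labels.append("O" if side in seen else "N")
        seen.add(side)
    return "".join(labels)

print(shake_sequence(20, 60))  # something like NNNNONNOONOO..., with O's taking over as sides repeat
```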



So here are the questions I'm interested in:



  • As we proceed with trials, can we estimate the total number of "sides" of the Magic 8 Ball based on the number of "new" and "old" trials up to this point?

  • Can we calculate a probability that our current estimate is correct, or a probability that some bounded estimate is correct?






asked Jul 19 '15 at 12:42 by aardvarkk


  • Would you say that without even seeing a result the "magic 8 ball" has an equal chance of having any positive integer number of sides? There is no maximum possible number of sides, and no preference amongst them? (I am looking for the a priori probabilities.)
    – muaddib
    Jul 19 '15 at 12:52







  • These are not the same problems; just to give some idea what sort of approaches are used for such problems: en.wikipedia.org/wiki/German_tank_problem, en.wikipedia.org/wiki/Mark_and_recapture.
    – joriki
    Jul 19 '15 at 13:12











  • @muaddib I would say that it could have any positive integer number of sides, there is no maximum, and no preference among them, yes. Oh, and that all results are equally likely to occur (uniform distribution).
    – aardvarkk
Jul 19 '15 at 13:33

1 Answer

answered Jul 19 '15 at 15:08 by joriki (accepted)










This problem is similar to the German tank problem. You can use the same approaches used for that problem, with the number of distinct sides seen and its conditional distribution taking the place of the highest serial number seen and its conditional distribution. (Note that the only information we have that tells us anything about the number of sides of the magic ball is the number of distinct sides we've seen; all the details of which side we saw when, both the ones you recorded in your sequence and the ones you didn't record, are irrelevant, since their conditional distribution given the number of sides seen is the same irrespective of the total number of sides, i.e. they carry no information on the total number of sides.)



I'll work out the Bayesian approach in analogy to the treatment in the Wikipedia article linked to above; I won't go into much detail for the parts that are common to the two calculations.



So let $n$ be the number of sides of the magic ball, $k$ the number of shakes performed so far, and $m$ the number of distinct sides seen. (All the notation is in analogy to the article.) Then the desired credibility $(n\mid m,k)$ is given by



$$
(n\mid m,k) = (m\mid n,k)\,\frac{(n\mid k)}{(m\mid k)}\;.
$$



We can take the required conditional probability $(m\mid n,k)$ of seeing $m$ distinct sides on $k$ shakes of a ball with $n$ sides from this answer:



$$
(m\mid n,k)=\frac{m!}{n^k}\binom nm\left\{k\atop m\right\}\;,
$$



where the $\left\{k\atop m\right\}$ are Stirling numbers of the second kind. As in the article, we can eliminate $(m\mid k)$ by marginalizing over all possible $n$:



$$
(m\mid k)=\sum_{n=1}^\infty(m\mid n,k)\,(n\mid k)\;.
$$
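
If you want to evaluate $(m\mid n,k)$ numerically, here's a small Python sketch; computing the Stirling numbers by inclusion–exclusion is my own choice of method, and the helper names are mine:

```python
from math import comb, factorial

def stirling2(k, m):
    """Stirling number of the second kind, via the inclusion-exclusion formula."""
    return sum((-1)**j * comb(m, j) * (m - j)**k for j in range(m + 1)) // factorial(m)

def p_m_given_nk(m, n, k):
    """P(m distinct sides observed | n sides total, k shakes)."""
    return factorial(m) * comb(n, m) * stirling2(k, m) / n**k

print(p_m_given_nk(13, 20, 20))  # ~0.28, the modal case used in an example further down
```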



Now comes the shady part, $(n\mid k)$, your prior credibility that there are $n$ sides after having performed $k$ shakes. In contrast to the tank problem, where observing a tank changes not only the highest serial number seen but also the minimum number of tanks, in our case $(n\mid k)$ doesn't depend on $k$. We can use the same trick of introducing a maximum number $\Omega$ of sides up to which the prior credibility is uniform, and hoping that $\Omega$ will magically drop out of the result. Then $(n\mid k)=1/\Omega$ for $1\le n\le\Omega$. Putting it all together, we have



$$
(n\mid m,k)=\frac{(m\mid n,k)}{\sum_{n=1}^\Omega(m\mid n,k)}=\frac{\frac{m!}{n^k}\binom nm\left\{k\atop m\right\}}{\sum_{n=1}^\Omega\frac{m!}{n^k}\binom nm\left\{k\atop m\right\}}=\frac{\frac1{n^k}\binom nm}{\sum_{n=1}^\Omega\frac1{n^k}\binom nm}\;.
$$



Now, interestingly, if we want to let the sum in the denominator run to infinity to get rid of $\Omega$, we find that it converges if and only if $m\le k-2$. That is, if you had no prior idea how many sides the magic ball might have and you've never seen the same side twice, then you still have no idea how many sides it might have. That's perhaps to be expected; what's perhaps not quite as expected is that this is still true (though "only" with a logarithmic divergence in the denominator) if you've seen a single reoccurrence of a side. Only with at least two reoccurrences can you overcome your prior ignorance as to the possible scale of the number of sides.
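
Since the $m!\left\{k\atop m\right\}$ factor cancels, this posterior is cheap to evaluate numerically. A minimal Python sketch, with a finite cutoff `omega` standing in for $\Omega$ (the cutoff value is an arbitrary assumption, large enough that the tail is negligible once $m\le k-2$):

```python
from math import comb

def posterior(m, k, omega=10_000):
    """Posterior over n, given m distinct sides in k shakes; uniform prior on 1..omega."""
    weights = {n: comb(n, m) / n**k for n in range(1, omega + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

print(posterior(1, 3)[1])  # ~0.61, the "lemon" example worked out below
print(posterior(2, 4)[2])  # ~0.28, the m=2, k=4 example below
```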



Let's work out some examples. The first convergent case is $m=1$, $k=3$, i.e. you shook three times and got the same side every time. Then



$$
(n\mid 1,3)=\frac{\frac1{n^2}}{\sum_{n=1}^\infty\frac1{n^2}}=\frac6{(n\pi)^2}\;.
$$



So at this point you're already $6/\pi^2\approx61\%$ sure that the thing is a lemon and only ever shows this one side. More generally, we have



$$
(n\mid 1,k)=\frac1{n^{k-1}\zeta(k-1)}\;,
$$



so after seeing the same side four times in a row your inclination to go to the store to get your money back has risen to $1/\zeta(3)\approx83\%$. On the other hand, if you do see $m=2$ different sides in $k=4$ shakes, you have



$$
(n\mid 2,4)=\frac{\frac1{n^4}\binom n2}{\sum_{n=1}^\infty\frac1{n^4}\binom n2}=\frac{\frac{n-1}{n^3}}{\sum_{n=1}^\infty\frac{n-1}{n^3}}=\frac{\frac{n-1}{n^3}}{\zeta(2)-\zeta(3)}\approx 2.258\,\frac{n-1}{n^3}\;,
$$



so now only about $28\%$ of the credibility is concentrated on the possibility that these are the only two sides.



Now, you'd asked about an estimate of the number of sides. The expected value of $n$ given $m$ and $k$ is



$$
\mathbb E(n\mid m,k)=\frac{\sum_{n=1}^\Omega\frac1{n^{k-1}}\binom nm}{\sum_{n=1}^\Omega\frac1{n^k}\binom nm}\;.
$$



So it turns out that even after two reoccurrences you can't get a finite estimate of the number of sides if you didn't have any prior clue how many there might be – you need three reoccurrences for the sum in the numerator to converge. In the case of always observing a single side, the expected value is



$$
\mathbb E(n\mid1,k)=\frac{\zeta(k-2)}{\zeta(k-1)}\;,
$$



so after seeing the same side four times and nothing else, your estimate of the number of sides is $\zeta(2)/\zeta(3)\approx1.368$.
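
These constants are easy to check numerically; a throwaway Python sketch, with a crude partial sum standing in for $\zeta$:

```python
from math import pi

def zeta(s, terms=200_000):
    """Crude partial sum of the zeta series; adequate at this precision for s >= 2."""
    return sum(n**-s for n in range(1, terms + 1))

print(6 / pi**2)          # ~0.608: credibility of n=1 after m=1, k=3
print(1 / zeta(3))        # ~0.832: credibility of n=1 after m=1, k=4
print(zeta(2) / zeta(3))  # ~1.368: E(n | m=1, k=4)
```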



To give a more realistic example, let's say you shake the ball with $n=20$ sides $k=20$ times. Then the mode of the distribution of the number of distinct sides you see is at $m=13$ (at a probability of about $28\%$). So let's say you observe that most probable value and try to estimate the number of sides. Then



$$
(n\mid 13,20)=\frac{\frac1{n^{20}}\binom n{13}}{\sum_{n=1}^\infty\frac1{n^{20}}\binom n{13}}\approx\frac{8.25\cdot10^{19}}{n^{20}}\binom n{13}\;.
$$



Here's a plot, with a nice mode at $n=20$. The estimate from the expectation value is



$$
\mathbb E(n\mid 13,20)=\frac{\sum_{n=1}^\Omega\frac1{n^{19}}\binom n{13}}{\sum_{n=1}^\Omega\frac1{n^{20}}\binom n{13}}\approx26.5\;,
$$



which perhaps shows that you might be better off using the mode as the estimator rather than the mean, at least if you don't expect huge numbers of sides.



Another question we might ask is: If we've seen all $m=n$ sides, what number $k$ of trials do we need before we're, say, $99\%$ sure that that's all there is? That is, for which $k$ do we first have



$$
(n=m\mid m,k)=\frac{\frac1{m^k}\binom mm}{\sum_{n=1}^\infty\frac1{n^k}\binom nm}=\frac{\frac1{m^k}}{\sum_{n=1}^\infty\frac1{n^k}\binom nm}\ge0.99\;?
$$



Some trial and error on the borders of Wolfram|Alpha's time limits shows that for $m=n=20$ this requires a full $k=157$ trials. (If the calculation times out, just try again.)
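
The same search runs in ordinary Python if you truncate the sum; the cutoff `omega` is an assumption, but near the crossing point the terms decay so fast that the truncation is harmless:

```python
from math import comb

def certainty(m, k, omega=5_000):
    """Truncated version of (n=m | m,k): the chance the m observed sides are all there is."""
    total = sum(comb(n, m) / n**k for n in range(m, omega + 1))
    return (1 / m**k) / total

m, k = 20, 20
while certainty(m, k) < 0.99:  # linear scan upward; takes a few seconds
    k += 1
print(k)  # 157
```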



Regarding the approximation considered in the comments: If the term for $n=m$ is dominant, we can compare it to the next term, for $n=m+1$:



$$
\frac{(m\mid n=m,k)}{(m\mid n=m+1,k)}
=
\frac{\frac1{m^k}\binom mm}{\frac1{(m+1)^k}\binom{m+1}m}
=
\frac{\left(1+\frac1m\right)^k}{m+1}
=
\frac{\mathrm e^{k\log\left(1+\frac1m\right)}}{m+1}\;.
$$



We can already see that this is going to get very large for large $k$ (no approximation so far); to see more clearly how large, we can use $\log(1+x)\simeq x$ to get



$$
\frac{(m\mid n=m,k)}{(m\mid n=m+1,k)}
\simeq
\frac{\mathrm e^{k/m}}{m+1}\;.
$$



Thus, for $k\gg m$ this ratio grows exponentially. The third term is again smaller by a similar factor, so we can ignore it and all remaining terms and approximate the credibility for $n=m$ as



\begin{eqnarray}
(n=m\mid m,k)
&\simeq&
\frac{(m\mid n=m,k)}{(m\mid n=m,k)+(m\mid n=m+1,k)}
\\
&\simeq&
\frac1{1+(m+1)\mathrm e^{-k/m}}
\\
&\simeq&
1-(m+1)\mathrm e^{-k/m}\;.
\end{eqnarray}



So our doubts whether indeed $n=m$ decay exponentially with $k/m$.
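
As a quick sanity check on the approximation, here's a Python comparison against the exact (truncated-sum) value at $m=20$, $k=157$; the cutoff and the helper name are again assumptions of mine:

```python
from math import comb, exp

def exact_certainty(m, k, omega=2_000):
    """(n=m | m,k) with the infinite sum truncated at omega."""
    total = sum(comb(n, m) / n**k for n in range(m, omega + 1))
    return (1 / m**k) / total

m, k = 20, 157
print(exact_certainty(m, k))      # ~0.990
print(1 - (m + 1) * exp(-k / m))  # ~0.992, the exponential approximation, slightly optimistic
```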






  • Wow. This is an incredible answer. You are much smarter than me. I just want to verify: can we do a plot of (n|m,k) for m = 20 and k = 100,000 in Wolfram? I would expect a very strong peak at 20 if my interpretation of your answer is correct?
    – aardvarkk
    Jul 19 '15 at 16:05











  • @aardvarkk: Well, $k=100,000$ in the exponent is a bit much even for Wolfram|Alpha :-) But here's a plot with $k=200$ (and $m=20$), where the probability is already extremely concentrated at $n=20$. (I didn't include the factor of proportionality.)
    – joriki
    Jul 19 '15 at 16:16











  • Very cool. I will mark this as the answer. Thanks so much for your attention on this. I'm very impressed! I'm also curious, though -- do you think there's some approximation that could be done to deal with very large n, k, and m? Say, on the order of 100,000 as I mentioned before? If we're willing to throw away some accuracy?
    – aardvarkk
    Jul 19 '15 at 17:04






  • @aardvarkk: You're very welcome! I added an approximation to the answer; I hope this is roughly what you were looking for? (I also added a bit in a previous edit about the number of trials required for a certain certainty that $n=m$.)
    – joriki
    Jul 19 '15 at 17:39










