Monte-Carlo approximation with small samples

Let me suppose I have a function of $y$ given $x$, $f(y\mid x)$, and $N$ samples of $x$: $\{x_i\}_{i=1}^N$. I would like to create a distribution over the space of $y$ based on this function $f$ given $x_i$, like:

$$
p(y\mid x_i) = \frac{\exp f(y\mid x_i)}{\sum_{y\in\mathcal{Y}} \exp f(y\mid x_i)}
$$

However, unfortunately, the computation of $\sum_{y\in\mathcal{Y}}$ is infeasible; for example, the space $\mathcal{Y}$ is too large.
Also, let me assume that it is costly to generate a sample $y_i^{(j)}$ from $f$ given $x_i$, so I can use only a few samples of $y$, and it is difficult to apply a Monte-Carlo approximation.
As the extreme case, let me assume I have only one sample of $y$ for each $x_i$: $y_i^{(j)}$.



Question: Can I simply use

\begin{align}
\frac{\exp f(y_i^{(j)}\mid x_i)}{\sum_{i=1}^N \exp f(y_i^{(j)}\mid x_i)}
\end{align}

instead of

\begin{align}
\frac{\exp f(y_i^{(j)}\mid x_i)}{\sum_{j^\prime=1}^M \exp f(y_i^{(j^\prime)}\mid x_i)}
\end{align}

as a rough approximation of $p(y\mid x_i)$ at $y_i^{(j)}$ given $x_i$? If not, what are the basic methods for this kind of situation, where only a small number of samples can be used?



Thank you very much for reading this question!
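(Not part of the original question, but a tiny numerical sketch may make the two candidate normalizers concrete. Everything here is hypothetical: a toy score $f(y\mid x) = -(y-x)^2$ and a small finite stand-in for $\mathcal{Y}$, chosen so the exact softmax is computable for comparison. Note in particular that with $M=1$ sample per $x_i$, the second formula normalizes a single score by itself and is identically $1$.)

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5                            # number of contexts x_i (M = 1 sample y each)
Y = np.linspace(-2.0, 2.0, 50)   # small finite stand-in for the space Y

def f(y, x):
    # Hypothetical score function f(y | x); any real-valued function works.
    return -(y - x) ** 2

xs = rng.normal(size=N)
ys = xs + rng.normal(scale=0.5, size=N)   # one sampled y per x_i

def p_exact(y, x):
    # Exact softmax over Y (feasible only because Y is tiny here).
    return np.exp(f(y, x)) / np.exp(f(Y, x)).sum()

# First formula: normalize the i-th score by a sum over the contexts i.
scores = np.array([f(ys[i], xs[i]) for i in range(N)])
prop1 = np.exp(scores) / np.exp(scores).sum()

# Second formula with M = 1: each score normalized by itself, i.e. always 1.
prop2 = np.ones(N)

for i in range(N):
    print(f"x_{i}: exact={p_exact(ys[i], xs[i]):.4f}  "
          f"first={prop1[i]:.4f}  second={prop2[i]:.4f}")
```

The first formula always produces numbers that sum to $1$ across $i$, but it mixes scores from different conditionals $p(\cdot\mid x_i)$, so it need not track any single $p(y\mid x_i)$.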







asked Jul 29 at 14:42 by sotetsuk

edited Jul 29 at 16:29 by Michael Hardy




















1 Answer













Let me first say: your notation is what's called overloaded; e.g. in your first equation, $y$ is both an argument and a summation index.

In principle, it is possible to sum over a possibly uncountable space; just integrate your function with respect to the counting measure. The problem still remains in your case, though, because only countably many of the summands may be nonzero for the integral to exist. Still, $f$ could be $-\infty$ (in an appropriate compactification of $\mathbb{R}$) everywhere except at countably many points, and in that case your first approach would actually work.

Note that in order for this to be a probability distribution, you would want the following formula to hold:
$$
\forall y \in Y:\ p(y) = \sum_{i=1}^N p(x_i)\, p(y\mid x_i) \;\Rightarrow\; 1 = \sum_{y \in Y} \sum_{i=1}^N p(x_i)\, p(y\mid x_i) = \sum_{i=1}^N \sum_{y \in Y} p(x_i)\, p(y\mid x_i)
$$
So you would need that for all $i$, $p(y\mid x_i)$ is nonzero at most at countably many places, but still nonzero somewhere for some $i$; this would not work out whenever the sum was infinite for all $i$. So for some $i$, $f$ would have to have the property indicated above. (For all other $i$, the sum could be infinite, and then $p$ would be zero there.)

Your approximation is again indexed wrongly: it should read
$$
p(y_m, x_k) = \frac{\exp f(y_m^{(j)}\mid x_k)}{\sum_{i=1}^N \exp f(y_i^{(j)}\mid x_k)}.
$$
From what I've understood, $x$ and $y$ are not independent; that is, $y$ depends on $x$. Thus, whether or not the sample $y^{(j)}$ will be good depends on the variance of $y$ with respect to $x$ (since I don't know in which space $y$ lies, let's say that the variance is some abstract measure of how much $y$ usually deviates from its "standard" value).

Moreover, you would have to hope that all the other (possibly infinitely many) summands over $y$ do not amount to much. Also, it does not necessarily sum to $1$ when you sum over all $y$.

Your second expression is completely off: it should be the average of the expression above over $(j)$.
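(One way to read that last remark, sketched here with hypothetical scores for a single fixed context $x_k$: compute the normalized expression separately for each sample index $j$, then average the results over $j$. Each per-$j$ vector sums to $1$ over the $y_i$, so the average does too.)

```python
import numpy as np

rng = np.random.default_rng(1)

N, M = 4, 8
# Hypothetical values f(y_i^(j) | x_k) for one fixed context x_k:
# rows index the samples y_i, columns index the repetitions j.
f_vals = rng.normal(size=(N, M))

def normalized(j):
    # The expression above for one fixed j: a length-N vector over the y_i.
    w = np.exp(f_vals[:, j])
    return w / w.sum()

# Average over the sample index j.
avg = np.mean([normalized(j) for j in range(M)], axis=0)
print(avg)
```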






• If you're criticizing someone's notation, put your own notation in order. The practice of using the same letter, in this case $p$, for different functions, and the practice of using $x$ and $y$ both to refer to random variables and as the arguments of their densities, is pernicious. When one writes $\Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X, Y$ are random variables, then $p_X(x) = \Pr(X=x)$, $p_X(5) = \Pr(X=5)$, and $p_Y(5) = \Pr(Y=5)$.
  – Michael Hardy
  Jul 29 at 16:39











          • I worked with what was given to me. In principle, I agree with you.
            – AlgebraicsAnonymous
            Jul 29 at 16:44










          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "69"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2866137%2fmonte-carlo-approximation-with-small-samples%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          1
          down vote













          Let me first say: Your notation is what's called overloaded; e.g. in your first equation, $y$ is both an argument and a summation index.



          In principle, it is possible to sum over a possibly uncountable space; just integrate your function over the counting measure. The problem still remains in your case though, because only countably many of the summands may be nonzero for the integral to exist. Still, $f$ could be $-infty$ (in an appropriate compactification of $mathbb R$) everywhere except at countably many points, and in this case your first approach would actually work.



          Note that in order for this to be a probability distribution, you would want that the following formula holds:
          $$
          forall y in Y: p(y) = sum_i=1^N p(x_i) p(ymid x_i) Rightarrow 1 = sum_y in Y sum_i=1^N p(x_i) p(ymid x_i) = sum_i=1^N sum_y in Y p(x_i) p(ymid x_i)
          $$
          So you would need that for all $i$, $p(ymid x_i)$ is nonzero at most at countably many places, but still for some $i$ nonzero at some places, and this wouldn't work out whenever the sum was infinite at all $i$. So for some $i$, $f$ would have to have the property indicated above. (At all other $i$, the sum could be infinite though, and then $p$ would be zero there.)



          Your approximation is again indexed wrongly: It should read
          $$
          p(y_m, x_k) = fracexpf(y_m^(j)mid x_k)sum_i=1^N expf(y_i^(j)mid x_k).
          $$
          From what I've understood, $x$ and $y$ are not independent; that is, $y$ depends on $x$. Thus, whether or not the sample $y^(j)$ will be good depends on the variance of $y$ with respect to $x$ (since I don't know in which space $y$ lies, let's say that the variance is some abstract measure of how much $y$ usually deviates from its "standard" value).



          Moreover, you would have to hope that all the other (possibly infinitely many) summands over $y$ do not amount to much. Also, it does not necessarily sum to $1$ when you sum over all $y$.



          Your second expression is completely off: It should be the average of the expression above over $(j)$.






          share|cite|improve this answer























          • If you're criticizing someone's notation, put your own notation in order. The practice of using the same latter, in this case $p,$ for different functions, and the practice of using $x$ and $y$ both to refer to random variable and as the argument to their densities is pernicious. When one writes $Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X,Y$ are random variables then $p_X(x) = Pr(X=x)$ and $p_X(5) = Pr(X=5)$ and $p_Y(5) = Pr(Y=5). qquad $
            – Michael Hardy
            Jul 29 at 16:39











          • I worked with what was given to me. In principle, I agree with you.
            – AlgebraicsAnonymous
            Jul 29 at 16:44














          up vote
          1
          down vote













          Let me first say: Your notation is what's called overloaded; e.g. in your first equation, $y$ is both an argument and a summation index.



          In principle, it is possible to sum over a possibly uncountable space; just integrate your function over the counting measure. The problem still remains in your case though, because only countably many of the summands may be nonzero for the integral to exist. Still, $f$ could be $-infty$ (in an appropriate compactification of $mathbb R$) everywhere except at countably many points, and in this case your first approach would actually work.



          Note that in order for this to be a probability distribution, you would want that the following formula holds:
          $$
          forall y in Y: p(y) = sum_i=1^N p(x_i) p(ymid x_i) Rightarrow 1 = sum_y in Y sum_i=1^N p(x_i) p(ymid x_i) = sum_i=1^N sum_y in Y p(x_i) p(ymid x_i)
          $$
          So you would need that for all $i$, $p(ymid x_i)$ is nonzero at most at countably many places, but still for some $i$ nonzero at some places, and this wouldn't work out whenever the sum was infinite at all $i$. So for some $i$, $f$ would have to have the property indicated above. (At all other $i$, the sum could be infinite though, and then $p$ would be zero there.)



          Your approximation is again indexed wrongly: It should read
          $$
          p(y_m, x_k) = fracexpf(y_m^(j)mid x_k)sum_i=1^N expf(y_i^(j)mid x_k).
          $$
          From what I've understood, $x$ and $y$ are not independent; that is, $y$ depends on $x$. Thus, whether or not the sample $y^(j)$ will be good depends on the variance of $y$ with respect to $x$ (since I don't know in which space $y$ lies, let's say that the variance is some abstract measure of how much $y$ usually deviates from its "standard" value).



          Moreover, you would have to hope that all the other (possibly infinitely many) summands over $y$ do not amount to much. Also, it does not necessarily sum to $1$ when you sum over all $y$.



          Your second expression is completely off: It should be the average of the expression above over $(j)$.






          share|cite|improve this answer























          • If you're criticizing someone's notation, put your own notation in order. The practice of using the same latter, in this case $p,$ for different functions, and the practice of using $x$ and $y$ both to refer to random variable and as the argument to their densities is pernicious. When one writes $Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X,Y$ are random variables then $p_X(x) = Pr(X=x)$ and $p_X(5) = Pr(X=5)$ and $p_Y(5) = Pr(Y=5). qquad $
            – Michael Hardy
            Jul 29 at 16:39











          • I worked with what was given to me. In principle, I agree with you.
            – AlgebraicsAnonymous
            Jul 29 at 16:44












          up vote
          1
          down vote










          up vote
          1
          down vote









          Let me first say: Your notation is what's called overloaded; e.g. in your first equation, $y$ is both an argument and a summation index.



          In principle, it is possible to sum over a possibly uncountable space; just integrate your function over the counting measure. The problem still remains in your case though, because only countably many of the summands may be nonzero for the integral to exist. Still, $f$ could be $-infty$ (in an appropriate compactification of $mathbb R$) everywhere except at countably many points, and in this case your first approach would actually work.



          Note that in order for this to be a probability distribution, you would want that the following formula holds:
          $$
          forall y in Y: p(y) = sum_i=1^N p(x_i) p(ymid x_i) Rightarrow 1 = sum_y in Y sum_i=1^N p(x_i) p(ymid x_i) = sum_i=1^N sum_y in Y p(x_i) p(ymid x_i)
          $$
          So you would need that for all $i$, $p(ymid x_i)$ is nonzero at most at countably many places, but still for some $i$ nonzero at some places, and this wouldn't work out whenever the sum was infinite at all $i$. So for some $i$, $f$ would have to have the property indicated above. (At all other $i$, the sum could be infinite though, and then $p$ would be zero there.)



          Your approximation is again indexed wrongly: It should read
          $$
          p(y_m, x_k) = fracexpf(y_m^(j)mid x_k)sum_i=1^N expf(y_i^(j)mid x_k).
          $$
          From what I've understood, $x$ and $y$ are not independent; that is, $y$ depends on $x$. Thus, whether or not the sample $y^(j)$ will be good depends on the variance of $y$ with respect to $x$ (since I don't know in which space $y$ lies, let's say that the variance is some abstract measure of how much $y$ usually deviates from its "standard" value).



          Moreover, you would have to hope that all the other (possibly infinitely many) summands over $y$ do not amount to much. Also, it does not necessarily sum to $1$ when you sum over all $y$.



          Your second expression is completely off: It should be the average of the expression above over $(j)$.






          share|cite|improve this answer















          Let me first say: Your notation is what's called overloaded; e.g. in your first equation, $y$ is both an argument and a summation index.



          In principle, it is possible to sum over a possibly uncountable space; just integrate your function over the counting measure. The problem still remains in your case though, because only countably many of the summands may be nonzero for the integral to exist. Still, $f$ could be $-infty$ (in an appropriate compactification of $mathbb R$) everywhere except at countably many points, and in this case your first approach would actually work.



          Note that in order for this to be a probability distribution, you would want that the following formula holds:
          $$
          forall y in Y: p(y) = sum_i=1^N p(x_i) p(ymid x_i) Rightarrow 1 = sum_y in Y sum_i=1^N p(x_i) p(ymid x_i) = sum_i=1^N sum_y in Y p(x_i) p(ymid x_i)
          $$
          So you would need that for all $i$, $p(ymid x_i)$ is nonzero at most at countably many places, but still for some $i$ nonzero at some places, and this wouldn't work out whenever the sum was infinite at all $i$. So for some $i$, $f$ would have to have the property indicated above. (At all other $i$, the sum could be infinite though, and then $p$ would be zero there.)



          Your approximation is again indexed wrongly: It should read
          $$
          p(y_m, x_k) = fracexpf(y_m^(j)mid x_k)sum_i=1^N expf(y_i^(j)mid x_k).
          $$
          From what I've understood, $x$ and $y$ are not independent; that is, $y$ depends on $x$. Thus, whether or not the sample $y^(j)$ will be good depends on the variance of $y$ with respect to $x$ (since I don't know in which space $y$ lies, let's say that the variance is some abstract measure of how much $y$ usually deviates from its "standard" value).



          Moreover, you would have to hope that all the other (possibly infinitely many) summands over $y$ do not amount to much. Also, it does not necessarily sum to $1$ when you sum over all $y$.



          Your second expression is completely off: It should be the average of the expression above over $(j)$.







          share|cite|improve this answer















          share|cite|improve this answer



          share|cite|improve this answer








          edited Jul 29 at 16:31









          Michael Hardy

          204k23185461




          204k23185461











          answered Jul 29 at 16:25









          AlgebraicsAnonymous

          66611




          66611











          • If you're criticizing someone's notation, put your own notation in order. The practice of using the same latter, in this case $p,$ for different functions, and the practice of using $x$ and $y$ both to refer to random variable and as the argument to their densities is pernicious. When one writes $Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X,Y$ are random variables then $p_X(x) = Pr(X=x)$ and $p_X(5) = Pr(X=5)$ and $p_Y(5) = Pr(Y=5). qquad $
            – Michael Hardy
            Jul 29 at 16:39











          • I worked with what was given to me. In principle, I agree with you.
            – AlgebraicsAnonymous
            Jul 29 at 16:44
















          • If you're criticizing someone's notation, put your own notation in order. The practice of using the same latter, in this case $p,$ for different functions, and the practice of using $x$ and $y$ both to refer to random variable and as the argument to their densities is pernicious. When one writes $Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X,Y$ are random variables then $p_X(x) = Pr(X=x)$ and $p_X(5) = Pr(X=5)$ and $p_Y(5) = Pr(Y=5). qquad $
            – Michael Hardy
            Jul 29 at 16:39











          • I worked with what was given to me. In principle, I agree with you.
            – AlgebraicsAnonymous
            Jul 29 at 16:44















          If you're criticizing someone's notation, put your own notation in order. The practice of using the same latter, in this case $p,$ for different functions, and the practice of using $x$ and $y$ both to refer to random variable and as the argument to their densities is pernicious. When one writes $Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X,Y$ are random variables then $p_X(x) = Pr(X=x)$ and $p_X(5) = Pr(X=5)$ and $p_Y(5) = Pr(Y=5). qquad $
          – Michael Hardy
          Jul 29 at 16:39





          If you're criticizing someone's notation, put your own notation in order. The practice of using the same latter, in this case $p,$ for different functions, and the practice of using $x$ and $y$ both to refer to random variable and as the argument to their densities is pernicious. When one writes $Pr(X = x)$ one can understand it because $X$ and $x$ are two different things. If $X,Y$ are random variables then $p_X(x) = Pr(X=x)$ and $p_X(5) = Pr(X=5)$ and $p_Y(5) = Pr(Y=5). qquad $
          – Michael Hardy
          Jul 29 at 16:39













          I worked with what was given to me. In principle, I agree with you.
          – AlgebraicsAnonymous
          Jul 29 at 16:44




          I worked with what was given to me. In principle, I agree with you.
          – AlgebraicsAnonymous
          Jul 29 at 16:44












           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2866137%2fmonte-carlo-approximation-with-small-samples%23new-answer', 'question_page');

          );

          Post as a guest













































































          Comments

          Popular posts from this blog

          What is the equation of a 3D cone with generalised tilt?

          Color the edges and diagonals of a regular polygon

          Relationship between determinant of matrix and determinant of adjoint?