How is the Bayes rule for density functions formulated in probability theory?

Given a probability space
$\left( \Omega, \mathcal{F}, \mathbb{P} \right)$ and two
$\mathcal{F}$-measurable real-valued random variables $X, Y$, the
joint random variable $\left( X, Y \right)$ can be defined on a product space
$\left( \Omega^2, \sigma\left( \mathcal{F}^2 \right), \mathbb{P} \times \mathbb{P} \right)$,
where $\mathbb{P} \times \mathbb{P}$ is the product measure of $\mathbb{P}$ with itself. Let
$f\left( x,y \right), f_X\left( x \right), f_Y\left( y \right)$ be
the density functions (Radon–Nikodym derivatives) of
$\left( X,Y \right), X, Y$ respectively, and let
$f_{X|Y}\left( x \mid y \right)$ be the density function of $X$ conditioned on $Y$.



Can anyone help with a construction, proof, or related material on the Bayes rule
$$f_{X|Y}\left( x \mid y \right) = \frac{f\left( x,y \right)}{f_Y\left( y \right)}?$$
We may also instead consider the other version $f_{X|Y}\left( x \mid y \right) = \frac{f_{Y|X}\left( y \mid x \right) f_X\left( x \right)}{f_Y\left( y \right)}$, which does not involve the joint random variable. I do not understand how this Bayes rule is formulated in measure theory. It is a widely used formula, yet I cannot find any construction or proof of it in my probability books.
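For reference, and assuming all three densities exist, the two versions are linked by the product rule
$$f_{X|Y}\left( x \mid y \right) f_Y\left( y \right) = f\left( x,y \right) = f_{Y|X}\left( y \mid x \right) f_X\left( x \right),$$
so each version follows from the other wherever $f_Y(y) > 0$.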




I can find a related definition of the "conditional density", stated as follows; there could be other definitions.




We denote integration w.r.t. the measure
$\mathbb{P} \circ X^{-1}$ of a RV $X$ by
$\int_B \mathrm{d}X := \int_B \mathrm{d}\left( \mathbb{P} \circ X^{-1} \right)$
for simplicity. Define the conditional probability measures
$\mathbb{P}_y,\ y \in Y\left( \Omega \right)$, as a family of probability
measures on $\left( \Omega, \mathcal{F} \right)$ s.t. two axioms hold: 1)
$\mathbb{P}_y\left( A \right)$ is
$\left( \mathbb{R}, \mathcal{B}\left( \mathbb{R} \right) \right)$-measurable
for any $A \in \mathcal{F}$ (given a fixed
$A \in \mathcal{F}$, $\mathbb{P}_y\left( A \right)$ is an
$\mathbb{R} \rightarrow \left\lbrack 0,1 \right\rbrack$ function of the
index $y$); and 2) the general version of the law of total
probability holds:



$$\int_B \mathbb{P}_y\left( A \right) \mathrm{d}Y = \mathbb{P}\left( A \cap Y^{-1}\left( B \right) \right), \quad \forall A \in \mathcal{F},\ B \in \mathcal{B}\left( \mathbb{R} \right)$$



We then denote by
$\mathbb{P}\left( A \mid Y = y \right) = \mathbb{P}_y\left( A \right),\ \forall A \in \mathcal{F},$
the conditional probability measure given the event $Y = y$. Then for any RV $X$, the conditional probability density function
$f_{X|Y}\left( x \mid y \right)$ is the Radon–Nikodym derivative of the
distribution $\mathbb{P}_y \circ X^{-1}$.




Based on the above definition, here are all the relations I can come up with:




$$\int_B \mathbb{P}_y\left( A \right) \mathrm{d}Y = \mathbb{P}\left( A \cap Y^{-1}\left( B \right) \right), \quad \forall A \in \mathcal{F},\ B \in \mathcal{B}\left( \mathbb{R} \right)$$



$$\int_B \mathbb{P}_x\left( A \right) \mathrm{d}X = \mathbb{P}\left( A \cap X^{-1}\left( B \right) \right), \quad \forall A \in \mathcal{F},\ B \in \mathcal{B}\left( \mathbb{R} \right)$$



$$\int_B f_{X|Y}\left( x \mid y \right) \mathrm{d}x = \mathbb{P}_y\left( X^{-1}\left( B \right) \right), \quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$



$$\int_B f_{Y|X}\left( y \mid x \right) \mathrm{d}y = \mathbb{P}_x\left( Y^{-1}\left( B \right) \right), \quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$



$$\int_B f_Y\left( y \right) \mathrm{d}y = \mathbb{P}\left( Y^{-1}\left( B \right) \right), \quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$



$$\int_B f_X\left( x \right) \mathrm{d}x = \mathbb{P}\left( X^{-1}\left( B \right) \right), \quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$
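As a concrete instance of these relations (my own illustrative example): if $(X, Y)$ is standard bivariate normal with correlation $\rho$, then $\mathbb{P}_y \circ X^{-1}$ is the $N(\rho y,\ 1 - \rho^2)$ law, and the axiom $\int_B \mathbb{P}_y(A)\, \mathrm{d}Y = \mathbb{P}(A \cap Y^{-1}(B))$ can be checked numerically, here in R with the arbitrary choices $A = X^{-1}((0,1))$ and $B = (-1, 0.5)$:

rho <- 0.6
# P_y(A) for A = {X in (0,1)}: the N(rho*y, 1 - rho^2) probability of (0,1)
P_y_A <- function(y) pnorm(1, rho * y, sqrt(1 - rho^2)) -
                     pnorm(0, rho * y, sqrt(1 - rho^2))
# Left side: integrate P_y(A) w.r.t. dY, i.e. against the density of Y, over B
lhs <- integrate(function(y) P_y_A(y) * dnorm(y), -1, 0.5)$value
# Right side: Monte Carlo estimate of P(X in (0,1), Y in (-1, 0.5))
y <- rnorm(1e6); x <- rho * y + sqrt(1 - rho^2) * rnorm(1e6)
rhs <- mean(x > 0 & x < 1 & y > -1 & y < 0.5)
c(lhs, rhs)   # the two values should agree up to Monte Carlo error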








  • Out of curiosity, where is this definition of conditional probability density function quoted from?
    – littleO
    Jul 23 at 2:46






  • @littleO It is from lecture notes. I am not aware of its original source. This answer seems to use the same definition: math.stackexchange.com/questions/496608/…
    – Tony
    Jul 23 at 4:29











  • I would be glad to know if you have any other definition of the conditional density function; maybe some others are more compatible with the Bayes rule.
    – Tony
    Jul 23 at 4:30














asked Jul 23 at 0:49, edited Jul 23 at 4:28 · Tony
2 Answers

















Answer (accepted, 2 votes):










Here is an outline of how you can get to the result you're looking for; a numerical sanity check of step 6 follows the list.



  1. Given a sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$ and given $F \in \mathcal{F}$, define $$\mathbb{P}(F \mid \mathcal{G}) : \left(\Omega, \mathcal{G}\right) \rightarrow \left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$$
    as the (essentially) unique $\left(\Omega, \mathcal{G}\right)$-$\left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$-measurable function such that $$\forall G \in \mathcal{G},\ \mathbb{P}(F \cap G) = \int_G \mathbb{P}(F \mid \mathcal{G})\, \mathrm{d}\mathbb{P},$$
    via the Radon–Nikodym theorem.

  2. If $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$, $X : \left(\Omega, \mathcal{F}\right) \rightarrow \left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$, and $A \in \mathcal{B}_\mathbb{R}$, define $$\mathbb{P}_\mathcal{G}(A) := \mathbb{P}(\{X \in A\} \mid \mathcal{G}).$$

  3. If $Y : \left(\Omega, \mathcal{F}\right) \rightarrow \left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$ and $F \in \mathcal{F}$, the map $\mathbb{P}(F \mid \sigma(Y))$ is $\left(\Omega, \sigma(Y)\right)$-$\left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$-measurable, and so there exists $\varphi : (\mathbb{R}, \mathcal{B}_\mathbb{R}) \rightarrow (\mathbb{R}, \mathcal{B}_\mathbb{R})$ such that $\varphi \circ Y = \mathbb{P}(F \mid \sigma(Y))$. Notice that if $\psi$ is another map that does the same work, then $\varphi = \psi$ $\mathbb{P}_Y$-a.e. So, define $\mathbb{P}(F \mid Y) := \varphi$.

  4. If $X, Y : \left(\Omega, \mathcal{F}\right) \rightarrow \left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$ and $A \in \mathcal{B}_\mathbb{R}$, define $\mathbb{P}_Y(A) := \mathbb{P}(\{X \in A\} \mid Y)$. Then we have $$\mathbb{P}_Y(A) \circ Y = \mathbb{P}_{\sigma(Y)}(A).$$ If $y \in \mathbb{R}$, let's denote $\mathbb{P}_Y(A)(y)$ with the less clumsy notation $\mathbb{P}_{Y=y}(A)$.

  5. If $X, Y : \left(\Omega, \mathcal{F}\right) \rightarrow \left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$ and $A, B \in \mathcal{B}_\mathbb{R}$, then: $$\mathbb{P}(\{X \in A\} \cap \{Y \in B\}) = \int_{Y^{-1}(B)} \mathbb{P}\left(\{X \in A\} \mid \sigma(Y)\right) \mathrm{d}\mathbb{P} = \int_{Y^{-1}(B)} \mathbb{P}_{\sigma(Y)}(A)\, \mathrm{d}\mathbb{P} \\ = \int_{Y^{-1}(B)} \mathbb{P}_Y(A) \circ Y\, \mathrm{d}\mathbb{P} = \int_B \mathbb{P}_Y(A)\, \mathrm{d}\mathbb{P}_Y = \int_B \mathbb{P}_{Y=y}(A)\, \mathrm{d}\mathbb{P}_Y(y).$$

  6. If $X, Y : \left(\Omega, \mathcal{F}\right) \rightarrow \left(\mathbb{R}, \mathcal{B}_\mathbb{R}\right)$ and $\mathbb{P}_{(X,Y)}$ has a density w.r.t. the Lebesgue measure on $\mathbb{R}^2$, say $f_{(X,Y)}$, then $Y$ also has one with respect to Lebesgue measure on $\mathbb{R}$, say $f_Y$, and:
    $$\forall A \in \mathcal{B}_\mathbb{R},\ \text{for } \mathbb{P}_Y\text{-a.e. } y \in \mathbb{R},\quad \mathbb{P}_{Y=y}(A) = \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\, \mathrm{d}x.$$
    In order to prove that, fix $A \in \mathcal{B}_\mathbb{R}$, and notice that $$\forall B \in \mathcal{B}_\mathbb{R},\ \int_B \left( \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\, \mathrm{d}x \right) \mathrm{d}\mathbb{P}_Y(y) = \int_B \left( \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\, \mathrm{d}x \right) f_Y(y)\, \mathrm{d}y \\ = \int_B \left( \int_A f_{(X,Y)}(x,y)\, \mathrm{d}x \right) \mathrm{d}y = \int_{A \times B} f_{(X,Y)}(x,y)\, \mathrm{d}x\, \mathrm{d}y = \mathbb{P}_{(X,Y)}(A \times B) = \mathbb{P}(\{X \in A\} \cap \{Y \in B\}) = \int_B \mathbb{P}_{Y=y}(A)\, \mathrm{d}\mathbb{P}_Y(y)$$
    and so $$\forall B \in \mathcal{B}_\mathbb{R},\ \int_B \left( \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\, \mathrm{d}x - \mathbb{P}_{Y=y}(A) \right) \mathrm{d}\mathbb{P}_Y(y) = 0,$$
    and then $$\int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\, \mathrm{d}x - \mathbb{P}_{Y=y}(A) = 0$$
    for $\mathbb{P}_Y$-a.e. $y \in \mathbb{R}$.
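As a quick numerical sanity check of step 6 (an illustration under an assumed model, not part of the proof): for a standard bivariate normal pair with correlation $\rho$, the formula $\mathbb{P}_{Y=y}(A) = \int_A f_{(X,Y)}(x,y)/f_Y(y)\, \mathrm{d}x$ should reproduce the known conditional law $X \mid Y = y \sim N(\rho y,\ 1 - \rho^2)$. In R, with $A = (0,1)$ and $y = 0.8$:

rho <- 0.6; y0 <- 0.8
# Joint density of a standard bivariate normal on R^2
f_XY <- function(x, y) exp(-(x^2 - 2*rho*x*y + y^2) / (2*(1 - rho^2))) /
                       (2*pi*sqrt(1 - rho^2))
# Step-6 formula: integral over A of f_(X,Y)(x, y0)/f_Y(y0) dx
lhs <- integrate(function(x) f_XY(x, y0), 0, 1)$value / dnorm(y0)
# Known conditional law N(rho*y0, 1 - rho^2) evaluated on A = (0,1)
rhs <- pnorm(1, rho*y0, sqrt(1 - rho^2)) - pnorm(0, rho*y0, sqrt(1 - rho^2))
c(lhs, rhs)   # should agree up to quadrature error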





answered Jul 23 at 5:23, edited Jul 23 at 5:29 · Bob
  • Thanks a lot! Could I ask why the existence and a.e. uniqueness in the third step hold?
    – Tony
    Jul 25 at 4:26











  • About the existence, you can find a proof in the book Probability with Martingales by David Williams (the only lemma in the appendix to chapter 3). It is a beautiful result, because it basically states that what you intuitively expect to happen is true: if a random variable is known once you know all the information in $Y$, then that random variable is a function of $Y$. It is a factorization theorem; however, I think of such a result as a "Radon–Nikodym theorem for information-valued measures".
    – Bob
    Jul 25 at 4:56










  • About the uniqueness, suppose $\varphi, \psi$ satisfy that relation. Then $\varphi \circ Y$ and $\psi \circ Y$ are both a version of $\mathbb{P}(F \mid \sigma(Y))$, so they differ at most on a set of $\mathbb{P}$-measure zero. Suppose, to get a contradiction, that $\varphi$ and $\psi$ differ on a set of non-null $\mathbb{P}_Y$-measure, say $A$. Then $0 \neq \mathbb{P}_Y(A) = \mathbb{P}(Y^{-1}(A))$ and $\forall \omega \in Y^{-1}(A)$, $\varphi \circ Y(\omega) = \varphi(Y(\omega)) \neq \psi(Y(\omega)) = \psi \circ Y(\omega)$, so they differ on a set of positive $\mathbb{P}$-measure, which is absurd.
    – Bob
    Jul 25 at 4:57


















Answer (2 votes):













Here is a very elementary practical problem about election polling that illustrates how to get a
posterior probability interval (credible interval) from a prior and data.



Prior. The expert's prior on the proportion $\theta$ in favor of Candidate A is $\mathsf{Beta}(\alpha_0 = 330,\ \beta_0 = 270)$, which has $P(0.51 < \theta < 0.59) \approx 0.95$. Mean, median, and mode are 0.55. (The expert thinks the candidate will win, but not by much.)
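A quick check of the stated prior interval in R (my own addition; numbers are approximate):

qbeta(c(.025, .975), 330, 270)                  # roughly 0.51 and 0.59
pbeta(0.59, 330, 270) - pbeta(0.51, 330, 270)   # roughly 0.95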



Data. $x = 620$ of $n = 1000$ prospective voters polled favor the candidate.



Bayes' Theorem gives the posterior:
$$ p(\theta \mid x) \propto p(\theta) \times p(x \mid \theta) \propto \theta^{\alpha_0 - 1}(1-\theta)^{\beta_0 - 1} \times
\theta^{x}(1-\theta)^{n-x} \\
= \theta^{\alpha_0 + x - 1}(1-\theta)^{\beta_0 + n - x - 1}
= \theta^{\alpha_n - 1}(1-\theta)^{\beta_n - 1}.$$
Notice that constants of integration are omitted, hence the use of
$\propto$ ('proportional to') instead of $=$.



Because the prior and likelihood are 'conjugate' (mathematically compatible), we can notice that the posterior has the kernel of $\mathsf{Beta}(\alpha_n = \alpha_0 + x,\ \beta_n = \beta_0 + n - x)$, so we can identify the exact posterior distribution as $\mathsf{Beta}(\alpha_n = 950,\ \beta_n = 650)$ without having to evaluate the integral in the denominator of Bayes' Theorem.
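To see what the conjugacy shortcut saves, here is a sketch (my own addition) that evaluates that denominator integral numerically, working in log space to avoid underflow, and recovers the same posterior density; the test point 0.6 is arbitrary:

log_kernel <- function(th) 949 * log(th) + 649 * log(1 - th)
m <- log_kernel(949 / (949 + 649))               # log-kernel at the posterior mode
Z <- integrate(function(th) exp(log_kernel(th) - m), 0, 1)$value
exp(log_kernel(0.6) - m) / Z                     # numerically normalized posterior at 0.6
dbeta(0.6, 950, 650)                             # conjugate closed form; should agree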



Posterior probability interval. One way to get a 95% credible interval is to take quantiles 0.025 and 0.975 of $\mathsf{Beta}(950, 650)$ to obtain
$(0.570, 0.618)$, using R statistical software.



qbeta(c(.025, .975), 950, 650)
[1] 0.5695848 0.6176932





answered Jul 23 at 2:38, edited Jul 23 at 2:44 · BruceET