How is the Bayes rule for density functions formulated in probability theory?
Given a probability space $\left( \Omega,\mathcal{F},\mathbb{P} \right)$ and two $\mathcal{F}$-measurable real-valued random variables $X,Y$, the joint random variable $\left( X,Y \right)$ can be defined on a product space $\left( \Omega^2,\sigma\left( \mathcal{F}^2 \right),\mathbb{P} \times \mathbb{P} \right)$, where $\mathbb{P} \times \mathbb{P}$ is the product measure of $\mathbb{P}$ with itself. Let $f\left( x,y \right), f_X\left( x \right), f_Y\left( y \right)$ be the density functions (Radon–Nikodym derivatives) of $\left( X,Y \right), X, Y$ respectively, and let $f_{X|Y}\left( x|y \right)$ be the density function of $X$ conditioned on $Y$.
Can anyone help with a construction, a proof, or related materials for the Bayes rule
$$f_{X|Y}\left( x|y \right) = \frac{f\left( x,y \right)}{f_Y\left( y \right)}?$$
We may also instead consider the other version
$$f_{X|Y}\left( x|y \right) = \frac{f_{Y|X}\left( y|x \right)f_X\left( x \right)}{f_Y\left( y \right)},$$
which does not involve the joint random variable. I do not understand how this Bayes rule is formulated in measure theory. It is a widely used formula, yet I cannot find any construction or proof in my probability books.
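For reference, and assuming all of the densities above exist with $f_Y\left( y \right) > 0$, the two displayed versions are linked by the factorization of the joint density,
$$f\left( x,y \right) = f_{Y|X}\left( y|x \right)f_X\left( x \right),$$
so the real question is how either identity is made precise in measure-theoretic terms.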
I can find a related definition of "conditional density" stated in the following way. There could be other definitions.
We denote integration w.r.t. the measure $\mathbb{P} \circ X^{-1}$ of a RV as
$$\int_B \cdot \,\mathrm{d}X := \int_B \cdot \,\mathrm{d}\left( \mathbb{P} \circ X^{-1} \right)$$
for simplicity. Define the conditional probability measures $\mathbb{P}_y,\ y \in Y\left( \Omega \right)$, as a family of probability measures on $\left( \Omega,\mathcal{F} \right)$ s.t. two axioms hold: 1) $\mathbb{P}_y\left( A \right)$ is $\left( \mathbb{R},\mathcal{B}\left( \mathbb{R} \right) \right)$-measurable for any $A \in \mathcal{F}$ (given a fixed $A \in \mathcal{F}$, $y \mapsto \mathbb{P}_y\left( A \right)$ is an $\mathbb{R} \rightarrow \left\lbrack 0,1 \right\rbrack$ function of the index $y$); and 2) the general version of the law of total probability holds:
$$\int_B \mathbb{P}_y\left( A \right)\mathrm{d}Y = \mathbb{P}\left( A \cap Y^{-1}\left( B \right) \right),\quad \forall A \in \mathcal{F},\ B \in \mathcal{B}\left( \mathbb{R} \right).$$
We then denote $\mathbb{P}\left( A \mid Y = y \right) = \mathbb{P}_y\left( A \right),\ \forall A \in \mathcal{F}$, as the conditional probability measure given the event $Y = y$. Then for any RV $X$, the conditional probability density function $f_{X|Y}\left( x|y \right)$ is the Radon–Nikodym derivative of the distribution $\mathbb{P}_y \circ X^{-1}$.
I list all the relations I can conceive of, based on the above definition:
$$\int_B \mathbb{P}_y\left( A \right)\mathrm{d}Y = \mathbb{P}\left( A \cap Y^{-1}\left( B \right) \right),\quad \forall A \in \mathcal{F},\ B \in \mathcal{B}\left( \mathbb{R} \right)$$
$$\int_B \mathbb{P}_x\left( A \right)\mathrm{d}X = \mathbb{P}\left( A \cap X^{-1}\left( B \right) \right),\quad \forall A \in \mathcal{F},\ B \in \mathcal{B}\left( \mathbb{R} \right)$$
$$\int_B f_{X|Y}\left( x|y \right)\mathrm{d}x = \mathbb{P}_y\left( X^{-1}\left( B \right) \right),\quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$
$$\int_B f_{Y|X}\left( y|x \right)\mathrm{d}y = \mathbb{P}_x\left( Y^{-1}\left( B \right) \right),\quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$
$$\int_B f_Y\left( y \right)\mathrm{d}y = \mathbb{P}\left( Y^{-1}\left( B \right) \right),\quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$
$$\int_B f_X\left( x \right)\mathrm{d}x = \mathbb{P}\left( X^{-1}\left( B \right) \right),\quad \forall B \in \mathcal{B}\left( \mathbb{R} \right)$$
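As a concrete sanity check of the formula I am asking about, here is a minimal R sketch; the standard bivariate normal with correlation $\rho$ is only an illustrative assumption, not part of the question. It compares the ratio $f\left( x,y \right)/f_Y\left( y \right)$ with the known closed-form conditional density of $X$ given $Y = y$.
rho <- 0.6
# joint density of a standard bivariate normal with correlation rho
f_joint <- function(x, y) {
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))) / (2 * pi * sqrt(1 - rho^2))
}
f_Y <- function(y) dnorm(y)                      # marginal density of Y is N(0, 1)
ratio  <- function(x, y) f_joint(x, y) / f_Y(y)  # candidate conditional density f(x, y) / f_Y(y)
closed <- function(x, y) dnorm(x, mean = rho * y, sd = sqrt(1 - rho^2))  # known N(rho*y, 1 - rho^2)
x <- seq(-3, 3, by = 0.5); y <- 1.2
max(abs(ratio(x, y) - closed(x, y)))             # essentially zero: the two expressions agree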
real-analysis probability probability-theory measure-theory
Out of curiosity, where is this definition of conditional probability density function quoted from?
– littleO
Jul 23 at 2:46
@littleO It is from lecture notes. I am not aware of its original source. This answer seems to use the same definition. math.stackexchange.com/questions/496608/…
– Tony
Jul 23 at 4:29
I would be glad to know if you have any other definition of the conditional density function; maybe some other definitions are more friendly to the Bayes rule.
– Tony
Jul 23 at 4:30
asked Jul 23 at 0:49, edited Jul 23 at 4:28 – Tony
2 Answers
Accepted answer (answered Jul 23 at 5:23 by Bob, edited Jul 23 at 5:29):
Here is an outline of how you can get to the result you're looking for.
- Given a sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$ and given $F\in\mathcal{F}$, define $$\mathbb{P}(F|\mathcal{G}) : \left(\Omega,\mathcal{G}\right) \rightarrow \left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$$
as the (essentially) unique $\left(\Omega,\mathcal{G}\right)-\left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$-measurable function such that $$\forall G\in \mathcal{G},\ \mathbb{P}(F\cap G)=\int_G \mathbb{P}(F|\mathcal{G})\operatorname{d}\mathbb{P},$$
via the Radon–Nikodym theorem.
- If $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$ and $X : \left(\Omega,\mathcal{F}\right) \rightarrow \left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$ and $A\in\mathcal{B}_{\mathbb{R}}$, define $$\mathbb{P}_{\mathcal{G}}(A):=\mathbb{P}(X\in A \mid \mathcal{G}).$$
- If $Y : \left(\Omega,\mathcal{F}\right) \rightarrow \left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$ and $F\in\mathcal{F}$, the map $\mathbb{P}(F|\sigma(Y))$ is $\left(\Omega,\sigma(Y)\right)-\left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$-measurable, and so there exists $\varphi :(\mathbb{R},\mathcal{B}_{\mathbb{R}})\rightarrow (\mathbb{R},\mathcal{B}_{\mathbb{R}})$ such that $\varphi \circ Y = \mathbb{P}(F|\sigma(Y))$. Notice that if $\psi$ is another map that does the same job, then $\varphi=\psi$ $\mathbb{P}_Y$-a.e. So, define $\mathbb{P}(F|Y):=\varphi$.
- If $X,Y : \left(\Omega,\mathcal{F}\right) \rightarrow \left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$ and $A\in\mathcal{B}_{\mathbb{R}}$, define $\mathbb{P}_Y(A):=\mathbb{P}(X\in A\mid Y)$. Then we have $$\mathbb{P}_Y(A)\circ Y=\mathbb{P}_{\sigma(Y)}(A).$$ If $y\in\mathbb{R}$, let's denote $\mathbb{P}_Y(A)(y)$ with the less clumsy notation $\mathbb{P}_{Y=y}(A)$.
- If $X,Y : \left(\Omega,\mathcal{F}\right) \rightarrow \left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$ and $A,B\in\mathcal{B}_{\mathbb{R}}$, then: $$\mathbb{P}(\{X\in A\}\cap \{Y\in B\})= \int_{Y^{-1}(B)}\mathbb{P}\left(X\in A\mid\sigma(Y)\right) \operatorname{d}\mathbb{P} = \int_{Y^{-1}(B)}\mathbb{P}_{\sigma(Y)}(A) \operatorname{d}\mathbb{P} \\ = \int_{Y^{-1}(B)}\mathbb{P}_Y(A)\circ Y \operatorname{d}\mathbb{P} = \int_B\mathbb{P}_Y(A) \operatorname{d}\mathbb{P}_Y = \int_B\mathbb{P}_{Y=y}(A) \operatorname{d}\mathbb{P}_Y(y).$$
- If $X,Y : \left(\Omega,\mathcal{F}\right) \rightarrow \left(\mathbb{R},\mathcal{B}_{\mathbb{R}}\right)$ and $\mathbb{P}_{(X,Y)}$ has a density w.r.t. the Lebesgue measure on $\mathbb{R}^2$, say $f_{(X,Y)}$, then $Y$ also has one with respect to the Lebesgue measure on $\mathbb{R}$, say $f_Y$, and:
$$\forall A\in\mathcal{B}_{\mathbb{R}},\ \text{for } \mathbb{P}_Y\text{-a.e. } y\in\mathbb{R},\quad \mathbb{P}_{Y=y}(A)=\int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\operatorname{d}x.$$
In order to prove that, fix $A\in\mathcal{B}_{\mathbb{R}}$, and notice that $$\forall B\in\mathcal{B}_{\mathbb{R}},\ \int_B \left( \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\operatorname{d}x \right)\operatorname{d}\mathbb{P}_Y(y) = \int_B \left( \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\operatorname{d}x \right) f_Y(y) \operatorname{d}y \\ = \int_B \left( \int_A f_{(X,Y)}(x,y)\operatorname{d}x \right) \operatorname{d}y = \int_{A\times B} f_{(X,Y)}(x,y)\operatorname{d}x\operatorname{d}y = \mathbb{P}_{(X,Y)}(A\times B) = \mathbb{P}(\{X\in A\}\cap \{Y\in B\})= \int_B\mathbb{P}_{Y=y}(A) \operatorname{d}\mathbb{P}_Y(y)$$
and so $$\forall B\in\mathcal{B}_{\mathbb{R}},\ \int_B \left( \int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\operatorname{d}x-\mathbb{P}_{Y=y}(A) \right)\operatorname{d}\mathbb{P}_Y(y)=0,$$
and then $$\int_A \frac{f_{(X,Y)}(x,y)}{f_Y(y)}\operatorname{d}x-\mathbb{P}_{Y=y}(A)=0$$
for $\mathbb{P}_Y$-a.e. $y\in\mathbb{R}$.
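As a purely numerical sanity check of the identity $\mathbb{P}(\{X\in A\}\cap \{Y\in B\})= \int_B\mathbb{P}_{Y=y}(A) \operatorname{d}\mathbb{P}_Y(y)$ (a sketch only, under the illustrative assumption that $(X,Y)$ is standard bivariate normal with correlation $\rho$, so that $\mathbb{P}_{Y=y}$ is the $N(\rho y,\,1-\rho^2)$ law), one can run in R:
set.seed(1)
rho <- 0.6; n <- 1e6
y <- rnorm(n)
x <- rho * y + sqrt(1 - rho^2) * rnorm(n)        # sample (X, Y) from the assumed bivariate normal
A <- c(0, 1); B <- c(-0.5, 0.5)                  # the events {X in A} and {Y in B}
lhs <- mean(x > A[1] & x < A[2] & y > B[1] & y < B[2])   # Monte Carlo estimate of P(X in A, Y in B)
condA <- function(y) pnorm(A[2], rho * y, sqrt(1 - rho^2)) -
                     pnorm(A[1], rho * y, sqrt(1 - rho^2))   # P_{Y=y}(A) for the assumed model
rhs <- integrate(function(y) condA(y) * dnorm(y), B[1], B[2])$value   # integral of P_{Y=y}(A) dP_Y(y)
c(lhs, rhs)                                      # the two numbers agree up to Monte Carlo error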
Thanks a lot! Could I ask why the existence and a.s. uniqueness in the third step hold?
– Tony
Jul 25 at 4:26
About the existence, you can find a proof in the book Probability with Martingales by David Williams (the only lemma in the appendix to Chapter 3). It is a beautiful result, because it basically states that what you intuitively expect to happen is true: if a random variable is known once you know all the information in $Y$, then this random variable is a function of $Y$. It is a factorization theorem; however, I think of such a result as a "Radon–Nikodym theorem for information-valued measures".
– Bob
Jul 25 at 4:56
About the uniqueness, suppose $\varphi, \psi$ satisfy that relation. Then $\varphi\circ Y$ and $\psi\circ Y$ are both a version of $\mathbb{P}(F|\sigma(Y))$, so they differ at most on a set of $\mathbb{P}$ measure zero. Suppose, to get a contradiction, that $\varphi$ and $\psi$ differ on a set of non-null $\mathbb{P}_Y$ measure, say $A$. Then $0\neq\mathbb{P}_Y(A)= \mathbb{P}(Y^{-1}(A))$ and $\forall\omega\in Y^{-1}(A),\ \varphi\circ Y (\omega) = \varphi( Y (\omega))\neq\psi( Y (\omega)) = \psi\circ Y (\omega)$, and so they differ on a set of positive $\mathbb{P}$ measure, which is absurd.
– Bob
Jul 25 at 4:57
Here is a very elementary practical problem about election polling that illustrates how to get a
posterior probability interval (credible interval) from a prior and data.
Prior. Expert's prior on proportion $\theta$ in favor of Candidate A is $\mathsf{Beta}(\alpha_0=330,\ \beta_0=270),$ which has $P(0.51 < \theta < 0.59) \approx 0.95.$ Mean, median, mode $0.55.$ (Expert thinks candidate will win, but not by much.)
Data. $x = 620$ of $n = 1000$ prospective voters polled favor the candidate.
Bayes' Theorem gives Posterior.
$$ p(\theta \mid x) \propto p(\theta) \times p(x\mid\theta) \propto \theta^{\alpha_0-1}(1-\theta)^{\beta_0 - 1} \times
\theta^{x}(1-\theta)^{n-x} \\
= \theta^{\alpha_0 + x -1}(1-\theta)^{\beta_0 + n - x -1}
= \theta^{\alpha_n - 1}(1-\theta)^{\beta_n -1}.$$
Notice that constants of integration are omitted, hence the use of
$\propto$ ('proportional to') instead of $=.$
Because prior and likelihood are 'conjugate' (mathematically compatible) we can notice that the posterior has the kernel of $\mathsf{Beta}(\alpha_n=\alpha_0 + x,\ \beta_n = \beta_0 + n - x),$ so we can identify the exact posterior distribution $\mathsf{Beta}(\alpha_n = 950,\ \beta_n = 650)$ without having to evaluate the integral in the denominator of the right-hand side of Bayes' Theorem.
Posterior probability interval. One way to get a 95% credible interval is to take quantiles 0.025 and 0.975 of $\mathsf{Beta}(950, 650)$ to obtain
$(0.570, 0.618),$ using R statistical software.
qbeta(c(.025, .975), 950, 650)
[1] 0.5695848 0.6176932
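As a quick check of the stated prior interval (illustrative only), the same quantile idea applied to the $\mathsf{Beta}(330, 270)$ prior gives
qbeta(c(.025, .975), 330, 270)          # roughly (0.51, 0.59)
diff(pbeta(c(.51, .59), 330, 270))      # prior P(0.51 < theta < 0.59), about 0.95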
answered Jul 23 at 2:38, edited Jul 23 at 2:44 – BruceET