Conditional expectation from joint distribution
I am new to probability and trying to convince myself of the correctness of the equations in this paper on factor analysis. There is a step I am missing. I'll give my understanding so far and then highlight the question below.
Given a $p$-dimensional vector $\textbf{x}$ modeled using a $k$-dimensional factor $\textbf{z}$, where typically $k < p$, the model for factor analysis is:
$$
\textbf{x} = \Lambda \textbf{z} + \textbf{u}
$$
where $\Lambda$ is a $p \times k$ matrix, $\textbf{u} \sim \mathcal{N}(0, \Psi)$, and $\textbf{z} \sim \mathcal{N}(0, I)$. This means $\textbf{x} \sim \mathcal{N}(0, \Lambda \Lambda^\top + \Psi)$ because:
$$
\begin{align}
\textbf{x}
&= \Lambda \textbf{z} + \textbf{u}
\\
&= \Lambda \, \mathcal{N}(0, I_k) + \mathcal{N}(0, \Psi)
\\
&= \mathcal{N}(0, \Lambda \Lambda^\top + \Psi)
\end{align}
$$
Now, we have the joint distribution:
$$
P\bigg(
\begin{bmatrix}
\textbf{x} \\ \textbf{z}
\end{bmatrix}
\bigg)
=
\mathcal{N}\bigg(
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
,
\begin{bmatrix}
\Lambda \Lambda^\top + \Psi & \Lambda
\\
\Lambda^\top & I
\end{bmatrix}
\bigg)
$$
I can convince myself that this is correct fairly easily. $\text{Var}(\textbf{x})$ and $\text{Var}(\textbf{z})$ come from their definitions, while $\text{Cov}(\textbf{x}, \textbf{z})$ and $\text{Cov}(\textbf{z}, \textbf{x})$ are easy enough to compute, e.g.:
$$
\begin{align}
\text{Cov}(\textbf{x}, \textbf{z})
&= \mathbb{E}[(\textbf{x} - \mathbb{E}[\textbf{x}])(\textbf{z} - \mathbb{E}[\textbf{z}])^\top]
\\
&= \mathbb{E}[(\textbf{x} - 0)(\textbf{z} - 0)^\top]
\\
&= \mathbb{E}[(\Lambda \textbf{z} + \textbf{u})\textbf{z}^\top]
\\
&= \mathbb{E}[\Lambda \textbf{z} \textbf{z}^\top + \textbf{u} \textbf{z}^\top]
\\
&= \Lambda \mathbb{E}[\textbf{z} \textbf{z}^\top] + \mathbb{E}[\textbf{u} \textbf{z}^\top]
\\
&= \Lambda
\end{align}
$$
where $\mathbb{E}[\textbf{u} \textbf{z}^\top] = \mathbb{E}[\textbf{u}] \, \mathbb{E}[\textbf{z}^\top] = 0 \cdot 0$ (since $\textbf{u}$ and $\textbf{z}$ are independent) and $\mathbb{E}[\textbf{z}\textbf{z}^\top] = I_k$ because:
$$
\begin{align}
\text{Var}(\textbf{z})
&= \mathbb{E}[\textbf{z}\textbf{z}^\top] - \mathbb{E}[\textbf{z}] \, \mathbb{E}[\textbf{z}]^\top
\\
I_k &= \mathbb{E}[\textbf{z}\textbf{z}^\top] - 0
\end{align}
$$
So far so good.
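(As a quick numerical sanity check of these moments, here is a small NumPy simulation. The sizes $p = 5$, $k = 2$ and the random $\Lambda$ and $\Psi$ are arbitrary choices of mine, not taken from the paper.)

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 5, 2, 200_000                       # arbitrary sizes and sample count
Lam = rng.normal(size=(p, k))                 # arbitrary loading matrix Lambda
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))  # arbitrary diagonal noise covariance Psi

# Sample from the model: z ~ N(0, I_k), u ~ N(0, Psi), x = Lambda z + u.
z = rng.normal(size=(n, k))
u = rng.multivariate_normal(np.zeros(p), Psi, n)
x = z @ Lam.T + u

# Empirical moments should match the model-implied ones.
print(np.allclose(np.cov(x, rowvar=False), Lam @ Lam.T + Psi, atol=0.05))  # Var(x)
print(np.allclose((x.T @ z) / n, Lam, atol=0.05))                          # Cov(x, z)
```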
Question
The authors then claim that the conditional first and second moments of the factors are:
$$
\begin{align}
\mathbb{E}[\textbf{z} \mid \textbf{x}] &= \Lambda^\top (\Psi + \Lambda \Lambda^\top)^{-1} \textbf{x}
\\
\mathbb{E}[\textbf{z} \textbf{z}^\top \mid \textbf{x}] &= I_k - \Lambda^\top (\Psi + \Lambda \Lambda^\top)^{-1} \Lambda + \Lambda^\top (\Psi + \Lambda \Lambda^\top)^{-1} \textbf{x} \textbf{x}^\top \big((\Psi + \Lambda \Lambda^\top)^{-1}\big)^\top \Lambda
\end{align}
$$
The authors claim that this comes from "the joint normality of data and factors". How was this computed? I've gone through the Wikipedia page on conditional expectation, but I don't see anything that defines it in terms of the joint distribution or conditional distribution.
probability conditional-expectation
1 Answer
This is actually an important and frequently used property of the Gaussian distribution. Suppose
$$
P\bigg(
\begin{bmatrix}
x_1 \\ x_2
\end{bmatrix}
\bigg)
=
\mathcal{N}\bigg(
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
,
\begin{bmatrix}
\Sigma_{11} & \Sigma_{12}
\\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix}
\bigg).
$$
Then the conditional distribution of $x_1$ given $x_2$ is
$$
P(x_1 \mid x_2) = \mathcal{N}\big(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big).
$$
Therefore, applying this with $x_1 = \textbf{z}$ and $x_2 = \textbf{x}$ (we condition the factors on the data), we get exactly the conditional expectation you quote, because in this case
$$
\Sigma_{11} = I, \quad \Sigma_{12} = \Lambda^\top, \quad \Sigma_{21} = \Lambda, \quad \Sigma_{22} = \Lambda\Lambda^\top + \Psi, \quad \mu_1 = \mu_2 = 0.
$$
For the second quantity, just use the fact that
$$
\mathbb{E}[\textbf{z} \textbf{z}^\top \mid \textbf{x}] = \mathbb{E}[\textbf{z} \mid \textbf{x}] \, \mathbb{E}[\textbf{z} \mid \textbf{x}]^\top + \text{Var}(\textbf{z} \mid \textbf{x}).
$$
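If a numerical check helps: for jointly Gaussian, zero-mean variables the conditional mean is also the best linear predictor, so an ordinary least-squares regression of sampled $\textbf{z}$ on sampled $\textbf{x}$ should approximately recover $\beta = \Lambda^\top(\Psi + \Lambda\Lambda^\top)^{-1}$. A minimal NumPy sketch (the sizes and the random $\Lambda$, $\Psi$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 5, 2, 200_000                       # arbitrary sizes and sample count
Lam = rng.normal(size=(p, k))                 # arbitrary loading matrix Lambda
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))  # arbitrary diagonal noise covariance Psi

# Sample from the factor model: z ~ N(0, I_k), u ~ N(0, Psi), x = Lambda z + u.
z = rng.normal(size=(n, k))
x = z @ Lam.T + rng.multivariate_normal(np.zeros(p), Psi, n)

# Claimed formula: E[z | x] = beta x with beta = Lambda' (Psi + Lambda Lambda')^{-1}.
beta = Lam.T @ np.linalg.inv(Psi + Lam @ Lam.T)

# The conditional mean of a joint Gaussian is the best linear predictor,
# so regressing z on x recovers beta up to sampling noise.
beta_hat = np.linalg.lstsq(x, z, rcond=None)[0].T
print(np.allclose(beta_hat, beta, atol=0.02))

# Var(z | x) = I_k - beta Lambda is the deterministic part of E[z z' | x];
# check that it is a valid (positive definite) covariance matrix.
cond_var = np.eye(k) - beta @ Lam
print(np.all(np.linalg.eigvalsh(cond_var) > 0))
```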
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
Actually this is an important property of Gaussian distribution and it is frequently used. Suppose
$$
Pbigg(
beginbmatrix
x_1 \ x_2
endbmatrix
bigg)
=
mathcalNbigg(
beginbmatrix mu_1 \ mu_2 endbmatrix
,
beginbmatrix
Sigma_11 & Sigma_12
\
Sigma_21 & Sigma_22 endbmatrix
bigg).
$$
Then the conditional distribution of $x_1$ given $x_2$ is
$$
P(x_1|x_2) = mathcalN(mu_1 + Sigma_12Sigma_22^-1(x_2-mu_2),Sigma_11 - Sigma_12Sigma_22^-1Sigma_21).
$$
Therefore we have the conditional expectation as you give, because in this case,
$$
Sigma_11 = LambdaLambda^top + Psi, Sigma_12 = Lambda,Sigma_22 = I,mu_1 = mu_2 = 0.
$$
For the second quantity, just use the fact that
$$
mathbbE[textbfz textbfz^top mid textbfx] = mathbbE[z|x](mathbbE[z|x])^top + rm Var(z|x).
$$
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
add a comment |Â
up vote
2
down vote
Actually this is an important property of Gaussian distribution and it is frequently used. Suppose
$$
Pbigg(
beginbmatrix
x_1 \ x_2
endbmatrix
bigg)
=
mathcalNbigg(
beginbmatrix mu_1 \ mu_2 endbmatrix
,
beginbmatrix
Sigma_11 & Sigma_12
\
Sigma_21 & Sigma_22 endbmatrix
bigg).
$$
Then the conditional distribution of $x_1$ given $x_2$ is
$$
P(x_1|x_2) = mathcalN(mu_1 + Sigma_12Sigma_22^-1(x_2-mu_2),Sigma_11 - Sigma_12Sigma_22^-1Sigma_21).
$$
Therefore we have the conditional expectation as you give, because in this case,
$$
Sigma_11 = LambdaLambda^top + Psi, Sigma_12 = Lambda,Sigma_22 = I,mu_1 = mu_2 = 0.
$$
For the second quantity, just use the fact that
$$
mathbbE[textbfz textbfz^top mid textbfx] = mathbbE[z|x](mathbbE[z|x])^top + rm Var(z|x).
$$
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
add a comment |Â
up vote
2
down vote
up vote
2
down vote
Actually this is an important property of Gaussian distribution and it is frequently used. Suppose
$$
Pbigg(
beginbmatrix
x_1 \ x_2
endbmatrix
bigg)
=
mathcalNbigg(
beginbmatrix mu_1 \ mu_2 endbmatrix
,
beginbmatrix
Sigma_11 & Sigma_12
\
Sigma_21 & Sigma_22 endbmatrix
bigg).
$$
Then the conditional distribution of $x_1$ given $x_2$ is
$$
P(x_1|x_2) = mathcalN(mu_1 + Sigma_12Sigma_22^-1(x_2-mu_2),Sigma_11 - Sigma_12Sigma_22^-1Sigma_21).
$$
Therefore we have the conditional expectation as you give, because in this case,
$$
Sigma_11 = LambdaLambda^top + Psi, Sigma_12 = Lambda,Sigma_22 = I,mu_1 = mu_2 = 0.
$$
For the second quantity, just use the fact that
$$
mathbbE[textbfz textbfz^top mid textbfx] = mathbbE[z|x](mathbbE[z|x])^top + rm Var(z|x).
$$
Actually this is an important property of Gaussian distribution and it is frequently used. Suppose
$$
Pbigg(
beginbmatrix
x_1 \ x_2
endbmatrix
bigg)
=
mathcalNbigg(
beginbmatrix mu_1 \ mu_2 endbmatrix
,
beginbmatrix
Sigma_11 & Sigma_12
\
Sigma_21 & Sigma_22 endbmatrix
bigg).
$$
Then the conditional distribution of $x_1$ given $x_2$ is
$$
P(x_1|x_2) = mathcalN(mu_1 + Sigma_12Sigma_22^-1(x_2-mu_2),Sigma_11 - Sigma_12Sigma_22^-1Sigma_21).
$$
Therefore we have the conditional expectation as you give, because in this case,
$$
Sigma_11 = LambdaLambda^top + Psi, Sigma_12 = Lambda,Sigma_22 = I,mu_1 = mu_2 = 0.
$$
For the second quantity, just use the fact that
$$
mathbbE[textbfz textbfz^top mid textbfx] = mathbbE[z|x](mathbbE[z|x])^top + rm Var(z|x).
$$
edited 21 hours ago
answered yesterday
Wanshan
934113
934113
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
add a comment |Â
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Is the conditional expectation of a conditional probability density just the mean of that density?
– gwg
23 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
Yes, you are right ^_^
– Wanshan
21 hours ago
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2872334%2fconditional-expectation-from-joint-distribution%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password