Conditional expectation from joint distribution

I am new to probability and trying to convince myself of the correctness of the equations in this paper on factor analysis. There is a step I am missing. I'll give my understanding so far and then highlight the question below.



Given a $p$-dimensional vector $\mathbf{x}$ modeled using a $k$-dimensional factor $\mathbf{z}$, where typically $k < p$, the model for factor analysis is:



$$
\mathbf{x} = \Lambda \mathbf{z} + \mathbf{u}
$$



where $\Lambda$ is a $p \times k$ matrix, $\mathbf{u} \sim \mathcal{N}(0, \Psi)$, and $\mathbf{z} \sim \mathcal{N}(0, I)$. This means $\mathbf{x} \sim \mathcal{N}(0, \Lambda \Lambda^\top + \Psi)$ because:



$$
\begin{aligned}
\mathbf{x}
&= \Lambda \mathbf{z} + \mathbf{u}
\\
&= \Lambda\, \mathcal{N}(0, I_k) + \mathcal{N}(0, \Psi)
\\
&= \mathcal{N}(0, \Lambda \Lambda^\top + \Psi)
\end{aligned}
$$
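As a quick sanity check on this marginal covariance, here is a minimal NumPy sketch; the dimensions, the random seed, and the particular $\Lambda$ and $\Psi$ are arbitrary choices for illustration, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 5, 2, 200_000  # arbitrary dimensions and sample size

# Arbitrary illustrative parameters (not from the paper).
Lambda = rng.normal(size=(p, k))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))  # diagonal noise covariance

# Simulate x = Lambda z + u with z ~ N(0, I_k) and u ~ N(0, Psi).
z = rng.normal(size=(n, k))
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)
x = z @ Lambda.T + u

# The sample covariance of x should be close to Lambda Lambda^T + Psi.
print(np.abs(np.cov(x, rowvar=False) - (Lambda @ Lambda.T + Psi)).max())
```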



Now, we have the joint distribution:



$$
P\bigg(
\begin{bmatrix}
\mathbf{x} \\ \mathbf{z}
\end{bmatrix}
\bigg)
=
\mathcal{N}\bigg(
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
,
\begin{bmatrix}
\Lambda \Lambda^\top + \Psi & \Lambda
\\
\Lambda^\top & I
\end{bmatrix}
\bigg)
$$



I can convince myself fairly easily that this is correct. $\text{Var}(\mathbf{x})$ and $\text{Var}(\mathbf{z})$ come from their definitions, while $\text{Cov}(\mathbf{x}, \mathbf{z})$ and $\text{Cov}(\mathbf{z}, \mathbf{x})$ are easy enough to compute, e.g.:



$$
\begin{aligned}
\text{Cov}(\mathbf{x}, \mathbf{z})
&= \mathbb{E}[(\mathbf{x} - \mathbb{E}[\mathbf{x}])(\mathbf{z} - \mathbb{E}[\mathbf{z}])^\top]
\\
&= \mathbb{E}[(\mathbf{x} - 0)(\mathbf{z} - 0)^\top]
\\
&= \mathbb{E}[(\Lambda \mathbf{z} + \mathbf{u})\mathbf{z}^\top]
\\
&= \mathbb{E}[\Lambda \mathbf{z} \mathbf{z}^\top + \mathbf{u} \mathbf{z}^\top]
\\
&= \Lambda \mathbb{E}[\mathbf{z} \mathbf{z}^\top] + \mathbb{E}[\mathbf{u} \mathbf{z}^\top]
\\
&= \Lambda
\end{aligned}
$$



where $\mathbb{E}[\mathbf{u} \mathbf{z}^\top] = \mathbb{E}[\mathbf{u}]\,\mathbb{E}[\mathbf{z}^\top] = 0 \cdot 0$ (since $\mathbf{u}$ and $\mathbf{z}$ are independent in the model) and $\mathbb{E}[\mathbf{z}\mathbf{z}^\top] = I_k$ because:



$$
\begin{aligned}
\text{Var}(\mathbf{z})
&= \mathbb{E}[\mathbf{z}\mathbf{z}^\top] - \mathbb{E}[\mathbf{z}]\, \mathbb{E}[\mathbf{z}]^\top
\\
I_k &= \mathbb{E}[\mathbf{z}\mathbf{z}^\top] - 0
\end{aligned}
$$
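The cross-covariance and second moment can also be checked numerically; this is again only a sketch with arbitrary illustrative parameters, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 5, 2, 200_000  # arbitrary illustrative choices

Lambda = rng.normal(size=(p, k))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))

z = rng.normal(size=(n, k))
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)
x = z @ Lambda.T + u

# Both means are zero, so raw sample moments estimate Cov(x, z) and E[z z^T].
print(np.abs(x.T @ z / n - Lambda).max())     # Cov(x, z) ~ Lambda
print(np.abs(z.T @ z / n - np.eye(k)).max())  # E[z z^T]  ~ I_k
```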



So far so good.



Question



The authors then claim that the conditional first and second moments of the factors are:



$$
\begin{aligned}
\mathbb{E}[\mathbf{z} \mid \mathbf{x}] &= \Lambda^\top (\Psi + \Lambda \Lambda^\top)^{-1} \mathbf{x}
\\
\mathbb{E}[\mathbf{z} \mathbf{z}^\top \mid \mathbf{x}] &= I_k - \Lambda^\top (\Psi + \Lambda \Lambda^\top)^{-1} \Lambda + \Lambda^\top (\Psi + \Lambda \Lambda^\top)^{-1} \mathbf{x} \mathbf{x}^\top \big((\Psi + \Lambda \Lambda^\top)^{-1}\big)^\top \Lambda
\end{aligned}
$$



The authors claim that this comes from "the joint normality of data and factors". How was this computed? I've gone through the Wikipedia page on conditional expectation, but I don't see anything that defines it in terms of the joint distribution or conditional distribution.







1 Answer

Actually this is an important property of the Gaussian distribution, and it is used frequently. Suppose
$$
P\bigg(
\begin{bmatrix}
x_1 \\ x_2
\end{bmatrix}
\bigg)
=
\mathcal{N}\bigg(
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
,
\begin{bmatrix}
\Sigma_{11} & \Sigma_{12}
\\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix}
\bigg).
$$
Then the conditional distribution of $x_1$ given $x_2$ is
$$
P(x_1 \mid x_2) = \mathcal{N}\big(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big).
$$
Therefore we have the conditional expectation as you give, because in this case (taking $x_1 = \mathbf{z}$ and $x_2 = \mathbf{x}$),
$$
\Sigma_{11} = I,\quad \Sigma_{12} = \Lambda^\top,\quad \Sigma_{22} = \Lambda\Lambda^\top + \Psi,\quad \mu_1 = \mu_2 = 0.
$$
For the second quantity, just use the fact that
$$
\mathbb{E}[\mathbf{z} \mathbf{z}^\top \mid \mathbf{x}] = \mathbb{E}[\mathbf{z} \mid \mathbf{x}]\,(\mathbb{E}[\mathbf{z} \mid \mathbf{x}])^\top + \operatorname{Var}(\mathbf{z} \mid \mathbf{x}).
$$
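To see the conditioning rule reproduce the quoted expressions, here is a minimal NumPy sketch; the dimensions, seed, and the particular $\Lambda$, $\Psi$, and observed $\mathbf{x}$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 5, 2  # arbitrary dimensions

# Arbitrary illustrative parameters and one observed x (not from the paper).
Lambda = rng.normal(size=(p, k))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))
x = rng.normal(size=p)

Sigma_x = Lambda @ Lambda.T + Psi          # Var(x) = Sigma_22
# Gaussian conditioning with x1 = z, x2 = x:
#   Sigma_11 = I_k, Sigma_12 = Lambda^T, Sigma_22 = Sigma_x, mu_1 = mu_2 = 0.
beta = Lambda.T @ np.linalg.inv(Sigma_x)   # Lambda^T (Psi + Lambda Lambda^T)^{-1}
E_z_given_x = beta @ x                     # conditional mean
Var_z_given_x = np.eye(k) - beta @ Lambda  # conditional covariance
E_zzT_given_x = Var_z_given_x + np.outer(E_z_given_x, E_z_given_x)

# Compare with the expressions quoted in the question.
print(np.allclose(E_z_given_x,
                  Lambda.T @ np.linalg.inv(Psi + Lambda @ Lambda.T) @ x))
print(np.allclose(E_zzT_given_x,
                  np.eye(k) - beta @ Lambda + beta @ np.outer(x, x) @ beta.T))
```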






          • Is the conditional expectation of a conditional probability density just the mean of that density?
            – gwg
            23 hours ago










          • Yes, you are right ^_^
            – Wanshan
            21 hours ago









