Derivation of derivative of multivariate Gaussian w.r.t. covariance matrix












I'm reading a paper on probabilistic CCA in which the authors state derivatives without showing the derivations. I would like step-by-step derivations to convince myself. Consider a $d$-dimensional multivariate Gaussian random variable:



$$
\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)
$$



In probabilistic CCA, we define $\Sigma = W W^\top + \Psi$, where $W \in \mathbb{R}^{d \times q}$ and $\Psi \in \mathbb{R}^{d \times d}$. I'd like to compute the derivatives of the negative log-likelihood w.r.t. $\boldsymbol{\mu}$, $W$, and $\Psi$.



The stationary point for $\boldsymbol{\mu}$ is just the empirical mean $\hat{\boldsymbol{\mu}}$ (shown below*). Plugging this minimizer into the negative log-likelihood, we get:



$$
\frac{\partial \mathcal{L}}{\partial W}
=
\frac{\partial}{\partial W} \Big[
\overbrace{\frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})^\top \Sigma^{-1} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})}^{A}
+
\overbrace{\frac{n}{2} \ln |\Sigma|}^{B}
+
\overbrace{\text{const}}^{C}
\Big]
$$



Clearly, $\frac{\partial C}{\partial W} = 0$. But I'm not sure how to handle $A$ and $B$, particularly since $\Sigma = W W^\top + \Psi$.




*Derivative w.r.t. $\boldsymbol{\mu}$



The negative log-likelihood is:



$$
\mathcal{L}
=
\frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) + \frac{n}{2} \ln |\Sigma| + \text{const}
$$
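It may help to state the objective concretely before differentiating. Here is a minimal numpy sketch of this negative log-likelihood (my own helper, not the paper's code; the names `nll`, `W`, `Psi` are assumptions):

```python
import numpy as np

def nll(X, mu, W, Psi):
    """Gaussian negative log-likelihood with Sigma = W W^T + Psi, up to const."""
    n, d = X.shape
    Sigma = W @ W.T + Psi                        # pCCA covariance structure
    Z = X - mu                                   # rows are (x_i - mu)^T
    # A: 0.5 * sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
    quad = 0.5 * np.sum(Z * np.linalg.solve(Sigma, Z.T).T)
    # B: (n/2) * ln|Sigma|, computed via slogdet for numerical stability
    _, logdet = np.linalg.slogdet(Sigma)
    return quad + 0.5 * n * logdet
```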



The derivative of the two rightmost terms with respect to $\boldsymbol{\mu}$ is $0$, meaning we just need to solve:



$$
\frac{\partial}{\partial \boldsymbol{\mu}}
\Big[
\frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
\Big]
=
0
$$



By the linearity of differentiation, we have:



$$
\frac{1}{2}
\sum_{i=1}^{n}
\frac{\partial}{\partial \boldsymbol{\mu}}
\Big[
(\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
\Big]
=
0
$$



Using Equation $(86)$ from the Matrix Cookbook, we get:



$$
\frac{1}{2}
\sum_{i=1}^{n}
\Big[
-2 \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
\Big]
=
0
$$
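That identity is easy to spot-check numerically with central differences (a sketch under assumed names, not from the paper or the Cookbook):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x, mu = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma_inv = np.linalg.inv(A @ A.T + d * np.eye(d))   # inverse of a random SPD matrix

quad = lambda m: (x - m) @ Sigma_inv @ (x - m)       # (x - mu)^T Sigma^{-1} (x - mu)

# Central differences along each coordinate of mu ...
eps = 1e-6
fd = np.array([(quad(mu + eps * e) - quad(mu - eps * e)) / (2 * eps)
               for e in np.eye(d)])
# ... agree with the closed form -2 Sigma^{-1} (x - mu)
assert np.allclose(fd, -2 * Sigma_inv @ (x - mu), atol=1e-5)
```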



Finally, solving for $\boldsymbol{\mu}$, we get:



$$
\begin{align}
0
&= \frac{1}{2} \sum_{i=1}^{n} \Big[ -2 \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \Big]
\\
&= - \sum_{i=1}^{n} \Big[ \Sigma^{-1} \mathbf{x}_i - \Sigma^{-1} \boldsymbol{\mu} \Big]
\\
&= - \sum_{i=1}^{n} \Sigma^{-1} \mathbf{x}_i + n \Sigma^{-1} \boldsymbol{\mu}
\\
- n \Sigma^{-1} \boldsymbol{\mu} &= - \Sigma^{-1} \sum_{i=1}^{n} \mathbf{x}_i
\\
\boldsymbol{\mu} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i
\end{align}
$$



And we're done.
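As a sanity check, the stationarity condition can be verified numerically (again a sketch with my own variable names):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q = 200, 5, 2
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, q))
Psi = np.diag(rng.uniform(1.0, 2.0, size=d))     # e.g., a diagonal noise covariance
Sigma_inv = np.linalg.inv(W @ W.T + Psi)

mu_hat = X.mean(axis=0)
# Gradient of the NLL w.r.t. mu: -Sigma^{-1} sum_i (x_i - mu)
grad = -Sigma_inv @ (X - mu_hat).sum(axis=0)
assert np.allclose(grad, 0.0)                    # vanishes at the empirical mean
```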







asked Jul 30 at 13:34 by gwg




















1 Answer



























All those Greek letters are a pain to type, so let's use these variables:
$$
S = \Sigma, \quad P = \Psi, \quad L = \mathcal{L}, \quad Z = X - \boldsymbol{\mu} \mathbf{1}^\top
$$
where $X$ is the matrix whose columns are the $\mathbf{x}_i$ vectors, and $\boldsymbol{\mu} \mathbf{1}^\top$ is the matrix each of whose columns equals $\boldsymbol{\mu}$.



Further, let's use a colon to denote the trace/Frobenius product:
$$
A : B = \mathrm{tr}(A^\top B)
$$
Write the objective function in terms of the Frobenius product and these new variables, then find its differential and gradients:
$$
\begin{align}
L &= \tfrac{n}{2} \log(\det(S)) + \tfrac{1}{2}\, Z Z^\top : S^{-1} + K
\\
dL &= \tfrac{n}{2}\, \mathrm{tr}\,(d\log(S)) + \tfrac{1}{2}\, Z Z^\top : dS^{-1} + 0
\\
&= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : dS
\\
&= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : d(W W^\top + P)
\\
&= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : (dW\, W^\top + W\, dW^\top + dP)
\end{align}
$$
where the third line uses $\mathrm{tr}\,(d\log(S)) = S^{-1} : dS$ and $dS^{-1} = -S^{-1}\, dS\, S^{-1}$.
Setting $dW = 0$ yields the gradient w.r.t. $P$:
$$
\begin{align}
dL &= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : dP
\\
\frac{\partial L}{\partial P} &= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big)
\end{align}
$$
while setting $dP = 0$ recovers the gradient w.r.t. $W$:
$$
\begin{align}
dL &= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : (dW\, W^\top + W\, dW^\top)
\\
&= \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) W : dW
\\
\frac{\partial L}{\partial W} &= \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) W
\end{align}
$$
In several of the steps, we've made use of the fact that $S$ is symmetric.
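These closed forms are easy to spot-check with finite differences. A numpy sketch (my own names; the columns of `X` are the $\mathbf{x}_i$, and `P` stands in for $\Psi$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, q = 50, 4, 2
X = rng.normal(size=(d, n))                  # columns are the x_i, as in the answer
W = rng.normal(size=(d, q))
P = np.diag(rng.uniform(1.0, 2.0, size=d))   # stands in for Psi
Z = X - X.mean(axis=1, keepdims=True)        # Z = X - mu 1^T, with mu = mu_hat

def L(W, P):
    S = W @ W.T + P
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * n * logdet + 0.5 * np.trace(Z @ Z.T @ np.linalg.inv(S))

S_inv = np.linalg.inv(W @ W.T + P)
G = n * S_inv - S_inv @ Z @ Z.T @ S_inv      # the common factor above
dL_dW = G @ W                                # claimed gradient w.r.t. W

# Central-difference gradient w.r.t. each entry of W
eps, fd = 1e-6, np.zeros_like(W)
for i in range(d):
    for j in range(q):
        E = np.zeros_like(W)
        E[i, j] = eps
        fd[i, j] = (L(W + E, P) - L(W - E, P)) / (2 * eps)
assert np.allclose(fd, dL_dW, atol=1e-4)
```

An analogous loop over the entries of `P` can check the other gradient, $\partial L/\partial P = \tfrac{1}{2} G$.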






answered Jul 30 at 18:01 by greg (accepted), edited Jul 30 at 21:21























• Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you set up $A$, though. My part $A$ is $\frac{1}{2} \sum_i z_i^\top S^{-1} z_i$. If I use the trace trick, that's $\frac{1}{2} \sum_i \mathrm{tr}(z_i z_i^\top S^{-1})$. But your setup is $\frac{1}{2}\, \mathrm{tr}(z^\top z\, S^{-1})$. What happened to the summation?
  – gwg, Jul 30 at 19:47

• Sorry, I misread the $A$ term. The answer has been updated with the correct term; the change consisted of turning the $z$ vector into the $Z$ matrix.
  – greg, Jul 30 at 20:36
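The summation is absorbed once the $z_i$ are stacked as the columns of $Z$, since
$$
\sum_{i=1}^{n} \mathrm{tr}\big(z_i z_i^\top S^{-1}\big)
= \mathrm{tr}\Big(\Big(\sum_{i=1}^{n} z_i z_i^\top\Big) S^{-1}\Big)
= \mathrm{tr}\big(Z Z^\top S^{-1}\big)
= Z Z^\top : S^{-1}.
$$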











          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "69"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2867022%2fderivation-of-derivative-of-multivariate-gaussian-w-r-t-covariance-matrix%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          2
          down vote



          accepted










          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.






          share|cite|improve this answer























          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36















          up vote
          2
          down vote



          accepted










          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.






          share|cite|improve this answer























          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36













          up vote
          2
          down vote



          accepted







          up vote
          2
          down vote



          accepted






          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.






          share|cite|improve this answer















          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.







          share|cite|improve this answer















          share|cite|improve this answer



          share|cite|improve this answer








          edited Jul 30 at 21:21


























          answered Jul 30 at 18:01









          greg

          5,6331715




          5,6331715











          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36

















          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36
















          Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
          – gwg
          Jul 30 at 19:47





          Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
          – gwg
          Jul 30 at 19:47













          Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
          – greg
          Jul 30 at 20:36





          Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
          – greg
          Jul 30 at 20:36













           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2867022%2fderivation-of-derivative-of-multivariate-gaussian-w-r-t-covariance-matrix%23new-answer', 'question_page');

          );

          Post as a guest













































































          Comments

          Popular posts from this blog

          What is the equation of a 3D cone with generalised tilt?

          Color the edges and diagonals of a regular polygon

          Relationship between determinant of matrix and determinant of adjoint?