Derivation of derivative of multivariate Gaussian w.r.t. covariance matrix












I'm reading a paper on probabilistic CCA in which the authors state derivatives without showing the derivations. I would like step-by-step derivations to convince myself. Consider a $d$-dimensional multivariate Gaussian random variable:



$$
\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)
$$



In probabilistic CCA, we define $\Sigma = W W^\top + \Psi$, where $W \in \mathbb{R}^{d \times q}$ and $\Psi \in \mathbb{R}^{d \times d}$. I'd like to compute the derivatives of the negative log-likelihood w.r.t. $\boldsymbol{\mu}$, $W$, and $\Psi$.



The stationary point for $\boldsymbol{\mu}$ is just the empirical mean $\hat{\boldsymbol{\mu}}$ (shown below*). Plugging this minimizer into the negative log-likelihood, we get:



$$
\frac{\partial \mathcal{L}}{\partial W}
=
\frac{\partial}{\partial W} \Big[
\overbrace{\frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})^\top \Sigma^{-1} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})}^{A}
+
\overbrace{\frac{n}{2} \ln |\Sigma|}^{B}
+
\overbrace{\text{const}}^{C}
\Big]
$$



Clearly, $\frac{\partial C}{\partial W} = 0$. But I'm not sure how to handle $A$ and $B$, particularly since $\Sigma = W W^\top + \Psi$.




*Derivative w.r.t. $\boldsymbol{\mu}$



The negative log-likelihood is:



$$
\mathcal{L}
=
\frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) + \frac{n}{2} \ln |\Sigma| + \text{const}
$$
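It may help to state the objective concretely before differentiating. Here is a minimal numpy sketch of this negative log-likelihood (my own helper, not the paper's code; the names `nll`, `W`, `Psi` are assumptions):

```python
import numpy as np

def nll(X, mu, W, Psi):
    """Gaussian negative log-likelihood with Sigma = W W^T + Psi, up to const."""
    n, d = X.shape
    Sigma = W @ W.T + Psi                        # pCCA covariance structure
    Z = X - mu                                   # rows are (x_i - mu)^T
    # A: 0.5 * sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
    quad = 0.5 * np.sum(Z * np.linalg.solve(Sigma, Z.T).T)
    # B: (n/2) * ln|Sigma|, computed via slogdet for numerical stability
    _, logdet = np.linalg.slogdet(Sigma)
    return quad + 0.5 * n * logdet
```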



The derivative of the two rightmost terms with respect to $\boldsymbol{\mu}$ is $0$, meaning we just need to solve:



$$
\frac{\partial}{\partial \boldsymbol{\mu}}
\Big[
\frac{1}{2} \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
\Big]
=
0
$$



By the linearity of differentiation, we have:



$$
\frac{1}{2}
\sum_{i=1}^{n}
\frac{\partial}{\partial \boldsymbol{\mu}}
\Big[
(\mathbf{x}_i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
\Big]
=
0
$$



Using Equation $(86)$ from the Matrix Cookbook, we get:



$$
\frac{1}{2}
\sum_{i=1}^{n}
\Big[
-2 \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu})
\Big]
=
0
$$
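That identity is easy to spot-check numerically with central differences (a sketch under assumed names, not from the paper or the Cookbook):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x, mu = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma_inv = np.linalg.inv(A @ A.T + d * np.eye(d))   # inverse of a random SPD matrix

quad = lambda m: (x - m) @ Sigma_inv @ (x - m)       # (x - mu)^T Sigma^{-1} (x - mu)

# Central differences along each coordinate of mu ...
eps = 1e-6
fd = np.array([(quad(mu + eps * e) - quad(mu - eps * e)) / (2 * eps)
               for e in np.eye(d)])
# ... agree with the closed form -2 Sigma^{-1} (x - mu)
assert np.allclose(fd, -2 * Sigma_inv @ (x - mu), atol=1e-5)
```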



Finally, solving for $\boldsymbol{\mu}$, we get:



$$
\begin{align}
0
&= \frac{1}{2} \sum_{i=1}^{n} \Big[ -2 \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \Big]
\\
&= - \sum_{i=1}^{n} \Big[ \Sigma^{-1} \mathbf{x}_i - \Sigma^{-1} \boldsymbol{\mu} \Big]
\\
&= - \sum_{i=1}^{n} \Sigma^{-1} \mathbf{x}_i + n \Sigma^{-1} \boldsymbol{\mu}
\\
- n \Sigma^{-1} \boldsymbol{\mu} &= - \Sigma^{-1} \sum_{i=1}^{n} \mathbf{x}_i
\\
\boldsymbol{\mu} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i
\end{align}
$$



And we're done.
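As a sanity check, the stationarity condition can be verified numerically (again a sketch with my own variable names):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q = 200, 5, 2
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, q))
Psi = np.diag(rng.uniform(1.0, 2.0, size=d))     # e.g., a diagonal noise covariance
Sigma_inv = np.linalg.inv(W @ W.T + Psi)

mu_hat = X.mean(axis=0)
# Gradient of the NLL w.r.t. mu: -Sigma^{-1} sum_i (x_i - mu)
grad = -Sigma_inv @ (X - mu_hat).sum(axis=0)
assert np.allclose(grad, 0.0)                    # vanishes at the empirical mean
```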







asked Jul 30 at 13:34 by gwg




















1 Answer



























All those Greek letters are a pain to type, so let's use these variables:
$$
S = \Sigma, \quad P = \Psi, \quad L = \mathcal{L}, \quad Z = X - \boldsymbol{\mu} \mathbf{1}^\top
$$
where $X$ is the matrix whose columns are the $\mathbf{x}_i$ vectors, and $\boldsymbol{\mu} \mathbf{1}^\top$ is the matrix each of whose columns equals $\boldsymbol{\mu}$.



Further, let's use a colon to denote the trace/Frobenius product:
$$
A : B = \mathrm{tr}(A^\top B)
$$
Write the objective function in terms of the Frobenius product and these new variables, then find its differential and gradients:
$$
\begin{align}
L &= \tfrac{n}{2} \log(\det(S)) + \tfrac{1}{2}\, Z Z^\top : S^{-1} + K
\\
dL &= \tfrac{n}{2}\, \mathrm{tr}\,(d\log(S)) + \tfrac{1}{2}\, Z Z^\top : dS^{-1} + 0
\\
&= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : dS
\\
&= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : d(W W^\top + P)
\\
&= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : (dW\, W^\top + W\, dW^\top + dP)
\end{align}
$$
where the third line uses $\mathrm{tr}\,(d\log(S)) = S^{-1} : dS$ and $dS^{-1} = -S^{-1}\, dS\, S^{-1}$.
Setting $dW = 0$ yields the gradient w.r.t. $P$:
$$
\begin{align}
dL &= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : dP
\\
\frac{\partial L}{\partial P} &= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big)
\end{align}
$$
while setting $dP = 0$ recovers the gradient w.r.t. $W$:
$$
\begin{align}
dL &= \tfrac{1}{2} \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) : (dW\, W^\top + W\, dW^\top)
\\
&= \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) W : dW
\\
\frac{\partial L}{\partial W} &= \Big( n S^{-1} - S^{-1} Z Z^\top S^{-1} \Big) W
\end{align}
$$
In several of the steps, we've made use of the fact that $S$ is symmetric.
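These closed forms are easy to spot-check with finite differences. A numpy sketch (my own names; the columns of `X` are the $\mathbf{x}_i$, and `P` stands in for $\Psi$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, q = 50, 4, 2
X = rng.normal(size=(d, n))                  # columns are the x_i, as in the answer
W = rng.normal(size=(d, q))
P = np.diag(rng.uniform(1.0, 2.0, size=d))   # stands in for Psi
Z = X - X.mean(axis=1, keepdims=True)        # Z = X - mu 1^T, with mu = mu_hat

def L(W, P):
    S = W @ W.T + P
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * n * logdet + 0.5 * np.trace(Z @ Z.T @ np.linalg.inv(S))

S_inv = np.linalg.inv(W @ W.T + P)
G = n * S_inv - S_inv @ Z @ Z.T @ S_inv      # the common factor above
dL_dW = G @ W                                # claimed gradient w.r.t. W

# Central-difference gradient w.r.t. each entry of W
eps, fd = 1e-6, np.zeros_like(W)
for i in range(d):
    for j in range(q):
        E = np.zeros_like(W)
        E[i, j] = eps
        fd[i, j] = (L(W + E, P) - L(W - E, P)) / (2 * eps)
assert np.allclose(fd, dL_dW, atol=1e-4)
```

An analogous loop over the entries of `P` can check the other gradient, $\partial L/\partial P = \tfrac{1}{2} G$.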






answered Jul 30 at 18:01 by greg (accepted), edited Jul 30 at 21:21























• Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you set up $A$, though. My part $A$ is $\frac{1}{2} \sum_i z_i^\top S^{-1} z_i$. If I use the trace trick, that's $\frac{1}{2} \sum_i \mathrm{tr}(z_i z_i^\top S^{-1})$. But your setup is $\frac{1}{2}\, \mathrm{tr}(z^\top z\, S^{-1})$. What happened to the summation?
  – gwg, Jul 30 at 19:47

• Sorry, I misread the $A$ term. The answer has been updated with the correct term; the change consisted of turning the $z$ vector into the $Z$ matrix.
  – greg, Jul 30 at 20:36
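The summation is absorbed once the $z_i$ are stacked as the columns of $Z$, since
$$
\sum_{i=1}^{n} \mathrm{tr}\big(z_i z_i^\top S^{-1}\big)
= \mathrm{tr}\Big(\Big(\sum_{i=1}^{n} z_i z_i^\top\Big) S^{-1}\Big)
= \mathrm{tr}\big(Z Z^\top S^{-1}\big)
= Z Z^\top : S^{-1}.
$$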











          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "69"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2867022%2fderivation-of-derivative-of-multivariate-gaussian-w-r-t-covariance-matrix%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          2
          down vote



          accepted










          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.






          share|cite|improve this answer























          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36















          up vote
          2
          down vote



          accepted










          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.






          share|cite|improve this answer























          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36













          up vote
          2
          down vote



          accepted







          up vote
          2
          down vote



          accepted






          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.






          share|cite|improve this answer















          All those Greek letters are a pain to type, so let's use these variables
          $$eqalign
          S = Sigma,,,,P = Phi,,,,L=mathcal L,,,,Z = (X-mu 1) cr
          $$
          where $X$ is the matrix whose columns are the $x_i$ vectors, and $(mu 1)$ is a matrix all of whose elements are equal to $mu$.



          Further, let's use a colon to denote the trace/Frobenius product
          $$A:B = rm tr(A^TB)$$
          Write the objective function in terms of the Frobenius product and these new variables. Then find its differential and gradients.
          $$eqalign
          L &= tfracn2log(det(S)) + tfrac12ZZ^T:S^-1 + K cr
          dL
          &= tfracn2rm tr,(dlog(S)) + tfrac12ZZ^T:dS^-1 + 0 cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dS cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):d(WW^T+P) cr
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T+dP) cr
          $$
          Setting $dW=0$ yields the gradient wrt $P$
          $$eqalign
          dL &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):dP cr
          fracpartial Lpartial P
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big)cr
          $$
          While setting $dP=0$ recovers the gradient wrt $W$
          $$eqalign
          dL
          &= frac12Big(nS^-1 - S^-1ZZ^TS^-1Big):(dW,W^T+ W,dW^T) cr
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W:dW cr
          fracpartial Lpartial W
          &= Big(nS^-1 - S^-1ZZ^TS^-1Big)W cr
          $$
          In several of the steps, we've made use of the fact that $S$ is symmetric.







          share|cite|improve this answer















          share|cite|improve this answer



          share|cite|improve this answer








          edited Jul 30 at 21:21


























          answered Jul 30 at 18:01









          greg

          5,6331715




          5,6331715











          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36

















          • Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
            – gwg
            Jul 30 at 19:47











          • Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
            – greg
            Jul 30 at 20:36
















          Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
          – gwg
          Jul 30 at 19:47





          Thanks! I can follow your derivation of $B$, the log of the determinant. I'm confused about how you setup for $A$ though. My part $A$ is $frac12 sum z^top S z$. If I use the trace trick, that's: $frac12 sum texttr(z z^top S)$. But your setup is $frac12 texttr(z^top z S^-1)$. What happened to the summation?
          – gwg
          Jul 30 at 19:47













          Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
          – greg
          Jul 30 at 20:36





          Sorry, I misread the A term. The answer has been updated with the correct term. The change consisted of changing the $z$ vector into the $Z$ matrix.
          – greg
          Jul 30 at 20:36













           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2867022%2fderivation-of-derivative-of-multivariate-gaussian-w-r-t-covariance-matrix%23new-answer', 'question_page');

          );

          Post as a guest













































































          Comments

          Popular posts from this blog

          What is the equation of a 3D cone with generalised tilt?

          Color the edges and diagonals of a regular polygon

          Relationship between determinant of matrix and determinant of adjoint?