What's wrong with my derivation of the gradient of KL divergence?

I'm reading the paper Visualizing Data using t-SNE, and I'm stuck on the gradient of the KL divergence.



In the paper, the similarity between datapoints, $p_{j|i}$, is defined as

$$p_{j|i}=\frac{\exp\left(-\left\|x_i-x_j\right\|^2/2\sigma_i^2\right)}{\sum_{k\ne i}\exp\left(-\left\|x_i-x_k\right\|^2/2\sigma_i^2\right)}.$$



The similarity between map points is given by

$$q_{j|i}=\frac{\exp\left(-\left\|y_i-y_j\right\|^2\right)}{\sum_{k\ne i}\exp\left(-\left\|y_i-y_k\right\|^2\right)},$$



and finally the cost function, which is the sum of KL divergences over all datapoints, is defined by
$$C=\sum_i KL(P_i\,\|\,Q_i)=\sum_i\sum_j p_{j|i}\log\frac{p_{j|i}}{q_{j|i}}.$$
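To make the notation concrete, here is a small NumPy sketch that computes $p_{j|i}$, $q_{j|i}$ and $C$ for a handful of random points (the toy data, the fixed $\sigma_i=1$, and helper names such as `conditional` are my own choices for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_hi, d_lo = 6, 5, 2
X = rng.normal(size=(n, d_hi))   # datapoints x_i
Y = rng.normal(size=(n, d_lo))   # map points y_i
sigma = np.ones(n)               # fixed sigma_i, just to keep the example small

def sq_dists(Z):
    """Pairwise squared Euclidean distances ||z_i - z_j||^2."""
    diff = Z[:, None, :] - Z[None, :, :]
    return (diff ** 2).sum(-1)

def conditional(dist2, scale):
    """Entry (i, j) is exp(-d_ij / scale_i) / sum_{k != i} exp(-d_ik / scale_i)."""
    W = np.exp(-dist2 / scale[:, None])
    np.fill_diagonal(W, 0.0)              # exclude k = i from the normalisation
    return W / W.sum(axis=1, keepdims=True)

P = conditional(sq_dists(X), 2 * sigma ** 2)      # P[i, j] = p_{j|i}

def Q_of(Y):
    return conditional(sq_dists(Y), np.ones(n))   # Q[i, j] = q_{j|i}

def cost(Y):
    """C = sum_i KL(P_i || Q_i), skipping the zero diagonal."""
    Q = Q_of(Y)
    mask = ~np.eye(n, dtype=bool)
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))
```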



Then it says the gradient is



$$\frac{\partial C}{\partial y_i}=2\sum_j\left(p_{j|i}-q_{j|i}+p_{i|j}-q_{i|j}\right)(y_i-y_j).$$



However, I derive a different result



$$\begin{align}
\frac{\partial C}{\partial y_i}&=-\sum_j\left[p_{j|i}\frac{1}{q_{j|i}}\nabla_{y_i}q_{j|i}+p_{i|j}\frac{1}{q_{i|j}}\nabla_{y_i}q_{i|j}\right]
\end{align}$$
and
$$\begin{align}
\nabla_{y_i}q_{j|i}&=q_{j|i}\left(2(y_j-y_i)-2\sum_{k\ne i}q_{k|i}(y_k-y_i)\right)\\
\nabla_{y_i}q_{i|j}&=q_{i|j}\left(2(y_j-y_i)-2q_{i|j}(y_j-y_i)\right)
\end{align}$$
so, using $\sum_j p_{j|i}=1$,
$$\frac{\partial C}{\partial y_i}=2\sum_j(y_i-y_j)\left(p_{j|i}+p_{i|j}-q_{j|i}-p_{i|j}q_{i|j}\right).$$
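To compare the two expressions concretely, building on the sketch above, one can check both closed-form gradients against a central finite-difference gradient of $C$ (again only a sanity-check sketch; the function names are mine):

```python
def num_grad(Y, i, eps=1e-6):
    """Central finite-difference gradient of C with respect to y_i."""
    g = np.zeros(Y.shape[1])
    for a in range(Y.shape[1]):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, a] += eps
        Ym[i, a] -= eps
        g[a] = (cost(Yp) - cost(Ym)) / (2 * eps)
    return g

def paper_grad(Y, i):
    """Paper: 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)."""
    Q = Q_of(Y)
    coef = P[i] - Q[i] + P[:, i] - Q[:, i]
    return 2 * (coef[:, None] * (Y[i] - Y)).sum(axis=0)

def my_grad(Y, i):
    """My result: 2 * sum_j (p_{j|i} + p_{i|j} - q_{j|i} - p_{i|j} q_{i|j}) (y_i - y_j)."""
    Q = Q_of(Y)
    coef = P[i] + P[:, i] - Q[i] - P[:, i] * Q[:, i]
    return 2 * (coef[:, None] * (Y[i] - Y)).sum(axis=0)

i = 0
print("finite difference:", num_grad(Y, i))
print("paper's formula  :", paper_grad(Y, i))
print("my formula       :", my_grad(Y, i))
```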



I've checked my derivation several times, but I still have no clue where the error is. Could someone help me check it? I'd be very grateful. Thanks.
asked Jul 16 at 4:56 by Sherwin Chen, edited Jul 17 at 3:01