What's wrong with my derivation of the gradient of KL divergence?
I'm reading the paper Visualizing Data using t-SNE, and I'm stuck on the gradient of the KL divergence.
In the paper, the similarity between datapoints, $p_{j|i}$, is defined as follows
$$p_{j|i}=\frac{\exp\left(-\left\|x_i-x_j\right\|^2/2\sigma_i^2\right)}{\sum_{k\neq i}\exp\left(-\left\|x_i-x_k\right\|^2/2\sigma_i^2\right)}$$
the similarity between map points is given by
$$q_{j|i}=\frac{\exp\left(-\left\|y_i-y_j\right\|^2\right)}{\sum_{k\neq i}\exp\left(-\left\|y_i-y_k\right\|^2\right)}$$
and finally, the cost function, which is the sum of the KL divergences over all datapoints, is defined by
$$C=\sum_i \mathrm{KL}(P_i\,\Vert\, Q_i)=\sum_i\sum_j p_{j|i}\log\frac{p_{j|i}}{q_{j|i}}$$
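For concreteness, here is a minimal NumPy sketch of these definitions (the helper names, the toy data, and the single fixed $\sigma$ in place of the paper's per-point $\sigma_i$ are all my own illustrative choices):

```python
import numpy as np

def conditional_probs(X, sigma=1.0):
    # p_{j|i} = exp(-||x_i - x_j||^2 / (2 sigma^2)) / sum_{k != i} exp(-||x_i - x_k||^2 / (2 sigma^2))
    D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)   # pairwise squared distances
    E = np.exp(-D / (2.0 * sigma**2))
    np.fill_diagonal(E, 0.0)                                # exclude k = i from the sum
    return E / E.sum(axis=1, keepdims=True)                 # row i holds p_{j|i}

def kl_cost(P, Q):
    # C = sum_i KL(P_i || Q_i) = sum_i sum_j p_{j|i} log(p_{j|i} / q_{j|i})
    mask = ~np.eye(len(P), dtype=bool)                      # skip the i = j terms
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

X = np.random.default_rng(0).normal(size=(5, 3))            # toy high-dimensional data
Y = np.random.default_rng(1).normal(size=(5, 2))            # toy map points
P = conditional_probs(X, sigma=1.0)
Q = conditional_probs(Y, sigma=np.sqrt(0.5))                # 2*sigma^2 = 1 gives exponent -||y_i - y_j||^2
print(kl_cost(P, Q))
```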
Then it says the gradient is
$$\frac{\partial C}{\partial y_i}=2\sum_j\left(p_{j|i}-q_{j|i}+p_{i|j}-q_{i|j}\right)(y_i-y_j)$$
However, I derive a different result
$$\frac{\partial C}{\partial y_i}=-\sum_j\left(p_{j|i}\frac{1}{q_{j|i}}\nabla_{y_i}q_{j|i}+p_{i|j}\frac{1}{q_{i|j}}\nabla_{y_i}q_{i|j}\right)$$
and
$$\begin{align}
\nabla_{y_i}q_{j|i}&=q_{j|i}\left(2(y_j-y_i)-2\frac{\sum_{k\neq i}(y_k-y_i)\exp\left(-\left\|y_i-y_k\right\|^2\right)}{\sum_{k\neq i}\exp\left(-\left\|y_i-y_k\right\|^2\right)}\right)\\
\nabla_{y_i}q_{i|j}&=q_{i|j}\bigl(2(y_j-y_i)-2q_{i|j}(y_j-y_i)\bigr)
\end{align}$$
so
$$\frac{\partial C}{\partial y_i}=2\sum_j(y_i-y_j)\left(p_{j|i}+p_{i|j}-q_{j|i}-p_{i|j}\,q_{i|j}\right)$$
I've checked my derivation several times, but I still have no clue where the mistake is. Could someone help me check it? I'd be very grateful. Thanks.
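In case it helps: since the two formulas disagree, a finite-difference test of the paper's formula against a numerical gradient of $C$ would settle which one is right. A minimal sketch, assuming an arbitrary fixed $P$ with unit row sums and zero diagonal (all function names here are my own):

```python
import numpy as np

def q_cond(Y):
    # q_{j|i} = exp(-||y_i - y_j||^2) / sum_{k != i} exp(-||y_i - y_k||^2)
    D = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    E = np.exp(-D)
    np.fill_diagonal(E, 0.0)
    return E / E.sum(axis=1, keepdims=True)

def cost(P, Y):
    # C = sum_i sum_j p_{j|i} log(p_{j|i} / q_{j|i}), skipping i = j
    Q = q_cond(Y)
    mask = ~np.eye(len(P), dtype=bool)
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def paper_grad(P, Y):
    # dC/dy_i = 2 sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)
    Q = q_cond(Y)
    M = (P - Q) + (P - Q).T                     # M[i, j] = p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}
    diffs = Y[:, None, :] - Y[None, :, :]       # diffs[i, j] = y_i - y_j
    return 2.0 * (M[:, :, None] * diffs).sum(axis=1)

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))
P = q_cond(rng.normal(size=(6, 3)))             # any fixed P with unit row sums, zero diagonal

eps = 1e-6
num = np.zeros_like(Y)
for i in range(Y.shape[0]):                     # central finite differences on C
    for a in range(Y.shape[1]):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, a] += eps
        Ym[i, a] -= eps
        num[i, a] = (cost(P, Yp) - cost(P, Ym)) / (2 * eps)

print(np.abs(num - paper_grad(P, Y)).max())     # near zero if the paper's formula is right
```

If the printed residual is at floating-point-noise level, the paper's closed form matches the true gradient of $C$; otherwise the discrepancy points back at the closed-form expression.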
derivatives
asked Jul 16 at 4:56 by Sherwin Chen, edited Jul 17 at 3:01