Two questions about the derivative of Softmax function.
Actually I have some problems with the derivative of the softmax function:
$$y_k = \frac{e^{a_k}}{\sum_{i=0}^{K} e^{a_i}}$$
The first thing I want to know is why $\frac{\partial \left( \sum_{i=0}^{K} e^{a_i} \right)}{\partial a_k} = e^{a_k}$. Why does the index of $e^a$ change?
The second question is why the derivative has two answers. I know how to get the first one, but the second is a little confusing to me.
I would appreciate it if you could point me to a lecture, or to some property that I am missing.
Thanks.
calculus machine-learning
Can you provide definitions and a bit of background? Also, what are the variables? Is $k = K$? Do you want to take the derivative as the fraction of the two derivatives, or are you asking whether the equation you wrote is already true?
– John Samples
Aug 1 at 1:46
asked Aug 1 at 1:07
Jose Daniel
1 Answer
Note that the softmax function takes a vector and produces a vector of equal size. Therefore its "derivative" will be a Jacobian matrix containing its partial derivatives. If the vector softmax operates on has $n$ elements, then the Jacobian will be of size $n \times n$ and contain $n^2$ partial derivatives.
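For instance, here is a quick numerical sketch (plain Python; the function name `softmax` is just an illustrative choice) showing that the output has the same size as the input and sums to 1:

```python
import math

def softmax(v):
    # Shift by the max for numerical stability; the result is unchanged.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

s = softmax([1.0, 2.0, 3.0])
assert len(s) == 3                 # same size as the input
assert abs(sum(s) - 1.0) < 1e-12   # entries form a probability distribution
```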
The easiest way (I think) to understand what happens is to work with vectors of size two and generalize from there. So let softmax be
$$
S([x, y]) = [S_x(x), S_y(y)] = \left[ \frac{e^x}{e^x+e^y}, \frac{e^y}{e^x+e^y} \right].
$$
The Jacobian for $S$ will contain 4 partial derivatives arranged in the following fashion:
$$
JS([x,y]) = \begin{bmatrix}
\frac{\partial S_x}{\partial x} & \frac{\partial S_x}{\partial y}\\
\frac{\partial S_y}{\partial x} & \frac{\partial S_y}{\partial y}
\end{bmatrix}.
$$
Calculating gives
$$
\frac{\partial S_x}{\partial x} = \frac{\partial}{\partial x} \frac{e^x}{e^x+e^y}
= \frac{e^x(e^x+e^y) - e^{2x}}{(e^x+e^y)^2} = \frac{e^x}{e^x+e^y} \cdot \frac{e^y}{e^x+e^y} = S_x(x)S_y(y).
$$
Note that $S_y(y) = 1 - S_x(x)$ so it is more general to write the derivative as $S_x(x)(1 - S_x(x))$ because the formula works for vectors with more than two components. We calculate another derivative:
$$
\frac{\partial S_x}{\partial y} = \frac{\partial}{\partial y} \frac{e^x}{e^x+e^y} = \frac{-e^x e^y}{(e^x+e^y)^2} = -S_x(x)S_y(y).
$$
As you can imagine, the partial derivatives are symmetric, so we can fill in the full Jacobian:
$$
JS([x,y]) = \begin{bmatrix}
S_x(x)(1-S_x(x)) & -S_x(x)S_y(y)\\
-S_y(y)S_x(x) & S_y(y)(1-S_y(y))
\end{bmatrix}.
$$
There are two different "types" of elements depending on whether they are on the diagonal or not. For your first question, just note that
$$
\frac{\partial}{\partial x}(e^x + e^y) = e^x.
$$
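To make the two cases concrete: the diagonal and off-diagonal entries together amount to $\frac{\partial S_i}{\partial a_j} = S_i(\delta_{ij} - S_j)$, i.e. $JS = \operatorname{diag}(s) - s s^\top$. A minimal sketch (plain Python; the helper names are my own) that checks this closed form against central finite differences:

```python
import math

def softmax(v):
    # Shift by the max for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(v):
    # Closed form: J[i][j] = s_i * (delta_ij - s_j), i.e. diag(s) - s s^T.
    s = softmax(v)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(v, h=1e-6):
    # Central finite differences, as an independent check of the formula.
    n = len(v)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(v); up[j] += h
        dn = list(v); dn[j] -= h
        su, sd = softmax(up), softmax(dn)
        for i in range(n):
            J[i][j] = (su[i] - sd[i]) / (2 * h)
    return J

v = [1.0, 2.0]
A = softmax_jacobian(v)
N = numeric_jacobian(v)
assert all(abs(A[i][j] - N[i][j]) < 1e-6
           for i in range(2) for j in range(2))
```

The diagonal entries come out positive and the off-diagonal ones negative, which is exactly the two "answers" in the question.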
answered Aug 1 at 3:30
Björn Lindqvist