Two questions about the derivative of Softmax function.
Actually I have some problems with the derivative of the softmax function:
$$y_k = \frac{e^{a_k}}{\sum_{i=0}^{K} e^{a_i}}$$
The first thing I want to know is why $\frac{\partial \left( \sum_{i=0}^{K} e^{a_i} \right)}{\partial a_k} = e^{a_k}$. Why does the index of $e^a$ change?
The second question is why the derivative has two answers. I know how to get the first one, but the second is a little confusing to me.
I would appreciate it if you could point me to a lecture, or to some property that I am missing.
Thanks.
calculus machine-learning
Can you provide definitions and a bit of background? Also, what are the variables? Is $k = K$? Do you want to take the derivative as the fraction of the two derivatives, or are you asking whether the equation you wrote is already true?
– John Samples
Aug 1 at 1:46
asked Aug 1 at 1:07
Jose Daniel
1 Answer
Note that the softmax function takes a vector and produces a vector of equal size. Therefore its "derivative" will be a Jacobian matrix containing its partial derivatives. If the vector softmax operates on has $n$ elements, then the Jacobian will be of size $n \times n$ and contain $n^2$ partial derivatives.
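For instance, here is a quick numerical sketch (plain Python; the function name `softmax` is just an illustrative choice) showing that the output has the same size as the input and sums to 1:

```python
import math

def softmax(v):
    # Shift by the max for numerical stability; the result is unchanged.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

s = softmax([1.0, 2.0, 3.0])
assert len(s) == 3                 # same size as the input
assert abs(sum(s) - 1.0) < 1e-12   # entries form a probability distribution
```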
The easiest way (I think) to understand what happens is to work with vectors of size two and generalize from there. So let softmax be
$$
S([x, y]) = [S_x(x), S_y(y)] = \left[ \frac{e^x}{e^x+e^y}, \frac{e^y}{e^x+e^y} \right].
$$
The Jacobian for $S$ will contain 4 partial derivatives arranged in the following fashion:
$$
JS([x,y]) = \begin{bmatrix}
\frac{\partial S_x}{\partial x} & \frac{\partial S_x}{\partial y}\\
\frac{\partial S_y}{\partial x} & \frac{\partial S_y}{\partial y}
\end{bmatrix}.
$$
Calculating gives
$$
\frac{\partial S_x}{\partial x} = \frac{\partial}{\partial x} \frac{e^x}{e^x+e^y}
= \frac{e^x(e^x+e^y) - e^{2x}}{(e^x+e^y)^2} = \frac{e^x}{e^x+e^y} \cdot \frac{e^y}{e^x+e^y} = S_x(x)S_y(y).
$$
Note that $S_y(y) = 1 - S_x(x)$ so it is more general to write the derivative as $S_x(x)(1 - S_x(x))$ because the formula works for vectors with more than two components. We calculate another derivative:
$$
\frac{\partial S_x}{\partial y} = \frac{\partial}{\partial y} \frac{e^x}{e^x+e^y} = \frac{-e^x e^y}{(e^x+e^y)^2} = -S_x(x)S_y(y).
$$
As you can imagine, the partial derivatives are symmetric, so we can fill in the full Jacobian:
$$
JS([x,y]) = \begin{bmatrix}
S_x(x)(1-S_x(x)) & -S_x(x)S_y(y)\\
-S_y(y)S_x(x) & S_y(y)(1-S_y(y))
\end{bmatrix}.
$$
There are two different "types" of elements depending on whether they are on the diagonal or not. For your first question, just note that
$$
\frac{\partial}{\partial x}(e^x + e^y) = e^x.
$$
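To make the two cases concrete: the diagonal and off-diagonal entries together amount to $\frac{\partial S_i}{\partial a_j} = S_i(\delta_{ij} - S_j)$, i.e. $JS = \operatorname{diag}(s) - s s^\top$. A minimal sketch (plain Python; the helper names are my own) that checks this closed form against central finite differences:

```python
import math

def softmax(v):
    # Shift by the max for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(v):
    # Closed form: J[i][j] = s_i * (delta_ij - s_j), i.e. diag(s) - s s^T.
    s = softmax(v)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(v, h=1e-6):
    # Central finite differences, as an independent check of the formula.
    n = len(v)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(v); up[j] += h
        dn = list(v); dn[j] -= h
        su, sd = softmax(up), softmax(dn)
        for i in range(n):
            J[i][j] = (su[i] - sd[i]) / (2 * h)
    return J

v = [1.0, 2.0]
A = softmax_jacobian(v)
N = numeric_jacobian(v)
assert all(abs(A[i][j] - N[i][j]) < 1e-6
           for i in range(2) for j in range(2))
```

The diagonal entries come out positive and the off-diagonal ones negative, which is exactly the two "answers" in the question.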
answered Aug 1 at 3:30
Björn Lindqvist