Two questions about the derivative of the softmax function


I am having some trouble with the derivative of softmax:

$$y_k = \frac{e^{a_k}}{\sum_{i=0}^{K} e^{a_i}}$$

The first thing I want to know is why $\frac{\partial \left(\sum_{i=0}^{K} e^{a_i}\right)}{\partial a_k} = e^{a_k}$. Why does the index of $e^{a}$ change from $i$ to $k$?

The second question is why the derivative has two answers. I know how to get the first one, but the second is a little confusing to me.

I would appreciate a pointer to a lecture, or to a property that I am missing from my lectures.

Thanks.

  • Can you provide definitions and a bit of background? Also, what are the variables? Is $k = K$? Do you want to take the derivative of the fraction of the two derivatives, or are you asking whether the equation you wrote is true?
    – John Samples
    Aug 1 at 1:46


asked Aug 1 at 1:07
Jose Daniel

1 Answer

Note that the softmax function takes a vector and produces a vector of
equal size. Therefore its "derivative" is a Jacobian matrix
containing its partial derivatives. If the vector softmax operates on
has $n$ elements, then the Jacobian is of size $n \times n$ and
contains $n^2$ partial derivatives.
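
To make the size claim concrete, here is a minimal sketch in plain Python that estimates the Jacobian of softmax by central differences. The function names `softmax` and `numerical_jacobian`, the step size `h`, and the sample vector are illustrative assumptions, not part of the original answer.

```python
import math

def softmax(a):
    """Softmax of a list of floats (shifted by the max for numerical stability)."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def numerical_jacobian(f, a, h=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d a_j (hypothetical helper)."""
    n = len(a)
    cols = []
    for j in range(n):
        up = list(a)
        up[j] += h
        down = list(a)
        down[j] -= h
        f_up, f_down = f(up), f(down)
        cols.append([(u - d) / (2 * h) for u, d in zip(f_up, f_down)])
    # cols[j][i] holds d f_i / d a_j; transpose so that rows index the output.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

a = [1.0, 2.0, 0.5]          # arbitrary sample input with n = 3
J = numerical_jacobian(softmax, a)
print(len(J), len(J[0]))     # 3 3, i.e. n x n with n^2 entries
```

For an input of length 3 the result has 3 rows and 3 columns, so 9 partial derivatives in total.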



The easier way (I think) to understand what happens is to work on
vectors of size two and generalize from that. So let softmax be
$$
S([x, y]) = [S_x(x), S_y(y)] = \left[\frac{e^x}{e^x+e^y}, \frac{e^y}{e^x+e^y}\right].
$$
The Jacobian for $S$ will contain 4 partial derivatives arranged in the following fashion:
$$
JS([x,y]) = \begin{bmatrix}
\frac{\partial S_x}{\partial x} & \frac{\partial S_x}{\partial y}\\
\frac{\partial S_y}{\partial x} & \frac{\partial S_y}{\partial y}
\end{bmatrix}.
$$
Calculating gives
$$
\frac{\partial S_x}{\partial x} = \frac{\partial}{\partial x}\frac{e^x}{e^x+e^y}
= \frac{e^x(e^x+e^y) - e^{2x}}{(e^x+e^y)^2} = \frac{e^x}{e^x+e^y}\frac{e^y}{e^x+e^y} = S_x(x)S_y(y).
$$
Note that $S_y(y) = 1 - S_x(x)$, so it is more general to write the derivative as $S_x(x)(1 - S_x(x))$, because that form also works for vectors with more than two components. We calculate another derivative:
$$
\frac{\partial S_x}{\partial y} = \frac{\partial}{\partial y}\frac{e^x}{e^x+e^y} = \frac{-e^xe^y}{(e^x+e^y)^2} = -S_x(x)S_y(y).
$$
As you can imagine, the partial derivatives are symmetric, so we can fill in the full Jacobian:
$$
JS([x,y]) = \begin{bmatrix}
S_x(x)(1-S_x(x)) & -S_x(x)S_y(y)\\
-S_y(y)S_x(x) & S_y(y)(1-S_y(y))
\end{bmatrix}.
$$
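
As a sanity check on the Jacobian derived above, the following sketch compares its four entries with central-difference estimates. It uses only the standard library; the function names, the test point `(0.3, -1.2)`, and the step size `h` are assumptions chosen for illustration.

```python
import math

def softmax2(x, y):
    """Two-component softmax (S_x, S_y) from the answer."""
    ex, ey = math.exp(x), math.exp(y)
    return ex / (ex + ey), ey / (ex + ey)

def analytic_jacobian(x, y):
    """Jacobian entries as derived in the answer."""
    sx, sy = softmax2(x, y)
    return [[sx * (1 - sx), -sx * sy],
            [-sy * sx, sy * (1 - sy)]]

def finite_difference_jacobian(x, y, h=1e-6):
    """Central differences, used only as an independent check."""
    fx_up, fx_dn = softmax2(x + h, y), softmax2(x - h, y)
    fy_up, fy_dn = softmax2(x, y + h), softmax2(x, y - h)
    # Row i is output component i; column 0 is d/dx, column 1 is d/dy.
    return [[(fx_up[i] - fx_dn[i]) / (2 * h),
             (fy_up[i] - fy_dn[i]) / (2 * h)] for i in range(2)]

x, y = 0.3, -1.2  # arbitrary test point
J_exact = analytic_jacobian(x, y)
J_approx = finite_difference_jacobian(x, y)
max_error = max(abs(J_exact[i][j] - J_approx[i][j])
                for i in range(2) for j in range(2))
print(max_error)  # should be tiny if the derivation is right
```

The diagonal entries come out positive and the off-diagonal entries negative, matching the two cases in the matrix.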
There are two different "types" of elements depending on whether they are on the diagonal or not, which is why the derivative has two answers. For your first question, just note that
$$
\frac{\partial}{\partial x}(e^x + e^y) = e^x,
$$
because $e^y$ does not depend on $x$.
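
In the question's index notation, the same two cases can be sketched as follows. This assumes $y_k$ and $a_k$ as defined in the question and introduces the Kronecker delta $\delta_{kj}$ (equal to 1 if $k = j$ and 0 otherwise), which does not appear in the original posts.
$$
\frac{\partial}{\partial a_k}\sum_{i=0}^{K} e^{a_i} = e^{a_k},
\qquad
\frac{\partial y_k}{\partial a_j} = y_k(\delta_{kj} - y_j) =
\begin{cases}
y_k(1 - y_k) & \text{if } j = k,\\
-y_k y_j & \text{if } j \neq k.
\end{cases}
$$
Only the $i = k$ term of the sum depends on $a_k$, so every other term differentiates to zero; that is why the index changes from $i$ to $k$.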






answered Aug 1 at 3:30
Björn Lindqvist