Using the chain rule for gradients with different function mappings

Consider $x \in \mathbf R$, let $\theta(x)$ be defined by $\theta : \mathbf R \to \mathbf R^n$, and let $f(\theta)$ be defined by $f : \mathbf R^n \to \mathbf R$. That is, $f$ is a function of $\theta$, and $\theta$ is a function of $x$.



You may assume that $\theta$ is differentiable in $x$, and $f$ is differentiable in $\theta$.



I am trying to evaluate $\nabla_x f$, but am worried that my intuition is incorrect. I am wondering if it is correct to say that, using the chain rule, $$\nabla_x f = (\nabla_\theta f)^T \nabla_x \theta.$$ Is this valid?







      edited Jul 16 at 20:32
























      asked Jul 16 at 19:04









      Jonathan Tuck

          3 Answers
$\newcommand{\bbR}{\mathbb{R}}$It doesn't really make sense to talk about differentiating $f$ in both $x$ and $\theta$. Note that $\theta(x)$ is a single-variable function, so $\nabla_x\theta$ doesn't make sense either.

Define a new function $g \colon \bbR \to \bbR$ given by $g(x) = f(\theta(x))$. Then by the chain rule,
$$g'(x_0) = \left.\nabla_\theta(f)\right|_{\theta(x_0)}^\top \theta'(x_0).$$
Spelled out completely,
$$g'(x_0) = \left.\frac{\partial f}{\partial \theta_1}\right|_{\theta(x_0)} \left.\frac{d\theta_1}{dx}\right|_{x_0} + \cdots + \left.\frac{\partial f}{\partial \theta_n}\right|_{\theta(x_0)} \left.\frac{d\theta_n}{dx}\right|_{x_0}$$
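
As a quick sanity check of the formula, here is a small worked example. The particular choices $n = 2$, $\theta(x) = (x, x^2)$ and $f(\theta) = \theta_1\theta_2$ are assumptions made only for illustration; they do not come from the question. Computing directly, $g(x) = f(\theta(x)) = x \cdot x^2 = x^3$, so
$$g'(x_0) = 3x_0^2.$$
Using the chain rule instead, with $\nabla_\theta f = (\theta_2, \theta_1)^\top$ and $\theta'(x) = (1, 2x)^\top$,
$$g'(x_0) = \left.\frac{\partial f}{\partial \theta_1}\right|_{\theta(x_0)} \cdot 1 + \left.\frac{\partial f}{\partial \theta_2}\right|_{\theta(x_0)} \cdot 2x_0 = x_0^2 + x_0 \cdot 2x_0 = 3x_0^2,$$
which agrees with the direct computation.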






I believe you have the right idea, but your notation is a little bit confusing (e.g., while technically correct, $\nabla$ should be used for gradients, not ordinary derivatives).

Let's be a bit more careful.
Define $g:\mathbb{R}\rightarrow\mathbb{R}$ by
$$
g(x)\equiv f(\theta(x))\equiv f(\theta_1(x),\ldots,\theta_n(x)).
$$
What you are looking for is $g^\prime$, the derivative of $g$.
Apply the chain rule to get
$$
g^\prime(x)=\theta_1^\prime(x)f_{\theta_1}(\theta(x))+\cdots+\theta_n^\prime(x)f_{\theta_n}(\theta(x)).
$$
Or, more succinctly,
$$
g^\prime(x)=\left[\nabla_\theta f(\theta(x))\right]^\intercal\theta^\prime(x).
$$
Omitting the arguments, this looks like your expression $(\nabla_\theta f)^\intercal\theta^\prime$.
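
The identity above can also be checked numerically. The following is a minimal sketch, not part of the original answer: the specific $\theta$ and $f$ (with $n = 3$) and all function names are hypothetical choices made for illustration. It compares the chain-rule value $[\nabla_\theta f(\theta(x))]^\intercal\theta^\prime(x)$ against a central finite-difference approximation of $g^\prime(x)$.

```python
import math

# Hypothetical example (an assumption, not from the question):
# theta: R -> R^3 and f: R^3 -> R.
def theta(x):
    return [math.sin(x), x ** 2, math.exp(x)]

def theta_prime(x):
    # Componentwise ordinary derivatives of theta.
    return [math.cos(x), 2 * x, math.exp(x)]

def f(t):
    return t[0] * t[1] + t[2] ** 2

def grad_f(t):
    # Gradient of f with respect to theta.
    return [t[1], t[0], 2 * t[2]]

def g(x):
    # The composition g(x) = f(theta(x)).
    return f(theta(x))

def g_prime_chain(x):
    # Chain rule: inner product of grad_f(theta(x)) with theta'(x).
    return sum(df * dt for df, dt in zip(grad_f(theta(x)), theta_prime(x)))

def g_prime_fd(x, h=1e-6):
    # Central finite difference, used only as an independent check.
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 0.7
print("chain rule:       ", g_prime_chain(x0))
print("finite difference:", g_prime_fd(x0))
```

The two printed values should agree up to the finite-difference error, which is what one would expect if the chain-rule expression is the derivative of $g$.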






Correction: $$\dfrac{df}{dx}= (\nabla_\theta f)^T \nabla_x \theta.$$ We have $$df=\dfrac{\partial f}{\partial \theta_1}d\theta_1+\cdots+\dfrac{\partial f}{\partial \theta_n}d\theta_n$$ or $$\dfrac{df}{dx}=\dfrac{\partial f}{\partial \theta_1}\dfrac{d\theta_1}{dx}+\cdots+\dfrac{\partial f}{\partial \theta_n}\dfrac{d\theta_n}{dx}.$$ On the other hand, $$\nabla_x\theta=\left[\dfrac{d\theta_1}{dx}\quad\cdots\quad\dfrac{d\theta_n}{dx}\right]^T,$$ from which the relation we wanted to prove follows immediately.






                    answered Jul 16 at 19:13









                    Nitin

                            answered Jul 16 at 19:29









                            parsiad

                                    answered Jul 16 at 19:44









                                    Mostafa Ayaz
