Using the chain rule for gradients with different function mappings
Consider $x \in \mathbf R$, where $\theta(x)$ is defined as $\theta : \mathbf R \to \mathbf R^n$, and $f(\theta)$
is defined as $f : \mathbf R^n \to \mathbf R$. That is, $f$ is a function of $\theta$, and $\theta$ is a function of $x$.
You may assume that $\theta$ is differentiable in $x$, and $f$ is differentiable in $\theta$.
I am trying to evaluate $\nabla_x f$, but am worried that my intuition is incorrect. I am wondering if it is correct to say that, using the chain rule, $$\nabla_x f = (\nabla_\theta f)^T \nabla_x \theta.$$ Is this valid?
calculus multivariable-calculus
edited Jul 16 at 20:32
asked Jul 16 at 19:04
Jonathan Tuck
3 Answers
It doesn't really make sense to talk about differentiating $f$ in both $x$ and $\theta$. Note that $\theta(x)$ is a function of a single variable, so the gradient notation $\nabla_x\theta$ doesn't make sense either; the ordinary derivative $\theta'(x)$ is what is meant.
Define a new function $g \colon \mathbb{R} \to \mathbb{R}$ given by $g(x) = f(\theta(x))$. Then by the chain rule,
$$g'(x_0) = \left.\nabla_\theta f\right|_{\theta(x_0)}^{\top} \theta'(x_0).$$
Spelled out completely,
$$g'(x_0) = \left.\frac{\partial f}{\partial \theta_1}\right|_{\theta(x_0)} \left.\frac{d\theta_1}{dx}\right|_{x_0} + \cdots + \left.\frac{\partial f}{\partial \theta_n}\right|_{\theta(x_0)} \left.\frac{d\theta_n}{dx}\right|_{x_0}$$
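As a quick sanity check, the formula for $g'(x_0)$ can be compared against a central finite difference. The sketch below is only an illustration: the particular choices of `theta` and `f` (with $n = 3$), the point `x0`, and the step size `h` are assumptions made up for this example and do not come from the question.

```python
import math

def theta(x):
    # Hypothetical example curve theta: R -> R^3
    return [math.sin(x), x * x, math.exp(x)]

def theta_prime(x):
    # Componentwise ordinary derivative of theta
    return [math.cos(x), 2.0 * x, math.exp(x)]

def f(t):
    # Hypothetical example function f: R^3 -> R
    return t[0] * t[1] + t[2] ** 2

def grad_f(t):
    # Gradient of f with respect to theta
    return [t[1], t[0], 2.0 * t[2]]

def g_prime_chain(x):
    # Chain rule: inner product of grad f at theta(x) with theta'(x)
    t = theta(x)
    return sum(df * dt for df, dt in zip(grad_f(t), theta_prime(x)))

def g_prime_numeric(x, h=1e-6):
    # Central difference approximation of g'(x), where g(x) = f(theta(x))
    return (f(theta(x + h)) - f(theta(x - h))) / (2.0 * h)

x0 = 0.7
chain_value = g_prime_chain(x0)
numeric_value = g_prime_numeric(x0)
print(chain_value, numeric_value)
```

The two printed values should agree up to the finite-difference error, which is what one would expect if the chain rule expression is right.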
answered Jul 16 at 19:13
Nitin
I believe you have the right idea, but your notation is a little bit confusing (e.g., although not technically wrong, $\nabla$ is conventionally reserved for gradients rather than ordinary derivatives).
Let's be a bit more careful.
Define $g:\mathbb{R}\rightarrow\mathbb{R}$ by
$$
g(x)\equiv f(\theta(x))\equiv f(\theta_1(x),\ldots,\theta_n(x)).
$$
What you are looking for is $g^\prime$, the derivative of $g$.
Apply the chain rule to get
$$
g^\prime(x)=\theta_1^\prime(x)f_{\theta_1}(\theta(x))+\cdots+\theta_n^\prime(x)f_{\theta_n}(\theta(x)).
$$
Or, more succinctly,
$$
g^\prime(x)=\left[\nabla_\theta f(\theta(x))\right]^\intercal\theta^\prime(x).
$$
Omitting the arguments, this looks like your expression $(\nabla_\theta f)^\intercal\theta^\prime$.
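To make this concrete, here is a small worked example; the specific functions are chosen purely for illustration and are not from the question. Take $n=2$, $\theta(x)=(x,x^2)$ and $f(\theta)=\theta_1\theta_2$, so that $g(x)=x\cdot x^2=x^3$. Then
$$
\nabla_\theta f(\theta(x))=\begin{bmatrix}\theta_2(x)\\ \theta_1(x)\end{bmatrix}=\begin{bmatrix}x^2\\ x\end{bmatrix},\qquad \theta^\prime(x)=\begin{bmatrix}1\\ 2x\end{bmatrix},
$$
and therefore
$$
g^\prime(x)=\left[\nabla_\theta f(\theta(x))\right]^\intercal\theta^\prime(x)=x^2\cdot 1+x\cdot 2x=3x^2,
$$
which agrees with differentiating $g(x)=x^3$ directly.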
answered Jul 16 at 19:29
parsiad
Correction: the left-hand side should be the ordinary derivative, $$\dfrac{df}{dx}= (\nabla_\theta f)^T \nabla_x \theta.$$
Indeed, we have $$df=\dfrac{\partial f}{\partial \theta_1}d\theta_1+\cdots+\dfrac{\partial f}{\partial \theta_n}d\theta_n$$
or, dividing by $dx$, $$\dfrac{df}{dx}=\dfrac{\partial f}{\partial \theta_1}\dfrac{d\theta_1}{dx}+\cdots+\dfrac{\partial f}{\partial \theta_n}\dfrac{d\theta_n}{dx}.$$
On the other hand, $$\nabla_x\theta=\left[\dfrac{d\theta_1}{dx}\quad\cdots\quad\dfrac{d\theta_n}{dx}\right]^T,$$
from which the relation we wanted to prove follows immediately.
answered Jul 16 at 19:44


Mostafa Ayaz