Using the chain rule for gradients with different function mappings

Consider $x \in \mathbf{R}$, $\theta(x)$ is defined as $\theta : \mathbf{R} \to \mathbf{R}^n$, and $f(\theta)$
defined as $f : \mathbf{R}^n \to \mathbf{R}$. That is, $f$ is a function of $\theta$, and $\theta$ is a function of $x$.

You may assume that $\theta$ is differentiable in $x$, and $f$ differentiable in $\theta$.

I am trying to evaluate $\nabla_x f$, but am worried that my intuition is incorrect. I am wondering if it is correct to say that, using the chain rule, $$\nabla_x f = (\nabla_\theta f)^T \nabla_x \theta.$$ Is this valid?

edited Jul 16 at 20:32

asked Jul 16 at 19:04

Jonathan Tuck


3 Answers

It doesn't really make sense to talk about differentiating $f$ in both $x$ and $\theta$. Note that $\theta(x)$ is a single-variable function, so $\nabla_x\theta$ doesn't make sense either.

Define a new function $g \colon \mathbb{R} \to \mathbb{R}$ given by $g(x) = f(\theta(x))$. Then by the chain rule,
$$g'(x_0) = \left.\nabla_\theta f\right|_{\theta(x_0)}^\top \theta'(x_0).$$
Spelled out completely,
$$g'(x_0) = \left.\frac{\partial f}{\partial \theta_1}\right|_{\theta(x_0)} \left.\frac{d\theta_1}{dx}\right|_{x_0} + \cdots + \left.\frac{\partial f}{\partial \theta_n}\right|_{\theta(x_0)} \left.\frac{d\theta_n}{dx}\right|_{x_0}$$
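
As a sanity check, the chain-rule formula above can be compared numerically with a finite-difference estimate of $g'(x_0)$. The sketch below is only an illustration: the particular `theta` and `f` (with $n = 3$) are hypothetical choices made up for the check and are not part of the question.

```python
import math

def theta(x):
    # Hypothetical curve theta: R -> R^3, chosen only for illustration.
    return [math.sin(x), x * x, math.exp(x)]

def theta_prime(x):
    # Componentwise ordinary derivative of theta.
    return [math.cos(x), 2.0 * x, math.exp(x)]

def f(t):
    # Hypothetical scalar function f: R^3 -> R.
    return t[0] * t[1] + t[2] ** 2

def grad_f(t):
    # Gradient of f with respect to theta.
    return [t[1], t[0], 2.0 * t[2]]

def g(x):
    # The composition g(x) = f(theta(x)).
    return f(theta(x))

def g_prime_chain(x0):
    # Chain rule: gradient of f evaluated at theta(x0), dotted with theta'(x0).
    grad = grad_f(theta(x0))
    return sum(gi * ti for gi, ti in zip(grad, theta_prime(x0)))

def g_prime_numeric(x0, h=1e-6):
    # Central finite difference applied directly to g.
    return (g(x0 + h) - g(x0 - h)) / (2.0 * h)

x0 = 0.7
print(g_prime_chain(x0), g_prime_numeric(x0))
```

The two printed values should agree to several decimal places. Note that the gradient is evaluated at $\theta(x_0)$, not at $x_0$.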

answered Jul 16 at 19:13

Nitin


I believe you have the right idea, but your notation is a little bit confusing (e.g., while not technically wrong, $\nabla$ is usually reserved for gradients, not ordinary derivatives).

Let's be a bit more careful.
Define $g:\mathbb{R}\rightarrow\mathbb{R}$ by
$$
g(x)\equiv f(\theta(x))\equiv f(\theta_1(x),\ldots,\theta_n(x)).
$$
What you are looking for is $g^\prime$, the derivative of $g$.
Apply the chain rule to get
$$
g^\prime(x)=\theta_1^\prime(x)f_{\theta_1}(\theta(x))+\cdots+\theta_n^\prime(x)f_{\theta_n}(\theta(x)).
$$
Or, more succinctly,
$$
g^\prime(x)=\left[\nabla_\theta f(\theta(x))\right]^\intercal\theta^\prime(x).
$$
Omitting the arguments, this looks like your expression $(\nabla_\theta f)^\intercal\theta^\prime$.
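
A small worked example may make this concrete. The choices $n=2$, $\theta(x)=(x,x^2)$ and $f(\theta)=\theta_1\theta_2$ are assumptions picked only for illustration. Then $g(x)=x\cdot x^2=x^3$, so directly $g^\prime(x)=3x^2$. Via the chain rule,
$$
\nabla_\theta f(\theta(x))=\begin{bmatrix}\theta_2(x)\\ \theta_1(x)\end{bmatrix}=\begin{bmatrix}x^2\\ x\end{bmatrix},\qquad
\theta^\prime(x)=\begin{bmatrix}1\\ 2x\end{bmatrix},
$$
$$
\left[\nabla_\theta f(\theta(x))\right]^\intercal\theta^\prime(x)=x^2\cdot 1+x\cdot 2x=3x^2,
$$
which agrees with the direct computation.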

answered Jul 16 at 19:29

parsiad


Correction: $$\dfrac{df}{dx}= (\nabla_\theta f)^T \nabla_x \theta$$ We have $$df=\dfrac{\partial f}{\partial \theta_1}d\theta_1+\cdots+\dfrac{\partial f}{\partial \theta_n}d\theta_n$$ or $${df\over dx}=\dfrac{\partial f}{\partial \theta_1}{d\theta_1\over dx}+\cdots+\dfrac{\partial f}{\partial \theta_n}{d\theta_n\over dx}.$$ On the other hand, $$\nabla_x\theta=\left[{d\theta_1\over dx}\quad\cdots\quad{d\theta_n\over dx}\right]^T,$$ from which the relation we wanted to prove follows immediately.
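
The differential form $df=\sum_i \frac{\partial f}{\partial\theta_i}\,d\theta_i$ can also be checked numerically: for a small step $dx$, it should match the actual change in $f(\theta(x))$ up to second-order terms. This is a minimal sketch with hypothetical choices $\theta(x)=(x,x^2)$ and $f(\theta)=\theta_1\theta_2$, which are not taken from the question.

```python
def theta(x):
    # Hypothetical theta: R -> R^2, for illustration only.
    return [x, x * x]

def theta_prime(x):
    # Componentwise derivative d(theta_i)/dx.
    return [1.0, 2.0 * x]

def f(t):
    # Hypothetical f: R^2 -> R.
    return t[0] * t[1]

def grad_f(t):
    # Partial derivatives of f with respect to theta_1 and theta_2.
    return [t[1], t[0]]

def total_differential(x, dx):
    # df = sum_i (df/dtheta_i) * dtheta_i, where dtheta_i = theta_i'(x) * dx.
    d_theta = [tp * dx for tp in theta_prime(x)]
    return sum(p * d for p, d in zip(grad_f(theta(x)), d_theta))

x, dx = 1.5, 1e-5
actual_change = f(theta(x + dx)) - f(theta(x))
print(total_differential(x, dx), actual_change)
```

Dividing `total_differential(x, dx)` by `dx` gives the derivative $df/dx$, which for these choices is $3x^2$.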

answered Jul 16 at 19:44

Mostafa Ayaz
