Proof of why Lagrangian Multipliers work
























Suppose $\vec{x} \in \mathbb{R}^n$ and we wish to optimize $f(\vec{x})$ subject to $k$ constraints: $g_i(\vec{x}) = c_i \quad \forall i \in \{1, \dots, k\}$.



Now it is clear that at a local max/min, say $\vec{x}^*$, $\nabla f(\vec{x}^*)$ is orthogonal to any curve $r$ that satisfies the $g_i$ constraints (i.e. $\nabla f(\vec{x}^*) \cdot r'(\vec{x}^*) = 0$). $\quad\quad \textbf{(1)}$



Therefore, it is also clear to me that if $\nabla f(\vec{x}^*) = \sum_{i=1}^k \lambda_i \nabla g_i(\vec{x}^*)$ then $\textbf{(1)}$ will hold (since each of the $\nabla g_i(\vec{x}^*)$ is itself orthogonal to the curve). However, I am confused as to why the converse is true.



Why is it that if $\nabla f(\vec{x}^*) \cdot r'(\vec{x}^*) = 0$, $\nabla f(\vec{x}^*)$ MUST be in the span of the gradients of the $g_i$?
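For concreteness, condition $\textbf{(1)}$ and the multiplier equation can be checked symbolically on a toy problem. A minimal sympy sketch; the objective and constraint below are invented for illustration and are not from the question itself:

```python
# Toy instance of the question's setup: n = 2, k = 1.
# f and g are invented for illustration.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y                    # objective
g = x**2 + y**2 - 1          # constraint g = 0 (the unit circle)

# First-order conditions: grad f = lam * grad g, together with g = 0.
eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
print(sp.solve(eqs, [x, y, lam], dict=True))
# Two critical points (±1/√2, ±1/√2); at each, grad f is a scalar multiple
# of grad g, hence orthogonal to the tangent r'(t) of any curve in {g = 0}.
```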







asked Jul 15 at 2:31 by Richard Groenewald



















  • What do you mean by curve $r(x)$?
    – S.t.a.l.k.e.r
    Jul 15 at 5:29










  • "Curve" is probably the wrong word to use since I just mean something analogous to a space curve in 3D or a plane curve in 2D (as in $r(\vec{x}(t))$ is any parameterization that satisfies $g_i(r(\vec{x}(t))) = c_i$)
    – Richard Groenewald
    Jul 15 at 5:55










  • This is just duality: if $A, B$ are (closed) subspaces then $A \subset B$ iff $B^\perp \subset A^\perp$. (That is, if whenever $r$ is in the orthogonal complement of the $\nabla g_i(x)$, we have $r$ in the orthogonal complement of $\nabla f(x)$, then $\nabla f(x)$ is in the span of the constraint gradients.) (A numerical sketch of this duality appears after these comments.)
    – copper.hat
    Jul 15 at 6:06











  • This is the key to your answer: en.m.wikipedia.org/wiki/Implicit_function_theorem
    – S.t.a.l.k.e.r
    Jul 15 at 8:18
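copper.hat's duality comment can be checked numerically as well. A minimal numpy sketch, with random vectors standing in for the constraint gradients $\nabla g_i$ (all data here are invented):

```python
# Duality check: a vector orthogonal to the orthogonal complement of
# span{g1, g2} must lie in span{g1, g2}. Vectors are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 4))   # rows play the role of grad g_1, grad g_2 in R^4

# Orthonormal basis of the complement of the row span (the "tangent space"):
_, _, Vt = np.linalg.svd(G)
tangent = Vt[2:]                  # last n - k = 2 right-singular vectors

# Make v orthogonal to the tangent space by projecting out its tangent part.
v = rng.standard_normal(4)
v -= tangent.T @ (tangent @ v)    # now tangent @ v is (numerically) zero

# Duality: v must then be a combination of the rows of G.
coef, *_ = np.linalg.lstsq(G.T, v, rcond=None)
print(np.allclose(G.T @ coef, v))  # True: v = sum_i coef_i * grad g_i
```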
























1 Answer






























The Lagrange multiplier method is based on the implicit function theorem and the inverse function theorem.

I will use 3 variables and 1 constraint to demonstrate why it works; the idea can be extended to any finite number of variables and constraints.

Let's assume your functions have partial derivatives that allow application of the implicit function theorem, which they must if the Lagrange multiplier method is to be used:

$f(x_1, x_2, x_3)$

$g(x_1, x_2, x_3) = 0$

Because of the constraint you have $x_1 = h(x_2, x_3)$, so

$f(h, x_2, x_3)$

$g(h, x_2, x_3) = 0$

Stationary points occur at $\frac{df}{dx_1} = \frac{df}{dx_2} = \frac{df}{dx_3} = 0$:

$0 = \frac{df}{dx_2} = \frac{\partial f}{\partial x_2} + \frac{\partial f}{\partial h}\frac{\partial h}{\partial x_2}$

From the constraint, $0 = \frac{dg}{dx_2} = \frac{\partial g}{\partial x_2} + \frac{\partial g}{\partial h}\frac{\partial h}{\partial x_2}$

Now, combining the two equations,

$0 = \frac{\partial f}{\partial x_2} + \lambda \frac{\partial g}{\partial x_2}$

where $\lambda = -\frac{\partial f}{\partial h} \big/ \frac{\partial g}{\partial h}$.

For $x_3$ we have $0 = \frac{\partial f}{\partial x_3} + \lambda \frac{\partial g}{\partial x_3}$.

Using the same trick with $x_2(x_1, x_3)$ you should observe that

$\lambda = -\frac{\partial f}{\partial h} \big/ \frac{\partial g}{\partial h} = -\frac{\partial f}{\partial x_2} \big/ \frac{\partial g}{\partial x_2} = -\frac{\partial f}{\partial x_3} \big/ \frac{\partial g}{\partial x_3}$

and get

$0 = \frac{\partial f}{\partial x_1} + \lambda \frac{\partial g}{\partial x_1}$
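A minimal sympy check of this elimination argument, using an illustrative $f$ and $g$ chosen so that $h$ has a closed form (this example is not part of the original answer):

```python
# Verify the 3-variable, 1-constraint elimination argument symbolically.
# f and g are invented so that x1 = h(x2, x3) is available in closed form.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
f = x1 + x2 + x3                # objective
g = x1 + x2**2 + x3**2 - 1      # constraint: x1 = h(x2, x3) = 1 - x2^2 - x3^2

h = sp.solve(sp.Eq(g, 0), x1)[0]   # the h that the implicit function theorem gives
F = f.subs(x1, h)                  # f restricted to the constraint surface

# Reduced stationarity: dF/dx2 = dF/dx3 = 0, as in the answer.
crit = sp.solve([sp.diff(F, x2), sp.diff(F, x3)], [x2, x3], dict=True)[0]
pt = {x1: h.subs(crit), **crit}    # {x1: 1/2, x2: 1/2, x3: 1/2}

# The answer's multiplier: lam = -(df/dx1) / (dg/dx1) at the stationary point.
lam = (-sp.diff(f, x1) / sp.diff(g, x1)).subs(pt)   # -1 here

# All three Lagrange equations df/dxi + lam * dg/dxi = 0 then hold at pt:
for xi in (x1, x2, x3):
    print(sp.simplify((sp.diff(f, xi) + lam * sp.diff(g, xi)).subs(pt)))  # 0, 0, 0
```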



To extend the proof to a finite number of variables and constraints:

$f(x_1, \dots, x_n)$

Write $x_1(x_2, \dots, x_n)$, ..., $x_k(x_{k+1}, \dots, x_n)$,

where $k$ is the number of constraints. Write out the partial derivatives at the stationary point, set them to zero, and you end up with the equations you asked about in your question.
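And a numerical version of the general statement, with $n = 4$ variables and $k = 2$ constraints. The problem data are invented, and scipy's SLSQP stands in for "find the stationary point":

```python
# At a constrained optimum, grad f should lie in the span of the constraint
# gradients. Invented problem: min ||x||^2 over two affine equality
# constraints in R^4.
import numpy as np
from scipy.optimize import minimize

f      = lambda x: float(x @ x)
grad_f = lambda x: 2 * x
cons = [{'type': 'eq', 'fun': lambda x: x[0] + x[1] + x[2] + x[3] - 1},
        {'type': 'eq', 'fun': lambda x: x[0] - x[3] - 2}]

res = minimize(f, np.zeros(4), jac=grad_f, constraints=cons, method='SLSQP')

# Constraint gradients (constant, since the constraints are affine):
G = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, -1.0]])

# Solve grad f = sum_i lam_i * grad g_i by least squares; a near-zero
# residual confirms grad f is in the span, the converse direction asked about.
lam, *_ = np.linalg.lstsq(G.T, grad_f(res.x), rcond=None)
print(lam, np.linalg.norm(G.T @ lam - grad_f(res.x)))  # residual ~ 0
```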






answered Jul 15 at 7:22 (edited Jul 15 at 8:16) by S.t.a.l.k.e.r























  • I know how to implement the method; I was asking for an explanation of the proof. This is NOT helpful at all.
    – Richard Groenewald
    Jul 15 at 17:20







































