Why does the projected gradient method work?
























Consider the problem
\begin{align*}
\min_{x \in \mathbb{R}^n} &\quad f(x) \\
\text{s.t.:} &\quad x \in C,
\end{align*}
where $C$ is a convex set. As $C$ is convex, the projection onto $C$, $P_C$, is well defined for every $x \in \mathbb{R}^n$. The projected gradient method proposes to solve the above optimization problem by taking steps of the form $x_{t+1} = P_C[x_t - \eta \nabla f(x_t)]$.
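To make the iteration concrete, here is a minimal Python sketch of the update $x_{t+1} = P_C[x_t - \eta \nabla f(x_t)]$, assuming for illustration that $C$ is a Euclidean ball (so the projection has a closed form); the objective, step size, and radius below are arbitrary choices, not part of the question.

```python
import numpy as np

def project_ball(x, r=1.0):
    """Euclidean projection onto C = {x : ||x|| <= r}."""
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

def projected_gradient(grad_f, x0, eta=0.1, r=1.0, iters=200):
    """Iterate x_{t+1} = P_C[x_t - eta * grad_f(x_t)] with C the r-ball."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project_ball(x - eta * grad_f(x), r)
    return x

# Example: minimize f(x) = ||x - b||^2 over the unit ball, with b outside it.
b = np.array([2.0, 1.0])
grad_f = lambda x: 2.0 * (x - b)
print(projected_gradient(grad_f, x0=np.zeros(2)))   # ~ b/||b|| = P_C[b]
```

For this particular $f$ the constrained minimizer is exactly $P_C[b]$, which makes the result easy to check by hand.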



It is well known that for unconstrained problems the negative gradient is the direction of steepest descent, but why does the method still work after projecting? Namely, is there a result along the following lines?



"Exists $varepsilon > 0$ so that for $0 < delta < varepsilon$, we have $f(P_C[x - deltanabla f(x)]) le f(x)$"



As a bonus question, when there is more than one convex constraint (say $x \in C$ and $x \in D$, with both sets convex), there is an alternating projection method that consists of projecting repeatedly onto $C$ and then onto $D$. The sequence of projections converges to a point in $C \cap D$, if $C \cap D \ne \emptyset$. But that limit is not necessarily the projection of the initial point onto $C \cap D$, so why can alternating projections work with gradient descent?
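For the bonus part, here is a small sketch of alternating projections between two convex sets chosen purely for illustration (the unit ball $C$ and a half-space $D$; neither the sets nor the starting point come from the question). The limit lands in $C \cap D$, but the final check exhibits a feasible point strictly closer to the start, so the limit is not $P_{C\cap D}[x_0]$.

```python
import numpy as np

def proj_ball(x, r=1.0):
    """Projection onto C = {x : ||x|| <= r}."""
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

def proj_halfspace(x, a=0.5):
    """Projection onto D = {x : x[0] >= a}."""
    y = x.copy()
    y[0] = max(y[0], a)
    return y

x0 = np.array([-2.0, 1.0])
x = x0.copy()
for _ in range(50):                      # alternate P_C and P_D
    x = proj_halfspace(proj_ball(x))

print(np.linalg.norm(x) <= 1 + 1e-9, x[0] >= 0.5)   # limit lies in C ∩ D
# ...but it is not P_{C∩D}[x0]: the feasible point (0.5, sqrt(0.75))
# is strictly closer to x0 than the limit is.
cand = np.array([0.5, np.sqrt(0.75)])
print(np.linalg.norm(cand - x0) < np.linalg.norm(x - x0))   # True
```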







asked Jul 19 at 12:14 by gtf, edited Jul 19 at 16:22 by Rodrigo de Azevedo







  • The projected gradient method is a special case of the proximal gradient method, and you can find a convergence proof for the proximal gradient method in many places, for example in Vandenberghe's UCLA 236c course notes.
    – littleO
    Jul 19 at 14:41










  • Do you have an example where the repeated projection does not converge to $P_{C\cap D}[x]$?
    – M. Winter
    Jul 20 at 13:19











  • I think that an example can be a square inside a circle in $\mathbb{R}^2$. Supposing the square's edges are aligned with the axes, the projection of a point onto the square is found by intersecting the vertical or horizontal line through the point with the square, if they intersect, or is the nearest corner of the square, if they don't. But the projection onto the circle may not fall on the same vertical or horizontal line, while the intersection of both sets is just the square, so composing the two projections need not give the projection onto the intersection.
    – gtf
    Jul 20 at 13:56
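A quick numerical check of the square-inside-a-circle example from the last comment (the concrete square $[-1,1]^2$, circle of radius $2$, and test point are choices made here for illustration, not taken from the thread):

```python
import numpy as np

def proj_square(x):            # projection onto C = [-1, 1]^2
    return np.clip(x, -1.0, 1.0)

def proj_circle(x, r=2.0):     # projection onto D = disc of radius 2
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

x0 = np.array([3.0, 0.5])
via_circle = proj_square(proj_circle(x0))   # project onto D, then onto C
direct     = proj_square(x0)                # P_{C∩D}[x0], since C∩D = C here

print(via_circle)   # ~ [1.0, 0.329]
print(direct)       # [1.0, 0.5]  -- different, as the comment predicts
```

Since the square lies inside the circle, $C \cap D$ is the square itself, so the direct clip is the true projection onto the intersection, and the composed projections land somewhere else.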












2 Answers






I think a rigorous proof of convergence is not so easy but can certainly be found in several books on the topic. I will give you a heuristic approach to see the plausibility of the algorithm. I concentrate on the first part of your question for now.



Heuristic arguments



  • While moving in the direction of the negative gradient $-\nabla f(x)$ leads to descent, this is not the only direction with this property. Actually, for every direction $d\in\Bbb R^n$ with $\langle d,\nabla f(x)\rangle < 0$ there is some $\epsilon > 0$ so that
    $$f(x+\eta d)< f(x)\qquad\text{for all } 0<\eta<\epsilon,$$
    which follows from the first-order expansion $f(x+\eta d)=f(x)+\eta\langle d,\nabla f(x)\rangle+o(\eta)$. However, if $\langle d,\nabla f(x)\rangle\ge 0$, then this direction leads to ascent (or at least no descent) to first order.



  • I assume by projection you mean
    $$P_C[x] := \mathop{\mathrm{argmin}}\limits_{c\in C}\, d(x,c).$$
    There is an insightful way to find this projection: every $x\in\Bbb R^n$ can be written as $x=c+v$ in a unique way, where $c\in C$ and $v\in N_C(c)$ is from the normal cone (also called polar cone) of $C$ at $c$. We then have $P_C[x]=c$. Or to put it another way:
    $$P_C[x]=c\quad\Longleftrightarrow\quad x-c\in N_C(c).$$



  • We now show that if $x_{t+1}=P_C[x_t-\eta \nabla f(x_t)]$ is different from the current point $x_t$, then it lies in a direction of descent. In other words: $x_{t+1}-x_t$ is a direction of descent, or
    $$\langle x_{t+1}-x_t,\nabla f(x_t)\rangle< 0.$$
    This is very evident from a picture, but it's a bit messy to prove. Also, this does not show that $f(x_{t+1})<f(x_t)$, but it makes it plausible that this will happen if $\eta$ is sufficiently small (a numerical sanity check is sketched after this list). So let's assume the opposite, that $x_{t+1}$ lies in a direction of ascent:
    $$\langle x_{t+1}-x_t,\nabla f(x_t)\rangle\ge 0\qquad\implies\qquad (*)\;\; \langle x_t-x_{t+1},\eta\nabla f(x_t)\rangle\le 0,$$
    where introducing the factor $\eta>0$ does not hurt the inequality. Because the unconstrained gradient descent step $x_t - \eta\nabla f(x_t)$ projects to $x_{t+1}$, it can be written as $x_{t+1}+v$ for some $v\in N_C(x_{t+1})$. Since $v$ is in the normal cone, this gives
    $$\langle c-x_{t+1},v\rangle\le 0\;\text{ for all } c\in C\qquad\implies\qquad (**)\;\; \langle x_t-x_{t+1},v\rangle\le 0,$$
    using that $x_t\in C$. By adding $(*)$ and $(**)$ we obtain
    $$(\times)\;\; \langle x_t-x_{t+1},\,v+\eta\nabla f(x_t)\rangle \le 0.$$
    But actually both factors of this inner product are equal, as is evident from
    $$x_t-\eta\nabla f(x_t) = x_{t+1}+v\qquad\implies\qquad x_t-x_{t+1} = v+\eta\nabla f(x_t).$$
    So $(\times)$ gives us $\|x_t-x_{t+1}\|^2\le 0\implies x_t=x_{t+1}$. Thus, the only way to not make a move in a descent direction is to not move at all.



  • It remains to show that not moving only happens if there is no descent direction available, hence that we are in a local minimum. If $x_t-\eta\nabla f(x_t)$ projects to $x_t$ again, then this means
    \begin{align}
    x_t-\eta\nabla f(x_t)-x_t\in N_C(x_t) \quad
    &\implies\quad -\nabla f(x_t)\in N_C(x_t) \\
    &\implies\quad \langle c-x_t,\nabla f(x_t)\rangle \ge 0\;\text{ for all } c\in C,
    \end{align}
    where the first implication uses that $N_C(x_t)$ is a cone, so the factor $\eta>0$ can be dropped. The last line means that moving in the direction of any point $c\in C$ (and in a convex set these are all the available directions) will lead to ascent, or at least not to first-order descent, hence that we are indeed in a local minimum of the constrained problem.
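As a sanity check of the bullets above, here is a minimal numerical sketch (not part of the original answer; the quadratic objective, box constraint, step size, and sampling are arbitrary choices). It takes one projected gradient step on a quadratic over a box and verifies the normal-cone characterization of the projection, the descent-direction claim, and the first-order optimality condition at the limit point, tested on randomly sampled $c \in C$.

```python
import numpy as np

rng = np.random.default_rng(0)

# C = [0, 1]^2 (a box), f(x) = ||x - b||^2 with b outside the box.
b = np.array([2.0, 0.5])
f = lambda x: np.sum((x - b) ** 2)
grad_f = lambda x: 2.0 * (x - b)
proj = lambda x: np.clip(x, 0.0, 1.0)       # projection onto C

eta = 0.2
x = np.array([0.9, 0.9])                     # current (feasible) iterate
y = x - eta * grad_f(x)                      # unconstrained gradient step
x_next = proj(y)                             # projected gradient step
v = y - x_next                               # should lie in N_C(x_next)

C_samples = rng.random((1000, 2))            # random points of C

# (i) normal cone: <c - x_next, v> <= 0 for sampled c in C
print(np.all(C_samples @ v - x_next @ v <= 1e-12))

# (ii) x_next - x is a descent direction, and f decreases for this eta
print(np.dot(x_next - x, grad_f(x)) < 0, f(x_next) < f(x))

# (iii) run to (approximate) convergence and test first-order optimality
for _ in range(500):
    x = proj(x - eta * grad_f(x))
print(x)                                      # ~ [1.0, 0.5], the constrained minimizer
print(np.all((C_samples - x) @ grad_f(x) >= -1e-9))
```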





– M. Winter, answered Jul 20 at 11:23, edited Jul 20 at 13:16 (accepted answer)
There is a formal way to prove it; you can look at many convex optimization course notes to see the full proof.

In my intuition, the reason it works is that projected gradient descent is actually an alternating projection scheme.

This becomes clear when you view it through the proximal gradient method (framework); see the sketch below.
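To spell out the proximal-gradient view mentioned above: with $g = \iota_C$, the indicator function of $C$, the proximal operator of $g$ is exactly the projection $P_C$, so the proximal gradient iteration $x_{t+1} = \operatorname{prox}_{\eta g}(x_t - \eta\nabla f(x_t))$ reduces to projected gradient descent. A minimal sketch (the quadratic, box, and step size are illustrative choices, not from the answer):

```python
import numpy as np

# Proximal gradient step: x+ = prox_{eta*g}(x - eta * grad_f(x)).
# When g is the indicator function of a convex set C, prox_{eta*g} = P_C,
# so the iteration below is exactly projected gradient descent.

def prox_indicator_box(x, lo=0.0, hi=1.0):
    # prox of the indicator of the box [lo, hi]^n = Euclidean projection onto it
    return np.clip(x, lo, hi)

def proximal_gradient(grad_f, prox_g, x0, eta=0.2, iters=300):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = prox_g(x - eta * grad_f(x))
    return x

b = np.array([2.0, 0.5])
grad_f = lambda x: 2.0 * (x - b)             # f(x) = ||x - b||^2
x_star = proximal_gradient(grad_f, prox_indicator_box, np.zeros(2))
print(x_star)                                 # ~ [1.0, 0.5], the minimizer over the box
```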






– Royi, answered Jul 19 at 14:30