Why does the projected gradient method work?
Consider the problem
\begin{align*}
\min_{x \in \mathbb{R}^n} &\quad f(x) \\
\text{s.t.:} &\quad x \in C,
\end{align*}
where $C$ is a convex set. As $C$ is convex (and closed), the projection onto $C$, $P_C$, is well defined for every $x \in \mathbb{R}^n$. The projected gradient method attempts to solve the above optimization problem by taking steps of the form $x_{t+1} = P_C[x_t - \eta \nabla f(x_t)]$.
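For concreteness, here is a minimal sketch of the iteration (the choice of $C$ as the Euclidean unit ball and the quadratic $f$ are my own example, not part of the question; the projection formula is specific to that set):

```python
import numpy as np

def project_unit_ball(x):
    """Euclidean projection onto C = {x : ||x|| <= 1}."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def projected_gradient(grad_f, project, x0, eta=0.1, iters=200):
    """Iterate x_{t+1} = P_C[x_t - eta * grad f(x_t)]."""
    x = x0
    for _ in range(iters):
        x = project(x - eta * grad_f(x))
    return x

# Example: minimize f(x) = ||x - a||^2 over the unit ball, with a outside the ball.
a = np.array([2.0, 1.0])
grad_f = lambda x: 2.0 * (x - a)
x_star = projected_gradient(grad_f, project_unit_ball, np.zeros(2))
print(x_star, a / np.linalg.norm(a))  # the iterates approach the boundary point a/||a||
```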
It is well known that for unconstrained problems the negative gradient is the direction of steepest descent, but why does the method still work after projecting? Namely, is there a result along the lines of the following?
"Exists $varepsilon > 0$ so that for $0 < delta < varepsilon$, we have $f(P_C[x - deltanabla f(x)]) le f(x)$"
As a bonus question: when there is more than one convex constraint (say $x \in C$ and $x \in D$, with both sets convex), there is an alternating projection method that consists of projecting repeatedly onto $C$ and then onto $D$. The sequence of projections converges to a point in $C \cap D$ if $C \cap D \ne \emptyset$, but that limit is not necessarily the projection of the initial point onto $C \cap D$. So why can alternating projections work together with gradient descent?
convex-optimization numerical-optimization gradient-descent projection
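To illustrate the bonus question numerically, here is a small sketch (the sets and the starting point are my own choice): with $C = [-1,1]^2$ a square contained in the disk $D$ of radius $2$, so that $C \cap D = C$, the limit of the alternating projections differs from the direct projection onto $C \cap D$.

```python
import numpy as np

def project_square(x, half=1.0):
    """Projection onto the square C = [-half, half]^2."""
    return np.clip(x, -half, half)

def project_disk(x, radius=2.0):
    """Projection onto the disk D of the given radius centred at the origin."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

x0 = np.array([3.0, 0.5])
x = x0.copy()
for _ in range(50):                       # alternate: project onto D, then onto C
    x = project_square(project_disk(x))

print(x)                                  # limit of alternating projections: ~[1.0, 0.329]
print(project_square(x0))                 # direct projection onto C ∩ D = C: [1.0, 0.5]
```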
The projected gradient method is a special case of the proximal gradient method, and you can find a convergence proof for the proximal gradient method in many places, for example in Vandenberghe's UCLA 236c course notes.
– littleO, Jul 19 at 14:41

Do you have an example where the repeated projection does not converge to $P_{C \cap D}[x]$?
– M. Winter, Jul 20 at 13:19

I think an example could be a square inside a circle in $\mathbb{R}^2$. Suppose the square's edges are aligned with the axes. Then the projection of a point onto the square is obtained by intersecting the vertical or horizontal line through the point with the square, if they intersect, or it is a corner point if they do not. But the projection onto the circle need not fall on the same vertical or horizontal line, and the intersection of the two sets is just the square.
– gtf, Jul 20 at 13:56
2 Answers
I think a rigorous proof of convergence is not so easy but can certainly be found in several books on the topic. I will give you a heuristic approach to see the plausibility of the algorithm. I concentrate on the first part of your question for now.
Heuristic arguments
- While going in the direction of the negative gradient $-\nabla f(x)$ leads to descent, this is not the only direction with this property. In fact, for every direction $d \in \Bbb R^n$ with $\langle d, \nabla f(x)\rangle < 0$ there is some $\epsilon > 0$ so that
$$f(x + \eta d) < f(x) \qquad \text{for all } 0 < \eta < \epsilon.$$
However, if $\langle d, \nabla f(x)\rangle \ge 0$, then this direction leads to ascent (or at least no first-order descent).
- I assume that by projection you mean
$$P_C[x] := \operatorname*{arg\,min}_{c \in C}\; d(x,c).$$
There is an insightful way to think about this projection: every $x \in \Bbb R^n$ can be written in a unique way as $x = c + v$, where $c \in C$ and $v \in N_C(c)$ lies in the normal cone (also called the polar cone) of $C$ at $c$. We then have $P_C[x] = c$. To put it another way:
$$P_C[x] = c \quad\Longleftrightarrow\quad x - c \in N_C(c).$$
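As a quick numerical illustration of this characterization, here is a sketch with $C$ a box, where both the projection and the normal cone are easy to write down (the box and the random test points are my own choice):

```python
import numpy as np

lo, hi = -1.0, 1.0                         # C = the box [lo, hi]^2

def project_box(x):
    return np.clip(x, lo, hi)

def in_normal_cone(v, c, tol=1e-12):
    """v ∈ N_C(c) for the box: v_i >= 0 where c_i = hi, v_i <= 0 where c_i = lo, v_i = 0 otherwise."""
    ok_hi = np.all((v <= tol) | (c >= hi - tol))
    ok_lo = np.all((v >= -tol) | (c <= lo + tol))
    interior = (c > lo + tol) & (c < hi - tol)
    return ok_hi and ok_lo and np.all(np.abs(v[interior]) <= tol)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(scale=3.0, size=2)
    c = project_box(x)
    assert in_normal_cone(x - c, c)        # x - P_C[x] ∈ N_C(P_C[x]) in every case
```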
- We now show that if $x_{t+1} = P_C[x_t - \eta \nabla f(x_t)]$ is different from the current point $x_t$, then it lies in a direction of descent. In other words, $x_{t+1} - x_t$ is a descent direction:
$$\langle x_{t+1} - x_t,\, \nabla f(x_t)\rangle < 0.$$
This is very evident from a picture, but it is a bit messy to prove. Also, this does not show that $f(x_{t+1}) < f(x_t)$, but it makes it plausible that this happens if $\eta$ is sufficiently small. So let us assume the opposite, that $x_{t+1}$ lies in a direction of ascent:
$$\langle x_{t+1} - x_t,\, \nabla f(x_t)\rangle \ge 0 \qquad\Longrightarrow\qquad (*)\;\; \langle x_t - x_{t+1},\, \eta\nabla f(x_t)\rangle \le 0,$$
where multiplying by $\eta > 0$ does not hurt the inequality. Because the unconstrained gradient descent step $x_t - \eta\nabla f(x_t)$ projects to $x_{t+1}$, it can be written as $x_{t+1} + v$ for some $v \in N_C(x_{t+1})$. Since $v$ is in the normal cone, this gives
$$\langle c - x_{t+1},\, v\rangle \le 0 \quad\text{for all } c \in C,$$
and taking $c = x_t$ (which lies in $C$) yields
$$(**)\;\; \langle x_t - x_{t+1},\, v\rangle \le 0.$$
Adding $(*)$ and $(**)$ we obtain
$$(\times)\;\; \langle x_t - x_{t+1},\, v + \eta\nabla f(x_t)\rangle \le 0.$$
But actually both factors of this inner product are equal, as is evident from
$$x_t - \eta\nabla f(x_t) = x_{t+1} + v \qquad\Longrightarrow\qquad x_t - x_{t+1} = v + \eta\nabla f(x_t).$$
So $(\times)$ gives us $\|x_t - x_{t+1}\|^2 \le 0$, hence $x_t = x_{t+1}$. Thus, the only way not to move in a descent direction is not to move at all.
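A small numerical sanity check of this claim, assuming a toy setup of my own choice ($C$ the unit ball and a quadratic $f$): whenever the projected step actually moves, the step direction makes a negative inner product with the gradient.

```python
import numpy as np

def project_ball(x, r=1.0):
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

a = np.array([2.0, 1.0])
grad_f = lambda x: 2.0 * (x - a)           # f(x) = ||x - a||^2
eta = 0.3

rng = np.random.default_rng(1)
for _ in range(1000):
    x_t = project_ball(2.0 * rng.normal(size=2))          # a feasible point x_t in C
    x_next = project_ball(x_t - eta * grad_f(x_t))
    if np.linalg.norm(x_next - x_t) > 1e-6:               # the step actually moves ...
        assert np.dot(x_next - x_t, grad_f(x_t)) < 0.0    # ... and it is a descent direction
```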
- It remains to show that not moving can only happen when there is no descent direction available, i.e. when we are at a local minimum. If $x_t - \eta\nabla f(x_t)$ projects back to $x_t$, then this means
\begin{align}
x_t - \eta\nabla f(x_t) - x_t \in N_C(x_t) \quad
&\Longrightarrow\quad -\nabla f(x_t) \in N_C(x_t) \qquad\text{(since $N_C(x_t)$ is a cone and $\eta > 0$)} \\
&\Longrightarrow\quad \langle c - x_t,\, \nabla f(x_t)\rangle \ge 0 \quad\text{for all } c \in C.
\end{align}
The last line means that moving in the direction of any point $c \in C$ (and in a convex set these are all the available directions) leads to ascent, or at least to no first-order descent, hence we are indeed at a local minimum.
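Continuing the same toy example as above (unit ball, quadratic $f$; my own choice, not from the answer), one can check this stationarity condition numerically at the constrained minimizer:

```python
import numpy as np

a = np.array([2.0, 1.0])
grad_f = lambda x: 2.0 * (x - a)           # f(x) = ||x - a||^2
x_star = a / np.linalg.norm(a)             # minimizer of f over the unit ball (a lies outside)

rng = np.random.default_rng(2)
for _ in range(1000):
    c = rng.normal(size=2)
    c = c / max(1.0, np.linalg.norm(c))    # sample a point of C (the unit ball)
    assert np.dot(c - x_star, grad_f(x_star)) >= -1e-12   # <c - x*, grad f(x*)> >= 0
```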
– M. Winter, answered Jul 20 at 11:23, edited Jul 20 at 13:16 (accepted answer)
There is a formal way to prove it; you can look at many convex optimization course notes to see the full proof.
My intuition for why it works is that projected gradient descent is itself a kind of alternating projection.
This becomes clear when you view it within the proximal gradient method (framework).
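A minimal sketch of that viewpoint (my own illustration, not taken from the answer): with $g$ the indicator function of $C$, the proximal operator of $\eta g$ is exactly the projection $P_C$, so a proximal gradient step reduces to a projected gradient step.

```python
import numpy as np

# Proximal gradient step: x_new = prox_{eta*g}(x - eta * grad_f(x)).
# If g is the indicator function of C (0 on C, +inf outside), then
# prox_{eta*g}(y) = argmin_u g(u) + ||u - y||^2 / (2*eta) = P_C[y],
# so the proximal gradient method specializes to projected gradient descent.

def prox_indicator_unit_ball(y):
    """prox of the indicator of the unit ball = Euclidean projection onto it."""
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

def proximal_gradient_step(x, grad_f, prox, eta):
    return prox(x - eta * grad_f(x))

# Example: one step for f(x) = ||x - a||^2 with a outside the ball.
a = np.array([2.0, 1.0])
step = proximal_gradient_step(np.zeros(2), lambda x: 2.0 * (x - a), prox_indicator_unit_ball, 0.5)
print(step)   # identical to a projected gradient step with the same eta
```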
– Royi, answered Jul 19 at 14:30