About the feasibility of a subgradient step

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
0
down vote

favorite












A vector $v in mathbbR^d$ is a subgradient of a function $fcolon mathbbR^d to mathbbR$ in a point $x in mathbbR^d$ if, for every $y in mathbbR^d$,



$$f(y) ge f(x) + langle x - y, v rangle. $$



It can be proved that every convex function has subgradients in every point of its domain. The subgradient descent method is a method for minimizing non differentiable convex functions that consists in, staring in $x_0 in mathbbR^d$, for each $t$, taking a subgradient $v$ of $f$ at $x_t$ , and updating $x$ by the rule $x_t+1 = x_t - eta v$.



I want to prove that for $eta$ small enough, the subgradient update decreases the value of $f$. I don't want to prove anything about convergence or convergence rates, I just want to see that if $0$ is not a subgradient of $f$ at $x$ and $v$ is a subgradient at the same point, then exists $varepsilon > 0$ so that $f(x - eta v) < f(x)$ for every $0 < eta < varepsilon$ (always considering $f$ is convex).



First of all, I thinked how can this be proved for the gradient descent method for differentiable functions in the general case. This can be done by taking the first order taylor approximation of $f$, in the points $x$ and $x - eta nabla f(x)$. From this is possible to bound the Lagrange remainder, for $eta$ small enough, by the gradient norm, and finally obtaining that $f(x - etanabla f(x)) - f(x) < 0$, obtaining the result.



After that I tried thinking about convex differentiable functions. In this case, we have that $f(x) ge f(y) + langle x - y, nabla f(y) rangle$, for every $x,y$ (in fact, in this case the gradient is the only subgradient of $f$ at every point). I was trying to prove the result I want from this hypothesis, to see then that it doesn't depend on the gradient, and any vector verifying the hypothesis (that is, the subgradients) would produce the result I want. But the only thing I could get from that expression is that
$$- eta |nabla f(x)|^2le f(x - eta nabla f(x)) - f(x) le eta |nabla f(x)|^2, $$
which doesn't provide information enough to achieve the result. Any ideas?







share|cite|improve this question























    up vote
    0
    down vote

    favorite












    A vector $v in mathbbR^d$ is a subgradient of a function $fcolon mathbbR^d to mathbbR$ in a point $x in mathbbR^d$ if, for every $y in mathbbR^d$,



    $$f(y) ge f(x) + langle x - y, v rangle. $$



    It can be proved that every convex function has subgradients in every point of its domain. The subgradient descent method is a method for minimizing non differentiable convex functions that consists in, staring in $x_0 in mathbbR^d$, for each $t$, taking a subgradient $v$ of $f$ at $x_t$ , and updating $x$ by the rule $x_t+1 = x_t - eta v$.



    I want to prove that for $eta$ small enough, the subgradient update decreases the value of $f$. I don't want to prove anything about convergence or convergence rates, I just want to see that if $0$ is not a subgradient of $f$ at $x$ and $v$ is a subgradient at the same point, then exists $varepsilon > 0$ so that $f(x - eta v) < f(x)$ for every $0 < eta < varepsilon$ (always considering $f$ is convex).



    First of all, I thinked how can this be proved for the gradient descent method for differentiable functions in the general case. This can be done by taking the first order taylor approximation of $f$, in the points $x$ and $x - eta nabla f(x)$. From this is possible to bound the Lagrange remainder, for $eta$ small enough, by the gradient norm, and finally obtaining that $f(x - etanabla f(x)) - f(x) < 0$, obtaining the result.



    After that I tried thinking about convex differentiable functions. In this case, we have that $f(x) ge f(y) + langle x - y, nabla f(y) rangle$, for every $x,y$ (in fact, in this case the gradient is the only subgradient of $f$ at every point). I was trying to prove the result I want from this hypothesis, to see then that it doesn't depend on the gradient, and any vector verifying the hypothesis (that is, the subgradients) would produce the result I want. But the only thing I could get from that expression is that
    $$- eta |nabla f(x)|^2le f(x - eta nabla f(x)) - f(x) le eta |nabla f(x)|^2, $$
    which doesn't provide information enough to achieve the result. Any ideas?







    share|cite|improve this question





















      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      A vector $v in mathbbR^d$ is a subgradient of a function $fcolon mathbbR^d to mathbbR$ in a point $x in mathbbR^d$ if, for every $y in mathbbR^d$,



      $$f(y) ge f(x) + langle x - y, v rangle. $$



      It can be proved that every convex function has subgradients in every point of its domain. The subgradient descent method is a method for minimizing non differentiable convex functions that consists in, staring in $x_0 in mathbbR^d$, for each $t$, taking a subgradient $v$ of $f$ at $x_t$ , and updating $x$ by the rule $x_t+1 = x_t - eta v$.



      I want to prove that for $eta$ small enough, the subgradient update decreases the value of $f$. I don't want to prove anything about convergence or convergence rates, I just want to see that if $0$ is not a subgradient of $f$ at $x$ and $v$ is a subgradient at the same point, then exists $varepsilon > 0$ so that $f(x - eta v) < f(x)$ for every $0 < eta < varepsilon$ (always considering $f$ is convex).



      First of all, I thinked how can this be proved for the gradient descent method for differentiable functions in the general case. This can be done by taking the first order taylor approximation of $f$, in the points $x$ and $x - eta nabla f(x)$. From this is possible to bound the Lagrange remainder, for $eta$ small enough, by the gradient norm, and finally obtaining that $f(x - etanabla f(x)) - f(x) < 0$, obtaining the result.



      After that I tried thinking about convex differentiable functions. In this case, we have that $f(x) ge f(y) + langle x - y, nabla f(y) rangle$, for every $x,y$ (in fact, in this case the gradient is the only subgradient of $f$ at every point). I was trying to prove the result I want from this hypothesis, to see then that it doesn't depend on the gradient, and any vector verifying the hypothesis (that is, the subgradients) would produce the result I want. But the only thing I could get from that expression is that
      $$- eta |nabla f(x)|^2le f(x - eta nabla f(x)) - f(x) le eta |nabla f(x)|^2, $$
      which doesn't provide information enough to achieve the result. Any ideas?







      share|cite|improve this question











      A vector $v in mathbbR^d$ is a subgradient of a function $fcolon mathbbR^d to mathbbR$ in a point $x in mathbbR^d$ if, for every $y in mathbbR^d$,



      $$f(y) ge f(x) + langle x - y, v rangle. $$



      It can be proved that every convex function has subgradients in every point of its domain. The subgradient descent method is a method for minimizing non differentiable convex functions that consists in, staring in $x_0 in mathbbR^d$, for each $t$, taking a subgradient $v$ of $f$ at $x_t$ , and updating $x$ by the rule $x_t+1 = x_t - eta v$.



      I want to prove that for $eta$ small enough, the subgradient update decreases the value of $f$. I don't want to prove anything about convergence or convergence rates, I just want to see that if $0$ is not a subgradient of $f$ at $x$ and $v$ is a subgradient at the same point, then exists $varepsilon > 0$ so that $f(x - eta v) < f(x)$ for every $0 < eta < varepsilon$ (always considering $f$ is convex).



      First of all, I thinked how can this be proved for the gradient descent method for differentiable functions in the general case. This can be done by taking the first order taylor approximation of $f$, in the points $x$ and $x - eta nabla f(x)$. From this is possible to bound the Lagrange remainder, for $eta$ small enough, by the gradient norm, and finally obtaining that $f(x - etanabla f(x)) - f(x) < 0$, obtaining the result.



      After that I tried thinking about convex differentiable functions. In this case, we have that $f(x) ge f(y) + langle x - y, nabla f(y) rangle$, for every $x,y$ (in fact, in this case the gradient is the only subgradient of $f$ at every point). I was trying to prove the result I want from this hypothesis, to see then that it doesn't depend on the gradient, and any vector verifying the hypothesis (that is, the subgradients) would produce the result I want. But the only thing I could get from that expression is that
      $$- eta |nabla f(x)|^2le f(x - eta nabla f(x)) - f(x) le eta |nabla f(x)|^2, $$
      which doesn't provide information enough to achieve the result. Any ideas?









      share|cite|improve this question










      share|cite|improve this question




      share|cite|improve this question









      asked Jul 20 at 15:11









      gtf

      205




      205

























          active

          oldest

          votes











          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "69"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2857730%2fabout-the-feasibility-of-a-subgradient-step%23new-answer', 'question_page');

          );

          Post as a guest



































          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes










           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2857730%2fabout-the-feasibility-of-a-subgradient-step%23new-answer', 'question_page');

          );

          Post as a guest













































































          Comments

          Popular posts from this blog

          What is the equation of a 3D cone with generalised tilt?

          Color the edges and diagonals of a regular polygon

          Relationship between determinant of matrix and determinant of adjoint?