Sparse group lasso derivation of soft thresholding operator via subgradient equations

In the sparse group lasso paper [SGLpaper], the authors derive a condition for $\beta^{(k)}=0$ from the subgradient equation
$$ \frac{1}{n} X^{(k)T}\Big(y-\sum_{l=1}^m X^{(l)} \beta^{(l)}\Big) = (1-\alpha)\lambda u + \alpha \lambda v, $$
where $m$ is the number of groups. As in the linear regression model, $X$ is the $n \times p$ design matrix, $X^{(l)}$ is the submatrix of the design matrix for group $l$, an $n \times p_l$ matrix where $p_l$ is the number of covariates in group $l$, $y$ is the $n$-vector of responses, $\beta$ is the coefficient vector to be estimated, $\lambda \ge 0$ and $\alpha \in [0,1]$ are tuning parameters, and $u$ and $v$ are subgradients of $\|\beta^{(k)}\|_2$ and $\|\beta^{(k)}\|_1$, respectively, evaluated at $\beta^{(k)}$:
$$ u \in \{u : \|u\|_2 \le 1\} \quad \text{if } \beta^{(k)}=0, $$
$$ v_j \in \{v_j : |v_j| \le 1\} \quad \text{if } \beta_j^{(k)}=0. $$
They say: "With a little bit of algebra, we see that the subgradient equations are satisfied with $\beta^{(k)}=0$ if
$$ \big\| S\big(X^{(k)T} r_{(-k)}/n,\ \alpha\lambda\big) \big\|_2 \le (1-\alpha)\lambda, $$
where $r_{(-k)}$ is the partial residual of $y$, subtracting all group fits other than group $k$:
$$ r_{(-k)} = y - \sum_{l \ne k} X^{(l)} \beta^{(l)}, $$
and $S(\cdot)$ is the coordinate-wise soft-thresholding operator:
$$ \big(S(z, \alpha\lambda)\big)_j = \operatorname{sign}(z_j)\,\big(|z_j| - \alpha\lambda\big)_+ .$$"
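For concreteness, here is a minimal sketch of this check in Python (the function and variable names are my own, not from the paper):

```python
import numpy as np

def soft_threshold(z, t):
    """Coordinate-wise soft thresholding: sign(z_j) * (|z_j| - t)_+."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def group_can_be_zero(X_k, r_minus_k, n, alpha, lam):
    """The paper's condition for beta^(k) = 0:
    || S(X^(k)T r_(-k) / n, alpha*lambda) ||_2 <= (1 - alpha)*lambda."""
    z = X_k.T @ r_minus_k / n
    return np.linalg.norm(soft_threshold(z, alpha * lam)) <= (1 - alpha) * lam
```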



I tried to derive the same inequality, but I ended up with
$$ \Big\| \frac{1}{n} X^{(k)T} r_{(-k)} - \alpha\lambda\,[-1,1] \Big\|_2 \le (1-\alpha)\lambda $$

$$ \Big\| \,\big|\tfrac{1}{n} X^{(k)T} r_{(-k)}\big| - \alpha\lambda\, \Big\|_2 \le (1-\alpha)\lambda. $$
I could not figure out how the soft-thresholding operator arises from here.

I would really appreciate help with this question. Thanks very much.







asked Jul 21 at 21:15 by Pumpkin

1 Answer

You already found
$$ \Big\| \frac{1}{n} X^{(k)T} r_{(-k)} - \alpha\lambda\, v \Big\|_2 \le (1-\alpha)\lambda, \quad v_j \in [-1,1], \qquad (1) $$
where you can choose any value in $[-1,1]$ to satisfy the inequality, for each element of the vector separately. Let us see how you would select this value for the $j$th component. The goal is to satisfy inequality (1), so you want the component to be as close to $0$ as possible. The value you select depends on
$$ \frac{1}{n}\Big(X^{(k)T} r_{(-k)}\Big)_j. \qquad (2) $$
You select $1$ (from the interval $[-1,1]$) when (2) is larger than $\alpha\lambda$, $-1$ when (2) is smaller than $-\alpha\lambda$, and (2) divided by $\alpha\lambda$ when (2) is in $[-\alpha\lambda,\alpha\lambda]$. The last displayed formula in your question does not take the last case into account, while the $(\cdot)_+$ operator does.

When (2) is larger than $\alpha\lambda$, the $j$th component equals
$$ \frac{1}{n}\Big(X^{(k)T} r_{(-k)}\Big)_j - \alpha\lambda. $$
When (2) is smaller than $-\alpha\lambda$, the $j$th component equals
$$ \frac{1}{n}\Big(X^{(k)T} r_{(-k)}\Big)_j + \alpha\lambda. $$
When (2) is in $[-\alpha\lambda,\alpha\lambda]$, the $j$th component equals $0$. In each of these cases, it is easy to see that the component equals
$$ \operatorname{sign}\!\left(\frac{1}{n}\Big(X^{(k)T} r_{(-k)}\Big)_j\right)\left(\left|\frac{1}{n}\Big(X^{(k)T} r_{(-k)}\Big)_j\right| - \alpha\lambda\right)_+ . $$
Note that the sign operator can be omitted, since it does not affect the value of the 2-norm in (1).
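
A quick numerical check of this argument (a sketch; the variable names are my own): for each coordinate, the optimal choice of $v_j$ is $z_j/(\alpha\lambda)$ clipped to $[-1,1]$, and the resulting component of (1) coincides with the soft-thresholding formula.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=8)       # stands in for (1/n) X^(k)T r_(-k)
t = 0.5                      # stands in for alpha * lambda

# Optimal subgradient choice per coordinate: v_j = clip(z_j / t, -1, 1)
v = np.clip(z / t, -1.0, 1.0)
residual = z - t * v         # the vector inside the 2-norm in (1)

# Coordinate-wise soft thresholding S(z, t)
soft = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# The three cases collapse to the same formula, coordinate by coordinate
assert np.allclose(residual, soft)
print(np.linalg.norm(residual))  # equals ||S(z, t)||_2
```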






answered Jul 31 at 12:42 by LinAlg