Sparse group lasso derivation of soft thresholding operator via subgradient equations
In the sparse group lasso paper (SGLpaper), they derive a condition for $\beta^{(k)}=0$ from the equation:
$$ \frac{1}{n} X^{(k)T}\Big(y - \sum_{l=1}^{m} X^{(l)} \beta^{(l)}\Big) = (1-\alpha)\lambda\, u + \alpha \lambda\, v, $$
where $m$ is the number of groups. As in the linear regression model, $X$ is the $n \times p$ design matrix and $X^{(l)}$ is the submatrix of the design matrix for group $l$, an $n \times p_l$ matrix, where $p_l$ is the number of covariates in group $l$; $y$ is the $n$-vector of responses; $\beta$ is the vector of coefficients to be estimated; and $\alpha \in [0,1]$ and $\lambda \geq 0$ are tuning parameters. Here $u$ and $v$ are subgradients of $\|\beta^{(k)}\|_2$ and $\|\beta^{(k)}\|_1$ respectively, evaluated at $\beta^{(k)}$:
$$ u \in \{u : \|u\|_2 \leq 1\} \quad \text{if } \beta^{(k)} = 0, $$
$$ v_j \in \{v_j : |v_j| \leq 1\} \quad \text{if } \beta^{(k)}_j = 0. $$
They say that, with a little bit of algebra, the subgradient equations are satisfied with $\beta^{(k)}=0$ if
$$ \left\| S\!\left(\tfrac{1}{n} X^{(k)T} r_{(-k)},\, \alpha \lambda\right) \right\|_2 \leq (1-\alpha)\lambda, $$
where $r_{(-k)}$ is the partial residual of $y$, subtracting all group fits other than that of group $k$:
$$ r_{(-k)} = y - \sum_{l \neq k} X^{(l)} \beta^{(l)}, $$
and with $S(\cdot)$ the coordinate-wise soft-thresholding operator:
$$ \big(S(z, \alpha \lambda)\big)_j = \operatorname{sign}(z_j)\,(|z_j| - \alpha \lambda)_+ . $$
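For concreteness, here is how I read this condition numerically, as a small NumPy sketch of my own (the data and variable names are made up, not from the paper):

```python
import numpy as np

# Coordinate-wise soft-thresholding operator: (S(z, t))_j = sign(z_j) * (|z_j| - t)_+
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Made-up data just to exercise the formulas.
rng = np.random.default_rng(0)
n, p_k = 50, 4
alpha, lam = 0.5, 0.3
X_k = rng.standard_normal((n, p_k))   # columns of the design matrix for group k
r_minus_k = rng.standard_normal(n)    # partial residual r_(-k)

z = X_k.T @ r_minus_k / n
# The group-level check: beta^(k) = 0 satisfies the subgradient equations iff this holds.
group_is_zero = np.linalg.norm(soft_threshold(z, alpha * lam)) <= (1 - alpha) * lam
print(group_is_zero)
```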
I tried to derive the same inequality, but I ended up with these:
$$ \left\| \frac{1}{n} X^{(k)T} r_{(-k)} - \alpha \lambda \, [-1,1] \right\|_2 \leq (1-\alpha)\lambda, $$
$$ \left\| \, \left| \frac{1}{n} X^{(k)T} r_{(-k)} \right| - \alpha \lambda \, \right\|_2 \leq (1-\alpha)\lambda. $$
But I couldn't figure out how to arrive at the soft-thresholding operator from here. I would really appreciate help with this. Thanks very much.
convex-optimization linear-regression subgradient
asked Jul 21 at 21:15
Pumpkin
1 Answer
You already found
$$ \left\| \frac{1}{n} X^{(k)T} r_{(-k)} - \alpha \lambda \, [-1,1] \right\|_2 \leq (1-\alpha)\lambda, \qquad (1) $$
where, for each element of the vector separately, you can choose any value in $[-1,1]$ to satisfy the inequality. Let us see how you would select this value for the $j$-th component. The goal is to satisfy inequality (1), so you want the component to be close to $0$. The value you select depends on
$$ \frac{1}{n}\left(X^{(k)T} r_{(-k)}\right)_j \qquad (2) $$
You select $1$ (from the interval $[-1,1]$) when (2) is larger than $\alpha\lambda$, $-1$ when (2) is smaller than $-\alpha\lambda$, and (2) divided by $\alpha\lambda$ when (2) is in $[-\alpha\lambda,\alpha\lambda]$. The last displayed formula in your question does not take the last case into account, while the $(\cdot)_+$ operator does.
When (2) is larger than $\alpha\lambda$, the $j$-th component equals
$$ \frac{1}{n}\left(X^{(k)T} r_{(-k)}\right)_j - \alpha\lambda. $$
When (2) is smaller than $-\alpha\lambda$, the $j$-th component equals
$$ \frac{1}{n}\left(X^{(k)T} r_{(-k)}\right)_j + \alpha\lambda. $$
When (2) is in $[-\alpha\lambda,\alpha\lambda]$, the $j$-th component equals $0$. In each of these cases, it is easy to see that the component equals
$$ \operatorname{sign}\!\left(\frac{1}{n}\left(X^{(k)T} r_{(-k)}\right)_j\right)\left(\left|\frac{1}{n}\left(X^{(k)T} r_{(-k)}\right)_j\right| - \alpha \lambda\right)_+. $$
Note that the sign operator can be omitted, since it does not affect the value of the 2-norm in (1).
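For what it is worth, here is a quick numerical check of this case analysis, as a sketch of my own with made-up numbers (not part of the paper): minimizing each coordinate $|z_j - \alpha\lambda\, v_j|$ over $v_j \in [-1,1]$ leaves exactly the soft-thresholded magnitude $(|z_j| - \alpha\lambda)_+$.

```python
import numpy as np

# Sanity check (my own, with made-up numbers): for each coordinate, the best
# choice of v_j in [-1, 1] reduces |z_j - t*v_j| to (|z_j| - t)_+, which is the
# magnitude of the soft-thresholded coordinate. Here t plays the role of alpha*lambda.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(1)
z = rng.standard_normal(6)
t = 0.4

# Brute-force minimization over a fine grid of v_j in [-1, 1].
v_grid = np.linspace(-1.0, 1.0, 20001)
best = np.array([np.abs(zj - t * v_grid).min() for zj in z])

print(np.allclose(best, np.abs(soft_threshold(z, t)), atol=1e-4))  # expected: True
```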
answered Jul 31 at 12:42
LinAlg