Study the convexity of Mean Squared Error with regularization
I want to study the convexity of the regularized Mean Squared Error loss function. I am using an artificial neural network to compute the output.
$$E(w) = \mathrm{MSE}(w) = \frac{1}{|D|}\sum_{d \in D} E_d(w)$$
$E_d$ is the error for a pattern $d$, defined as
$$E_d(w)=\frac{1}{2}\sum_{j \in \text{outputLayer}} \big(t_j-o_j(w)\big)^2 + \frac{1}{2}\lambda\|w\|^2$$
where $t_j$ and $o_j$ are respectively the target and the output of the $j^{\text{th}}$ output neuron of the network, $w$ is the vector of parameters, and $\lambda > 0$ is the regularization coefficient.
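To make the notation concrete, here is a minimal sketch of how $E(w)$ could be evaluated for a toy one-layer sigmoid network (the network shape, parameter values, and data below are made up purely for illustration):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def regularized_mse(W, b, patterns, lam):
    """E(w): average over patterns of squared error plus (1/2) * lambda * ||w||^2."""
    total = 0.0
    for x, t in patterns:
        o = sigmoid(W @ x + b)                               # network output o_j(w)
        err = 0.5 * np.sum((t - o) ** 2)                     # (1/2) sum_j (t_j - o_j)^2
        reg = 0.5 * lam * (np.sum(W ** 2) + np.sum(b ** 2))  # (1/2) lambda ||w||^2
        total += err + reg                                   # E_d(w)
    return total / len(patterns)                             # average over |D| patterns

# toy data: 2 inputs, 1 output neuron, two patterns
W = np.array([[0.3, -0.7]])
b = np.array([0.1])
data = [(np.array([1.0, 2.0]), np.array([0.0])),
        (np.array([0.5, -1.0]), np.array([1.0]))]
print(regularized_mse(W, b, data, lam=0.01))
```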
Convexity study
Intuitively, the function is neither convex nor concave, since it has several local minima. In detail:
- $E_d(w)$ is a nonnegative weighted sum of two terms, and such a sum is convex if both terms are convex.
- The second term, $\frac{1}{2}\lambda\|w\|^2$, is convex, since the squared Euclidean norm is convex and $\lambda$ is positive.
- The first term, $\frac{1}{2}\sum_{j \in \text{outputLayer}} (t_j-o_j(w))^2$, is neither convex nor concave because of the neural network.
For example, the first layer of the network computes
$$g(x) = f(W x + b)$$
where $W \in \mathbb{R}^{m \times n}$ is the first weight matrix, $x \in \mathbb{R}^n$ is the network input, $b \in \mathbb{R}^m$ is the bias and $f: \mathbb{R}^m \rightarrow \mathbb{R}^m$ is the element-wise activation function.
This is where I stop my analysis, concluding that the whole function is neither convex nor concave, depending on the activation function:
- if $f$ is the identity, then each component of $g$ is affine, hence convex (and concave);
- if $f = \mathrm{ReLU}$, then each component of $g$ is convex;
- if $f$ is the sigmoid, then each component of $g$ is convex where the pre-activation is negative and concave where it is positive;
- if $f = \tanh$, then likewise each component of $g$ is convex where the pre-activation is negative and concave where it is positive.
I don't believe this analysis is sufficient. How can I improve it? (A quick numerical check of the sigmoid claim is sketched below.)
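As a sanity check of the sigmoid claim, one can test the midpoint inequality numerically. This is just a sketch for the scalar case ($m = n = 1$, $W = 1$, $b = 0$), with test points chosen arbitrarily:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def midpoint_convexity_holds(f, a, b):
    """Necessary condition for convexity on [a, b]: f((a+b)/2) <= (f(a)+f(b))/2."""
    return f((a + b) / 2) <= (f(a) + f(b)) / 2

# g(x) = sigmoid(1 * x + 0), i.e. the scalar case W = 1, b = 0
print(midpoint_convexity_holds(sigmoid, -4.0, 0.0))  # True: consistent with convexity on the negative side
print(midpoint_convexity_holds(sigmoid,  0.0, 4.0))  # False: convexity fails on the positive side
```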
convex-analysis convex-optimization neural-networks non-convex-optimization
asked Jul 22 at 10:35 by ThanksBye
It's not clear exactly what you're trying to prove. As you note, in general, the (regularized) error function is non-convex--in other words, there exist activation functions $g$ which make $E$ non-convex. However, there do exist activation functions $g$ which make $E$ convex (for example, if $gequiv0$). So, what are you trying to show?
– David M.
Jul 22 at 16:35
The goal is to know whether $E$ is convex, so that I know whether certain convergence properties of the learning algorithm I am using hold. Is my analysis complete enough to conclude that if I use ReLU as the activation function then $E$ is convex, whereas for Sigmoid/Tanh it is nonconvex?
– ThanksBye
Jul 23 at 16:54
1 Answer

answered Jul 23 at 17:42 by David M.
Here's at least a partial answer to what you're asking, I think.
You want to know whether or not the function $E$ is convex in $w$ if we take the activation function $f$ to be $\operatorname{ReLU}$.
The answer is no: consider a very simple network with two inputs (and a bias $b=1$) and one output. Suppose we have just one input pattern (i.e. $|D|=1$), which is the vector $x=(1, 1)$. Suppose the desired (target) output is $t=10$. The network is parameterized by the weight vector $w=(w_1, w_2)\in\mathbb{R}^2$. If we take $\lambda=1$, the error function is given by
\begin{align*}
E(w)&=\frac{1}{2}\big(t-\max\{w^{\mathsf{T}}x+b,\,0\}\big)^2+\frac{1}{2}\lambda\|w\|^2\\
&=\frac{1}{2}\big(10-\max\{w_1+w_2+1,\,0\}\big)^2+\frac{1}{2}(w_1^2+w_2^2).
\end{align*}
However, a plot of this function (not reproduced here) shows plainly that it is not convex.
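Since the plot is not reproduced here, a quick way to confirm the claim is to test the midpoint inequality along the line $w_1 = w_2$; the points below are chosen only for illustration:

```python
import numpy as np

def E(w):
    # David M.'s two-input ReLU counterexample: t = 10, x = (1, 1), b = 1, lambda = 1
    pre = w[0] + w[1] + 1.0               # w^T x + b
    out = max(pre, 0.0)                   # ReLU output
    return 0.5 * (10.0 - out) ** 2 + 0.5 * (w[0] ** 2 + w[1] ** 2)

a = np.array([-1.5, -1.5])   # pre-activation negative: ReLU clips to 0
b = np.array([ 0.5,  0.5])   # pre-activation positive: ReLU is active
m = (a + b) / 2              # midpoint (-0.5, -0.5), exactly on the kink

print(E(a), E(b), E(m))            # 52.25 32.25 50.25
print(E(m) <= (E(a) + E(b)) / 2)   # False: 50.25 > 42.25, so E is not convex
```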
This is a bit counterintuitive to me, since ReLU is convex. Is there a way to check if a function is convex without plotting or checking if the Hessian is positive semidefinite?
– ThanksBye
Aug 1 at 14:53
Maybe by knowing the basic convex functions and the operations that preserve convexity?
– ThanksBye
Aug 1 at 14:53
@ThanksBye that's a good idea. There are many such results. For example, the composition of two convex functions is not, in general, convex. But there are other useful results. This question is one place to start.
– David M.
Aug 1 at 15:03