Study the convexity of Mean Squared Error with regularization

I want to study the convexity of the regularized Mean Squared Error loss function. I am using an artificial neural network to compute the output.



$$E(w) = \mathrm{MSE}(w) = \frac{1}{|D|}\sum_{d \in D} E_d(w)$$



$E_d$ is the error for a pattern $d$, defined as



$$E_d(w) = \frac{1}{2}\sum_{j \in \mathrm{outputLayer}} \big(t_j - o_j(w)\big)^2 + \frac{1}{2}\lambda\|w\|^2$$



where $t_j$ and $o_j$ are, respectively, the target and the output of the $j^{\mathrm{th}}$ output neuron of the network, $w$ are the parameters, and $\lambda$ is the regularization coefficient.
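For concreteness, a minimal sketch of how $E_d(w)$ could be evaluated for a one-hidden-layer network; the sigmoid hidden layer, the linear output layer, and all names here are illustrative assumptions, not part of the original question.

```python
# Minimal sketch: E_d(w) for a one-hidden-layer network.
# The sigmoid hidden layer and linear output layer are assumptions for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_pattern_error(W1, b1, W2, b2, x, t, lam):
    """E_d(w) = 1/2 * sum_j (t_j - o_j(w))^2 + 1/2 * lambda * ||w||^2."""
    h = sigmoid(W1 @ x + b1)   # first layer: g(x) = f(W x + b)
    o = W2 @ h + b2            # output layer, taken to be linear here
    squared_error = 0.5 * np.sum((t - o) ** 2)
    regularizer = 0.5 * lam * sum(np.sum(p ** 2) for p in (W1, b1, W2, b2))
    return squared_error + regularizer

# E(w) is then the average of E_d(w) over the dataset D, e.g.
# E = np.mean([per_pattern_error(W1, b1, W2, b2, x, t, lam) for x, t in D])
```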



Convexity study



Intuitively, I can say that the function is neither convex nor concave, since there are several local minima. In detail:



  • $E_d(w)$ is a nonnegative weighted sum of two terms. This operation preserves convexity if the two terms are convex.

  • The second term, $\frac{1}{2}\lambda\|w\|^2$, is convex, since the squared Euclidean norm is convex and $\lambda$ is positive.

  • The first term, $\frac{1}{2}\sum_{j \in \mathrm{outputLayer}} (t_j - o_j(w))^2$, is neither convex nor concave because of the neural network...

For example, the first layer of the neural network computes



$$g(x) = f(W \cdot x + b)$$



where $W \in \mathbb{R}^{m \times n}$ is the first weight matrix, $x \in \mathbb{R}^n$ is the network input, $b \in \mathbb{R}^m$ is the bias, and $f: \mathbb{R}^m \rightarrow \mathbb{R}^m$ is the element-wise activation function.



This is where my analysis breaks down: whether $g$ is convex or not depends on the activation function:



  • if $f := \mathrm{identity}$, then $g$ is convex on $\mathbb{R}$

  • if $f := \mathrm{ReLU}$, then $g$ is convex on $\mathbb{R}$

  • if $f := \mathrm{Sigmoid}$, then $g$ is convex on $\mathbb{R}_-$ but concave on $\mathbb{R}_+$

  • if $f := \tanh$, then $g$ is convex on $\mathbb{R}_-$ but concave on $\mathbb{R}_+$

I don't believe this analysis is sufficient; how can I improve it?
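One way to push the analysis beyond the layer-by-layer argument is to probe convexity numerically via the midpoint inequality $E\big(\tfrac{a+b}{2}\big) \le \tfrac{1}{2}\big(E(a)+E(b)\big)$: a single violation certifies non-convexity, while finding none proves nothing. Below is a minimal sketch for a toy one-weight network $o(w)=f(wx)$; the specific values of $x$, $t$, and $\lambda$ are illustrative assumptions, not taken from the question.

```python
# Numerical probe: search for a violation of the midpoint form of convexity,
# E((a+b)/2) <= (E(a) + E(b)) / 2, for the regularized per-pattern loss of a
# toy one-weight network o(w) = f(w * x).
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, f, x=1.0, t=10.0, lam=1.0):
    # E_d(w) = 1/2 (t - f(w x))^2 + 1/2 lambda w^2, scalar weight for simplicity
    return 0.5 * (t - f(w * x)) ** 2 + 0.5 * lam * w ** 2

def midpoint_violation(f, trials=10_000, scale=10.0, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a, b = rng.uniform(-scale, scale, size=2)
        if loss(0.5 * (a + b), f) > 0.5 * (loss(a, f) + loss(b, f)) + 1e-9:
            return (a, b)  # convexity violated between a and b
    return None            # no violation found (proves nothing)

# With these toy values a violation shows up for ReLU (the kink at 0 makes the
# squared error locally concave); whether one appears for the sigmoid depends
# on t, lambda and the sampling range.
print("ReLU   :", midpoint_violation(relu))
print("sigmoid:", midpoint_violation(sigmoid))
```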







  • It's not clear exactly what you're trying to prove. As you note, in general, the (regularized) error function is non-convex; in other words, there exist activation functions $g$ which make $E$ non-convex. However, there do exist activation functions $g$ which make $E$ convex (for example, if $g\equiv 0$). So, what are you trying to show?
    – David M.
    Jul 22 at 16:35










  • The goal is to know whether $E$ is convex or not, so that I know whether certain convergence properties of the learning algorithm I am using hold. Is my analysis complete enough to conclude that if I use ReLU as the activation function then $E$ is convex, whereas for Sigmoid/Tanh it is non-convex?
    – ThanksBye
    Jul 23 at 16:54
1 Answer
Here's at least a partial answer to what you're asking, I think.



You want to know whether or not the function $E$ is convex in $w$ if we take the activation function $f$ to be $\operatorname{ReLU}$.



The answer is no: consider a very simple network with two inputs (and a bias $b=1$) and one output. Suppose we have just one input pattern (i.e. $|D|=1$), which is the vector $x=(1, 1)$. Suppose the desired (target) output is $t=10$. The network is parameterized by the weight vector $w=(w_1, w_2)\in\mathbb{R}^2$. If we take $\lambda=1$, the error function is given by



\begin{align*}
E(w) &= \frac{1}{2}\big(t - \max\{w^{\mathrm{T}}x + b,\, 0\}\big)^2 + \frac{1}{2}\lambda\|w\|^2 \\
&= \frac{1}{2}\big(10 - \max\{w_1 + w_2 + 1,\, 0\}\big)^2 + \frac{1}{2}\big(w_1^2 + w_2^2\big).
\end{align*}



However, we can plainly see that this function is not convex:



[Figure: non-convex error surface of $E(w)$ for this simple network]
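As a quick numerical sanity check of this counterexample (a sketch for illustration, not part of the original answer): evaluating $E$ at two points straddling the kink $w_1 + w_2 + 1 = 0$ and at their midpoint violates the midpoint inequality, so $E$ cannot be convex.

```python
# Direct check of the counterexample above.  The chosen points are illustrative;
# any pair straddling the ReLU kink w1 + w2 + 1 = 0 in the right way works.
import numpy as np

def E(w, t=10.0, lam=1.0):
    """E(w) = 1/2 (t - max{w1 + w2 + 1, 0})^2 + 1/2 lam ||w||^2."""
    pre = w[0] + w[1] + 1.0              # w^T x + b with x = (1, 1), b = 1
    return 0.5 * (t - max(pre, 0.0)) ** 2 + 0.5 * lam * np.dot(w, w)

a = np.array([-2.0, -2.0])
b = np.array([ 1.0,  1.0])
mid = 0.5 * (a + b)                      # (-0.5, -0.5), exactly on the kink

print(E(a), E(b), E(mid))                # 54.0, 25.5, 50.25
print(E(mid) <= 0.5 * (E(a) + E(b)))     # False: midpoint inequality violated,
                                         # hence E is not convex
```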






answered Jul 23 at 17:42 by David M.
  • This is a bit counterintuitive to me, since ReLU is convex. Is there a way to check whether a function is convex without plotting it or checking whether the Hessian is positive semidefinite?
    – ThanksBye
    Aug 1 at 14:53










  • Maybe by knowing the basic convex functions and the operations that preserve convexity?
    – ThanksBye
    Aug 1 at 14:53











  • @ThanksBye that's a good idea. There are many such results. For example, the composition of two convex functions is not, in general, convex. But there are other useful results. This question is one place to start.
    – David M.
    Aug 1 at 15:03
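To make that last remark concrete (an illustrative example, not from the thread): take the convex functions $u \mapsto (u-1)^2$ and $x \mapsto x^2$. Their composition $h(x) = (x^2 - 1)^2$ is not convex, since
$$h(0) = 1 > \tfrac{1}{2}\big(h(-1) + h(1)\big) = 0,$$
which violates the midpoint inequality. This is exactly the structure at play here: a convex squared error composed with a nonlinear network output need not be convex.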









