Study the convexity of Mean Squared Error with regularization
I want to study the convexity of the regularized Mean Squared Error loss function. I am using an artificial neural network to compute the output.
$$E(w) = \mathrm{MSE}(w) = \frac{1}{|D|}\sum_{d \in D} E_d(w)$$
$E_d$ is the error for a pattern $d$, defined as
$$E_d(w)=\frac{1}{2}\sum_{j \in \text{outputLayer}} \big(t_j-o_j(w)\big)^2 + \frac{1}{2}\lambda\|w\|^2$$
where $t_j$ and $o_j$ are respectively the target and the output of the $j^{\text{th}}$ output neuron of the network, $w$ is the vector of parameters, and $\lambda > 0$ is the regularization coefficient.
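To make the notation concrete, here is a minimal sketch of how $E(w)$ could be evaluated for a toy one-layer sigmoid network (the network shape, parameter values, and data below are made up purely for illustration):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def regularized_mse(W, b, patterns, lam):
    """E(w): average over patterns of squared error plus (1/2) * lambda * ||w||^2."""
    total = 0.0
    for x, t in patterns:
        o = sigmoid(W @ x + b)                               # network output o_j(w)
        err = 0.5 * np.sum((t - o) ** 2)                     # (1/2) sum_j (t_j - o_j)^2
        reg = 0.5 * lam * (np.sum(W ** 2) + np.sum(b ** 2))  # (1/2) lambda ||w||^2
        total += err + reg                                   # E_d(w)
    return total / len(patterns)                             # average over |D| patterns

# toy data: 2 inputs, 1 output neuron, two patterns
W = np.array([[0.3, -0.7]])
b = np.array([0.1])
data = [(np.array([1.0, 2.0]), np.array([0.0])),
        (np.array([0.5, -1.0]), np.array([1.0]))]
print(regularized_mse(W, b, data, lam=0.01))
```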
Convexity study
Intuitively, the function is neither convex nor concave, since it has several local minima. In detail:
- $E_d(w)$ is a nonnegative weighted sum of two terms, and such a sum is convex if both terms are convex.
- The second term, $\frac{1}{2}\lambda\|w\|^2$, is convex, since the squared Euclidean norm is convex and $\lambda$ is positive.
- The first term, $\frac{1}{2}\sum_{j \in \text{outputLayer}} (t_j-o_j(w))^2$, is neither convex nor concave because of the neural network.
For example, the first layer of the network computes
$$g(x) = f(W x + b)$$
where $W \in \mathbb{R}^{m \times n}$ is the first weight matrix, $x \in \mathbb{R}^n$ is the network input, $b \in \mathbb{R}^m$ is the bias and $f: \mathbb{R}^m \rightarrow \mathbb{R}^m$ is the element-wise activation function.
This is where I stop my analysis, concluding that the whole function is neither convex nor concave, depending on the activation function:
- if $f$ is the identity, then each component of $g$ is affine, hence convex (and concave);
- if $f = \mathrm{ReLU}$, then each component of $g$ is convex;
- if $f$ is the sigmoid, then each component of $g$ is convex where the pre-activation is negative and concave where it is positive;
- if $f = \tanh$, then likewise each component of $g$ is convex where the pre-activation is negative and concave where it is positive.
I don't believe this analysis is sufficient. How can I improve it? (A quick numerical check of the sigmoid claim is sketched below.)
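As a sanity check of the sigmoid claim, one can test the midpoint inequality numerically. This is just a sketch for the scalar case ($m = n = 1$, $W = 1$, $b = 0$), with test points chosen arbitrarily:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def midpoint_convexity_holds(f, a, b):
    """Necessary condition for convexity on [a, b]: f((a+b)/2) <= (f(a)+f(b))/2."""
    return f((a + b) / 2) <= (f(a) + f(b)) / 2

# g(x) = sigmoid(1 * x + 0), i.e. the scalar case W = 1, b = 0
print(midpoint_convexity_holds(sigmoid, -4.0, 0.0))  # True: consistent with convexity on the negative side
print(midpoint_convexity_holds(sigmoid,  0.0, 4.0))  # False: convexity fails on the positive side
```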
convex-analysis convex-optimization neural-networks non-convex-optimization
asked Jul 22 at 10:35 by ThanksBye
It's not clear exactly what you're trying to prove. As you note, in general, the (regularized) error function is non-convex--in other words, there exist activation functions $g$ which make $E$ non-convex. However, there do exist activation functions $g$ which make $E$ convex (for example, if $gequiv0$). So, what are you trying to show?
– David M.
Jul 22 at 16:35
The goal is to know whether $E$ is convex, so that I know whether certain convergence properties of the learning algorithm I am using hold. Is my analysis complete enough to conclude that if I use ReLU as the activation function then $E$ is convex, whereas for Sigmoid/Tanh it is nonconvex?
– ThanksBye
Jul 23 at 16:54
1 Answer

answered Jul 23 at 17:42 by David M.
Here's at least a partial answer to what you're asking, I think.
You want to know whether or not the function $E$ is convex in $w$ if we take the activation function $f$ to be $\operatorname{ReLU}$.
The answer is no: consider a very simple network with two inputs (and a bias $b=1$) and one output. Suppose we have just one input pattern (i.e. $|D|=1$), which is the vector $x=(1, 1)$. Suppose the desired (target) output is $t=10$. The network is parameterized by the weight vector $w=(w_1, w_2)\in\mathbb{R}^2$. If we take $\lambda=1$, the error function is given by
\begin{align*}
E(w)&=\frac{1}{2}\big(t-\max\{w^{\mathsf{T}}x+b,\,0\}\big)^2+\frac{1}{2}\lambda\|w\|^2\\
&=\frac{1}{2}\big(10-\max\{w_1+w_2+1,\,0\}\big)^2+\frac{1}{2}(w_1^2+w_2^2).
\end{align*}
However, a plot of this function (not reproduced here) shows plainly that it is not convex.
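Since the plot is not reproduced here, a quick way to confirm the claim is to test the midpoint inequality along the line $w_1 = w_2$; the points below are chosen only for illustration:

```python
import numpy as np

def E(w):
    # David M.'s two-input ReLU counterexample: t = 10, x = (1, 1), b = 1, lambda = 1
    pre = w[0] + w[1] + 1.0               # w^T x + b
    out = max(pre, 0.0)                   # ReLU output
    return 0.5 * (10.0 - out) ** 2 + 0.5 * (w[0] ** 2 + w[1] ** 2)

a = np.array([-1.5, -1.5])   # pre-activation negative: ReLU clips to 0
b = np.array([ 0.5,  0.5])   # pre-activation positive: ReLU is active
m = (a + b) / 2              # midpoint (-0.5, -0.5), exactly on the kink

print(E(a), E(b), E(m))            # 52.25 32.25 50.25
print(E(m) <= (E(a) + E(b)) / 2)   # False: 50.25 > 42.25, so E is not convex
```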
This is a bit counterintuitive to me, since ReLU is convex. Is there a way to check if a function is convex without plotting or checking if the Hessian is positive semidefinite?
– ThanksBye
Aug 1 at 14:53
Maybe by knowing the basic convex functions and the operations that preserve convexity?
– ThanksBye
Aug 1 at 14:53
@ThanksBye that's a good idea. There are many such results. For example, the composition of two convex functions is not, in general, convex. But there are other useful results. This question is one place to start.
– David M.
Aug 1 at 15:03