Hessian on linear least squares problem
I tried to calculate the Hessian matrix of the linear least squares problem ($L_2$ norm), in particular
$$f(X) = \| AX - B \|_2$$
where $f:\mathbb{R}^{11\times 2}\rightarrow \mathbb{R}$.
Can someone help me?
Thanks a lot.
Tags: linear-algebra, least-squares, hessian-matrix
asked Jul 27 at 16:51 by S-F, edited Jul 30 at 16:28
Comments:

It makes more sense to work with the norm squared, since that is a smooth function (while the norm itself fails to be differentiable at one point). – hardmath, Jul 27 at 16:55

It's not twice differentiable? – S-F, Jul 27 at 17:01

The Hessian is the matrix of second partial derivatives. My point is that using the norm squared as your objective function, rather than the $L^2$ norm itself, avoids problems with taking derivatives. Consider the one-dimensional case $f(x) = |x|$ and take the second derivative; it will not be as useful as it is for $f(x) = x^2$. – hardmath, Jul 27 at 17:06

Yes, the squared norm is better: $\|AX-B\|_F^2 = \operatorname{tr}\big((AX-B)^T(AX-B)\big)$. – mathreadler, Jul 30 at 18:20

Are you using the spectral norm or the Frobenius norm? – Rodrigo de Azevedo, Jul 30 at 19:17
5 Answers
First calculate the gradient vector: use the chain rule and compute the partial derivatives of $f(x)$ with respect to $x \in \mathbb{R}^n$. You will get a function that eats a vector and produces another "vector" $g(x) \in \mathbb{R}^n$ (this is an abuse of notation and terminology: $g(x)$ produces a vector of functions, not a vector in $\mathbb{R}^n$, so it is really a "vector operator").
Then take the partial derivatives of $g(x)$ with respect to $x$, again applying the chain rule. For that you can view $g(x)$ as a vector of simpler functions $g_i(x) \in \mathbb{R}$, each of which eats a vector and produces a scalar value.
So for each component of $g(x)$ you have a function $g_i(x) \in \mathbb{R}$, and taking the partial derivatives of $g(x)$ with respect to $x$ amounts to taking the partial derivatives of each $g_i(x)$ with respect to $x$ and putting them together. That is the Hessian matrix.
In the same way that the derivative of $f(x)$ with respect to $x$ produces a vector operator, the derivative of each $g_i(x)$ with respect to $x$ produces a vector operator, and hence the derivative of $g(x)$ with respect to $x$ produces a matrix operator, namely the Hessian matrix.
answered Jul 27 at 17:32 by Mauricio Cele Lopez Belon
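A minimal numerical sketch of this recipe (added for illustration, not part of the original answer; the matrix sizes and data below are arbitrary): take the squared objective $\tfrac12\|Ax-b\|^2$, form its gradient $g(x)$, and assemble the Hessian column by column by finite-differencing each component of $g$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(11, 2))    # illustrative sizes only
b = rng.normal(size=11)

def g(x):
    # gradient of f(x) = 0.5 * ||A x - b||^2
    return A.T @ (A @ x - b)

def hessian_from_gradient(x, eps=1e-6):
    # differentiate each component of g numerically and stack the results
    n = x.size
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        H[:, j] = (g(x + e) - g(x - e)) / (2 * eps)
    return H

x0 = rng.normal(size=2)
print(np.allclose(hessian_from_gradient(x0), A.T @ A, atol=1e-5))  # True
```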
Let $f:\mathbb{R}^n \to \mathbb{R}$ be defined by
$$
f(x)=\frac12 \|Ax-b\|^2.
$$
Notice that $f(x)=g(h(x))$, where $h(x)=Ax-b$ and $g(y) = \frac12 \|y\|^2$. The derivatives of $g$ and $h$ are given by
$$
g'(y)=y^T, \quad h'(x)=A.
$$
The chain rule tells us that
$$
f'(x)=g'(h(x))\,h'(x) = (Ax-b)^T A.
$$
If we use the convention that the gradient is a column vector, then
$$
\nabla f(x)=f'(x)^T=A^T(Ax-b).
$$
The Hessian $Hf(x)$ is the derivative of the function $x \mapsto \nabla f(x)$, so
$$
Hf(x)= A^T A.
$$
answered Jul 30 at 18:49 by littleO, edited Jul 30 at 18:54
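A quick numerical sanity check of the formula $\nabla f(x) = A^T(Ax-b)$ (added here for illustration; the sizes are arbitrary), comparing it against a central finite-difference gradient of $f$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(7, 3))     # illustrative sizes only
b = rng.normal(size=7)
x = rng.normal(size=3)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)

# central finite-difference gradient of f at x
eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(3)])

print(np.allclose(num_grad, A.T @ (A @ x - b), atol=1e-5))  # True
```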
Let $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ be defined by
$$f (\mathrm X) := \frac 12 \| \mathrm A \mathrm X - \mathrm B \|_{\mathrm F}^2 = \frac 12 \| (\mathrm I_n \otimes \mathrm A) \, \operatorname{vec} (\mathrm X) - \operatorname{vec} (\mathrm B) \|_2^2$$
where $\operatorname{vec}$ is the vectorization operator and $\otimes$ is the Kronecker product. Thus, the Hessian of $f$ is
$$(\mathrm I_n \otimes \mathrm A)^\top (\mathrm I_n \otimes \mathrm A) = \mathrm I_n \otimes \mathrm A^\top \mathrm A$$
answered Jul 31 at 5:31 by Rodrigo de Azevedo
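A small numpy check of the vectorized form (added for illustration; dimensions are arbitrary). Here $\operatorname{vec}$ stacks columns, which is column-major (Fortran-order) flattening:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 4))        # illustrative sizes only
X = rng.normal(size=(4, 3))
B = rng.normal(size=(5, 3))
n = X.shape[1]

vec = lambda M: M.flatten(order="F")   # column-stacking vectorization

# vec(A X - B) == (I_n kron A) vec(X) - vec(B)
lhs = vec(A @ X - B)
rhs = np.kron(np.eye(n), A) @ vec(X) - vec(B)
print(np.allclose(lhs, rhs))           # True

# Hessian of 0.5 * ||A X - B||_F^2 in vectorized coordinates
K = np.kron(np.eye(n), A)
print(np.allclose(K.T @ K, np.kron(np.eye(n), A.T @ A)))  # True
```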
Yes, the squared norm is better:
$$\|AX-B\|_F^2 = \operatorname{tr}\big((AX-B)^T(AX-B)\big) = \Big/\text{simplify}\Big/ = \operatorname{tr}\big(X^TA^TAX\big) + \text{linear and constant terms}$$
Now you should see what the Hessian is. If you still don't, you can check out Hessian matrix – use in optimization.
For a linear problem the Hessian sits directly in the second-order term; for a non-linear problem solved by a trust-region approach it is the matrix in the second-order term of the Taylor expansion around the trust-region center.
answered Jul 30 at 18:22 by mathreadler, edited Aug 1 at 9:28
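To make the "read it off the second-order term" point concrete, here is a short sketch (added for illustration; sizes and data are made up): for the quadratic objective $\|Ax-b\|^2$ the Hessian is $2A^TA$, so a single Newton step from any starting point lands on the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(11, 2))             # illustrative sizes only
b = rng.normal(size=11)

grad = lambda x: 2 * A.T @ (A @ x - b)   # gradient of ||A x - b||^2
H = 2 * A.T @ A                          # Hessian, read off the quadratic term

x0 = rng.normal(size=2)
x1 = x0 - np.linalg.solve(H, grad(x0))   # one Newton step

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x1, x_ls))             # True: the problem is quadratic,
                                         # so Newton solves it in one step
```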
Define a new matrix $P=(AX-B)$ and write the function as
$$f=\|P\|_F^2 = P:P$$
where the colon denotes the trace/Frobenius product, i.e. $\,\,A:B=\operatorname{tr}(A^TB)$.
Find the differential and gradient of $f$:
$$\eqalign{
df &= 2P:dP = 2P:A\,dX = 2A^TP:dX \cr
G &= \frac{\partial f}{\partial X} = 2A^TP \cr
}$$
Now find the differential and gradient of $G$:
$$\eqalign{
dG &= 2A^T\,dP = 2A^TA\,dX = 2A^TA\,\mathcal E:dX \cr
\mathcal H &= \frac{\partial G}{\partial X} = 2A^TA\,\mathcal E \cr
}$$
Note that both $(\mathcal H,\mathcal E)$ are fourth-order tensors, the latter having components
$$\mathcal E_{ijkl} = \delta_{ik} \delta_{jl}$$
So far everyone has answered a modified form of your question by squaring the function.
If you truly need the Hessian of your original function $f=\|P\|_F$, here it is
$$\mathcal H = \frac{A^TA\,\mathcal E}{\|P\|_F} - \frac{(A^TP)\star(A^TP)}{\|P\|_F^3}$$
where $\star$ is the tensor product, i.e.
$$\mathcal M=B\star C \implies \mathcal M_{ijkl} = B_{ij}\,C_{kl}$$
answered Aug 5 at 20:12 by lynn
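A numpy sketch of the fourth-order objects above for the squared case (added for illustration; sizes are arbitrary): in components, $\mathcal H_{ijkl} = 2\,(A^TA)_{ik}\,\delta_{jl}$, which can be built with `einsum` and checked against finite differences of the gradient $G = 2A^T(AX-B)$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 4))             # illustrative sizes only
X = rng.normal(size=(4, 3))
B = rng.normal(size=(5, 3))
p, q = X.shape

G = lambda X: 2 * A.T @ (A @ X - B)     # gradient of ||A X - B||_F^2

# H_ijkl = 2 * (A^T A)_ik * delta_jl
H = 2 * np.einsum("ik,jl->ijkl", A.T @ A, np.eye(q))

# finite-difference check: H_ijkl ~= dG_ij / dX_kl
eps = 1e-6
H_num = np.empty((p, q, p, q))
for k in range(p):
    for l in range(q):
        dX = np.zeros((p, q)); dX[k, l] = eps
        H_num[:, :, k, l] = (G(X + dX) - G(X - dX)) / (2 * eps)

print(np.allclose(H, H_num, atol=1e-4))  # True
```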