Hessian on linear least squares problem

I tried to calculate the Hessian matrix of the linear least squares problem (L2 norm), in particular

$$f(X) = \|AX - B\|_2$$

where $f:\mathbb{R}^{11\times 2}\rightarrow \mathbb{R}$.

Can someone help me? Thanks a lot.

asked Jul 27 at 16:51, edited Jul 30 at 16:28 – S-F
  • It makes more sense to work with the norm squared, since that is a smooth function (while the norm itself fails to be differentiable at one point). – hardmath, Jul 27 at 16:55

  • It's not twice differentiable? – S-F, Jul 27 at 17:01

  • The Hessian is the matrix of second partial derivatives. However, my point is that using the norm squared as your objective function, rather than the $L^2$-norm itself, avoids problems with taking derivatives. Consider the one-dimensional case $f(x) = |x|$ and take the second derivative. I don't think it will be as useful as the function $f(x) = x^2$. – hardmath, Jul 27 at 17:06

  • Yep, the squared norm is better: $\|AX-B\|_F^2 = \operatorname{tr}\!\left((AX-B)^T(AX-B)\right)$. – mathreadler, Jul 30 at 18:20

  • Are you using the spectral norm or the Frobenius norm? – Rodrigo de Azevedo, Jul 30 at 19:17
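A quick numerical illustration of hardmath's point (a sketch; the step size $h$ is arbitrary): the symmetric second difference of $|x|$ at $0$ blows up as $h \to 0$, while that of $x^2$ settles at $2$.

```python
# Symmetric second-difference estimate of f''(0) for f(x) = |x| versus f(x) = x^2.
h = 1e-3
for f, name in [(abs, "|x|"), (lambda x: x**2, "x^2")]:
    d2 = (f(0.0 + h) - 2 * f(0.0) + f(0.0 - h)) / h**2
    print(name, d2)  # |x| gives 2/h = 2000.0 (diverges as h -> 0); x^2 gives 2.0
```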














5 Answers
Calculate first the gradient vector: use the chain rule and calculate the partial derivatives of $f(x)$ w.r.t. $x \in \mathbb{R}^n$. You will get a function that eats a vector and produces another "vector" $g(x) \in \mathbb{R}^n$ (well, this is an abuse of notation and terminology: $g(x)$ produces a vector of functions, not a vector in $\mathbb{R}^n$, so it is really a "vector operator").

Then take the partial derivatives of $g(x)$ w.r.t. $x$, again applying the chain rule. For that you can see $g(x)$ as a vector of simpler functions $g_i(x) \in \mathbb{R}$, each of which eats a vector and produces a scalar value.

So for each dimension of $g(x)$ you have a function $g_i(x) \in \mathbb{R}$, and taking the partial derivatives of $g(x)$ w.r.t. $x$ amounts to taking the partial derivatives of each $g_i(x)$ w.r.t. $x$ and putting them together. That is the Hessian matrix.

In the same way that the derivative of $f(x)$ w.r.t. $x$ produces a vector operator, the derivative of each $g_i(x)$ w.r.t. $x$ produces a vector operator, and hence the derivative of $g(x)$ w.r.t. $x$ produces a matrix operator, namely the Hessian matrix.

answered Jul 27 at 17:32 – Mauricio Cele Lopez Belon
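This recipe can be checked numerically. Below is a minimal sketch (not from the answer, with made-up sizes) that assumes the smooth objective $\tfrac12\|Ax-b\|^2$ with a vector unknown $x$: it forms the gradient $g(x)=A^T(Ax-b)$ and differentiates each of its components by central differences, which reproduces the Hessian $A^TA$.

```python
import numpy as np

# Made-up sizes; objective is 1/2 * ||A x - b||^2 with a vector unknown x.
rng = np.random.default_rng(0)
m, n = 11, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def grad(x):
    # g(x) = A^T (A x - b), from the chain rule
    return A.T @ (A @ x - b)

def hessian_by_differencing(x, eps=1e-6):
    # Differentiate each component of g w.r.t. each x_j with central differences
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (grad(x + e) - grad(x - e)) / (2 * eps)
    return H

x0 = rng.standard_normal(n)
print(np.allclose(hessian_by_differencing(x0), A.T @ A, atol=1e-6))  # expected: True
```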






Let $f:\mathbb{R}^n \to \mathbb{R}$ be defined by
$$
f(x)=\frac12 \|Ax-b\|^2.
$$
Notice that $f(x)=g(h(x))$, where $h(x)=Ax-b$ and $g(y) = \frac12 \|y\|^2$. The derivatives of $g$ and $h$ are given by
$$
g'(y)=y^T, \quad h'(x)=A.
$$
The chain rule tells us that
$$
f'(x)=g'(h(x))\,h'(x) = (Ax-b)^T A.
$$
If we use the convention that the gradient is a column vector, then
$$
\nabla f(x)=f'(x)^T=A^T(Ax-b).
$$

The Hessian $Hf(x)$ is the derivative of the function $x \mapsto \nabla f(x)$, so:
$$
Hf(x)= A^T A.
$$

answered Jul 30 at 18:49, edited Jul 30 at 18:54 – littleO
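As a quick sanity check of these formulas (a sketch with made-up sizes, not part of the answer): since the gradient is $A^T(Ax-b)$ and the Hessian $A^TA$ is constant, setting the gradient to zero gives the normal equations, and a single Newton step solves the least-squares problem.

```python
import numpy as np

# Made-up sizes; x is a vector, matching the answer above.
rng = np.random.default_rng(1)
m, n = 11, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

H = A.T @ A                      # constant Hessian of f(x) = 1/2 ||Ax - b||^2
g = A.T @ (A @ np.zeros(n) - b)  # gradient at the starting point x = 0

# One Newton step from x = 0: x = -H^{-1} g, i.e. the normal equations H x = A^T b.
x_newton = np.linalg.solve(H, -g)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_newton, x_lstsq))  # expected: True
```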






Let $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ be defined by

$$f (\mathrm X) := \frac 12 \| \mathrm A \mathrm X - \mathrm B \|_{\text{F}}^2 = \frac 12 \| (\mathrm I_n \otimes \mathrm A) \, \mbox{vec} (\mathrm X) - \mbox{vec} (\mathrm B) \|_2^2$$

where $\mbox{vec}$ is the vectorization operator and $\otimes$ is the Kronecker product. Thus, the Hessian of $f$ is

$$(\mathrm I_n \otimes \mathrm A)^\top (\mathrm I_n \otimes \mathrm A) = \mathrm I_n \otimes \mathrm A^\top \mathrm A$$

answered Jul 31 at 5:31 – Rodrigo de Azevedo
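A small numerical check of the vectorization identity and the Kronecker-structured Hessian (a sketch: the $11\times 2$ shape of $\mathrm X$ follows the question, while the row count of $\mathrm A$ is made up):

```python
import numpy as np

# Sizes: X is 11 x 2 as in the question; the number of rows of A is made up.
rng = np.random.default_rng(2)
m, p, n = 7, 11, 2
A = rng.standard_normal((m, p))
X = rng.standard_normal((p, n))
B = rng.standard_normal((m, n))

vec = lambda M: M.flatten(order="F")  # column-stacking vectorization

# vec(AX - B) = (I_n kron A) vec(X) - vec(B)
K = np.kron(np.eye(n), A)
print(np.allclose(vec(A @ X - B), K @ vec(X) - vec(B)))   # expected: True

# Hessian of 1/2 ||AX - B||_F^2 w.r.t. vec(X):
# (I_n kron A)^T (I_n kron A) = I_n kron A^T A
print(np.allclose(K.T @ K, np.kron(np.eye(n), A.T @ A)))  # expected: True
```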






Yep, the squared norm is better:

$$\|AX-B\|_F^2 = \operatorname{tr}\!\left((AX-B)^T(AX-B)\right) = \Big/ \text{ simplify } \Big/ = \operatorname{tr}\!\left(X^TA^TAX\right) + \text{linear and constant terms}$$

Now you should see what the Hessian is. If you still don't, you can check out Hessian matrix - use in optimization.

If the problem is linear, the Hessian sits directly in the second-order term; if it is a non-linear problem solved by a trust-region approach, it is the matrix in the second-order term of the Taylor expansion around the trust-region centre.

answered Jul 30 at 18:22, edited Aug 1 at 9:28 – mathreadler
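To see "the Hessian is directly in the second-order term" concretely, here is a sketch (made-up sizes, not part of the answer) comparing a finite-difference second directional derivative of $\|AX-B\|_F^2$ with the quadratic term it predicts, $2\|AV\|_F^2$ for a direction $V$ (equivalently $\mathrm{vec}(V)^T\,(2\,I_n\otimes A^TA)\,\mathrm{vec}(V)$):

```python
import numpy as np

# Made-up sizes; f(X) = ||AX - B||_F^2 (no 1/2 factor here).
rng = np.random.default_rng(3)
m, p, n = 7, 11, 2
A = rng.standard_normal((m, p))
B = rng.standard_normal((m, n))
X = rng.standard_normal((p, n))
V = rng.standard_normal((p, n))  # an arbitrary direction

f = lambda Z: np.linalg.norm(A @ Z - B, "fro") ** 2

# Second directional derivative along V by a symmetric difference ...
t = 1e-3
d2 = (f(X + t * V) - 2 * f(X) + f(X - t * V)) / t**2

# ... equals the coefficient of t^2 in f(X + tV), namely 2 ||AV||_F^2.
print(np.allclose(d2, 2 * np.linalg.norm(A @ V, "fro") ** 2))  # expected: True
```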






Define a new matrix $P=(AX-B)$ and write the function as
$$f=\|P\|_F^2 = P:P$$
where the colon denotes the trace/Frobenius product, i.e. $\,\,A:B={\rm tr}(A^TB)$.

Find the differential and gradient of $f$:
$$\eqalign{
df &= 2P:dP = 2P:A\,dX = 2A^TP:dX \cr
G &= \frac{\partial f}{\partial X} = 2A^TP \cr
}$$
Now find the differential and gradient of $G$:
$$\eqalign{
dG &= 2A^T\,dP = 2A^TA\,dX = 2A^TA{\mathcal E}:dX \cr
{\mathcal H} &= \frac{\partial G}{\partial X} = 2A^TA{\mathcal E} \cr
}$$
Note that both $({\mathcal H},{\mathcal E})$ are fourth-order tensors, the latter having components
$$\mathcal E_{ijkl} = \delta_{ik}\,\delta_{jl}$$
So far everyone has answered a modified form of your question by squaring the function. If you truly need the Hessian of your original function $\|P\|_F$, here it is:
$$\mathcal H_{\rm orig} = \frac{\mathcal H}{2\,\|P\|_F} - \frac{G\star G}{4\,\|P\|_F^3}$$
where $\star$ is the tensor product, i.e.
$$\mathcal M=B\star C \implies \mathcal M_{ijkl} = B_{ij}\,C_{kl}$$

answered Aug 5 at 20:12 – lynn
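A numerical check of this fourth-order Hessian of the unsquared objective (a sketch, not part of the answer: the tensors are stored as $(p,n,p,n)$ NumPy arrays, the $11\times 2$ shape of $X$ follows the question, and the row count of $A$ is made up):

```python
import numpy as np

# Verify the fourth-order Hessian of phi(X) = ||AX - B||_F against
# central differences of its gradient  A^T P / phi,  with P = AX - B.
rng = np.random.default_rng(4)
m, p, n = 7, 11, 2
A = rng.standard_normal((m, p))
B = rng.standard_normal((m, n))
X = rng.standard_normal((p, n))

def grad_phi(X):
    P = A @ X - B
    return A.T @ P / np.linalg.norm(P, "fro")

# Analytic tensor: H[i,j,k,l] = (A^T A)_ik d_jl / phi - (A^T P)_ij (A^T P)_kl / phi^3
P = A @ X - B
phi = np.linalg.norm(P, "fro")
ATP = A.T @ P
H = (np.einsum("ik,jl->ijkl", A.T @ A, np.eye(n)) / phi
     - np.einsum("ij,kl->ijkl", ATP, ATP) / phi**3)

# Central-difference check, one (k, l) slice at a time
eps = 1e-6
H_fd = np.zeros_like(H)
for k in range(p):
    for l in range(n):
        E = np.zeros((p, n))
        E[k, l] = eps
        H_fd[:, :, k, l] = (grad_phi(X + E) - grad_phi(X - E)) / (2 * eps)

print(np.allclose(H, H_fd, atol=1e-6))  # expected: True
```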





