Gradient descent versus finding where the gradient vanishes via solving systems of equations

I started learning machine learning and got stuck at the following questions:



  1. Why do we need to iterate the gradient descent algorithm?


  2. Why don't we equate the gradient to zero and find all local minima?


Most likely, we can't reach the minimum exactly; we can only come as close as possible, and the learning rate controls how close. Am I right, or am I missing something?



Sorry if this is a duplicate question. Thanks in advance.
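
---

To make the two approaches concrete, here is a minimal sketch (my own illustration with made-up data, not part of the original question) using ordinary least squares in NumPy. Because the squared-error cost is quadratic, equating its gradient to zero yields a linear system with an exact closed-form solution; gradient descent, by contrast, only approaches that solution iteratively, with the learning rate controlling the step size.

```python
import numpy as np

# Toy least-squares problem: minimize f(w) = ||X w - y||^2 / (2 n).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
n = len(y)

# Approach 1: equate the gradient to zero and solve.
# grad f(w) = X^T (X w - y) / n = 0  =>  (X^T X) w = X^T y.
# This is a *linear* system only because the cost is quadratic.
w_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Approach 2: gradient descent. Each step moves only partway toward the
# minimum, so we iterate; the learning rate sets the step size.
w = np.zeros(3)
learning_rate = 0.1
for _ in range(1000):
    grad = X.T @ (X @ w - y) / n
    w -= learning_rate * grad

print("closed form:      ", w_exact)
print("gradient descent: ", w)
print("gap:", np.linalg.norm(w - w_exact))  # shrinks as iterations increase
```

Note that with a suitable learning rate the iterates converge to the exact minimizer in the limit, so the learning rate governs the speed of convergence rather than imposing a hard floor on accuracy. What makes iteration indispensable is that for non-quadratic costs the first approach has no linear system to solve, while the loop above applies unchanged.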







migrated from cstheory.stackexchange.com Jul 30 at 12:18


This question came from our site for theoretical computer scientists and researchers in related fields.














  • Generally speaking, finding where the gradient equals zero is only easy for quadratic cost functions. Solving systems of polynomial equations is not easy. (See the sketch after these comments.)
    – Rodrigo de Azevedo
    Jul 26 at 18:08










  • @RodrigodeAzevedo, thanks for the reply! But why can't we use the Laplace transform in that case? It seems it could take much less computing time.
    – Anton
    Jul 30 at 9:14










  • Laplace transform? Where are the differential equations?
    – Rodrigo de Azevedo
    Jul 30 at 12:36










  • @RodrigodeAzevedo, sorry, I may have misunderstood you. I thought that when we take the derivatives of the MSE function, we get a system of differential equations, and if it is difficult to solve, we might use the Laplace transform.
    – Anton
    Jul 30 at 12:46











  • Take a look at this.
    – Rodrigo de Azevedo
    Jul 30 at 12:52
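
---

As a concrete illustration of the first comment above (my own hedged sketch, with made-up data and step size, not the commenter's code): replace the quadratic least-squares cost with a logistic loss and the stationarity conditions become transcendental equations in the weights, so there is no analogue of the normal equations to solve, yet the gradient-descent loop is unchanged.

```python
import numpy as np

# Logistic loss: f(w) = mean(log(1 + exp(-y_i * (x_i . w)))), y_i in {-1, +1}.
# Equating grad f(w) to zero gives transcendental equations (sigmoids of
# linear forms in w), with no closed-form solution like the normal equations.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([1.5, -2.0]) + 0.3 * rng.normal(size=200))

def grad(w):
    # grad f(w) = mean(-y_i * x_i * sigmoid(-y_i * (x_i . w)))
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(margins))   # sigmoid(-margin)
    return -(X * (y * s)[:, None]).mean(axis=0)

# Gradient descent proceeds exactly as in the quadratic case:
w = np.zeros(2)
for _ in range(2000):
    w -= 0.5 * grad(w)

print("fitted weights:", w)
print("gradient norm:", np.linalg.norm(grad(w)))  # driven toward zero
```

For deeper models the stationarity conditions grow into large coupled nonlinear (often polynomial or transcendental) systems, which is why iterative first-order methods are the default in machine learning.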














asked Jul 26 at 8:52 by Anton
edited Jul 30 at 12:34 by Rodrigo de Azevedo



