How to improve the numerical stability of the inverse rank-one Cholesky update?
I am trying to use the inverse Cholesky update from page 10 of the paper *Efficient covariance matrix update for variable metric evolution strategies* as part of the optimization step in a neural network, and I am struggling significantly because it is so unstable. There is nothing wrong with the logic of it, but I've found that it requires really low learning rates $\beta$ and even then works quite poorly. The full reasons for this are unknown to me, but there is some indication that the expression as originally written is quite numerically unstable. I set $\alpha = 1 - \beta$.
$$
A^{-1}_{t+1} = \frac{1}{\sqrt{\alpha}} A^{-1}_t - \frac{1}{\sqrt{\alpha}\left\lVert z_t\right\rVert^2}\left(1 - \frac{1}{\sqrt{1 + \frac{\beta}{\alpha}\left\lVert z_t\right\rVert^2}}\right) z_t \left[z^T_t A^{-1}_t\right]
$$
By distributing $\sqrt{\alpha}$ into the bracketed term, I think I've found the first place where this expression can be improved.
$$
A^{-1}_{t+1} = \frac{1}{\sqrt{\alpha}} A^{-1}_t - \frac{1}{\left\lVert z_t\right\rVert^2}\left(\frac{1}{\sqrt{\alpha}} - \frac{1}{\sqrt{\alpha + \beta\left\lVert z_t\right\rVert^2}}\right) z_t \left[z^T_t A^{-1}_t\right]
$$
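To make sure I'm reading the formula the same way I'm implementing it, here is a minimal NumPy sketch of both forms (my own code, not from the paper; it assumes $\alpha = 1 - \beta$ and that $z_t$ is a plain vector and $A^{-1}_t$ a dense matrix):

```python
import numpy as np

def inv_chol_update_original(A_inv, z, beta):
    # Update as written in the paper, with alpha = 1 - beta (my convention).
    alpha = 1.0 - beta
    nz2 = float(z @ z)                      # squared L2 norm ||z_t||^2
    zA = z @ A_inv                          # row vector z_t^T A_t^{-1}
    coef = (1.0 - 1.0 / np.sqrt(1.0 + (beta / alpha) * nz2)) / (np.sqrt(alpha) * nz2)
    return A_inv / np.sqrt(alpha) - coef * np.outer(z, zA)

def inv_chol_update_distributed(A_inv, z, beta):
    # Algebraically equivalent form with sqrt(alpha) pulled inside the bracket.
    alpha = 1.0 - beta
    nz2 = float(z @ z)
    zA = z @ A_inv
    coef = (1.0 / np.sqrt(alpha) - 1.0 / np.sqrt(alpha + beta * nz2)) / nz2
    return A_inv / np.sqrt(alpha) - coef * np.outer(z, zA)
```

In exact arithmetic the two functions return the same matrix; the question is which one degrades more gracefully in float32.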
I've yet to test this, but I have reason to expect it will behave better. While testing the back-whitening in the last layer, I ran into a situation where the inverse Cholesky factor was not updating at all. Looking into it, the squared L2 norm $\left\lVert z_t\right\rVert^2$ was around $10^{-3}$ while the learning rate $\beta$ had to be very low, around $10^{-5}$, because higher values diverged. As a result, $1 - \frac{1}{\sqrt{1 + \frac{\beta}{\alpha}\left\lVert z_t\right\rVert^2}}$ always evaluated to zero and no updates ever took place, since $1 + 10^{-8} = 1$ in float32.
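For concreteness, here is a quick float32 check (again just my own sketch, using the magnitudes from the failing run above) that reproduces the collapse:

```python
import numpy as np

one  = np.float32(1.0)
beta = np.float32(1e-5)                # learning rate from the failing run
nz2  = np.float32(1e-3)                # squared L2 norm of z_t from the failing run
x = beta / (one - beta) * nz2          # (beta/alpha) * ||z||^2 ~ 1e-8, below float32 eps ~1.2e-7

print(one + x == one)                            # True: 1 + 1e-8 rounds to exactly 1 in float32
print(one - one / np.sqrt(one + x))              # 0.0: the whole correction factor vanishes
print(1.0 - 1.0 / np.sqrt(1.0 + np.float64(x)))  # ~5e-9: float64 still resolves it
```

So in float32 the bracketed factor is exactly zero and the update degenerates to a pure rescaling by $\frac{1}{\sqrt{\alpha}}$.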
Distributing the $\sqrt{\alpha}$ definitely feels right here, but I am hardly an expert in numerical optimization and am just going off my intuition as a programmer.
Are there any more moves I could take here to make the expression behave better?
linear-algebra numerical-methods numerical-optimization