Shortcut to finding the square-root of a perfect-square?
I've been trying to speed up an algorithm whose most expensive operation is the square root. However, I can often guarantee that the input value is a perfect square. I'm curious to know whether there are any algorithms that will find the square root faster (constant time?) when the input is known to be a perfect square.

Thanks,
Ryan
Tags: square-numbers
asked yesterday by Ryan Pierce Williams
How large are your squares? 48 binary digits? 64? 1000? And since this is very applied numerical math: what machine are we running on? Or is this configurable hardware and we can implement our own logic?
– Marcus Müller, yesterday

And: do you need to make a decision after every square root calculation, or can you calculate multiple square roots in parallel (e.g. data parallelism)?
– Marcus Müller, yesterday

If you can run on a cluster, you have data parallelism. (Probably it's better to discard the Pi cluster idea: it's very little bang for the initial buck and little bang for the power-bill buck; embedded computers are not optimized for number crunching, much unlike modern desktop CPUs.)
– Marcus Müller, yesterday

Practical idea: depending on the use case, you may be able to compute your square root earlier and have it ready at the bottleneck, assuming you're doing some earlier computation to ensure this number is a square. Memoization is also an option, but perhaps that's too much memory. To elaborate: get n^2, then split a thread. One thread does your usual thing; the other computes the square root. The savings will depend on whether you are doing more than just finding root n.
– Artimis Fowl, yesterday

By the way, GMP comes with an integer sqrt function, and I don't think you'll be much faster than that on large integers.
– Marcus Müller, yesterday
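To illustrate that last comment, here is a small usage sketch of GMP's integer square root (my own example, not from the thread; compile with -lgmp). mpz_sqrt computes the truncated square root, which is exact when the input is a perfect square:

```cpp
#include <gmp.h>

int main(void) {
    mpz_t n, root;
    mpz_inits(n, root, NULL);
    // Build a guaranteed perfect square, then recover its root.
    mpz_set_str(root, "123456789012345678901234567890", 10);
    mpz_mul(n, root, root);        // n = root^2
    mpz_sqrt(root, n);             // truncated integer square root of n
    gmp_printf("%Zd\n", root);     // prints 123456789012345678901234567890
    mpz_clears(n, root, NULL);
    return 0;
}
```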
4 Answers
Answer by Tiwa Aina (12 votes):
The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can calculate the square root by summing successive odd numbers (starting from one); once you reach the input value, return the number of summations you made.

For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $\sqrt{16}=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k \in \mathbb{N}$.

I think this method would run in $O(\sqrt n)$.
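A minimal sketch of this method (my own illustration, assuming the input is a perfect square that fits in 64 bits):

```cpp
#include <cstdint>

// Since 1 + 3 + 5 + ... + (2k-1) = k^2, counting how many successive
// odd numbers it takes to reach n recovers k = sqrt(n).
uint64_t isqrt_by_odd_sums(uint64_t n) {
    uint64_t sum = 0, odd = 1, count = 0;
    while (sum < n) {
        sum += odd;  // running total 1 + 3 + 5 + ...
        odd += 2;
        ++count;     // number of addends so far
    }
    return count;    // equals sqrt(n) when n is a perfect square
}
```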
+1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
– Brian Tung, yesterday

Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
– Ryan Pierce Williams, yesterday

However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
– Brian Tung, yesterday

Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $\mathcal O(\sqrt n)$), I think.
– Marcus Müller, yesterday

Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is the number of bits used to represent the number; so $\sqrt n = \sqrt{2^s} = (\sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
– user202729, yesterday
Answer by Marcus Müller (6 votes):
With integers within sensible bounds compared to what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary search to find the square root of $x$.

0. Remove blocks of two trailing 0s from your binary number. Each block you remove contributes one factor of 2 to the result of the following steps. This can be done in constant time, if I'm not mistaken: observe the structure of "subtract 1 and XOR with the input" for numbers with $t$ trailing 0s, then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by a power of $4$, you'll end up with an odd number; if you end up with an even number after removing the pairs (i.e. the trailing-zero count was odd), your number is not a perfect square.
1. Find $k=\lfloor\log_2 x\rfloor$; see https://graphics.stanford.edu/~seander/bithacks.html
2. Let $a=\lfloor k/2\rfloor$.
3. Thus, $2^a$ becomes a lower limit for $\sqrt x$ and $2^{a+1}$ an upper one. Both values can be found by bit-shifting 1.
4. From here, do a binary search¹.

I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, convertible back to integer, from which you only need to search a small range (namely, the lost precision) for the actual integer square root.

Note that in problems such as yours, algorithmic elegance often plays a minor role: it needs to be fast on actual hardware, so execution that avoids a lot of memory interaction is a good thing, and with SIMD instructions, doing four to sixteen operations of the same type takes about as long as doing one. So if you just need to test a few integers for their square, modifying your algorithm to try four in parallel is far more efficient than saving half of the operations.

You have a technological problem, not so much a numerical one.

¹ Binary search assumes that you can do one squaring and one comparison at a time; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there is no constant-time algorithm (and I'm pretty sure there's none), you can do better than $\mathcal O(n^2\cdot\log_2 x)$; compare Fürer's algorithm.
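A scalar sketch of steps 0 to 4 (my own code, not the answerer's; __builtin_clzll is the GCC/Clang bit-scan builtin, and the input is assumed to be a positive perfect square below $2^{62}$ so the squaring below cannot overflow):

```cpp
#include <cstdint>

uint64_t isqrt_range_bisect(uint64_t x) {
    // Step 0: strip pairs of trailing zero bits; each pair stripped
    // contributes one factor of 2 to the final root.
    uint64_t factor = 1;
    while ((x & 3) == 0) { x >>= 2; factor <<= 1; }
    // Step 1: k = floor(log2 x), here via a compiler builtin rather
    // than the bithacks tricks linked above.
    int k = 63 - __builtin_clzll(x);
    // Steps 2-3: bounds 2^a <= sqrt(x) < 2^(a+1), found by shifting 1.
    uint64_t lo = 1ULL << (k / 2);
    uint64_t hi = lo << 1;
    // Step 4: binary search within [lo, hi].
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (mid * mid < x) lo = mid + 1; else hi = mid;
    }
    return lo * factor;
}
```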
Step 1 assumes the input is a machine-sized integer, which may not be the case.
– user202729, yesterday

Well, if it is some kind of big integer: whatever big-integer library you're using somehow has to keep track of the length in binary digits anyway.
– Marcus Müller, yesterday

(Of course it's impossible to do it in constant time.)
– user202729, yesterday

Did you try to time it? The binary search requires a lot of multiplications for large numbers, so my bet is that this is much slower than the built-in square root function (this is what I find with a simple test in C++ using long long integers, though always with the caveat that there may be a better implementation than what I did).
– Winther, yesterday

"Built-in square root function": of what, operating on what data types?
– Marcus Müller, yesterday
Answer by Aeryk (3 votes):
I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead, here is a number-theoretic solution that'll work for numbers less than $2^{66}$.

Fact 1: If $p$ is a prime with $p \equiv 3 \pmod 4$ and $x$ is a perfect square $\bmod p$, then $$x \equiv \left(x^{(p+1)/4}\right)^2 \pmod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)

Fact 2: The numbers $m_{17}=2^{17}-1$, $m_{19}=2^{19}-1$, and $m_{31}=2^{31}-1$ are (Mersenne) primes whose product is greater than $2^{66}$.

Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following:
$$t_{17} \equiv S^{2^{15}} \pmod{m_{17}}$$
$$t_{19} \equiv S^{2^{17}} \pmod{m_{19}}$$
$$t_{31} \equiv S^{2^{29}} \pmod{m_{31}}$$
Then the Chinese Remainder Theorem gives $$t \equiv \pm 31207\, t_{17} m_{19} m_{31} \pm 298611\, m_{17} t_{19} m_{31} \pm 413071270\, m_{17} m_{19} t_{31} \pmod{m_{17} m_{19} m_{31}},$$ and you check these 8 possibilities.

Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get, as the number of steps is essentially fixed, but that constant may be larger than the $\sqrt n$ of other methods for this range of $n$.
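As a sketch of Fact 1 only (my own helper names; the CRT recombination across the three Mersenne primes is left out), the modular square root mod a prime $p \equiv 3 \pmod 4$ is one modular exponentiation:

```cpp
#include <cstdint>

// unsigned __int128 is a GCC/Clang extension, used here to keep the
// intermediate products from overflowing 64 bits.
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
    return (uint64_t)((unsigned __int128)a * b % m);
}

// Square-and-multiply modular exponentiation: base^exp mod m.
static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t r = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) r = mulmod(r, base, m); // fold in the current bit
        base = mulmod(base, base, m);        // square for the next bit
        exp >>= 1;
    }
    return r;
}

// Fact 1: for a prime p = 3 (mod 4) and x a quadratic residue mod p,
// x^((p+1)/4) is a square root of x mod p (the other root is p minus it).
static uint64_t sqrt_mod(uint64_t x, uint64_t p) {
    return powmod(x % p, (p + 1) / 4, p);
}
```

For example, with $p = m_{31} = 2^{31}-1$, sqrt_mod(S, p) yields $t_{31}$ up to sign.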
(1) I've never heard of a practical square-root algorithm that may not terminate. (2) (As I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(\log n)$ bits.
– user202729, yesterday

(3) Note that for $n$ in a constant range (in this case $[1 \dots 2^{66}]$), any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2\sqrt n$ instead of $n$; the proof is left to the reader. ... and ...
– user202729, yesterday

@user202729 You're totally right about the 8-solution ambiguity. I made the correction in the exposition above. Thanks!
– Aeryk, 19 hours ago
Answer by alcana (1 vote):
I think a binary-search-type algorithm would be quite efficient for large input values if we know the input is a perfect square.

Let $n$ be the input value.

1. Begin with two integers $a$ and $b$ such that $a^2 < n$ and $b^2 > n$. We could use $a=0$ and $b=n$.
2. Find the midpoint of $a$ and $b$, rounding down to the nearest integer if necessary. Call this $m$.
3. Find $m^2$. If $m^2=n$, then $m$ is the square root. If $m^2>n$, then $m$ is too high, so we return to step 2 with $a$ and $m$ as our two integers. If $m^2<n$, then $m$ is too low, so we return to step 2 with $m$ and $b$ as our two integers. Repeat until the square root is found.

The squaring of $m$ may be what slows the algorithm down; however, I believe that multiplication algorithms are implemented in processor hardware and are therefore very efficient. In terms of the number of operations, I believe the binary search would run in logarithmic time and therefore be preferable to $O(\sqrt n)$ for large input values. However, I am no expert on algorithm efficiency...
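A direct sketch of these steps (my own code; it assumes $n$ is a perfect square with $4 \le n < 2^{32}$, so that $0^2 < n < n^2$ holds and $m^2$ fits in 64 bits):

```cpp
#include <cstdint>

uint64_t isqrt_bisect(uint64_t n) {
    uint64_t a = 0, b = n;            // step 1: a^2 < n and b^2 > n
    while (true) {
        uint64_t m = (a + b) / 2;     // step 2: midpoint, rounded down
        uint64_t sq = m * m;          // step 3: compare m^2 with n
        if (sq == n) return m;        // m is the square root
        if (sq > n)  b = m;           // m too high: continue with (a, m)
        else         a = m;           // m too low:  continue with (m, b)
    }
}
```

Because $n$ is a perfect square, the strict bounds $a^2 < n < b^2$ force $b \ge a+2$, so the midpoint always lies strictly between $a$ and $b$ and the loop terminates.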
Thanks for your answer, alcana :) This is a more general algorithm for the square root (it would work for non-perfect squares as well, with some minor modifications to the logic for determining when we are done). It doesn't take advantage of the special property of the input value: that it is a perfect square. It is faster than the previous answer as the input size increases, however :)
– Ryan Pierce Williams, yesterday

@RyanPierceWilliams - also note that on many current Intel processors, multiplies are optimized and only take 1 to 3 cycles, so for larger n, the binary search using multiply and compare should be faster.
– rcgldr, yesterday

Isn't this very similar to Newton's method, just less efficient because it doesn't use deltas to correct the prediction, just blind binary split testing?
– ktb, yesterday

Yes, this is less efficient than Newton's method.
– user202729, yesterday