Shortcut to finding the square-root of a perfect-square?

I've been trying to speed up an algorithm where the most expensive operation is the square-root. However, I can often guarantee that the input value is a perfect-square. I'm curious to know if there are any algorithms that will find the square-root faster (constant time?) if it is known that the input is a perfect-square?



Thanks,
Ryan







asked yesterday by Ryan Pierce Williams















  • How large are your squares? 48 binary digits? 64? 1000? And since this is very applied numerical math: what machine are we running on? Or is this configurable hardware and we can implement our own logic?
    – Marcus Müller, yesterday







  • And: do you need to make a decision after every square root calculation, or can you calculate multiple square roots in parallel (e.g. data parallelism)?
    – Marcus Müller, yesterday







  • If you can run on a cluster, you have data parallelism. (Probably it's better to discard the Pi-cluster idea: it's very little bang for the initial buck and little bang for the power-bill buck; embedded computers are not optimized for number crunching, much unlike modern desktop CPUs.)
    – Marcus Müller, yesterday







  • Practical idea: depending on the use case, you may be able to compute your square root earlier and memoize it for the bottleneck, assuming you're doing some earlier computation to ensure this number is a square. Memoization is also an option, but perhaps that's too much memory. To elaborate: get n^2, split a thread; one end does your usual thing, the other computes the square root. Savings will depend on whether you are doing more than just finding root n.
    – Artimis Fowl, yesterday







  • By the way, GMP comes with an integer sqrt function. And I don't think you'll be much faster than that on large integers.
    – Marcus Müller, yesterday
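For reference, a minimal baseline along those lines in Python: math.isqrt (CPython 3.8+) computes the exact integer square root of arbitrarily large integers, much as GMP's mpz_sqrt does.

    import math

    n = 12345678901234567890 ** 2  # a large perfect square
    r = math.isqrt(n)              # exact integer square root, no float rounding
    assert r * r == n              # confirms n is a perfect square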














4 Answers






























The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can calculate the square root by summing successive odd numbers (starting from one); once you reach the input value, return the number of summations you made.

For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $\sqrt{16}=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k \in \mathbb{N}$.

I think this method would run in $O(\sqrt n)$.
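A direct transcription of this idea in Python (a sketch; it uses $O(\sqrt n)$ additions, as noted above):

    def isqrt_odd_sum(n):
        # Integer square root of a perfect square n, by summing odd numbers.
        total, odd, count = 0, 1, 0
        while total < n:
            total += odd      # running sum 1 + 3 + ... + (2*count - 1) == count**2
            odd += 2
            count += 1
        return count          # total == n exactly, since n is a perfect square

    assert isqrt_odd_sum(16) == 4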






– answered yesterday by Tiwa Aina (12 votes; edited yesterday)



















  • +1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
    – Brian Tung, yesterday






  • Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's method.
    – Ryan Pierce Williams, yesterday






  • However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
    – Brian Tung, yesterday






  • Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then the complexity approaches $\mathcal O(\sqrt n)$), I think.
    – Marcus Müller, yesterday







  • Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is the number of bits used to represent the number; so $\sqrt n = \sqrt{2^s} = (\sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
    – user202729, yesterday































For integers within sensible bounds of what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary-search to find the square root of $x$.

(0. Remove blocks of two trailing 0s from your binary number. Each block you remove contributes one factor of $2$ to be multiplied onto the result of the following steps. This can be done in constant time, if I'm not mistaken: observe the structure of "subtract 1 and XOR with the input" for numbers with $t$ trailing 0s, then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by a power of $4$, you'll end up with an odd number; if you still end up with an even number after removing all blocks of two 0s, your number is not a perfect square. A straightforward rendition of this step is sketched below.)
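A sketch of step 0 in Python (it counts trailing zeros via the isolated lowest set bit; a real implementation would use the POPCNT/TZCNT tricks mentioned above):

    def strip_zero_pairs(x):
        # Split x > 0 as x = reduced * factor**2, where factor is a power of two
        # and reduced has no pair of trailing zero bits left.
        t = (x & -x).bit_length() - 1   # number of trailing zero bits of x
        if t % 2:                       # odd count of trailing zeros:
            raise ValueError("not a perfect square")
        pairs = t // 2                  # each removed 00-pair is one factor of 4
        return x >> (2 * pairs), 1 << pairs

    reduced, factor = strip_zero_pairs(144)   # 144 = 9 * 4**2
    assert (reduced, factor) == (9, 4)        # sqrt(144) = factor * sqrt(reduced)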



  1. Find $k=\lfloor\log_2 x\rfloor$, see https://graphics.stanford.edu/~seander/bithacks.html

  2. $a=\lfloor k/2 \rfloor$

  3. Thus, $2^a$ becomes a lower limit for $\sqrt x$ and $2^{a+1}$ an upper. Both values can be found by bit-shifting $1$.

  4. From here, do a binary search¹ (a sketch follows this list).
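A minimal sketch of steps 1–4 in Python (int.bit_length stands in for the bit hacks linked above; arbitrary-precision integers assumed):

    def isqrt_bisect(x):
        # Integer square root of a perfect square x via bounded binary search.
        if x < 2:
            return x
        k = x.bit_length() - 1    # k = floor(log2(x))
        lo = 1 << (k // 2)        # 2**a     <= sqrt(x)
        hi = 1 << (k // 2 + 1)    # 2**(a+1) >= sqrt(x)
        while lo <= hi:
            m = (lo + hi) // 2
            if m * m == x:
                return m
            if m * m < x:
                lo = m + 1
            else:
                hi = m - 1

    assert isqrt_bisect((1 << 20) ** 2) == 1 << 20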

I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, convertible back to integer, from which you only need to search small ranges (namely, the lost precision) for the actual integer square root.
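That FPU route might look like the following sketch (assumption: $x$ is small enough, roughly $x < 2^{104}$, that the double-precision estimate lands within $\pm 1$ of the true root):

    import math

    def isqrt_fpu(x):
        # Hardware sqrt gives an estimate; a tiny local search recovers the
        # precision lost in the int -> double -> int round trip.
        r = int(math.sqrt(x))
        for cand in (r - 1, r, r + 1):
            if cand * cand == x:
                return cand

    assert isqrt_fpu((10**15) ** 2) == 10**15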



Note that in such problems as yours, algorithmic elegance often plays a minor role: it needs to be fast on actual hardware, so execution that avoids a lot of memory interaction is a good thing, and with SIMD instructions, doing four to 16 operations of the same type takes about as long as doing one. So if you just need to test a few integers for their square, modifying your algorithm to try four in parallel is far more efficient than saving half of the operations.

You have a technological problem, not so much a numerical one.




¹ Binary search assumes that you can do one squaring and one comparison at once; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there should be no constant-time algorithm (and I'm pretty sure there's none), you can do better than $\mathcal O(n^2 \cdot \log_2 x)$; compare Fürer's algorithm.






– answered yesterday by Marcus Müller (6 votes; edited yesterday)























  • Step 1 assumes the input is a machine-sized integer, which may not be the case.
    – user202729, yesterday










  • Well, if it is some kind of big integer: whatever big-integer library you're using somehow has to keep track of the length in binary digits anyway.
    – Marcus Müller, yesterday






  • (Of course it's impossible to do it in constant time.)
    – user202729, yesterday






  • Did you try to time it? The binary search requires a lot of multiplications for large numbers, so my bet is that this is much slower than the built-in square root function. (This is what I find with a simple test in C++ using long long integers, though always with the caveat that there may be a better implementation than what I did.)
    – Winther, yesterday










  • "built-in square root function": of what, operating on what data types?
    – Marcus Müller
    yesterday






























I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead, here is a number-theoretic solution that'll work for numbers less than $2^{66}$.



Fact 1: If $p$ is a prime with $p \equiv 3 \pmod 4$ and $x$ is a perfect square mod $p$, then $$x \equiv \left(x^{(p+1)/4}\right)^2 \pmod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)
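A quick check of Fact 1 in Python, with the toy values $p = 11$ and $x = 5 \equiv 4^2 \pmod{11}$:

    p, x = 11, 5                 # 5 = 16 mod 11, so x is a square mod p
    r = pow(x, (p + 1) // 4, p)  # modular exponentiation: r = x^((p+1)/4) mod p
    assert r * r % p == x        # r = 4 is a modular square root of x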



Fact 2: The numbers $m_{17}=2^{17}-1$, $m_{19}=2^{19}-1$, and $m_{31}=2^{31}-1$ are (Mersenne) primes whose product is greater than $2^{66}$.



Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following:
$$t_{17} \equiv S^{2^{15}} \pmod{m_{17}}$$
$$t_{19} \equiv S^{2^{17}} \pmod{m_{19}}$$
$$t_{31} \equiv S^{2^{29}} \pmod{m_{31}}$$
Then the Chinese Remainder Theorem gives $$t \equiv \pm 31207\,t_{17}m_{19}m_{31} \pm 298611\,m_{17}t_{19}m_{31} \pm 413071270\,m_{17}m_{19}t_{31} \pmod{m_{17}m_{19}m_{31}}.$$ Then check these 8 possibilities.
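The whole method as a Python sketch (three-argument pow does the modular exponentiation; the constants are the CRT coefficients from the congruence above):

    from itertools import product

    M = [(2**17 - 1, 2**15), (2**19 - 1, 2**17), (2**31 - 1, 2**29)]  # (m_p, (p+1)/4)
    C = [31207, 298611, 413071270]   # CRT coefficients from the formula above
    N = (2**17 - 1) * (2**19 - 1) * (2**31 - 1)

    def sqrt_crt(S):
        # Square root of a perfect square S < 2**66 via modular roots + CRT.
        t = [pow(S, e, m) for m, e in M]   # modular square roots, up to sign
        cof = [N // m for m, _ in M]       # product of the other two moduli
        for signs in product((1, -1), repeat=3):   # the 8 sign choices
            cand = sum(s * c * ti * co
                       for s, c, ti, co in zip(signs, C, t, cof)) % N
            if cand * cand == S:
                return cand

    assert sqrt_crt(123456789 ** 2) == 123456789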



Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get, as the number of steps is essentially fixed, but that constant may be larger than the $\sqrt n$ of other methods for this range of $n$.






– answered yesterday by Aeryk (3 votes; edited 19 hours ago)























  • (1) I've never heard of a practical algorithm computing square roots that may not terminate. (2) (As I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(\log n)$ bits.
    – user202729, yesterday










  • (3) Note that for $n$ being in a constant range (in this case $[1 \dots 2^{66}]$), any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2\sqrt n$ instead of $n$; the proof is left to the reader. ... and ...
    – user202729, yesterday










  • @user202729 You're totally right about the 8-solution ambiguity. I made the correction in the exposition above. Thanks!
    – Aeryk, 19 hours ago






























I think a binary search type algorithm would be quite efficient for large input values if we know the input is a perfect square.



Let $n$ be the input value.



  1. Begin with two integers $a$ and $b$ such that $a^2 < n$ and $b^2 > n$. We could use $a=0$ and $b=n$.


  2. Find the midpoint of $a$ and $b$ and round down to the nearest integer if necessary. Call this $m$.


  3. Find $m^2$. If $m^2=n$ then $m$ is the square root. If $m^2>n$ then $m$ is too high, so we return to step 2 with $a$ and $m$ as our two integers. If $m^2<n$ then $m$ is too low, so we return to step 2 with $m$ and $b$ as our two integers. Repeat until the square root is found.


The squaring of $m$ may be what slows the algorithm down; however, I believe that multiplication algorithms are implemented in processor hardware and are therefore very efficient. In terms of the number of operations, I believe the binary search would run in logarithmic time and therefore be preferable to $O(\sqrt n)$ for large input values. However, I am no expert on algorithm efficiency...
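The numbered steps translate directly into Python (a sketch; it assumes, per the question, that $n$ really is a perfect square):

    def isqrt_binary(n):
        # Binary search for the square root of a perfect square n.
        a, b = 0, n                 # invariant: a <= sqrt(n) <= b
        while a <= b:
            m = (a + b) // 2        # midpoint, rounded down
            if m * m == n:
                return m            # found the exact root
            if m * m > n:
                b = m - 1           # m too high
            else:
                a = m + 1           # m too low

    assert isqrt_binary(144) == 12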






– answered yesterday (1 vote)





















  • Thanks for your answer, alcana :) This is a more general algorithm for the square root (it would work for non-perfect squares as well, with some minor modifications to the logic for determining when we are done). It doesn't take advantage of the special property of the input value: that it is a perfect square. It is faster than the previous answer as the input size increases, however :)
    – Ryan Pierce Williams, yesterday










  • @RyanPierceWilliams - also note that on many current Intel processors, multiplies are optimized and take only 1 to 3 cycles, so for larger n the binary search using multiply-and-compare should be faster.
    – rcgldr, yesterday






  • Isn't this very similar to Newton's method, just less efficient because it doesn't use deltas to correct the prediction, just a blind binary split testing?
    – ktb, yesterday






  • Yes, this is less efficient than Newton's method.
    – user202729, yesterday










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "69"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2872437%2fshortcut-to-finding-the-square-root-of-a-perfect-square%23new-answer', 'question_page');

);

Post as a guest






























4 Answers
4






active

oldest

votes








4 Answers
4






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
12
down vote













The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can you calculate the square root by summing successive odd numbers (starting from one)—once you reach the input value, return the number of summations you made.



For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $sqrt16=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k in mathbb N$.



I think this method would run in $O(sqrt n)$.






share|cite|improve this answer



















  • 1




    +1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
    – Brian Tung
    yesterday






  • 1




    Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
    – Ryan Pierce Williams
    yesterday






  • 4




    However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
    – Brian Tung
    yesterday






  • 1




    Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $mathcal O(sqrt n)$, I think.
    – Marcus Müller
    yesterday







  • 7




    Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is number of bits used to represent the number; so $sqrt n = sqrt2^s = (sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
    – user202729
    yesterday















up vote
12
down vote













The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can you calculate the square root by summing successive odd numbers (starting from one)—once you reach the input value, return the number of summations you made.



For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $sqrt16=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k in mathbb N$.



I think this method would run in $O(sqrt n)$.






share|cite|improve this answer



















  • 1




    +1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
    – Brian Tung
    yesterday






  • 1




    Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
    – Ryan Pierce Williams
    yesterday






  • 4




    However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
    – Brian Tung
    yesterday






  • 1




    Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $mathcal O(sqrt n)$, I think.
    – Marcus Müller
    yesterday







  • 7




    Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is number of bits used to represent the number; so $sqrt n = sqrt2^s = (sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
    – user202729
    yesterday













up vote
12
down vote










up vote
12
down vote









The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can you calculate the square root by summing successive odd numbers (starting from one)—once you reach the input value, return the number of summations you made.



For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $sqrt16=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k in mathbb N$.



I think this method would run in $O(sqrt n)$.






share|cite|improve this answer















The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can you calculate the square root by summing successive odd numbers (starting from one)—once you reach the input value, return the number of summations you made.



For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $sqrt16=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k in mathbb N$.



I think this method would run in $O(sqrt n)$.







share|cite|improve this answer















share|cite|improve this answer



share|cite|improve this answer








edited yesterday


























answered yesterday









Tiwa Aina

2,526319




2,526319







  • 1




    +1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
    – Brian Tung
    yesterday






  • 1




    Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
    – Ryan Pierce Williams
    yesterday






  • 4




    However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
    – Brian Tung
    yesterday






  • 1




    Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $mathcal O(sqrt n)$, I think.
    – Marcus Müller
    yesterday







  • 7




    Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is number of bits used to represent the number; so $sqrt n = sqrt2^s = (sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
    – user202729
    yesterday













  • 1




    +1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
    – Brian Tung
    yesterday






  • 1




    Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
    – Ryan Pierce Williams
    yesterday






  • 4




    However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
    – Brian Tung
    yesterday






  • 1




    Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $mathcal O(sqrt n)$, I think.
    – Marcus Müller
    yesterday







  • 7




    Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is number of bits used to represent the number; so $sqrt n = sqrt2^s = (sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
    – user202729
    yesterday








1




1




+1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
– Brian Tung
yesterday




+1. I would edit your answer to make your use of $n$ consistent between the first and last sentences. (Perhaps prefer changing $n$ to $k$ in your first sentence to match the second paragraph.)
– Brian Tung
yesterday




1




1




Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
– Ryan Pierce Williams
yesterday




Good answer :) I actually tried this myself. Unfortunately, as n gets larger this approach takes a lot more time than more general approaches like Newton's Method.
– Ryan Pierce Williams
yesterday




4




4




However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
– Brian Tung
yesterday




However, I wonder if this is actually faster than the existing sqrt implementation in most compiled languages.
– Brian Tung
yesterday




1




1




Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $mathcal O(sqrt n)$, I think.
– Marcus Müller
yesterday





Newton's algorithm converges much faster (on average; worst case: your integer is the square of two large primes, then complexity approaches $mathcal O(sqrt n)$, I think.
– Marcus Müller
yesterday





7




7




Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is number of bits used to represent the number; so $sqrt n = sqrt2^s = (sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
– user202729
yesterday





Terrible. Remember that $n$ here is the number itself, which is of order $2^s$ where $s$ is number of bits used to represent the number; so $sqrt n = sqrt2^s = (sqrt 2)^s$, which is exponential in the input size. Newton's algorithm is polynomial in $s$.
– user202729
yesterday











up vote
6
down vote













With integers within sensible bounds compared to what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary search to find the square root of x.



(0. remove two-blocks of trailing 0 from your binary number. Each block you remove is one factor of 2 to be multiplied to the result of the following step. This can be done in constant time, if I'm not mistaken: Observe the structure of "Subtract 1 and XOR with the input" for numbers with $t$ trailing 0s. Then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by $4^n$, you'll end up with an odd number; if you end up with an even number after removing an even number of 0s, your number is not a perfect square.)



  1. Find $k=lfloorlog_2 xrfloor $, see https://graphics.stanford.edu/~seander/bithacks.html

  2. $a=frac k2$

  3. Thus, $2^a$ becomes a lower limit for $sqrt x$ and $2^a+1$ an upper. Both values can be found via bit-shifting 1.

  4. From here, do a binary search¹.

I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, comvertable back to integer, from which you only need to search small ranges (namely, the lost precision) for the actual integer square root.



Note that in such problems as yours, algorithmic elegance often plays a minor role - it needs to be fast on actual hardware, so execution avoiding a lot of memory interaction is a good thing, and: with SIMD instructions, doing four to 16 operations of the same type take about as long as doing one; so if you just need to test a few integers for their square, modifying your algorithm to be able to try four in parallel is way more efficient than saving half of the operations necessary.



You have a technological problem, not so much a numerical.




¹ binary search assumes that you can do one squaring and one comparison at once; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there should be no constant time algorithm (and I'm pretty sure there's none), you can be better than $mathcal O(n^2·log_2 x)$; compare Fürer's algorithm.






share|cite|improve this answer























  • Step 1 assumes the input is a machine-sized integer, which may not be a case.
    – user202729
    yesterday










  • Well if it is some kind of big integer: whatever big integer library you're using somehow has to keep track of the length in binary digits, anyway.
    – Marcus Müller
    yesterday






  • 1




    (of course it's impossible to do it in constant time)
    – user202729
    yesterday






  • 1




    Did you try to time it? The binary search requires a lot of multiplications for large numbers so my bet is that this is much slower than the built in square root function (this is what I find with a simple test in C++ using long long integers, but its always the caveat that there is a better implementation than what I did)
    – Winther
    yesterday










  • "built-in square root function": of what, operating on what data types?
    – Marcus Müller
    yesterday














up vote
6
down vote













With integers within sensible bounds compared to what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary search to find the square root of x.



(0. remove two-blocks of trailing 0 from your binary number. Each block you remove is one factor of 2 to be multiplied to the result of the following step. This can be done in constant time, if I'm not mistaken: Observe the structure of "Subtract 1 and XOR with the input" for numbers with $t$ trailing 0s. Then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by $4^n$, you'll end up with an odd number; if you end up with an even number after removing an even number of 0s, your number is not a perfect square.)



  1. Find $k=lfloorlog_2 xrfloor $, see https://graphics.stanford.edu/~seander/bithacks.html

  2. $a=frac k2$

  3. Thus, $2^a$ becomes a lower limit for $sqrt x$ and $2^a+1$ an upper. Both values can be found via bit-shifting 1.

  4. From here, do a binary search¹.

I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, comvertable back to integer, from which you only need to search small ranges (namely, the lost precision) for the actual integer square root.



Note that in such problems as yours, algorithmic elegance often plays a minor role - it needs to be fast on actual hardware, so execution avoiding a lot of memory interaction is a good thing, and: with SIMD instructions, doing four to 16 operations of the same type take about as long as doing one; so if you just need to test a few integers for their square, modifying your algorithm to be able to try four in parallel is way more efficient than saving half of the operations necessary.



You have a technological problem, not so much a numerical.




¹ binary search assumes that you can do one squaring and one comparison at once; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there should be no constant time algorithm (and I'm pretty sure there's none), you can be better than $mathcal O(n^2·log_2 x)$; compare Fürer's algorithm.






share|cite|improve this answer























  • Step 1 assumes the input is a machine-sized integer, which may not be a case.
    – user202729
    yesterday










  • Well if it is some kind of big integer: whatever big integer library you're using somehow has to keep track of the length in binary digits, anyway.
    – Marcus Müller
    yesterday






  • 1




    (of course it's impossible to do it in constant time)
    – user202729
    yesterday






  • 1




    Did you try to time it? The binary search requires a lot of multiplications for large numbers so my bet is that this is much slower than the built in square root function (this is what I find with a simple test in C++ using long long integers, but its always the caveat that there is a better implementation than what I did)
    – Winther
    yesterday










  • "built-in square root function": of what, operating on what data types?
    – Marcus Müller
    yesterday












up vote
6
down vote










up vote
6
down vote









With integers within sensible bounds compared to what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary search to find the square root of x.



(0. remove two-blocks of trailing 0 from your binary number. Each block you remove is one factor of 2 to be multiplied to the result of the following step. This can be done in constant time, if I'm not mistaken: Observe the structure of "Subtract 1 and XOR with the input" for numbers with $t$ trailing 0s. Then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by $4^n$, you'll end up with an odd number; if you end up with an even number after removing an even number of 0s, your number is not a perfect square.)



  1. Find $k=lfloorlog_2 xrfloor $, see https://graphics.stanford.edu/~seander/bithacks.html

  2. $a=frac k2$

  3. Thus, $2^a$ becomes a lower limit for $sqrt x$ and $2^a+1$ an upper. Both values can be found via bit-shifting 1.

  4. From here, do a binary search¹.

I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, comvertable back to integer, from which you only need to search small ranges (namely, the lost precision) for the actual integer square root.



Note that in such problems as yours, algorithmic elegance often plays a minor role - it needs to be fast on actual hardware, so execution avoiding a lot of memory interaction is a good thing, and: with SIMD instructions, doing four to 16 operations of the same type take about as long as doing one; so if you just need to test a few integers for their square, modifying your algorithm to be able to try four in parallel is way more efficient than saving half of the operations necessary.



You have a technological problem, not so much a numerical.




¹ binary search assumes that you can do one squaring and one comparison at once; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there should be no constant time algorithm (and I'm pretty sure there's none), you can be better than $mathcal O(n^2·log_2 x)$; compare Fürer's algorithm.






share|cite|improve this answer















With integers within sensible bounds compared to what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary search to find the square root of x.



(0. remove two-blocks of trailing 0 from your binary number. Each block you remove is one factor of 2 to be multiplied to the result of the following step. This can be done in constant time, if I'm not mistaken: Observe the structure of "Subtract 1 and XOR with the input" for numbers with $t$ trailing 0s. Then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by $4^n$, you'll end up with an odd number; if you end up with an even number after removing an even number of 0s, your number is not a perfect square.)



  1. Find $k=lfloorlog_2 xrfloor $, see https://graphics.stanford.edu/~seander/bithacks.html

  2. $a=frac k2$

  3. Thus, $2^a$ becomes a lower limit for $sqrt x$ and $2^a+1$ an upper. Both values can be found via bit-shifting 1.

  4. From here, do a binary search¹.

I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, comvertable back to integer, from which you only need to search small ranges (namely, the lost precision) for the actual integer square root.



Note that in such problems as yours, algorithmic elegance often plays a minor role - it needs to be fast on actual hardware, so execution avoiding a lot of memory interaction is a good thing, and: with SIMD instructions, doing four to 16 operations of the same type take about as long as doing one; so if you just need to test a few integers for their square, modifying your algorithm to be able to try four in parallel is way more efficient than saving half of the operations necessary.



You have a technological problem, not so much a numerical.




¹ binary search assumes that you can do one squaring and one comparison at once; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there should be no constant time algorithm (and I'm pretty sure there's none), you can be better than $mathcal O(n^2·log_2 x)$; compare Fürer's algorithm.







share|cite|improve this answer















share|cite|improve this answer



share|cite|improve this answer








edited yesterday


























answered yesterday









Marcus Müller

1827




1827











  • Step 1 assumes the input is a machine-sized integer, which may not be a case.
    – user202729
    yesterday










  • Well if it is some kind of big integer: whatever big integer library you're using somehow has to keep track of the length in binary digits, anyway.
    – Marcus Müller
    yesterday






  • 1




    (of course it's impossible to do it in constant time)
    – user202729
    yesterday






  • 1




    Did you try to time it? The binary search requires a lot of multiplications for large numbers so my bet is that this is much slower than the built in square root function (this is what I find with a simple test in C++ using long long integers, but its always the caveat that there is a better implementation than what I did)
    – Winther
    yesterday










  • "built-in square root function": of what, operating on what data types?
    – Marcus Müller
    yesterday
















  • Step 1 assumes the input is a machine-sized integer, which may not be a case.
    – user202729
    yesterday










  • Well if it is some kind of big integer: whatever big integer library you're using somehow has to keep track of the length in binary digits, anyway.
    – Marcus Müller
    yesterday






  • 1




    (of course it's impossible to do it in constant time)
    – user202729
    yesterday






  • 1




    Did you try to time it? The binary search requires a lot of multiplications for large numbers so my bet is that this is much slower than the built in square root function (this is what I find with a simple test in C++ using long long integers, but its always the caveat that there is a better implementation than what I did)
    – Winther
    yesterday










  • "built-in square root function": of what, operating on what data types?
    – Marcus Müller
    yesterday















Step 1 assumes the input is a machine-sized integer, which may not be a case.
– user202729
yesterday




Step 1 assumes the input is a machine-sized integer, which may not be a case.
– user202729
yesterday












Well if it is some kind of big integer: whatever big integer library you're using somehow has to keep track of the length in binary digits, anyway.
– Marcus Müller
yesterday




Well if it is some kind of big integer: whatever big integer library you're using somehow has to keep track of the length in binary digits, anyway.
– Marcus Müller
yesterday




1




1




(of course it's impossible to do it in constant time)
– user202729
yesterday




(of course it's impossible to do it in constant time)
– user202729
yesterday




1




1




Did you try to time it? The binary search requires a lot of multiplications for large numbers so my bet is that this is much slower than the built in square root function (this is what I find with a simple test in C++ using long long integers, but its always the caveat that there is a better implementation than what I did)
– Winther
yesterday




Did you try to time it? The binary search requires a lot of multiplications for large numbers so my bet is that this is much slower than the built in square root function (this is what I find with a simple test in C++ using long long integers, but its always the caveat that there is a better implementation than what I did)
– Winther
yesterday












"built-in square root function": of what, operating on what data types?
– Marcus Müller
yesterday




"built-in square root function": of what, operating on what data types?
– Marcus Müller
yesterday










up vote
3
down vote













I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead here is a number theoretic solution that'll work for numbers less than $2^66$.



Fact 1: If $p$ is a prime with $p equiv 3 mod 4$ and $x$ is a perfect square $mod p$, then $$x equiv left(x^(p+1)/4right)^2 mod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)



Fact 2: The numbers $m_17=2^17-1$, $m_19=2^19-1$, and $m_31=2^31-1$ are (Mersenne) primes whose product is greater than $2^66$.



Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following
$$t_17 equiv S^2^15 mod m_17$$
$$t_19 equiv S^2^17 mod m_19$$
$$t_31 equiv S^2^29 mod m_31$$
Then the Chinese Remainder Theorem gives $$t equiv pm 31207 t_17 m_19 m_31 pm 298611 m_17 t_19 m_31 pm 413071270 m_17 m_19 t_31 mod m_17m_19m_31$$ Then check these 8 possibilities.



Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get as the number of steps is essentially fixed, but that constant may be larger than the $sqrtn$ of other methods for this range of $n$.






share|cite|improve this answer























  • (1) I've never heard of a practical algorithm computing square root that may not terminate. (2) (as I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(log n)$ bits.
    – user202729
    yesterday










  • (3) Note that for $n$ being in a constant range (in this case $[1 dots 2^66]$, any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2sqrt n$ instead of $n$, the proof is left to the reader. ... and ...
    – user202729
    yesterday










  • @user202729 You're totally right about the 8 solution ambiguity. I made the correction in the exposition above. Thanks!
    – Aeryk
    19 hours ago














up vote
3
down vote













I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead here is a number theoretic solution that'll work for numbers less than $2^66$.



Fact 1: If $p$ is a prime with $p equiv 3 mod 4$ and $x$ is a perfect square $mod p$, then $$x equiv left(x^(p+1)/4right)^2 mod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)



Fact 2: The numbers $m_17=2^17-1$, $m_19=2^19-1$, and $m_31=2^31-1$ are (Mersenne) primes whose product is greater than $2^66$.



Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following
$$t_17 equiv S^2^15 mod m_17$$
$$t_19 equiv S^2^17 mod m_19$$
$$t_31 equiv S^2^29 mod m_31$$
Then the Chinese Remainder Theorem gives $$t equiv pm 31207 t_17 m_19 m_31 pm 298611 m_17 t_19 m_31 pm 413071270 m_17 m_19 t_31 mod m_17m_19m_31$$ Then check these 8 possibilities.



Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get as the number of steps is essentially fixed, but that constant may be larger than the $sqrtn$ of other methods for this range of $n$.






share|cite|improve this answer























  • (1) I've never heard of a practical algorithm computing square root that may not terminate. (2) (as I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(log n)$ bits.
    – user202729
    yesterday










  • (3) Note that for $n$ being in a constant range (in this case $[1 dots 2^66]$, any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2sqrt n$ instead of $n$, the proof is left to the reader. ... and ...
    – user202729
    yesterday










  • @user202729 You're totally right about the 8 solution ambiguity. I made the correction in the exposition above. Thanks!
    – Aeryk
    19 hours ago












up vote
3
down vote










up vote
3
down vote









I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead here is a number theoretic solution that'll work for numbers less than $2^66$.



Fact 1: If $p$ is a prime with $p equiv 3 mod 4$ and $x$ is a perfect square $mod p$, then $$x equiv left(x^(p+1)/4right)^2 mod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)



Fact 2: The numbers $m_17=2^17-1$, $m_19=2^19-1$, and $m_31=2^31-1$ are (Mersenne) primes whose product is greater than $2^66$.



Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following
$$t_17 equiv S^2^15 mod m_17$$
$$t_19 equiv S^2^17 mod m_19$$
$$t_31 equiv S^2^29 mod m_31$$
Then the Chinese Remainder Theorem gives $$t equiv pm 31207 t_17 m_19 m_31 pm 298611 m_17 t_19 m_31 pm 413071270 m_17 m_19 t_31 mod m_17m_19m_31$$ Then check these 8 possibilities.



Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get as the number of steps is essentially fixed, but that constant may be larger than the $sqrtn$ of other methods for this range of $n$.






share|cite|improve this answer















I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead here is a number theoretic solution that'll work for numbers less than $2^66$.



Fact 1: If $p$ is a prime with $p equiv 3 mod 4$ and $x$ is a perfect square $mod p$, then $$x equiv left(x^(p+1)/4right)^2 mod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)



Fact 2: The numbers $m_17=2^17-1$, $m_19=2^19-1$, and $m_31=2^31-1$ are (Mersenne) primes whose product is greater than $2^66$.



Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following
$$t_17 equiv S^2^15 mod m_17$$
$$t_19 equiv S^2^17 mod m_19$$
$$t_31 equiv S^2^29 mod m_31$$
Then the Chinese Remainder Theorem gives $$t equiv pm 31207 t_17 m_19 m_31 pm 298611 m_17 t_19 m_31 pm 413071270 m_17 m_19 t_31 mod m_17m_19m_31$$ Then check these 8 possibilities.



Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get as the number of steps is essentially fixed, but that constant may be larger than the $sqrtn$ of other methods for this range of $n$.







share|cite|improve this answer















share|cite|improve this answer



share|cite|improve this answer








edited 19 hours ago


























answered yesterday









Aeryk

257211




257211











  • (1) I've never heard of a practical algorithm computing square root that may not terminate. (2) (as I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(log n)$ bits.
    – user202729
    yesterday










  • (3) Note that for $n$ being in a constant range (in this case $[1 dots 2^66]$, any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2sqrt n$ instead of $n$, the proof is left to the reader. ... and ...
    – user202729
    yesterday










  • @user202729 You're totally right about the 8 solution ambiguity. I made the correction in the exposition above. Thanks!
    – Aeryk
    19 hours ago
















  • (1) I've never heard of a practical algorithm computing square root that may not terminate. (2) (as I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(log n)$ bits.
    – user202729
    yesterday










  • (3) Note that for $n$ being in a constant range (in this case $[1 dots 2^66]$, any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2sqrt n$ instead of $n$, the proof is left to the reader. ... and ...
    – user202729
    yesterday










  • @user202729 You're totally right about the 8 solution ambiguity. I made the correction in the exposition above. Thanks!
    – Aeryk
    19 hours ago















(1) I've never heard of a practical algorithm computing square root that may not terminate. (2) (as I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(log n)$ bits.
– user202729
yesterday




(1) I've never heard of a practical algorithm computing square root that may not terminate. (2) (as I pointed out in a comment under the question) it's impossible to do this in constant time, as storing a number $n$ takes at least $O(log n)$ bits.
– user202729
yesterday












(3) Note that for $n$ being in a constant range (in this case $[1 dots 2^66]$, any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2sqrt n$ instead of $n$, the proof is left to the reader. ... and ...
– user202729
yesterday




(3) Note that for $n$ being in a constant range (in this case $[1 dots 2^66]$, any (deterministic, correct) algorithm must run in constant asymptotic time complexity. (4) It's only necessary for $p$ to be larger than $2sqrt n$ instead of $n$, the proof is left to the reader. ... and ...
– user202729
yesterday












@user202729 You're totally right about the 8 solution ambiguity. I made the correction in the exposition above. Thanks!
– Aeryk
19 hours ago




@user202729 You're totally right about the 8 solution ambiguity. I made the correction in the exposition above. Thanks!
– Aeryk
19 hours ago










up vote
1
down vote













I think a binary search type algorithm would be quite efficient for large input values if we know the input is a perfect square.



Let $n$ be the input value.



  1. Begin with two integers $a$ and $b$ such that $a^2 < n$ and $b^2 > n$. For $n \ge 2$ we could use $a=0$ and $b=n$.


  2. Find the midpoint of $a$ and $b$ and round down to the nearest integer if necessary. Call this $m$.


  3. Find $m^2$. If $m^2=n$ then $m$ is the square root. If $m^2>n$ then $m$ is too high, so we return to step 2 with $a$ and $m$ as our two integers. If $m^2<n$ then $m$ is too low, so we return to step 2 with $m$ and $b$ as our two integers. Repeat until the square root is found.


The squaring of $m$ may be what slows the algorithm down; however, I believe that multiplication is implemented in processor hardware and is therefore very efficient. In terms of the number of operations, I believe the binary search runs in logarithmic time and would therefore be preferable to $O(\sqrt{n})$ for large input values (a sketch in code follows below). However, I am no expert on algorithm efficiency...
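
A minimal sketch of the three steps above in Python, assuming (as the answer does) that the input really is a perfect square; the function name is illustrative.

    # Binary search for the exact integer square root of a perfect square n.
    # Assumes n really is a perfect square; for other n the exact match never
    # occurs, and the loop would have to stop once the interval can't shrink.
    def isqrt_binary(n: int) -> int:
        if n < 2:
            return n                # 0 and 1 are their own square roots
        a, b = 0, n                 # step 1: a**2 < n < b**2 holds for n >= 2
        while True:
            m = (a + b) // 2        # step 2: midpoint, rounded down
            sq = m * m
            if sq == n:
                return m            # step 3: exact root found
            elif sq > n:
                b = m               # m too high: continue on [a, m]
            else:
                a = m               # m too low: continue on [m, b]

    assert isqrt_binary(1522756) == 1234

Each iteration halves $b - a$, so roughly $\log_2 n$ squarings suffice, which is the logarithmic-time claim made above.
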






– alcana, answered yesterday

  • Thanks for your answer, alcana :) This is a more general algorithm for the square root (it would work for non-perfect squares as well, with some minor modifications to the logic for determining when we are done). It doesn't take advantage of the special property of the input value: that it is a perfect square. It is faster than the previous answer as the input size increases, however :)
    – Ryan Pierce Williams
    yesterday

  • @RyanPierceWilliams - also note that on many current Intel processors, multiplies are optimized and take only 1 to 3 cycles, so for larger $n$ the binary search using multiply-and-compare should be faster.
    – rcgldr
    yesterday

  • Isn't this very similar to Newton's method, just less efficient because it doesn't use deltas to correct the prediction, just blind binary-split testing?
    – ktb
    yesterday

  • Yes, this is less efficient than Newton's method.
    – user202729
    yesterday
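
For a concrete comparison, here is a minimal sketch of the classic integer form of Newton's method mentioned in these comments; the function name is illustrative. Its error shrinks roughly quadratically once the iterate is close, rather than by a fixed factor of two per step, which is why it generally needs fewer iterations than the binary search.

    # The classic integer variant of Newton's method, for comparison:
    # iterate x -> (x + n // x) // 2, which converges to floor(sqrt(n)).
    def isqrt_newton(n: int) -> int:
        x = n                       # any starting guess >= sqrt(n) works (n >= 1)
        while True:
            y = (x + n // x) // 2   # Newton step for f(x) = x**2 - n
            if y >= x:
                return x            # no further progress: x == floor(sqrt(n))
            x = y

    assert isqrt_newton(1522756) == 1234
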
















 
