Leave-$k$-out greatest common divisor

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite












The problem



I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.



I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,



$$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$



where $N$ is again a multiset.



What I've tried




  • The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:



    $$ textgcd_0(M) = textgcdbig(textset(M)big), $$



    where $textset(M)$ associates to a multiset $M$ its underlying set.




  • Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then



    $$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$




  • We can calculate $textgcd_k(M)$ recursively,



    $$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$



    This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...



Environment



I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.



Why do I care?



I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:



time series



These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.



Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):



$$ beginarray*5c
hline
m_i & 0 & 240 & 432 & 552 & 864 \
hline
#m_i & 705 & 1 & 15 & 1 & 3 \
hline
endarray $$



We can calculate $textgcd_k(M)$ recursively as above and obtain:



$$ beginarrayc
hline
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
hline
textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
hline
endarray $$



GCDs







share|cite|improve this question























    up vote
    1
    down vote

    favorite












    The problem



    I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.



    I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,



    $$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$



    where $N$ is again a multiset.



    What I've tried




    • The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:



      $$ textgcd_0(M) = textgcdbig(textset(M)big), $$



      where $textset(M)$ associates to a multiset $M$ its underlying set.




    • Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then



      $$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$




    • We can calculate $textgcd_k(M)$ recursively,



      $$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$



      This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...



    Environment



    I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.



    Why do I care?



    I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:



    time series



    These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.



    Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):



    $$ beginarray*5c
    hline
    m_i & 0 & 240 & 432 & 552 & 864 \
    hline
    #m_i & 705 & 1 & 15 & 1 & 3 \
    hline
    endarray $$



    We can calculate $textgcd_k(M)$ recursively as above and obtain:



    $$ beginarrayc
    hline
    k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
    hline
    textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
    hline
    endarray $$



    GCDs







    share|cite|improve this question





















      up vote
      1
      down vote

      favorite









      up vote
      1
      down vote

      favorite











      The problem



      I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.



      I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,



      $$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$



      where $N$ is again a multiset.



      What I've tried




      • The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:



        $$ textgcd_0(M) = textgcdbig(textset(M)big), $$



        where $textset(M)$ associates to a multiset $M$ its underlying set.




      • Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then



        $$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$




      • We can calculate $textgcd_k(M)$ recursively,



        $$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$



        This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...



      Environment



      I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.



      Why do I care?



      I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:



      time series



      These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.



      Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):



      $$ beginarray*5c
      hline
      m_i & 0 & 240 & 432 & 552 & 864 \
      hline
      #m_i & 705 & 1 & 15 & 1 & 3 \
      hline
      endarray $$



      We can calculate $textgcd_k(M)$ recursively as above and obtain:



      $$ beginarrayc
      hline
      k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
      hline
      textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
      hline
      endarray $$



      GCDs







      share|cite|improve this question











      The problem



      I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.



      I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,



      $$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$



      where $N$ is again a multiset.



      What I've tried




      • The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:



        $$ textgcd_0(M) = textgcdbig(textset(M)big), $$



        where $textset(M)$ associates to a multiset $M$ its underlying set.




      • Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then



        $$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$




      • We can calculate $textgcd_k(M)$ recursively,



        $$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$



        This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...



      Environment



      I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.



      Why do I care?



      I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:



      time series



      These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.



      Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):



      $$ beginarray*5c
      hline
      m_i & 0 & 240 & 432 & 552 & 864 \
      hline
      #m_i & 705 & 1 & 15 & 1 & 3 \
      hline
      endarray $$



      We can calculate $textgcd_k(M)$ recursively as above and obtain:



      $$ beginarrayc
      hline
      k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
      hline
      textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
      hline
      endarray $$



      GCDs









      share|cite|improve this question










      share|cite|improve this question




      share|cite|improve this question









      asked Jul 16 at 15:40









      Stephan Kolassa

      445410




      445410




















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          0
          down vote













          Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:



          $$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$



          Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.



          Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).






          share|cite|improve this answer





















            Your Answer




            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "69"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            noCode: true, onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );








             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2853516%2fleave-k-out-greatest-common-divisor%23new-answer', 'question_page');

            );

            Post as a guest






























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            0
            down vote













            Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:



            $$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$



            Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.



            Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).






            share|cite|improve this answer

























              up vote
              0
              down vote













              Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:



              $$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$



              Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.



              Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).






              share|cite|improve this answer























                up vote
                0
                down vote










                up vote
                0
                down vote









                Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:



                $$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$



                Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.



                Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).






                share|cite|improve this answer













                Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:



                $$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$



                Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.



                Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).







                share|cite|improve this answer













                share|cite|improve this answer



                share|cite|improve this answer











                answered Jul 18 at 15:44









                Timon Knigge

                484




                484






















                     

                    draft saved


                    draft discarded


























                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2853516%2fleave-k-out-greatest-common-divisor%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Comments

                    Popular posts from this blog

                    What is the equation of a 3D cone with generalised tilt?

                    Color the edges and diagonals of a regular polygon

                    Relationship between determinant of matrix and determinant of adjoint?