Leave-$k$-out greatest common divisor
Clash Royale CLAN TAG#URR8PPP
up vote
1
down vote
favorite
The problem
I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.
I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,
$$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$
where $N$ is again a multiset.
What I've tried
The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:
$$ textgcd_0(M) = textgcdbig(textset(M)big), $$
where $textset(M)$ associates to a multiset $M$ its underlying set.
Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then
$$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$
We can calculate $textgcd_k(M)$ recursively,
$$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$
This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...
Environment
I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.
Why do I care?
I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:
These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.
Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):
$$ beginarray*5c
hline
m_i & 0 & 240 & 432 & 552 & 864 \
hline
#m_i & 705 & 1 & 15 & 1 & 3 \
hline
endarray $$
We can calculate $textgcd_k(M)$ recursively as above and obtain:
$$ beginarrayc
hline
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
hline
textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
hline
endarray $$
combinatorics algorithms greatest-common-divisor multisets
add a comment |Â
up vote
1
down vote
favorite
The problem
I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.
I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,
$$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$
where $N$ is again a multiset.
What I've tried
The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:
$$ textgcd_0(M) = textgcdbig(textset(M)big), $$
where $textset(M)$ associates to a multiset $M$ its underlying set.
Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then
$$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$
We can calculate $textgcd_k(M)$ recursively,
$$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$
This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...
Environment
I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.
Why do I care?
I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:
These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.
Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):
$$ beginarray*5c
hline
m_i & 0 & 240 & 432 & 552 & 864 \
hline
#m_i & 705 & 1 & 15 & 1 & 3 \
hline
endarray $$
We can calculate $textgcd_k(M)$ recursively as above and obtain:
$$ beginarrayc
hline
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
hline
textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
hline
endarray $$
combinatorics algorithms greatest-common-divisor multisets
add a comment |Â
up vote
1
down vote
favorite
up vote
1
down vote
favorite
The problem
I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.
I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,
$$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$
where $N$ is again a multiset.
What I've tried
The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:
$$ textgcd_0(M) = textgcdbig(textset(M)big), $$
where $textset(M)$ associates to a multiset $M$ its underlying set.
Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then
$$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$
We can calculate $textgcd_k(M)$ recursively,
$$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$
This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...
Environment
I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.
Why do I care?
I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:
These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.
Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):
$$ beginarray*5c
hline
m_i & 0 & 240 & 432 & 552 & 864 \
hline
#m_i & 705 & 1 & 15 & 1 & 3 \
hline
endarray $$
We can calculate $textgcd_k(M)$ recursively as above and obtain:
$$ beginarrayc
hline
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
hline
textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
hline
endarray $$
combinatorics algorithms greatest-common-divisor multisets
The problem
I have a multiset $M=m_1, dots, m_n$ of positive integers (that is, a number $m_i$ can appear multiple times in $M$), and a positive integer $k$.
I am looking for an algorithm to determine the greatest greatest common divisor (greatest gcd; "greatest greatest" is not a typo) of all sub-multisets that can be derived from $M$ by deleting $k$ elements. Formally,
$$ textgcd_k(M) := max_substack Nsubset M \ #N=k textgcd (Msetminus N), $$
where $N$ is again a multiset.
What I've tried
The case $k=0$ is the "ordinary" $textgcd$, which I'll assume we can calculate easily:
$$ textgcd_0(M) = textgcdbig(textset(M)big), $$
where $textset(M)$ associates to a multiset $M$ its underlying set.
Without loss of generality, we can assume that the multiplicity of each entry in $M$ is greater than $k$, because any element of $M$ that occurs more than $k$ times will not be "active" in determining $textgcd_k(M)$. Formally, if $M'subset M$ consists of those elements of $M$ with multiplicity at most $k$ (with multiplicities, i.e., $M'$ is again a multiset), then
$$ textgcd_k (M) = textgcdBig(bigtextgcd_k(M') cup textgcdbigtextset(Msetminus M'big)Big). $$
We can calculate $textgcd_k(M)$ recursively,
$$ textgcd_k(M) = max_min M textgcd_k-1big(Msetminusmbig). $$
This is what I am doing right now on a dataset, and which is running long enough for me to post this question. I'd prefer something quicker...
Environment
I don't have large numbers. $M$ won't contain much more than 100 numbers, counted with multiplicities, and $k$ won't exceed 10. However, I do need to do this quickly on thousands or even millions of different $M$s.
Why do I care?
I am working on time series that are "almost-multiples" of an underlying $textgcd$, like this one:
These are orders a retail store places at the wholesaler. The underlying multiple is a logistical unit, which I would like to infer from the orders, since it may not be available elsewhere in the system. What complicates matters, and motivates the "leave-$k$-out" aspect, is that sometimes orders are placed which are "not" multiples of this logistical unit.
Disregarding the time dimension for the moment, a table of the values here looks like this (I'll discard the zeros first thing):
$$ beginarray*5c
hline
m_i & 0 & 240 & 432 & 552 & 864 \
hline
#m_i & 705 & 1 & 15 & 1 & 3 \
hline
endarray $$
We can calculate $textgcd_k(M)$ recursively as above and obtain:
$$ beginarrayc
hline
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 \
hline
textgcd_k(M) & 24 & 48 & 432 & 432 & 432 & 432 & 432 \
hline
endarray $$
combinatorics algorithms greatest-common-divisor multisets
asked Jul 16 at 15:40
Stephan Kolassa
445410
445410
add a comment |Â
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
0
down vote
Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:
$$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$
Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.
Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:
$$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$
Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.
Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).
add a comment |Â
up vote
0
down vote
Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:
$$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$
Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.
Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:
$$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$
Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.
Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).
Are you memoizing $gcd_k$? Because with $|M| = 100$, $k=10$ that should finish rather quickly. Or alternatively, you can build up a dynamic programming table where $G[n][k]$ is $gcd_k$ of the first $n$ numbers (with $G[0][k] = 0$ and by convention $gcd(0, a) = a$). With a recurrence like this:
$$G[n][k] = maxbig(gcd(G[n-1][k], a_n), G[n-1][k-1]big)$$
Again, with $|M|cdot k approx 1000$ that ought to finish really quickly.
Addendum: to consider multiplicities, the $k-1$ (leaving out $a_n$) should be $k - m_n$ (leaving out all $m_n$ copies of $a_n$).
answered Jul 18 at 15:44
Timon Knigge
484
484
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2853516%2fleave-k-out-greatest-common-divisor%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password