$n$-sphere enclosing the Birkhoff polytope
Clash Royale CLAN TAG#URR8PPP
up vote
4
down vote
favorite
I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:
Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.
When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.
The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.
So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).
Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?
Thanks!
general-topology permutations approximation polytopes relaxations
add a comment |Â
up vote
4
down vote
favorite
I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:
Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.
When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.
The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.
So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).
Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?
Thanks!
general-topology permutations approximation polytopes relaxations
By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24
I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29
"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39
Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:
Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.
When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.
The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.
So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).
Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?
Thanks!
general-topology permutations approximation polytopes relaxations
I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:
Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.
When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.
The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.
So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).
Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?
Thanks!
general-topology permutations approximation polytopes relaxations
edited Jul 22 at 15:50
Rodrigo de Azevedo
12.6k41751
12.6k41751
asked Jul 21 at 22:49
Tolga Birdal
20128
20128
By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24
I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29
"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39
Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44
add a comment |Â
By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24
I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29
"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39
Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44
By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24
By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24
I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29
I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29
"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39
"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39
Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44
Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
1
down vote
Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.
Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.
Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
add a comment |Â
up vote
1
down vote
Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.
Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.
Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.
Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.
Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.
answered Aug 1 at 14:05


David E Speyer
44.2k4120197
44.2k4120197
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
add a comment |Â
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2858947%2fn-sphere-enclosing-the-birkhoff-polytope%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24
I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29
"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39
Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44