$n$-sphere enclosing the Birkhoff polytope

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
4
down vote

favorite












I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:



Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.



When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.



The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.



So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).



Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?



Thanks!







share|cite|improve this question





















  • By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
    – joriki
    Jul 21 at 23:24










  • I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
    – Tolga Birdal
    Jul 21 at 23:29










  • "largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
    – joriki
    Jul 21 at 23:39










  • Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
    – Tolga Birdal
    Jul 21 at 23:44















up vote
4
down vote

favorite












I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:



Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.



When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.



The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.



So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).



Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?



Thanks!







share|cite|improve this question





















  • By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
    – joriki
    Jul 21 at 23:24










  • I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
    – Tolga Birdal
    Jul 21 at 23:29










  • "largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
    – joriki
    Jul 21 at 23:39










  • Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
    – Tolga Birdal
    Jul 21 at 23:44













up vote
4
down vote

favorite









up vote
4
down vote

favorite











I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:



Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.



When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.



The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.



So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).



Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?



Thanks!







share|cite|improve this question













I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:



Let $P$ denote a $textitpermutation matrix$:
$$
beginequation
P := X in 0,1^ntimes n : X mathbf1_n = mathbf1_n,,,mathbf1_n^T X = mathbf1_n^T.
endequation
$$
where $mathbf1_n$ denotes an $n$-dimensional ones vector.



When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $mathcalDP_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $mathbbR^ntimes n$, defined as:
$$
beginalign
mathcalDP_n = ,,X :& X_ij>0 ,,, i,j in1,...,n ,,, wedge sumlimits_i=1^n X_ij=1 ,wedge, sumlimits_j=1^n X_ij=1 ,,
endalign
$$
In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.



The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $mathcalDP_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$):
$$
P=X in mathcalDP_n : X X^T=mathbfI
$$
The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^1/2$.



So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).



Now comes my question: As $n rightarrow infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?



Thanks!









share|cite|improve this question












share|cite|improve this question




share|cite|improve this question








edited Jul 22 at 15:50









Rodrigo de Azevedo

12.6k41751




12.6k41751









asked Jul 21 at 22:49









Tolga Birdal

20128




20128











  • By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
    – joriki
    Jul 21 at 23:24










  • I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
    – Tolga Birdal
    Jul 21 at 23:29










  • "largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
    – joriki
    Jul 21 at 23:39










  • Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
    – Tolga Birdal
    Jul 21 at 23:44

















  • By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
    – joriki
    Jul 21 at 23:24










  • I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
    – Tolga Birdal
    Jul 21 at 23:29










  • "largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
    – joriki
    Jul 21 at 23:39










  • Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
    – Tolga Birdal
    Jul 21 at 23:44
















By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24




By "error of approximation", do you mean the Frobenius norm of the difference between a doubly schoastic matrix and the nearest point on the sphere? And if so, isn't the maximum distance attained at the matrix with constant entries $frac1n$?
– joriki
Jul 21 at 23:24












I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29




I meant the largest (or mean) error one attains by walking on the sphere rather than on the polytope (assuming that Birkhoff is the correct relaxation of course).
– Tolga Birdal
Jul 21 at 23:29












"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39




"largest (or mean)" in the sense that you'd be interested in knowing either? What do you want to average over? The sphere and/or the polytope? And what do you mean by "the correct relaxation" -- correct in what sense?
– joriki
Jul 21 at 23:39












Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44





Let's say we walk from $P_0$ to $P_1$. On Birkhoff the path would be a straight line, an edge rather, if they are neighbors. On $O(n)$, that would be a geodesic of the Stiefel, and takes longer to travel. Therefore, I am assuming that Birkhoff is a better relaxation. However, how better? As dimension grows, do we gain or do we lose? Any measure would be fine in that case, as soon as it can give a hint on the quality of approximation. Does that make sense? You might also be right that $1/n$ becomes the farthest point of $mathcalDP_n$ to $mathcalS_n$. Maybe it is possible to show this?
– Tolga Birdal
Jul 21 at 23:44











1 Answer
1






active

oldest

votes

















up vote
1
down vote













Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.



Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.






share|cite|improve this answer





















  • Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
    – Tolga Birdal
    Aug 2 at 8:18










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "69"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2858947%2fn-sphere-enclosing-the-birkhoff-polytope%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote













Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.



Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.






share|cite|improve this answer





















  • Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
    – Tolga Birdal
    Aug 2 at 8:18














up vote
1
down vote













Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.



Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.






share|cite|improve this answer





















  • Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
    – Tolga Birdal
    Aug 2 at 8:18












up vote
1
down vote










up vote
1
down vote









Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.



Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.






share|cite|improve this answer













Let $J$ be the $n times n$ matrix each of whose entries are $1$. Then the ray $mathbbR_geq 0 J$ exits $DP_n$ at $tfrac1n J$ but exits the sphere at $tfrac1sqrtn J$.



Let $lambda$ be the linear function $lambda(X) = sum_i,j X_ij$. Then $lambda$ is optimized on $DP_n$ at $tfrac1n J$ and on the sphere at $tfrac1sqrtn J$, so the ratio between the true optimum and the spherical approximation is $sqrtn$, which goes to $infty$ as $n$ grows.







share|cite|improve this answer













share|cite|improve this answer



share|cite|improve this answer











answered Aug 1 at 14:05









David E Speyer

44.2k4120197




44.2k4120197











  • Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
    – Tolga Birdal
    Aug 2 at 8:18
















  • Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
    – Tolga Birdal
    Aug 2 at 8:18















Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18




Good thought, I have two questions though: 1. How can one prove that we are not shooting the ray at a location close to one of the vertices, but to the center of the face? And would that ray be orthogonal to the face? Maybe this was encoded in the second part but wasn't clear to me.
– Tolga Birdal
Aug 2 at 8:18












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2858947%2fn-sphere-enclosing-the-birkhoff-polytope%23new-answer', 'question_page');

);

Post as a guest













































































Comments

Popular posts from this blog

What is the equation of a 3D cone with generalised tilt?

Color the edges and diagonals of a regular polygon

Relationship between determinant of matrix and determinant of adjoint?