How to use the Kullback-Leibler divergence when probability distributions have different support?

I have two discrete random variables $X$ and $Y$ whose distributions have different support. Assume $X$ and $Y$ can both take on the same number of values; let's say $X$ takes values in $\{10, 13, 15, 17, 19\}$ and $Y$ takes values in $\{12, 14, 16, 18, 20\}$.

I would like to use the Kullback-Leibler divergence $D_{\mathrm{KL}}(P \,\|\, Q)$, but it requires that $Q$ dominate $P$, i.e. that $P$ be absolutely continuous with respect to $Q$. Is it possible to modify the support of each random variable so that they have the same support?

If not, are there any measures of statistical distance that do not require $X$ and $Y$ to have the same support?

One workaround I have tried is to build kernel density estimates with a Gaussian kernel from the datasets collected on $X$ and $Y$. The estimated densities $\hat{f}(x)$ and $\hat{g}(y)$ then have support on $(-\infty, \infty)$, and with a suitable bandwidth they are multimodal with modes centered at the support points of the original random variables. It remains to be seen how wise or foolish this idea is.

Note: Since the KL divergence between finite Gaussian mixtures has no closed-form expression, I used Monte Carlo methods to estimate it.
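For concreteness, here is a minimal sketch of that estimator. The samples and the bandwidth are made up for illustration; `scipy.stats.gaussian_kde` provides the kernel density estimates, and the KL divergence is estimated by sampling from $\hat{f}$ and averaging $\log(\hat{f}/\hat{g})$:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Made-up samples standing in for the datasets collected on X and Y.
x_data = rng.choice([10, 13, 15, 17, 19], size=500).astype(float)
y_data = rng.choice([12, 14, 16, 18, 20], size=500).astype(float)

# Gaussian-kernel KDEs: the smoothed densities f_hat and g_hat are
# positive on all of R, so the dominance requirement is satisfied.
f_hat = gaussian_kde(x_data, bw_method=0.3)
g_hat = gaussian_kde(y_data, bw_method=0.3)

# Monte Carlo estimate of KL(f_hat || g_hat): draw from f_hat and
# average log(f_hat / g_hat) over the draws.
n = 100_000
draws = f_hat.resample(n)[0]
kl_hat = np.mean(np.log(f_hat(draws)) - np.log(g_hat(draws)))
print(f"Estimated KL(f_hat || g_hat): {kl_hat:.4f}")
```

The estimate is quite sensitive to the bandwidth, since the bandwidth controls how much the two smoothed densities overlap.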







  • A question about your setting: are you aware that $X$ and $Y$ have distributions with differing support, or is it the case that they have the same support but you have samples that are disjointly supported? In the latter case, KL intrinsically makes sense for the underlying distributions, but not for the empirical distributions (and there are smoothing procedures that may be appropriate). However, in the former case, any procedure like the ones you describe boils down to throwing information away for the sake of some metric, and this makes absolutely no sense.
    – stochasticboy321
    Jul 26 at 20:22

  • $X$ and $Y$ most certainly have distributions with different support. The overarching background question in my mind was whether there is a way to measure the distance between distributions whose supports are certainly different. The KL divergence was an intermediate calculation I wanted because the square root of the Jensen-Shannon divergence, which is built on KL, is indeed a metric. I have cleaned up the question to make the intent clearer.
    – Ollie
    Jul 26 at 20:39

  • There are various metrics that can be used to measure the distance between probability distributions. For example, the Wasserstein metric is indeed a metric and should work for distributions with different supports. I recommend this article as an overview of other options: arxiv.org/pdf/math/0209021.pdf
    – Nik Pronko
    Jul 26 at 20:52

  • In this case, I don't think any $f$-divergence (e.g. KL, total variation, ...) is appropriate, since these are intimately related to various error metrics in hypothesis testing between these distributions, which is trivial in this case. Distances like the Wasserstein metric are good if your random variables take values in a metric space. For categorical data, I'm a little unsure. One could always take some reference measure and do a Jensen-Shannon-type trick to get a non-trivial divergence, but I'm unaware of any neat use case for these. Ultimately, of course, the divergence you pick should be linked to how you want to apply it.
    – stochasticboy321
    Jul 26 at 22:55
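To make the suggestions in the comments above concrete, here is a minimal sketch computing both the Wasserstein-1 distance and the Jensen-Shannon distance directly on the two pmfs, using `scipy.stats.wasserstein_distance` and `scipy.spatial.distance.jensenshannon`; the probability vectors below are made up for illustration:

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

# Made-up pmfs for X and Y on their (disjoint) supports.
x_support = np.array([10, 13, 15, 17, 19])
y_support = np.array([12, 14, 16, 18, 20])
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # P(X = x)
q = np.array([0.3, 0.1, 0.2, 0.1, 0.3])  # P(Y = y)

# Wasserstein-1 only needs the support points and their weights,
# so disjoint supports are not a problem.
w1 = wasserstein_distance(x_support, y_support, u_weights=p, v_weights=q)

# Jensen-Shannon distance: embed both pmfs on the union of the supports.
# The mixture (P + Q)/2 dominates both, so the divergence is always finite,
# and its square root (what jensenshannon returns) is a metric.
union = np.union1d(x_support, y_support)
p_map, q_map = dict(zip(x_support, p)), dict(zip(y_support, q))
p_u = np.array([p_map.get(v, 0.0) for v in union])
q_u = np.array([q_map.get(v, 0.0) for v in union])
js = jensenshannon(p_u, q_u, base=2)

print(f"Wasserstein-1: {w1:.4f}, Jensen-Shannon distance: {js:.4f}")
```

Note that with completely disjoint supports the Jensen-Shannon distance saturates at its maximum (1 with base 2), which is exactly the triviality pointed out in the last comment, whereas the Wasserstein distance still reflects how far apart the supports are.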















asked Jul 22 at 10:57 by Ollie, edited Jul 26 at 20:36










