How to use the Kullback-Leibler divergence when probability distributions have different support?

I have two discrete random variables $X$ and $Y$ whose distributions have different support. Assume $X$ and $Y$ can both take on the same number of values; let's say $X$ takes values in $\{10, 13, 15, 17, 19\}$ and $Y$ takes values in $\{12, 14, 16, 18, 20\}$.

I would like to use the Kullback-Leibler divergence $D_{\mathrm{KL}}(P \,\|\, Q)$, but it requires that $Q$ dominate $P$, i.e. that $P$ be absolutely continuous with respect to $Q$. Is it possible to modify the support of each random variable so that they have the same support?

If not, are there any measures of statistical distance that do not require $X$ and $Y$ to have the same support?

One workaround I have tried is to build kernel density estimates with a Gaussian kernel from the datasets collected on $X$ and $Y$. The estimated densities $\hat{f}(x)$ and $\hat{g}(y)$ then have support on $(-\infty, \infty)$, and with a suitable bandwidth they are multimodal with modes centered at the support points of the original random variables. It remains to be seen how wise or foolish this idea is.

Note: Since the KL divergence between finite Gaussian mixtures has no closed-form expression, I used Monte Carlo methods to estimate it.
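For concreteness, here is a minimal sketch of that estimator. The samples and the bandwidth are made up for illustration; `scipy.stats.gaussian_kde` provides the kernel density estimates, and the KL divergence is estimated by sampling from $\hat{f}$ and averaging $\log(\hat{f}/\hat{g})$:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Made-up samples standing in for the datasets collected on X and Y.
x_data = rng.choice([10, 13, 15, 17, 19], size=500).astype(float)
y_data = rng.choice([12, 14, 16, 18, 20], size=500).astype(float)

# Gaussian-kernel KDEs: the smoothed densities f_hat and g_hat are
# positive on all of R, so the dominance requirement is satisfied.
f_hat = gaussian_kde(x_data, bw_method=0.3)
g_hat = gaussian_kde(y_data, bw_method=0.3)

# Monte Carlo estimate of KL(f_hat || g_hat): draw from f_hat and
# average log(f_hat / g_hat) over the draws.
n = 100_000
draws = f_hat.resample(n)[0]
kl_hat = np.mean(np.log(f_hat(draws)) - np.log(g_hat(draws)))
print(f"Estimated KL(f_hat || g_hat): {kl_hat:.4f}")
```

The estimate is quite sensitive to the bandwidth, since the bandwidth controls how much the two smoothed densities overlap.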







  • A question about your setting: are you aware that $X$ and $Y$ have distributions with differing support, or is it the case that they have the same support but you have samples that are disjointly supported? In the latter case, KL intrinsically makes sense for the underlying distributions, but not for the empirical distributions (and there are smoothing procedures that may be appropriate). However, in the former case, any procedure like the ones you describe boils down to throwing information away for the sake of some metric, and this makes absolutely no sense.
    – stochasticboy321
    Jul 26 at 20:22

  • $X$ and $Y$ most certainly have distributions with different support. The overarching background question in my mind was whether there is a way to measure the distance between distributions whose supports are certainly different. The KL divergence was an intermediate calculation I wanted because the square root of the Jensen-Shannon divergence, which is built on KL, is indeed a metric. I have cleaned up the question to make the intent clearer.
    – Ollie
    Jul 26 at 20:39

  • There are various metrics that can be used to measure the distance between probability distributions. For example, the Wasserstein metric is indeed a metric and should work for distributions with different supports. I recommend this article as an overview of other options: arxiv.org/pdf/math/0209021.pdf
    – Nik Pronko
    Jul 26 at 20:52

  • In this case, I don't think any $f$-divergence (e.g. KL, total variation, ...) is appropriate, since these are intimately related to various error metrics in hypothesis testing between these distributions, which is trivial in this case. Distances like the Wasserstein metric are good if your random variables take values in a metric space. For categorical data, I'm a little unsure. One could always take some reference measure and do a Jensen-Shannon-type trick to get a non-trivial divergence, but I'm unaware of any neat use case for these. Ultimately, of course, the divergence you pick should be linked to how you want to apply it.
    – stochasticboy321
    Jul 26 at 22:55
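To make the suggestions in the comments above concrete, here is a minimal sketch computing both the Wasserstein-1 distance and the Jensen-Shannon distance directly on the two pmfs, using `scipy.stats.wasserstein_distance` and `scipy.spatial.distance.jensenshannon`; the probability vectors below are made up for illustration:

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

# Made-up pmfs for X and Y on their (disjoint) supports.
x_support = np.array([10, 13, 15, 17, 19])
y_support = np.array([12, 14, 16, 18, 20])
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # P(X = x)
q = np.array([0.3, 0.1, 0.2, 0.1, 0.3])  # P(Y = y)

# Wasserstein-1 only needs the support points and their weights,
# so disjoint supports are not a problem.
w1 = wasserstein_distance(x_support, y_support, u_weights=p, v_weights=q)

# Jensen-Shannon distance: embed both pmfs on the union of the supports.
# The mixture (P + Q)/2 dominates both, so the divergence is always finite,
# and its square root (what jensenshannon returns) is a metric.
union = np.union1d(x_support, y_support)
p_map, q_map = dict(zip(x_support, p)), dict(zip(y_support, q))
p_u = np.array([p_map.get(v, 0.0) for v in union])
q_u = np.array([q_map.get(v, 0.0) for v in union])
js = jensenshannon(p_u, q_u, base=2)

print(f"Wasserstein-1: {w1:.4f}, Jensen-Shannon distance: {js:.4f}")
```

Note that with completely disjoint supports the Jensen-Shannon distance saturates at its maximum (1 with base 2), which is exactly the triviality pointed out in the last comment, whereas the Wasserstein distance still reflects how far apart the supports are.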















asked Jul 22 at 10:57 by Ollie, edited Jul 26 at 20:36










