How to use Kullback-Leibler Divergence if probability distributions have different support?
I have two discrete random variables $X$ and $Y$ whose distributions have different supports. Assume $X$ and $Y$ can take on the same number of values; say $X$ takes values in $\{10, 13, 15, 17, 19\}$ and $Y$ takes values in $\{12, 14, 16, 18, 20\}$.
I would like to use the Kullback-Leibler divergence, but it requires that $Q$ dominate $P$. Is it possible to modify the support of each random variable so that they have the same support?
If not, are there measures of statistical distance that do not require $X$ and $Y$ to have the same support?
One solution I have come up with is to build kernel density estimates with a Gaussian kernel from the datasets collected on $X$ and $Y$. The resulting densities $\hat{f}(x)$ and $\hat{g}(y)$ have support on $(-\infty, \infty)$, and with a suitable bandwidth they are multimodal with modes centered at the points of the original supports. It remains to be seen how wise or foolish this idea is.
Note: since the KL divergence between finite Gaussian mixtures has no closed-form expression, I used Monte Carlo methods to estimate it.
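For concreteness, here is a minimal sketch of what I mean (my own illustration, with made-up samples and an arbitrary bandwidth, not the exact code I ran): smooth each sample with a Gaussian KDE, then estimate $D_{\mathrm{KL}}(\hat{f}\,\|\,\hat{g})$ by Monte Carlo, sampling from $\hat{f}$.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy samples drawn from the two supports in the question (placeholder data).
x_sample = rng.choice([10, 13, 15, 17, 19], size=1000)
y_sample = rng.choice([12, 14, 16, 18, 20], size=1000)

# Gaussian KDEs; the bandwidth here is an arbitrary choice.
f_hat = gaussian_kde(x_sample, bw_method=0.2)
g_hat = gaussian_kde(y_sample, bw_method=0.2)

# Monte Carlo estimate of KL(f_hat || g_hat) = E_{z ~ f_hat}[log f_hat(z) - log g_hat(z)].
z = f_hat.resample(50_000).ravel()
kl_estimate = np.mean(np.log(f_hat(z)) - np.log(g_hat(z)))
print(f"Estimated KL(f_hat || g_hat) ≈ {kl_estimate:.3f}")
```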
probability probability-distributions information-theory
asked Jul 22 at 10:57, edited Jul 26 at 20:36
– Ollie (697)
A question about your setting: are you aware that $X$ and $Y$ have distributions with differing supports, or is it the case that they have the same support but your samples are disjointly supported? In the latter case KL intrinsically makes sense for the underlying distributions, but not for the empirical distributions (and there are smoothing procedures that may be appropriate). However, in the former case, any procedure like the ones you describe boils down to throwing information away for the sake of some metric, and this makes absolutely no sense.
– stochasticboy321
Jul 26 at 20:22
$X$ and $Y$ most certainly have distributions with different supports. The overarching background question in my mind was whether there is a way to measure the distance between distributions when their supports are certainly different. KL divergence was an intermediate calculation I wanted because the square root of the Jensen-Shannon divergence built on KL is indeed a metric. I have cleaned up the question to make the intent clearer.
– Ollie
Jul 26 at 20:39
There are various metrics that can be used to measure distance between probability distributions. For example, the Wasserstein metric is indeed a metric and should work for distributions with different supports. I recommend this article as an overview of other options: arxiv.org/pdf/math/0209021.pdf
– Nik Pronko
Jul 26 at 20:52
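To illustrate the suggestion above (my own sketch, with made-up probability masses for $X$ and $Y$): in one dimension the Wasserstein distance can be computed directly with `scipy.stats.wasserstein_distance`, and disjoint supports pose no problem.

```python
from scipy.stats import wasserstein_distance

x_support = [10, 13, 15, 17, 19]
y_support = [12, 14, 16, 18, 20]
# Hypothetical probability masses; replace with the actual pmfs of X and Y.
x_probs = [0.2, 0.2, 0.2, 0.2, 0.2]
y_probs = [0.1, 0.2, 0.3, 0.2, 0.2]

# 1-D Wasserstein (earth mover's) distance between the two discrete distributions.
d = wasserstein_distance(x_support, y_support, u_weights=x_probs, v_weights=y_probs)
print(f"W1(X, Y) = {d:.3f}")
```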
In this case, I don't think any $f$-divergence (e.g. KL, TV, ...) is appropriate, since these are intimately related to various error metrics in hypothesis testing between these distributions, which is trivial in this case. Distances like Wasserstein are good if your random variables take values in a metric space. For categorical data, I'm a little unsure. One could always take some reference measure and do a Jensen-Shannon type trick to get a non-trivial divergence, but I'm unaware of any neat use case for these. Ultimately, of course, the divergence you pick should be linked to how you want to apply it.
– stochasticboy321
Jul 26 at 22:55
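A sketch of that Jensen-Shannon trick (my own illustration, with hypothetical pmfs): put both pmfs on the union of the two supports, padding with zeros, and compute the Jensen-Shannon distance. Note that when the supports are disjoint it always returns its maximum value, which is exactly the triviality mentioned above.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical pmfs on the two disjoint supports.
x_pmf = {10: 0.2, 13: 0.2, 15: 0.2, 17: 0.2, 19: 0.2}
y_pmf = {12: 0.1, 14: 0.2, 16: 0.3, 18: 0.2, 20: 0.2}

# Embed both pmfs into the union of the supports, filling missing points with 0.
union = sorted(set(x_pmf) | set(y_pmf))
p = np.array([x_pmf.get(v, 0.0) for v in union])
q = np.array([y_pmf.get(v, 0.0) for v in union])

# jensenshannon returns sqrt(JSD), which is a metric; with base=2 this is 1.0
# for any pair of distributions with disjoint supports.
js_distance = jensenshannon(p, q, base=2)
print(js_distance)
```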