Amount of information in one experiment
Let $X$ be a random variable with two possible outcomes, occurring with probabilities $p$ and $1-p$. The experiment described, once carried out, answers a true/false question.
If I'm not mistaken about the definitions, one says that this experiment, once carried out, yields one bit of information.
So one bit of information quantifies the knowledge of the answer to a true/false question.
Now let $Z$ be a discrete random variable with values $(z_i)_{i\in I}$ and probabilities $(p_i)_{i\in I}$. Here I'm assuming that $I$ is finite.
If one performs the experiment, how do we know the amount of information one will have gained?
My initial idea was to "decompose $Z$ into true/false questions". So for fixed $i$ we ask "is the observed value $z_i$?" This is a true/false question, and the answer yields one bit of information.
This would suggest that observing $Z$ yields $|I|$ bits of information. But this is clearly wrong: the initial random variable $X$ has $I=\{0,1\}$, hence $|I|=2$, while we said it should yield only one bit of information.
Then again, I might have the wrong definition of information, or of a bit.
So my question is: quantifying information in bits, if $Z$ is a random variable as described above, how many bits of information do we have after observing $Z$?
probability statistics random-variables information-theory
asked Jul 16 at 19:46
user1620696
" one defines that this experiment after carried out yields one bit of information." No, the experiment yields $h(p)$ bits of information (one bit if $p=frac12$). Furthermore, that's the average amount of information.
– leonbloy
Jul 17 at 13:42
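To make the comment's point concrete, here is a minimal Python sketch (not part of the original question or comment; the function names are mine) that computes the binary entropy $h(p)$ and, more generally, the average information $H(Z) = -\sum_i p_i \log_2 p_i$ of a finite discrete variable, which is the quantity that answers the question on average.

```python
import math

def binary_entropy(p):
    """Average information, in bits, of a true/false experiment with P(true) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information on average
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy(probs):
    """Average information, in bits, of a discrete variable with the given probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(binary_entropy(0.5))         # 1.0 bit: the fair-coin case
print(binary_entropy(0.9))         # about 0.469 bits
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits for a three-outcome Z
```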
1 Answer
The instantaneous information content (measured in terms of Shannon entropy) of your binary experiment (yielding one exemplar of $X$), in bits, is $-\log_2 p$ if you get true and $-\log_2(1-p)$ if you get false. This equals one bit only if $p = 0.5$. The equivalent instantaneous information content (also measured in terms of Shannon entropy) of your multi-outcome experiment (yielding one exemplar of $Z$), in bits, is $-\log_2 p_i$, which depends upon which particular outcome you observe. Needless to say, rarer outcomes carry more information content.
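As a hedged illustration of the per-outcome quantity described above, here is a short Python sketch of the surprisal $-\log_2 p$; the helper name is my own, not the answerer's.

```python
import math

def surprisal_bits(p_observed):
    """Instantaneous information, in bits, of observing an outcome of probability p_observed."""
    return -math.log2(p_observed)

# Binary experiment with p = 0.9: seeing "true" is unsurprising, seeing "false" is not.
print(surprisal_bits(0.9))   # about 0.152 bits
print(surprisal_bits(0.1))   # about 3.322 bits

# Multi-outcome Z: the rarer the observed outcome, the more bits it carries.
for p_i in (0.5, 0.25, 0.05):
    print(p_i, surprisal_bits(p_i))
```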
Your intuition of breaking things down into binary questions is a (VERY) good one, but doing it on the Shannon information measurement scale can be quite cumbersome. A much more intuitive way is to use the discriminating-information measurement scale, where the instantaneous information content (in bits) from a binary experiment is $\log_2\left(\frac{p}{1-p}\right)$ if you get true and $\log_2\left(\frac{1-p}{p}\right) = -\log_2\left(\frac{p}{1-p}\right)$ if you get false, although we typically use natural logs, leading to units of "nats". The discriminating-information scale can be thought of as an alternative scale for measuring information (like the Celsius and Fahrenheit temperature scales, except that the relationship between entropy and discriminating information is not linear). On the discriminating-information scale, the amount of information the exemplar of $Z$ provides is precisely the same as the amount it would provide for the binary experiment where the result is true, namely $\log_2\left(\frac{p_i}{1-p_i}\right)$. This makes it very easy to cast more complicated experiments into equivalent sets of binary experiments, and suggests (I don't know how to prove this yet) that information measurement for any problem of arbitrary complexity can always be cast in terms of the discriminating information content of all the underlying binary experiments.
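Below is a small Python sketch of the log-odds bookkeeping the answer calls discriminating information; it only computes the per-outcome quantity $\log_2\big(p/(1-p)\big)$, and the function names are illustrative rather than standard.

```python
import math

def log_odds_bits(p):
    """Log-odds of an event of probability p, in bits; positive means 'more likely than not'."""
    return math.log2(p / (1 - p))

def log_odds_nats(p):
    """The same quantity in natural-log units (nats), as the answer mentions."""
    return math.log(p / (1 - p))

# For the binary experiment: 'true' contributes +log_odds, 'false' contributes the negative of it.
print(log_odds_bits(0.5))   # 0.0: even odds discriminate nothing
print(log_odds_bits(0.8))   # +2 bits in favour of 'true'
print(log_odds_bits(0.2))   # -2 bits, i.e. evidence for 'false'

# For one observed outcome z_i of Z with probability p_i:
for p_i in (0.5, 0.25, 0.05):
    print(p_i, log_odds_bits(p_i))
```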
answered Jul 16 at 20:41


John Polcari