Deriving Probability Theory from Information Theory
Section 3 of the paper "A Philosophical Treatise of Universal Induction", on probability, describes three different interpretations of probability theory: frequentist, objectivist, and subjectivist.
I am a proponent of concise definitions of fields. The most elegant definition I know of for mathematics is: the study of patterns. I believe the best such definition for probability is: the study of patterns given incomplete information.
It's all about working with what we know to arrive at the best possible estimate of systems about which we have incomplete information.
In that light, it seems like one could derive probability theory from information theory. I know that historically, information theory is based on probability theory, but I wonder if one could reformulate information theory independent of probability theory, then derive probability theory from information theory.
Has that been attempted? Does it make any sense? Would it be of any value? I'm sorry if this is a malformed question. I'm not a mathematician by training. It simply seems like information is a more fundamental concept that probability should have been based upon but for historical reasons, the opposite happened.
EDIT: (For clarification)
Instead of defining information in terms of probability:
I(m) := -log(Pr(M=m)) // log base 2 for information in bits
It seems like one could instead define probability in terms of information (or the lack thereof):
Pr(M=m) := b^(-I(m)) // b=2 for information given in bits
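The two definitions above are inverses of one another, so either can serve as the primitive. A minimal sketch of the round trip (the function names are mine, chosen for illustration):

```python
import math

def info_from_prob(p: float, b: float = 2) -> float:
    """Self-information I(m) = -log_b(Pr(M=m))."""
    return -math.log(p, b)

def prob_from_info(i: float, b: float = 2) -> float:
    """Probability recovered from information: Pr(M=m) = b**(-I(m))."""
    return b ** (-i)

p = 0.25
i = info_from_prob(p)                      # 2 bits: a 1-in-4 outcome
assert abs(prob_from_info(i) - p) < 1e-12  # round trip recovers p
```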
If we take the definition of mathematics from above: The study of patterns
We could build a formal language for describing patterns from a set of symbols (say 0 and 1). Then we could use that language (as we in fact do) to build pattern-based models that approximate the mechanics of the world (or of imaginary worlds). Information could be defined as the length of those descriptions, in units related to the alphabet (bits, if the alphabet is 0,1). Then, when dealing with systems for which we have incomplete information, we could derive probability theory to help us infer properties of the system, make optimal decisions with the information at hand, or incorporate new information into our model.
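As a crude, concrete stand-in for "information as description length": the true shortest-description length (Kolmogorov complexity) is uncomputable, but the length of a compressed encoding gives an upper bound on it, and that bound could be pushed through Pr(M=m) = b^(-I(m)). This is only an illustrative sketch, using zlib as the description language:

```python
import zlib

def description_length_bits(data: bytes) -> int:
    """Bits in a zlib-compressed encoding of `data`: an upper bound on
    the shortest description in this particular language."""
    return 8 * len(zlib.compress(data, 9))

patterned = b"01" * 500                                   # highly regular string
irregular = bytes([(i * 97) % 256 for i in range(1000)])  # far less regular

# The more regular string gets a shorter description, hence a larger
# implied probability 2 ** (-description_length_bits(...)).
assert description_length_bits(patterned) < description_length_bits(irregular)
```

The choice of compressor (and of formal language generally) changes the lengths only up to an additive constant, which is the usual invariance argument from Kolmogorov complexity.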
That's the basic idea. The benefit, as with other interpretations, is the insight gained from viewing problems from a different perspective. We currently use information indirectly, for instance when assigning priors. This view might allow us to develop a more rigorous approach in practice.
probability probability-theory definition information-theory kolmogorov-complexity
See David MacKay's book: amazon.com/Information-Theory-Inference-Learning-Algorithms/dp/…
– David G. Stork
Jul 24 at 23:54
How do you propose to derive probability theory from information theory? And why? The concepts of entropy and mutual information require probability. It is like deriving arithmetic from calculus.
– Michael
Jul 25 at 0:49
Well, again, I'm no mathematician, but the idea is that the probability we estimate for some event is contingent on the information we have about the relevant factors. Maybe instead of defining information in terms of probability, I(m) = -log(p(m)), it makes more sense to define probability in terms of information: p(m) = 2^(-I(m)). Paul Revere may have estimated the probability of the British arriving by sea at 50%, but the British knew it was 100%; had Revere factored in the relevant information, he could have arrived at a better estimate.
– arachnivore
Jul 25 at 2:39
It seems like one could start with something like a formal language constructed from a finite set of symbols (say 0,1) and define information in terms of numbers of symbols, then bring in Turing and Kolmogorov etc. and work up to a probability-free theory of information. Then define probability in terms of information instead of vice-versa. It's all about dealing with systems for which we have incomplete information. It makes more sense to me than a frequentist, subjective, or objective interpretation.
– arachnivore
Jul 25 at 2:47
Thanks for the suggestion, David G. Stork!
– arachnivore
Jul 25 at 2:52
edited Jul 25 at 19:24
asked Jul 24 at 23:17
arachnivore