Deriving Probability Theory from Information Theory

In the paper "A Philosophical Treatise of Universal Induction", Section 3 (on probability) describes three different interpretations of probability theory: frequentist, objectivist, and subjectivist.



I am a proponent of concise definitions of fields. The most elegant definition I know of for Mathematics is: The study of patterns. I believe the best such definition for probability is: The study of patterns with incomplete information.



It's all about working with what we know to arrive at the best estimates we can about systems for which we have incomplete information.



In that light, it seems like one could derive probability theory from information theory. I know that historically, information theory is based on probability theory, but I wonder if one could reformulate information theory independent of probability theory, then derive probability theory from information theory.



Has that been attempted? Does it make any sense? Would it be of any value? I'm sorry if this is a malformed question; I'm not a mathematician by training. It simply seems like information is the more fundamental concept and that probability should have been based upon it, but for historical reasons the opposite happened.



EDIT: (For clarification)



Instead of defining information in terms of probability:



I(m) := -log(Pr(M=m)) // log base 2 for information in bits



It seems like one could define probability in terms of information (or the lack thereof):



Pr(M=m) := b^(-I(m)) // b=2 for information given in bits
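
To make the intended relationship concrete, here is a minimal numerical sketch (the function names surprisal and prob_from_info are my own, purely illustrative): it checks that the two definitions above are inverses of each other.

    import math

    def surprisal(p, b=2):
        # I(m) := -log_b(Pr(M=m)); measured in bits when b = 2
        return -math.log(p, b)

    def prob_from_info(i, b=2):
        # Pr(M=m) := b^(-I(m)); recovers the probability from the information content
        return b ** (-i)

    p = 0.125                                   # probability of some message m
    i = surprisal(p)                            # 3.0 bits
    assert abs(prob_from_info(i) - p) < 1e-12   # the two definitions are mutual inverses

Of course, this only shows that the two formulas are algebraically equivalent; the real question is which quantity is taken as primitive.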



If we take the definition of mathematics from above (the study of patterns), we could build a formal language for describing patterns from a set of symbols (say 0 and 1). Then we could use that language (as we have) to build pattern-based models that approximate the mechanics of the world (or of imaginary worlds). Information could be defined as the length of those descriptions, in units related to the alphabet (bits if the alphabet is 0,1). Then, when dealing with systems for which we have incomplete information, we could derive probability theory to help us infer properties of the system, make optimal decisions with the information at hand, or incorporate new information into our model.
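
As a toy illustration of assigning probabilities from description lengths, here is a sketch under a made-up "pattern language" in which a bit string is described either literally or as a repeated block. This is only a stand-in for a genuine universal description language in the Kolmogorov/Solomonoff sense, and the resulting numbers are not normalized into a proper probability distribution (a real construction handles that with prefix-free codes).

    # Hypothetical two-rule description language for bit strings:
    #   literal:    1 marker bit + the string itself
    #   repetition: 1 marker bit + repeat count + the repeated block
    def description_length(s):
        best = 1 + len(s)                       # cost of the literal description
        for k in range(1, len(s) // 2 + 1):     # try every block size
            if len(s) % k == 0 and s == s[:k] * (len(s) // k):
                reps = len(s) // k
                best = min(best, 1 + reps.bit_length() + k)
        return best

    def prob(s):
        # Pr(s) := 2^(-I(s)), where I(s) is the shortest description length in bits
        return 2 ** -description_length(s)

    print(prob("0101010101010101"))  # strongly patterned: short description, higher probability
    print(prob("0110100110010110"))  # no simple repetition: literal description, lower probability

The point of the sketch is only that "more pattern" means "shorter description" and hence, via Pr = 2^(-I), "higher probability", which is the direction of definition proposed above.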



That's the basic idea. The benefit, as with other interpretations, is to gain insight from viewing problems from a different perspective. We currently use information indirectly, for instance when assigning priors; making the connection explicit might allow us to develop a more rigorous approach in practice.







  • See David MacKay's book: amazon.com/Information-Theory-Inference-Learning-Algorithms/dp/…
    – David G. Stork, Jul 24 at 23:54

  • How do you propose to derive probability theory from information theory? And why? The concepts of entropy and mutual information require probability. It is like deriving arithmetic from calculus.
    – Michael, Jul 25 at 0:49

  • Well, again, I'm no mathematician, but the idea is that the probability we estimate for some event is contingent upon the information we have about the relevant factors. Maybe instead of defining information in terms of probability, I(m) = -log(p(m)), it makes more sense to define probability in terms of information, p(m) = 2^(-I(m)). Paul Revere may have estimated the probability of the British arriving by sea as 50%, but the British knew it was 100%; had Revere factored in the relevant information, he could have arrived at a better estimate.
    – arachnivore, Jul 25 at 2:39

  • It seems like one could start with something like a formal language constructed from a finite set of symbols (say 0 and 1), define information in terms of numbers of symbols, then bring in Turing, Kolmogorov, etc. and work up to a probability-free theory of information. Then define probability in terms of information instead of vice versa. It's all about dealing with systems for which we have incomplete information. It makes more sense to me than a frequentist, subjectivist, or objectivist interpretation.
    – arachnivore, Jul 25 at 2:47

  • Thanks for the suggestion, David G. Stork!
    – arachnivore, Jul 25 at 2:52













