Deriving Probability Theory from Information Theory

In the paper "A Philosophical Treatise of Universal Induction", Section 3 (on probability) describes three different interpretations of probability theory: frequentist, objectivist, and subjectivist.



I am a proponent of concise definitions of fields. The most elegant definition I know of for Mathematics is: The study of patterns. I believe the best such definition for probability is: The study of patterns with incomplete information.



It's all about working with what we know to arrive at the best estimates we can about systems for which we have incomplete information.



In that light, it seems like one could derive probability theory from information theory. I know that historically, information theory is based on probability theory, but I wonder if one could reformulate information theory independent of probability theory, then derive probability theory from information theory.



Has that been attempted? Does it make any sense? Would it be of any value? I'm sorry if this is a malformed question; I'm not a mathematician by training. It simply seems like information is the more fundamental concept and that probability should have been based upon it, but for historical reasons the opposite happened.



EDIT: (For clarification)



Instead of defining information in terms of probability:



I(m) := -log(Pr(M=m)) // log base 2 for information in bits



It seems like one could define probability in terms of information (or the lack thereof):



Pr(M=m) := b^(-I(m)) // b=2 for information given in bits
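
To make the intended relationship concrete, here is a minimal numerical sketch (the function names surprisal and prob_from_info are my own, purely illustrative): it checks that the two definitions above are inverses of each other.

    import math

    def surprisal(p, b=2):
        # I(m) := -log_b(Pr(M=m)); measured in bits when b = 2
        return -math.log(p, b)

    def prob_from_info(i, b=2):
        # Pr(M=m) := b^(-I(m)); recovers the probability from the information content
        return b ** (-i)

    p = 0.125                                   # probability of some message m
    i = surprisal(p)                            # 3.0 bits
    assert abs(prob_from_info(i) - p) < 1e-12   # the two definitions are mutual inverses

Of course, this only shows that the two formulas are algebraically equivalent; the real question is which quantity is taken as primitive.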



If we take the definition of mathematics from above (the study of patterns), we could build a formal language for describing patterns from a set of symbols (say 0 and 1). Then we could use that language (as we have) to build pattern-based models that approximate the mechanics of the world (or of imaginary worlds). Information could be defined as the length of those descriptions, in units related to the alphabet (bits if the alphabet is 0,1). Then, when dealing with systems for which we have incomplete information, we could derive probability theory to help us infer properties of the system, make optimal decisions with the information at hand, or incorporate new information into our model.
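
As a toy illustration of assigning probabilities from description lengths, here is a sketch under a made-up "pattern language" in which a bit string is described either literally or as a repeated block. This is only a stand-in for a genuine universal description language in the Kolmogorov/Solomonoff sense, and the resulting numbers are not normalized into a proper probability distribution (a real construction handles that with prefix-free codes).

    # Hypothetical two-rule description language for bit strings:
    #   literal:    1 marker bit + the string itself
    #   repetition: 1 marker bit + repeat count + the repeated block
    def description_length(s):
        best = 1 + len(s)                       # cost of the literal description
        for k in range(1, len(s) // 2 + 1):     # try every block size
            if len(s) % k == 0 and s == s[:k] * (len(s) // k):
                reps = len(s) // k
                best = min(best, 1 + reps.bit_length() + k)
        return best

    def prob(s):
        # Pr(s) := 2^(-I(s)), where I(s) is the shortest description length in bits
        return 2 ** -description_length(s)

    print(prob("0101010101010101"))  # strongly patterned: short description, higher probability
    print(prob("0110100110010110"))  # no simple repetition: literal description, lower probability

The point of the sketch is only that "more pattern" means "shorter description" and hence, via Pr = 2^(-I), "higher probability", which is the direction of definition proposed above.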



That's the basic idea. The benefit, as with other interpretations, is to gain insight from viewing problems from a different perspective. We currently use information indirectly, for instance when assigning priors; making the connection explicit might allow us to develop a more rigorous approach in practice.







  • See David MacKay's book: amazon.com/Information-Theory-Inference-Learning-Algorithms/dp/…
    – David G. Stork, Jul 24 at 23:54

  • How do you propose to derive probability theory from information theory? And why? The concepts of entropy and mutual information require probability. It is like deriving arithmetic from calculus.
    – Michael, Jul 25 at 0:49

  • Well, again, I'm no mathematician, but the idea is that the probability we estimate for some event is contingent upon the information we have about the relevant factors. Maybe instead of defining information in terms of probability, I(m) = -log(p(m)), it makes more sense to define probability in terms of information, p(m) = 2^(-I(m)). Paul Revere may have estimated the probability of the British arriving by sea as 50%, but the British knew it was 100%; had Revere factored in the relevant information, he could have arrived at a better estimate.
    – arachnivore, Jul 25 at 2:39

  • It seems like one could start with something like a formal language constructed from a finite set of symbols (say 0 and 1), define information in terms of numbers of symbols, then bring in Turing, Kolmogorov, etc. and work up to a probability-free theory of information. Then define probability in terms of information instead of vice versa. It's all about dealing with systems for which we have incomplete information. It makes more sense to me than a frequentist, subjectivist, or objectivist interpretation.
    – arachnivore, Jul 25 at 2:47

  • Thanks for the suggestion, David G. Stork!
    – arachnivore, Jul 25 at 2:52













