The number of combinations of Fi to F255 that satisfies the following inequality.
Clash Royale CLAN TAG#URR8PPP
up vote
0
down vote
favorite
Consider the following equation:
$H(x) = - sum_i=0^255 P_i log_2 (P_i) $
When $ P_0 = P_1 = ... =P_255 = frac 1256$
$H(x) = - sum_i=0^255 frac 1256 log_2 (frac 1256) = 8 $
Where $p_i$ is the probability of the character $x_i$ to occur in the file.
The file Size is 256 characters.
We will refer to the frequency of the character $x_i$ as $f_i$. ($p_i$ = $f_i / 256)$.
The sum of the frequencies of the characters will always be $256$.
My question is:
Given that $H(x)$ is $ge 7$, How to calculate the number of possible combinations of characters frequencies (from $f_0$ to $f_255$) that makes the inequality true?
summation combinations information-theory
 |Â
show 10 more comments
up vote
0
down vote
favorite
Consider the following equation:
$H(x) = - sum_i=0^255 P_i log_2 (P_i) $
When $ P_0 = P_1 = ... =P_255 = frac 1256$
$H(x) = - sum_i=0^255 frac 1256 log_2 (frac 1256) = 8 $
Where $p_i$ is the probability of the character $x_i$ to occur in the file.
The file Size is 256 characters.
We will refer to the frequency of the character $x_i$ as $f_i$. ($p_i$ = $f_i / 256)$.
The sum of the frequencies of the characters will always be $256$.
My question is:
Given that $H(x)$ is $ge 7$, How to calculate the number of possible combinations of characters frequencies (from $f_0$ to $f_255$) that makes the inequality true?
summation combinations information-theory
2
Better if you can write out the equation. The help menu has instructions for formatting mathematics here.
– Gerry Myerson
Jul 21 at 4:56
This seems difficult. Could you give us some context how you encountered this problem? Do you have reason to believe that there's a solution in closed form?
– joriki
Jul 21 at 5:01
What inequality are you talking about?
– Ali
Jul 21 at 9:29
1
@GerryMyerson Done it.
– Hesham H.
Jul 21 at 14:41
@joriki I encountered this problem while I was thinking about the entropy of data and information theory, it is not in a textbook, I was wondering how many files of size 256 bytes and has a data entropy of 7 or higher, or in general how to calculate the number of files with an entropy ranging between 2 values. Can you explain what do mean by "a solution in closed form"?
– Hesham H.
Jul 21 at 14:56
 |Â
show 10 more comments
up vote
0
down vote
favorite
up vote
0
down vote
favorite
Consider the following equation:
$H(x) = - sum_i=0^255 P_i log_2 (P_i) $
When $ P_0 = P_1 = ... =P_255 = frac 1256$
$H(x) = - sum_i=0^255 frac 1256 log_2 (frac 1256) = 8 $
Where $p_i$ is the probability of the character $x_i$ to occur in the file.
The file Size is 256 characters.
We will refer to the frequency of the character $x_i$ as $f_i$. ($p_i$ = $f_i / 256)$.
The sum of the frequencies of the characters will always be $256$.
My question is:
Given that $H(x)$ is $ge 7$, How to calculate the number of possible combinations of characters frequencies (from $f_0$ to $f_255$) that makes the inequality true?
summation combinations information-theory
Consider the following equation:
$H(x) = - sum_i=0^255 P_i log_2 (P_i) $
When $ P_0 = P_1 = ... =P_255 = frac 1256$
$H(x) = - sum_i=0^255 frac 1256 log_2 (frac 1256) = 8 $
Where $p_i$ is the probability of the character $x_i$ to occur in the file.
The file Size is 256 characters.
We will refer to the frequency of the character $x_i$ as $f_i$. ($p_i$ = $f_i / 256)$.
The sum of the frequencies of the characters will always be $256$.
My question is:
Given that $H(x)$ is $ge 7$, How to calculate the number of possible combinations of characters frequencies (from $f_0$ to $f_255$) that makes the inequality true?
summation combinations information-theory
edited Jul 21 at 22:41
asked Jul 21 at 1:14
Hesham H.
33
33
2
Better if you can write out the equation. The help menu has instructions for formatting mathematics here.
– Gerry Myerson
Jul 21 at 4:56
This seems difficult. Could you give us some context how you encountered this problem? Do you have reason to believe that there's a solution in closed form?
– joriki
Jul 21 at 5:01
What inequality are you talking about?
– Ali
Jul 21 at 9:29
1
@GerryMyerson Done it.
– Hesham H.
Jul 21 at 14:41
@joriki I encountered this problem while I was thinking about the entropy of data and information theory, it is not in a textbook, I was wondering how many files of size 256 bytes and has a data entropy of 7 or higher, or in general how to calculate the number of files with an entropy ranging between 2 values. Can you explain what do mean by "a solution in closed form"?
– Hesham H.
Jul 21 at 14:56
 |Â
show 10 more comments
2
Better if you can write out the equation. The help menu has instructions for formatting mathematics here.
– Gerry Myerson
Jul 21 at 4:56
This seems difficult. Could you give us some context how you encountered this problem? Do you have reason to believe that there's a solution in closed form?
– joriki
Jul 21 at 5:01
What inequality are you talking about?
– Ali
Jul 21 at 9:29
1
@GerryMyerson Done it.
– Hesham H.
Jul 21 at 14:41
@joriki I encountered this problem while I was thinking about the entropy of data and information theory, it is not in a textbook, I was wondering how many files of size 256 bytes and has a data entropy of 7 or higher, or in general how to calculate the number of files with an entropy ranging between 2 values. Can you explain what do mean by "a solution in closed form"?
– Hesham H.
Jul 21 at 14:56
2
2
Better if you can write out the equation. The help menu has instructions for formatting mathematics here.
– Gerry Myerson
Jul 21 at 4:56
Better if you can write out the equation. The help menu has instructions for formatting mathematics here.
– Gerry Myerson
Jul 21 at 4:56
This seems difficult. Could you give us some context how you encountered this problem? Do you have reason to believe that there's a solution in closed form?
– joriki
Jul 21 at 5:01
This seems difficult. Could you give us some context how you encountered this problem? Do you have reason to believe that there's a solution in closed form?
– joriki
Jul 21 at 5:01
What inequality are you talking about?
– Ali
Jul 21 at 9:29
What inequality are you talking about?
– Ali
Jul 21 at 9:29
1
1
@GerryMyerson Done it.
– Hesham H.
Jul 21 at 14:41
@GerryMyerson Done it.
– Hesham H.
Jul 21 at 14:41
@joriki I encountered this problem while I was thinking about the entropy of data and information theory, it is not in a textbook, I was wondering how many files of size 256 bytes and has a data entropy of 7 or higher, or in general how to calculate the number of files with an entropy ranging between 2 values. Can you explain what do mean by "a solution in closed form"?
– Hesham H.
Jul 21 at 14:56
@joriki I encountered this problem while I was thinking about the entropy of data and information theory, it is not in a textbook, I was wondering how many files of size 256 bytes and has a data entropy of 7 or higher, or in general how to calculate the number of files with an entropy ranging between 2 values. Can you explain what do mean by "a solution in closed form"?
– Hesham H.
Jul 21 at 14:56
 |Â
show 10 more comments
1 Answer
1
active
oldest
votes
up vote
0
down vote
accepted
You want
begineqnarray*
H&=&-sum_ip_ilog_2p_i\
&=&
-sum_ifracf_i256log_2fracf_i256\
&=&-frac1256sum_ileft(f_ilog_2f_i-f_ilog_2256right)\
&=&8-frac1256sum_if_ilog_2f_i\
&ge&7
endeqnarray*
and thus
$$
sum_if_ilog_2f_ile256;,
$$
or (with $0^0=1$)
$$
prod_if_i^f_ile2^256;,
$$
which is a good form of the condition for computer purposes, as it allows us to do everything with integer computations without rounding errors.
This Java code performs the computation.
The number of admissible frequency assignments with the characters treated as indistinguishable is $83497348$.
The number of admissible frequency assignments with the characters treated as distinguishable is
24377117336293373068077194760018608890164075844989457788709896790123975565346885123594568979784410896832589991747589003175962663714550060904409876659.
The number of admissible files is
32294749681296695275407031178128991719155449234352428397819516042408274669320014204971984810980113839713404991542286248511856534533725621770720443204257942757841685726869445418911988343694467610597716668644890003072566046313323943032907344235616323318993946853455308766090267464412982787428679760943574604833815718162734374857821826072728560557821111484218984087078219375690986647992988689743586158787866836166182263828747140499381872352135857955196142673766290736723267626340276801993482047192554123014017270771149942354934825395952764480929741003537616825060105246264524800000000000000000000000000000000000000000000.
There are $256^256$ files in total, so approximately $99.93%$ of the files fulfil the condition.
The results should be reasonably reliable, as the total number of files is correctly reproduced when the entropy condition is omitted, and this depends on all three counts being performed correctly.
Here's a $log_2$-plot of the proportion of files with $n$ characters chosen from $n$ symbols for which
$$
prod_if_i^f_igt2^n;,
$$
that is, the proportion of inadmissible files. The line is a linear fit to the last $40$ data points, with slope approximately $-0.0317$. I don't know how this constant could be calculated. Note the interesting odd/even structure. (Click on the image to see the full high-resolution version.)
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
accepted
You want
begineqnarray*
H&=&-sum_ip_ilog_2p_i\
&=&
-sum_ifracf_i256log_2fracf_i256\
&=&-frac1256sum_ileft(f_ilog_2f_i-f_ilog_2256right)\
&=&8-frac1256sum_if_ilog_2f_i\
&ge&7
endeqnarray*
and thus
$$
sum_if_ilog_2f_ile256;,
$$
or (with $0^0=1$)
$$
prod_if_i^f_ile2^256;,
$$
which is a good form of the condition for computer purposes, as it allows us to do everything with integer computations without rounding errors.
This Java code performs the computation.
The number of admissible frequency assignments with the characters treated as indistinguishable is $83497348$.
The number of admissible frequency assignments with the characters treated as distinguishable is
24377117336293373068077194760018608890164075844989457788709896790123975565346885123594568979784410896832589991747589003175962663714550060904409876659.
The number of admissible files is
32294749681296695275407031178128991719155449234352428397819516042408274669320014204971984810980113839713404991542286248511856534533725621770720443204257942757841685726869445418911988343694467610597716668644890003072566046313323943032907344235616323318993946853455308766090267464412982787428679760943574604833815718162734374857821826072728560557821111484218984087078219375690986647992988689743586158787866836166182263828747140499381872352135857955196142673766290736723267626340276801993482047192554123014017270771149942354934825395952764480929741003537616825060105246264524800000000000000000000000000000000000000000000.
There are $256^256$ files in total, so approximately $99.93%$ of the files fulfil the condition.
The results should be reasonably reliable, as the total number of files is correctly reproduced when the entropy condition is omitted, and this depends on all three counts being performed correctly.
Here's a $log_2$-plot of the proportion of files with $n$ characters chosen from $n$ symbols for which
$$
prod_if_i^f_igt2^n;,
$$
that is, the proportion of inadmissible files. The line is a linear fit to the last $40$ data points, with slope approximately $-0.0317$. I don't know how this constant could be calculated. Note the interesting odd/even structure. (Click on the image to see the full high-resolution version.)
add a comment |Â
up vote
0
down vote
accepted
You want
begineqnarray*
H&=&-sum_ip_ilog_2p_i\
&=&
-sum_ifracf_i256log_2fracf_i256\
&=&-frac1256sum_ileft(f_ilog_2f_i-f_ilog_2256right)\
&=&8-frac1256sum_if_ilog_2f_i\
&ge&7
endeqnarray*
and thus
$$
sum_if_ilog_2f_ile256;,
$$
or (with $0^0=1$)
$$
prod_if_i^f_ile2^256;,
$$
which is a good form of the condition for computer purposes, as it allows us to do everything with integer computations without rounding errors.
This Java code performs the computation.
The number of admissible frequency assignments with the characters treated as indistinguishable is $83497348$.
The number of admissible frequency assignments with the characters treated as distinguishable is
24377117336293373068077194760018608890164075844989457788709896790123975565346885123594568979784410896832589991747589003175962663714550060904409876659.
The number of admissible files is
32294749681296695275407031178128991719155449234352428397819516042408274669320014204971984810980113839713404991542286248511856534533725621770720443204257942757841685726869445418911988343694467610597716668644890003072566046313323943032907344235616323318993946853455308766090267464412982787428679760943574604833815718162734374857821826072728560557821111484218984087078219375690986647992988689743586158787866836166182263828747140499381872352135857955196142673766290736723267626340276801993482047192554123014017270771149942354934825395952764480929741003537616825060105246264524800000000000000000000000000000000000000000000.
There are $256^256$ files in total, so approximately $99.93%$ of the files fulfil the condition.
The results should be reasonably reliable, as the total number of files is correctly reproduced when the entropy condition is omitted, and this depends on all three counts being performed correctly.
Here's a $log_2$-plot of the proportion of files with $n$ characters chosen from $n$ symbols for which
$$
prod_if_i^f_igt2^n;,
$$
that is, the proportion of inadmissible files. The line is a linear fit to the last $40$ data points, with slope approximately $-0.0317$. I don't know how this constant could be calculated. Note the interesting odd/even structure. (Click on the image to see the full high-resolution version.)
add a comment |Â
up vote
0
down vote
accepted
up vote
0
down vote
accepted
You want
begineqnarray*
H&=&-sum_ip_ilog_2p_i\
&=&
-sum_ifracf_i256log_2fracf_i256\
&=&-frac1256sum_ileft(f_ilog_2f_i-f_ilog_2256right)\
&=&8-frac1256sum_if_ilog_2f_i\
&ge&7
endeqnarray*
and thus
$$
sum_if_ilog_2f_ile256;,
$$
or (with $0^0=1$)
$$
prod_if_i^f_ile2^256;,
$$
which is a good form of the condition for computer purposes, as it allows us to do everything with integer computations without rounding errors.
This Java code performs the computation.
The number of admissible frequency assignments with the characters treated as indistinguishable is $83497348$.
The number of admissible frequency assignments with the characters treated as distinguishable is
24377117336293373068077194760018608890164075844989457788709896790123975565346885123594568979784410896832589991747589003175962663714550060904409876659.
The number of admissible files is
32294749681296695275407031178128991719155449234352428397819516042408274669320014204971984810980113839713404991542286248511856534533725621770720443204257942757841685726869445418911988343694467610597716668644890003072566046313323943032907344235616323318993946853455308766090267464412982787428679760943574604833815718162734374857821826072728560557821111484218984087078219375690986647992988689743586158787866836166182263828747140499381872352135857955196142673766290736723267626340276801993482047192554123014017270771149942354934825395952764480929741003537616825060105246264524800000000000000000000000000000000000000000000.
There are $256^256$ files in total, so approximately $99.93%$ of the files fulfil the condition.
The results should be reasonably reliable, as the total number of files is correctly reproduced when the entropy condition is omitted, and this depends on all three counts being performed correctly.
Here's a $log_2$-plot of the proportion of files with $n$ characters chosen from $n$ symbols for which
$$
prod_if_i^f_igt2^n;,
$$
that is, the proportion of inadmissible files. The line is a linear fit to the last $40$ data points, with slope approximately $-0.0317$. I don't know how this constant could be calculated. Note the interesting odd/even structure. (Click on the image to see the full high-resolution version.)
You want
begineqnarray*
H&=&-sum_ip_ilog_2p_i\
&=&
-sum_ifracf_i256log_2fracf_i256\
&=&-frac1256sum_ileft(f_ilog_2f_i-f_ilog_2256right)\
&=&8-frac1256sum_if_ilog_2f_i\
&ge&7
endeqnarray*
and thus
$$
sum_if_ilog_2f_ile256;,
$$
or (with $0^0=1$)
$$
prod_if_i^f_ile2^256;,
$$
which is a good form of the condition for computer purposes, as it allows us to do everything with integer computations without rounding errors.
This Java code performs the computation.
The number of admissible frequency assignments with the characters treated as indistinguishable is $83497348$.
The number of admissible frequency assignments with the characters treated as distinguishable is
24377117336293373068077194760018608890164075844989457788709896790123975565346885123594568979784410896832589991747589003175962663714550060904409876659.
The number of admissible files is
32294749681296695275407031178128991719155449234352428397819516042408274669320014204971984810980113839713404991542286248511856534533725621770720443204257942757841685726869445418911988343694467610597716668644890003072566046313323943032907344235616323318993946853455308766090267464412982787428679760943574604833815718162734374857821826072728560557821111484218984087078219375690986647992988689743586158787866836166182263828747140499381872352135857955196142673766290736723267626340276801993482047192554123014017270771149942354934825395952764480929741003537616825060105246264524800000000000000000000000000000000000000000000.
There are $256^256$ files in total, so approximately $99.93%$ of the files fulfil the condition.
The results should be reasonably reliable, as the total number of files is correctly reproduced when the entropy condition is omitted, and this depends on all three counts being performed correctly.
Here's a $log_2$-plot of the proportion of files with $n$ characters chosen from $n$ symbols for which
$$
prod_if_i^f_igt2^n;,
$$
that is, the proportion of inadmissible files. The line is a linear fit to the last $40$ data points, with slope approximately $-0.0317$. I don't know how this constant could be calculated. Note the interesting odd/even structure. (Click on the image to see the full high-resolution version.)
answered Jul 22 at 8:06
joriki
164k10180328
164k10180328
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2858152%2fthe-number-of-combinations-of-fi-to-f255-that-satisfies-the-following-inequality%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
2
Better if you can write out the equation. The help menu has instructions for formatting mathematics here.
– Gerry Myerson
Jul 21 at 4:56
This seems difficult. Could you give us some context how you encountered this problem? Do you have reason to believe that there's a solution in closed form?
– joriki
Jul 21 at 5:01
What inequality are you talking about?
– Ali
Jul 21 at 9:29
1
@GerryMyerson Done it.
– Hesham H.
Jul 21 at 14:41
@joriki I encountered this problem while I was thinking about the entropy of data and information theory, it is not in a textbook, I was wondering how many files of size 256 bytes and has a data entropy of 7 or higher, or in general how to calculate the number of files with an entropy ranging between 2 values. Can you explain what do mean by "a solution in closed form"?
– Hesham H.
Jul 21 at 14:56