ukmuiik

Question

enter image description here

Consider the illustrated binomial (not binary) tree with $n$ steps (drawn for $n=5$, but consider $n$ variable). Start a random walk at the left-hand node, and at each step you have probability $p$ of visiting the node to your lower-right, and probability $1-p$ of visiting the node to your upper-right.

What's the probability, call it $P_n$ (or whatever else you like:), of completing an $n$-step walk without ever visiting any node above the dotted-line baseline, i.e., never visiting one of the labelled "prohibited nodes". You're allowed to visit nodes "touching" the baseline (possible on even-numbered steps only), but never nodes above it.

Â Â Edit #3 (new binomial coeff identity???)

-------------------------------------------------------

@BrianTung answered the question below, and below that I added another "answer", just numerically confirming Brian's result, based on his $_nD_k$ "modified binomial coefficients" (see Brian's answer). And I subsequently modified that program, just to see if my original effort at an answer, based on my $N^m_n$ coefficients (see Edit#2 below), might also work. Kind of surprised me to find that my answer works, too.

And what just occurred to me is that since both answers involve exactly the same (somewhat modified) cumulative binomial distribution sum of the form $sum_k mboxcoef_n,kp^k(1-p)^n-k$, therefore both our coefficeints must be identical. And (after juggling my non-standard $m,n$-notation in $N^m_n$ to Brian's standard $n,k$'s) that leads to the following necessary-but-goofy-looking identity,
$$ fracn-2k+1n-k+1binomnk =
binomn-1k - sum_i=1^k-1 i binomn-i-2k-i-1, hspace10pt kleqfrac n2$$
where Brian's $_nD_k$ is on the lhs and mine's on the rhs. Just to be sure, I numerically checked it, and it's correct. And maybe it algebraically simplifies to some common binomial coefficient formula, but I'm not seeing how to do that. And if it doesn't simplify, then you probably "saw it here first":).

Â Â Motivation

----------------------

Firstly, much thanks to @saulspatz and @BrianTung for their elaborate effort replying to this question. So I thought I should add a paragraph explaining why I'd asked it (beyond recreational). That relates to the comment link https://en.wikipedia.org/wiki/Binomial_options_pricing_model I posted below Â (Â chiding :) @jorwiki for his terminology usage formalityÂ ). Anyway...

The purpose is pricing (getting the current value of) cashflows from consumer mortgages along a binomial tree of interest rates (determined from treasury notes and bonds, and from a given volatility parameter). But mortgages are like options due to prepayments, e.g., if you sell your house then you typically pay off the entire mortgage and all cashflows abruptly stop. Statistics for that kind of prepayment are pretty much actuarially determined from observed behaviors. But if interest rates go down a lot, people may choose to refinance their mortgages to take advantage of lower monthly payments (after accounting for refinance costs, etc). That's called "rational exercise", although prepayment models usually model plenty of not-so-rational behavior, too.

So the "baseline" here represents rational exercise -- the rate below which the mortgage is refinanced and cashflows stop. The actual model's on a computer, and this idealized closed-form solution is therefore irrelevant. But for testing and comparison purposes, it's a useful tool to have. Of course, however, I seem to have underestimated the number-theory-type stuff required for that solution :)

Â Â EDIT#1 (some preceding work)

---------------------------------------------

As per @saulspatz's comment request below, I'm showing some work I'd
already tried. But it's just to demonstrate that I tried.
Don't actually try reading it too carefully.
It's not well-written, and in the end it didn't lead much of anywhere.

For $p=.5$ I tried arguing heads/tails "paths", somewhat along the following lines.


 tails<----- oN00 ----->heads
 / 
 +-------------------------------+ / +--------------------------------+
 | Nht denotes number of "legal"/ oN01 o An "illegal" path on the right |
 | paths from N00 to the node / / / side of the tree has more |
 | representing h heads and / / / heads than tails before the |
 | t tails. "Legal" paths / oN02 oN11 o n-th trial. The lower |
 | must stay within the / / / / left of the tree has |
 | labelled portion / / / / too many tails to |
 | of the tree. / oN03 oN21 o o catch up by n. |
 +-----------------------+ / / / / +------------------------+
 / / / / 
 oN04 oN13 oN22 o o
 / / / / / 
 / / / / / 
 oN05 oN14 oN23 o o o
 / / / / / / 
 / / / / / / 
 oN06 oN15 oN24 oN33 o o o

For any $N^0_t$, note that $N^0_1=N^0_2=cdots=1$
since there's only one possible path along the ``left edge''
of the tree, i.e., toss all tails).

For $N^1_6$, there's exactly one path from $N^0_6$ to $N^1_6$,
one path from $N^0_5$ to $N^1_6$ ($N^0_5rightarrow
N^1_5rightarrow N^1_6$) in addition to
the preceding path through $N^0_6$, one path from $N^0_4$
to $N^1_6$ ($N^0_4rightarrow N^1_4rightarrow
N^1_5rightarrow N^1_6$) in addition to
the preceding paths, $ldots$, and, finally, one
additional path from $N^0_1$ to $N^1_6$
(there are no additional
paths from $N^0_0$ to $N^1_6$).

By similar reasoning we obtain the following sequence of
iterative formulas:

$N^1_6$ = $N^0_6$ + $N^0_5$ + $N^0_4$ +
$N^0_3$ + $N^0_2$ + $N^0_1$

$N^2_6$ = $N^1_6$ + $N^1_5$ + $N^1_4$ +
$N^1_3$ + $N^1_2$

$N^3_6$ = $N^2_6$ + $N^2_5$ + $N^2_4$ + $N^2_3$

$N^4_6$ = $N^3_6$ + $N^3_5$ + $N^3_4$

$N^5_6$ = $N^4_6$ + $N^4_5$

$N^6_6$ = $N^6_6$

For "interior" nodes, the same reasoning applies,
e.g.:

$N^3_5$ = $N^2_5$ + $N^2_4$ + $N^2_3$
etc.

For the general case we therefore conclude:

$N^m_n$ = $sum_k=m^n N^m-1_k$

Â Â Â Â = $sum_k=1^n N^m-1_k$
$-$ $sum_k=1^m-1 N^m-1_k$

with $N^0_n=1$ (and, trivially, $N^1_n=n$).

Â Â EDIT#2 (some recurrence relations)

-------------------------------------------------

Following along with @BrianTung's work below, I'm posting some recurrence relations that seemed to be useful while I was trying to find a closed-form solution for my $N^m_n$. I'll continue using my notation rather than Brian's pretty-much-corresponding $_nD_k$, just so I can transcribe my notes with hopefully minimal mistakes...

Firstly, a generalization of the usual $sum_1^ni=fracn(n+1)2$,

define $H^m_n$ as follows,

$ beginarraycclcl
H^1_n & = & & & mbox$1$ for all $n$ \
H^2_n & = & sum_i=1^n H^1_i & = & n \
H^3_n & = & sum_i=1^n H^2_i & = & fracn(n+1)2! mboxas usual\
H^4_n & = & sum_i=1^n H^3_i & stackrel?= &
underbracean^3+bn^2+cn_
mboxsolve for $a,b,c$ $ldots$ \
& & & = & frac16n^3 + frac12n^2 + frac13n \
& & & = & fracn(n+1)(n+2)3! \
endarray$

Now, by examination we infer the general expression

$beginarrayccl
H^m_n & = & frac1(m-1)! prod_k=1^m-1 (n+k-1) \
& = & frac(n+m-2)!(m-1)!(n-1)! \
& = & binomn+m-2m-1 = binomn+m-2n-1 \
endarray$

So now a closed-form solution for $N^m_n$ can tentatively be developed from our earlier (from Edit#1)

$N^m_n$ = $sum_k=m^n N^m-1_k$

Â Â Â Â = $sum_k=1^n N^m-1_k$
- $sum_k=1^m-1 N^m-1_k$

by using the above $H^m_n$ formula to iteratively evaluate...

$beginarraycclclclcl
N^0_n &=& 1 \
&equiv& H^1_n \
N^1_n &=& sum_k=1^n N^0_k - underbracesum_k=1^0 N^0_k_=0 \
&=& sum_k=1^n H^1_n \
&=& H^2_n \
N^2_n &=& sum_k=1^n N^1_k - sum_k=1^1 N^1_k \
&=& sum_k=1^n H^2_k - sum_k=1^1 H^2_k \
&=& H^3_n - underbraceH^3_1_=1=1H^1_n \
&=& H^3_n - H^1_n \
N^3_n &=& sum_k=1^n(H^3_k - H^1_k) - sum_k=1^2(H^3_k - H^1_k) \
&=& (H^4_n -H^2_n) - underbrace(H^4_2 - H^2_2)_=2=2H^1_n \
N^4_n &=& (H^5_n - H^3_n - 2H^2_n)
- underbrace(H^5_3 - H^3_3 - 2H^2_3)_=3=3H^1_n \
N^5_n &=& (H^6_n - H^4_n - 2H^3_n - 3H^2_n)
- underbrace(H^6_4-H^4_4-2H^3_4-3H^2_4)_=4=4H^1_n \
vdots \
N^m_n &=& H^m+1_n - sum_k=1^m-1 kH^m-k_n \
& & mboxiff for any integer $m$ we have
m stackrel?= H^m+2_m - sum_k=1^m-1 kH^m+1-k_m\
& & mboxwhich seems to work out for the cases I tested
endarray$

@saulspatz Thanks. That looks almost identical to my question, except for the "begin and end at the same level" part in Baracchini's paper. We can end at-or-below the same level, so the path-counting is different. I'm not sure how different, i.e., maybe it's just a simple generalization. I wasn't aware of this stuff, and will take a closer look. Thanks again. — Jul 23 at 14:06
Are you familiar with generating functions? (By the way, this is not a tree; a tree is acyclical.) — Jul 23 at 14:08
Yes, at first glance I thought you were just asking about Dyck paths, but then I realized your question was different, and deleted the comment. Note that we know the probability of touching the baseline after $2n$ steps. — Jul 23 at 14:09
I'd say it's a DAG: Directed Acyclic Graph. Not a tree, since nodes can be reached in multiple ways, but this is still acyclic, since once you leave a node you can never return to it. — Jul 23 at 15:12

score 2 · Accepted Answer · 2018-07-23 15:41:17Z

ETA2: I rolled back, because the OP's notation is a little different from mine. OP uses $_mN_k$ for the number of ways for $k$ upward moves to be chosen out of $k+m$ total moves; I use $_nD_k$ for the number of ways for $k$ upward moves to be chosen out of $n$ total moves.

Partial answer. It occurs to me that one can proceed as one normally does with the binomial distribution, but one must use different coefficients. Whereas the normal binomial coefficients $_nC_k$ have a recurrence of

$$
_nC_k = _n-1C_k-1 + _n-1C_k
$$

our modified coefficients $_nD_k$ have a recurrence of

$$
_nD_k = begincases
_n-1D_k-1 + _n-1D_k & k leq n/2 \
0 & textotherwise
endcases
$$

Once this is done, it seems to me that we can compute

$$
P(textpermissible path of length $n$)
= sum_k=0^lfloor n/2 rfloor _nD_k (1-p)^k p^n-k
$$

A little pencil-and-paper noodling and trawling on OEIS seems to show that

$$
_nD_k = fracn-2k+1n-k+1 binomnk
$$

Note that this puts the Catalan numbers right on the critical line. I still have to figure out $_nD_k$ from first principles, though.

Thanks a lot, Brian. Following along with your work, I added an "Edit#2" to the original question, displaying some of the additional recurrence relations I'd cobbled together while trying to construct a closed-form solution for (in my notation) $N^m_n$. Not sure how correct and/or useful they'll ultimately be. — Jul 23 at 17:04
@JohnForkosh It seems to me that Brian Tung has given the closed-form solution in this answer. All one has to do is to substitute the formula into the recurrence and verify that it is satisfied. I've done this for the case $kle(n-1)/2$ and I'm working on the edge cases. — Jul 23 at 17:09
@saulspatz Thanks, yes, I think you and Brian are right. His approach (modified binomial coefficients) seems a lot more straightforward than what I'd been trying to do. So Brian's almost certainly exactly right, and I was just trying to see if what I'd tried to do resulted in the same/equivalent answer (though I'm not betting any money on it:). — Jul 23 at 17:32
(at)Brian and @saulspatz -- I added an Edit#3 near the top of the original question, illustrating what looks to me like maybe a new binomial coefficient identity. But maybe you better-qualified guys can simplify it to some standard, well-known identity. It's based on the fact that I numerically checked my original-but-tentative $N^m_n$ solution and was surprised to find it actually works. But that must mean that (after juggling my $m,n$ indexes to your $n,k$'s) that $_nD_k=N^m_n$, which the edit expands into binomial coefficients. And I numerically checked it, confirming it's true. — Jul 26 at 12:13

score 2 · Answer 2 · 2018-07-24 15:36:06Z

This is a follow-up to Brian Tung's answer, just confirming his solution. It really belongs in a comment, but it doesn't fit.

To verify that the formula for $_nD_k$ is correct, we have to check that is satisfies appropriate initial conditions, and check that it satisfies the recurrence. The formula gives $$_1D_1=0, _1D_0=1,$$ which is correct.

Now suppose that $kle(n-1)/2$. Then also $k-1le (n-1)/2$ and $kle n/2$ so in terms of the recurrence, $D$ is not 0 and is given by the formula. We have$$
_nD_k-_n-1D_k-1-_n-1D_k=\
beginalign
&=fracn-2k+1n-k+1nchoose k-fracn-2k+2n-k+1n-1choose k-1-fracn-2kn-kn-1choose k\
&=left(fracn-2k+1n-k+1-fracn-2k+2n-k+1right)n-1choose k-1+
left(fracn-2k+1n-k+1-fracn-2kn-kright)n-1choose k\
&=frac-1n-k+1n-1choose k-1+frack(n-k)(n-k+1)n-1choose k\&=0,\
endalign
$$
since $$n-1choose k=fracn-kkn-1choose k-1$$

Now suppose $k>(n-1)/2.$ The only case we have to check is when $k=n/2$ since if $k$ is any larger we have $_nD_k=0.$ Now we have
$$_2kD_k-_2k-1D_k-1-_2k-1D_k=\
beginalign
&=_2kD_k-_2k-1D_k-1-0\
&=frac1k+12kchoose k-frac2k+12k-1choose k-1\
&=frac1k+1left(frac(2k)!k!k!-22k-1choose k-1 right)\
&=frac1k+1left(frac2(2k-1)!(k-1)!k!-22k-1choose k-1 right)\&=0
endalign
$$

NOTE Can someone tell me how to improve the formatting, please? On my Mac, at least, in something like $$_nD_k-_n-1D_k-1-_n-1D_k$$ the $n-1$ pre-subscripts are too far way from the $Dtext'$s and too close to the minus signs.

EDIT All I had to do was to put the terms in braces to get $$_nD_k-_n-1D_k-1-_n-1D_k$$

Thanks again, Saul. I tried "checking" both answers, but checking a second automatically unchecked the first. I usually write "_kD" for a pre-subscript, having that subscript explicitly subscripting an empty "" expression. But I'm not aware of any canonical markup syntax for obtaining that notation. (By the way, I added a "Motivation" paragraph just below the question, explaining why I'd asked it in the first place.) — Jul 23 at 18:39
You can only accept one answer, and it should certainly be Brian's. :-) — Jul 23 at 18:42
The before the subscript is essentially attached to the previous atom, preventing the subscript from glomming onto it. This has to do with the way LaTeX and its derivatives kern operators when they're unary vs binary, I think? At any rate, it does the right thing. — Jul 23 at 20:36
@Brian So it's parsed as a subscript on the minus sign; makes sense, though not what I want, of course. I wonder if there's a way around it; maybe I'll ask on the LaTex forum. — Jul 23 at 20:41
@saulspatz thanks for the markup explanation. And I doubly-confirmed Brian's solution with some numerical tests reproduced below. Pretty much no doubt that his entire development is correct. — Jul 24 at 15:44

score 1 · Answer 3 · 2018-07-24 15:35:21Z

As per @saulspatz's answer, "This is [also] a follow-up to Brian Tung's answer, just confirming his solution." In this case, the confirmation's numerical. I coded a recursive solution to the problem, as well as Brian's closed-form solution, printing both results. And they both agree. For very small $n$, the answer's obvious, and we're both unmistakably correct. For larger $n$, correctness is hopefully inferred from the continued agreement.

So first, my code's below. And that's followed by the examples cited above, and a little further discussion. Here's the code...

/* ---
 * standard headers
 * ------------------- */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* ---
 * globals (to reduce recursive stack)
 * -------------------------------------- */
static int n = 16; /* total# periods/steps in binomial tree */
static double pup = 0.5; /* probability of "up"-step (pdown=1-pup) */
static int maxup = 999999; /* maximum allowable kup-kdown */
static double npaths = 0.0; /* #paths satisfying maxup constraint */

/****************************************************************************
 * Function: pmoddyck ( k, kup )
 * Purpose: See https://math.stackexchange.com/questions/2860403/
 * ...recursively enumerate "modified dyck paths" through
 * n-period binomial tree, and calculate probability
 * of successfully completing the entire "binomial walk"
 * --------------------------------------------------------------------------
 * Arguments: k (I) int containing number periods/steps
 * already completed in binomial tree
 * kup (I) int containing number of up-steps
 * --------------------------------------------------------------------------
 * Returns: (double) probability of walking n-period tree without
 * ever crossing past "critical baseline"
 * --------------------------------------------------------------------------
 * Notes: o call pmoddyck(0,0) to start the recursion,
 * and return the overall probability (with npaths
 * returning the #paths satisfying the "maxup" constraint)
 ***************************************************************************/
double pmoddyck ( int k, int kup ) 
 /* note: kdown=k-kup, so kup-kdown=2*kup-k */
 if ( 2*kup - k > maxup ) return ( 0.0 ); /* abort failed paths */
 if ( k >= n ) /*completed entire path successfully*/
 npaths += 1.0; /*#successful paths=2^n if maxup=999*/
 return ( pow(pup,(double)kup)*pow(1.-pup,(double)(k-kup)) ); 
 return ( pmoddyck(k+1,kup) + pmoddyck(k+1,kup+1) ); /* next step: down+up */
 /* --- end-of-function pmoddyck() --- */

/****************************************************************************
 * Function: Dcoef ( n, k )
 * Purpose: Brian Tung's "modified binomial coefficient",
 * see https://math.stackexchange.com/questions/2860403/
 * --------------------------------------------------------------------------
 * Arguments: n (I) n items...
 * k (I) ...taken k at a time
 * Returns: (double) as above, or 0 for any argument error
 * --------------------------------------------------------------------------
 * Notes: o
 ***************************************************************************/
/* --- entry point --- */
double Dcoef ( int n, int k ) 
 double bcoef(), dcoef = 0.0;
 if ( k <= n/2 )
 dcoef = bcoef(n,k) * ((double)(n-2*k+1))/((double)(n-k+1));
 return ( dcoef ); 


/****************************************************************************
 * Function: bcoef ( n, k )
 * Purpose: binomial coefficient = n!/(k!(n-k)!)
 * --------------------------------------------------------------------------
 * Arguments: n (I) n items...
 * k (I) ...taken k at a time
 * Returns: (double) as above, or 0 for any argument error
 * --------------------------------------------------------------------------
 * Notes: o Algorithm avoids dividing one (very) large number
 * by another using
 * n!/k!(n-k)! = (n-k+1)(n-k+2)...(n-k+i)/1*2*...*k if k<=n-k,
 * or = (k+1)(k+2)...(k+(n-k))/1*2*...*(n-k) if k>n-k
 * In both cases the #terms in numerator and denom is the same.
 ***************************************************************************/
/* --- entry point --- */
double bcoef ( int n, int k ) 
 double coef = 1.0; /* init with multiplicative ident */
 /* --- bcoef(n,k)=bcoef(n,n-k), so choose smaller number terms --- */
 int kterm=0, nterms=n-k; /* number of terms... */
 if ( k<nterms ) nterms=k; /* ...is lesser of k,n-k */
 /* --- accumulate coef=coef*(n-nterms+kterm)/kterm, kterm=1...nterms --- */
 while ( kterm++ < nterms ) /* need another term */
 coef *= ((double)(n-nterms+kterm))/((double)kterm);
 return ( coef ); /* return binomial coef to caller */
 /* --- end-of-function bcoef() --- */


/****************************************************************************
 * Program: moddyck n maxup pup
 * Purpose: Test Brian Tung's closed-form solution to
 * https://math.stackexchange.com/questions/2860403/
 * as well as my recursive numerical evaluation
 * of the same problem, and see if they agree.
 * --------------------------------------------------------------------------
 * Command-Line Arguments:
 * n (I) int containing #periods/steps in tree
 * maxup (I) int containing maximum allowed
 * #up-steps - #down-steps
 * anywhere along path...
 * maxup=0 is the stackexchange problem
 * maxup=999 permits all paths for a check
 * --------------------------------------------------------------------------
 * Notes: o
 ***************************************************************************/
int main ( int argc, char *argv ) 
 /* ---
 * allocations and declarations
 * ------------------------------- */
 double pmoddyck(), /* recursive numerical evaluation */
 pBrian = 0.0, /* Brian's probability, */
 Bpaths = 0.0; /* and Brian's #paths count */
 double Dcoef(); /* Brian's modified binomial coef */
 int k=0; /* current period/step in tree */
 /* ---
 * command-line args
 * -------------------- */
 n = ( argc > 1 ? atoi(argv[1]) : 16 ); /*16-period tree*/
 maxup = ( argc > 2 ? atoi(argv[2]) : 999999 ); /*allow all paths*/
 pup = ( argc > 3 ? atof(argv[3]) : 0.5 ); /*up/down=50/50*/
 /* ---
 * recursive evalutaion
 * ----------------------- */
 npaths = 0.0;
 printf ( " #paths=%.2lf, pmoddyck(n=%d,maxup=%d,pup=%.3lf) = %.8lfn",
 npaths, n,maxup,pup, pmoddyck(0,0) );
 /* ---
 * Brian Tung's solution
 * ----------------------- */
 Bpaths = 0.0;
 for ( k=0; k<=(n/2); k++ ) 
 double Dnk = Dcoef(n,k);
 double p = 1.0 - pup; /* oops -- I reversed my p's */
 Bpaths += Dnk;
 pBrian += Dnk*pow(1.-p,(double)k)*pow(p,(double)(n-k));
 /* --- end-of-for(k) --- */
 printf ( " Bpaths=%.2lf, pBrian(n=%d,maxup=0,pup=%.3lf) = %.8lfn",
 Bpaths, n,pup, pBrian );
 exit ( 0 );
 /* --- end-of-function main() --- */
/* ----------------------- END-OF-FILE MODDYCK.C ------------------------- */

So if you want to run this yourself, cut-and-paste it into file moddyck.c
(modified Dyck paths, as per @saulspatz's reference cited in the comments to the original question, and Brian's mention of Catalan numbers). Then compile and run it as

cc moddyck.c -lm -o moddyck

./moddyck 16 0 .5

where that run is for $n=16$, $mboxmaxup=0$ (explained below), and $p=0.5$.
The zero for maxup puts the dotted baseline/critical-line exactly where shown along the center of the binomial tree. $mboxmaxup=1$ tells my recursive algorithm to allow paths that are one node higher. However, Brian's solution doesn't accommodate that, so to compare the solutions you have to stick with $mboxmaxup=0$.

And here are some promised small-$n$ comparisons...

bash-4.3$ ./moddyck 1 0 .5
 #paths=1.00, pmoddyck(n=1,maxup=0,pup=0.500) = 0.50000000
 Bpaths=1.00, pBrian(n=1,maxup=0,pup=0.500) = 0.50000000
bash-4.3$ ./moddyck 2 0 .5
 #paths=2.00, pmoddyck(n=2,maxup=0,pup=0.500) = 0.50000000
 Bpaths=2.00, pBrian(n=2,maxup=0,pup=0.500) = 0.50000000
bash-4.3$ ./moddyck 3 0 .5
 #paths=3.00, pmoddyck(n=3,maxup=0,pup=0.500) = 0.37500000
 Bpaths=3.00, pBrian(n=3,maxup=0,pup=0.500) = 0.37500000
bash-4.3$ ./moddyck 4 0 .5
 #paths=6.00, pmoddyck(n=4,maxup=0,pup=0.500) = 0.37500000
 Bpaths=6.00, pBrian(n=4,maxup=0,pup=0.500) = 0.37500000
bash-4.3$ ./moddyck 5 0 .5
 #paths=10.00, pmoddyck(n=5,maxup=0,pup=0.500) = 0.31250000
 Bpaths=10.00, pBrian(n=5,maxup=0,pup=0.500) = 0.31250000

There are always $2^n$ total paths (duh, it's a binomial tree:), and our displayed #paths/Bpaths are the number of them that successfully traverse the tree. So, for $n=1,2,3$ you can pretty much draw the successful paths in your head, and see that we're both right. But you ain't gonna want to try that with the next few examples below, where I also changed the $p$'s around a little, just to check that also...

bash-4.3$ ./moddyck 16 0 .5
 #paths=12870.00, pmoddyck(n=16,maxup=0,pup=0.500) = 0.19638062
 Bpaths=12870.00, pBrian(n=16,maxup=0,pup=0.500) = 0.19638062
bash-4.3$ ./moddyck 18 0 .75
 #paths=48620.00, pmoddyck(n=18,maxup=0,pup=0.750) = 0.00308244
 Bpaths=48620.00, pBrian(n=18,maxup=0,pup=0.750) = 0.00308244
bash-4.3$ ./moddyck 20 0 .25
 #paths=184756.00, pmoddyck(n=20,maxup=0,pup=0.250) = 0.66734600
 Bpaths=184756.00, pBrian(n=20,maxup=0,pup=0.250) = 0.66734600
bash-4.3$ ./moddyck 22 0 .5
 #paths=705432.00, pmoddyck(n=22,maxup=0,pup=0.500) = 0.16818810
 Bpaths=705432.00, pBrian(n=22,maxup=0,pup=0.500) = 0.16818810
bash-4.3$ ./moddyck 24 0 .75
 #paths=2704156.00, pmoddyck(n=24,maxup=0,pup=0.750) = 0.00091751
 Bpaths=2704156.00, pBrian(n=24,maxup=0,pup=0.750) = 0.00091751

@saulspatz yeah, I see from your stackoverflow stuff that you also program. Take a look at that pmoddyck() recursive function. It surprised the heck out of me when I realized the entire problem's numerically solved in literally five trivial lines of code. I had to first stare at a blank screen for like 20 minutes, trying to figure out how to do it. But when I finally "got it", the simplicity of the algorithm versus the more complex formal analysis was kind of astounding. — Jul 24 at 16:02

score 2 · Accepted Answer · 2018-07-23 15:41:17Z

ETA2: I rolled back, because the OP's notation is a little different from mine. OP uses $_mN_k$ for the number of ways for $k$ upward moves to be chosen out of $k+m$ total moves; I use $_nD_k$ for the number of ways for $k$ upward moves to be chosen out of $n$ total moves.

Partial answer. It occurs to me that one can proceed as one normally does with the binomial distribution, but one must use different coefficients. Whereas the normal binomial coefficients $_nC_k$ have a recurrence of

$$
_nC_k = _n-1C_k-1 + _n-1C_k
$$

our modified coefficients $_nD_k$ have a recurrence of

$$
_nD_k = begincases
_n-1D_k-1 + _n-1D_k & k leq n/2 \
0 & textotherwise
endcases
$$

Once this is done, it seems to me that we can compute

$$
P(textpermissible path of length $n$)
= sum_k=0^lfloor n/2 rfloor _nD_k (1-p)^k p^n-k
$$

A little pencil-and-paper noodling and trawling on OEIS seems to show that

$$
_nD_k = fracn-2k+1n-k+1 binomnk
$$

Note that this puts the Catalan numbers right on the critical line. I still have to figure out $_nD_k$ from first principles, though.

Thanks a lot, Brian. Following along with your work, I added an "Edit#2" to the original question, displaying some of the additional recurrence relations I'd cobbled together while trying to construct a closed-form solution for (in my notation) $N^m_n$. Not sure how correct and/or useful they'll ultimately be. — Jul 23 at 17:04
@JohnForkosh It seems to me that Brian Tung has given the closed-form solution in this answer. All one has to do is to substitute the formula into the recurrence and verify that it is satisfied. I've done this for the case $kle(n-1)/2$ and I'm working on the edge cases. — Jul 23 at 17:09
@saulspatz Thanks, yes, I think you and Brian are right. His approach (modified binomial coefficients) seems a lot more straightforward than what I'd been trying to do. So Brian's almost certainly exactly right, and I was just trying to see if what I'd tried to do resulted in the same/equivalent answer (though I'm not betting any money on it:). — Jul 23 at 17:32
(at)Brian and @saulspatz -- I added an Edit#3 near the top of the original question, illustrating what looks to me like maybe a new binomial coefficient identity. But maybe you better-qualified guys can simplify it to some standard, well-known identity. It's based on the fact that I numerically checked my original-but-tentative $N^m_n$ solution and was surprised to find it actually works. But that must mean that (after juggling my $m,n$ indexes to your $n,k$'s) that $_nD_k=N^m_n$, which the edit expands into binomial coefficients. And I numerically checked it, confirming it's true. — Jul 26 at 12:13

score 2 · Answer 5 · 2018-07-24 15:36:06Z

This is a follow-up to Brian Tung's answer, just confirming his solution. It really belongs in a comment, but it doesn't fit.

To verify that the formula for $_nD_k$ is correct, we have to check that is satisfies appropriate initial conditions, and check that it satisfies the recurrence. The formula gives $$_1D_1=0, _1D_0=1,$$ which is correct.

Now suppose that $kle(n-1)/2$. Then also $k-1le (n-1)/2$ and $kle n/2$ so in terms of the recurrence, $D$ is not 0 and is given by the formula. We have$$
_nD_k-_n-1D_k-1-_n-1D_k=\
beginalign
&=fracn-2k+1n-k+1nchoose k-fracn-2k+2n-k+1n-1choose k-1-fracn-2kn-kn-1choose k\
&=left(fracn-2k+1n-k+1-fracn-2k+2n-k+1right)n-1choose k-1+
left(fracn-2k+1n-k+1-fracn-2kn-kright)n-1choose k\
&=frac-1n-k+1n-1choose k-1+frack(n-k)(n-k+1)n-1choose k\&=0,\
endalign
$$
since $$n-1choose k=fracn-kkn-1choose k-1$$

Now suppose $k>(n-1)/2.$ The only case we have to check is when $k=n/2$ since if $k$ is any larger we have $_nD_k=0.$ Now we have
$$_2kD_k-_2k-1D_k-1-_2k-1D_k=\
beginalign
&=_2kD_k-_2k-1D_k-1-0\
&=frac1k+12kchoose k-frac2k+12k-1choose k-1\
&=frac1k+1left(frac(2k)!k!k!-22k-1choose k-1 right)\
&=frac1k+1left(frac2(2k-1)!(k-1)!k!-22k-1choose k-1 right)\&=0
endalign
$$

NOTE Can someone tell me how to improve the formatting, please? On my Mac, at least, in something like $$_nD_k-_n-1D_k-1-_n-1D_k$$ the $n-1$ pre-subscripts are too far way from the $Dtext'$s and too close to the minus signs.

EDIT All I had to do was to put the terms in braces to get $$_nD_k-_n-1D_k-1-_n-1D_k$$

Thanks again, Saul. I tried "checking" both answers, but checking a second automatically unchecked the first. I usually write "_kD" for a pre-subscript, having that subscript explicitly subscripting an empty "" expression. But I'm not aware of any canonical markup syntax for obtaining that notation. (By the way, I added a "Motivation" paragraph just below the question, explaining why I'd asked it in the first place.) — Jul 23 at 18:39
You can only accept one answer, and it should certainly be Brian's. :-) — Jul 23 at 18:42
The before the subscript is essentially attached to the previous atom, preventing the subscript from glomming onto it. This has to do with the way LaTeX and its derivatives kern operators when they're unary vs binary, I think? At any rate, it does the right thing. — Jul 23 at 20:36
@Brian So it's parsed as a subscript on the minus sign; makes sense, though not what I want, of course. I wonder if there's a way around it; maybe I'll ask on the LaTex forum. — Jul 23 at 20:41
@saulspatz thanks for the markup explanation. And I doubly-confirmed Brian's solution with some numerical tests reproduced below. Pretty much no doubt that his entire development is correct. — Jul 24 at 15:44

score 1 · Answer 6 · 2018-07-24 15:35:21Z

As per @saulspatz's answer, "This is [also] a follow-up to Brian Tung's answer, just confirming his solution." In this case, the confirmation's numerical. I coded a recursive solution to the problem, as well as Brian's closed-form solution, printing both results. And they both agree. For very small $n$, the answer's obvious, and we're both unmistakably correct. For larger $n$, correctness is hopefully inferred from the continued agreement.

So first, my code's below. And that's followed by the examples cited above, and a little further discussion. Here's the code...

/* ---
 * standard headers
 * ------------------- */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* ---
 * globals (to reduce recursive stack)
 * -------------------------------------- */
static int n = 16; /* total# periods/steps in binomial tree */
static double pup = 0.5; /* probability of "up"-step (pdown=1-pup) */
static int maxup = 999999; /* maximum allowable kup-kdown */
static double npaths = 0.0; /* #paths satisfying maxup constraint */

/****************************************************************************
 * Function: pmoddyck ( k, kup )
 * Purpose: See https://math.stackexchange.com/questions/2860403/
 * ...recursively enumerate "modified dyck paths" through
 * n-period binomial tree, and calculate probability
 * of successfully completing the entire "binomial walk"
 * --------------------------------------------------------------------------
 * Arguments: k (I) int containing number periods/steps
 * already completed in binomial tree
 * kup (I) int containing number of up-steps
 * --------------------------------------------------------------------------
 * Returns: (double) probability of walking n-period tree without
 * ever crossing past "critical baseline"
 * --------------------------------------------------------------------------
 * Notes: o call pmoddyck(0,0) to start the recursion,
 * and return the overall probability (with npaths
 * returning the #paths satisfying the "maxup" constraint)
 ***************************************************************************/
double pmoddyck ( int k, int kup ) 
 /* note: kdown=k-kup, so kup-kdown=2*kup-k */
 if ( 2*kup - k > maxup ) return ( 0.0 ); /* abort failed paths */
 if ( k >= n ) /*completed entire path successfully*/
 npaths += 1.0; /*#successful paths=2^n if maxup=999*/
 return ( pow(pup,(double)kup)*pow(1.-pup,(double)(k-kup)) ); 
 return ( pmoddyck(k+1,kup) + pmoddyck(k+1,kup+1) ); /* next step: down+up */
 /* --- end-of-function pmoddyck() --- */

/****************************************************************************
 * Function: Dcoef ( n, k )
 * Purpose: Brian Tung's "modified binomial coefficient",
 * see https://math.stackexchange.com/questions/2860403/
 * --------------------------------------------------------------------------
 * Arguments: n (I) n items...
 * k (I) ...taken k at a time
 * Returns: (double) as above, or 0 for any argument error
 * --------------------------------------------------------------------------
 * Notes: o
 ***************************************************************************/
/* --- entry point --- */
double Dcoef ( int n, int k ) 
 double bcoef(), dcoef = 0.0;
 if ( k <= n/2 )
 dcoef = bcoef(n,k) * ((double)(n-2*k+1))/((double)(n-k+1));
 return ( dcoef ); 


/****************************************************************************
 * Function: bcoef ( n, k )
 * Purpose: binomial coefficient = n!/(k!(n-k)!)
 * --------------------------------------------------------------------------
 * Arguments: n (I) n items...
 * k (I) ...taken k at a time
 * Returns: (double) as above, or 0 for any argument error
 * --------------------------------------------------------------------------
 * Notes: o Algorithm avoids dividing one (very) large number
 * by another using
 * n!/k!(n-k)! = (n-k+1)(n-k+2)...(n-k+i)/1*2*...*k if k<=n-k,
 * or = (k+1)(k+2)...(k+(n-k))/1*2*...*(n-k) if k>n-k
 * In both cases the #terms in numerator and denom is the same.
 ***************************************************************************/
/* --- entry point --- */
double bcoef ( int n, int k ) 
 double coef = 1.0; /* init with multiplicative ident */
 /* --- bcoef(n,k)=bcoef(n,n-k), so choose smaller number terms --- */
 int kterm=0, nterms=n-k; /* number of terms... */
 if ( k<nterms ) nterms=k; /* ...is lesser of k,n-k */
 /* --- accumulate coef=coef*(n-nterms+kterm)/kterm, kterm=1...nterms --- */
 while ( kterm++ < nterms ) /* need another term */
 coef *= ((double)(n-nterms+kterm))/((double)kterm);
 return ( coef ); /* return binomial coef to caller */
 /* --- end-of-function bcoef() --- */


/****************************************************************************
 * Program: moddyck n maxup pup
 * Purpose: Test Brian Tung's closed-form solution to
 * https://math.stackexchange.com/questions/2860403/
 * as well as my recursive numerical evaluation
 * of the same problem, and see if they agree.
 * --------------------------------------------------------------------------
 * Command-Line Arguments:
 * n (I) int containing #periods/steps in tree
 * maxup (I) int containing maximum allowed
 * #up-steps - #down-steps
 * anywhere along path...
 * maxup=0 is the stackexchange problem
 * maxup=999 permits all paths for a check
 * --------------------------------------------------------------------------
 * Notes: o
 ***************************************************************************/
int main ( int argc, char *argv ) 
 /* ---
 * allocations and declarations
 * ------------------------------- */
 double pmoddyck(), /* recursive numerical evaluation */
 pBrian = 0.0, /* Brian's probability, */
 Bpaths = 0.0; /* and Brian's #paths count */
 double Dcoef(); /* Brian's modified binomial coef */
 int k=0; /* current period/step in tree */
 /* ---
 * command-line args
 * -------------------- */
 n = ( argc > 1 ? atoi(argv[1]) : 16 ); /*16-period tree*/
 maxup = ( argc > 2 ? atoi(argv[2]) : 999999 ); /*allow all paths*/
 pup = ( argc > 3 ? atof(argv[3]) : 0.5 ); /*up/down=50/50*/
 /* ---
 * recursive evalutaion
 * ----------------------- */
 npaths = 0.0;
 printf ( " #paths=%.2lf, pmoddyck(n=%d,maxup=%d,pup=%.3lf) = %.8lfn",
 npaths, n,maxup,pup, pmoddyck(0,0) );
 /* ---
 * Brian Tung's solution
 * ----------------------- */
 Bpaths = 0.0;
 for ( k=0; k<=(n/2); k++ ) 
 double Dnk = Dcoef(n,k);
 double p = 1.0 - pup; /* oops -- I reversed my p's */
 Bpaths += Dnk;
 pBrian += Dnk*pow(1.-p,(double)k)*pow(p,(double)(n-k));
 /* --- end-of-for(k) --- */
 printf ( " Bpaths=%.2lf, pBrian(n=%d,maxup=0,pup=%.3lf) = %.8lfn",
 Bpaths, n,pup, pBrian );
 exit ( 0 );
 /* --- end-of-function main() --- */
/* ----------------------- END-OF-FILE MODDYCK.C ------------------------- */

So if you want to run this yourself, cut-and-paste it into file moddyck.c
(modified Dyck paths, as per @saulspatz's reference cited in the comments to the original question, and Brian's mention of Catalan numbers). Then compile and run it as

cc moddyck.c -lm -o moddyck

./moddyck 16 0 .5

where that run is for $n=16$, $mboxmaxup=0$ (explained below), and $p=0.5$.
The zero for maxup puts the dotted baseline/critical-line exactly where shown along the center of the binomial tree. $mboxmaxup=1$ tells my recursive algorithm to allow paths that are one node higher. However, Brian's solution doesn't accommodate that, so to compare the solutions you have to stick with $mboxmaxup=0$.

And here are some promised small-$n$ comparisons...

bash-4.3$ ./moddyck 1 0 .5
 #paths=1.00, pmoddyck(n=1,maxup=0,pup=0.500) = 0.50000000
 Bpaths=1.00, pBrian(n=1,maxup=0,pup=0.500) = 0.50000000
bash-4.3$ ./moddyck 2 0 .5
 #paths=2.00, pmoddyck(n=2,maxup=0,pup=0.500) = 0.50000000
 Bpaths=2.00, pBrian(n=2,maxup=0,pup=0.500) = 0.50000000
bash-4.3$ ./moddyck 3 0 .5
 #paths=3.00, pmoddyck(n=3,maxup=0,pup=0.500) = 0.37500000
 Bpaths=3.00, pBrian(n=3,maxup=0,pup=0.500) = 0.37500000
bash-4.3$ ./moddyck 4 0 .5
 #paths=6.00, pmoddyck(n=4,maxup=0,pup=0.500) = 0.37500000
 Bpaths=6.00, pBrian(n=4,maxup=0,pup=0.500) = 0.37500000
bash-4.3$ ./moddyck 5 0 .5
 #paths=10.00, pmoddyck(n=5,maxup=0,pup=0.500) = 0.31250000
 Bpaths=10.00, pBrian(n=5,maxup=0,pup=0.500) = 0.31250000

There are always $2^n$ total paths (duh, it's a binomial tree:), and our displayed #paths/Bpaths are the number of them that successfully traverse the tree. So, for $n=1,2,3$ you can pretty much draw the successful paths in your head, and see that we're both right. But you ain't gonna want to try that with the next few examples below, where I also changed the $p$'s around a little, just to check that also...

bash-4.3$ ./moddyck 16 0 .5
 #paths=12870.00, pmoddyck(n=16,maxup=0,pup=0.500) = 0.19638062
 Bpaths=12870.00, pBrian(n=16,maxup=0,pup=0.500) = 0.19638062
bash-4.3$ ./moddyck 18 0 .75
 #paths=48620.00, pmoddyck(n=18,maxup=0,pup=0.750) = 0.00308244
 Bpaths=48620.00, pBrian(n=18,maxup=0,pup=0.750) = 0.00308244
bash-4.3$ ./moddyck 20 0 .25
 #paths=184756.00, pmoddyck(n=20,maxup=0,pup=0.250) = 0.66734600
 Bpaths=184756.00, pBrian(n=20,maxup=0,pup=0.250) = 0.66734600
bash-4.3$ ./moddyck 22 0 .5
 #paths=705432.00, pmoddyck(n=22,maxup=0,pup=0.500) = 0.16818810
 Bpaths=705432.00, pBrian(n=22,maxup=0,pup=0.500) = 0.16818810
bash-4.3$ ./moddyck 24 0 .75
 #paths=2704156.00, pmoddyck(n=24,maxup=0,pup=0.750) = 0.00091751
 Bpaths=2704156.00, pBrian(n=24,maxup=0,pup=0.750) = 0.00091751

@saulspatz yeah, I see from your stackoverflow stuff that you also program. Take a look at that pmoddyck() recursive function. It surprised the heck out of me when I realized the entire problem's numerically solved in literally five trivial lines of code. I had to first stare at a blank screen for like 20 minutes, trying to figure out how to do it. But when I finally "got it", the simplicity of the algorithm versus the more complex formal analysis was kind of astounding. — Jul 24 at 16:02

Search This Blog

ukmuiik

What's the probability of completing the illustrated â€œbinomial walkâ€ without ever visiting a node above the baseline?

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Comments

Post a Comment

Popular posts from this blog

What is the equation of a 3D cone with generalised tilt?

Color the edges and diagonals of a regular polygon

Relationship between determinant of matrix and determinant of adjoint?

Category

Random preview

What's the probability of completing the illustrated â€œbinomial walkâ€ without ever visiting a node above the baseline?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Comments

Post a Comment

Popular posts from this blog

What is the equation of a 3D cone with generalised tilt?

Color the edges and diagonals of a regular polygon

Relationship between determinant of matrix and determinant of adjoint?

What's the probability of completing the illustrated â€œbinomial walkâ€ without ever visiting a node above the baseline?

3 Answers
3

3 Answers
3

3 Answers
3