theta_etc: August 2011

Saturday, August 13, 2011

Proof of Conjecture for paths in hypercube lattices

Problem:

hypercubical lattice in D dimensions
start from O = (0,0...)
allowed moves are only of the form (0,0,+, 0,0, ...)
End point of path is (N1, N2,...ND)

What is the number of paths?

Answer:
The multinomial coefficient
(N1+N2 ... ND)C(N1)(N2)...(ND) = (N1+N2...+ND)! / ( N1! N2! ...ND!)

Proof:
Let the whole number valued coordinate axes be n_i, i = 1...D.
The end point lies on the hyperplane \sum(n_i) = \sum(N_i).
One has to choose \sum(N_i) times to get to that hyperplane, from between D choices each time (the choice of direction to move in).
To get to the point (N_1, N_2..., N_D), exactly N_i choices have to have been in the i- direction.
Hence the number of paths is the number of ways of choosing the above partition from its sum.

In 3D, this is the number of ways of choosing 3 Easts, 5 Norths and 2 Ups from 10 choices to reach the point 3E, 5N and 2U.

Hence the multinomial coefficient. QED

As a little extra: lets say we were only interested in paths between diagonally opposite points of the D-hypercube, say (0,0,...) and (N,N,...).

Then the number of paths is (D*N)!/(N!)**D, which was my conjecture two days ago!!!

Now, the Analytical answer can be produced with 2 lines more of code: one to read the dimension from the command line, the other to read the coordinates of the end point.

The codeCounting program would need a bit more, a couple of for loops to write out the D-dimensional python code (if needed) and then to call it.

But the amount of time it is going to take is going to be ... HELP?

Last point for today: The factorial function can be extended to complex plane except negative integers by the Gamma function.

So now I can legitimately answer the question: How many paths are there to a say integer diagonal point in a fractional dimension?

Answer = Gamma(N*D +1)/(N!)**D.

_ : Who cares about your perspective? What practical value can it possibly have?

Here goes: Play fast and loose with "fractional" = "fractal". Now we actually have something we can visualize: paths on a square Sierpinski gasket e.g.

And the practical use? Conjecture: The structure of the web or networks on the web is fractal. There is fractal behaviour even in Statistical Natural Language Processing, Benoit Mandelbrot did important work on it in the 50's.

Lets say that the LinkedIN network is fractal. Maybe the above could be of some use in counting paths, for a better calculation of the clustering coefficient of an egonet.

From the perspective of SNLP, simple extensions of the above counting processes lead to the distributions of sizes of stochastically generated words, e.g with truncation on choice of ' '.

Ranjeet

simplifying the boundary conditions

Last night and this morning when I woke up I was thinking of the extension to higher dimensions. One thought was that the number of elif conditions with special case sums for the points on the boundary would grow as 2**D -1, all the "lower" edges, faces, vertices etc. Even if I wrote python code to

take in the dimension as an argument and
have it write python code for the dimension required (which I'll do in a minute)

this would be quite tedious.
Then I realised that the recursive self-calls stop when they encounter a value, so I can just set boundary conditions for "parent nodes" on the enveloping lower hyper-planes (of which there are only D) to be 0.

Here is the somewhat more "biutiful" code (I rented it, but ended up watching Terminator 3 instead):

#!/usr/bin/python
""" 'codeCountPaths1.py Natural [Natural]'
    Single argument -> (n,n)
    code to count the number of paths
    Assumption 1: starting at (0,0)
    Assumption 2: ending at user input (n,m)
    moving in (+,+) direction only, on a complete Euclidean square lattice.
    CHANGE from codeCountPaths.py:
    A branch of the recursion will stop when it encounters a value, so
    BCs can be accounted for by putting in lines of
    0s at n= -1 and m = -1 as opposed to
    restricting the sum for boundary points to just those parent nodes
    which lie within the domain. There is no need for the 0-valued
    parent nodes to go back to the n+m = 0 line (or soon2B hyperplane)
"""
import sys, math
def NumPaths(i,j,n,m):
if i==0 and j==0:
    numpaths = 1 # BC at origin
elif ((i==-1) and j in range(1,m+1)) or ((j==-1) and i in range(1,n+1)):
    numpaths = 0 # BCs at enveloping lower lines
else:
    numpaths = NumPaths(i-1,j,n,m) + NumPaths(i,j-1,n,m)
return numpaths
def main():
# reads command line arguments for the location of the ending point
if len(sys.argv) < 2:
    print "usage: $ python codeCountPaths.py Natural [Natural]"
elif len(sys.argv) == 2:
    n = int(sys.argv[1])
    m = int(sys.argv[1])
elif len(sys.argv) ==3:
    n = int(sys.argv[1])
    m = int(sys.argv[2])

# scolds user for trying to trick poor little machine
if (n < 0) or (m < 0):
    print "BoundError: n >= 0 and m >= 0. Tricks later, for now positive intege\
r only please"
# asks for answer, and sits back for an awfully long time
ans = NumPaths(n,m,n,m)
print "Code counted number of paths from (0,0) to (",n,', ',m,") on a complet\
e Euclidean square lattice moving only in the (+,+) direction = ", ans
if __name__ == '__main__':
main()

Question: I have a recursion relation for the number of time-steps needed, but can't solve it analytically, neither can I correctly code to just count each time a call is made.

Can one of you CS gods help put better bounds on "awfully long time"?

Code for A BEAUTIFUL BEAUTIFUL PROBLEM

Beautiful code!!!

OK: I can't believe
0) yesterday's code was so Uuugly!!
1) I completely missed all sorts of beautiful things and the joy of having coded this problem (succesfully, meaning without crashing my computer or bringing down the whole LinkedIn site even though I had been careful enough to close all connections before hitting 'return')
2) Yesss! it was also correct!

Yesterday's code:
#!/usr/bin/python                                                              \
""" 'diagPaths.py' """
import sys, math
def diagN(n):
if n < 1:
    print "tricks later, for now positive integer only please"
else:
    return math.factorial(2*(n-1))/math.factorial(n-1)**2
def main():
if len(sys.argv) < 3:
    #
    print "usage: $ python diagPaths.py --n natural number"
elif sys.argv[1] == '--n':
    n = int(sys.argv[2])
    ans = diagN(n)
    print "Number of paths between diagonal vertices for ",n, "-square lattice \
= ", ans
if __name__ == '__main__':
main()

3) If you want to exploit the beauty and power of recursion, code backwards! The number of paths to get to (i,j) is = numpaths(i-1,j) + numaths(i,j-1). Then just ask for numpaths(whatever you want) and let the code call itself ...

4) ... Modulo the bounds: while thinking about the bounds I realised that the upper bounds don't constrain anything. Which in turn means ...

5) The number of paths to the point (i,j) from (0,0) is just (i+j)C(j). Why? It is the i+j th diagonal, and the point is the jth from one end of it, so it is just the Binomial coefficient given above.

6) Which further means that we have an alternative proof of the sum of the squares of the binomial coefficients: yesterday I explained that the numpaths to get to (n,n) is the sum of the squares of the numpaths to get to points on the transvecting diagonal, which is the sum of the squares of the Binomial coefficients. Today, I've explained that the numpaths to get to (i,j) is (i+j)C(j). Hence to get to (n,n) the numpaths is (2n)C(n) = (2n)!/(n!)^2 !!!! - That is exciting, not factorials.

Analytical answer:

#!/usr/bin/python
""" 'AnalyticalCountPaths.py' 1 argument -> end point (n,n)
     2 arguments -> (n,m)
     assume complete square Euclidean lattice
     starting point = (0,0)
"""
import sys, math
def main():
if len(sys.argv) < 2:
    print "usage: $ python AnalyticalCountPaths.py Natural [Natural]"
elif len(sys.argv) == 2:
    n = int(sys.argv[1])
    m = int(sys.argv[1])
elif len(sys.argv) ==3:
    n = int(sys.argv[1])
    m = int(sys.argv[2])
if (n < 0) or (m < 0):
    print "EEp! Bloink! HAL No Compute, go fall into pole of Riemann Zeta. BoundError"
    print "For now positive integers only please"
else:
    ans = math.factorial(n+m)/(math.factorial(n)*math.factorial(m))
print "Analytical solution: Number of paths from (0,0) to (",n,', ',m,") on a\
complete Euclidean square lattice moving only in the (+,+) direction = ", ans
print "\n Oh my but this problem is beautiful!! Alternate proof of sum of squ\
ares of Binomial coefficients!"
if __name__ == '__main__':
main()

so why do I need to write the code for actually counting this?
Because, I can't believe I am saying this, it is EVEN more beautiful!
AND FINALLY:
#!/usr/bin/python
""" 'codeCountPaths.py Natural [Natural]'
    Single argument -> (n,n)
    code to count the number of paths
    Assumption 1: starting at (0,0)
    Assumption 2: ending at user input (n,m)
    moving in (+,+) direction only, on a complete Euclidean square lattice.
"""
import sys, math
def NumPaths(n,m):
if (n < 0) or (m < 0):
    print "BoundError: n >= 0 and m >= 0. Tricks later, for now positive integer only please"
elif n==0 and m==0:
    numpaths = 1
elif n == 0:
    numpaths = NumPaths(n, m-1)
elif m == 0:
    numpaths = NumPaths(n-1,m)
else: # THE MEAT OF THE CODE
    numpaths = NumPaths(n-1,m) + NumPaths(n,m-1)
return numpaths
def main():
if len(sys.argv) < 2:
    print "usage: $ python codeCountPaths.py Natural [Natural]"
elif len(sys.argv) == 2:
    n = int(sys.argv[1])
    m = int(sys.argv[1])
elif len(sys.argv) ==3:
    n = int(sys.argv[1])
    m = int(sys.argv[2])
ans = NumPaths(n,m) # here is where I ask for the answer just once!
print "Code counted number of paths from (0,0) to (",n,', ',m,") on a complet\
e Euclidean square lattice moving only in the (+,+) direction = ", ans
print "\n Oh my but this problem is beautiful!!"
if __name__ == '__main__':
main()

I'll confess I was nervous when I ran this: I'd already calculated the answer in my head, and was praying to Pascal that my codecounting would be right!

Did I thank _ enough for this problem?
Tomorrow: we'll relax the Euclidean requirement, then the "completeness".

By The Way: I should remind myself that any second year Numerical Path Integral Physics Grad student is probably doing this in their sleep.

Thursday, August 11, 2011

A BEAUTIFUL BEAUTIFUL PROBLEM

& BEAUTIFUL CODE WITHOUT CODE

The problem, as I recall it being described to me, is to calculate the number of ways to get from one vertex of a square lattice (of size N points per side) to the diagonally opposite one, where along either axis each move is always in the same direction, say ++. It is or used to be a popular interview question to test the interviewee’s coding skills.

My zeroth thought … I’ll come back to that later.

My 1.1th thought was of course, an iterative approach – figure out a process to get from the k-1th diagonal to the kth: at each step of the iteration use the distribution of ways to get to the pth point on the k-1th diagonal to calculate the number of ways to get to the qth point on the kth, store this new distribution. Start from k=2 and repeat until k=N. OK, that gets us to the diagonal. Thought 1.1 was divide and conquer: get to the transvecting diagonal between the vertices, square the distribution and then add.

My second thought was to take a recursive approach for the first part (the distribution on the diagonal):

Def diag(k):

If k ==1:

Return [1]

Else:

Oldlist = diag(k-1)

Newlist = [ oldlist[i] + oldlist[i+1] for i in range (0,k) ]

Return newlist

I had forgotten how much more beautiful recursion is than iteration!

All you do is call diag(N), square the elements and sum!

I think the above goes as n**3.

Time to implement and try it:

$ python diagonalPaths.py --n natural number

OK, so for n =1,2,3,4,5 we get 1,2,6,20,70.

So how do we know it is right? Let’s say you’ve debugged the code, validated it for n=1,2,3. It agrees with what the interviewer has. The answer agrees with the consensus, still, how do we know that it is right?

Put another way, can you predict the answer for n = 6 or any n, before you run your code, or at least, in less than n**3 time?

I’ll claim that my code runs in n**1 time. How?

Well, I didn’t code the recursion. Which brings me to my 0^th thought, I can do it analytically, why code it?

One can readily see that the distribution on the diagonal is the binomial, since at every node you are making the choice between +/--, and the number of ‘+’ gets you the position along the diagonal. If we just wanted to get to the diagonal, the number of ways is 2**(n-1). So the answer we want is the sum of the squares of the binomial coefficients, which luckily has a closed form : (2N)!/(N!)**2. If ! goes as N-time, so does my code.

Comment 0: I can also ask the question: “to calculate the number of ways to get from one vertex of a square lattice (of size N points per side)”, where N is:

Non-positive

Rational

Real (for the computer this should be pseudo-real or float I think)

Complex

Matrix (real or complex valued)

Anything which forms a group under multiplication, or anything that forms a group under a group operation which satisfies the properties of multiplication, with the relaxation of the commutativity requirement.

Since n! = Gamma(n+1), any of the above can be calculated (convergence for matrices will be harder to prove).

I wouldn’t have the slightest idea how to code any of the above!

Comment 1: What about lattices of different dimensions? Now here you’ve got me, I can’t avoid coding. If we are only interested in the number of ways of getting to the nearest transvecting diagonal plane, the answer is simply D**(n-1). But it is no longer sufficient to square the multinomial coeff.s and add, since we have to get to the other transvecting planes before getting from the last one to the opposite vertex. There are two such planes in 3D, with normal vector (1,1,1) and at distances 1/sqrt(3) and 2/sqrt(3) from the origin. Similarly in higher dimensions.

Now if you construct the following D dimensional solids: namely to take two D-simplices and to join them on their D-hedron faces, then we are reduced to summing the squares of the multinomial coefficients. But otherwise:

the “coding” value of this problem lies in its extension to higher dimensions.

Comment 1.1: I am going to take a reasonable guess and claim that the answer in D-dimensions is (Dn)!/(n!)**D. Again, by using the gamma function (and extending it to Complexes), we can allow both n and D to take on complex values. I have no idea what a complex dimension looks like (though I suspect that John Conway and others do), but a fractional dimension, or at least a real valued fractal dimension, that I can visualize. The famous Sierpinski gasket, also http://www.jimloy.com/fractals/sierpins.htm has a fractal dimension (Box counting = Hausdorff) of 1.585. Similarly, if I construct the square equivalent (taking away 4 of the nine subsquares of a tictactoization of the square at every recursion), I get a lattice of fractal dimension ln(5)/ln(3) ~1.465 (5 copies, 1/3 linear dimension each). I can readily imagine that the constraints of having to pass through certain lattice points reduces the number of paths relative to the number in a square lattice.

Comment 2: What a beautiful introduction to Path Integrals! The function of paths we are “integrating” above is just 1. But you can readily imagine summing the length L of the path over all paths, or say the phase change over the path, exp(i L), or indeed any other function of paths. If you use L, you are doing geometric optics, if you use some particle action S, why, “welcome to quantum mechanics!”

I’d like to thank _ for suggesting this problem.