I am happy to report that I computed P(2, 28) and got the same result as DEK using a simple depth first search algorithm implemented with some clever bitwise operations. It ran for exactly 1 week (168 hours) on a pair of high-end consumer GPUs in my hobby computer.
The program is written in C++ and there are two versions, one that runs on CPU (using all cores), and one that runs on GPU (using all NVidia GPUs in the system). I am still working on a version that will run on both CPU and GPU for P(2, 31).
Running on 22-core Xeon E5-2699v4 processor, the CPU version took 91.6 hours to compute P(2,27). That's approximately the same performance as the GPU version using a single GPU. The algorithm code is identical in the two versions, and that's why it will be easy to make a version that runs on both CPU and GPU.
The new program is faster but would still take 18 weeks for P(2, 31). I think I'll spend some more time digging into the structure of the problem for insights regarding early pruning and symmetry.
I actually like this algorithm better -- it's simpler, solves both normal Langford and Planar Langford, and generalizes Planar Langford to higher than 2 dimensions (see comment at top of code file):
In the Mathematica chart below, I have plotted the runtimes of my previous program (the one that took precisely 1 week to compute P(2,28)) and extrapolated to P(2,31) showing that it would take ~39 weeks.
The new program has a very similar performance chart. For both algorithms, time complexity is O(3.38^n) -- with the same exponent base 3.38, just different constant factors. Remarkable, as the two algorithms search the space in very different ways.