ThatSnail: C++ : Symbolic Regression

Saturday, May 5, 2012

C++ : Symbolic Regression - Part 1

I'm not dead. Just working on something that's taking an awfully long time. That's my excuse for my recent post-truancy.

'Wired: Computer Self Discovers Laws of Physics'

See that article up there? Click on it. No? Fine, here's what it says in a nutshell. Some people over at Cornell University stuck a bunch of LED lights onto a chaotic pendulum, hooked an LED detector up to a computer, and fed the data through some fancy algorithms based on symbolic regression that worked out the pendulum's laws of motion from scratch.

Sounds complicated? It's not. It's actually extremely simple. Here's the generalized workflow:

1. Feed some data into the computer, like the positions of the LED lights. These are your state variables.
2. Use some fancy maths to estimate the partial derivatives between pairs of state variables from the data. A partial derivative tells you how one variable changes as another changes, which is what ties the variables together.
3. Generate thousands of equations randomly, and evolve them with a genetic algorithm (value mutation, crossover, etc.). The equations that match the data the closest (via plugging in values) get the highest priority when breeding the next generation. This is the symbolic regression step.
4. Repeat step 3 a hundred thousand gazillion times and, given enough iterations, you should converge on an equation that fits the data.
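To make the derivative step a bit less hand-wavy, here's a minimal sketch of one way to estimate how one state variable changes with another straight from recorded samples. It assumes the samples are evenly spaced in time, and it uses central differences, which is my choice for illustration and not necessarily what the Cornell code does. The names time_derivative and paired_derivative are made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Central-difference estimate of d(series)/dt at interior sample i.
// Assumption: samples are evenly spaced, dt apart.
double time_derivative(const std::vector<double>& series, std::size_t i, double dt) {
    return (series[i + 1] - series[i - 1]) / (2.0 * dt);
}

// Estimate dy/dx at sample i as (dy/dt) / (dx/dt).
// Returns false (and leaves out untouched) when i is not an interior
// sample, the series differ in length, or x is not changing at i,
// because the ratio is undefined there.
bool paired_derivative(const std::vector<double>& x,
                       const std::vector<double>& y,
                       std::size_t i, double dt, double& out) {
    if (x.size() != y.size() || i == 0 || i + 1 >= x.size()) return false;
    const double dxdt = time_derivative(x, i, dt);
    if (std::fabs(dxdt) < 1e-12) return false;
    out = time_derivative(y, i, dt) / dxdt;
    return true;
}
```

Real sensor data is noisy, so in practice you would probably want to smooth the samples before differencing them; this sketch skips that.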

That's all. It's very simple.

Programming Symbolic Regression

The main programming obstacles here are as follows:

Building equations in program-my form. There's a predetermined set of operators made into functions and stored as function pointers (add(a,b), sub(a,b), mul(a,b), div(a,b), sin(a), cos(a), among others).
Solving partial derivatives symbolically. I have no idea how to do this yet but I'm sure it's not that hard?
Optimizing it so it doesn't take a hundred thousand years to calculate something nice.
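For the first obstacle, here's a rough sketch of what "equations in program-my form" could look like: a little expression tree whose nodes hold function pointers to the operators, plus a function that plugs values in and scores the result against data. Everything here (Node, evaluate, mean_abs_error, the op namespace) is hypothetical naming for illustration, not finished project code. The operators sit in a namespace so they don't collide with the standard library's own div, sin and cos, and division by zero returning 1 is an assumption borrowed from the usual genetic-programming "protected division" trick.

```cpp
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace op {
double add(double a, double b) { return a + b; }
double sub(double a, double b) { return a - b; }
double mul(double a, double b) { return a * b; }
// Assumption: "protected" division, so random equations never blow up.
double div(double a, double b) { return b == 0.0 ? 1.0 : a / b; }
double sin(double a) { return std::sin(a); }
double cos(double a) { return std::cos(a); }
}  // namespace op

using UnaryFn = double (*)(double);
using BinaryFn = double (*)(double, double);

struct Node;
using NodePtr = std::unique_ptr<Node>;

struct Node {
    enum class Kind { Constant, Variable, Unary, Binary };
    Kind kind = Kind::Constant;
    double value = 0.0;         // Constant: the number itself
    std::size_t index = 0;      // Variable: which state variable to read
    UnaryFn unary = nullptr;    // Unary: operator applied to left
    BinaryFn binary = nullptr;  // Binary: operator applied to left, right
    NodePtr left, right;
};

NodePtr constant(double v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Constant;
    n->value = v;
    return n;
}

NodePtr variable(std::size_t i) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Variable;
    n->index = i;
    return n;
}

NodePtr unary(UnaryFn f, NodePtr a) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Unary;
    n->unary = f;
    n->left = std::move(a);
    return n;
}

NodePtr binary(BinaryFn f, NodePtr a, NodePtr b) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Binary;
    n->binary = f;
    n->left = std::move(a);
    n->right = std::move(b);
    return n;
}

// Plug the state variables into the equation and compute its value.
double evaluate(const Node& n, const std::vector<double>& vars) {
    switch (n.kind) {
        case Node::Kind::Constant: return n.value;
        case Node::Kind::Variable: return vars[n.index];
        case Node::Kind::Unary:    return n.unary(evaluate(*n.left, vars));
        case Node::Kind::Binary:
            return n.binary(evaluate(*n.left, vars), evaluate(*n.right, vars));
    }
    return 0.0;
}

struct Sample {
    std::vector<double> vars;  // state variable values
    double expected;           // value the equation should produce
};

// Lower is better: average distance between the equation and the data.
double mean_abs_error(const Node& eq, const std::vector<Sample>& data) {
    if (data.empty()) return 0.0;
    double total = 0.0;
    for (const Sample& s : data) total += std::fabs(evaluate(eq, s.vars) - s.expected);
    return total / static_cast<double>(data.size());
}

// Example equation: x0 * x0 + sin(x0)
NodePtr example_equation() {
    return binary(op::add,
                  binary(op::mul, variable(0), variable(0)),
                  unary(op::sin, variable(0)));
}
```

A tree like this is handy for the genetic part too, since mutation can swap out a node's function pointer or constant and crossover can trade whole subtrees between two equations.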

That's the gist of it. If I make any progress I'll mention it on the blog.

If you're interested in doing your own research, here's the material for the Cornell project. It's a little intimidating to read but isn't all that bad; most of the unfamiliar terms can be figured out from context.

Overview: Distilling Free-Form Natural Laws from Experimental Data
Supporting Online Material