Wednesday, September 19, 2012

C++ : Symbolic Regression - Part 2

Hey, remember that post I made forever ago about doing symbolic regression via genetic programming?  You probably thought I completely gave up on that, right?

Well I have a surprise for you, sillies.  I did.

But then with this new Auto-Generated Music project, I rewrote it from scratch and it actually works.


(Note: The final equation is written in Reverse Polish Notation, which is just a whole lot easier to work with).
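If you've never dealt with RPN, evaluating it is just a stack: push values, and when you hit an operator, pop its operands.  A minimal sketch of the idea (not the actual project code; evalRPN is just my name for it here):

    #include <sstream>
    #include <stack>
    #include <string>

    // Evaluate a space-separated RPN string like "x x * 1 +" at a given x.
    double evalRPN(const std::string& expr, double x)
    {
        std::stack<double> s;
        std::istringstream in(expr);
        std::string tok;
        while (in >> tok)
        {
            if (tok == "x") { s.push(x); continue; }
            if (tok == "+" || tok == "-" || tok == "*" || tok == "/")
            {
                double b = s.top(); s.pop();  // note the order: right operand pops first
                double a = s.top(); s.pop();
                if      (tok == "+") s.push(a + b);
                else if (tok == "-") s.push(a - b);
                else if (tok == "*") s.push(a * b);
                else                 s.push(a / b);
            }
            else s.push(std::stod(tok));  // plain number
        }
        return s.top();
    }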

But we're not done yet.  For one, the equations tend to converge to local maxima.  In simplespeak: they mutate toward an optimal state, but occasionally the best equation actually has a radically different structure.  To get there, an equation has to mutate through states that score worse before it can settle into the best possible form.

Another thing you might notice is that it's unreadable as f-ck.  Part of that is the reverse Polish, but mostly it's redundancy in the equations themselves.  The screenshot above doesn't show it, but you'll often get large stretches of just adding 0 or multiplying by 1.  Those operations are completely unnecessary and bog everything down, so ideally we'd simplify equations before storing or processing them.

A possible solution is to hardcode the redundant node patterns, which works for very basic redundancies but not much else.  It's easy to handle, say, 0 + x = x, or 1 * x = x, and call it a day.  But what about sqr(sqrt(x)) = x?  Or i + i + x = x + 2i?  It's not trivial to cover every possible redundancy (although you'd certainly catch a good chunk of them).
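Those basic cases boil down to pattern-matching on the tree.  Something like this sketch, assuming a simple binary node struct (which isn't necessarily how my tree is actually laid out):

    #include <memory>
    #include <utility>

    enum class Op { Const, Var, Add, Mul };

    struct Node
    {
        Op op;
        double value;                       // used when op == Op::Const
        std::unique_ptr<Node> left, right;  // used for Add / Mul
    };

    static bool isConst(const Node* n, double v)
    {
        return n && n->op == Op::Const && n->value == v;
    }

    // Apply the obvious identities: 0 + x -> x, x + 0 -> x,
    // 1 * x -> x, x * 1 -> x, and 0 * x -> 0.  Anything fancier,
    // like sqr(sqrt(x)) -> x, needs its own rule.
    std::unique_ptr<Node> simplify(std::unique_ptr<Node> n)
    {
        if (!n || (n->op != Op::Add && n->op != Op::Mul))
            return n;
        n->left  = simplify(std::move(n->left));
        n->right = simplify(std::move(n->right));

        if (n->op == Op::Add)
        {
            if (isConst(n->left.get(), 0.0))  return std::move(n->right);
            if (isConst(n->right.get(), 0.0)) return std::move(n->left);
        }
        else // Op::Mul
        {
            if (isConst(n->left.get(), 1.0))  return std::move(n->right);
            if (isConst(n->right.get(), 1.0)) return std::move(n->left);
            if (isConst(n->left.get(), 0.0) || isConst(n->right.get(), 0.0))
            {
                std::unique_ptr<Node> zero(new Node());
                zero->op = Op::Const;
                zero->value = 0.0;
                return zero;
            }
        }
        return n;
    }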

Another option is to do it dynamically, with a second genetic algorithm underneath that takes the complexity of the equation into account.  Say you have x + x + 0 * 1 + x * 1.  Use that as the grading criterion, and mutate constantly for a set amount of time.  Then, the equation that matches the initial one exactly (graded by evaluating both over a range of sample inputs, since you can't literally test all values) with the fewest nodes replaces the initial.  The problem here is that it's computationally expensive and also not guaranteed to work.
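The grading function for that inner algorithm would look roughly like this (a sketch; evalRPN is the evaluator from above, and countNodes is a stand-in for however your representation counts nodes):

    #include <cmath>
    #include <string>

    double evalRPN(const std::string& expr, double x);  // from the sketch above
    int countNodes(const std::string& expr);            // stand-in: e.g. token count

    // A candidate only qualifies if it matches the original equation
    // (to within epsilon) at every sample point; among those that do,
    // fewer nodes means a better grade.
    double gradeCandidate(const std::string& original,
                          const std::string& candidate)
    {
        const double eps = 1e-9;
        for (int i = -40; i <= 40; ++i)  // sample inputs; can't test *all* values
        {
            double x = i * 0.25;
            if (std::fabs(evalRPN(original, x) - evalRPN(candidate, x)) > eps)
                return -1.0;             // doesn't match the original: reject
        }
        return 1.0 / countNodes(candidate);  // shorter equation, higher grade
    }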

Something to look into is the MOSES algorithm (Meta-Optimizing Semantic Evolutionary Search).  Essentially, equations carry a set of parameters called knobs, which are useful for tweaking the equation during mutations (think constants, or what-have-you).
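I haven't actually implemented any of that, but the knob idea on its own is easy to picture: pull the constants out of the tree into a parameter vector and let mutation nudge those instead of hacking at the structure.  A loose sketch of just that piece (my own naming, not MOSES's actual API):

    #include <cstdlib>
    #include <string>
    #include <vector>

    // An equation template with its constants exposed as knobs:
    // "k0 x * k1 +" with knobs {2.0, 1.0} means 2x + 1 in RPN.
    struct KnobbedEquation
    {
        std::string structure;      // RPN with k0, k1, ... placeholders
        std::vector<double> knobs;  // current settings
    };

    // One mutation step: nudge a random knob, leaving the tree alone.
    void tweakKnob(KnobbedEquation& eq)
    {
        if (eq.knobs.empty()) return;
        std::size_t i = std::rand() % eq.knobs.size();
        eq.knobs[i] += std::rand() / (double)RAND_MAX - 0.5;  // nudge in [-0.5, 0.5)
    }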

But uh.  Bedtime for me.  'Night.
