A research team from the University of California, Irvine (UC Irvine) created an artificial intelligence that taught itself to solve a Rubik’s Cube in 30 moves.
The team comprised PhD candidates Stephen McAleer, Forest Agostinelli and Alexander Shmakov, along with UC Irvine computer science professor Pierre Baldi.
Their paper, which outlines the process behind creating and training the DeepCube AI, can be read online through Cornell University's arXiv preprint server.
The problem with reinforcement learning
The team’s first challenge was teaching DeepCube how to learn.
Reinforcement learning is often the best approach for games like chess or Go, which have a definitive way to tell whether a move is 'good' or 'bad.' That clarity makes learning easier for the AI: it can learn what constitutes a 'good' move.
However, Rubik’s Cubes don’t have good or bad moves in the conventional sense. In a complex 3D puzzle like the Rubik’s Cube, it’s hard to tell whether a single move has made an improvement to the overall state of the puzzle.
The other challenge is scale. A 3x3x3 cube has a total 'state space' (the number of distinct configurations the cube can be in) of 43,252,003,274,489,856,000, roughly 43 quintillion. Only one of those states matters: the one where every side is a single colour.
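The 43 quintillion figure follows from the standard combinatorics of the 3x3x3 cube (this derivation is not in the article, but it is the well-known counting argument):

```python
from math import factorial

# Standard counting argument for the 3x3x3 cube's state space:
#  - 8 corner pieces can be permuted (8!) and twisted (3^7: the last
#    corner's orientation is forced by the others),
#  - 12 edge pieces can be permuted (12!/2: edge permutation parity
#    must match corner parity) and flipped (2^11: the last flip is forced).
states = factorial(8) * 3**7 * (factorial(12) // 2) * 2**11

print(f"{states:,}")  # 43,252,003,274,489,856,000
```

Multiplying the independent choices together reproduces the figure quoted above exactly.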
A few years ago it was shown that any scramble can be solved in at most 26 moves (counting quarter turns). The difficulty arises, however, when trying to teach an AI to solve the puzzle without having to feed 43 quintillion data points into an algorithm.
Learning to reinforce itself
To overcome the limitations of reinforcement learning, the team created and utilized an AI technique known as Autodidactic Iteration.
Autodidactic Iteration allows an AI to create its own reward system, so that it can then use reinforcement learning.
DeepCube is able to learn the strength of its moves without outside help, relying only on changes to the Rubik's Cube itself for information.
Whenever the AI considers a move, it jumps ahead to the completed state and works its way back to the proposed move. DeepCube is therefore able to evaluate the strength of that move.
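The core of this idea can be sketched on a toy puzzle. The real DeepCube trains a neural network on Rubik's Cube states; in this simplified, hypothetical stand-in, states are positions on a ring, a lookup table plays the role of the network, and the training states are generated by scrambling backward from the solved state, which is Autodidactic Iteration's trick for producing its own reward signal:

```python
import random

# Toy sketch of Autodidactic Iteration (ADI) on a trivial puzzle:
# states are positions 0..N-1 on a ring, moves step +1 or -1, and
# position 0 counts as "solved". A plain dict stands in for the
# value network that DeepCube actually trains.

N = 10
MOVES = (+1, -1)

def step(state, move):
    return (state + move) % N

def reward(state):
    return 1.0 if state == 0 else -1.0  # only the solved state pays off

value = {s: 0.0 for s in range(N)}  # stand-in for the value network

random.seed(0)
for _ in range(2000):
    # Generate training states by scrambling *backward* from the solved
    # state, so every state seen has a known, gradually increasing depth.
    state = 0
    for _ in range(random.randint(1, N)):
        state = step(state, random.choice(MOVES))
        # Target = best one-step lookahead: each child's reward plus the
        # current value estimate of that child (the self-generated signal).
        value[state] = max(reward(step(state, m)) + value[step(state, m)]
                           for m in MOVES)

# After training, states nearer the solved state score higher.
print(value[1] > value[5])
```

The key point is that no human-labelled 'good move' data ever enters the loop: the only external signal is whether a state is solved, exactly as the article describes.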
Once the AI has enough information about its current position, it uses a traditional tree search method to examine each possible move to determine the best one. It repeats this until it finishes the cube.
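For the search step, DeepCube itself uses a Monte Carlo Tree Search guided by its trained network; a minimal way to convey the same idea is a best-first search in which a value estimate decides which state to expand next. Everything here is an illustrative assumption: the ring puzzle, the hand-written heuristic standing in for the learned network, and the function names.

```python
import heapq

# Best-first search guided by a value estimate. Positions on a ring of
# size N stand in for cube states, with 0 as the solved state, and a
# hand-written heuristic stands in for DeepCube's trained network.

N = 12
MOVES = (+1, -1)

def children(state):
    return [((state + m) % N, m) for m in MOVES]

def value_estimate(state):
    # Stand-in for the value network: states closer to solved score higher.
    return -min(state, N - state)

def solve(start):
    # Always expand the most promising frontier state first, carrying the
    # move sequence that reached it; stop when the solved state appears.
    frontier = [(-value_estimate(start), start, [])]
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == 0:
            return path
        for child, move in children(state):
            if child not in seen:
                seen.add(child)
                heapq.heappush(
                    frontier,
                    (-value_estimate(child), child, path + [move]))
    return None

print(solve(3))  # [-1, -1, -1]
```

With a good value estimate, the search homes in on the goal without exhaustively enumerating the state space, which is what makes the 43-quintillion-state cube tractable.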
Beyond just learning how to solve a Rubik’s Cube, DeepCube picked up a lot of knowledge about the cube along the way.
For example, DeepCube learned a strategy used by 'speedcubers' — people who try to solve Rubik's Cubes as quickly as possible. The technique involves matching up corner and edge cubelets before placing them in the correct position.
It took DeepCube only 44 hours to learn how to completely solve any standard, randomly scrambled Rubik's Cube, averaging 30 moves per solve. That puts DeepCube on par with speedcubers who draw on accumulated human knowledge.
The researchers now plan to test the Autodidactic Iteration technique by teaching DeepCube to solve harder 16-sided cubes.
The research could also be used to solve real-world problems. Some applications include predicting the 3D shape of proteins. Instead of figuring out the next move of a Rubik’s Cube, the system could determine the proper sequence of amino acids along a 3D lattice.