AI learns to solve a Rubik’s Cube in 1.2 seconds

DeepCubeA uses a neural network (which apes how the human mind processes information) along with machine learning techniques, in which an AI system learns by detecting patterns and theorizing with little human input. It adopts a reinforcement learning approach, by which it learned “how to solve increasingly difficult states in reverse from the goal state without any specific domain knowledge.”

Endgadget

ABSTRACT

 

Recently, Approximate Policy Iteration (API) algorithms have achieved super- human proficiency in two-player zero-sum games such as Go, Chess, and Shogi without human data. These API algorithms iterate between two policies: a slow policy (tree search), and a fast policy (a neural network). In these two-player games, a reward is always received at the end of the game. However, the Rubik’s Cube has only a single solved state, and episodes are not guaranteed to terminate. This poses a major problem for these API algorithms since they rely on the reward received at the end of the game. We introduce Autodidactic Iteration: an API algorithm that overcomes the problem of sparse rewards by training on a distribution of states that allows the reward to propagate from the goal state to states farther away. Autodi- dactic Iteration is able to learn how to solve the Rubik’s Cube without relying on human data. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves — less than or equal to solvers that employ human domain knowledge.

 

SOLVING THE RUBIK’S CUBE WITH APPROXIMATE POLICY ITERATION

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s