DeepCubeA uses a neural network (which apes how the human mind processes information) along with machine learning techniques, in which an AI system learns by detecting patterns and theorizing with little human input. It adopts a reinforcement learning approach, by which it learned “how to solve increasingly difficult states in reverse from the goal state without any specific domain knowledge.”
Recently, Approximate Policy Iteration (API) algorithms have achieved super- human proficiency in two-player zero-sum games such as Go, Chess, and Shogi without human data. These API algorithms iterate between two policies: a slow policy (tree search), and a fast policy (a neural network). In these two-player games, a reward is always received at the end of the game. However, the Rubik’s Cube has only a single solved state, and episodes are not guaranteed to terminate. This poses a major problem for these API algorithms since they rely on the reward received at the end of the game. We introduce Autodidactic Iteration: an API algorithm that overcomes the problem of sparse rewards by training on a distribution of states that allows the reward to propagate from the goal state to states farther away. Autodi- dactic Iteration is able to learn how to solve the Rubik’s Cube without relying on human data. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves — less than or equal to solvers that employ human domain knowledge.