Deep neural networks are known to learn opaque, uninterpretable representations that lie beyond human comprehension. From both scientific and practical points of view, it is therefore exciting to explore what is actually learned, and how, in the case of superhuman self-taught neural network agents such as AlphaZero.
In the new paper Acquisition of Chess Knowledge in AlphaZero, researchers from DeepMind and Google Brain, together with former world chess champion Vladimir Kramnik, explore how and to what extent human knowledge is acquired by AlphaZero, and how chess concepts are represented in its network. They do this through extensive concept probing, behavioral analysis, and examination of AlphaZero's activations.
The team aims their study at an improved understanding of:
- Encoding of human knowledge.
- Acquisition of knowledge during training.
- Reinterpretation of the value function via the encoded chess concepts.
- Comparison of AlphaZero's evolution with human history.
- Development of AlphaZero's preferences regarding candidate moves.
- Proof of concept for unsupervised concept discovery.
The researchers ground their study in the idea that if the representations of strong neural networks like AlphaZero do not resemble human concepts, our ability to generate faithful explanations of their decisions will be limited, ultimately limiting what we can achieve with neural network interpretability.
The team probes for human concepts in network activations over a large dataset of inputs, examining each concept at each block and across many checkpoints of AlphaZero's self-play training process. This enables them to build a picture of what is learned, when it is learned during training, and where in the network it is represented.
The team examines how chess knowledge is gradually acquired and represented using a sparse linear probing methodology to identify how AlphaZero represents a wide range of human chess concepts. They visualize this acquisition of conceptual knowledge with "what-when-where plots," which illustrate what concept is learned, when in training, and where in the network.
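The general idea behind a sparse linear probe can be sketched as follows: fit an L1-regularized linear model that predicts a human concept's value from a layer's activations, so that a concept is considered "present" if it is linearly decodable from a small subset of activation dimensions. This is only an illustrative sketch with synthetic data; the array names (`activations`, `concept_values`) are hypothetical and not taken from the paper.

```python
# Minimal sketch of sparse linear concept probing (illustrative only).
# Assumptions: `activations` is an (n_positions, d) array of flattened
# activations from one network block, and `concept_values` gives a human
# concept's value (e.g. material balance) per position. Here both are
# synthetic stand-ins.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_positions, d = 500, 64
activations = rng.normal(size=(n_positions, d))
# Synthetic concept depending sparsely on a few activation dimensions.
concept_values = activations[:, 0] - 2.0 * activations[:, 3]

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept_values, random_state=0)

# The L1 penalty encourages a sparse probe: the concept should be
# recoverable from a small subset of activation dimensions.
probe = Lasso(alpha=0.01).fit(X_train, y_train)
score = probe.score(X_test, y_test)  # R^2 on held-out positions
print(f"probe R^2: {score:.3f}, nonzero weights: {np.sum(probe.coef_ != 0)}")
```

Repeating such a fit for every concept, block, and training checkpoint is what would populate a "what-when-where plot": a high held-out probe score at a given block and checkpoint indicates the concept is represented there at that stage of training.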
After examining how internal representations change over time, the team examines how these shifting representations give rise to changed behavior, by measuring changes in move probabilities on a curated set of chess positions, and by comparing the progression of move choices during self-play training with their historical development in top-level human play.
Finally, having established that AlphaZero's activations can be used to predict human concepts, the team examines these activations directly, using non-negative matrix factorization (NMF) to decompose AlphaZero's representations into a number of factors and obtain a complementary view of what the AlphaZero network computes.
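The NMF step can be sketched as factorizing a non-negative activation matrix into per-position loadings and a small set of factors over activation dimensions. A minimal sketch with synthetic data, assuming activations are non-negative (e.g. post-ReLU outputs of one block); the variable names and factor count are hypothetical, not from the paper:

```python
# Minimal sketch of decomposing network activations with non-negative
# matrix factorization (NMF), as a complementary, unsupervised view.
# `activations` here is a random non-negative stand-in for real
# (n_positions, d) activations from one network block.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_positions, d, n_factors = 200, 64, 8
activations = rng.random((n_positions, d))

# Factorize activations ≈ W @ H: each row of H is a "factor" over
# activation dimensions; each row of W gives a position's loading on
# those factors.
model = NMF(n_components=n_factors, init="nndsvda",
            random_state=0, max_iter=500)
W = model.fit_transform(activations)
H = model.components_

print(W.shape, H.shape)  # (200, 8) (8, 64)
```

Unlike the supervised probes, this requires no human concept labels: inspecting which positions load heavily on each factor offers an unsupervised view of what the network computes.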
The team's study of the progression of AlphaZero's neural network from initialization to the end of training yields the following insights: 1) Many human concepts can be found in the AlphaZero network; 2) A detailed picture of knowledge acquisition during training emerges via the "what-when-where plots"; 3) The use of concepts and their relative value develops over time: AlphaZero initially focuses primarily on material, while more complex and subtle concepts emerge as important predictors of the value function only relatively late in training; 4) Comparison with historical human play reveals remarkable differences, but also striking similarities, between the evolution of human play and the development of AlphaZero's self-play policy.
The paper Acquisition of Chess Knowledge in AlphaZero is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
We know you don't want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.