AlphaZero

AlphaZero is a computer program developed by DeepMind, the AI research company owned by Alphabet, which uses an approach similar to AlphaGo Zero's to master not just Go but also chess and shogi. On December 5, 2017, the DeepMind team released a preprint introducing AlphaZero, which within 24 hours of training reached a superhuman level of play in all three games by defeating the world-champion programs Stockfish and elmo and the 3-day version of AlphaGo Zero, in each case making use of custom tensor processing units (TPUs) that the Google programs were optimized to use. AlphaZero was trained solely through self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After just four hours of training, DeepMind estimated that AlphaZero was playing at a higher Elo rating than Stockfish 8; after nine hours of training, the algorithm decisively defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses and 72 draws). The trained algorithm played on a single machine with four TPUs.

Relation to AlphaGo Zero

AlphaZero (AZ) is a more general variant of the AlphaGo Zero algorithm (AGZ), and is capable of playing shogi and chess as well as Go. The differences between AZ and AGZ include:

AZ has hard-coded rules for setting search hyperparameters.
The neural network is now updated continually.
Go (unlike chess) is symmetric under certain reflections and rotations; AlphaGo Zero was programmed to take advantage of these symmetries. AlphaZero is not.
Unlike Go, chess can end in a draw; AlphaZero can therefore take into account the possibility of a drawn game.
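
The symmetry point can be illustrated with a minimal sketch. It assumes a square board stored as a list of rows; the function names and the 3x3 stand-in position are hypothetical and chosen only for illustration, not taken from DeepMind's code. A Go position has eight equivalent orientations (four rotations, each with an optional mirror), which a Go-specific program can exploit; the same trick is not valid in chess, where pawn direction and castling break the symmetry.

```python
def rotate90(board):
    """Rotate a square board (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror a board left to right."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(reflect(current))
        current = rotate90(current)
    return result


# Tiny 3x3 stand-in for a 19x19 Go position: 0 empty, 1 black, 2 white.
position = [
    [1, 2, 0],
    [0, 0, 0],
    [0, 0, 0],
]
variants = symmetries(position)
distinct = {tuple(map(tuple, v)) for v in variants}
print(len(variants), len(distinct))
```

For this asymmetric position all eight orientations are distinct, so a symmetry-aware program can treat one evaluated position as eight.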

AlphaZero vs. Stockfish and elmo

In its Monte Carlo tree search, AlphaZero examines just 80,000 positions per second in chess and 40,000 in shogi, compared with 70 million for Stockfish and 35 million for elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations.
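
How a network can focus a search is easiest to see in code. The sketch below is a PUCT-style selection rule of the kind published for the AlphaGo Zero family, not DeepMind's implementation: each candidate move is scored by its average value so far plus an exploration bonus that grows with the network's prior probability and shrinks as the move is visited. The constant `c_puct=1.5`, the move names and all statistics are assumptions made up for the example.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Mean value plus an exploration bonus weighted by the network prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration


def select_move(children, c_puct=1.5):
    """Pick the move with the highest PUCT score.

    children maps move -> dict with keys "prior", "visits", "value_sum".
    """
    parent_visits = sum(c["visits"] for c in children.values())
    best_move, best_score = None, -math.inf
    for move, c in children.items():
        # Unvisited moves have no average value yet; treat it as 0.
        q = c["value_sum"] / c["visits"] if c["visits"] else 0.0
        score = puct_score(q, c["prior"], parent_visits, c["visits"], c_puct)
        if score > best_score:
            best_move, best_score = move, score
    return best_move


# Hypothetical statistics at one node of the search tree.
children = {
    "e4": {"prior": 0.60, "visits": 10, "value_sum": 5.0},
    "d4": {"prior": 0.35, "visits": 2, "value_sum": 1.2},
    "a3": {"prior": 0.05, "visits": 0, "value_sum": 0.0},
}
chosen = select_move(children)
print(chosen)
```

With these numbers the search next explores d4: it has a good average value and a sizeable prior but few visits, while the low-prior a3 receives almost no attention. Spending simulations this way is what lets a search of 80,000 positions per second compete with one that is roughly 875 times faster (70 million divided by 80,000).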

Training

AlphaZero was trained solely through self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, elmo, or AlphaGo Zero) in brief games at one second per move to determine how well the training was progressing. DeepMind judged that AlphaZero's performance exceeded the benchmark after around four hours of training for Stockfish, two hours for elmo, and eight hours for AlphaGo Zero.
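
As a rough illustration of what self-play hands to the training process, the sketch below labels every position of one finished game with its final outcome, seen from the side to move. This is a simplified assumption about the data format, not DeepMind's pipeline; `label_game`, the state strings and the policy dictionaries are hypothetical. Draws are encoded as 0, which is how a drawn chess or shogi game can enter the training signal.

```python
def label_game(positions, result_for_first_player):
    """Turn one finished self-play game into (state, policy, outcome) examples.

    positions: list of (state, search_policy) pairs in move order; the first
               player is to move in positions[0] and the players alternate.
    result_for_first_player: +1 for a win, 0 for a draw, -1 for a loss.
    """
    examples = []
    for ply, (state, policy) in enumerate(positions):
        # Flip the sign on odd plies so the outcome is from the mover's view.
        if ply % 2 == 0:
            outcome = result_for_first_player
        else:
            outcome = -result_for_first_player
        examples.append((state, policy, outcome))
    return examples


# Hypothetical three-ply game; states and policies are placeholders.
game = [
    ("start", {"e4": 0.7, "d4": 0.3}),
    ("after e4", {"e5": 0.6, "c5": 0.4}),
    ("after e4 e5", {"Nf3": 0.9, "Bc4": 0.1}),
]
won = [outcome for _, _, outcome in label_game(game, 1)]
drawn = [outcome for _, _, outcome in label_game(game, 0)]
print(won, drawn)
```

In the arrangement described above, the first-generation TPUs would produce games like these while the second-generation TPUs consume the labelled examples to update the network.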

Results

Chess

In AlphaZero's chess tournament against Stockfish 8 (the 2016 TCEC world champion), each program was given one minute of thinking time per move. Stockfish was allocated 64 threads and a hash size of 1 GB, a setting that Stockfish's Tord Romstad later criticized as suboptimal. AlphaZero was trained on chess for a total of nine hours before the tournament. During the tournament, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72. In a series of twelve 100-game matches (with unspecified time or resource constraints) against Stockfish, starting from the 12 most popular human openings, AlphaZero won 290, drew 886 and lost 24.

Shogi

AlphaZero was trained on shogi for a total of twelve hours before the tournament. In 100 shogi games against elmo (the World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73 search), AlphaZero won 90 times, lost 8 times and drew twice. As in the chess games, each program was given one minute per move, and elmo was given 64 threads and a hash size of 1 GB.

Go

After 34 hours of self-play training on Go, AlphaZero played against AlphaGo Zero, winning 60 games and losing 40.

Analysis

In its preprint, DeepMind described chess as the pinnacle of AI research over several decades. It noted that state-of-the-art programs are based on powerful engines that search many millions of positions, drawing on handcrafted domain expertise and sophisticated domain adaptations, whereas AlphaZero is a generic reinforcement learning algorithm, originally devised for Go, that achieved superior results within a few hours while searching a thousand times fewer positions and given no domain knowledge except the rules. DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's playing style "alien": it sometimes wins by offering counterintuitive sacrifices, such as giving up a queen and a bishop to exploit a positional advantage. "It's like chess from another dimension."

Given the difficulty in chess of forcing a win against a strong opponent, the result of 28 wins, 72 draws and 0 losses is a significant margin of victory. However, some grandmasters, such as Hikaru Nakamura and Komodo developer Larry Kaufman, downplayed AlphaZero's victory, arguing that the match would have been closer if the programs had had access to an opening database (since Stockfish was optimized for that scenario). Romstad additionally pointed out that Stockfish is not optimized for rigidly fixed time per move and that the version used was already a year old.
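
To put the margin in rating terms, the sketch below converts a win/draw/loss record into the rating gap implied by the standard logistic Elo model. It is an illustrative calculation, not a figure reported by DeepMind; the function names are hypothetical, and the model ignores colour, opening choice and the statistical uncertainty of a 100-game sample.

```python
import math


def score_fraction(wins, draws, losses):
    """Points scored per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def elo_difference(score):
    """Rating gap implied by an expected score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


main_match = score_fraction(28, 72, 0)          # 100-game match
opening_series = score_fraction(290, 886, 24)   # twelve 100-game matches

print(round(main_match, 3), round(elo_difference(main_match)))
print(round(opening_series, 3), round(elo_difference(opening_series)))
```

The 100-game match corresponds to a score of 64% and a gap of about 100 Elo points; the series from fixed openings corresponds to about 61% and roughly 78 points. Both are clear margins, but they depend on the match conditions that the critics dispute.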

Similarly, some shogi observers argued that elmo's hash size was too low, that its resignation settings and "EnteringKingRule" settings (cf. shogi § Entering King) may have been inappropriate, and that elmo was already outdated compared with newer programs.

Reactions and Criticism

Newspapers headlined that the chess training took only four hours: "It was managed in little more than the time between breakfast and lunch." Wired hyped AlphaZero as "the first multi-skilled AI board-game champ". AI expert Joanna Bryson noted that Google's knack for good publicity was putting it in a strong position against challengers. "It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector."

Human chess grandmasters were generally impressed by AlphaZero. Danish grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species. Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding. Former champion Garry Kasparov said, "It's a remarkable achievement, even if we should have expected it after AlphaGo."

Grandmaster Hikaru Nakamura was less impressed, and stated, "I don't necessarily put a lot of credibility in the results simply because my understanding is that AlphaZero is basically using the Google supercomputer and Stockfish doesn't run on that hardware; Stockfish was basically running on what would be my laptop. If you wanna have a match that's comparable you have to have Stockfish running on a supercomputer as well."

External links

Chess.com YouTube playlist for AlphaZero vs. Stockfish

Source of the article : Wikipedia

Monday, 11 June 2018