Search results
Results From The WOW.Com Content Network
Deep reinforcement learning, a subfield of machine learning that is the basis of AlphaGo; Glossary of artificial intelligence; Go and mathematics; KataGo, the leading open-source Go program; Leela Zero, another open-source Go program; Matchbox Educable Noughts and Crosses Engine; Samuel's learning computer checkers (draughts); TD-Gammon ...
Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome. [10]
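As a rough illustration of learning purely from self-play, the sketch below trains a single value table by having it play both sides of a toy game. Everything in it is an assumption made for illustration: the game (one-pile Nim, where a move takes 1 to 3 stones and taking the last stone wins), the tabular update rule, and all names and parameters. It is not AlphaGo Zero's method, which the snippet above describes only in outline.

```python
import random

MAX_PILE = 10   # hypothetical toy game: Nim with a single pile
MAX_TAKE = 3    # a move removes 1..3 stones; taking the last stone wins

def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)

def best_move(q, pile):
    """Greedy move for the player to act, according to the value table."""
    return max(legal_moves(pile), key=lambda take: q[(pile, take)])

def self_play(episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # q[(pile, take)] estimates the outcome (+1 win, -1 loss) for the mover.
    q = {(p, t): 0.0 for p in range(1, MAX_PILE + 1) for t in legal_moves(p)}
    for _ in range(episodes):
        pile = rng.randint(1, MAX_PILE)
        while pile > 0:
            # Both "players" share one table: the policy plays against itself.
            if rng.random() < epsilon:
                take = rng.choice(list(legal_moves(pile)))  # explore
            else:
                take = best_move(q, pile)                   # exploit
            rest = pile - take
            if rest == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # The opponent moves next, so their best value is our loss.
                target = -q[(rest, best_move(q, rest))]
            q[(pile, take)] += alpha * (target - q[(pile, take)])
            pile = rest
    return q

q = self_play()
learned = {pile: best_move(q, pile) for pile in range(1, MAX_PILE + 1)}
```

In this toy game the known winning strategy is to leave the opponent a multiple of four stones, so `learned` can be checked against that rule for every pile size that is not itself a multiple of four.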
His recent work has focused on combining reinforcement learning with deep learning, including a program that learns to play Atari games directly from pixels. [12] Silver led the AlphaGo project, culminating in the first program to defeat a top professional player in the full-size game of Go. [13]
AlphaZero is a generic reinforcement learning algorithm – originally devised for the game of Go – that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules.
AlphaGo is a computer program developed by Google DeepMind to play the board game Go. AlphaGo's algorithm combines machine learning and tree search techniques with extensive training on both human and computer play. The system's neural networks were initially bootstrapped from human game-play expertise.
The tantalizing unmet goal of defeating the best human players without a handicap, long thought unreachable, brought a burst of renewed interest. The key insight proved to be an application of machine learning and deep learning. DeepMind, a Google acquisition dedicated to AI research, produced AlphaGo in 2015 and announced it to the world in 2016.
A new reinforcement learning algorithm incorporated lookahead search inside the training loop. [64] The AlphaGo Zero project involved around 15 people and millions in computing resources. [65] Ultimately, it needed much less computing power than AlphaGo, running on four specialized AI processors (Google TPUs) instead of AlphaGo's 48. [66]
Various techniques exist to train policies to solve tasks with deep reinforcement learning algorithms, each with its own benefits. At the highest level, there is a distinction between model-based and model-free reinforcement learning, which refers to whether the algorithm attempts to learn a forward model of the environment dynamics.
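The model-based versus model-free distinction can be made concrete with a small tabular sketch. The five-state chain environment, the function names, and the hyperparameters below are all assumptions chosen for illustration, and tables stand in for the neural networks that deep reinforcement learning would use. The model-free learner updates action values directly from sampled transitions, while the model-based learner first fits a forward model of the dynamics and then plans with it.

```python
N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor

def step(state, action):
    """True environment dynamics; the learners only see its samples."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def collect_transitions():
    """One sampled transition (s, a, r, s2, done) per state-action pair."""
    data = []
    for s in range(N_STATES - 1):
        for a in ACTIONS:
            s2, r, done = step(s, a)
            data.append((s, a, r, s2, done))
    return data

def model_free_q(data, alpha=0.5, sweeps=200):
    """Q-learning: nudge values toward sampled targets; no model is built."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, done in data:
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
    return q

def model_based_q(data, iters=50):
    """Fit a forward model from the samples, then plan by value iteration."""
    model = {(s, a): (r, s2, done) for s, a, r, s2, done in data}
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(iters):
        for (s, a), (r, s2, done) in model.items():
            q[s][a] = r if done else r + GAMMA * max(q[s2])
    return q

def greedy(q):
    """Greedy action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

data = collect_transitions()
policy_mf = greedy(model_free_q(data))
policy_mb = greedy(model_based_q(data))
```

On this deterministic chain both learners end up preferring to move right in every state; the difference lies in whether an explicit `model` of the dynamics is learned along the way.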