It seems that everyone in the AI sector is currently honing their Reinforcement Learning (RL) skills, particularly in Q-learning, following the recent rumours about OpenAI’s new AI model, Q*, and I’m joining in too. However, rather than speculating about Q* or revisiting old papers and examples for Q-learning, I’ve decided to use my enthusiasm for board games to give an introduction to Q-learning 🤓
In this blog post, I’ll create a simple program from scratch to teach a model how to play Tic-Tac-Toe (TTT). I’ll refrain from using any RL libraries like Gym or Stable Baselines; everything is hand-coded in native Python, and the script is only about 100 lines long. If you’re curious about how to teach an AI to play games, keep reading.
You’ll find all of the code on GitHub at https://github.com/marshmellow77/tictactoe-q.
Teaching an AI to play Tic-Tac-Toe (TTT) might not seem all that important. However, it does provide a (hopefully) clear and understandable introduction to Q-learning and RL, which may be important in the field of Generative AI (GenAI), since there has been speculation that stand-alone GenAI models, such as GPT-4, are insufficient for significant advancements. The argument is that they are limited because they can only ever predict the next token and cannot truly reason. RL is believed to be able to address this issue and potentially enhance the responses from GenAI models.
But whether you’re aiming to brush up on your RL skills in anticipation of these developments, or you’re simply looking for an engaging introduction to Q-learning, this tutorial is designed for both scenarios 🤗
At its core, Q-learning is an algorithm that learns the value of an action in a particular state, and then uses this information to find the best action. Let’s consider the example of the Frozen Lake game, a popular single-player game used to demonstrate Q-learning.
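Before walking through that example, here is a minimal sketch of the core idea in plain Python. It is not the code from the repository: the names `q_table`, `best_action` and `update`, the state labels, and the values of `ALPHA` and `GAMMA` are all assumptions chosen for illustration. It shows the two halves of the description above: learning a value for each (state, action) pair using the standard Q-learning update rule, and then picking the action with the highest learned value.

```python
from collections import defaultdict

# Hypothetical hyperparameters, chosen only for illustration.
ALPHA = 0.5  # learning rate: how far each update moves the estimate
GAMMA = 0.9  # discount factor: how much future reward counts

# Q-table mapping (state, action) -> estimated value, defaulting to 0.0.
q_table = defaultdict(float)


def best_action(state, actions):
    """Return the action with the highest current Q-value in `state`."""
    return max(actions, key=lambda a: q_table[(state, a)])


def update(state, action, reward, next_state, next_actions):
    """Apply one standard Q-learning update for a single observed transition."""
    # Value of the best follow-up action; 0.0 when next_state is terminal
    # (i.e. there are no actions left to take).
    future = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * future
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])


# Toy transition (made-up states): moving "right" from "s0" ends the
# episode in "s1" with a reward of 1.0.
update("s0", "right", 1.0, "s1", [])
print(q_table[("s0", "right")])              # 0.5
print(best_action("s0", ["left", "right"]))  # right
```

After a single update the estimate for ("s0", "right") moves halfway from 0 towards the observed reward, which is already enough for `best_action` to prefer it over the untried "left". Repeating such updates over many episodes is what lets the values converge.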