By Daneil Timar, Jul 8th 2020
One of the most enjoyable days I've had so far at TotallyMoney was this year's Big Hack, when everyone is given 24 hours to build something that would be difficult to allocate time to in a 'business as usual' setting. Any idea that solves a real problem is welcome, regardless of the technology applied (if any), as long as it is not related to the day-to-day work at TM.
For this year's Big Hack, I decided to experiment with an area I've long found exciting, Reinforcement Learning (RL), and build a simple tic-tac-toe* machine that gets more and more skilful with experience. As tic-tac-toe is a very simple game with a relatively small number of potential board states (depending on the size of the board, naturally), training an AI that achieves super-human performance is feasible, even with the computational capacity of an ordinary laptop.
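To get a feel for how small the state space is, here is a minimal sketch that counts the positions reachable on a standard 3x3 board. It is not the code from the hack; the string board representation and the helper names `winner` and `reachable_states` are assumptions made for illustration.

```python
# Illustrative sketch only: boards are 9-character strings of 'X', 'O' and ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def reachable_states():
    """Collect every board reachable from the empty board by legal play."""
    seen = set()
    stack = [(' ' * 9, 'X')]  # X moves first
    while stack:
        board, player = stack.pop()
        if board in seen:
            continue
        seen.add(board)
        # Stop expanding once the game is over (win or full board).
        if winner(board) is not None or ' ' not in board:
            continue
        next_player = 'O' if player == 'X' else 'X'
        for i, cell in enumerate(board):
            if cell == ' ':
                stack.append((board[:i] + player + board[i + 1:], next_player))
    return seen

if __name__ == "__main__":
    print(len(reachable_states()))
```

The count is well below the 3^9 = 19,683 ways of filling nine cells with three symbols, because most of those fillings can never occur in a legal game.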
According to the principles of RL, the way one can teach a machine to master a given task is similar to the way humans can be motivated to learn: by rewarding positive outcomes and punishing negative ones. The process of learning a game starts with the machine (the agent) taking steps at random, which almost certainly leads to defeat against an opponent who, unlike the machine, understands the game's rules as well as its winning strategies. After receiving negative scores for the steps leading to the defeat, the machine is forced to explore alternative steps that might lead to positive outcomes. Following the first (accidental) victories, the reward received motivates the machine to explore steps similar to the winning ones in the following rounds, eventually leading it to explore all the winning and losing strategies.
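The reward-and-punishment idea can be sketched as a small value table that is nudged after every game. This is one common tabular approach and not necessarily what was built on the day; the function names `choose_move` and `reinforce`, the default value of 0.0 for unseen boards, and the parameters `epsilon` and `learning_rate` are all assumptions made for illustration.

```python
import random

def choose_move(values, board, player, epsilon, rng=random):
    """Pick a cell index: random with probability epsilon, else the best known."""
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if rng.random() < epsilon:
        return rng.choice(moves)  # explore: try something at random
    # Exploit: prefer the move whose resulting board has the highest value.
    return max(
        moves,
        key=lambda i: values.get(board[:i] + player + board[i + 1:], 0.0),
    )

def reinforce(values, visited_states, reward, learning_rate=0.1):
    """Pass the final reward back through the boards the agent produced."""
    target = reward
    for state in reversed(visited_states):
        old = values.get(state, 0.0)
        # Move the stored value a fraction of the way towards the target.
        values[state] = old + learning_rate * (target - old)
        target = values[state]
```

For example, after a win with `reward=1.0`, the last board the agent produced gains the most value and earlier boards gain progressively less, so moves closer to the victory are reinforced more strongly.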
For those of you who think it would be dull to play against a computer that makes the dumbest moves for hundreds of rounds before it incrementally gets better, there is good news: the above process works just the same if the machine plays against another machine. By setting a few parameters, such as the learning rate and the exploration-to-exploitation ratio, and pressing a button, we can have an invincible AI in as little as 15-20 minutes!
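A machine-versus-machine training loop might look like the sketch below. It is a simplified illustration, not the project's actual code: the function `self_play`, the rewards of +1, -1 and 0, and the default values for `episodes`, `learning_rate` and `epsilon` are assumptions, and how strong the resulting player becomes depends on those settings.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def pick(values, board, player, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best-valued move."""
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(
        moves,
        key=lambda i: values.get(board[:i] + player + board[i + 1:], 0.0),
    )

def self_play(episodes=20000, learning_rate=0.2, epsilon=0.1, seed=0):
    """Train two value tables, one per player, by playing them against each other."""
    rng = random.Random(seed)
    values = {'X': {}, 'O': {}}
    for _ in range(episodes):
        board, player = ' ' * 9, 'X'
        history = {'X': [], 'O': []}  # boards each player produced this game
        while winner(board) is None and ' ' in board:
            i = pick(values[player], board, player, epsilon, rng)
            board = board[:i] + player + board[i + 1:]
            history[player].append(board)
            player = 'O' if player == 'X' else 'X'
        won = winner(board)
        for p in 'XO':
            # Assumed rewards: +1 for a win, -1 for a loss, 0 for a draw.
            target = 0.0 if won is None else (1.0 if won == p else -1.0)
            for state in reversed(history[p]):
                old = values[p].get(state, 0.0)
                values[p][state] = old + learning_rate * (target - old)
                target = values[p][state]
    return values

if __name__ == "__main__":
    tables = self_play()
    print(len(tables['X']), len(tables['O']))
```

Here `epsilon` plays the role of the exploration-to-exploitation ratio: a higher value makes both players try more random moves, while a lower value makes them rely more on what they have already learned.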
*or noughts and crosses, depending on where you're from