One of the most enjoyable days I've had so far at TotallyMoney was this year's Big Hack, when everyone is given 24 hours to build something that would be difficult to allocate time to in a 'business as usual' setting. Any idea that solves a real problem is welcome, regardless of the technology applied (if any), as long as it is not related to the day-to-day work at TM.

Reinforcement Learning

For this year's Big Hack I decided to experiment with an area I've long found exciting, Reinforcement Learning (RL), and build a simple tic-tac-toe* machine that gets more and more skilful with experience. As tic-tac-toe is a very simple game with a relatively small number of potential board states (depending on the size of the board, naturally), training an AI that achieves super-human performance is feasible, even with the computational capacity of an ordinary laptop.

According to the principles of RL, the way one can teach a machine to master a given task is similar to the way humans can be motivated to learn: rewarding positive outcomes and punishing negative ones. The process of learning a game starts with the machine (the agent) taking steps at random, which almost certainly leads to defeat against an opponent who, unlike the machine, understands the game's rules as well as winning strategies. After receiving negative scores for the steps leading to the defeat, the machine is forced to explore alternative steps that might lead to more favourable board states. Following the first (accidental) victories, the reward received motivates the machine to explore steps similar to the winning ones in the following rounds, until it has eventually explored all the winning and losing strategies.
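To make the reward-and-punish loop above a bit more concrete, here is a minimal sketch in Python. It is not the code from the hack: it assumes a simple value table updated from the final result of each game (a Monte Carlo style update), and the names `q_values`, `choose_move` and `learn_from_game`, as well as the `discount` parameter, are made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical value table: (board_state, move) -> estimated value, starting at 0.
q_values = defaultdict(float)

def choose_move(state, legal_moves, explore_prob, rng=random):
    """With probability explore_prob try a random move, otherwise the best known one."""
    if rng.random() < explore_prob:
        return rng.choice(legal_moves)
    return max(legal_moves, key=lambda move: q_values[(state, move)])

def learn_from_game(history, reward, learning_rate=0.5, discount=0.9):
    """Push the final reward back through the (state, move) pairs the agent played.

    Moves closer to the end of the game receive more of the credit (or blame).
    """
    target = reward
    for state, move in reversed(history):
        key = (state, move)
        q_values[key] += learning_rate * (target - q_values[key])
        target *= discount

# One lost game: both moves played get a negative score...
learn_from_game([("s0", 0), ("s1", 4)], reward=-1.0)
# ...so with exploration switched off the agent now prefers the untried move.
next_move = choose_move("s1", [4, 5], explore_prob=0.0)
```

After a defeat, every step that led to it is scored down, so the next time the agent is in the same state it leans towards a step it has not been punished for yet.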


For those of you who think that playing against a computer that makes the dumbest steps for hundreds of rounds before it incrementally gets better is dull, there is good news: the above process works the same if the machine plays against another machine, so by setting a few parameters, such as the learning rate and the exploration-to-exploitation ratio, and pressing a button, we can have an invincible AI in as little as 15-20 minutes!
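As a rough illustration of machine-versus-machine training, the sketch below lets two copies of the same agent play a 3x3 game against each other and share one value table. It is only a sketch under assumptions, not the code from the hack: the function names, the reward values (+1 win, -1 loss, 0 draw) and the default parameters are all invented for this example, with `learning_rate` and `explore_prob` standing in for the learning rate and the exploration-to-exploitation ratio.

```python
import random
from collections import defaultdict

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play(rounds=20000, learning_rate=0.3, explore_prob=0.2, seed=0):
    """Train one shared value table by letting the agent play against itself."""
    rng = random.Random(seed)
    values = defaultdict(float)  # (board, move) -> value for the player to move
    for _ in range(rounds):
        board, player = " " * 9, "X"
        history = {"X": [], "O": []}
        while True:
            moves = [i for i, cell in enumerate(board) if cell == " "]
            if rng.random() < explore_prob:
                move = rng.choice(moves)  # explore: try something random
            else:
                move = max(moves, key=lambda m: values[(board, m)])  # exploit
            history[player].append((board, move))
            board = board[:move] + player + board[move + 1:]
            won = winner(board)
            if won or " " not in board:
                break
            player = "O" if player == "X" else "X"
        # Reward the winner's moves, punish the loser's; a draw scores 0 for both.
        for mark in "XO":
            reward = 0.0 if won is None else (1.0 if mark == won else -1.0)
            for key in history[mark]:
                values[key] += learning_rate * (reward - values[key])
    return values

values = self_play(rounds=200, seed=1)
```

A higher `explore_prob` makes the agent try more unfamiliar moves, while a lower one makes it stick to what has worked so far; how many rounds are needed before the play becomes strong depends on these settings.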

If you want to learn more about the RL algorithm used, read this. If you want to check the code, click here.

*or noughts and crosses, depending on where you're from
