[INTELLIGENT MACHINES]

Q-learning (Playing Frozen Lake via Q-learning)

30 Cups of coffee

Description

This project implements Q-learning, a type of reinforcement learning. It uses a Q-table and an update rule derived from the Bellman equation, an idea from dynamic programming, to estimate the value of taking each action in each state. Q-learning is model-free, so it never computes the transition probabilities between states; the agent decides how to move from one state to the next by comparing the values stored in the Q-table.

I came across this algorithm when I was just getting started with reinforcement learning. It caught my interest as soon as I read about it: I was struck by how such a simple algorithm could be so powerful.

The algorithm :

The goal of Q-Learning is to learn a policy, which tells an agent what action to take under what circumstances. Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over all successive steps, starting from the current state. "Q" names the function that returns the reward used to provide the reinforcement and can be said to stand for the "quality" of an action taken in a given state.
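For reference, the tabular update rule is commonly written as below. The symbols are not named in the text here and are assumed from the usual textbook notation: $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward received after taking action $a$ in state $s$, and $s'$ the state the agent lands in.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The bracketed term is the gap between the current estimate and a one-step look-ahead target, so each update nudges $Q(s, a)$ a fraction $\alpha$ of the way toward that target.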

Using the Q-function, we get the values of Q for the cells in the table. When we start, all the values in the Q-table are zero. The values are then updated iteratively: as the agent explores the environment, each update moves a Q-value closer to the observed reward plus the estimated value of the next state, so the table gives better and better approximations.

Frozen Lake :

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball. It provides various tasks for benchmarking our algorithms and is a useful way to test them in simulation before deploying them in the real world.

Frozen Lake is a problem where, given a grid of tiles, the aim is to reach the goal (G) from the start (S). There are two types of intermediate tiles, F and H: we can walk freely on the frozen F tiles, but H is a hole, and if we land on it the game is over. After training for 15000 episodes (an episode ends when the agent falls into a hole or reaches the goal), the agent starts making quite reasonable moves.
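To make the training procedure concrete, here is a self-contained sketch of Q-learning on a Frozen Lake style grid. It deliberately does not use the Gym API: `step`, `train`, and `greedy_rollout` are hypothetical helpers written for this example, and the environment is a simplified stand-in in which moves are deterministic (no slipping on the ice). The 4x4 layout, the epsilon-greedy schedule, and the values of `alpha` and `gamma` are likewise assumptions, not settings taken from the project.

```python
import random

# Assumed 4x4 layout: S = start, F = frozen, H = hole, G = goal.
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
SIZE = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (row, col) deltas


def step(state, action):
    """Deterministic move; returns (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    # Moving into the edge of the grid leaves the agent where it is.
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    tile = LAKE[row][col]
    # Reward 1 only for reaching the goal; holes and the goal end the episode.
    return row * SIZE + col, float(tile == "G"), tile in "HG"


def train(episodes=15000, alpha=0.1, gamma=0.95, max_steps=100, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(MOVES) for _ in range(SIZE * SIZE)]
    for episode in range(episodes):
        # Explore heavily at first, then rely more on the learned values.
        epsilon = max(0.05, 1.0 - episode / (0.5 * episodes))
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))
            else:
                action = max(range(len(MOVES)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_rollout(q, max_steps=100):
    """Follow the highest-valued action from the start; True if the goal is reached."""
    state = 0
    for _ in range(max_steps):
        action = max(range(len(MOVES)), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        if done:
            return reward == 1.0
    return False


q_table = train()
print("Greedy policy reaches the goal:", greedy_rollout(q_table))
```

Gym's own Frozen Lake is slippery by default, so the agent does not always move in the direction it chooses. The same loop still applies there, but the learned values are noisier and the success rate of the greedy policy is lower than in this deterministic sketch.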

Inspirations

My interest in reinforcement learning arose because I often found myself stuck on problems where I couldn't directly apply supervised learning algorithms, even though I knew those problems could be solved. Beyond the Nature paper by Google DeepMind (https://www.nature.com/articles/nature14236), I found some cool applications of Q-learning, such as using deep Q-learning to improve free kicks in FIFA (https://towardsdatascience.com/using-deep-q-learning-in-fifa-18-to-perfect-the-art-of-free-kicks-f2e4e979ee66) and playing Doom using a DQN (https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8).