
This repo contains implementations of the algorithms I learned in my reinforcement learning coursework.


kumar-sanjeeev/reinforcement-learning


Reinforcement learning techniques

Implements value iteration, policy iteration and Q-learning on the GridWorld environment.


Getting Started

  1. Clone the repo:
$ git clone https://github.com/kumar-sanjeeev/reinforcement-learning.git
  2. Install the dependencies listed in requirements.txt:
$ pip install -r requirements.txt

Information about main code files

  • agent.py, RandomAgent.py : general template class for agents and a complete RandomAgent
  • ValueIterationAgent.py : value iteration agent
  • PolicyIterationAgent.py : policy iteration agent
  • QLearningAgent.py : Q-learning agent
  • output_xxx_selftest.txt : output of specific runs to check your implementation
  • solution_xxx.py : result obtained
  • mdp.py : abstract class for general MDPs
  • environmrnt.py : abstract class for general reinforcement learning environments
  • gridworld.py : gridworld main code and test harness
  • gridworldclass.py : implementation of gridworld internals
  • utils.py : some utility code

Quick Start GridWorld

To get started, run the gridworld in manual mode:

  • Move the agent with the arrow keys:
python3 gridworld.py -m

image

  • Control the different aspects of the simulation. A full list is available by running:
python3 gridworld.py -h
Options:
  -h, --help            show this help message and exit
  -d DISCOUNT, --discount=DISCOUNT
                        Discount on future (default 0.9)
  -r R, --livingReward=R
                        Reward for living for a time step (default 0.0)
  -n P, --noise=P       How often action results in unintended direction
                        (default 0.2)
  -e E, --epsilon=E     Chance of taking a random action in q-learning
                        (default 0.3)
  -l P, --learningRate=P
                        TD learning rate (default 0.5)
  -i K, --iterations=K  Number of rounds of policy evaluation or value
                        iteration (default 10)
  -k K, --episodes=K    Number of epsiodes of the MDP to run (default 0)
  -g G, --grid=G        Grid to use (case sensitive; options are BookGrid,
                        BridgeGrid, CliffGrid, MazeGrid, CustomGrid, default
                        BookGrid)
  -w X, --windowSize=X  Request a window width of X pixels *per grid cell*
                        (default 150)
  -a A, --agent=A       Agent type (options are 'random', 'value' ,
                        'policyiter' and 'q', default random)
  -t, --text            Use text-only ASCII display
  -p, --pause           Pause GUI after each time step when running the MDP
  -q, --quiet           Skip display of any learning episodes
  -s S, --speed=S       Speed of animation, S > 1.0 is faster, 0.0 < S < 1.0
                        is slower (default 1.0)
  -m, --manual          Manually control agent (for lecture)
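
The -n/--noise option is described only as "how often action results in unintended direction". As a rough sketch of one common way to model this (an assumption, not taken from this repo's gridworld code), the intended move keeps probability 1 - noise and the remainder is split evenly between the two perpendicular directions. The names PERPENDICULAR and action_distribution are hypothetical.

```python
# Hypothetical helper, not part of gridworld.py: one common noise model
# in which a noisy move slips sideways relative to the intended direction.
PERPENDICULAR = {
    "north": ("west", "east"),
    "south": ("east", "west"),
    "east": ("north", "south"),
    "west": ("south", "north"),
}


def action_distribution(intended, noise=0.2):
    """Return {direction: probability} for a single noisy move."""
    left, right = PERPENDICULAR[intended]
    return {
        intended: 1.0 - noise,  # the move the agent asked for
        left: noise / 2,        # slip to one side
        right: noise / 2,       # slip to the other side
    }


dist = action_distribution("north")
```

With the default noise of 0.2, this sketch gives the intended direction probability 0.8 and each side 0.1.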

Policy Iteration

Run the policy iteration agent with the default parameters:

python3 gridworld.py -a policyiter

image

Final Result

State Values

image

Q-Values

image
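
For readers new to the method, here is a minimal sketch of policy iteration on a toy three-state chain. It is not the code in PolicyIterationAgent.py; the dictionary layout P[state][action] = [(probability, next_state, reward), ...] and the function name are assumptions made for illustration. The eval_sweeps argument plays the role of the "rounds of policy evaluation" controlled by -i.

```python
def policy_iteration(P, gamma=0.9, eval_sweeps=10):
    """Alternate approximate policy evaluation and greedy improvement."""
    # Start from an arbitrary policy: the first listed action in each state.
    policy = {s: next(iter(acts)) for s, acts in P.items() if acts}
    V = {s: 0.0 for s in P}

    def q(s, a):
        # Expected return of taking a in s, then following the current V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        # Policy evaluation: a fixed number of in-place sweeps.
        for _ in range(eval_sweeps):
            for s, a in policy.items():
                V[s] = q(s, a)
        # Policy improvement: switch only on a strict improvement.
        stable = True
        for s in policy:
            best = max(P[s], key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Toy chain 0 - 1 - 2; state 2 is terminal and entering it pays reward 1.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}
policy, V = policy_iteration(P)
```

On this chain the loop settles on moving right in both non-terminal states.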

Value Iteration

Run the value iteration agent on the MazeGrid environment:

python3 gridworld.py -a value -g MazeGrid

State Values

image

Q-Values

image
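
As a rough illustration of what the value agent computes, the sketch below runs batch value iteration on a toy three-state chain and then derives Q-values from the resulting state values. It is not the code in ValueIterationAgent.py; the MDP layout P[state][action] = [(probability, next_state, reward), ...] and both function names are assumptions. The iterations argument mirrors the -i option (default 10).

```python
def value_iteration(P, gamma=0.9, iterations=10):
    """Batch value iteration: each sweep reads only the previous sweep's values."""
    V = {s: 0.0 for s in P}
    for _ in range(iterations):
        new_V = {}
        for s, acts in P.items():
            backups = [
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in acts.values()
            ]
            # Terminal states have no actions and keep value 0.
            new_V[s] = max(backups, default=0.0)
        V = new_V
    return V


def q_values(P, V, gamma=0.9):
    """One-step lookahead Q-values computed from state values V."""
    return {
        (s, a): sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
        for s, acts in P.items()
        for a, outcomes in acts.items()
    }


# Toy chain 0 - 1 - 2; state 2 is terminal and entering it pays reward 1.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}
V = value_iteration(P)
Q = q_values(P, V)
```

The state value of each state equals the largest Q-value among its actions, which is how the two displays in this section relate.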

Q-Learning

Run the Q-learning agent on the MazeGrid environment for 100 episodes (-k 100), skipping the display of learning episodes (-q):

python3 gridworld.py -a q -g MazeGrid -k 100 -q

State Values

image

Q-Values

image
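
The sketch below shows tabular Q-learning with epsilon-greedy exploration on a toy three-state chain. It is not the code in QLearningAgent.py; the step function, the ACTIONS tuple and the seed argument are assumptions made so the example is self-contained. The default values for epsilon, alpha and gamma follow the defaults listed for -e, -l and -d, and episodes matches the -k 100 used above.

```python
import random

ACTIONS = ("left", "right")


def step(state, action):
    """Toy deterministic chain 0 - 1 - 2; state 2 is terminal."""
    nxt = state + 1 if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == 2 else 0.0
    return nxt, reward, nxt == 2


def q_learning(episodes=100, epsilon=0.3, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    Q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action)
            # TD target bootstraps from the best next action (0 at terminal).
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q


Q = q_learning()
```

Unlike value and policy iteration, this update uses only sampled transitions, so the agent never needs the transition probabilities of the MDP.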

Resources

This problem was assigned as homework in the Reinforcement Learning and Learning-Based Control course at my university.
