# Reinforcement learning techniques
Implemented value iteration, policy iteration, and Q-learning on the GridWorld environment.
- Getting Started
- Information about main code files
- Quick Start GridWorld
- Policy Iteration
- Value Iteration
- Q-Learning
- Resources
## Getting Started
- Clone the repo:
  $ git clone https://github.com/kumar-sanjeeev/reinforcement-learning.git
- Install the dependencies listed in the requirements.txt file:
  $ pip install -r requirements.txt
## Information about main code files
- agent.py, RandomAgent.py: general class used as a template for agents, plus a completed RandomAgent
- ValueIterationAgent.py: value iteration agent
- PolicyIterationAgent.py: policy iteration agent
- QLearningAgent.py: Q-learning agent
- output_xxx_selftest.txt: output of specific runs to check your implementation
- solution_xxx.py: results obtained
- mdp.py: abstract class for general MDPs
- environment.py: abstract class for general reinforcement learning environments
- gridworld.py: gridworld main code and test harness
- gridworldclass.py: implementation of gridworld internals
- utils.py: some utility code
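The actual interface is defined in agent.py; purely as orientation, a template of this kind often looks like the sketch below. The class and method names here are illustrative assumptions, not the repository's actual signatures.

```python
import random


class Agent:
    """Illustrative agent template (hypothetical; see agent.py for the real interface)."""

    def get_action(self, state):
        """Return the action to take in the given state."""
        raise NotImplementedError

    def observe_transition(self, state, action, next_state, reward):
        """Optionally learn from a single (s, a, s', r) transition."""
        pass


class RandomAgent(Agent):
    """Completed example agent: picks uniformly among the legal actions."""

    def __init__(self, legal_actions_fn):
        # legal_actions_fn(state) -> list of actions available in that state
        self.legal_actions_fn = legal_actions_fn

    def get_action(self, state):
        return random.choice(self.legal_actions_fn(state))
```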
## Quick Start GridWorld
To get started, run the gridworld in interactive mode:
- Move the agent with the arrow keys:
  python3 gridworld.py -m
- Control the different aspects of the simulation. A full list of options is available by running the command below (an example combining several options follows the list):
  python3 gridworld.py -h
Options:
-h, --help show this help message and exit
-d DISCOUNT, --discount=DISCOUNT
Discount on future (default 0.9)
-r R, --livingReward=R
Reward for living for a time step (default 0.0)
-n P, --noise=P How often action results in unintended direction
(default 0.2)
-e E, --epsilon=E Chance of taking a random action in q-learning
(default 0.3)
-l P, --learningRate=P
TD learning rate (default 0.5)
-i K, --iterations=K Number of rounds of policy evaluation or value
iteration (default 10)
-k K, --episodes=K    Number of episodes of the MDP to run (default 0)
-g G, --grid=G Grid to use (case sensitive; options are BookGrid,
BridgeGrid, CliffGrid, MazeGrid, CustomGrid, default
BookGrid)
-w X, --windowSize=X Request a window width of X pixels *per grid cell*
(default 150)
-a A, --agent=A Agent type (options are 'random', 'value' ,
'policyiter' and 'q', default random)
-t, --text Use text-only ASCII display
-p, --pause Pause GUI after each time step when running the MDP
-q, --quiet Skip display of any learning episodes
-s S, --speed=S Speed of animation, S > 1.0 is faster, 0.0 < S < 1.0
is slower (default 1.0)
-m, --manual Manually control agent (for lecture)
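Several options can be combined in a single run. For example, the following invocation (an illustrative combination of the flags listed above) runs the value iteration agent on BridgeGrid with a discount of 0.7 and 100 iterations:

    python3 gridworld.py -a value -g BridgeGrid -d 0.7 -i 100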
## Policy Iteration
Run the policy iteration agent with the default parameters:

    python3 gridworld.py -a policyiter

State values (figure)
Q-values (figure)
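For reference, policy iteration alternates (approximate) policy evaluation with greedy policy improvement until the policy stops changing. The sketch below is a minimal, self-contained illustration on a generic finite MDP; the dictionary-based transitions and the reward callable are assumptions made for the example, not the interface of PolicyIterationAgent.py or mdp.py.

```python
def policy_iteration(states, actions, transitions, reward, gamma=0.9, eval_sweeps=50):
    """Policy iteration on a finite MDP.

    transitions[(s, a)] -> list of (next_state, probability) pairs
    reward(s, a, s2)    -> immediate reward for the transition
    """
    policy = {s: actions[0] for s in states}
    values = {s: 0.0 for s in states}

    while True:
        # Policy evaluation: repeatedly apply the Bellman expectation backup.
        for _ in range(eval_sweeps):
            for s in states:
                a = policy[s]
                values[s] = sum(p * (reward(s, a, s2) + gamma * values[s2])
                                for s2, p in transitions[(s, a)])

        # Policy improvement: act greedily with respect to the current values.
        stable = True
        for s in states:
            q = {a: sum(p * (reward(s, a, s2) + gamma * values[s2])
                        for s2, p in transitions[(s, a)])
                 for a in actions}
            best = max(q, key=q.get)
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, values
```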
## Value Iteration
Run the value iteration agent on the MazeGrid environment:

    python3 gridworld.py -a value -g MazeGrid

State values (figure)
Q-values (figure)
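Value iteration instead applies the Bellman optimality backup to every state for a fixed number of rounds (the -i option above, default 10). A minimal sketch, using the same assumed MDP representation as in the policy iteration example:

```python
def value_iteration(states, actions, transitions, reward, gamma=0.9, iterations=10):
    """Value iteration: repeat the Bellman optimality backup for a fixed number of rounds."""
    values = {s: 0.0 for s in states}
    for _ in range(iterations):
        # Each round rebuilds the value table from the previous one.
        values = {
            s: max(sum(p * (reward(s, a, s2) + gamma * values[s2])
                       for s2, p in transitions[(s, a)])
                   for a in actions)
            for s in states
        }
    return values
```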
## Q-Learning
Run the Q-learning agent on the MazeGrid environment for 100 episodes, skipping the episode display:

    python3 gridworld.py -a q -g MazeGrid -k 100 -q

State values (figure)
Q-values (figure)
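Unlike the two planners above, Q-learning learns from sampled transitions rather than from the MDP model, exploring with probability epsilon (-e, default 0.3) and updating with the TD learning rate (-l, default 0.5). Below is a minimal tabular sketch; the dictionary Q-table and function names are illustrative, not QLearningAgent.py's actual code.

```python
import random
from collections import defaultdict

# Q-table: maps (state, action) pairs to value estimates, 0.0 by default.
Q = defaultdict(float)


def epsilon_greedy(state, actions, epsilon=0.3):
    """With probability epsilon pick a random action, otherwise act greedily w.r.t. Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(state, action, reward, next_state, next_actions, alpha=0.5, gamma=0.9):
    """Tabular Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```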
## Resources
This problem was given as a homework assignment in the Reinforcement Learning and Learning-Based Control course at my university.