Basic requirements
```
python >= 3.10
numpy >= 1.24.3
gymnasium == 0.29.1
pygame == 2.5.2
```
If you want to use the torch or tensorflow agents, make sure to install these libraries as well:
```
tensorflow == 2.13.0
torch
```
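For a quick setup, the requirements can be installed with pip (package names assumed to match the standard PyPI releases):

```
pip install "numpy>=1.24.3" gymnasium==0.29.1 pygame==2.5.2

# optional, only needed for the tensorflow / torch agents
pip install tensorflow==2.13.0 torch
```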
To initialize an agent, first create a neural network that will act as its brain. You can use my DeepLearningNumpy project to build a custom network (https://github.com/Bouscout/Deep_learning_numpy), or simply create a model from the neural_v3 module:
```python
from neural_v3.network import network
from neural_v3.activations import softmax, linear
from agent.agent_numpy.policy_gradient_PPO import Agent_PPO

BUFFER = 1000      # replay buffer size
obs_space = 10     # size of the observation vector
action_space = 4   # number of discrete actions

# network with one hidden layer of 64 units and relu activations
brain = network([obs_space, 64, action_space], "relu", learning_rate=1e-4, optimizer="adam")
brain.activ_layers[-1] = softmax()  # output a probability distribution over actions

agent = Agent_PPO(action_space, obs_space, BUFFER, use_next_state=True)
agent.policy.model = brain  # plug the network in as the agent's policy model
```
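The obs_space and action_space values above are placeholders. For a concrete gymnasium environment with a discrete action space, they can be read from the environment itself (a minimal sketch; CartPole-v1 is just an example):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs_space = env.observation_space.shape[0]  # length of the observation vector (4 for CartPole)
action_space = env.action_space.n           # number of discrete actions (2 for CartPole)
```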
If your algorithm requires a baseline and an advantage function, you can define them from their module:
```python
from agent.agent_numpy.advantages_func import GAE_estimations, discounted_rerward, td_estimations, monte_carlo_estimate

# value network estimating the baseline V(s), with a single linear output
baseline_model = network([obs_space, 64, 1], "relu", learning_rate=1e-4, optimizer="adam")
baseline_model.activ_layers[-1] = linear()

reward_function = GAE_estimations()
reward_function.baseline = baseline_model

agent.policy.advantage_func = reward_function  # the agent will now use GAE
```
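For reference, GAE (Generalized Advantage Estimation) blends the baseline's value predictions with the observed rewards. With discount factor $\gamma$ and smoothing factor $\lambda$ (how these are configured on GAE_estimations is not shown here), the advantage estimate is:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad \hat{A}_t = \sum_{l=0}^{\infty} (\gamma \lambda)^l \, \delta_{t+l}$$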
Training loop usage: the observe function will provide an action, but when called with the argument store=True, it will store the last observations inside the replay buffer.
```python
total_steps = 100_000
interval = 500  # train every `interval` steps

state, _ = env.reset()  # gymnasium's reset() returns (observation, info)
for step in range(1, total_steps):
    action = agent.observe(state)
    next_state, reward, done, truncated, _ = env.step(action)

    # save observation
    agent.observe(next_state=next_state, reward=reward, is_done=done or truncated, store=True)
    state = next_state

    # training step
    if step % interval == 0:
        agent.report()
        agent.clear_memory()

    if done or truncated:
        state, _ = env.reset()
```
If you already have a saved model and just want to see the agent interact with the environment:
```python
state, _ = env.reset()
for step in range(1, total_steps):
    # act without storing anything in the replay buffer
    action = agent.observe(state)
    next_state, reward, done, truncated, _ = env.step(action)
    state = next_state

    if done or truncated:
        state, _ = env.reset()
```
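To actually watch the agent, create the environment with gymnasium's human render mode, which draws each step in a window (this is presumably why pygame is listed in the requirements):

```python
import gymnasium as gym

# render_mode="human" opens a window and renders every env.step() call
env = gym.make("CartPole-v1", render_mode="human")
```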