REDQ simple example

2D vehicle control learning examples with REDQ

Reinforcement learning algorithm

  • Agent
    • Soft Actor-Critic (SAC)
      • supports tuning the update-to-data (UTD) ratio G
    • Randomized Ensembled Double Q-learning (REDQ); see the target-computation sketch after this list
      • v1 : N critics and N critic optimizers
      • v2 : N critics and 1 critic optimizer
  • Etc.
    • multi-step Q learning
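
A rough sketch of the REDQ ingredient named above, in-target minimization over a random subset of the critic ensemble, is shown below. This is illustrative Python/PyTorch, not the repository's code: the `actor.sample` interface, the subset size M, and the entropy coefficient `alpha` are assumptions.

```python
# Illustrative sketch only (assumed interfaces, not the repository's code).
import random
import torch

def redq_bellman_target(target_critics, actor, reward, next_obs, done,
                        gamma=0.99, alpha=0.2, M=2):
    """REDQ target: minimum over a random M-subset of the N target critics."""
    with torch.no_grad():
        next_action, next_log_prob = actor.sample(next_obs)   # SAC-style stochastic policy
        subset = random.sample(list(target_critics), M)       # in-target minimization over M of N critics
        q_values = torch.stack([q(next_obs, next_action) for q in subset])
        min_q = q_values.min(dim=0).values                    # pessimistic ensemble estimate
        # Soft Bellman backup with the SAC entropy bonus; every critic in the
        # ensemble regresses toward this shared target (repeated per update).
        return reward + gamma * (1.0 - done) * (min_q - alpha * next_log_prob)
```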

Environment

  • Unicycle model

  • Lidar-like sensor model

  • Task : reach the goal point (the green circle) while avoiding collisions with the walls

  • Observation : scan measurement values in different directions (see the processing sketch after this list)

    • default number of scan values N : 9
    • maximum distance : 10 m
    • history window length H : H consecutive observations are concatenated into the shape (1, N*H)
    • angle range : [-120 deg, 120 deg]
    • min-max normalized to [0, 1]
    • Gaussian sensor noise with std 0.2 added
  • Action : angular velocity

    • action range : [-pi/4 rad/s, pi/4 rad/s]
  • Linear velocity

    • train : 3 m/s, constant
    • test : 1.5 m/s, constant
  • Disturbance

    • zero-mean Gaussian noise with std 0.3 on the linear velocity
    • zero-mean Gaussian noise with std 0.1 on the angular velocity
  • Reward

    • -5 if a collision happens
    • 0.5 if the forward distance measurement > 5, otherwise 0
  • screenshot of the environment (image in the repository)
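
As a concrete illustration of the observation and reward described above, the sketch below clips and min-max normalizes the N = 9 scan values, adds Gaussian sensor noise, stacks H consecutive observations into a (1, N*H) vector, and applies the stated reward rule. All names, the value H = 4, and the noise placement (after normalization) are assumptions for illustration, not the repository's code.

```python
# Illustrative sketch only (assumed names and H value, not the repository's code).
from collections import deque
import numpy as np

N_SCANS = 9          # number of scan directions
MAX_RANGE = 10.0     # maximum sensor distance [m]
NOISE_STD = 0.2      # Gaussian sensor noise (std)
H = 4                # history window length (assumed value)

history = deque(maxlen=H)

def process_scan(raw_scan):
    """Normalize one scan, add noise, and stack it with the previous H-1 scans."""
    scan = np.clip(raw_scan, 0.0, MAX_RANGE) / MAX_RANGE             # min-max normalize to [0, 1]
    scan = scan + np.random.normal(0.0, NOISE_STD, size=scan.shape)  # sensor noise (placement assumed)
    history.append(scan)
    frames = list(history)
    while len(frames) < H:          # pad with the oldest frame until H observations exist
        frames.insert(0, frames[0])
    return np.concatenate(frames).reshape(1, N_SCANS * H)

def compute_reward(collision, forward_distance):
    """Reward rule stated above: -5 on collision, +0.5 if the forward distance > 5, else 0."""
    if collision:
        return -5.0
    return 0.5 if forward_distance > 5.0 else 0.0

obs = process_scan(np.random.uniform(0.0, MAX_RANGE, size=N_SCANS))
print(obs.shape)  # (1, 36) with N_SCANS = 9 and H = 4
```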

Train the agent

  • Soft Actor-Critic
    • After training, the SAC agent's parameters are saved in the directory /maze_example/savefile/sac or /maze_example/savefile/sac_g20. The --G flag sets the update-to-data (UTD) ratio (see the sketch after the commands below).
cd REDQ_simple_example
python maze_example/train_sac_agent.py --max_train_eps 100
python maze_example/train_sac_agent.py --max_train_eps 100 --G 20
  • REDQ
    • After training, the REDQ agent's parameters are saved in the directory /maze_example/savefile/redq or /maze_example/savefile/redq_v2
cd REDQ_simple_example
python maze_example/train_redq_agent.py --max_train_eps 100 --version v2
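
The --G flag used above corresponds to the update-to-data (UTD) ratio mentioned earlier: roughly, how many critic gradient updates are performed per environment step. A schematic sketch follows; the `agent`, `buffer`, and `env` interfaces are placeholders, not the repository's API.

```python
# Schematic sketch of a UTD-ratio training step (placeholder interfaces).
def train_one_env_step(agent, buffer, env, obs, G=20, batch_size=128):
    """Take one environment step, then perform G critic updates (UTD ratio G)."""
    action = agent.act(obs)
    next_obs, reward, done, info = env.step(action)
    buffer.add(obs, action, reward, next_obs, done)
    for _ in range(G):                               # G > 1 reuses replay data more aggressively
        agent.update_critics(buffer.sample(batch_size))
    agent.update_actor(buffer.sample(batch_size))    # the actor is typically updated once per step
    return next_obs, done
```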

Results

  • Training stops early when the agent reaches the target for 10 consecutive episodes (see the sketch below).
  • In more complex environments, the performance gap between SAC-G and REDQ might widen further.
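
A minimal, self-contained sketch of the early-stopping rule above; `run_episode` is a stand-in for the repository's actual episode loop.

```python
# Illustrative sketch of the early-stopping rule (stand-in episode function).
import random

SUCCESS_STREAK = 10   # stop after this many consecutive goal-reaching episodes
MAX_TRAIN_EPS = 100

def run_episode():
    """Stand-in for one training episode; returns True if the goal was reached."""
    return random.random() < 0.7

consecutive_successes = 0
for episode in range(MAX_TRAIN_EPS):
    consecutive_successes = consecutive_successes + 1 if run_episode() else 0
    if consecutive_successes >= SUCCESS_STREAK:
        print(f"Early stop at episode {episode}: {SUCCESS_STREAK} consecutive successes")
        break
```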

comparison plot of the training results (image in the repository)

Reference
