VecEnc Support TD3 by acyclics · Pull Request #495 · hill-a/stable-baselines

acyclics · 2019-10-02T11:26:58Z

Implemented SubprocVecEnv support for TD3.

acyclics · 2019-10-02T15:24:41Z

Ah, it seems like I would first have to make HER work with VecEnv. @araffin Any idea where to begin with that?

araffin · 2019-10-03T14:55:45Z

        self._save_to_file(save_path, data=data, params=params_to_save, cloudpickle=cloudpickle)
+
+
+class Runner(AbstractEnvRunner):


Why do you need a runner? It seems that you only need to save the obs variable.

I noticed that other implementations that use VecEnv use a Runner. I used a Runner here because I feel that best enables future developers to build on top of it.

PPO and A2C use a runner because they need some additional computations/transformations (notably the computation of the GAE), which is not the case for TD3, which only needs to fill a replay buffer.
However, at some point we will need to refactor and unify SAC/DDPG/TD3, which have a lot in common.
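To illustrate the point above, here is a minimal sketch, assuming numpy and a hypothetical list-based ReplayBuffer (not the stable-baselines class), of how an off-policy learner can store one batched VecEnv step directly. Each env's transition is pushed as a separate sample with no GAE or other post-processing, so no runner is involved.

```python
import numpy as np


class ReplayBuffer:
    """Hypothetical minimal FIFO replay buffer (illustration only)."""

    def __init__(self, size):
        self.size = size
        self.storage = []

    def add(self, obs, action, reward, next_obs, done):
        if len(self.storage) >= self.size:
            self.storage.pop(0)  # drop the oldest transition
        self.storage.append((obs, action, reward, next_obs, done))

    def extend_from_vec(self, obs, actions, rewards, next_obs, dones):
        """Store one VecEnv step: every array has n_envs as its leading axis."""
        for i in range(len(obs)):
            self.add(obs[i], actions[i], rewards[i], next_obs[i], dones[i])

    def __len__(self):
        return len(self.storage)


n_envs, obs_dim, act_dim = 4, 3, 2
buffer = ReplayBuffer(size=100)
obs = np.zeros((n_envs, obs_dim))
actions = np.ones((n_envs, act_dim))
rewards = np.arange(n_envs, dtype=np.float32)
next_obs = obs + 1.0
dones = np.array([False, False, True, False])
buffer.extend_from_vec(obs, actions, rewards, next_obs, dones)
print(len(buffer))  # 4: one transition per env
```

The only state that has to survive between steps is the latest observation batch, which matches the reviewer's remark that a runner is unnecessary here.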

Ah, cool. So perhaps something akin to a "runner" for all three?

Not really, rather a common collect_rollout method that would be part of the OffPolicy class, but that is not the subject of this PR.
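A rough sketch of what such a shared method could look like, under stated assumptions: OffPolicyBase, collect_rollout, DummyVecEnv and RandomAgent are all hypothetical names written for illustration, not stable-baselines code. The base class keeps only the last observation batch and pushes every env's transition into a replay buffer; subclasses (SAC/DDPG/TD3) would only provide predict.

```python
import numpy as np


class DummyVecEnv:
    """Hypothetical stand-in for a VecEnv: n_envs independent step counters.

    Each env observes its own step count and is done after `horizon` steps.
    """

    def __init__(self, n_envs, horizon=3):
        self.num_envs = n_envs
        self.horizon = horizon
        self.t = np.zeros(n_envs, dtype=np.int64)

    def reset(self):
        self.t[:] = 0
        return self.t[:, None].astype(np.float32)

    def step(self, actions):
        self.t += 1
        dones = self.t >= self.horizon
        rewards = np.ones(self.num_envs, dtype=np.float32)
        # Auto-reset finished envs, so their next obs is already the reset obs
        self.t[dones] = 0
        obs = self.t[:, None].astype(np.float32)
        return obs, rewards, dones, [{} for _ in range(self.num_envs)]


class OffPolicyBase:
    """Hypothetical shared base for SAC/DDPG/TD3 (sketch only)."""

    def __init__(self, env):
        self.env = env
        self.replay_buffer = []
        self._last_obs = env.reset()  # the only state kept between calls

    def predict(self, obs):
        raise NotImplementedError

    def collect_rollout(self, n_steps):
        """Step all envs n_steps times and store every env's transition."""
        for _ in range(n_steps):
            actions = self.predict(self._last_obs)
            new_obs, rewards, dones, _ = self.env.step(actions)
            for i in range(self.env.num_envs):
                self.replay_buffer.append(
                    (self._last_obs[i], actions[i], rewards[i], new_obs[i], dones[i]))
            self._last_obs = new_obs
        return n_steps * self.env.num_envs


class RandomAgent(OffPolicyBase):
    def predict(self, obs):
        # One action row per env, uniform in [-1, 1]
        return np.random.uniform(-1, 1, size=(len(obs), 1))


agent = RandomAgent(DummyVecEnv(n_envs=2))
print(agent.collect_rollout(n_steps=5))  # 10
```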

araffin · 2019-10-03T16:49:37Z

                else:
-                    action = self.policy_tf.step(obs[None]).flatten()
+                    action = self.policy_tf.step(prev_obs).flatten()
+                    action = [np.array([a]) for a in action]


Why not remove the flatten instead?

Sounds good
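Why dropping the flatten is the simpler fix, as a small sketch that assumes the policy's step returns one action row per observation, i.e. an array of shape (n_envs, action_dim); the array below is a stand-in for that output, not a call to the real policy. Flattening merges the env and action axes, so wrapping each scalar only lines up with the envs when action_dim == 1.

```python
import numpy as np

n_envs, action_dim = 3, 2
# Stand-in for self.policy_tf.step(prev_obs) on a batch of observations:
# one row per env, one column per action dimension.
batched_action = np.arange(n_envs * action_dim, dtype=np.float32).reshape(n_envs, action_dim)

# Approach from the diff: flatten, then wrap every scalar.
wrapped = [np.array([a]) for a in batched_action.flatten()]
print(len(wrapped))  # 6 entries: env and action dims are mixed together

# Without flatten: keep the batch axis and index row i for env i.
print(batched_action.shape)  # (3, 2)
print(batched_action[1])  # [2. 3.]  (action for env 1)
```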

araffin · 2019-10-03T16:49:55Z

                    # Add noise to the action, as the policy
                    # is deterministic, this is required for exploration
                    if self.action_noise is not None:
                        action = np.clip(action + self.action_noise(), -1, 1)


The noise should be different for each env.
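A minimal sketch of the issue, assuming Gaussian noise drawn with numpy's Generator API (a swapped-in stand-in for the configured action_noise object): a single draw of shape (action_dim,) is broadcast to every row of the action batch, so all envs receive the identical perturbation, whereas a draw of shape (n_envs, action_dim) perturbs each env independently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_envs, action_dim, sigma = 4, 2, 0.1
actions = np.zeros((n_envs, action_dim))

# One draw broadcast over the batch: every env gets the same perturbation.
shared = np.clip(actions + rng.normal(0.0, sigma, size=action_dim), -1, 1)

# One draw per env: each row is perturbed independently.
per_env = np.clip(actions + rng.normal(0.0, sigma, size=(n_envs, action_dim)), -1, 1)

print(np.allclose(shared, shared[0]))  # True: all rows are identical
```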

araffin · 2019-10-03T16:50:58Z

+                    episode_rewards[-1] += reward[i]
+                    if done[i]:
+                        if self.action_noise is not None:
+                            self.action_noise.reset()


Same remark as before: I think you should have one action_noise object per env. Maybe we need to create a wrapper for that, or modify the noise class to handle it better.
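One possible shape for the wrapper suggested above, as a sketch only: VecActionNoise and OUNoise are hypothetical classes written for illustration (OUNoise follows the usual Ornstein-Uhlenbeck update, but it is not the stable-baselines class). The wrapper holds one independent noise process per env and can reset just the envs whose episode ended, instead of resetting a single shared process whenever any env is done.

```python
import numpy as np


class OUNoise:
    """Minimal Ornstein-Uhlenbeck process (sketch, not the library class)."""

    def __init__(self, action_dim, sigma=0.2, theta=0.15, dt=1e-2, rng=None):
        self.mu = np.zeros(action_dim)
        self.sigma, self.theta, self.dt = sigma, theta, dt
        self.rng = rng if rng is not None else np.random.default_rng()
        self.reset()

    def __call__(self):
        # x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1)
        self.state = (self.state + self.theta * (self.mu - self.state) * self.dt
                      + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape))
        return self.state

    def reset(self):
        self.state = np.zeros_like(self.mu)


class VecActionNoise:
    """Hypothetical wrapper: one independent noise process per env."""

    def __init__(self, make_noise, n_envs):
        self.noises = [make_noise() for _ in range(n_envs)]

    def __call__(self):
        # Stack one sample per env -> shape (n_envs, action_dim)
        return np.stack([noise() for noise in self.noises])

    def reset(self, indices=None):
        # Reset only the given envs (all envs if indices is None)
        if indices is None:
            indices = range(len(self.noises))
        for i in indices:
            self.noises[i].reset()


rng = np.random.default_rng(0)
vec_noise = VecActionNoise(lambda: OUNoise(action_dim=2, rng=rng), n_envs=3)
for _ in range(10):
    sample = vec_noise()
dones = np.array([False, True, False])
vec_noise.reset(np.flatnonzero(dones))  # only env 1 is reset
print(sample.shape)  # (3, 2)
```

Modifying the noise class itself to carry a leading n_envs axis would be the other option the comment mentions; the wrapper has the advantage of leaving the existing noise classes untouched.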

araffin · 2019-10-03T16:52:25Z

+                    if step % self.train_freq == 0:
+                        mb_infos_vals = []
+                        # Update policy, critics and target networks
+                        for grad_step in range(self.gradient_steps):


By putting the update inside the for loop that is used to store new samples, it seems that you are changing the algorithm.
Also, be careful with step % train_freq when you don't increment step by 1.
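A sketch of one way to keep storage and training separate, under stated assumptions: the buffer and the network updates are replaced by counters, and "run gradient_steps updates for every train_freq env transitions" is an assumed interpretation of the hyperparameters, not necessarily what the final PR does. All n_envs transitions of a VecEnv step are stored first; the due updates run afterwards, outside the storage loop.

```python
def run(total_timesteps, n_envs, train_freq, gradient_steps):
    """Loop-structure sketch; storage and training are stubs.

    Returns (number of stored transitions, number of gradient updates).
    """
    stored, grad_updates = 0, 0
    num_timesteps = 0       # counts single-env transitions
    steps_since_train = 0
    while num_timesteps < total_timesteps:
        # 1) One VecEnv step: store the transition of every env first.
        for _ in range(n_envs):
            stored += 1  # stand-in for replay_buffer.add(obs[i], action[i], ...)
        num_timesteps += n_envs
        steps_since_train += n_envs
        # 2) Only then, outside the storage loop, run the updates that are due.
        while steps_since_train >= train_freq:
            steps_since_train -= train_freq
            for _ in range(gradient_steps):
                grad_updates += 1  # stand-in for one policy/critic/target update
    return stored, grad_updates


print(run(total_timesteps=1000, n_envs=4, train_freq=100, gradient_steps=100))  # (1000, 1000)
```

Counting in env transitions keeps the ratio of gradient steps to collected samples independent of n_envs.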

OK, I will look into the algorithm part. As for step % train_freq, I actually do increment step by 1 (at the end of the inner for-loop), so that should be fine.

OK then, but the for step in range(0, total_timesteps, self.n_envs): loop header was misleading.

I see your point. Alright, I will clarify that for-loop expression.
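The reviewer's warning in numbers, as a small sketch (trigger_steps is a hypothetical helper): if step advanced by n_envs, as the range(0, total_timesteps, self.n_envs) header suggests, then step % train_freq == 0 would only fire at multiples of lcm(n_envs, train_freq). With n_envs = 8 and train_freq = 100 that is every 200 steps, so training would run half as often as intended.

```python
import math


def trigger_steps(total_timesteps, n_envs, train_freq):
    """Values of step where step % train_freq == 0 fires if step advances by n_envs."""
    return [step for step in range(0, total_timesteps, n_envs) if step % train_freq == 0]


print(trigger_steps(1000, 8, 100))  # [0, 200, 400, 600, 800]
print(len(trigger_steps(1000, 1, 100)))  # 10 when step advances by 1
# Effective training period: lcm(n_envs, train_freq)
print(8 * 100 // math.gcd(8, 100))  # 200
```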

Update td3.py (commit 6a59752)

araffin reviewed Oct 3, 2019 (comment thread on stable_baselines/td3/td3.py, now outdated)

Update td3.py (commit 4eee71d)

araffin changed the title from "Update td3.py" to "VecEnc Support TD3" on Oct 3, 2019

Removed Runner (commit 6a012bf)

araffin reviewed Oct 3, 2019

araffin mentioned this pull request on Feb 4, 2020 in [question] [feature request] DDPG VecEnv support #679 (Closed)
