Open
Description
In your pseudocode for computing q*, if π is deterministic (as stated in the initialization and in the pseudocode given for v*), then you don't need to loop over all a ∈ A in step 2, and you don't need a weighted sum over all a' in the Q(s,a) update: with a deterministic policy, the sum over a' reduces to the single term Q(s', π(s')).
Likewise, in step 3 you shouldn't loop over a, because old-action is given directly by the deterministic policy: old-action ← π(s).
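To make the point concrete, here is a rough sketch of what I mean, not a drop-in replacement for your pseudocode. The names `q_backup` and `improve_state` and the tabular layout `P[s][a][s2]` / `R[s][a][s2]` are my own assumptions for illustration; rewards are taken to depend only on (s, a, s').

```python
def q_backup(P, R, Q, pi, s, a, gamma=0.9):
    """One expected backup of Q(s, a) under a deterministic policy pi.

    P[s][a][s2] is the transition probability and R[s][a][s2] the reward
    (hypothetical layout); pi[s] is the single action chosen in state s.
    """
    total = 0.0
    for s2 in range(len(Q)):
        # Deterministic pi: the weighted sum over a' collapses to Q[s2][pi[s2]].
        total += P[s][a][s2] * (R[s][a][s2] + gamma * Q[s2][pi[s2]])
    return total


def improve_state(Q, pi, s):
    """Greedy improvement for one state; returns True if the action is unchanged."""
    old_action = pi[s]  # read directly from the policy, no loop over a
    pi[s] = max(range(len(Q[s])), key=lambda a: Q[s][a])
    return old_action == pi[s]
```

The only remaining loop over actions is the argmax inside the improvement step; `old_action` itself is a single lookup.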
Thanks for considering this fix ;) Have a nice day!