[Ex 4.5] Deterministic policy #86

@Jonathan2021

In your pseudocode for computing q*, if π is deterministic (as stated in the initialization and in the pseudocode given for v*), then you don't need to loop over all a∈A in step 2, and you don't need to weight over all a' in the Q(s,a) update.
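To illustrate the point about the weighting over a', here is a minimal sketch on a made-up two-state MDP. The names `P`, `Q`, `pi`, `pi_probs`, `backup_stochastic` and `backup_deterministic` are hypothetical and not taken from the solution's pseudocode. It only shows that, when π is deterministic, the sum over a' weighted by π(a'|s') reduces to a single lookup Q(s', π(s')).

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP: P[(s, a)] lists (probability, next_state, reward).
P: Dict[Tuple[int, int], List[Tuple[float, int, float]]] = {
    (0, 0): [(1.0, 0, 0.0)],
    (0, 1): [(0.5, 0, 1.0), (0.5, 1, 2.0)],
    (1, 0): [(1.0, 0, 0.0)],
    (1, 1): [(1.0, 1, 1.0)],
}
# Arbitrary current action-value estimates.
Q = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}

# The same deterministic policy, written two ways.
pi = {0: 1, 1: 0}                                   # pi[s] = a
pi_probs = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}}  # pi_probs[s][a] = pi(a|s)


def backup_stochastic(Q, P, pi_probs, s, a, gamma):
    """General backup: weights Q(s', a') by pi(a'|s') over every a'."""
    total = 0.0
    for prob, s_next, reward in P[(s, a)]:
        expected_q = sum(
            p_a * Q[(s_next, a_next)]
            for a_next, p_a in pi_probs[s_next].items()
        )
        total += prob * (reward + gamma * expected_q)
    return total


def backup_deterministic(Q, P, pi, s, a, gamma):
    """Deterministic-policy backup: one lookup Q(s', pi(s')), no sum over a'."""
    total = 0.0
    for prob, s_next, reward in P[(s, a)]:
        total += prob * (reward + gamma * Q[(s_next, pi[s_next])])
    return total


det = backup_deterministic(Q, P, pi, s=0, a=1, gamma=0.9)
sto = backup_stochastic(Q, P, pi_probs, s=0, a=1, gamma=0.9)
```

Both functions return the same value for any (s, a) here, since a one-hot π(·|s') picks out exactly one term of the sum.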

Likewise, in step 3 you shouldn't loop over a, because the deterministic policy gives you old-action directly.
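A rough sketch of what step 3 could look like under that reading, assuming a tabular `Q` keyed by `(state, action)` and a policy stored as a dict `pi[s] = a` (both hypothetical names, as is `improve_policy`). The only iteration over actions is the argmax itself; old-action is read once per state.

```python
def improve_policy(Q, pi, states, actions):
    """One policy-improvement sweep for a deterministic policy.

    Returns True if no state's action changed (policy stable).
    """
    policy_stable = True
    for s in states:
        old_action = pi[s]  # single lookup, no loop over a needed
        # Greedy action w.r.t. Q; ties go to the first action in `actions`.
        pi[s] = max(actions, key=lambda a: Q[(s, a)])
        if pi[s] != old_action:
            policy_stable = False
    return policy_stable


# Hypothetical example values.
example_Q = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
example_pi = {0: 0, 1: 0}
first_pass_stable = improve_policy(example_Q, example_pi, [0, 1], [0, 1])
second_pass_stable = improve_policy(example_Q, example_pi, [0, 1], [0, 1])
```

Note that this sketch breaks ties by taking the first maximizing action, which is an assumption on my side and not something the pseudocode specifies.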

Thanks for considering this fix ;) Have a nice day!
