Open
Description
Hello!
I was checking your answer for exercise 3.29, and I think it might contain a mistake. The final equation averages over all actions, weighted by the policy, whereas I think it should take the maximum over all actions, which would remove the policy function from the equation.
I believe it is a mistake because the backup diagram for q* (page 64) shows a maximum over actions rather than an average.
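
To make the difference concrete, here is a sketch of the two forms as I understand them. It assumes the book's standard notation with the four-argument dynamics function p(s', r | s, a); the answer in the repository may be written with different functions, but the same swap should apply.

```latex
% Policy-dependent form: averages over next actions a' using \pi
q_\pi(s,a) = \sum_{s',r} p(s',r \mid s,a)
  \Big[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s',a') \Big]

% Optimal form: the average over a' is replaced by a maximum, so \pi disappears
q_*(s,a) = \sum_{s',r} p(s',r \mid s,a)
  \Big[ r + \gamma \max_{a'} q_*(s',a') \Big]
```

The only change between the two is the inner term: the sum weighted by \pi(a' | s') becomes a max over a'.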
Looking forward to hearing from you!