
Ex4.7-A possible bug #82

@khbalhandawi


In your Python file Ex4.7-A.py, I think line 51 should read:

temp[((value_A_Changed, value_B_Changed),reward)] = temp.get( ((value_A_Changed, value_B_Changed),reward), 0 )

instead of:

temp[((value_A_Changed, value_B_Changed),reward)] = temp.get( (value_A_Changed, value_B_Changed), 0 )

The second line above always falls back to the default 0, because the key (value_A_Changed, value_B_Changed) never exists in temp: entries are stored under the key ((value_A_Changed, value_B_Changed), reward).
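A minimal sketch of the effect, assuming (since the quoted line stops at the `get` call) that the real line 51 adds a probability term to the looked-up value. The names `outcomes`, `accumulate_buggy` and `accumulate_fixed` are hypothetical and not taken from Ex4.7-A.py. When the same (next state, reward) pair is reached more than once, the mismatched key throws away the probability stored earlier:

```python
# Hypothetical (next_state, reward, probability) triples in which the same
# (next_state, reward) pair is reached by two different outcomes.
outcomes = [
    ((1, 2), 10, 0.2),
    ((1, 2), 10, 0.3),
    ((0, 3), 20, 0.5),
]


def accumulate_buggy(outcomes):
    temp = {}
    for (a, b), reward, prob in outcomes:
        # Writes under ((a, b), reward) but reads under (a, b): the read
        # always misses, so any probability stored earlier is overwritten.
        temp[((a, b), reward)] = temp.get((a, b), 0) + prob
    return temp


def accumulate_fixed(outcomes):
    temp = {}
    for (a, b), reward, prob in outcomes:
        # Read and write with the same key so probabilities accumulate.
        key = ((a, b), reward)
        temp[key] = temp.get(key, 0) + prob
    return temp


print(round(sum(accumulate_buggy(outcomes).values()), 10))  # 0.8
print(round(sum(accumulate_fixed(outcomes).values()), 10))  # 1.0
```

With the mismatched key the probabilities no longer sum to 1, which would distort the expected returns computed from temp.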
I reran the script with this change and could not reproduce the book's answer. I am attaching the optimal policy map that I got:

[Attached image: pi_4]
