Description
Possible Issue
In the bandit feedback, n_actions is set to int(self.action.max() + 1). This does not raise any error in the above code, but only because it assumes that the logs generated by the policy cover all possible actions.
To be more precise, I think n_actions should be given explicitly rather than inferred from the log data.
If that assumption does not hold, the above code might raise an error.
For example, if there are 1000 possible actions but only actions 0 to 998 appear in bandit_feedback, n_actions is inferred as 999. If the policy then selects action 999, this can raise an out-of-index error.
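To make the failure concrete, here is a minimal NumPy sketch. The function to_action_dist is a hypothetical stand-in written only for this illustration, not obp's convert_to_action_dist; it assumes the action distribution is a one-hot array of shape (n_rounds, n_actions, len_list).

```python
import numpy as np


def to_action_dist(n_actions: int, selected_actions: np.ndarray) -> np.ndarray:
    """Hypothetical one-hot converter (stand-in for illustration only).

    selected_actions has shape (n_rounds, len_list); the result has shape
    (n_rounds, n_actions, len_list) with a 1 at each selected action.
    """
    n_rounds, len_list = selected_actions.shape
    action_dist = np.zeros((n_rounds, n_actions, len_list))
    for pos in range(len_list):
        # NumPy raises IndexError here if any selected action is >= n_actions.
        action_dist[np.arange(n_rounds), selected_actions[:, pos], pos] = 1.0
    return action_dist


# 1000 actions exist, but the log only contains actions 0..998.
logged_actions = np.arange(999)
inferred_n_actions = int(logged_actions.max() + 1)  # 999, one short of 1000

# The policy selects action 999 in the second round.
selected = np.array([[5], [999]])

try:
    to_action_dist(n_actions=inferred_n_actions, selected_actions=selected)
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)

# With the true action-space size, the same call succeeds.
dist = to_action_dist(n_actions=1000, selected_actions=selected)
print(dist.shape)  # (2, 1000, 1)
```

The inferred size is off by one exactly when the highest-indexed action never appears in the log, so the error depends on the data rather than on the code.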
Idea
- BanditFeedback data is given n_actions explicitly.
  Rather than (lines 78 to 81 in 55ab57e):

      @property
      def n_actions(self) -> int:
          """Number of actions."""
          return int(self.action.max() + 1)

- Use n_actions directly in convert_to_action_dist.
  Rather than (zr-obp/obp/simulator/simulator.py, lines 75 to 78 in 55ab57e):

      action_dist = convert_to_action_dist(
          n_actions=bandit_feedback["action"].max() + 1,
          selected_actions=np.array(selected_actions_list),
      )
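A minimal sketch of both ideas together, assuming a simple container that carries n_actions as a field and a converter that checks selected actions against it. LoggedFeedback and to_action_dist are hypothetical names for illustration only; they are not obp's BanditFeedback or convert_to_action_dist.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LoggedFeedback:
    """Hypothetical feedback container with an explicit action-space size."""

    n_actions: int      # size of the full action space, supplied by the caller
    action: np.ndarray  # logged action indices, one per round

    def __post_init__(self) -> None:
        # Check the log against the declared action space instead of
        # deriving the action space from the log.
        if self.action.size and (
            self.action.min() < 0 or self.action.max() >= self.n_actions
        ):
            raise ValueError("logged actions must lie in [0, n_actions)")


def to_action_dist(n_actions: int, selected_actions: np.ndarray) -> np.ndarray:
    """Hypothetical one-hot converter: (n_rounds, len_list) -> (n_rounds, n_actions, len_list)."""
    # Fail with a clear message instead of a bare IndexError.
    if selected_actions.min() < 0 or selected_actions.max() >= n_actions:
        raise ValueError("selected actions must lie in [0, n_actions)")
    n_rounds, len_list = selected_actions.shape
    action_dist = np.zeros((n_rounds, n_actions, len_list))
    for pos in range(len_list):
        action_dist[np.arange(n_rounds), selected_actions[:, pos], pos] = 1.0
    return action_dist


# 1000 actions exist, but the log only contains actions 0..998.
feedback = LoggedFeedback(n_actions=1000, action=np.arange(999))

# A policy selecting action 999 no longer causes an out-of-index error.
selected = np.array([[5], [999]])
dist = to_action_dist(n_actions=feedback.n_actions, selected_actions=selected)
print(dist.shape)  # (2, 1000, 1)
```

Passing the size explicitly turns the implicit "the log covers every action" assumption into a checked precondition, so a genuinely invalid action fails with a descriptive ValueError instead of an indexing error deep inside the conversion.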