This repository was archived by the owner on Jan 15, 2026. It is now read-only.

Curiosity-driven Exploration by Self-supervised Prediction

  • More generally, curiosity is a way of learning new skills which might come in handy for pursuing rewards in the future.
  • Measuring “novelty” requires a statistical model of the distribution of the environmental states, whereas measuring prediction error/uncertainty requires building a model of environmental dynamics that predicts the next state ($s_{t+1}$) given the current state ($s_t$) and the action ($a_t$) executed at time $t$.
  • This work belongs to the broad category of methods that generate an intrinsic reward signal based on how hard it is for the agent to predict the consequences of its own actions, i.e. predict the next state given the current state and the executed action.
  • That is, instead of making predictions in the raw sensory space (e.g. pixels), we transform the sensory input into a feature space where only the information relevant to the action performed by the agent is represented. We learn this feature space using self-supervision – training a neural network on a proxy inverse dynamics task of predicting the agent’s action given its current and next states.
  • We then use this feature space to train a forward dynamics model that predicts the feature representation of the next state, given the feature representation of the current state and the action. We provide the prediction error of the forward dynamics model to the agent as an intrinsic reward to encourage its curiosity.
  • In our opinion, curiosity has two other fundamental uses. Curiosity helps an agent explore its environment in the quest for new knowledge (a desirable characteristic of exploratory behavior is that it should improve as the agent gains more knowledge). Further, curiosity is a mechanism for an agent to learn skills that might be helpful in future scenarios.
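
The pipeline described in the excerpts above (encode states into a feature space, train an inverse dynamics model on those features, then use the forward model's prediction error as the intrinsic reward) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the layer sizes, the single-layer linear models, the random untrained weights, the scaling factor `eta`, and all function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
STATE_DIM, FEAT_DIM, N_ACTIONS = 8, 4, 3

# Random linear weights stand in for trained networks (assumption).
W_enc = rng.normal(scale=0.1, size=(STATE_DIM, FEAT_DIM))
W_inv = rng.normal(scale=0.1, size=(2 * FEAT_DIM, N_ACTIONS))
W_fwd = rng.normal(scale=0.1, size=(FEAT_DIM + N_ACTIONS, FEAT_DIM))


def encode(state):
    """Map a raw state vector to the feature space phi(s)."""
    return np.tanh(state @ W_enc)


def inverse_model_loss(state, next_state, action):
    """Cross-entropy for predicting the action from phi(s_t) and phi(s_{t+1})."""
    logits = np.concatenate([encode(state), encode(next_state)]) @ W_inv
    shifted = logits - logits.max()  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[action])


def forward_model(features, action):
    """Predict phi(s_{t+1}) from phi(s_t) and a one-hot encoded action."""
    one_hot = np.eye(N_ACTIONS)[action]
    return np.concatenate([features, one_hot]) @ W_fwd


def intrinsic_reward(state, next_state, action, eta=1.0):
    """Scaled squared error between predicted and actual next-state features."""
    phi_next = encode(next_state)
    phi_pred = forward_model(encode(state), action)
    return 0.5 * eta * float(np.sum((phi_pred - phi_next) ** 2))


# Usage on one made-up transition (s_t, a_t, s_{t+1}).
s_t = rng.normal(size=STATE_DIM)
s_next = rng.normal(size=STATE_DIM)
r_i = intrinsic_reward(s_t, s_next, action=1)
inv_loss = inverse_model_loss(s_t, s_next, action=1)
```

In a full agent, the inverse loss would shape the encoder so that the features keep only action-relevant information, and `r_i` would be added to whatever extrinsic reward the environment provides; both training loops are omitted here.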