Objectives¶
Objectives are pipes that take in the reward signal generated by the
environment and transform it into an objective for the agent to optimize
toward. This enables more complex training behaviour, but is not always
necessary. q2 comes with a Passthrough
objective that simply passes the
reward that the environment yielded directly to the agent. Passthrough
is
used by default unless you override it with your own.
Run:
q2 generate objective my_objective
to start working on your own objective. You need to implement the following interface:
-
class
q2.objectives.
Objective
¶ An abstract base class (interface) that specifies what the environment must implement. You will fill in your own definitions for each of these methods.
-
reset
()¶ This is called by the training regimen before each episode. It gives you the opportunity to wipe any accumulated state.
-
step
(state, action, reward) → float¶ Accepts a
state
andreward
from the environment, anaction
from the agent and returns the new objective for the agent to learn.
-