Objectives

Objectives are pipes that take in the reward signal generated by the environment and transform it into an objective for the agent to optimize toward. This enables more complex training behaviour, but is not always necessary. q2 comes with a Passthrough objective that simply passes the reward that the environment yielded directly to the agent. Passthrough is used by default unless you override it with your own.

Run:

q2 generate objective my_objective

to start working on your own objective. You need to implement the following interface:

class q2.objectives.Objective

An abstract base class (interface) that specifies what the environment must implement. You will fill in your own definitions for each of these methods.

reset()

This is called by the training regimen before each episode. It gives you the opportunity to wipe any accumulated state.

step(state, action, reward) → float

Accepts a state and reward from the environment, an action from the agent and returns the new objective for the agent to learn.