Regimens

Regimens are training algorithms. At its core, a training algorithm is responsible for setting up the environment and the agent and successively calling step on each one, while passing actions and states back and forth as appropriate. q2’s Regimen class implements this basic funtionality and provides hooks for you to customize any other behaviour as desired.

Run:

q2 generate regimen my_regimen

to generate a new regimen from template, then fill in the implementations of whichever event hooks you need.

class q2.regimens.Regimen
before_training()
after_training()

Called once at the start and end of training respectively.

before_epoch(epoch:int)
after_epoch(epoch:int)

Called before and after each epoch.

before_episode(episode:int)
after_episode(episode:int)

Called before and after each episode.

before_step(step)
after_step(step)

Called before and after each step of the environment and agent.

on_error(step, exception)

Called when an exception occurs. If this method returns True then propagation of the exception is stopped, which can be useful when certain exceptions are expected to occur.

plugins() → List[Plugin]

A list of plugins to be used by your regimen.

log(msg:str)

Add a message to the logging output for the current timestep. This method is implemented by q2 and provided as a convenience, you should not override it with your own implementation.

agent: Agent
env: Environment
sess: tf.Session

The tensorflow session.

objective: Objective
action_space

The action_space of the environment.

observation_space

The observation_space of the environment.

agent_constructor: Type[Agent]

Callable that constructs a new agent.

env_maker: Maker

Callable that creates a new environment.

Plugins

When implementing your own regimens, you might find that you want to re-use the same morsels of useful behaviour in multiple different regimens. You can achieve this by implementing a Plugin. For example, q2 comes with a DisplayFramerate plugin that lets any regimen display a nice framerate message in the logs without polluting the core logic of the regimen.

The interface of a plugin is identical to that of a Regimen except that each method takes as first argument a Regimen which it can inspect, interact with or modify. Note that the plugin event hooks are always called before the regimen’s hooks, so the regimen always has final say over any state before the next step is run. You should write your plugins to account for the possibility that the regimen itself might change some state before the next event happens.

You can use a plugin by calling adding it to the list returned by regimen.plugins().