Environments¶
Environments evolve over time as the agent interacts with it, typically according to predetermined rules so that the environment presents consistent behaviour. They may also define end conditions that cause the episode to end.
q2 comes with a set of environments from OpenAI Gym and OpenAI Retro. On the command line, run:
q2 train random --env gym.CartPole-v1 --episodes 10 --render
to see OpenAI Gym’s cart-pole environment in action.
Any OpenAI gym environment can be referenced from the command line tool as
gym.<environment>
, and similarly any OpenAI Retro can be specified as
retro.<game>
. Note that some of the Retro environments require special
ROMs [1].
However, eventually you may want to design your own training environment. To
do so, you need to make a Python class that inherits from the abstract base
class q2.environments.Environment
, which means that you will implement
the following interface:
-
class
q2.environments.
Environment
¶ An abstract base class (interface) that specifies what the environment must implement. You will fill in your own definitions for each of these methods.
-
reset
() → state¶ Reset the environment to its initial state and return it.
-
step
(action) → next_state, reward, done, info¶ Given the agent’s choice of action, move the environment forward one time step and return a 4-tuple that includes the new state, the reward as a
float
, whether the environment has entered an end state as abool
, and an arbitrary infodict
which may be used to output debugging information.The agents action must match the spec in
action_space
and the new state returned must match the spec inobservation_space
.
-
render
()¶ Optional to implement, ignore it if you don’t care about seeing the environment on the screen. If you do want to implement it, a good practice is to create a new window the first time
render
is called, drawing to it on every subsequent call and then closing the window from withinclose()
.
-
close
()¶ Shutdown the environment, freeing any resources (e.g. rendering windows).
-
observation_space, action_space
These attributes must be set to instances of
gym.Space
. They provide a specification of the allowable actions and environment states, which agents may use to define their internal models.All states returned by the environment must match the spec in
observation_space
.
-
Footnotes
[1] | See the https://github.com/openai/retro#roms for details. |