Environment requirements

The environment is a crucial part of the learning procedure: a well-trained agent requires an adequate environment. To fit the implementation of the marl package, the environment must follow a few simple rules.

OpenAI Gym-based environment

The MARL-API project builds on the OpenAI Gym project https://gym.openai.com/ . To be compatible with our implementation, the environment must inherit from or reimplement the following methods (specific to OpenAI Gym environments), as illustrated by the sketch after this list:

  • reset() : Resets the environment to an initial state. This method is called when starting a new episode and returns an observation.

  • step(action) : Updates the state of the environment given an action (possibly a joint action for multi-agent training). The output of this method consists of four elements: the next observation(s), the reward(s), boolean(s) indicating whether the episode is done, and extra information.

  • render() : Displays the environment (only used for testing, with parameter display=True)
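As an illustration, here is a minimal sketch of a two-agent environment exposing these three methods. The class name TwoAgentGridEnv and its dynamics are purely hypothetical, not part of the marl package:

    import gym
    import numpy as np

    class TwoAgentGridEnv(gym.Env):
        """Hypothetical two-agent environment following the interface above."""

        N_AGENTS = 2

        def __init__(self):
            # Each agent observes its own cell index on a 5x5 grid
            self.observation_space = gym.spaces.Discrete(25)
            # Five possible moves per agent
            self.action_space = gym.spaces.Discrete(5)
            self.state = None

        def reset(self):
            # Start a new episode and return one observation per agent
            self.state = np.zeros(self.N_AGENTS, dtype=int)
            return [int(s) for s in self.state]

        def step(self, joint_action):
            # joint_action holds one action per agent
            self.state = (self.state + np.asarray(joint_action)) % 25
            next_obs = [int(s) for s in self.state]
            rewards = [1.0 if s == 24 else 0.0 for s in self.state]
            dones = [bool(s == 24) for s in self.state]
            return next_obs, rewards, dones, {}

        def render(self):
            # Text rendering, used when testing with display=True
            print("agent positions:", self.state)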

Moreover, it is recommended that environments have two attributes:

  • observation_space (gym.spaces.Space): Defines the observation space of the agent(s)

  • action_space (gym.spaces.Space): Defines the action space of the agent(s)

At present, only Discrete and Box spaces are supported.
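For example, the two attributes can be declared with either space type; the shapes and bounds below are arbitrary:

    import gym
    import numpy as np

    # Continuous observations in [-1, 1]^4, three discrete actions
    observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
    action_space = gym.spaces.Discrete(3)

    # Spaces can be sampled and tested for membership
    obs = observation_space.sample()
    assert observation_space.contains(obs)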

Markov Games formalism

The MARL-API project is based on the formalism of Markov games. Thus, in the multi-agent case, we consider that each agent perceives its own reward, and no explicit communication channel is considered.

Warning

The Markov games formalism implies that the next_observation, the reward, and the is_done values returned by the environment's step() method (see above) are lists rather than single values.
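For instance, with the hypothetical TwoAgentGridEnv sketched earlier, every element of the step() output is a list with one entry per agent:

    env = TwoAgentGridEnv()        # hypothetical environment sketched above
    obs = env.reset()              # e.g. [0, 0], one observation per agent
    next_obs, rewards, dones, info = env.step([1, 2])
    assert isinstance(next_obs, list)
    assert isinstance(rewards, list)
    assert isinstance(dones, list)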

In order to work with other formalisms such as Dec-POMDP or Dec-POMDP-Com, the environment needs to be adapted to fit the above requirements. For instance, transforming a Dec-POMDP into a Markov game consists in giving the common reward to each and every agent, as in the wrapper sketch below.
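A minimal sketch of such an adaptation, assuming a gym.Wrapper subclass around an environment whose step() returns a single common reward and done flag; the name SharedRewardWrapper is hypothetical:

    import gym

    class SharedRewardWrapper(gym.Wrapper):
        """Hypothetical wrapper turning a common-reward (Dec-POMDP style)
        environment into the Markov game convention used here."""

        def __init__(self, env, n_agents):
            super().__init__(env)
            self.n_agents = n_agents

        def reset(self):
            return self.env.reset()

        def step(self, joint_action):
            # The wrapped environment returns one common reward and done flag
            next_obs, reward, done, info = self.env.step(joint_action)
            # Broadcast the common reward and termination flag to every agent
            rewards = [reward] * self.n_agents
            dones = [done] * self.n_agents
            return next_obs, rewards, dones, info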