Environment requirements¶
The environment plays a crucial role in the learning procedure: a well-trained agent requires an adequate environment. To be compatible with the implementation of the marl package, the environment must follow a few simple rules.
OpenAI Gym-based environment¶
The MARL-API project builds on the OpenAI Gym project (https://gym.openai.com/). To be compatible with our implementation, the environment must inherit from or reimplement the following methods (specific to OpenAI Gym environments); a minimal sketch follows the list:
reset()
: Reset the environment to an initial state. This method is called when starting a new episode and returns an observation.

step(action)
: Update the state of the environment given an action (possibly a joint action for multi-agent training). The output of this method consists of four elements: next observation(s), reward(s), boolean(s) indicating whether the episode is done, and extra information.

render()
: Display the environment (only used for testing, with parameter display=True).
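For illustration, here is a minimal sketch of an environment exposing these three methods. The class name, state, and dynamics are invented for the example and are not part of marl:

```python
import gym


class MySimpleEnv(gym.Env):
    """Illustrative single-agent environment (hypothetical)."""

    def reset(self):
        # Start a new episode and return the initial observation.
        self.state = 0
        return self.state

    def step(self, action):
        # Update the state given the action.
        self.state += action
        observation = self.state
        reward = 1.0 if self.state >= 3 else 0.0
        done = self.state >= 3
        info = {}
        return observation, reward, done, info

    def render(self, mode="human"):
        # Display the environment (used for testing with display=True).
        print("state:", self.state)
```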
Moreover, it is recommended that environments have two attributes:
observation_space
(gym.Spaces): Defines the observation space of the agent(s).

action_space
(gym.Spaces): Defines the action space of the agent(s).
At the moment, only Discrete and Box spaces are supported.
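As an illustration, these attributes are typically declared in the constructor. The sizes and bounds below are arbitrary:

```python
import gym
import numpy as np
from gym.spaces import Box, Discrete


class MyEnvWithSpaces(gym.Env):

    def __init__(self):
        # Four discrete actions; observations are 8-dimensional real vectors.
        self.action_space = Discrete(4)
        self.observation_space = Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
```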
Markov Games formalism¶
The MARL-API project is based on the formalism of Markov games. Thus, in the multi-agent case, we consider that each agent perceives its own reward, and we do not consider an explicit communication channel.
Warning
The Markov games formalism implies that the next_observation, the reward, and the is_done values returned by the step function of the environment (see above) are of type list, not single values.
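A hedged sketch of what this looks like in practice: a two-agent environment whose reset and step methods return one entry per agent. The class name, spaces, and dynamics are invented for the example:

```python
import gym
from gym.spaces import Discrete


class TwoAgentEnv(gym.Env):
    """Illustrative two-agent Markov game (hypothetical)."""

    n_agents = 2

    def __init__(self):
        self.observation_space = Discrete(10)
        self.action_space = Discrete(2)
        self.state = 0

    def reset(self):
        self.state = 0
        # One observation per agent, as a list.
        return [self.state] * self.n_agents

    def step(self, joint_action):
        self.state += sum(joint_action)
        done = self.state >= 4
        # Lists, not single values: one entry per agent.
        next_observations = [self.state] * self.n_agents
        rewards = [float(a) for a in joint_action]
        dones = [done] * self.n_agents
        return next_observations, rewards, dones, {}
```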
To work with other formalisms such as Dec-POMDP or Dec-POMDP-Com, the environment must be adapted to fit the requirements above. For instance, transforming a Dec-POMDP into a Markov game consists in giving each and every agent the common reward, as in the wrapper sketched below.
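A minimal sketch of such an adaptation, assuming the wrapped environment returns a single common reward and a single done flag; the wrapper name and the n_agents parameter are invented for the example:

```python
import gym


class MarkovGameWrapper(gym.Wrapper):
    """Replicates a Dec-POMDP's common reward for every agent (sketch)."""

    def __init__(self, env, n_agents):
        super().__init__(env)
        self.n_agents = n_agents

    def step(self, joint_action):
        observations, common_reward, done, info = self.env.step(joint_action)
        # Give each and every agent the common reward, as lists.
        rewards = [common_reward] * self.n_agents
        dones = [done] * self.n_agents
        return observations, rewards, dones, info
```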