.. _environment:

Environment requirements
========================

The environment is crucial to the learning procedure: a well-trained agent requires an adequate environment. In order to fit the implementation of the **marl** package, the environment must follow some simple rules.

OpenAI Gym based environment
----------------------------

The MARL-API project builds on the OpenAI Gym project (https://gym.openai.com/). To be compatible with our implementation, the environment must inherit or reimplement the following methods (specific to OpenAI Gym environments):

* ``reset()`` : Reset the environment to an initial state. This method is called when starting a new episode and returns an observation.
* ``step(action)`` : Update the state of the environment given an action (possibly a joint action for multi-agent training). The output of this method consists of four elements: next observation(s), reward(s), boolean(s) indicating whether the episode is done, and extra information.
* ``render()`` : Display the environment (only used for testing with the parameter ``display=True``).

Moreover, it is recommended that environments have two attributes:

* ``observation_space`` (``gym.spaces``): Defines the observation space of the agent(s)
* ``action_space`` (``gym.spaces``): Defines the action space of the agent(s)

At the moment, only ``Discrete`` and ``Box`` spaces are admitted. A minimal multi-agent environment following these rules is sketched at the end of this section.

Markov Games formalism
----------------------

The MARL-API project is based on the Markov game formalism. Thus, in the multi-agent case, we consider that each agent perceives its own reward, and no explicit communication channel is considered.

.. warning:: The Markov game formalism implies that the *next_observation*, the *reward* and the *is_done* returned by the ``step`` function of the environment (see above) are of type ``list``, not single values.

In order to work with other formalisms such as **Dec-POMDP** or **Dec-POMDP-Com**, the environment must be adapted to fit the above requirements. For instance, transforming the **Dec-POMDP** formalism into a **Markov game** consists in giving the common reward to each and every agent (see the wrapper sketch at the end of this section).
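
The sketch below shows a minimal two-agent environment respecting the above requirements. The class name ``TwoAgentTargetEnv`` and its toy dynamics are purely illustrative (they are not part of **marl** or Gym); the point is the shape of the return values: ``reset()`` returns one observation per agent, and ``step()`` returns lists of observations, rewards and done flags plus an info dictionary.

.. code-block:: python

    # Illustrative two-agent environment; the class name and toy dynamics are
    # hypothetical and only demonstrate the expected interface.
    import numpy as np
    import gym
    from gym.spaces import Discrete, Box


    class TwoAgentTargetEnv(gym.Env):
        """Two agents move on the segment [0, 10] and try to reach a target position."""

        def __init__(self):
            self.n_agents = 2
            # Spaces shared by both agents (only Discrete and Box spaces are admitted).
            self.observation_space = Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32)
            self.action_space = Discrete(3)  # 0: left, 1: stay, 2: right
            self.target = 5.0
            self.positions = None
            self.t = 0

        def reset(self):
            # Return one observation per agent (a list, as required by the Markov game formalism).
            self.positions = np.array([0.0, 10.0], dtype=np.float32)
            self.t = 0
            return [np.array([p], dtype=np.float32) for p in self.positions]

        def step(self, joint_action):
            # ``joint_action`` is a list containing one action per agent.
            moves = np.array([a - 1 for a in joint_action], dtype=np.float32)  # {0,1,2} -> {-1,0,+1}
            self.positions = np.clip(self.positions + moves, 0.0, 10.0)
            self.t += 1

            observations = [np.array([p], dtype=np.float32) for p in self.positions]
            # One reward and one done flag per agent, plus an info dictionary.
            rewards = [-abs(float(p) - self.target) for p in self.positions]
            dones = [bool(abs(p - self.target) < 0.5) or self.t >= 50 for p in self.positions]
            info = {}
            return observations, rewards, dones, info

        def render(self, mode="human"):
            print("t={} positions={}".format(self.t, self.positions))

An agent can then be trained with the usual reset/step loop; each element of the lists returned by ``step`` corresponds to one agent.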
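
The following wrapper sketches the **Dec-POMDP** adaptation mentioned above: the common reward (and termination flag) returned by the wrapped environment is duplicated for every agent. The wrapper name ``CommonRewardToMarkovGame`` and the assumed single-reward interface of the wrapped environment are illustrative assumptions, not part of the **marl** package.

.. code-block:: python

    # Hypothetical wrapper turning a Dec-POMDP-style environment (single shared
    # reward and done flag) into a Markov-game-style one (one entry per agent).
    class CommonRewardToMarkovGame:
        def __init__(self, env, n_agents):
            self.env = env
            self.n_agents = n_agents
            self.observation_space = env.observation_space
            self.action_space = env.action_space

        def reset(self):
            return self.env.reset()

        def step(self, joint_action):
            observations, common_reward, done, info = self.env.step(joint_action)
            # Give the common reward and the termination flag to each and every agent.
            rewards = [common_reward] * self.n_agents
            dones = [done] * self.n_agents
            return observations, rewards, dones, info

        def render(self, **kwargs):
            return self.env.render(**kwargs)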