echigot/coevo-games

Agent and Environment Co-evolution

A project on co-evolving populations of agents and environments, using the RL framework Griddly in Python.

Introduction

This GitHub repository contains the classes and scripts needed to run co-evolution on the video game Zelda. It also contains tests that run classic evolution, as well as simple unit tests.

The co-evolution in this project revolves around neuro-evolution: learning is based on evolutionary strategies as opposed to gradient-based algorithms.

The aim of co-evolution here is to encourage complexity and diversity in both the agents' policies and the environments' structure.

General structure of the repo

At the root of the repo you can find some general files, as well as two config files for the games Labyrinth and Zelda. The rest of the resources are divided into 5 folders:

  1. coevo: This is the main folder of this project. It contains all the classes involved in making this code work.

  2. figures: Contains useful figures and plots.

  3. save: Contains some neural networks worth saving. They are stored in a PyTorch format, so they can be written with torch.save() and read back with torch.load().

  4. test: This is an important folder, as it contains many scripts and unit tests showing how the code works.

  5. videos: Contains a few interesting videos of experiment results.

Installation

  1. Install Griddly
  2. Run pip install coevo in a terminal

Agents

An agent in this project represents the character in Zelda, and is driven by a neural network. More precisely, this neural network is made of 3 elements: a convolutional part, a recurrent layer and a final dense layer. Its weights are evolved through a canonical evolutionary strategy, based on the fitness value the agent gets on each run.

At the moment, a run consists of Individual.nb_steps_max = 100 actions. Each action is produced by a call to the associated NN (get_result()) and is composed of an action type and a direction. The input of the NN is the whole map.

The fitness value is computed at the end of each game, and is the sum of the rewards for each step. If a single agent plays more than one game, then the fitness value is the sum of all the game rewards. A population's size depends on the size of the genes: a population of agents has around 50 individuals, and would be larger if the NN were bigger.
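The run and fitness rules above can be sketched as follows. This is a minimal illustration, not the repository's code: `play_one_game`, `agent_fitness`, the `policy` and `step` callables and the early stop on `done` are all assumptions made for the example; only the 100-step limit and the "sum of rewards" rule come from the text.

```python
from typing import Callable, List, Tuple

NB_STEPS_MAX = 100  # mirrors Individual.nb_steps_max from the text

Action = Tuple[int, int]  # (action type, direction)


def play_one_game(policy: Callable[[list], Action],
                  step: Callable[[Action], Tuple[list, float, bool]],
                  first_obs: list,
                  nb_steps_max: int = NB_STEPS_MAX) -> float:
    """Play one game and return the sum of the per-step rewards."""
    obs, total = first_obs, 0.0
    for _ in range(nb_steps_max):
        action = policy(obs)  # stands in for the NN call (get_result())
        obs, reward, done = step(action)
        total += reward
        if done:  # assumption: a finished game ends the run early
            break
    return total


def agent_fitness(game_totals: List[float]) -> float:
    """Fitness over several games: the sum of all the game rewards."""
    return sum(game_totals)
```

Here `step` is a hypothetical wrapper around the game environment that returns the next observation, the reward and a termination flag.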

Environments

An environment corresponds to the map of a Zelda level. In this project, it is generated by a cellular automaton driven by a simple neural network (EnvCell()) in place of a fixed set of rules. That way, it can be evolved with the same algorithm as the agents.

For the time being, the level size is fixed by EnvInd.height and EnvInd.width, which determine the size of the CA. When the CA is updated (evolve()), there are two cases. If the grid is empty, a key (of value 1) is put in the middle of the grid so that the CA has something to grow from. Otherwise, the NN is called on each cell to get its next value, with a map of the closest neighbors as input.
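A minimal sketch of this update step, under several assumptions that the text does not confirm: an empty cell is encoded as 0, the neighborhood is the 3x3 patch around the cell, cells outside the grid count as empty, and all cells are updated synchronously. The `rule` callable stands in for the neural network (EnvCell()), and `neighborhood` is a hypothetical helper.

```python
from typing import Callable, List

Grid = List[List[int]]


def neighborhood(grid: Grid, row: int, col: int) -> List[int]:
    """Return the 3x3 patch around (row, col), padding outside cells with 0."""
    height, width = len(grid), len(grid[0])
    patch = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < height and 0 <= c < width
            patch.append(grid[r][c] if inside else 0)
    return patch


def evolve(grid: Grid, rule: Callable[[List[int]], int]) -> Grid:
    """One CA update: seed an empty grid, otherwise apply the rule per cell."""
    height, width = len(grid), len(grid[0])
    if all(cell == 0 for row in grid for cell in row):
        seeded = [row[:] for row in grid]
        seeded[height // 2][width // 2] = 1  # value 1 is the key, per the text
        return seeded
    # Synchronous update: every new value is computed from the old grid.
    return [[rule(neighborhood(grid, r, c)) for c in range(width)]
            for r in range(height)]
```

Calling `evolve` repeatedly, starting from an all-zero grid of size EnvInd.height by EnvInd.width, would first place the key and then let the rule grow the level around it.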

The encoding on the CA side is ordinal: each cell of the automaton holds a value that represents a game element. On Griddly's side, elements are represented by characters, so a conversion is necessary (env_to_string()). As a side note, the agent is not placed on the map by the CA. It is placed manually at the end of map generation, to make sure that there is one and only one agent.
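The conversion and the manual agent placement could look like the sketch below. The symbol table is hypothetical: only "1 is the key" comes from the text, and the characters, the space-separated row layout and the choice of the first empty cell for the agent are assumptions made for illustration. The repository's env_to_string() may differ on all of these points.

```python
from typing import Dict, List

# Hypothetical ordinal-to-character table; only "1 is the key" is from the text.
SYMBOLS: Dict[int, str] = {0: ".", 1: "+", 2: "w", 3: "g", 4: "3"}
AGENT = "A"  # assumed character for the agent


def env_to_string(grid: List[List[int]]) -> str:
    """Convert an ordinal grid to a level string and place a single agent."""
    chars = [[SYMBOLS[value] for value in row] for row in grid]
    # The agent is placed last, so exactly one agent ends up on the map.
    empty = [(r, c) for r, row in enumerate(chars)
             for c, ch in enumerate(row) if ch == "."]
    r, c = empty[0] if empty else (0, 0)  # assumption: fall back to top-left
    chars[r][c] = AGENT
    return "\n".join(" ".join(row) for row in chars)
```

Because the CA never emits the agent symbol itself, the resulting string cannot contain zero or several agents, whatever the CA produced.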

As of today, an environment's fitness value is the difference between the maximum and minimum scores that agents achieved on it. It therefore requires a population of agents to be computed, which makes it better suited to co-evolution than to classic evolution.
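As a small illustration of this definition (the function name `env_fitness` and the error on an empty list are assumptions, not the repository's API):

```python
from typing import Sequence


def env_fitness(agent_scores: Sequence[float]) -> float:
    """Spread between the best and worst agent score on one environment."""
    if not agent_scores:
        raise ValueError("need at least one agent score")
    return max(agent_scores) - min(agent_scores)
```

With scores of 2, 9 and 5 the fitness is 7. An environment on which every agent gets the same score, whether all succeed or all fail, gets a fitness of 0, so the measure rewards maps that separate strong agents from weak ones.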

A population comprises around 20 environments, because the NN is small.

Co-evolution and population

To manage co-evolution more easily, a dedicated class has been created (Population()). It creates an evolutionary strategy population and provides functions that evolve, evaluate, play and generally perform operations on a whole population. It also has a save function for storing interesting individuals.

The general process of one co-evolution cycle in this context is:

  1. Generate both populations
  2. Pre-evolve one population or the other (not mandatory)
  3. Eliminate unwanted environments (categorized as unplayable)
  4. Run agents on the remaining environments
  5. Evaluate and evolve
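Steps 3 to 5 of the cycle above can be sketched as a single evaluation pass. This is an assumption-laden outline rather than the Population() class: `coevolution_cycle`, the `play` callable and the returned tuple are hypothetical, and the evolution step itself is left to whichever evolutionary strategy consumes the fitness values.

```python
from typing import Callable, List, Sequence, Tuple


def coevolution_cycle(agents: Sequence, envs: Sequence,
                      is_bad_env: Callable[[object], bool],
                      play: Callable[[object, object], float]
                      ) -> Tuple[List[float], List[float], List[int]]:
    """Filter environments, run every agent on the rest, compute fitnesses."""
    # Step 3: keep only the environments considered playable.
    kept = [i for i, env in enumerate(envs) if not is_bad_env(env)]
    # Step 4: scores[a][k] is the score of agent a on the k-th kept environment.
    scores = [[play(agent, envs[i]) for i in kept] for agent in agents]
    # Step 5 (evaluation): agents sum their scores, environments take the spread.
    agent_fit = [sum(row) for row in scores]
    env_fit = [max(col) - min(col) for col in zip(*scores)]
    return agent_fit, env_fit, kept
```

The two fitness lists would then be handed to the evolutionary strategy of each population, and `kept` tells which environments the values in `env_fit` refer to.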

Elimination criteria (is_bad_env()) are quite simple and arbitrary. At the moment, walls should cover between 10% and 50% of a map, and other objects should not cover more than 50% of it.
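A minimal sketch of such a filter, assuming an ordinal grid in which walls are encoded as 2 and empty cells as 0, and assuming that the bounds are inclusive. These encodings and the signature are hypothetical, and the repository's is_bad_env() may be written differently.

```python
from typing import List


def is_bad_env(grid: List[List[int]], wall: int = 2, empty: int = 0) -> bool:
    """Return True when the map breaks the wall or object coverage limits."""
    cells = [value for row in grid for value in row]
    total = len(cells)
    wall_ratio = sum(value == wall for value in cells) / total
    other_ratio = sum(value not in (wall, empty) for value in cells) / total
    # Walls must cover 10% to 50% of the map, other objects at most 50%.
    return not (0.10 <= wall_ratio <= 0.50) or other_ratio > 0.50
```

For example, a 10-cell map with a single wall passes the filter, while a map with no walls or with 6 walls out of 10 cells is rejected.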

Future improvements

The fitness functions and the chosen evolutionary strategy (canonical) may change in the future. This project might also be merged with the work done on SoKoEvolution.

About

Official implementation of "Coevolution of neural networks for agents and environments".
