Learning RL by implementing and analysing different RL methods from scratch.
| Directory | Game | Number of agents | RL method |
|---|---|---|---|
| nim-dqn | Nim-21 | 2 | Deep Q-network |
| nim-a2c | Nim-21 | 2 | Advantage Actor-Critic |
| matching-pennies-a2c | Matching Pennies | 2 | Advantage Actor-Critic |
| snake-a2c | Snake | 1 | Advantage Actor-Critic |
| snake-ppo | Snake | 1 | Proximal Policy Optimisation |
I'm also using this project to learn more about MLflow. Some of the training scripts require a running MLflow tracking server. See the MLflow documentation for how to start one, then set the `MLFLOW_URI` environment variable to the tracking server's URL.
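For a quick local setup, something like the following should work (a sketch, assuming MLflow is installed via pip and that host `127.0.0.1` and port `5000` are acceptable defaults):

```shell
# Start a local MLflow tracking server (backed by a local ./mlruns store by default)
mlflow server --host 127.0.0.1 --port 5000

# In another shell, point the training scripts at it
export MLFLOW_URI=http://127.0.0.1:5000
```

The server must stay running for the duration of training, so it is usually easiest to keep it in its own terminal or run it in the background.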

