🤖 AI Educator RL Dashboard

A self-contained, interactive mini reinforcement learning (RL) simulation built for hackathons and educational demonstrations. A real-time, highly visual Streamlit dashboard lets users watch an AI agent learn to solve a 1D grid world from scratch.

✨ Features

Custom RL Environment: A bounded 1D grid world (🟥 🤖 🟩 🏁) styled after Gymnasium (the maintained successor to OpenAI Gym), with built-in rewards and penalties.
Deep RL Agent (REINFORCE): A PyTorch implementation of the REINFORCE policy-gradient algorithm, combined with dynamic epsilon-greedy exploration.
Live "Educator" Commentary: At every step, the dashboard displays the policy network's action probabilities, showing how the agent weighs its options and why it chose each move.
Dynamic Training Metrics: Tracks multi-episode averages, updates plots live, and shows progressive 'phase' indicators (Exploration → Learning → Mastery).
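The feature list pairs REINFORCE with epsilon-greedy exploration. As a rough illustration only, here is one plausible way such a schedule could work; it is not the repository's agent.py. The decay constants, the function names (`epsilon_at`, `select_action`, `phase`), and the epsilon thresholds used for the phase labels are all hypothetical. Strictly speaking, actions from the random branch are not drawn from the policy, so this is a pragmatic exploration tweak layered on top of textbook REINFORCE.

```python
import random

# Hypothetical schedule values; the README does not list the numbers in config.py.
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 0.99


def epsilon_at(episode):
    """Exponentially decay epsilon per episode, never dropping below EPS_END."""
    return max(EPS_END, EPS_START * EPS_DECAY ** episode)


def select_action(probs, epsilon, rng=random):
    """Epsilon-greedy wrapper around the policy's action probabilities.

    With probability epsilon, pick a uniformly random action (exploration);
    otherwise sample an action according to the network's probabilities.
    Also returns which branch decided, so commentary can explain the move.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(probs)), "explore"
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, "policy"


def phase(epsilon):
    """Hypothetical mapping from epsilon to the dashboard's phase labels."""
    if epsilon > 0.5:
        return "Exploration"
    if epsilon > 0.1:
        return "Learning"
    return "Mastery"
```

Returning the branch label alongside the action is what would let a commentary panel say whether a move came from random exploration or from the learned probabilities.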

📁 Repository Structure

RL-Pytorch/
│
├── streamlit_app.py   # Primary dashboard UI & Streamlit frontend execution
├── train.py           # Alternate CLI/headless training pipeline
├── agent.py           # Core Policy Gradient algorithm & Action Extractor logic
├── model.py           # PyTorch Multi-layer Perceptron (Policy Network)
├── custom_env.py      # Custom 1D grid environment (movement rules and rewards)
├── config.py          # Centralized configuration and hyperparameters
├── utils.py           # Supplemental chart & logging helper functions
└── README.md          # Project documentation

🚀 Running Locally

Assuming you have Python installed, setup takes just two commands.

All controls live in the dashboard UI, so non-technical evaluators never need to touch the command line after launch.

1. Install Core Dependencies

pip install torch numpy pandas matplotlib streamlit gymnasium

2. Start the AI Dashboard

streamlit run streamlit_app.py

🧠 What The Architecture Does

MiniGridEnv places the agent at coordinate 0 at the start of every episode.
The policy network takes the current state as input and outputs a probability for each action. In early episodes, random exploration dominates.
Bumping into the start boundary (wall) costs -5. Every ordinary step costs -1, a time penalty that favors shorter paths. Reaching the flag earns +10 and ends the episode.
After each episode, the agent computes discounted cumulative rewards (returns) backwards from the final step, then uses gradient ascent in PyTorch to raise the probability of the actions that led to higher returns.
The reward curve updates live in the browser as training progresses, so you can watch the agent improve episode by episode.
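The loop described above can be sketched end to end. This is a minimal, hypothetical re-creation, not the repository's code: it swaps the PyTorch MLP for a NumPy table of logits (one row per cell), returns a simplified `(state, reward, done)` tuple instead of Gymnasium's five-value `step` result, and assumes a 5-cell corridor, actions 0 = left / 1 = right, a 20-step episode cap, and that the +10 flag reward replaces the -1 step cost. It also subtracts a running per-cell average return as a baseline, a standard REINFORCE variance-reduction trick that the README does not mention.

```python
import numpy as np

# Hypothetical settings: the README does not state the grid length,
# the action encoding, or an episode cap.
GRID_SIZE = 5          # cells 0..4; the flag sits in the last cell
LEFT, RIGHT = 0, 1
MAX_STEPS = 20


class MiniGridEnv:
    """1D corridor following the reward rules described above."""

    def reset(self):
        self.pos = 0       # every episode starts at coordinate 0
        self.steps = 0
        return self.pos

    def step(self, action):
        self.steps += 1
        if action == LEFT and self.pos == 0:
            reward = -5.0  # bumped the start wall; position unchanged
        else:
            self.pos += 1 if action == RIGHT else -1
            reward = -1.0  # ordinary step cost
        done = False
        if self.pos == GRID_SIZE - 1:
            reward = 10.0  # flag reached (assumed to replace the step cost)
            done = True
        elif self.steps >= MAX_STEPS:
            done = True    # assumed timeout
        return self.pos, reward, done


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, accumulated backwards from the last step."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def train(episodes=500, lr=0.05, gamma=0.9, baseline_lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((GRID_SIZE, 2))   # policy logits, one row per cell
    baseline = np.zeros(GRID_SIZE)     # running average return per cell
    env = MiniGridEnv()
    history = []
    for _ in range(episodes):
        state, done = env.reset(), False
        states, actions, rewards = [], [], []
        while not done:
            action = int(rng.choice(2, p=softmax(theta[state])))
            next_state, reward, done = env.step(action)
            states.append(state)
            actions.append(action)
            rewards.append(reward)
            state = next_state
        returns = discounted_returns(rewards, gamma)
        # Advantages use the baseline from before this episode's updates.
        advantages = returns - baseline[states]
        grad = np.zeros_like(theta)
        for s, a, adv in zip(states, actions, advantages):
            grad_log = -softmax(theta[s])  # d log pi(a|s) / d logits ...
            grad_log[a] += 1.0             # ... equals one_hot(a) - pi(.|s)
            grad[s] += adv * grad_log
        theta += lr * grad                 # gradient ascent on expected return
        for s, g in zip(states, returns):
            baseline[s] += baseline_lr * (g - baseline[s])
        history.append(sum(rewards))
    return theta, history
```

Calling `train()` returns the learned logits and the per-episode reward totals that a dashboard would plot. Under these assumed settings, the best possible episode (three steps right, then the flag) scores -1 - 1 - 1 + 10 = 7, a useful ceiling for reading the curve.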
