GitHub - Estabraq-makiyah/Estabraq: Self-Healing Intent-Based Networking A Reinforcement Learning Framework for Fault Recovery in SDN

Self-Healing Intent-Based Networking: A Reinforcement Learning Framework for Fault Recovery in SDN

This paper presents a reinforcement learning-based self-healing framework for Software-Defined Networking (SDN) that autonomously manages diverse network faults in a realistically emulated environment. A Mininet testbed controlled by the Faucet SDN controller is instrumented with Prometheus to collect multi-source telemetry, while an automated fault injector and congestion generator produce link, port, flow and controller events alongside UDP-induced bottlenecks to create rich training data. Network features (including controller CPU and memory usage, OpenFlow statistics, port status and explicit fault labels) are periodically scraped and aggregated into a structured dataset that forms the state space of a custom Gym-compatible environment. A Proximal Policy Optimisation (PPO) agent with a multilayer perceptron policy learns discrete self-healing actions such as no-op, port resets, switch restarts and bespoke recovery procedures, guided by a reward function that penalises persistent faults and unnecessary interventions while rewarding timely and appropriate recovery. Experimental evaluation over multiple PPO training runs shows stable optimisation behaviour and high episodic rewards with long episode lengths, indicating that the agent successfully distinguishes healthy from faulty conditions and maintains effective long-term fault management. Compared with existing RL-based approaches that focus primarily on link failure recovery or service function chain reconfiguration, the proposed framework handles a broader spectrum of SDN fault types and integrates control-plane, data-plane and congestion indicators, thereby offering a more general and robust self-healing capability for operational SDN environments.
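The reward shaping and discrete action space described above can be illustrated with a small, dependency-free sketch. This is not the repository's implementation: the action names, fault labels, reward magnitudes and the SelfHealingEnv class are all hypothetical. The class only mirrors the Gym-style reset/step interface rather than subclassing a real Gym environment, and the observation is a toy one-hot fault label instead of scraped telemetry.

```python
import random

# Hypothetical action set, modelled on the actions named in the abstract.
ACTIONS = ("no_op", "reset_port", "restart_switch", "custom_recovery")

# Hypothetical mapping from fault label to the action assumed to repair it.
CORRECT_ACTION = {
    "none": "no_op",
    "port_down": "reset_port",
    "switch_down": "restart_switch",
    "flow_fault": "custom_recovery",
}


def reward(fault: str, action: str) -> float:
    """Illustrative reward shaping; the magnitudes are assumptions."""
    if fault == "none":
        # Healthy network: reward leaving it alone, penalise needless action.
        return 1.0 if action == "no_op" else -0.5
    if action == CORRECT_ACTION[fault]:
        return 2.0  # timely, appropriate recovery
    return -1.0  # fault persists (wrong action or no action)


class SelfHealingEnv:
    """Minimal environment exposing a Gym-style reset/step interface."""

    def __init__(self, fault_rate: float = 0.3, max_steps: int = 50, seed: int = 0):
        self.fault_rate = fault_rate
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.fault = "none"
        self.steps = 0

    def _observe(self):
        # Toy observation: one-hot encoding of the current fault label.
        return [1.0 if self.fault == f else 0.0 for f in CORRECT_ACTION]

    def reset(self):
        self.fault = "none"
        self.steps = 0
        return self._observe(), {}

    def step(self, action_index: int):
        action = ACTIONS[action_index]
        r = reward(self.fault, action)
        if self.fault != "none" and action == CORRECT_ACTION[self.fault]:
            self.fault = "none"  # the chosen action repaired the fault
        elif self.fault == "none" and self.rng.random() < self.fault_rate:
            # Inject a new random fault into a healthy network.
            faults = [f for f in CORRECT_ACTION if f != "none"]
            self.fault = self.rng.choice(faults)
        self.steps += 1
        truncated = self.steps >= self.max_steps
        # Return order: observation, reward, terminated, truncated, info.
        return self._observe(), r, False, truncated, {}
```

In this sketch an unrepaired fault keeps costing -1.0 per step, which is one simple way to express "penalises persistent faults"; the actual reward used in the paper may be structured differently.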

The code includes two IBN Mininet topologies with injected faults. Run a topology for a while alongside both the rl_training and feature_collection scripts to train the reinforcement learning agent.
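As a rough illustration of the feature collection step, the sketch below parses one Prometheus scrape in the text exposition format and turns it into a labelled dataset row. It is an assumption-laden example, not the repository's collector: the metric names, the column list, the fault label and the helper functions parse_prometheus_text and build_row are hypothetical, and the sample text stands in for a live scrape.

```python
import csv
import io


def parse_prometheus_text(text: str) -> dict:
    """Parse Prometheus text exposition lines into {series: value}.

    Assumes samples carry no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Split on the last whitespace run so label values keep their spaces.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def build_row(samples: dict, fault_label: str, columns: list) -> dict:
    """Select the chosen columns (0.0 if missing) and attach the fault label."""
    row = {column: samples.get(column, 0.0) for column in columns}
    row["fault_label"] = fault_label
    return row


# Hypothetical scrape output; real metric names will differ.
SAMPLE_SCRAPE = """\
# HELP controller_cpu_percent Hypothetical controller CPU gauge.
# TYPE controller_cpu_percent gauge
controller_cpu_percent 12.5
controller_mem_mb 256
port_up{dp="sw1",port="1"} 1
port_up{dp="sw1",port="2"} 0
"""

COLUMNS = [
    "controller_cpu_percent",
    "controller_mem_mb",
    'port_up{dp="sw1",port="1"}',
    'port_up{dp="sw1",port="2"}',
]

samples = parse_prometheus_text(SAMPLE_SCRAPE)
row = build_row(samples, "port_down", COLUMNS)

# Write the row as CSV; an in-memory buffer stands in for the dataset file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS + ["fault_label"])
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Repeating this at a fixed interval while a topology runs would build up rows for training; the fault label would come from whatever the fault injector reports at scrape time.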

Files:
Python‑based IoT–edge–cloud MEC simulator.ipynb
Python‑based_IoT–edge–cloud_MEC_simulator.ipynb
Python‑based_IoT–edge–cloud_MEC_simulator_Senario_1_moderate_load.ipynb
Python‑based_IoT–edge–cloud_MEC_simulator_Senario_2_heavy_load.ipynb
README.md
baseline_comparison.py
collect_rl_features.py
redundant_faucet_topo.py
rl_training2.py
rl_training2_time.py
topo_fattree_16sw.py
topo_linear_8sw.py
topo_with_congestion.py
