SCIMAI-Gym


Authors Information

TITLE: SCIMAI Gym
AUTHORS: Francesco Stranieri
INSTITUTION: University of Milano-Bicocca/Polytechnic University of Turin
EMAIL: francesco.stranieri@unimib.it

Requirements

To install and import the necessary libraries, run the section:

Environment Setup

The code was tested on:

Supply Chain Environment

To set up the Supply Chain Environment, run the section:

Reinforcement Learning Classes

📋 To change the configuration of the Supply Chain Environment (such as the number of product types, the number of distribution warehouses, costs, or capacities), edit the sub-section:

Supply Chain Environment Class

📋 To change the global parameters (such as the seed for reproducibility, the number of episodes for the simulations, or the directory in which plots are saved), edit (and run) the section:

Global Parameters

Then, to initialize the Supply Chain Environment, run the section:

Supply Chain Environment Initialization

❗️ The output of this section will have the following format (verify that the values are the same as the ones you defined):

--- SupplyChainEnvironment --- __init__
product_types_num is 1
distr_warehouses_num is 1
T is 25
d_max is [10]
d_var is [2]
sale_prices is [15]
production_costs is [5]
storage_capacities is [[ 5]
 [10]]
storage_costs is [[2]
 [1]]
transportation_costs is [[0.25]]
penalty_costs is [22.5]
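As a quick sanity check of the printed values, the sketch below copies them into a plain dictionary and verifies that the array shapes agree with `product_types_num` and `distr_warehouses_num`. The `config` dictionary and the `check_config` helper are hypothetical (they are not part of the notebook), and the row/column interpretation is an assumption inferred from the shapes shown above.

```python
# Hypothetical copy of the configuration printed above, as plain Python lists.
config = {
    "product_types_num": 1,
    "distr_warehouses_num": 1,
    "T": 25,
    "d_max": [10],
    "d_var": [2],
    "sale_prices": [15],
    "production_costs": [5],
    "storage_capacities": [[5], [10]],
    "storage_costs": [[2], [1]],
    "transportation_costs": [[0.25]],
    "penalty_costs": [22.5],
}


def check_config(cfg):
    """Return a list of shape problems; an empty list means the shapes agree."""
    p = cfg["product_types_num"]
    w = cfg["distr_warehouses_num"]
    problems = []

    # Assumption: these vectors hold one entry per product type.
    for key in ("d_max", "d_var", "sale_prices", "production_costs", "penalty_costs"):
        if len(cfg[key]) != p:
            problems.append(f"{key}: expected {p} entries, got {len(cfg[key])}")

    # Assumption: storage arrays have one row per storage site (the factory
    # plus each distribution warehouse); transportation costs have one row
    # per distribution warehouse. Columns correspond to product types.
    expected_rows = {
        "storage_capacities": w + 1,
        "storage_costs": w + 1,
        "transportation_costs": w,
    }
    for key, rows in expected_rows.items():
        matrix = cfg[key]
        if len(matrix) != rows or any(len(row) != p for row in matrix):
            problems.append(f"{key}: expected a {rows}x{p} matrix")

    return problems


print(check_config(config))
```

An empty list means the copied values are mutually consistent under these assumptions; it does not replace comparing the printed output against the notebook cell you edited.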

Finally, to define some fundamental methods (such as the operational simulator or the plotting methods), run the section:

Methods

Baselines

To assess the DRL algorithms' performance, we established two different baselines. To initialize the Oracle and the (s, Q)-policy, run the sections:

Oracle
(s, Q)-Policy Class
(s, Q)-Policy Config [Ax]
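For readers unfamiliar with the (s, Q)-policy, the sketch below illustrates the textbook rule: whenever the stock level falls below the reorder point s, a fixed quantity Q is ordered. It is a toy single-site simulation with uniformly random demand, not the notebook's class; the function names, the strict `<` comparison, and the demand model are all assumptions made for illustration.

```python
import random


def sq_order(stock, s, Q):
    """Textbook (s, Q) rule: order the fixed quantity Q when stock is below s."""
    return Q if stock < s else 0


def simulate(s, Q, T=25, d_max=10, seed=0):
    """Toy simulation over T time steps (hypothetical, not the repo's simulator).

    Demand is drawn uniformly from 0..d_max (an assumption). A negative
    stock level represents unmet demand carried over to the next step.
    """
    rng = random.Random(seed)
    stock = 0
    history = []
    for _ in range(T):
        order = sq_order(stock, s, Q)
        demand = rng.randint(0, d_max)
        stock = stock + order - demand
        history.append((order, demand, stock))
    return history


for step, (order, demand, stock) in enumerate(simulate(s=5, Q=8)[:5]):
    print(f"t={step}: order={order}, demand={demand}, stock={stock}")
```

In the notebook, the values of s and Q are what the Bayesian Optimization step tunes; here they are simply passed in by hand.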

📋 To change the (s, Q)-policy parameters (such as the total trials for the optimization or the number of episodes for each trial), edit the sub-section:

Parameters [Ax]

Finally, to define some fundamental methods (such as the methods for the Bayesian Optimization (BO) training or the plotting methods), run the section:

(s, Q)-Policy Methods [Ax]

Train BO Agents

To train the BO agents, run the section:

(s, Q)-Policy Optimize [Ax]

DRL Config

To change the DRL algorithms' parameters (such as the training episodes or the grace period for the ASHA scheduler), edit (and run) the sub-section:

Parameters [Tune]

📋 To change the DRL algorithms' hyperparameters (such as the neural network structure, the learning rate, or the batch size), edit (and run) the sub-sections:

Algorithms [Tune]
A3C Config [Tune]
PG Config [Tune]
PPO Config [Tune]

Finally, to define some fundamental methods (such as the methods for the DRL agents' training or the plotting methods), run the section:

Reinforcement Learning Methods [Tune]

Train DRL Agents

To train the DRL agents, run the section:

Reinforcement Learning Train Agents [Tune]

❗️ We provide the checkpoints of the best training instance for each approach and experiment; they can be used as pre-trained models. For example, the checkpoint for Exp 1 of the 1P3W scenario with the A3C algorithm is available at /Paper_Results_1P3W/1P3W/Exp_1/1P3W_2021-09-22_15-55-24/ray_results/A3C_2021-09-22_19-56-24/A3C_SupplyChain_2a2cf_00024_24_grad_clip=20.0,lr=0.001,fcnet_hiddens=[64, 64],rollout_fragment_length=100,train_batch_size=2000_2021-09-22_22-34-50/checkpoint_000286/checkpoint-286.
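The trial directory in the example path encodes the hyperparameters of that training instance as `key=value` pairs. The sketch below recovers them with a regular expression. The `parse_trial_name` helper is hypothetical, and the parsing rule assumes the naming pattern seen in this one example (lowercase keys, values ending at a comma or underscore, lists in square brackets); other trial names may not follow it.

```python
import ast
import re

# Trial directory name copied from the example checkpoint path above.
TRIAL_NAME = (
    "A3C_SupplyChain_2a2cf_00024_24_grad_clip=20.0,lr=0.001,"
    "fcnet_hiddens=[64, 64],rollout_fragment_length=100,"
    "train_batch_size=2000_2021-09-22_22-34-50"
)

# A key is a lowercase identifier; a value is either a bracketed list or a
# run of characters that stops at the next comma or underscore.
PAIR = re.compile(r"([a-z][a-z_]*)=(\[[^\]]*\]|[^,_]+)")


def parse_trial_name(name):
    """Return the key=value pairs embedded in a trial name as a dict."""
    return {key: ast.literal_eval(value) for key, value in PAIR.findall(name)}


params = parse_trial_name(TRIAL_NAME)
for key, value in params.items():
    print(f"{key}: {value!r}")
```

This can help when matching a downloaded checkpoint to the hyperparameter settings it was trained with.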

Results

To output (and save) the performance (in terms of cumulative profit) and the training time (in minutes) of the DRL algorithms, run the section:

Final Results

❗️ We save the plots of the best training instance for each approach and experiment. For example, the plots for Exp 1 of the 1P3W scenario are available at /Paper_Results_1P3W/1P3W/Exp_1/1P3W_2021-09-22_15-55-24/plots.

The results obtained should be comparable with those in the paper. For example, for the 1P1W scenario, we achieve the following performance:

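| Experiment | A3C | PPO | VPG | BO | Oracle |
|---|---|---|---|---|---|
| Exp 1 | 870±67 | 1213±68 | 885±66 | 1226±71 | 1474±45 |
| Exp 2 | 1066±94 | 1163±66 | 1100±77 | 1224±60 | 1289±68 |
| Exp 3 | −36±74 | 195±43 | 12±61 | 101±50 | 345±18 |
| Exp 4 | 1317±60 | 1600±62 | 883±95 | 1633±39 | 2046±37 |
| Exp 5 | 736±45 | 838±58 | 789±51 | 870±67 | 966±55 |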
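To put these numbers in perspective, the sketch below expresses each method's reported central value as a percentage of the Oracle's and picks the best non-Oracle method per experiment. It uses only the central values from the 1P1W results above and ignores the ± terms; the helper names are hypothetical.

```python
# Central values copied from the 1P1W results above (the ± terms are omitted).
mean_profit = {
    "Exp 1": {"A3C": 870, "PPO": 1213, "VPG": 885, "BO": 1226, "Oracle": 1474},
    "Exp 2": {"A3C": 1066, "PPO": 1163, "VPG": 1100, "BO": 1224, "Oracle": 1289},
    "Exp 3": {"A3C": -36, "PPO": 195, "VPG": 12, "BO": 101, "Oracle": 345},
    "Exp 4": {"A3C": 1317, "PPO": 1600, "VPG": 883, "BO": 1633, "Oracle": 2046},
    "Exp 5": {"A3C": 736, "PPO": 838, "VPG": 789, "BO": 870, "Oracle": 966},
}


def oracle_share(row):
    """Percentage of the Oracle's value reached by each other method."""
    oracle = row["Oracle"]
    return {
        name: round(100 * value / oracle, 1)
        for name, value in row.items()
        if name != "Oracle"
    }


def best_method(row):
    """Name of the non-Oracle method with the highest value."""
    candidates = {name: value for name, value in row.items() if name != "Oracle"}
    return max(candidates, key=candidates.get)


for exp, row in mean_profit.items():
    print(exp, best_method(row), oracle_share(row))
```

With these values, BO has the highest central value among the non-Oracle methods in every experiment except Exp 3, where PPO does.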

About

A Deep Reinforcement Learning Approach to Supply Chain Inventory Management
