Distributional reinforcement learning for inventory management in multi-echelon supply chains

Abstract

Reinforcement learning (RL) is an effective method for solving stochastic sequential decision-making problems. Such problems are common in supply chain operations; however, most RL algorithms are tailored to game-based benchmarks. Here, we propose a deep RL method tailored to supply chain problems. The proposed algorithm uses a derivative-free approach to balance exploration and exploitation of the neural policy’s parameter space, providing a means to avoid low-quality local optima. Furthermore, the method accommodates risk-sensitive formulations, learning a policy that optimizes, for example, the conditional value-at-risk. We test the algorithm on a multi-echelon supply chain problem and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm, proximal policy optimization, and superior performance to shrinking-horizon mixed-integer formulations. Additionally, its risk-sensitive policy can offer protection from low-probability, high-severity scenarios. Finally, we provide a sensitivity analysis for technical intuition.

Publication
Digital Chemical Engineering
Miguel Ángel de Carvalho Servia
Automated Knowledge Discovery Methods in Reaction Engineering

Miguel Carvalho is a chemical engineering PhD candidate at Imperial College London, affiliated with the EPSRC CDT in Next Generation Synthesis and Reaction Technology. He earned an MEng in chemical engineering from The University of Manchester (2021) and an MRes in advanced molecular synthesis from Imperial College London (2022). His research intersects data science, chemical engineering, and chemistry, concentrating on algorithmic solutions for automated knowledge discovery in reaction engineering and catalysis.