
Thank you for the feedback, Jeff! Reading back, the article indeed glossed over that component rather quickly. I have made some adjustments and added more examples; I hope that clarifies the concept.

The artificial constraints are typically designed manually to reflect aspects of the post-decision state, i.e., the state that arises right after the decision. As such, each value \phi_f is computed by some (linear) function of that state (e.g., the number of containers left after shipping). The corresponding weights \theta_f are learned from observations.
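
To make the roles of \phi_f and \theta_f concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two-element state (containers and trucks left), the function names, and the simple gradient-style update on the squared error. The article itself only states that the weights are learned from observations, so treat this as one possible way to do that rather than the definitive method.

```python
import numpy as np

def features(post_decision_state):
    """Hypothetical hand-designed values phi_f of a post-decision state."""
    containers_left, trucks_left = post_decision_state
    # A constant term plus two linear features (assumed for illustration).
    return np.array([1.0, float(containers_left), float(trucks_left)])

def value(theta, post_decision_state):
    """Approximate value: sum over f of theta_f * phi_f."""
    return float(theta @ features(post_decision_state))

def update(theta, post_decision_state, observed_value, step_size=0.1):
    """One gradient-style step that moves the estimate toward an observation."""
    phi = features(post_decision_state)
    error = observed_value - theta @ phi
    return theta + step_size * error * phi

theta = np.zeros(3)                  # initial weights theta_f
state = (2, 1)                       # e.g., 2 containers and 1 truck left after shipping
theta = update(theta, state, observed_value=10.0)
print(value(theta, state))
```

Repeating the update over many observed (post-decision state, value) pairs gradually tunes the weights, while the features themselves stay fixed as designed.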


Wouter van Heeswijk, PhD

Written by Wouter van Heeswijk, PhD

Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.
