--

Thank you!

Deep Deterministic Policy Gradient (DDPG) works with deterministic policies in continuous environments, essentially assuming the Q-value is differentiable:

https://spinningup.openai.com/en/latest/algorithms/ddpg.html

Additionally, you might be interested in policy perturbation methods. Here, you add noise to policy parameters, allowing to measure performance differences between 'variants' of the policy at hand.

--

--

Wouter van Heeswijk, PhD
Wouter van Heeswijk, PhD

Written by Wouter van Heeswijk, PhD

Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.

No responses yet