Dec 20, 2022
Thank you!
Deep Deterministic Policy Gradient (DDPG) works with deterministic policies in continuous environments, essentially assuming the Q-value is differentiable:
https://spinningup.openai.com/en/latest/algorithms/ddpg.html
Additionally, you might be interested in policy perturbation methods. Here, you add noise to policy parameters, allowing to measure performance differences between 'variants' of the policy at hand.