Hi, thanks for the post.
Daniel Tung
Wouter van Heeswijk, PhD
Sep 14, 2022
Hi Daniel, thanks for your comment. Because the problem setting is a one-shot game, there is indeed no backpropagation of rewards over time (or, technically, just one step). If you are looking for a code example that includes a time horizon and backpropagation of rewards, the following article might interest you: https://towardsdatascience.com/cliff-walking-problem-with-the-discrete-policy-gradient-algorithm-59d1900d80d8
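To illustrate the difference between the one-shot case and a setting with a time horizon, here is a minimal sketch. It assumes a REINFORCE-style Monte Carlo return with a discount factor `gamma`; the function name and the reward values are hypothetical and not taken from either article. With a single step, the return is simply the immediate reward, so nothing propagates backwards. With several steps, each step's return also includes the discounted rewards that follow it.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards through the episode so later rewards flow into earlier steps.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# One-shot game: a single step, so the return equals the immediate reward.
print(discounted_returns([5.0]))  # [5.0]

# Multi-step episode (assumed rewards): the final reward is propagated back
# to the earlier steps, discounted by gamma at each step.
print(discounted_returns([-1.0, -1.0, 10.0], gamma=0.9))
```

In a policy gradient update, these returns take the place of the single reward as the weight on each step's log-probability gradient.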
Written by Wouter van Heeswijk, PhD
Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.