A small issue in the Chapter 1 supplementary material #97

Open
hkr04 opened this issue Jul 21, 2024 · 1 comment

Comments

@hkr04

hkr04 commented Jul 21, 2024

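Here, the superscript (the exponent of the discount factor) in the expectation form of $Q_\pi(s_t^n, a_t^n)$ should be $l-t$ rather than $l$. $Q_\pi(s_t^n, a_t^n)$ accumulates rewards from the perspective of time step $t$, and the expression outside it has already been multiplied by the discount factor relative to time step $0$, so the discount should not be applied a second time.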
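Written out, assuming the supplementary material defines $Q_\pi$ as the expected discounted sum of per-step rewards $r_l^n$ up to the final step $T_n$ of episode $n$, and that the outer factor is $\gamma^t$ (the symbols $r_l^n$, $T_n$ and $\gamma$ are assumptions for illustration, not copied from the text):

$$
Q_\pi(s_t^n, a_t^n) = \mathbb{E}_\pi\left[\sum_{l=t}^{T_n} \gamma^{\,l-t}\, r_l^n \;\middle|\; s_t^n, a_t^n\right]
\quad\Longrightarrow\quad
\gamma^{t}\, Q_\pi(s_t^n, a_t^n) = \mathbb{E}_\pi\left[\sum_{l=t}^{T_n} \gamma^{\,l}\, r_l^n \;\middle|\; s_t^n, a_t^n\right]
$$

With exponent $l$ instead, $\gamma^{t}\sum_{l=t}^{T_n}\gamma^{l} r_l^n = \sum_{l=t}^{T_n}\gamma^{t+l} r_l^n$, so the reward at step $t$ would carry weight $\gamma^{2t}$ instead of $\gamma^{t}$, i.e. it would be discounted twice.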

@puyuan1996
Contributor

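Thanks for pointing this out! You are right: the superscript in the definition of $Q^{\pi}(s_{n,t}, a_{n,t})$ should indeed be $l - t$, not $l$. Since $Q^{\pi}(s_{n,t}, a_{n,t})$ represents the cumulative return starting from time step $t$, the discount applied to the reward at step $t$ should be 1. We will fix the relevant content as soon as possible. Thanks again for your careful feedback!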
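A minimal numerical check of this point, assuming a single deterministic trajectory so the expectation reduces to a plain sum. The function names below are hypothetical and are not taken from the book's code; they only contrast the two exponents discussed in this issue.

```python
import math
from typing import Sequence


def q_from_t(rewards: Sequence[float], t: int, gamma: float) -> float:
    """Return seen from step t with exponent l - t: the reward at step t has weight gamma**0 = 1."""
    return sum(gamma ** (l - t) * rewards[l] for l in range(t, len(rewards)))


def q_from_t_wrong_exponent(rewards: Sequence[float], t: int, gamma: float) -> float:
    """Same sum with exponent l: already discounted relative to step 0."""
    return sum(gamma ** l * rewards[l] for l in range(t, len(rewards)))


def tail_of_return_from_0(rewards: Sequence[float], t: int, gamma: float) -> float:
    """Contribution of steps l >= t to the step-0 return G_0 = sum_l gamma**l * r_l."""
    return sum(gamma ** l * r for l, r in enumerate(rewards) if l >= t)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    gamma, t = 0.5, 1
    # Outer factor gamma**t times the correct Q matches the tail of G_0.
    correct = gamma ** t * q_from_t(rewards, t, gamma)
    target = tail_of_return_from_0(rewards, t, gamma)
    assert math.isclose(correct, target)
    print(correct)  # 0.75
    print(target)  # 0.75
    # With exponent l, the outer factor discounts the same rewards a second time.
    print(gamma ** t * q_from_t_wrong_exponent(rewards, t, gamma))  # 0.375
```

Because the values here are powers of 0.5, the printed numbers are exact; for general $\gamma$, compare with `math.isclose` rather than `==`.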
