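Here, the superscript (the exponent on the discount factor) in the expectation form of $Q_\pi(s_t^n, a_t^n)$ should be $l-t$, not $l$. $Q_\pi(s_t^n, a_t^n)$ accumulates rewards from the perspective of time step $t$, and the outer factor already applies the discount relative to time step $0$, so the discount should not be applied a second time.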
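A worked version of this argument is sketched below. It assumes a finite horizon $T_n$ for trajectory $n$, rewards $r_l^n$ and summation index $l$; these symbols are assumptions, since the original formula is not reproduced in this thread. With exponent $l-t$ inside $Q_\pi$, multiplying by the outer $\gamma^t$ discounts each reward $r_l^n$ by exactly $\gamma^l$:

$$
Q_\pi(s_t^n, a_t^n) = \mathbb{E}\left[\sum_{l=t}^{T_n} \gamma^{\,l-t}\, r_l^n \,\middle|\, s_t^n, a_t^n\right]
\quad\Longrightarrow\quad
\gamma^{t}\, Q_\pi(s_t^n, a_t^n) = \mathbb{E}\left[\sum_{l=t}^{T_n} \gamma^{\,l}\, r_l^n \,\middle|\, s_t^n, a_t^n\right]
$$

With exponent $l$ inside the expectation instead, the product would weight $r_l^n$ by $\gamma^{t+l}$, counting the first $t$ steps of discounting twice.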
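Thanks for pointing this out! You are right: the superscript in the definition of $Q^{\pi}(s_{n,t}, a_{n,t})$ should indeed be $l - t$, not $l$. Because $Q^{\pi}(s_{n,t}, a_{n,t})$ is the cumulative return starting from time step $t$, the reward at step $t$ should have a discount factor of 1. We will fix the relevant content as soon as possible. Thanks again for the careful feedback!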
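The numeric sketch below checks the correction on a toy trajectory. It assumes a single finite trajectory with made-up rewards and $\gamma = 0.5$. The helper names `discounted_return_from` and `double_discounted_from` are hypothetical and are not from the original text. The reward at step $t$ gets weight $\gamma^0 = 1$ inside the return, and only the $l - t$ version, once multiplied by the outer $\gamma^t$, matches discounting every reward directly from step $0$.

```python
def discounted_return_from(rewards, t, gamma):
    """Return from step t with exponent l - t: sum_{l>=t} gamma**(l - t) * r_l.

    The reward at step t gets weight gamma**0 == 1.
    """
    return sum(gamma ** (l - t) * rewards[l] for l in range(t, len(rewards)))


def double_discounted_from(rewards, t, gamma):
    """Incorrect variant that uses exponent l instead of l - t inside the return."""
    return sum(gamma ** l * rewards[l] for l in range(t, len(rewards)))


# Hypothetical toy trajectory (assumption, not data from the original document).
rewards = [1.0, 1.0, 1.0, 1.0]
gamma = 0.5
t = 2

# The outer factor gamma**t discounts relative to time step 0.
correct = gamma ** t * discounted_return_from(rewards, t, gamma)
wrong = gamma ** t * double_discounted_from(rewards, t, gamma)

# Reference: discount each reward from step 0 directly, i.e. weight gamma**l.
reference = sum(gamma ** l * rewards[l] for l in range(t, len(rewards)))

print(correct, reference, wrong)  # prints: 0.375 0.375 0.09375
```

The powers of 0.5 used here are exact in binary floating point, so the values compare equal without a tolerance. For general $\gamma$, compare with `math.isclose` instead.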