oliii20/PPO


PPO

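A policy gradient algorithm. Notes from learning PPO, video part P8: https://www.bilibili.com/video/BV18H4y167SD?p=9&spm_id_from=pageDriver&vd_source=cf16088b7296d0c8d01e3b00cbd71a9e

First, the basic elements of reinforcement learning: the environment, the agent, the current state of the environment or agent, the actions, and so on.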
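To make these elements concrete, here is a minimal sketch of the interaction loop. The `CorridorEnv` environment, the `RandomAgent` agent, and the `run_episode` helper are hypothetical names made up for illustration; they do not come from the video or from any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # The state is simply the agent's position in the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and picks an action at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([0, 1])


def run_episode(env, agent, max_steps=100):
    """Run one episode and return the list of (state, action, reward) tuples."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = agent.act(state)                    # agent picks an action
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


trajectory = run_episode(CorridorEnv(), RandomAgent(seed=0))
print(len(trajectory), "steps, total reward", sum(r for _, _, r in trajectory))
```

One such trajectory is the "batch of data (one episode)" that a policy gradient method trains on.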

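Core idea: PPO improves on the vanilla policy gradient algorithm. On-policy training performs one update per batch of data (one episode) and then has to collect fresh data, so it trains too slowly. The off-policy idea is to use a stand-in for the network parameters to collect the data, so that the same batch can be used to train the network multiple times.

Components: an actor network and a critic network. The actor outputs the policy (ideally the optimal one); the critic outputs a value used to judge how good the network's behavior is.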
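The reuse of one batch can be sketched in plain Python. Several things here are assumptions that the notes do not spell out: the objective is the clipped surrogate from the PPO paper, the task is a made-up two-armed bandit, the actor is just a pair of logits, and the critic is reduced to a single scalar (the mean reward of the batch). All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-clip objective for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def collect_batch(old_logits, rewards, n=200, seed=0):
    """Sample actions with the frozen old policy (the 'stand-in' parameters)."""
    rng = random.Random(seed)
    old_probs = softmax(old_logits)
    batch = []
    for _ in range(n):
        action = rng.choices(range(len(old_probs)), weights=old_probs)[0]
        batch.append((action, rewards[action], old_probs[action]))
    return batch


def train_on_batch(logits, batch, value, epochs=5, lr=0.1, eps=0.2):
    """Take several gradient-ascent steps on the actor using the SAME batch."""
    logits = list(logits)
    for _ in range(epochs):
        probs = softmax(logits)
        grad = [0.0] * len(logits)
        for action, reward, old_prob in batch:
            advantage = reward - value        # critic's judgement of the action
            ratio = probs[action] / old_prob  # importance weight: new / old policy
            clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
            # The gradient is nonzero only while the unclipped term is the minimum.
            if ratio * advantage <= clipped * advantage:
                for k in range(len(logits)):
                    dlogp = (1.0 if k == action else 0.0) - probs[k]
                    grad[k] += advantage * ratio * dlogp / len(batch)
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return logits


old_logits = [0.0, 0.0]                               # old policy: 50/50
batch = collect_batch(old_logits, rewards=[0.0, 1.0])  # arm 1 pays 1, arm 0 pays 0
value = sum(r for _, r, _ in batch) / len(batch)       # crude critic: mean reward
new_logits = train_on_batch(old_logits, batch, value)
print("probability of the better arm:", softmax(new_logits)[1])
```

The clipping keeps the new policy close to the old one that collected the data; once the ratio leaves the range [1 - eps, 1 + eps] in the direction the advantage favors, that sample stops contributing to the update.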
