13. TRPO, PPO and ACKTR Methods

13.1. Trust Region Policy Optimization
13.2. Math Essentials
- 13.2.1. Taylor series
- 13.2.2. Trust Region method
- 13.2.3. Conjugate Gradient Method
- 13.2.4. Lagrange Multiplier
- 13.2.5. Importance Sampling
13.3. Designing the TRPO Objective Function
- 13.3.1. Parameterizing the Policy
- 13.3.2. Sample Based Estimation
13.4. Solving the TRPO Objective Function
- 13.4.1. Computing the Search Direction
- 13.4.2. Perform Line Search in the Search Direction
13.5. Algorithm - TRPO
13.6. Proximal Policy Optimization
13.7. PPO with Clipped Objective
13.8. Algorithm - PPO-Clipped
13.9. Implementing PPO-Clipped Method
13.10. PPO with Penalized Objective
- 13.10.1. Algorithm - PPO-Penalty
13.11. Actor Critic using Kronecker Factored Trust Region
13.12. Math Essentials
- 13.12.1. Block Matrix
- 13.12.2. Block Diagonal Matrix
- 13.12.3. Kronecker Product
- 13.12.4. Vec Operator
- 13.12.5. Properties of Kronecker Product
13.13. Kronecker-Factored Approximate Curvature (K-FAC)
13.14. K-FAC in Actor Critic
- 13.14.1. Incorporating Trust Region