
13. TRPO, PPO and ACKTR Methods

  • 13.1. Trust Region Policy Optimization
  • 13.2. Math Essentials
    • 13.2.1. Taylor Series
    • 13.2.2. Trust Region method
    • 13.2.3. Conjugate Gradient Method
    • 13.2.4. Lagrange Multiplier
    • 13.2.5. Importance Sampling
  • 13.3. Designing the TRPO Objective Function
    • 13.3.1. Parameterizing the Policy
    • 13.3.2. Sample-Based Estimation
  • 13.4. Solving the TRPO Objective Function
    • 13.4.1. Computing the Search Direction
    • 13.4.2. Performing a Line Search in the Search Direction
  • 13.5. Algorithm - TRPO
  • 13.6. Proximal Policy Optimization
  • 13.7. PPO with Clipped Objective
    • 13.8. Algorithm - PPO-Clipped
  • 13.9. Implementing PPO-Clipped Method
  • 13.10. PPO with Penalized Objective
    • 13.10.1. Algorithm - PPO-Penalty
  • 13.11. Actor Critic using Kronecker-Factored Trust Region
  • 13.12. Math Essentials
    • 13.12.1. Block Matrix
    • 13.12.2. Block Diagonal Matrix
    • 13.12.3. Kronecker Product
    • 13.12.4. Vec Operator
    • 13.12.5. Properties of Kronecker Product
  • 13.13. Kronecker-Factored Approximate Curvature (K-FAC)
  • 13.14. K-FAC in Actor Critic
    • 13.14.1. Incorporating Trust Region