Reinforcement Learning with Python will help you master reinforcement learning, from the basic algorithms to advanced deep reinforcement learning algorithms.
The book starts with an introduction to reinforcement learning, followed by OpenAI Gym and TensorFlow. You will then explore various RL algorithms and concepts such as Markov Decision Processes, Monte Carlo methods, and dynamic programming, including value and policy iteration. This example-rich guide will introduce you to deep learning, covering various deep learning algorithms. You will then explore deep reinforcement learning, which combines deep learning and reinforcement learning, in depth. You will master various deep reinforcement learning algorithms such as DQN, Double DQN, Dueling DQN, DRQN, A3C, DDPG, TRPO, and PPO. You will also learn about recent advancements in reinforcement learning such as imagination augmented agents, learning from human preference, DQfD, HER, and many more.
The book has also been translated into Chinese, and you can get it here: https://item.jd.com/12506442.html
- 1.1. What is Reinforcement Learning?
- 1.2. Reinforcement Learning Cycle
- 1.3. How RL Differs from Other ML Paradigms
- 1.4. Elements of Reinforcement Learning
- 1.5. Agent Environment Interface
- 1.6. Types of RL Environments
- 1.7. Reinforcement Learning Platforms
- 1.8. Applications of Reinforcement Learning
- 2.1. Setting Up Your Machine
- 2.2. Installing Anaconda
- 2.3. Installing Docker
- 2.4. Installing OpenAI Gym and Universe
- 2.5. Common Error Fixes
- 2.6. OpenAI Gym
- 2.7. Basic Simulations
- 2.8. Training a Robot to walk
- 2.9. Building a Video Game Bot
- 2.10. TensorFlow Fundamentals
- 2.11. TensorBoard
- 3.1. Markov Chain and Markov Process
- 3.2. Markov Decision Process
- 3.3. Rewards and Returns
- 3.4. Episodic and Continuous Tasks
- 3.5. Policy Function
- 3.6. State Value Function
- 3.7. State-Action Value Function (Q Function)
- 3.8. Bellman Equation and Optimality
- 3.9. Deriving Bellman Equation for Value and Q functions
- 3.10. Solving the Bellman Equation
- 3.11. Dynamic Programming
- 3.12. Solving Frozen Lake Problem using Value Iteration
- 3.13. Solving Frozen Lake Problem using Policy Iteration
- 4.1. Monte Carlo Methods
- 4.2. Estimating Value of Pi Using Monte Carlo
- 4.3. Monte Carlo Prediction
- 4.4. First visit Monte Carlo
- 4.5. Every visit Monte Carlo
- 4.6. BlackJack with Monte Carlo
- 4.7. Monte Carlo Control
- 4.8. Monte Carlo Exploration Starts
- 4.9. On Policy Monte Carlo Control
- 4.10. Off Policy Monte Carlo Control
- 5.1. Temporal Difference Learning
- 5.2. TD Prediction
- 5.3. TD Control
- 5.4. Q Learning
- 5.5. Solving the Taxi Problem using Q learning
- 5.6. SARSA
- 5.7. Solving the Taxi Problem using SARSA
- 5.8. Difference Between Q learning and SARSA
- 6.1. Multi-armed Bandit Problem
- 6.2. Epsilon-Greedy Algorithm
- 6.3. Softmax Exploration Algorithm
- 6.4. Upper Confidence Bound Algorithm
- 6.5. Thompson Sampling Algorithm
- 6.6. Applications of MAB
- 6.7. Identifying Right Advertisement Banner Using MAB
- 6.8. Contextual Bandits
- 7.1. Artificial Neurons
- 7.2. Artificial Neural Network
- 7.3. Activation Functions
- 7.4. Deep Dive into ANN
- 7.5. Gradient Descent
- 7.6. Neural Networks in TensorFlow
- 7.7. Recurrent Neural Network
- 7.8. Backpropagation Through Time
- 7.9. Long Short Term Memory RNN
- 7.10. Generating Song Lyrics using LSTM RNN
- 7.11. Convolutional Neural Networks
- 7.12. CNN Architecture
- 7.13. Classifying Fashion Products Using CNN
- 8.1. What is a Deep Q Network?
- 8.2. Architecture of DQN
- 8.3. Convolutional Network
- 8.4. Experience Replay
- 8.5. Target Network
- 8.6. Clipping Rewards
- 8.7. DQN Algorithm
- 8.8. Building an Agent to Play Atari Games
- 8.9. Double DQN
- 8.10. Dueling Architecture
- 9.1. Deep Recurrent Q Network
- 9.2. Partially Observable MDP
- 9.3. Architecture of DRQN
- 9.4. Basic Doom Game
- 9.5. Build an Agent to Play Doom Game using DRQN
- 9.6. Deep Attention Recurrent Q Network
- 10.1. Asynchronous Advantage Actor Critic Algorithm
- 10.2. The three A's
- 10.3. Architecture of A3C
- 10.4. Working of A3C
- 10.5. Drive up the Mountain with A3C
- 10.6. Visualization in TensorBoard
- 11.1. Policy Gradient
- 11.2. Lunar Lander Using Policy Gradient
- 11.3. Deep Deterministic Policy Gradient
- 11.4. Swinging up the Pendulum using DDPG
- 11.5. Trust Region Policy Optimization
- 11.6. Proximal Policy Optimization
- 12.1. Environment Wrapper Functions
- 12.2. Dueling Network
- 12.3. Replay Buffer
- 12.4. Training the Network
- 12.5. Car Racing
- 13.1. Imagination Augmented Agents
- 13.2. Learning From Human Preference
- 13.3. Deep Q Learning From Demonstrations
- 13.4. Hindsight Experience Replay
- 13.5. Hierarchical Reinforcement Learning
- 13.6. Inverse Reinforcement Learning