From e7b9203cedf689946ea4fe0e314aa223b8692612 Mon Sep 17 00:00:00 2001
From: Prajyot Jadhav <92448515+Arcane-01@users.noreply.github.com>
Date: Sat, 30 Dec 2023 22:05:10 +0530
Subject: [PATCH] Added notes on Learning high-speed flight in Reinforcement Learning

---
 reinforcement_learning/README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/reinforcement_learning/README.md b/reinforcement_learning/README.md
index 0ce8e21..ddc4a80 100644
--- a/reinforcement_learning/README.md
+++ b/reinforcement_learning/README.md
@@ -2,6 +2,7 @@
 | Paper | Notes | Author | Summary |
 |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------:|:---------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| [Learning high-speed flight in the wild](https://www.science.org/doi/full/10.1126/scirobotics.abg5810) | [HackMD](https://hackmd.io/@Arcane-01/H1cvyQMwT) | [Prajyot](https://github.com/Arcane-01) | This paper presents an end-to-end approach using privileged learning to enable high-speed autonomous flight for quadrotors in complex, real-world environments by directly mapping noisy sensory observations to collision-free trajectories. |
 | [DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION](https://arxiv.org/pdf/1912.01603.pdf) (ICLR '20) | [HackMD](https://hackmd.io/@iGBkTz2JQ2eBRM83nuhCuA/Hk9dpK0vd) | [Raj](https://github.com/RajGhugare19) | This paper focuses on learning long-horizon behaviors by propagating analytic value gradients through imagined trajectories using a recurrent state-space model (PlaNet, Hafner et al.). |
 | [The Value Equivalence Principle for Model-Based Reinforcement Learning](https://arxiv.org/abs/2011.03506) (NeurIPS '20) | [HackMD](https://hackmd.io/@Raj-Ghugare/HkEY6o9MP) | [Raj](https://github.com/RajGhugare19) | This paper introduces and studies the concept of value equivalence for reinforcement learning models with respect to a set of policies and value functions. It further shows that this principle can be leveraged to find models, constrained in representational capacity, that are better than their maximum-likelihood counterparts. |
 | [Stackelberg Actor-critic: A game theoretic perspective](https://hackmd.io/@FtbpSED3RQWclbmbmkChEA/rJFUQA1QO) | [HackMD](https://hackmd.io/@FtbpSED3RQWclbmbmkChEA/rJFUQA1QO) | [Sharath](https://sharathraparthy.github.io/) | This paper formulates the interaction between the actor and critic as a Stackelberg game and leverages the implicit function theorem to compute accurate gradient updates for the actor and critic. |
@@ -17,4 +18,4 @@
 |[Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/pdf/1710.02298.pdf)|[HackMD](https://hackmd.io/@HnlvODbMQIiAlpHchdZpDQ/BkYl3IkaK)|[Om](https://github.com/DigZator)| The paper discusses add-ons to DQN and A3C that can improve their performance, namely Double DQN, Prioritized Experience Replay, Dueling Network Architecture, Distributional Q-Learning, and Noisy DQN. |
 | [The Option-Critic Architecture](https://arxiv.org/abs/1609.05140) | [HackMD](https://hackmd.io/@HnlvODbMQIiAlpHchdZpDQ/SyI7nv7_q) | [Om](https://github.com/DigZator) | The paper discusses a hierarchical reinforcement learning method based on temporal abstraction. |
 | [Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets](https://offline-rl-neurips.github.io/pdf/13.pdf) | [HackMD](https://hackmd.io/@HnlvODbMQIiAlpHchdZpDQ/rkxHo6LL5) | [Om](https://github.com/DigZator) | The paper suggests, and provides experimental justification for, methods to tackle distribution shift. |
-| [FeUdal Networks for Hierarchical Reinforcement Learning](https://arxiv.org/abs/1703.01161) | [HackMD](https://hackmd.io/@HnlvODbMQIiAlpHchdZpDQ/HJoIiDw_c) | [Om](https://github.com/DigZator) | This paper describes the FeUdal Network model, which employs a manager-worker hierarchy. |
\ No newline at end of file
+| [FeUdal Networks for Hierarchical Reinforcement Learning](https://arxiv.org/abs/1703.01161) | [HackMD](https://hackmd.io/@HnlvODbMQIiAlpHchdZpDQ/HJoIiDw_c) | [Om](https://github.com/DigZator) | This paper describes the FeUdal Network model, which employs a manager-worker hierarchy. |