
Updated daily. Over 200 papers: CoT / VLM / Memory / Grounding / Human-Intelligence / Prompt / Reasoning / Robot / Agent / Planning / Reinforcement Learning / Feedback / In-Context-Learning / Instruction-Tuning / PEFT / RLHF / RAG / Embodied / VQA


We provide awesome papers and repos on the following topics:

CoT / VLM / Quantization / Grounding / Text2IMG&VID / Prompt Engineering / Reasoning / Robot / Agent / Planning / Reinforcement-Learning / Feedback / In-Context-Learning / Instruction-Tuning / PEFT / RLHF / RAG / Embodied / VQA / Hallucination / Diffusion / Scaling / Context-Window / WorldModel / Memory / Zero-Shot / RoPE / Speech / Perception / Survey / Segmentation / Large Action Model / Foundation / LoRA

We strongly recommend checking our Notion table for an interactive experience.


Number of papers and repos in total: 443

Category Title Links Date
3D, GPT4, VLM GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation ArXiv
3D, Open-source, Perception, Robot 3D-LLM: Injecting the 3D World into Large Language Models ArXiv 2023/07/24
AGI, Agent OpenAGI: When LLM Meets Domain Experts ArXiv, GitHub 2023/04/10
AGI, Awesome Repo, Survey Awesome-LLM-Papers-Toward-AGI GitHub
AGI, Brain When Brain-inspired AI Meets AGI
AGI, Brain Divergences between Language Models and Human Brains
AGI, Survey Levels of AGI: Operationalizing Progress on the Path to AGI
APIs, Agent, Tool Gorilla: Large Language Model Connected with Massive APIs ArXiv
Action-Generation, Generation, Prompting Prompt a Robot to Walk with Large Language Models
Action-Model, Agent, LAM LaVague GitHub
Agent LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem ArXiv
Agent AIOS: LLM Agent Operating System ArXiv
Agent Cognitive Architectures for Language Agents ArXiv
Agent PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
Agent AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Agent ScreenAgent: A Vision Language Model-driven Computer Control Agent
Agent swarms GitHub
Agent Agents: An Open-source Framework for Autonomous Language Agents
Agent MindAgent: Emergent Gaming Interaction
Agent InfiAgent: A Multi-Tool Agent for AI Operating Systems
Agent Predictive Minds: LLMs As Atypical Active Inference Agents
Agent XAgent: An Autonomous Agent for Complex Task Solving
Agent LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
Agent AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors ArXiv
Agent Agents: An Open-source Framework for Autonomous Language Agents ArXiv, GitHub
Agent AutoAgents: A Framework for Automatic Agent Generation GitHub
Agent DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines ArXiv
Agent AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Agent CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
Agent XAgent: An Autonomous Agent for Complex Task Solving ArXiv
Agent Generative Agents: Interactive Simulacra of Human Behavior ArXiv
Agent LLM+P: Empowering Large Language Models with Optimal Planning Proficiency ArXiv 2023/04/22
Agent AgentSims: An Open-Source Sandbox for Large Language Model Evaluation ArXiv 2023/08/08
Agent, Awesome Repo Awesome LLM-Powered Agent GitHub
Agent, Awesome Repo LLM Agents Papers GitHub
Agent, Awesome Repo Awesome Large Multimodal Agents GitHub
Agent, Awesome Repo Awesome-Papers-Autonomous-Agent GitHub
Agent, Awesome Repo Autonomous Agents GitHub
Agent, Awesome Repo Awesome AI Agents GitHub
Agent, Awesome Repo, Embodied, Grounding XLang Paper Reading GitHub
Agent, Awesome Repo, LLM CoALA: Awesome Language Agents GitHub
Agent, Awesome Repo, LLM Awesome-Embodied-Agent-with-LLMs GitHub
Agent, Blog LLM Powered Autonomous Agents ArXiv
Agent, Code-LLM TaskWeaver: A Code-First Agent Framework
Agent, Code-LLM, Code-as-Policies, Survey If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents ArXiv
Agent, Code-as-Policies Executable Code Actions Elicit Better LLM Agents ArXiv 2024/01/24
Agent, Embodied Embodied Task Planning with Large Language Models
Agent, Embodied Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Agent, Embodied Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld ArXiv
Agent, Embodied LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Agent, Embodied OpenAgents: An Open Platform for Language Agents in the Wild ArXiv, GitHub
Agent, Embodied, Robot OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
Agent, Embodied, Robot AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents ArXiv
Agent, Embodied, Survey Application of Pretrained Large Language Models in Embodied Artificial Intelligence ArXiv
Agent, End2End, Game, Robot An Interactive Agent Foundation Model ArXiv
Agent, Feedback, Reinforcement-Learning AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback ArXiv 2023/09/29
Agent, Feedback, Reinforcement-Learning, Robot Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models ArXiv 2023/11/04
Agent, GPT4, Web GPT-4V(ision) is a Generalist Web Agent, if Grounded
Agent, GUI SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Agent, GUI ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model GitHub
Agent, GUI CogAgent: A Visual Language Model for GUI Agents
Agent, GUI, MobileApp You Only Look at Screens: Multimodal Chain-of-Action Agents
Agent, GUI, MobileApp Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Agent, GUI, MobileApp AppAgent: Multimodal Agents as Smartphone Users
Agent, GUI, Web "What’s important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces
Agent, Instruction-Tuning AgentTuning: Enabling Generalized Agent Abilities For LLMs ArXiv
Agent, LLM, Planning LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Agent, Memory, Minecraft JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models ArXiv 2023/11/10
Agent, Memory, RAG RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents ArXiv 2024/02/06
Agent, Minecraft Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Agent, Minecraft S-Agents: Self-organizing Agents in Open-ended Environment
Agent, Minecraft Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Agent, Minecraft LARP: Language-Agent Role Play for Open-World Games
Agent, Minecraft Voyager: An Open-Ended Embodied Agent with Large Language Models ArXiv 2023/05/25
Agent, Minecraft Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents ArXiv 2023/02/03
Agent, Minecraft, Reinforcement-Learning RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds
Agent, MobileApp You Only Look at Screens: Multimodal Chain-of-Action Agents GitHub
Agent, Multi War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars ArXiv
Agent, Multimodal, Robot A Generalist Agent ArXiv 2022/05/12
Agent, Reasoning Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Agent, Reasoning, Zero-shot Agent Instructs Large Language Models to be General Zero-Shot Reasoners ArXiv 2023/10/05
Agent, Reinforcement-Learning Language Instructed Reinforcement Learning for Human-AI Coordination ArXiv 2023/04/13
Agent, Reinforcement-Learning Eureka: Human-Level Reward Design via Coding Large Language Models ArXiv 2023/10/19
Agent, Reinforcement-Learning Guiding Pretraining in Reinforcement Learning with Large Language Models ArXiv 2023/02/13
Agent, Reinforcement-Learning Language to Rewards for Robotic Skill Synthesis ArXiv 2023/06/14
Agent, Reinforcement-Learning, Reward EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL ArXiv 2022/06/20
Agent, Reinforcement-Learning, Reward Reward Design with Language Models ArXiv 2023/02/27
Agent, Reinforcement-Learning, Reward Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning ArXiv 2023/09/20
Agent, Soft-Dev Communicative Agents for Software Development GitHub
Agent, Soft-Dev MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Agent, Survey Large Multimodal Agents: A Survey
Agent, Survey Agent AI: Surveying the Horizons of Multimodal Interaction
Agent, Survey A Survey on LLM-based Autonomous Agents GitHub
Agent, Survey The Rise and Potential of Large Language Model Based Agents: A Survey ArXiv 2023/09/14
Agent, Survey A Survey on Large Language Model based Autonomous Agents ArXiv 2023/08/22
Agent, Tool ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Agent, Tool Gorilla: Large Language Model Connected with Massive APIs
Agent, Video-for-Agent Video as the New Language for Real-World Decision Making
Agent, Web OS-Copilot: Towards Generalist Computer Agents with Self-Improvement ArXiv
Agent, Web OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web ArXiv
Agent, Web WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Agent, Web WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Agent-Project, Code-LLM open-interpreter GitHub
Anything, CLIP, Perception SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Anything, Caption, Perception, Segmentation Segment and Caption Anything ArXiv
Anything, Depth Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Anything, Perception, Segmentation Segment Anything ArXiv
Audio Robust Speech Recognition via Large-Scale Weak Supervision
Audio2Video, Diffusion, Generation, Video EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Automate, Chain-of-Thought, Reasoning Automatic Chain of Thought Prompting in Large Language Models ArXiv 2022/10/07
Automate, Prompting Large Language Models Are Human-Level Prompt Engineers ArXiv 2022/11/03
Awesome Repo, Chain-of-Thought Chain-of-ThoughtsPapers GitHub
Awesome Repo, Chinese Awesome-Chinese-LLM GitHub
Awesome Repo, Compress Awesome LLM Compression GitHub
Awesome Repo, Diffusion Awesome-Diffusion-Models GitHub
Awesome Repo, Embodied Awesome Embodied Vision GitHub
Awesome Repo, Hallucination, Survey A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions ArXiv, GitHub
Awesome Repo, IROS, Robot IROS2023PaperList GitHub
Awesome Repo, In-Context-Learning Paper List for In-context Learning GitHub
Awesome Repo, Japanese, LLM 日本語LLMまとめ GitHub
Awesome Repo, Korean awesome-korean-llm GitHub
Awesome Repo, LLM Awesome-LLM GitHub
Awesome Repo, LLM, Leaderboard LLM-Leaderboard GitHub
Awesome Repo, LLM, Robot Everything-LLMs-And-Robotics GitHub
Awesome Repo, LLM, Survey Awesome-LLM-Survey GitHub
Awesome Repo, LLM, VLM Multimodal & Large Language Models GitHub
Awesome Repo, LLM, Vision LLM-in-Vision GitHub
Awesome Repo, Multimodal Awesome-Multimodal-LLM GitHub
Awesome Repo, Multimodal Awesome-Multimodal-Large-Language-Models GitHub
Awesome Repo, Package Awesome LLMOps GitHub
Awesome Repo, Perception, VLM Awesome Vision-Language Navigation GitHub
Awesome Repo, RLHF, Reinforcement-Learning Awesome RLHF (RL with Human Feedback) GitHub
Awesome Repo, Reasoning Awesome-Reasoning-Foundation-Models GitHub
Awesome Repo, Reasoning Awesome LLM Reasoning GitHub
Awesome Repo, Robot Awesome-LLM-Robotics GitHub
Awesome Repo, Survey LLMSurvey GitHub
Benchmark, GPT4 Sparks of Artificial General Intelligence: Early experiments with GPT-4
Benchmark, In-Context-Learning PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change ArXiv 2022/06/21
Benchmark, In-Context-Learning ARB: Advanced Reasoning Benchmark for Large Language Models ArXiv 2023/07/25
Benchmark, Sora, Text-to-Video LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
Brain A Neuro-Mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization
Brain LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Brain, Conscious Could a Large Language Model be Conscious? ArXiv 2023/03/04
Brain, Instruction-Tuning Instruction-tuning Aligns LLMs to the Human Brain
CRAG, RAG Corrective Retrieval Augmented Generation ArXiv
Caption, VLM, VQA Caption Anything: Interactive Image Description with Diverse Multimodal Controls ArXiv 2023/05/04
Chain-of-Thought, Code-as-Policies Chain of Code: Reasoning with a Language Model-Augmented Code Emulator ArXiv
Chain-of-Thought, Code-as-Policies Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought ArXiv
Chain-of-Thought, Embodied, PersonalCitation EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought ArXiv 2023/05/24
Chain-of-Thought, Embodied, Robot EgoCOT: Embodied Chain-of-Thought Dataset for Vision Language Pre-training
Chain-of-Thought, GPT4, Reasoning, Robot Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning ArXiv 2023/11/29
Chain-of-Thought, In-Context-Learning Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding ArXiv
Chain-of-Thought, In-Context-Learning Reasoning with Language Model is Planning with World Model ArXiv 2023/05/24
Chain-of-Thought, In-Context-Learning Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models ArXiv 2023/05/06
Chain-of-Thought, In-Context-Learning Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations ArXiv 2022/05/24
Chain-of-Thought, In-Context-Learning PAL: Program-aided Language Models ArXiv 2022/11/18
Chain-of-Thought, In-Context-Learning Self-Refine: Iterative Refinement with Self-Feedback ArXiv 2023/03/30
Chain-of-Thought, In-Context-Learning Complexity-Based Prompting for Multi-Step Reasoning ArXiv 2022/10/03
Chain-of-Thought, In-Context-Learning Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models ArXiv 2023/08/20
Chain-of-Thought, In-Context-Learning Least-to-Most Prompting Enables Complex Reasoning in Large Language Models ArXiv 2022/05/21
Chain-of-Thought, In-Context-Learning, Self Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement ArXiv 2023/05/23
Chain-of-Thought, In-Context-Learning, Self Measuring and Narrowing the Compositionality Gap in Language Models ArXiv 2022/10/07
Chain-of-Thought, Planning, Reasoning SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning ArXiv 2023/08/01
Chain-of-Thought, Prompting Chain-of-Thought Reasoning Without Prompting
Chain-of-Thought, Reasoning Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
Chain-of-Thought, Reasoning Multimodal Chain-of-Thought Reasoning in Language Models ArXiv 2023/02/02
Chain-of-Thought, Reasoning Self-Consistency Improves Chain of Thought Reasoning in Language Models ArXiv 2022/03/21
Chain-of-Thought, Reasoning Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding ArXiv 2023/07/28
Chain-of-Thought, Reasoning Rethinking with Retrieval: Faithful Large Language Model Inference ArXiv 2022/12/31
Chain-of-Thought, Reasoning Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance ArXiv 2023/05/26
Chain-of-Thought, Reasoning Tree of Thoughts: Deliberate Problem Solving with Large Language Models ArXiv 2023/05/17
Chain-of-Thought, Reasoning Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework ArXiv 2023/05/05
Chain-of-Thought, Reasoning Chain-of-Thought Prompting Elicits Reasoning in Large Language Models ArXiv 2022/01/28
Chain-of-Thought, Reasoning, Survey Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters ArXiv 2023/12/20
Chain-of-Thought, Reasoning, Survey A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future ArXiv 2023/09/27
Chain-of-Thought, Reasoning, Table Chain-of-table: Evolving tables in the reasoning chain for table understanding
Code-LLM StarCoder 2 and The Stack v2: The Next Generation
Code-LLM, Front-End Design2Code: How Far Are We From Automating Front-End Engineering?
Code-as-Policies, Embodied, PersonalCitation Inner Monologue: Embodied Reasoning through Planning with Language Models ArXiv
Code-as-Policies, Embodied, PersonalCitation Code as Policies: Language Model Programs for Embodied Control ArXiv 2022/09/16
Code-as-Policies, Multimodal, OpenGVLab Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ArXiv 2023/05/18
Code-as-Policies, PersonalCitation, Robot ChatGPT for Robotics: Design Principles and Model Abilities
Code-as-Policies, PersonalCitation, Robot RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Code-as-Policies, PersonalCitation, Robot RoboCodeX: Multi-modal Code Generation for Robotic Behavior Synthesis ArXiv
Code-as-Policies, PersonalCitation, Robot ProgPrompt: Generating Situated Robot Task Plans using Large Language Models ArXiv 2022/09/22
Code-as-Policies, PersonalCitation, Robot Statler: State-Maintaining Language Models for Embodied Reasoning ArXiv 2023/06/30
Code-as-Policies, PersonalCitation, Robot Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language ArXiv 2022/04/01
Code-as-Policies, Reasoning Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Code-as-Policies, Reasoning, VLM, VQA ViperGPT: Visual Inference via Python Execution for Reasoning ArXiv 2023/03/14
Code-as-Policies, Reinforcement-Learning, Reward Code as Reward: Empowering Reinforcement Learning with VLMs ArXiv
Code-as-Policies, Robot Creative Robot Tool Use with Large Language Models
Code-as-Policies, Robot RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Code-as-Policies, Robot Executable Code Actions Elicit Better LLM Agents
Code-as-Policies, Robot SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models ArXiv 2023/09/18
Code-as-Policies, VLM, VQA Visual Programming: Compositional visual reasoning without training ArXiv 2022/11/18
Compress, Prompting Learning to Compress Prompts with Gist Tokens ArXiv
Compress, Quantization, Survey A Survey on Model Compression for Large Language Models ArXiv
Compress, Scaling (Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression ArXiv
Context-Window RoFormer: Enhanced Transformer with Rotary Position Embedding
Context-Window, Foundation Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Context-Window, Foundation, Gemini, LLM, Scaling Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Context-Window, LLM, RoPE, Scaling LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens ArXiv
Context-Window, Reasoning, RoPE, Scaling Resonance RoPE: Improving Context Length Generalization of Large Language Models
Context-Window, Scaling LongNet: Scaling Transformers to 1,000,000,000 Tokens ArXiv 2023/07/01
Context-Window, Scaling Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Data-generation, Robot RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation ArXiv 2023/11/02
Data-generation, Robot GenSim: Generating Robotic Simulation Tasks via Large Language Models ArXiv 2023/10/02
Dataset, Instruction-Tuning Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Dataset, LLM, Survey A Survey on Data Selection for Language Models
Demonstration, GPT4, PersonalCitation, Robot GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Diffusion A latent text-to-image diffusion model
Diffusion, Robot 3D Diffusion Policy ArXiv
Diffusion, Speech NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Diffusion, Survey On the Design Fundamentals of Diffusion Models: A Survey ArXiv
Diffusion, Text-to-Image Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Distilling Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Distilling, Survey A Survey on Knowledge Distillation of Large Language Models
Drive, Survey A Survey on Multimodal Large Language Models for Autonomous Driving ArXiv
Driving, Spatial GPT-Driver: Learning to Drive with GPT ArXiv 2023/10/02
Embodied, LLM, Robot, Survey The Development of LLMs for Embodied Navigation ArXiv 2023/11/01
Embodied, Reasoning, Robot Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs ArXiv, GitHub 2024/03/20
Embodied, Robot Large Language Models as Generalizable Policies for Embodied Tasks
Embodied, Robot, Task-Decompose Embodied Task Planning with Large Language Models ArXiv 2023/07/04
Embodied, World-model Language Models Meet World Models: Embodied Experiences Enhance Language Models
Embodied Embodied Question Answering ArXiv
End2End, Multimodal, Robot VIMA: General Robot Manipulation with Multimodal Prompts ArXiv 2022/10/06
End2End, Multimodal, Robot PaLM-E: An Embodied Multimodal Language Model ArXiv 2023/03/06
End2End, Multimodal, Robot Physically Grounded Vision-Language Models for Robotic Manipulation ArXiv 2023/09/05
Evaluation, LLM, Survey A Survey on Evaluation of Large Language Models ArXiv
Feedback, In-Context-Learning, Robot InCoRo: In-Context Learning for Robotics Control with Feedback Loops
Feedback, Robot Correcting Robot Plans with Natural Language Feedback ArXiv
Feedback, Robot Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Feedback, Robot REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction ArXiv 2023/06/27
Foundation, LLM, Open-source Code Llama: Open Foundation Models for Code
Foundation, LLM, Open-source LLaMA: Open and Efficient Foundation Language Models ArXiv 2023/02/27
Foundation, LLaMA, Vision VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Foundation, Robot, Survey Foundation Models in Robotics: Applications, Challenges, and the Future ArXiv 2023/12/13
GPT4, Gemini, LLM Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases ArXiv 2023/12/22
GPT4, LLM GPT-4 Technical Report ArXiv 2023/03/15
Generation, Robot, Zero-shot Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans
Generation, Robot, Zero-shot Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models ArXiv
Generation, Survey Advances in 3D Generation: A Survey
Grounding GLaMM: Pixel Grounding Large Multimodal Model
Grounding V-IRL: Grounding Virtual Intelligence in Real Life
Grounding, Reasoning Visually Grounded Reasoning across Languages and Cultures
Grounding, Reinforcement-Learning Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Gym, PPO, Reinforcement-Learning, Survey Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym
Hallucination, Survey Combating Misinformation in the Age of LLMs: Opportunities and Challenges ArXiv
Image, LLaMA, Perception LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
In-Context-Learning Can large language models explore in-context?
In-Context-Learning What does CLIP know about a red circle? Visual prompt engineering for VLMs
In-Context-Learning ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate ArXiv 2023/08/14
In-Context-Learning ReAct: Synergizing Reasoning and Acting in Language Models ArXiv 2023/03/20
In-Context-Learning Generative Agents: Interactive Simulacra of Human Behavior ArXiv 2023/04/07
In-Context-Learning Small Models are Valuable Plug-ins for Large Language Models ArXiv 2023/05/15
In-Context-Learning Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models ArXiv 2022/06/09
In-Context-Learning, Instruction-Tuning In-Context Instruction Learning
In-Context-Learning, Perception, Vision Visual In-Context Prompting
In-Context-Learning, Prompt-Tuning Visual Prompt Tuning
In-Context-Learning, Reinforcement-Learning AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
In-Context-Learning, Scaling Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale 2022/03/06
In-Context-Learning, Scaling Structured Prompting: Scaling In-Context Learning to 1,000 Examples 2020/03/06
In-Context-Learning, Survey A Survey on In-context Learning ArXiv
In-Context-Learning, VQA VisualCOMET: Reasoning about the Dynamic Context of a Still Image ArXiv 2020/04/22
In-Context-Learning, VQA SINC: Self-Supervised In-Context Learning for Vision-Language Tasks ArXiv 2023/07/15
In-Context-Learning, Video Prompting Visual-Language Models for Efficient Video Understanding
In-Context-Learning, Vision Visual Prompting via Image Inpainting
In-Context-Learning, Vision What Makes Good Examples for Visual In-Context Learning?
Instruction-Tuning Tuna: Instruction Tuning using Feedback from Large Language Models ArXiv 2023/03/06
Instruction-Tuning Exploring the Benefits of Training Expert Language Models over Instruction Tuning ArXiv 2023/02/06
Instruction-Tuning Exploring Format Consistency for Instruction Tuning
Instruction-Tuning A Closer Look at the Limitations of Instruction Tuning
Instruction-Tuning, LLM Training language models to follow instructions with human feedback ArXiv 2022/03/04
Instruction-Tuning, LLM Self-Instruct: Aligning Language Models with Self-Generated Instructions ArXiv 2022/12/20
Instruction-Tuning, LLM MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models ArXiv 2023/04/20
Instruction-Tuning, LLM, PEFT LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention ArXiv 2023/03/28
Instruction-Tuning, LLM, PEFT Visual Instruction Tuning ArXiv 2023/04/17
Instruction-Tuning, LLM, Survey Instruction Tuning for Large Language Models: A Survey
Instruction-Tuning, LLM, Zero-shot Finetuned Language Models Are Zero-Shot Learners ArXiv 2021/09/03
Instruction-Tuning, Self Self-Instruct: Aligning Language Models with Self-Generated Instructions
Instruction-Tuning, Survey A Survey on Data Selection for LLM Instruction Tuning
Instruction-Tuning, Survey A Closer Look at the Limitations of Instruction Tuning ArXiv
Instruction-Tuning, Survey Vision-Language Instruction Tuning: A Review and Analysis
Instruction-Tuning, Survey Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning
Interactive, OpenGVLab, VLM InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language ArXiv 2023/05/09
LLM Language Models are Few-Shot Learners ArXiv 2020/05/28
LLM, Memory MemoryBank: Enhancing Large Language Models with Long-Term Memory ArXiv 2023/05/17
LLM, Open-source A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device. GitHub
LLM, Open-source InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning ArXiv 2023/05/11
LLM, Open-source ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst ArXiv 2023/05/25
LLM, Open-source OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models ArXiv 2023/08/02
LLM, Open-source, Perception, Segmentation Segment Anything ArXiv 2023/04/05
LLM, PersonalCitation, Robot Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
LLM, PersonalCitation, Robot, Zero-shot Language Models as Zero-Shot Trajectory Generators ArXiv
LLM, Quantization The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits ArXiv
LLM, Reasoning, Survey Towards Reasoning in Large Language Models: A Survey ArXiv 2022/12/20
LLM, Robot, Survey Large Language Models for Robotics: A Survey
LLM, Robot, Task-Decompose Do As I Can, Not As I Say: Grounding Language in Robotic Affordances ArXiv 2022/04/04
LLM, Scaling BitNet: Scaling 1-bit Transformers for Large Language Models ArXiv
LLM, Spatial Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning ArXiv 2023/10/05
LLM, Survey A Survey of Large Language Models ArXiv 2023/03/31
LLM, Temporal Logics NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models ArXiv 2023/05/12
LLM, Zero-shot GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? ArXiv 2023/11/27
LLaMA, Lightweight, Open-source MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
LLaVA, VLM TinyLLaVA: A Framework of Small-scale Large Multimodal Models ArXiv
Lab Imperial College London - Zeroshot trajectory
Lab OpenGVLab GitHub
Lab Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University - CogVLM
Lab Rutgers University, AGI Research - OpenAGI
Lab XLANG NLP Lab - OpenAgents
Lab OpenBMB - ChatDev, XAgent, AgentVerse
Lab Reworkd AI - AgentGPT
Lab DeepWisdom - MetaGPT
Lab Tencent AI Lab - AppAgent, WebVoyager
LoRA, Scaling Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements ArXiv
LoRA, Scaling LoRA: Low-Rank Adaptation of Large Language Models
Low-level-action, Robot SayTap: Language to Quadrupedal Locomotion ArXiv 2023/06/13
Low-level-action, Robot Prompt a Robot to Walk with Large Language Models ArXiv 2023/09/18
Math, Reasoning DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Memory, Reinforcement-Learning Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Memory, Robot LLM as A Robotic Brain: Unifying Egocentric Memory and Control ArXiv 2023/04/19
MoE Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity ArXiv
Multimodal, Robot Flamingo: a Visual Language Model for Few-Shot Learning ArXiv 2022/04/29
Multimodal, Robot Open-World Object Manipulation using Pre-trained Vision-Language Models ArXiv 2023/03/02
Multimodal, Robot MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation ArXiv 2023/08/07
Natural-Language-as-Policies, Robot RT-H: Action Hierarchies Using Language ArXiv
Navigation, Reasoning, Vision NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Open-source Gemma: Introducing new state-of-the-art open models ArXiv
Open-source, Perception Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Open-source, VLM OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models ArXiv 2023/08/02
PPO, RLHF, Reinforcement-Learning Secrets of RLHF in Large Language Models Part I: PPO ArXiv 2024/02/01
Package Alpaca-LoRA GitHub
Package Dify GitHub
Package h2oGPT GitHub
Package LangChain GitHub
Package LlamaIndex GitHub
Perception SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Perception Simple Open-Vocabulary Object Detection with Vision Transformers ArXiv
Perception Recognize Anything: A Strong Image Tagging Model ArXiv
Perception DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Perception Grounded Language-Image Pre-training ArXiv 2021/12/07
Perception Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection ArXiv 2023/03/09
Perception PointCLIP: Point Cloud Understanding by CLIP ArXiv 2021/12/04
Perception Simple Open-Vocabulary Object Detection with Vision Transformers ArXiv 2022/05/12
Perception, Reasoning Lenna: Language Enhanced Reasoning Detection Assistant ArXiv
Perception, Reasoning DetGPT: Detect What You Need via Reasoning ArXiv
Perception, Reasoning, Robot Reasoning Grasping via Multimodal Large Language Model ArXiv
Perception, Robot Language Segment-Anything
Perception, Robot LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding ArXiv 2023/12/21
Perception, Task-Decompose DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment ArXiv 2023/07/01
Perception, Video, Vision CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval ArXiv
PersonalCitation, Robot Text2Motion: From Natural Language Instructions to Feasible Plans ArXiv
Prompting Contrastive Chain-of-Thought Prompting
Prompting, Survey A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
Quantization, Scaling SliceGPT: Compress Large Language Models by Deleting Rows and Columns
RAG Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity ArXiv
RAG RAFT: Adapting Language Model to Domain Specific RAG ArXiv
RAG RAG-Fusion: a New Take on Retrieval-Augmented Generation ArXiv
RAG Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
RAG Training Language Models with Memory Augmentation
RAG, Survey Retrieval-Augmented Generation for Large Language Models: A Survey
RAG, Survey Retrieval-Augmented Generation for Large Language
RAG, Survey Large Language Models for Information Retrieval: A Survey
RAG, Temporal Logics FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation ArXiv
RLHF Secrets of RLHF in Large Language Models Part II: Reward Modeling
RLHF, Reinforcement-Learning, Survey A Survey of Reinforcement Learning from Human Feedback
Reasoning The Impact of Reasoning Step Length on Large Language Models
Reasoning STaR: Bootstrapping Reasoning With Reasoning ArXiv 2022/05/28
Reasoning Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Reasoning Rephrase and Respond (RaR)
Reasoning Contrastive Chain-of-Thought Prompting
Reasoning Chain-of-Thought Reasoning Without Prompting ArXiv
Reasoning Self-Discover: Large Language Models Self-Compose Reasoning Structures
Reasoning Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning ArXiv
Reasoning ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs.
Reasoning, Reinforcement-Learning ReFT: Reasoning with Reinforced Fine-Tuning
Reasoning, Robot AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
Reasoning, Survey Reasoning with Language Model Prompting: A Survey ArXiv
Reasoning, Symbolic Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning
Reasoning, Table Large Language Models are few(1)-shot Table Reasoners ArXiv
Reasoning, VLM, VQA MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action ArXiv 2023/03/20
Reasoning, Zero-shot Large Language Models are Zero-Shot Reasoners
Reinforcement-Learning Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Reinforcement-Learning RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents
Resource [Resource] arxiv-sanity ArXiv
Resource [Resource] AlphaSignal ArXiv
Resource [Resource] Semanticscholar ArXiv
Resource [Resource] Connectedpapers ArXiv
Resource [Resource] dailyarxiv ArXiv
Resource [Resource] huggingface ArXiv
Resource [Resource] Paperswithcode ArXiv
RoPE RoFormer: Enhanced Transformer with Rotary Position Embedding ArXiv
Robot DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies ArXiv
Robot OCI-Robotics: Object-Centric Instruction Augmentation for Robotic Manipulation ArXiv
Robot PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs ArXiv
Robot Introspective Tips: Large Language Model for In-Context Decision Making ArXiv
Robot RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Robot Generative Expressive Robot Behaviors using Large Language Models
Robot OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Robot RoCo: Dialectic Multi-Robot Collaboration with Large Language Models ArXiv
Robot Interactive Language: Talking to Robots in Real Time
Robot Reflexion: Language Agents with Verbal Reinforcement Learning ArXiv 2023/03/20
Robot, Survey Real-World Robot Applications of Foundation Models: A Review
Robot, Survey Language-conditioned Learning for Robotic Manipulation: A Survey ArXiv 2023/12/17
Robot, Survey Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis ArXiv 2023/12/14
Robot, Survey Robot Learning in the Era of Foundation Models: A Survey ArXiv 2023/11/24
Robot, Task-Decompose SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning ArXiv 2023/07/12
Robot, Task-Decompose, Zero-shot Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents ArXiv 2022/01/18
Robot, Zero-shot Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Robot, Zero-shot Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting
Robot, Zero-shot Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Robot, Zero-shot BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning ArXiv
Sora, Text-to-Video Mora: Enabling Generalist Video Generation via A Multi-Agent Framework ArXiv
Sora, Text-to-Video Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Survey Efficient Large Language Models: A Survey ArXiv, GitHub
Survey, TimeSeries Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Survey, Training Understanding LLMs: A Comprehensive Overview from Training to Inference
Survey, VLM MM-LLMs: Recent Advances in MultiModal Large Language Models
Survey, Video Video Understanding with Large Language Models: A Survey
Temporal Explorative Inbetweening of Time and Space ArXiv
Text2Img Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation ArXiv
Text-to-Image, World-model World Model on Million-Length Video And Language With RingAttention
VLM ScreenAI: A Vision-Language Model for UI and Infographics Understanding
VLM PaLM: Scaling Language Modeling with Pathways ArXiv 2022/04/05
VLM, VQA DeepSeek-VL: Towards Real-World Vision-Language Understanding
VLM, VQA CogVLM: Visual Expert for Pretrained Language Models ArXiv 2023/11/06
VLM, VQA Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models ArXiv 2023/04/19
VLM, World-model Large World Model ArXiv
ViFM, Video InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding ArXiv, GitHub
World-model Learning to Model the World with Language ArXiv
World-model Diffusion World Model ArXiv
World-model Language Models Meet World Models ArXiv
World-model Learning and Leveraging World Models in Visual Representation Learning
World-model Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
Zero-shot Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?