We provide awesome papers and repos on a comprehensive range of topics, listed below.
CoT / VLM / Quantization / Grounding / Text2IMG&VID / Prompt Engineering / Reasoning / Robot / Agent / Planning / Reinforcement-Learning / Feedback / In-Context-Learning / Instruction-Tuning / PEFT / RLHF / RAG / Embodied / VQA / Hallucination / Diffusion / Scaling / Context-Window / World-Model / Memory / Zero-Shot / RoPE / Speech / Perception / Survey / Segmentation / Large Action Model / Foundation / LoRA
We strongly recommend checking our Notion table for an interactive experience.
Category | Title | Links | Date |
---|---|---|---|
3D, GPT4, VLM | GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation | ArXiv | |
3D, Open-source, Perception, Robot | 3D-LLM: Injecting the 3D World into Large Language Models | ArXiv | 2023/07/24 |
AGI, Agent | OpenAGI: When LLM Meets Domain Experts | ArXiv, GitHub | 2023/04/10 |
AGI, Awesome Repo, Survey | Awesome-LLM-Papers-Toward-AGI | GitHub | |
AGI, Brain | When Brain-inspired AI Meets AGI | ||
AGI, Brain | Divergences between Language Models and Human Brains | ||
AGI, Survey | Levels of AGI: Operationalizing Progress on the Path to AGI | ||
APIs, Agent, Tool | Gorilla: Large Language Model Connected with Massive APIs | ArXiv | |
Action-Model, Agent, LAM | LaVague | GitHub | |
Agent | LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | ArXiv | |
Agent | AIOS: LLM Agent Operating System | ArXiv | |
Agent | Cognitive Architectures for Language Agents | ArXiv | |
Agent | PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization | ||
Agent | AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | ||
Agent | swarms | GitHub | |
Agent | MindAgent: Emergent Gaming Interaction | ||
Agent | InfiAgent: A Multi-Tool Agent for AI Operating Systems | ||
Agent | Predictive Minds: LLMs As Atypical Active Inference Agents | ||
Agent | LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination | ||
Agent | AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | ArXiv | |
Agent | Agents: An Open-source Framework for Autonomous Language Agents | ArXiv, GitHub | |
Agent | AutoAgents: A Framework for Automatic Agent Generation | GitHub | |
Agent | DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | ArXiv | |
Agent | AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | ||
Agent | CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society | ||
Agent | XAgent: An Autonomous Agent for Complex Task Solving | ArXiv | |
Agent | LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | ArXiv | 2023/04/22 |
Agent | AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | ArXiv | 2023/08/08 |
Agent, Awesome Repo | Awesome LLM-Powered Agent | GitHub | |
Agent, Awesome Repo | LLM Agents Papers | GitHub | |
Agent, Awesome Repo | Awesome Large Multimodal Agents | GitHub | |
Agent, Awesome Repo | Awesome-Papers-Autonomous-Agent | GitHub | |
Agent, Awesome Repo | Autonomous Agents | GitHub | |
Agent, Awesome Repo | Awesome AI Agents | GitHub | |
Agent, Awesome Repo, Embodied, Grounding | XLang Paper Reading | GitHub | |
Agent, Awesome Repo, LLM | CoALA: Awesome Language Agents | GitHub | |
Agent, Awesome Repo, LLM | Awesome-Embodied-Agent-with-LLMs | GitHub | |
Agent, Blog | LLM Powered Autonomous Agents | ArXiv | |
Agent, Code-LLM | TaskWeaver: A Code-First Agent Framework | ||
Agent, Code-LLM, Code-as-Policies, Survey | If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | ArXiv | |
Agent, Code-as-Policies | Executable Code Actions Elicit Better LLM Agents | ArXiv | 2024/01/24 |
Agent, Embodied | Octopus: Embodied Vision-Language Programmer from Environmental Feedback | ||
Agent, Embodied | Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | ArXiv | |
Agent, Embodied | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ||
Agent, Embodied | OpenAgents: An Open Platform for Language Agents in the Wild | ArXiv, GitHub | |
Agent, Embodied, Robot | OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following | ||
Agent, Embodied, Robot | AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | ArXiv | |
Agent, Embodied, Survey | Application of Pretrained Large Language Models in Embodied Artificial Intelligence | ArXiv | |
Agent, End2End, Game, Robot | An Interactive Agent Foundation Model | ArXiv | |
Agent, Feedback, Reinforcement-Learning | AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback | ArXiv | 2023/09/29 |
Agent, Feedback, Reinforcement-Learning, Robot | Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models | ArXiv | 2023/11/04 |
Agent, GPT4, Web | GPT-4V(ision) is a Generalist Web Agent, if Grounded | ||
Agent, GUI | SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | ||
Agent, GUI | ScreenAgent: A Vision Language Model-driven Computer Control Agent | GitHub | |
Agent, GUI | CogAgent: A Visual Language Model for GUI Agents | ||
Agent, GUI, MobileApp | You Only Look at Screens: Multimodal Chain-of-Action Agents | GitHub | |
Agent, GUI, MobileApp | Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception | ||
Agent, GUI, MobileApp | AppAgent: Multimodal Agents as Smartphone Users | ||
Agent, GUI, Web | "What’s important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces | ||
Agent, Game | Learning Embodied Vision-Language Programming from Instruction, Exploration, and Environmental Feedback | ||
Agent, Instruction-Tuning | AgentTuning: Enabling Generalized Agent Abilities For LLMs | ArXiv | |
Agent, Memory, Minecraft | JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | ArXiv | 2023/11/10 |
Agent, Memory, RAG | RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | ArXiv | 2024/02/06 |
Agent, Minecraft | Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | ||
Agent, Minecraft | S-Agents: Self-organizing Agents in Open-ended Environment | ||
Agent, Minecraft | Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds | ||
Agent, Minecraft | LARP: Language-Agent Role Play for Open-World Games | ||
Agent, Minecraft | Voyager: An Open-Ended Embodied Agent with Large Language Models | ArXiv | 2023/05/25 |
Agent, Minecraft | Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | ArXiv | 2023/02/03 |
Agent, Minecraft, Reinforcement-Learning | RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds | ||
Agent, Multi | War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | ArXiv | |
Agent, Multimodal, Robot | A Generalist Agent | ArXiv | 2022/05/12 |
Agent, Reasoning | Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | ||
Agent, Reasoning, Zero-shot | Agent Instructs Large Language Models to be General Zero-Shot Reasoners | ArXiv | 2023/10/05 |
Agent, Reinforcement-Learning | STARLING: Self-Supervised Training of Text-Based Reinforcement Learning Agent with Large Language Models | ||
Agent, Reinforcement-Learning | Language Instructed Reinforcement Learning for Human-AI Coordination | ArXiv | 2023/04/13 |
Agent, Reinforcement-Learning | Eureka: Human-Level Reward Design via Coding Large Language Models | ArXiv | 2023/10/19 |
Agent, Reinforcement-Learning | Guiding Pretraining in Reinforcement Learning with Large Language Models | ArXiv | 2023/02/13 |
Agent, Reinforcement-Learning | Language to Rewards for Robotic Skill Synthesis | ArXiv | 2023/06/14 |
Agent, Reinforcement-Learning, Reward | EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL | ArXiv | 2022/06/20 |
Agent, Reinforcement-Learning, Reward | Reward Design with Language Models | ArXiv | 2023/02/27 |
Agent, Reinforcement-Learning, Reward | Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning | ArXiv | 2023/09/20 |
Agent, Soft-Dev | Communicative Agents for Software Development | GitHub | |
Agent, Soft-Dev | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | ||
Agent, Survey | Large Multimodal Agents: A Survey | ||
Agent, Survey | Agent AI: Surveying the Horizons of Multimodal Interaction | ||
Agent, Survey | A Survey on LLM-based Autonomous Agents | GitHub | |
Agent, Survey | The Rise and Potential of Large Language Model Based Agents: A Survey | ArXiv | 2023/09/14 |
Agent, Survey | A Survey on Large Language Model based Autonomous Agents | ArXiv | 2023/08/22 |
Agent, Tool | ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | ||
Agent, Video-for-Agent | Video as the New Language for Real-World Decision Making | ||
Agent, Web | OS-Copilot: Towards Generalist Computer Agents with Self-Improvement | ArXiv | |
Agent, Web | OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | ArXiv | |
Agent, Web | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | ||
Agent, Web | WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | ||
Agent-Project, Code-LLM | open-interpreter | GitHub | |
Anything, CLIP, Perception | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | ||
Anything, Caption, Perception, Segmentation | Segment and Caption Anything | ArXiv | |
Anything, Depth | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | ||
Audio | Robust Speech Recognition via Large-Scale Weak Supervision | ||
Audio2Video, Diffusion, Generation, Video | EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | ||
Automate, Chain-of-Thought, Reasoning | Automatic Chain of Thought Prompting in Large Language Models | ArXiv | 2022/10/07 |
Automate, Prompting | Large Language Models Are Human-Level Prompt Engineers | ArXiv | 2022/11/03 |
Awesome Repo, Chain-of-Thought | Chain-of-ThoughtsPapers | GitHub | |
Awesome Repo, Chinese | Awesome-Chinese-LLM | GitHub | |
Awesome Repo, Compress | Awesome LLM Compression | GitHub | |
Awesome Repo, Diffusion | Awesome-Diffusion-Models | GitHub | |
Awesome Repo, Embodied | Awesome Embodied Vision | GitHub | |
Awesome Repo, Hallucination, Survey | A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | ArXiv, GitHub | |
Awesome Repo, IROS, Robot | IROS2023PaperList | GitHub | |
Awesome Repo, In-Context-Learning | Paper List for In-context Learning | GitHub | |
Awesome Repo, Japanese, LLM | 日本語LLMまとめ (Overview of Japanese LLMs) | GitHub | |
Awesome Repo, Korean | awesome-korean-llm | GitHub | |
Awesome Repo, LLM | Awesome-LLM | GitHub | |
Awesome Repo, LLM, Leaderboard | LLM-Leaderboard | GitHub | |
Awesome Repo, LLM, Robot | Everything-LLMs-And-Robotics | GitHub | |
Awesome Repo, LLM, Survey | Awesome-LLM-Survey | GitHub | |
Awesome Repo, LLM, VLM | Multimodal & Large Language Models | GitHub | |
Awesome Repo, LLM, Vision | LLM-in-Vision | GitHub | |
Awesome Repo, Multimodal | Awesome-Multimodal-LLM | GitHub | |
Awesome Repo, Multimodal | Awesome-Multimodal-Large-Language-Models | GitHub | |
Awesome Repo, Package | Awesome LLMOps | GitHub | |
Awesome Repo, Perception, VLM | Awesome Vision-Language Navigation | GitHub | |
Awesome Repo, RLHF, Reinforcement-Learning | Awesome RLHF (RL with Human Feedback) | GitHub | |
Awesome Repo, Reasoning | Awesome-Reasoning-Foundation-Models | GitHub | |
Awesome Repo, Reasoning | Awesome LLM Reasoning | GitHub | |
Awesome Repo, Robot | Awesome-LLM-Robotics | GitHub | |
Awesome Repo, Survey | LLMSurvey | GitHub | |
Benchmark, GPT4 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | ||
Benchmark, In-Context-Learning | PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | ArXiv | 2022/06/21 |
Benchmark, In-Context-Learning | ARB: Advanced Reasoning Benchmark for Large Language Models | ArXiv | 2023/07/25 |
Benchmark, Sora, Text-to-Video | LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models | ||
Brain | A Neuro-Mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization | ||
Brain | LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model | ||
Brain, Conscious | Could a Large Language Model be Conscious? | ArXiv | 2023/03/04 |
Brain, Instruction-Tuning | Instruction-tuning Aligns LLMs to the Human Brain | ||
CRAG, RAG | Corrective Retrieval Augmented Generation | ArXiv | |
Caption, VLM, VQA | Caption Anything: Interactive Image Description with Diverse Multimodal Controls | ArXiv | 2023/05/04 |
Chain-of-Thought, Code-as-Policies | Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | ArXiv | |
Chain-of-Thought, Code-as-Policies, PersonalCitation, Robot | Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought | ArXiv | |
Chain-of-Thought, Embodied, PersonalCitation, Robot, Task-Decompose | EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | ArXiv | 2023/05/24 |
Chain-of-Thought, Embodied, Robot | EgoCOT: Embodied Chain-of-Thought Dataset for Vision Language Pre-training | ||
Chain-of-Thought, GPT4, Reasoning, Robot | Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning | ArXiv | 2023/11/29 |
Chain-of-Thought, In-Context-Learning | Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding | ArXiv | |
Chain-of-Thought, In-Context-Learning | Reasoning with Language Model is Planning with World Model | ArXiv | 2023/05/24 |
Chain-of-Thought, In-Context-Learning | Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | ArXiv | 2023/05/06 |
Chain-of-Thought, In-Context-Learning | Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations | ArXiv | 2022/05/24 |
Chain-of-Thought, In-Context-Learning | PAL: Program-aided Language Models | ArXiv | 2022/11/18 |
Chain-of-Thought, In-Context-Learning | Self-Refine: Iterative Refinement with Self-Feedback | ArXiv | 2023/03/30 |
Chain-of-Thought, In-Context-Learning | Complexity-Based Prompting for Multi-Step Reasoning | ArXiv | 2022/10/03 |
Chain-of-Thought, In-Context-Learning | Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models | ArXiv | 2023/08/20 |
Chain-of-Thought, In-Context-Learning | Least-to-Most Prompting Enables Complex Reasoning in Large Language Models | ArXiv | 2022/05/21 |
Chain-of-Thought, In-Context-Learning, Self | Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement | ArXiv | 2023/05/23 |
Chain-of-Thought, In-Context-Learning, Self | Measuring and Narrowing the Compositionality Gap in Language Models | ArXiv | 2022/10/07 |
Chain-of-Thought, Planning, Reasoning | SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | ArXiv | 2023/08/01 |
Chain-of-Thought, Reasoning | Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation | ||
Chain-of-Thought, Reasoning | Multimodal Chain-of-Thought Reasoning in Language Models | ArXiv | 2023/02/02 |
Chain-of-Thought, Reasoning | Self-Consistency Improves Chain of Thought Reasoning in Language Models | ArXiv | 2022/03/21 |
Chain-of-Thought, Reasoning | Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding | ArXiv | 2023/07/28 |
Chain-of-Thought, Reasoning | Rethinking with Retrieval: Faithful Large Language Model Inference | ArXiv | 2022/12/31 |
Chain-of-Thought, Reasoning | Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance | ArXiv | 2023/05/26 |
Chain-of-Thought, Reasoning | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | ArXiv | 2023/05/17 |
Chain-of-Thought, Reasoning | Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework | ArXiv | 2023/05/05 |
Chain-of-Thought, Reasoning | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | ArXiv | 2022/01/28 |
Chain-of-Thought, Reasoning, Survey | Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters | ArXiv | 2022/12/20 |
Chain-of-Thought, Reasoning, Survey | A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future | ArXiv | 2023/09/27 |
Code-LLM | StarCoder 2 and The Stack v2: The Next Generation | ||
Code-LLM, Front-End | Design2Code: How Far Are We From Automating Front-End Engineering? | ||
Code-as-Policies, Embodied, PersonalCitation, Reasoning, Robot, Task-Decompose | Inner Monologue: Embodied Reasoning through Planning with Language Models | ArXiv | |
Code-as-Policies, Embodied, PersonalCitation, Robot | Code as Policies: Language Model Programs for Embodied Control | ArXiv | 2022/09/16 |
Code-as-Policies, Multimodal, OpenGVLab, PersonalCitation, Robot | Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model | ArXiv | 2023/05/18 |
Code-as-Policies, PersonalCitation, Robot | ChatGPT for Robotics: Design Principles and Model Abilities | ||
Code-as-Policies, PersonalCitation, Robot | RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks | ||
Code-as-Policies, PersonalCitation, Robot | RoboCodeX: Multi-modal Code Generation for Robotic Behavior Synthesis | ArXiv | |
Code-as-Policies, PersonalCitation, Robot | ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | ArXiv | 2022/09/22 |
Code-as-Policies, PersonalCitation, Robot, State-Manage | Statler: State-Maintaining Language Models for Embodied Reasoning | ArXiv | 2023/06/30 |
Code-as-Policies, PersonalCitation, Robot, Zero-shot | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | ArXiv | 2022/04/01 |
Code-as-Policies, Reasoning, VLM, VQA | ViperGPT: Visual Inference via Python Execution for Reasoning | ArXiv | 2023/03/14 |
Code-as-Policies, Reinforcement-Learning, Reward | Code as Reward: Empowering Reinforcement Learning with VLMs | ArXiv | |
Code-as-Policies, Robot | Creative Robot Tool Use with Large Language Models | ||
Code-as-Policies, Robot | RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation | ||
Code-as-Policies, Robot | SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models | ArXiv | 2023/09/18 |
Code-as-Policies, VLM, VQA | Visual Programming: Compositional visual reasoning without training | ArXiv | 2022/11/18 |
Compress, Prompting | Learning to Compress Prompts with Gist Tokens | ArXiv | |
Compress, Quantization, Survey | A Survey on Model Compression for Large Language Models | ArXiv | |
Compress, Scaling | (Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression | ArXiv | |
Context-Window, Foundation | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ||
Context-Window, Foundation, Gemini, LLM, Scaling | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ||
Context-Window, LLM, RoPE, Scaling | LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | ArXiv | |
Context-Window, Reasoning, RoPE, Scaling | Resonance RoPE: Improving Context Length Generalization of Large Language Models | ||
Context-Window, Scaling | LONGNET: Scaling Transformers to 1,000,000,000 Tokens | ArXiv | 2023/07/01 |
Context-Window, Scaling | Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens | ||
Data-generation, Robot | RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation | ArXiv | 2023/11/02 |
Data-generation, Robot | GenSim: Generating Robotic Simulation Tasks via Large Language Models | ArXiv | 2023/10/02 |
Dataset, Instruction-Tuning | Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | ||
Dataset, Instruction-Tuning | REVO-LION: Evaluating and Refining Vision Language Instruction Tuning Datasets | ||
Dataset, LLM, Survey | A Survey on Data Selection for Language Models | ||
Demonstration, GPT4, PersonalCitation, Robot | GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration | ||
Diffusion | A latent text-to-image diffusion model | ||
Diffusion, Robot | 3D Diffusion Policy | ArXiv | |
Diffusion, Speech | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | ||
Diffusion, Survey | On the Design Fundamentals of Diffusion Models: A Survey | ArXiv | |
Diffusion, Text-to-Image | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | ||
Distilling | Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | ||
Distilling, Survey | A Survey on Knowledge Distillation of Large Language Models | ||
Drive, Survey | A Survey on Multimodal Large Language Models for Autonomous Driving | ArXiv | |
Driving, Spatial | GPT-Driver: Learning to Drive with GPT | ArXiv | 2023/10/02 |
Embodied, LLM, Robot, Survey | The Development of LLMs for Embodied Navigation | ArXiv | 2023/11/01 |
Embodied, Reasoning, Robot | Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs | ArXiv, GitHub | 2024/03/20 |
Embodied, Robot | Large Language Models as Generalizable Policies for Embodied Tasks | ||
Embodied, Robot, Task-Decompose | Embodied Task Planning with Large Language Models | ArXiv | 2023/07/04 |
Embodied, World-model | Language Models Meet World Models: Embodied Experiences Enhance Language Models | ||
Embodied | Embodied Question Answering | ArXiv | |
End2End, Multimodal, Robot | VIMA: General Robot Manipulation with Multimodal Prompts | ArXiv | 2022/10/06 |
End2End, Multimodal, Robot | PaLM-E: An Embodied Multimodal Language Model | ArXiv | 2023/03/06 |
End2End, Multimodal, Robot | Physically Grounded Vision-Language Models for Robotic Manipulation | ArXiv | 2023/09/05 |
Evaluation, LLM, Survey | A Survey on Evaluation of Large Language Models | ArXiv | |
Feedback, In-Context-Learning, Robot | InCoRo: In-Context Learning for Robotics Control with Feedback Loops | ||
Feedback, Robot | Correcting Robot Plans with Natural Language Feedback | ArXiv | |
Feedback, Robot | Learning to Learn Faster from Human Feedback with Language Model Predictive Control | ||
Feedback, Robot | REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction | ArXiv | 2023/06/27 |
Foundation, LLM, Open-source | Code Llama: Open Foundation Models for Code | ||
Foundation, LLM, Open-source | LLaMA: Open and Efficient Foundation Language Models | ArXiv | 2023/02/27 |
Foundation, LLaMA, Vision | VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | ||
Foundation, Robot, Survey | Foundation Models in Robotics: Applications, Challenges, and the Future | ArXiv | 2023/12/13 |
GPT4, Gemini, LLM | Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases | ArXiv | 2023/12/22 |
GPT4, Instruction-Tuning | Instruction Tuning with GPT-4 | ArXiv | |
GPT4, LLM | GPT-4 Technical Report | ArXiv | 2023/03/15 |
Generation, Robot, Zero-shot | Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans | ||
Generation, Robot, Zero-shot | Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models | ArXiv | |
Generation, Survey | Advances in 3D Generation: A Survey | ||
Grounding | GLaMM: Pixel Grounding Large Multimodal Model | ||
Grounding | V-IRL: Grounding Virtual Intelligence in Real Life | ||
Grounding, Reasoning | Visually Grounded Reasoning across Languages and Cultures | ||
Grounding, Reinforcement-Learning | Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | ||
Gym, PPO, Reinforcement-Learning, Survey | Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym | ||
Hallucination, Survey | Combating Misinformation in the Age of LLMs: Opportunities and Challenges | ArXiv | |
Image, LLaMA, Perception | LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | ||
In-Context-Learning | Can large language models explore in-context? | ||
In-Context-Learning | What does CLIP know about a red circle? Visual prompt engineering for VLMs | ||
In-Context-Learning | ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | ArXiv | 2023/08/14 |
In-Context-Learning | ReAct: Synergizing Reasoning and Acting in Language Models | ArXiv | 2023/03/20 |
In-Context-Learning | Generative Agents: Interactive Simulacra of Human Behavior | ArXiv | 2023/04/07 |
In-Context-Learning | Small Models are Valuable Plug-ins for Large Language Models | ArXiv | 2023/05/15 |
In-Context-Learning | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | ArXiv | 2022/06/09 |
In-Context-Learning, Instruction-Tuning | In-Context Instruction Learning | ||
In-Context-Learning, Perception, Vision | Visual In-Context Prompting | ||
In-Context-Learning, Prompt-Tuning | Visual Prompt Tuning | ||
In-Context-Learning, Reinforcement-Learning | AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents | ||
In-Context-Learning, Scaling | Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale | | 2022/03/06 |
In-Context-Learning, Scaling | Structured Prompting: Scaling In-Context Learning to 1,000 Examples | | 2020/03/06 |
In-Context-Learning, Survey | A Survey on In-context Learning | ArXiv | |
In-Context-Learning, VQA | VisualCOMET: Reasoning about the Dynamic Context of a Still Image | ArXiv | 2020/04/22 |
In-Context-Learning, VQA | SINC: Self-Supervised In-Context Learning for Vision-Language Tasks | ArXiv | 2023/07/15 |
In-Context-Learning, Video | Prompting Visual-Language Models for Efficient Video Understanding | ||
In-Context-Learning, Vision | Visual Prompting via Image Inpainting | ||
In-Context-Learning, Vision | What Makes Good Examples for Visual In-Context Learning? | ||
Instruction-Tuning | Tuna: Instruction Tuning using Feedback from Large Language Models | ArXiv | 2023/03/06 |
Instruction-Tuning | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ArXiv | 2023/02/06 |
Instruction-Tuning | Exploring Format Consistency for Instruction Tuning | ||
Instruction-Tuning, LLM | Training language models to follow instructions with human feedback | ArXiv | 2022/03/04 |
Instruction-Tuning, LLM | Self-Instruct: Aligning Language Models with Self-Generated Instructions | ArXiv | 2022/12/20 |
Instruction-Tuning, LLM | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ArXiv | 2023/04/20 |
Instruction-Tuning, LLM, PEFT | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention | ArXiv | 2023/03/28 |
Instruction-Tuning, LLM, PEFT | Visual Instruction Tuning | ArXiv | 2023/04/17 |
Instruction-Tuning, LLM, Survey | Instruction Tuning for Large Language Models: A Survey | ||
Instruction-Tuning, LLM, Zero-shot | Finetuned Language Models Are Zero-Shot Learners | ArXiv | 2021/09/03 |
Instruction-Tuning, Survey | A Survey on Data Selection for LLM Instruction Tuning | ||
Instruction-Tuning, Survey | A Closer Look at the Limitations of Instruction Tuning | ArXiv | |
Instruction-Tuning, Survey | Vision-Language Instruction Tuning: A Review and Analysis | ||
Instruction-Tuning, Survey | Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning | ||
Interactive, OpenGVLab, VLM | InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language | ArXiv | 2023/05/09 |
LLM | Language Models are Few-Shot Learners | ArXiv | 2020/05/28 |
LLM, Memory | MemoryBank: Enhancing Large Language Models with Long-Term Memory | ArXiv | 2023/05/17 |
LLM, Open-source | A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device. | GitHub | |
LLM, Open-source | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ArXiv | 2023/05/11 |
LLM, Open-source | ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst | ArXiv | 2023/05/25 |
LLM, Open-source | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | ArXiv | 2023/08/02 |
LLM, Open-source, Perception, Segmentation | Segment Anything | ArXiv | 2023/04/05 |
LLM, PersonalCitation, Robot | Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | ||
LLM, PersonalCitation, Robot, Zero-shot | Language Models as Zero-Shot Trajectory Generators | ArXiv | |
LLM, Quantization | The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | ArXiv | |
LLM, Reasoning, Survey | Towards Reasoning in Large Language Models: A Survey | ArXiv | 2022/12/20 |
LLM, Robot, Survey | Large Language Models for Robotics: A Survey | ||
LLM, Robot, Task-Decompose | Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | ArXiv | 2022/04/04 |
LLM, Scaling | BitNet: Scaling 1-bit Transformers for Large Language Models | ArXiv | |
LLM, Spatial | Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | ArXiv | 2023/10/05 |
LLM, Survey | A Survey of Large Language Models | ArXiv | 2023/03/31 |
LLM, Temporal Logics | NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models | ArXiv | 2023/05/12 |
LLM, Zero-shot | GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? | ArXiv | 2023/11/27 |
LLaMA, Lightweight, Open-source | MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT | ||
LLaVA, VLM | TinyLLaVA: A Framework of Small-scale Large Multimodal Models | ArXiv | |
Lab | Imperial College London - Zero-shot trajectory | ||
Lab | OpenGVLab | GitHub | |
Lab | Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University - CogVLM | ||
Lab | Rutgers University, AGI Research - OpenAGI | ||
Lab | XLANG NLP Lab - OpenAgents | ||
Lab | OpenBMB - ChatDev, XAgent, AgentVerse | ||
Lab | Reworkd AI - AgentGPT | ||
Lab | DeepWisdom - MetaGPT | ||
Lab | Tencent AI Lab - AppAgent, WebVoyager | ||
LoRA, Scaling | Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements | ArXiv | |
LoRA, Scaling | LoRA: Low-Rank Adaptation of Large Language Models | ||
Low-level-action, Robot | SayTap: Language to Quadrupedal Locomotion | ArXiv | 2023/06/13 |
Low-level-action, Robot | Prompt a Robot to Walk with Large Language Models | ArXiv | 2023/09/18 |
Math, Reasoning | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | ||
Memory, Reinforcement-Learning | Semantic HELM: A Human-Readable Memory for Reinforcement Learning | ||
Memory, Robot | LLM as A Robotic Brain: Unifying Egocentric Memory and Control | ArXiv | 2023/04/19 |
MoE | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | ArXiv | |
Multimodal, Robot | Flamingo: a Visual Language Model for Few-Shot Learning | ArXiv | 2022/04/29 |
Multimodal, Robot | Open-World Object Manipulation using Pre-trained Vision-Language Models | ArXiv | 2023/03/02 |
Multimodal, Robot | MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation | ArXiv | 2023/08/07 |
Natural-Language-as-Policies, Robot | RT-H: Action Hierarchies Using Language | ArXiv | |
Navigation, Reasoning, Vision | NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models | ||
Open-source | Gemma: Introducing new state-of-the-art open models | ArXiv | |
PPO, RLHF, Reinforcement-Learning | Secrets of RLHF in Large Language Models Part I: PPO | ArXiv | 2024/02/01 |
Package | Alpaca-LoRA | GitHub | |
Package | Dify | GitHub | |
Package | h2oGPT | GitHub | |
Package | LangChain | GitHub | |
Package | LlamaIndex | GitHub | |
Perception | Recognize Anything: A Strong Image Tagging Model | ArXiv | |
Perception | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | ||
Perception | Grounded Language-Image Pre-training | ArXiv | 2021/12/07 |
Perception | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | ArXiv | 2023/03/09 |
Perception | PointCLIP: Point Cloud Understanding by CLIP | ArXiv | 2021/12/04 |
Perception | Simple Open-Vocabulary Object Detection with Vision Transformers | ArXiv | 2022/05/12 |
Perception, Reasoning | Lenna: Language Enhanced Reasoning Detection Assistant | ArXiv | |
Perception, Reasoning | DetGPT: Detect What You Need via Reasoning | ArXiv | |
Perception, Reasoning, Robot | Reasoning Grasping via Multimodal Large Language Model | ArXiv | |
Perception, Robot | Language Segment-Anything | ||
Perception, Robot | LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding | ArXiv | 2023/12/21 |
Perception, Task-Decompose | DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment | ArXiv | 2023/07/01 |
Perception, Video, Vision | CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | ArXiv | |
PersonalCitation, Robot | Text2Motion: From Natural Language Instructions to Feasible Plans | ArXiv | |
Prompting | Contrastive Chain-of-Thought Prompting | ||
Prompting, Survey | A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications | ||
Quantization, Scaling | SliceGPT: Compress Large Language Models by Deleting Rows and Columns | ||
RAG | Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | ArXiv | |
RAG | RAFT: Adapting Language Model to Domain Specific RAG | ArXiv | |
RAG | RAG-Fusion: a New Take on Retrieval-Augmented Generation | ArXiv | |
RAG | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | ||
RAG | Training Language Models with Memory Augmentation | ||
RAG, Survey | Retrieval-Augmented Generation for Large Language Models: A Survey | ||
RAG, Survey | Large Language Models for Information Retrieval: A Survey | ||
RAG, Temporal Logics | FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | ArXiv | |
RLHF | Secrets of RLHF in Large Language Models Part II: Reward Modeling | ||
RLHF, Reinforcement-Learning, Survey | A Survey of Reinforcement Learning from Human Feedback | ||
Reasoning | The Impact of Reasoning Step Length on Large Language Models | ||
Reasoning | STaR: Bootstrapping Reasoning With Reasoning | ArXiv | 2022/05/28 |
Reasoning | Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | ||
Reasoning | Rephrase and Respond (RaR) | ||
Reasoning | Chain-of-Thought Reasoning Without Prompting | ArXiv | |
Reasoning | Self-Discover: Large Language Models Self-Compose Reasoning Structures | ||
Reasoning | Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | ArXiv | |
Reasoning | ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | ||
Reasoning, Reinforcement-Learning | ReFT: Reasoning with Reinforced Fine-Tuning | ||
Reasoning, Robot | AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation | ||
Reasoning, Survey | Reasoning with Language Model Prompting: A Survey | ArXiv | |
Reasoning, Symbolic | Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning | ||
Reasoning, Table | Large Language Models are few(1)-shot Table Reasoners | ArXiv | |
Reasoning, VLM, VQA | MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | ArXiv | 2023/03/20 |
Reasoning, Zero-shot | Large Language Models are Zero-Shot Reasoners | ||
Reinforcement-Learning | Large Language Models Are Semi-Parametric Reinforcement Learning Agents | ||
Reinforcement-Learning | RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents | ||
Resource | [Resource] arxiv-sanity | ArXiv | |
Resource | [Resource] AlphaSignal | ArXiv | |
Resource | [Resource] Semantic Scholar | ArXiv | |
Resource | [Resource] Connected Papers | ArXiv | |
Resource | [Resource] dailyarxiv | ArXiv | |
Resource | [Resource] Hugging Face | ArXiv | |
Resource | [Resource] Papers with Code | ArXiv | |
RoPE | RoFormer: Enhanced Transformer with Rotary Position Embedding | ArXiv | |
Robot | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | ArXiv | |
Robot | OCI-Robotics: Object-Centric Instruction Augmentation for Robotic Manipulation | ArXiv | |
Robot | PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | ArXiv | |
Robot | Introspective Tips: Large Language Model for In-Context Decision Making | ArXiv | |
Robot | RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation | ||
Robot | Generative Expressive Robot Behaviors using Large Language Models | ||
Robot | OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics | ||
Robot | RoCo: Dialectic Multi-Robot Collaboration with Large Language Models | ArXiv | |
Robot | Interactive Language: Talking to Robots in Real Time | ||
Robot | Reflexion: Language Agents with Verbal Reinforcement Learning | ArXiv | 2023/03/20 |
Robot, Survey | Real-World Robot Applications of Foundation Models: A Review | ||
Robot, Survey | Language-conditioned Learning for Robotic Manipulation: A Survey | ArXiv | 2023/12/17 |
Robot, Survey | Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis | ArXiv | 2023/12/14 |
Robot, Survey | Robot Learning in the Era of Foundation Models: A Survey | ArXiv | 2023/11/24 |
Robot, Task-Decompose | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | ArXiv | 2023/07/12 |
Robot, Task-Decompose, Zero-shot | Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | ArXiv | 2022/01/18 |
Robot, Zero-shot | Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning | ||
Robot, Zero-shot | Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting | ||
Robot, Zero-shot | Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots | ||
Robot, Zero-shot | BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning | ArXiv | |
Sora, Text-to-Video | Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | ArXiv | |
Sora, Text-to-Video | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | ||
Survey | Efficient Large Language Models: A Survey | ArXiv, GitHub | |
Survey, TimeSeries | Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook | ||
Survey, Training | Understanding LLMs: A Comprehensive Overview from Training to Inference | ||
Survey, VLM | MM-LLMs: Recent Advances in MultiModal Large Language Models | ||
Survey, Video | Video Understanding with Large Language Models: A Survey | ||
Temporal | Explorative Inbetweening of Time and Space | ArXiv | |
Text-to-Image | Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation | ArXiv | |
Text-to-Image, World-model | World Model on Million-Length Video And Language With RingAttention | ||
VLM | ScreenAI: A Vision-Language Model for UI and Infographics Understanding | ||
VLM | PaLM: Scaling Language Modeling with Pathways | ArXiv | 2022/04/05 |
VLM, VQA | DeepSeek-VL: Towards Real-World Vision-Language Understanding | ||
VLM, VQA | CogVLM: Visual Expert for Pretrained Language Models | ArXiv | 2023/11/06 |
VLM, VQA | Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | ArXiv | 2023/04/19 |
VLM, World-model | Large World Model | ArXiv | |
ViFM, Video | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | ArXiv, GitHub | |
World-model | Learning to Model the World with Language | ArXiv | |
World-model | Diffusion World Model | ArXiv | |
World-model | Language Models Meet World Models | ArXiv | |
World-model | Learning and Leveraging World Models in Visual Representation Learning | ||
World-model | Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | ||
Zero-shot | Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? |