diff --git a/docs/docs/cloud/concepts/api.md b/docs/docs/cloud/concepts/api.md index 249ac9649..3e5cd0742 100644 --- a/docs/docs/cloud/concepts/api.md +++ b/docs/docs/cloud/concepts/api.md @@ -1,6 +1,6 @@ # API Concepts -This page describes the high-level concepts of the LangGraph Cloud API. The conceptual guide of LangGraph (Python library) is [here](../../concepts/index.md). +This page describes the high-level concepts of the LangGraph Cloud API. The conceptual guide of LangGraph (Python library) is [here](../../concepts/high_level.md). ## Data Models @@ -22,7 +22,7 @@ A thread contains the accumulated state of a group of runs. If a run is executed The state of a thread at a particular point in time is called a checkpoint. -For more on threads and checkpoints, see this section of the [LangGraph conceptual guide](../../concepts/low_level.md#checkpointer). +For more on threads and checkpoints, see this section of the [LangGraph conceptual guide](../../concepts/low_level.md#persistence). The LangGraph Cloud API provides several endpoints for creating and managing threads and thread state. See the [API reference](../reference/api/api_ref.html#tag/threadscreate) for more details. diff --git a/docs/docs/cloud/quick_start.md b/docs/docs/cloud/quick_start.md index 755ce0df4..08e42ac9c 100644 --- a/docs/docs/cloud/quick_start.md +++ b/docs/docs/cloud/quick_start.md @@ -30,7 +30,7 @@ This tutorial will use: |-- langgraph.json # configuration file for LangGraph |-- .env # environment files with API keys -2. The `agent.py`/`agent.ts` file should contain code for defining your graph. The following code is a simple example, the important thing is that at some point in your file you compile your graph and assign the compiled graph to a variable (in this case the `graph` variable). This example code uses `create_react_agent`, a prebuilt agent. You can read more about it [here](../concepts/agentic_concepts.md#react-agent). +2. The `agent.py`/`agent.ts` file should contain code for defining your graph. The following code is a simple example, the important thing is that at some point in your file you compile your graph and assign the compiled graph to a variable (in this case the `graph` variable). This example code uses `create_react_agent`, a prebuilt agent. You can read more about it [here](../concepts/agentic_concepts.md#react-implementation). === "Python" diff --git a/docs/docs/concepts/agentic_concepts.md b/docs/docs/concepts/agentic_concepts.md index bd3c3c1f6..e03f347eb 100644 --- a/docs/docs/concepts/agentic_concepts.md +++ b/docs/docs/concepts/agentic_concepts.md @@ -1,118 +1,126 @@ -# Common Agentic Patterns +# Agent architectures -## Structured Output +Many LLM applications implement a particular control flow of steps before and / or after LLM calls. As an example, [RAG](https://github.com/langchain-ai/rag-from-scratch) performs retrieval of relevant documents to a question, and passes those documents to an LLM in order to ground the model's response. -It's pretty common to want LLMs inside nodes to return structured output when building agents. This is because that structured output can often be used to route to the next step (e.g. choose between two different edges) or update specific keys of the state. +Instead of hard-coding a fixed control flow, we sometimes want LLM systems that can pick its own control flow to solve more complex problems! 
This is one definition of an [agent](https://blog.langchain.dev/what-is-an-agent/): *an agent is a system that uses an LLM to decide the control flow of an application.* There are many ways that an LLM can control an application: -Since LangGraph nodes can be arbitrary Python functions, you can do this however you want. If you want to use LangChain, [this how-to guide](https://python.langchain.com/v0.2/docs/how_to/structured_output/) is a starting point. +- An LLM can route between two potential paths +- An LLM can decide which of many tools to call +- An LLM can decide whether the generated answer is sufficient or more work is needed -## Tool calling +As a result, there are many different types of [agent architectures](https://blog.langchain.dev/what-is-a-cognitive-architecture/), which give an LLM varying levels of control. -It's extremely common to want agents to do tool calling. Tool calling refers to choosing from several available tools, and specifying which ones to call and what the inputs should be. This is extremely common in agents, as you often want to let the LLM decide which tools to call and then call those tools. +![Agent Types](img/agent_types.png) -Since LangGraph nodes can be arbitrary Python functions, you can do this however you want. If you want to use LangChain, [this how-to guide](https://python.langchain.com/v0.2/docs/how_to/tool_calling/) is a starting point. +## Router -## Memory +A router allows an LLM to select a single step from a specified set of options. This is an agent architecture that exhibits a relatively limited level of control because the LLM usually governs a single decision and can return a narrow set of outputs. Routers typically employ a few different concepts to achieve this. -Memory is a key concept to agentic applications. Memory is important because end users often expect the application they are interacting with remember previous interactions. The most simple example of this is chatbots - they clearly need to remember previous messages in a conversation. +### Structured Output -LangGraph is perfectly suited to give you full control over the memory of your application. With user defined [`State`](./low_level.md#state) you can specify the exact schema of the memory you want to retain. With [checkpointers](./low_level.md#checkpointer) you can store checkpoints of previous interactions and resume from there in follow up interactions. +Structured outputs with LLMs work by providing a specific format or schema that the LLM should follow in its response. This is similar to tool calling, but more general. While tool calling typically involves selecting and using predefined functions, structured outputs can be used for any type of formatted response. Common methods to achieve structured outputs include: -See [this guide](../how-tos/persistence.ipynb) for how to add memory to your graph. +1. Prompt engineering: Instructing the LLM to respond in a specific format. +2. Output parsers: Using post-processing to extract structured data from LLM responses. +3. Tool calling: Leveraging built-in tool calling capabilities of some LLMs to generate structured outputs. -## Human-in-the-loop +Structured outputs are crucial for routing as they ensure the LLM's decision can be reliably interpreted and acted upon by the system. Learn more about [structured outputs in this how-to guide](https://python.langchain.com/v0.2/docs/how_to/structured_output/). -Agentic systems often require some human-in-the-loop (or "on-the-loop") interaction patterns. 
This is because agentic systems are still not super reliable, so having a human involved is required for any sensitive tasks/actions. These are all easily enabled in LangGraph, largely due to [checkpointers](./low_level.md#checkpointer). The reason a checkpointer is necessary is that a lot of these interaction patterns involve running a graph up until a certain point, waiting for some sort of human feedback, and then continuing. When you want to "continue" you will need to access the state of the graph previous to getting interrupted, and checkpointers are a built in, highly convenient way to do that. +## Tool calling agent -There are a few common human-in-the-loop interaction patterns we see emerging. +While a router allows an LLM to make a single decision, more complex agent architectures expand the LLM's control in two key ways: -### Approval +1. Multi-step decision making: The LLM can control a sequence of decisions rather than just one. +2. Tool access: The LLM can choose from and use a variety of tools to accomplish tasks. -A basic one is to have the agent wait for approval before executing certain tools. This may be all tools, or just a subset of tools. This is generally recommend for more sensitive actions (like writing to a database). This can easily be done in LangGraph by setting a [breakpoint](./low_level.md#breakpoints) before specific nodes. +[ReAct](https://arxiv.org/abs/2210.03629) is a popular general purpose agent architecture that combines these expansions, integrating three core concepts: -See [this guide](../how-tos/human_in_the_loop/breakpoints.ipynb) for how do this in LangGraph. +1. `Tool calling`: Allowing the LLM to select and use various tools as needed. +2. `Memory`: Enabling the agent to retain and use information from previous steps. +3. `Planning`: Empowering the LLM to create and follow multi-step plans to achieve goals. -### Wait for input +This architecture allows for more complex and flexible agent behaviors, going beyond simple routing to enable dynamic problem-solving across multiple steps. You can use it with [`create_react_agent`](../reference/prebuilt.md#create_react_agent). -A similar one is to have the agent wait for human input. This can be done by: +### Tool calling -1. Create a node specifically for human input -2. Add a breakpoint before the node -3. Get user input -4. Update the state with that user input, acting as that node -5. Resume execution +Tools are useful whenever you want an agent to interact with external systems. External systems (e.g., APIs) often require a particular input schema or payload, rather than natural language. When we bind an API, for example, as a tool, we give the model awareness of the required input schema. The model will choose to call a tool based upon the natural language input from the user and it will return an output that adheres to the tool's schema. -See [this guide](../how-tos/human_in_the_loop/wait-user-input.ipynb) for how do this in LangGraph. +[Many LLM providers support tool calling](https://python.langchain.com/v0.1/docs/integrations/chat/) and the [tool calling interface](https://blog.langchain.dev/improving-core-tool-interfaces-and-docs-in-langchain/) in LangChain is simple: you can simply pass any Python `function` into `ChatModel.bind_tools(function)`. -### Edit agent actions +![Tools](img/tool_call.png) -This is a more advanced interaction pattern. In this interaction pattern the human can actually edit some of the agent's previous decisions. 
This can be done either during the flow (after a [breakpoint](./low_level.md#breakpoints), part of the [approval](#approval) flow) or after the fact (as part of [time-travel](#time-travel)) +### Memory -See [this guide](../how-tos/human_in_the_loop/edit-graph-state.ipynb) for how do this in LangGraph. +Memory is crucial for agents, enabling them to retain and utilize information across multiple steps of problem-solving. It operates on different scales: -### Time travel +1. Short-term memory: Allows the agent to access information acquired during earlier steps in a sequence. +2. Long-term memory: Enables the agent to recall information from previous interactions, such as past messages in a conversation. -This is a pretty advanced interaction pattern. In this interaction pattern, the human can look back at the list of previous checkpoints, find one they like, optionally [edit it](#edit-agent-actions), and then resume execution from there. +LangGraph provides full control over memory implementation: -See [this guide](../how-tos/human_in_the_loop/time-travel.ipynb) for how to do this in LangGraph. +- [`State`](./low_level.md#state): User-defined schema specifying the exact structure of memory to retain. +- [`Checkpointers`](./persistence.md): Mechanism to store state at every step across different interactions. -## Review Tool Calls +This flexible approach allows you to tailor the memory system to your specific agent architecture needs. For a practical guide on adding memory to your graph, see [this tutorial](../how-tos/persistence.ipynb). -This is a specific type of human-in-the-loop interaction but it's worth calling out because it is so common. A lot of agent decisions are made via tool calling, so having a clear UX for reviewing tool calls is handy. +Effective memory management enhances an agent's ability to maintain context, learn from past experiences, and make more informed decisions over time. -A tool call consists of: -- The name of the tool to call -- Arguments to pass to the tool +### Planning -Note that these tool calls can obviously be used for actually calling functions, but they can also be used for other purposes, like to route the agent in a specific direction. -You will want to review the tool call for both of these use cases. +In the ReAct architecture, an LLM is called repeatedly in a while-loop. At each step the agent decides which tools to call, and what the inputs to those tools should be. Those tools are then executed, and the outputs are fed back into the LLM as observations. The while-loop terminates when the agent decides it is not worth calling any more tools. -When reviewing tool calls, there are few actions you may want to take. +### ReAct implementation -1. Approve the tool call (and let the agent continue on its way) -2. Manually change the tool call, either the tool name or the tool arguments (and let the agent continue on its way after that) -3. Leave feedback on the tool call. This differs from (2) in that you are not changing the tool call directly, but rather leaving natural language feedback suggesting the LLM call it differently (or call a different tool). You could do this by either adding a `ToolMessage` and having the feedback be the result of the tool call, or by adding a `ToolMessage` (that simulates an error) and then a `HumanMessage` (with the feedback). 
+There are several differences between this paper and the pre-built [`create_react_agent`](../reference/prebuilt.md#create_react_agent) implementation: -See [this guide](../how-tos/human_in_the_loop/review-tool-calls.ipynb) for how to do this in LangGraph. +- First, we use [tool-calling](#tool-calling) to have LLMs call tools, whereas the paper used prompting + parsing of raw output. This is because tool calling did not exist when the paper was written, but is generally better and more reliable. +- Second, we use messages to prompt the LLM, whereas the paper used string formatting. This is because at the time of writing, LLMs didn't even expose a message-based interface, whereas now that's the only interface they expose. +- Third, the paper required all inputs to the tools to be a single string. This was largely due to LLMs not being super capable at the time, and only really being able to generate a single input. Our implementation allows for using tools that require multiple inputs. +- Fourth, the paper only looks at calling a single tool at the time, largely due to limitations in LLMs performance at the time. Our implementation allows for calling multiple tools at a time. +- Finally, the paper asked the LLM to explicitly generate a "Thought" step before deciding which tools to call. This is the "Reasoning" part of "ReAct". Our implementation does not do this by default, largely because LLMs have gotten much better and that is not as necessary. Of course, if you wish to prompt it do so, you certainly can. -## Map-Reduce +## Custom agent architectures -A common pattern in agents is to generate a list of objects, do some work on each of those objects, and then combine the results. This is very similar to the common [map-reduce](https://en.wikipedia.org/wiki/MapReduce) operation. This can be tricky for a few reasons. First, it can be tough to define a structured graph ahead of time because the length of the list of objects may be unknown. Second, in order to do this map-reduce you need multiple versions of the state to exist... but the graph shares a common shared state, so how can this be? +While routers and tool-calling agents (like ReAct) are common, [customizing agent architectures](https://blog.langchain.dev/why-you-should-outsource-your-agentic-infrastructure-but-own-your-cognitive-architecture/) often leads to better performance for specific tasks. LangGraph offers several powerful features for building tailored agent systems: -LangGraph supports this via the [Send](./low_level.md#send) api. This can be used to allow a conditional edge to Send multiple different states to multiple nodes. The state it sends can be different from the state of the core graph. +### Human-in-the-loop -See a how-to guide for this [here](../how-tos/map-reduce.ipynb) +Human involvement can significantly enhance agent reliability, especially for sensitive tasks. This can involve: -## Multi-agent +- Approving specific actions +- Providing feedback to update the agent's state +- Offering guidance in complex decision-making processes -A term you may have heard is "multi-agent" architectures. What exactly does this mean? +Human-in-the-loop patterns are crucial when full automation isn't feasible or desirable. Learn more in our [human-in-the-loop guide](./human_in_the_loop.md). -Given that it is hard to even define an "agent", it's almost impossible to exactly define a "multi-agent" architecture. 
When most people talk about a multi-agent architecture, they typically mean a system where there are multiple different LLM-based systems. These LLM-based systems can be as simple as a prompt and an LLM call, or as complex as a [ReAct agent](#react-agent). +### Parallelization -The big question in multi-agent systems is how they communicate. This involves both the schema of how they communicate, as well as the sequence in which they communicate. LangGraph is perfect for orchestrating these types of systems. It allows you to define multiple agents (each one is a node) an arbitrary state (to encapsulate the schema of how they communicate) as well as the edges (to control the sequence in which they communicate). +Parallel processing is vital for efficient multi-agent systems and complex tasks. LangGraph supports parallelization through its [Send](./low_level.md#send) API, enabling: -## Planning +- Concurrent processing of multiple states +- Implementation of map-reduce-like operations +- Efficient handling of independent subtasks -One of the big things that agentic systems struggle with is long term planning. A common technique to overcome this is to have an explicit planning this. This generally involves calling an LLM to come up with a series of steps to execute. From there, the system then tries to execute the series of tasks (this could use a sub-agent to do so). Optionally, you can revisit the plan after each step and update it if needed. +For practical implementation, see our [map-reduce tutorial](../how-tos/map-reduce.ipynb). -## Reflection +### Sub-graphs -Agents often struggle to produce reliable results. Therefore, it can be helpful to check whether the agent has completed a task correctly or not. If it has - then you can finish. If it hasn't - then you can take the feedback on why it's not correct and pass it back into another iteration of the agent. +Sub-graphs are essential for managing complex agent architectures, particularly in multi-agent systems. They allow: -This "reflection" step often uses an LLM, but doesn't have to. A good example of where using an LLM may not be necessary is in coding, when you can try to compile the generated code and use any errors as the feedback. +- Isolated state management for individual agents +- Hierarchical organization of agent teams +- Controlled communication between agents and the main system -## ReAct Agent +Sub-graphs communicate with the parent graph through overlapping keys in the state schema. This enables flexible, modular agent design. For implementation details, refer to our [sub-graph tutorial](../how-tos/subgraph.ipynb). -One of the most common agent architectures is what is commonly called the ReAct agent architecture. In this architecture, an LLM is called repeatedly in a while-loop. At each step the agent decides which tools to call, and what the inputs to those tools should be. Those tools are then executed, and the outputs are fed back into the LLM as observations. The while-loop terminates when the agent decides it is not worth calling any more tools. +### Reflection -One of the few high level, pre-built agents we have in LangGraph - you can use it with [`create_react_agent`](../reference/prebuilt.md#create_react_agent) +Reflection mechanisms can significantly improve agent reliability by: -This is named after and based on the [ReAct](https://arxiv.org/abs/2210.03629) paper. However, there are several differences between this paper and our implementation: +1. Evaluating task completion and correctness +2. 
Providing feedback for iterative improvement +3. Enabling self-correction and learning -- First, we use [tool-calling](#tool-calling) to have LLMs call tools, whereas the paper used prompting + parsing of raw output. This is because tool calling did not exist when the paper was written, but is generally better and more reliable. -- Second, we use messages to prompt the LLM, whereas the paper used string formatting. This is because at the time of writing, LLMs didn't even expose a message-based interface, whereas now that's the only interface they expose. -- Third, the paper required all inputs to the tools to be a single string. This was largely due to LLMs not being super capable at the time, and only really being able to generate a single input. Our implementation allows for using tools that require multiple inputs. -- Forth, the paper only looks at calling a single tool at the time, largely due to limitations in LLMs performance at the time. Our implementation allows for calling multiple tools at a time. -- Finally, the paper asked the LLM to explicitly generate a "Thought" step before deciding which tools to call. This is the "Reasoning" part of "ReAct". Our implementation does not do this by default, largely because LLMs have gotten much better and that is not as necessary. Of course, if you wish to prompt it do so, you certainly can. +While often LLM-based, reflection can also use deterministic methods. For instance, in coding tasks, compilation errors can serve as feedback. This approach is demonstrated in [this video using LangGraph for self-corrective code generation](https://www.youtube.com/watch?v=MvNdgmM7uyc). -See [this guide](../how-tos/human_in_the_loop/time-travel.ipynb) for a full walkthrough of how to use the prebuilt ReAct agent. +By leveraging these features, LangGraph enables the creation of sophisticated, task-specific agent architectures that can handle complex workflows, collaborate effectively, and continuously improve their performance. diff --git a/docs/docs/concepts/high_level.md b/docs/docs/concepts/high_level.md index 69d76f78f..7146fed9c 100644 --- a/docs/docs/concepts/high_level.md +++ b/docs/docs/concepts/high_level.md @@ -1,55 +1,58 @@ -# LangGraph for Agentic Applications +# Why LangGraph? -## What does it mean to be agentic? +LLMs are extremely powerful, particularly when connected to other systems such as a retriever or APIs. This is why many LLM applications use a control flow of steps before and / or after LLM calls. As an example [RAG](https://github.com/langchain-ai/rag-from-scratch) performs retrieval of relevant documents to a question, and passes those documents to an LLM in order to ground the response. Often a control flow of steps before and / or after an LLM is called a "chain." Chains are a popular paradigm for programming with LLMs and offer a high degree of reliability; the same set of steps runs with each chain invocation. -Other people may talk about a system being an "agent" - we prefer to talk about systems being "agentic". But what does this actually mean? - -When we talk about systems being "agentic", we are talking about systems that use an LLM to decide the control flow of an application. There are different levels that an LLM can be used to decide the control flow, and this spectrum of "agentic" makes more sense to us than defining an arbitrary cutoff for what is or isn't an agent. - -Examples of using an LLM to decide the control of an application: +However, we often want LLM systems that can pick their own control flow! 
This is one definition of an [agent](https://blog.langchain.dev/what-is-an-agent/): an agent is a system that uses an LLM to decide the control flow of an application. Unlike a chain, an agent gives an LLM some degree of control over the sequence of steps in the application. Examples of using an LLM to decide the control of an application: - Using an LLM to route between two potential paths - Using an LLM to decide which of many tools to call - Using an LLM to decide whether the generated answer is sufficient or more work is needed -The more times these types of decisions are made inside an application, the more agentic it is. -If these decisions are being made in a loop, then its even more agentic! +There are many different types of [agent architectures](https://blog.langchain.dev/what-is-a-cognitive-architecture/) to consider, which give an LLM varying levels of control. On one extreme, a router allows an LLM to select a single step from a specified set of options and, on the other extreme, a fully autonomous long-running agent may have complete freedom to select any sequence of steps that it wants for a given problem. + +![Agent Types](img/agent_types.png) -There are other concepts often associated with being agentic, but we would argue these are a by-product of the above definition: +Several concepts are utilized in many agent architectures: - [Tool calling](agentic_concepts.md#tool-calling): this is often how LLMs make decisions - Action taking: often times, the LLMs' outputs are used as the input to an action - [Memory](agentic_concepts.md#memory): reliable systems need to have knowledge of things that occurred - [Planning](agentic_concepts.md#planning): planning steps (either explicit or implicit) are useful for ensuring that the LLM, when making decisions, makes them in the highest fidelity way. -## Why LangGraph? +## Challenges + +In practice, there is often a trade-off between control and reliability. As we give LLMs more control, the application often becomes less reliable. This can be due to factors such as LLM non-determinism and / or errors in selecting tools (or steps) that the agent uses (takes). + +![Agent Challenge](img/challenge.png) -LangGraph has several core principles that we believe make it the most suitable framework for building agentic applications: +## Core Principles -- [Controllability](../how-tos/index.md#controllability) -- [Human-in-the-Loop](../how-tos/index.md#human-in-the-loop) -- [Streaming First](../how-tos/index.md#streaming) +The motivation of LangGraph is to help bend the curve, preserving higher reliability as we give the agent more control over the application. We'll outline a few specific pillars of LangGraph that make it well suited for building reliable agents. + +![Langgraph](img/langgraph.png) **Controllability** -LangGraph is extremely low level. This gives you a high degree of control over what the system you are building actually does. We believe this is important because it is still hard to get agentic systems to work reliably, and we've seen that the more control you exercise over them, the more likely it is that they will "work". +LangGraph gives the developer a high degree of [control](../how-tos/index.md#controllability) by expressing the flow of the application as a set of nodes and edges. All nodes can access and modify a common state (memory). The control flow of the application can be set using edges that connect nodes, either deterministically or via conditional logic. 
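+
+As a rough sketch of what this looks like in practice (the node names and routing condition below are illustrative, not part of this PR), a graph with a shared state, a deterministic edge, and a conditional edge might be defined as follows:
+
+```python
+from typing import TypedDict
+
+from langgraph.graph import StateGraph, START, END
+
+
+# Shared state that every node can read and update
+class State(TypedDict):
+    question: str
+    answer: str
+
+
+def generate(state: State) -> dict:
+    # Stand-in for an LLM call that drafts an answer
+    return {"answer": f"Draft answer to: {state['question']}"}
+
+
+def improve(state: State) -> dict:
+    # Stand-in for a follow-up step that refines the draft
+    return {"answer": state["answer"] + " (revised)"}
+
+
+def route(state: State):
+    # Conditional edge: choose the next node based on the shared state
+    return "improve" if len(state["answer"]) < 100 else END
+
+
+builder = StateGraph(State)
+builder.add_node("generate", generate)
+builder.add_node("improve", improve)
+builder.add_edge(START, "generate")               # deterministic edge
+builder.add_conditional_edges("generate", route)  # conditional edge
+builder.add_edge("improve", END)
+graph = builder.compile()
+
+print(graph.invoke({"question": "What is LangGraph?"}))
+```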
+ +**Persistence** + +LangGraph gives the developer many options for [persisting](../how-tos/index.md#persistence) graph state using short-term or long-term (e.g., via a database) memory. **Human-in-the-Loop** -LangGraph comes with a built-in persistence layer as a first-class concept. This enables several different human-in-the-loop interaction patterns. We believe that "Human-Agent Interaction" patterns will be the new "Human-Computer Interaction", and have built LangGraph with built in persistence to enable this. +The persistence layer enables several different [human-in-the-loop](../how-tos/index.md#human-in-the-loop) interaction patterns with agents; for example, it's possible to pause an agent, review its state, edit it state, and approve a follow-up step. -**Streaming First** +**Streaming** -LangGraph comes with first class support for streaming. Agentic applications often take a while to run, and so giving the user some idea of what is happening is important, and streaming is a great way to do that. LangGraph supports streaming of both events ([like a tool call being taken](../how-tos/stream-updates.ipynb)) as well as of [tokens that an LLM may emit](../how-tos/streaming-tokens.ipynb). +LangGraph comes with first class support for [streaming](../how-tos/index.md#streaming), which can expose state to the user (or developer) over the course of agent execution. LangGraph supports streaming of both events ([like a tool call being taken](../how-tos/stream-updates.ipynb)) as well as of [tokens that an LLM may emit](../how-tos/streaming-tokens.ipynb). -## Deployment +## Debugging -So you've built your LangGraph object - now what? +Once you've built a graph, you often want to test and debug it. [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio?tab=readme-ov-file) is a specialized IDE for visualization and debugging of LangGraph applications. -Now you need to deploy it. -There are many ways to deploy LangGraph objects, and the right solution depends on your needs and use case. -We'll highlight two ways here: using [LangGraph Cloud](../cloud/index.md) or rolling your own solution. +![Langgraph Studio](img/lg_studio.png) -[LangGraph Cloud](../cloud/index.md) is an opinionated way to deploy LangGraph objects from the LangChain team. Please see the [LangGraph Cloud documentation](../cloud/index.md) for all the details about what it involves, to see if it is a good fit for you. +## Deployment -If it is not a good fit, you may want to roll your own deployment. In this case, we would recommend using [FastAPI](https://fastapi.tiangolo.com/) to stand up a server. You can then call this graph from inside the FastAPI server as you see fit. \ No newline at end of file +Once you have confidence in your LangGraph application, many developers want an easy path to deployment. [LangGraph Cloud](../cloud/index.md) is an opinionated, simple way to deploy LangGraph objects from the LangChain team. Of course, you can also use services like [FastAPI](https://fastapi.tiangolo.com/) and call your graph from inside the FastAPI server as you see fit. \ No newline at end of file diff --git a/docs/docs/concepts/human_in_the_loop.md b/docs/docs/concepts/human_in_the_loop.md new file mode 100644 index 000000000..932ab35e9 --- /dev/null +++ b/docs/docs/concepts/human_in_the_loop.md @@ -0,0 +1,63 @@ +# Human-in-the-loop + +Agentic systems often require some human-in-the-loop (or "on-the-loop") interaction patterns. 
This is because agentic systems are still not very reliable, so having a human involved is required for any sensitive tasks/actions. These are all easily enabled in LangGraph, largely due to built-in [persistence](./persistence.md), implemented via checkpointers. + +The reason a checkpointer is necessary is that a lot of these interaction patterns involve running a graph up until a certain point, waiting for some sort of human feedback, and then continuing. When you want to "continue" you will need to access the state of the graph prior to the interrupt. LangGraph persistence enables this by checkpointing the state at every superstep. + +There are a few common human-in-the-loop interaction patterns we see emerging. + +## Approval + +![](./img/human_in_the_loop/approval.png) + +A basic pattern is to have the agent wait for approval before executing certain tools. This may be all tools, or just a subset of tools. This is generally recommended for more sensitive actions (like writing to a database). This can easily be done in LangGraph by setting a [breakpoint](./low_level.md#breakpoints) before specific nodes. + +See [this guide](../how-tos/human_in_the_loop/breakpoints.ipynb) for how to do this in LangGraph. + +## Wait for input + +![](./img/human_in_the_loop/wait_for_input.png) + +A similar pattern is to have the agent wait for human input. This can be done by: + +1. Create a node specifically for human input +2. Add a breakpoint before the node +3. Get user input +4. Update the state with that user input, acting as that node +5. Resume execution + +See [this guide](../how-tos/human_in_the_loop/wait-user-input.ipynb) for how to do this in LangGraph. + +## Edit agent actions + +![](./img/human_in_the_loop/edit_graph_state.png) + +This is a more advanced interaction pattern. In this interaction pattern the human can actually edit some of the agent's previous decisions. This can be done either during the flow (after a [breakpoint](./low_level.md#breakpoints), part of the [approval](#approval) flow) or after the fact (as part of [time-travel](#time-travel)). + +See [this guide](../how-tos/human_in_the_loop/edit-graph-state.ipynb) for how to do this in LangGraph. + +## Time travel + +This is a pretty advanced interaction pattern. In this interaction pattern, the human can look back at the list of previous checkpoints, find one they like, optionally [edit it](#edit-agent-actions), and then resume execution from there. + +See [this guide](../how-tos/human_in_the_loop/time-travel.ipynb) for how to do this in LangGraph. + +## Review Tool Calls + +This is a specific type of human-in-the-loop interaction but it's worth calling out because it is so common. A lot of agent decisions are made via tool calling, so having a clear UX for reviewing tool calls is handy. + +A tool call consists of: + +- The name of the tool to call +- Arguments to pass to the tool + +Note that these tool calls can obviously be used for actually calling functions, but they can also be used for other purposes, such as routing the agent in a specific direction. +You will want to review the tool call for both of these use cases. + +When reviewing tool calls, there are a few actions you may want to take. + +1. Approve the tool call (and let the agent continue on its way) +2. Manually change the tool call, either the tool name or the tool arguments (and let the agent continue on its way after that) +3. Leave feedback on the tool call. 
This differs from (2) in that you are not changing the tool call directly, but rather leaving natural language feedback suggesting the LLM call it differently (or call a different tool). You could do this by either adding a `ToolMessage` and having the feedback be the result of the tool call, or by adding a `ToolMessage` (that simulates an error) and then a `HumanMessage` (with the feedback). + +See [this guide](../how-tos/human_in_the_loop/review-tool-calls.ipynb) for how to do this in LangGraph. \ No newline at end of file diff --git a/docs/docs/concepts/img/agent_types.png b/docs/docs/concepts/img/agent_types.png new file mode 100644 index 000000000..3cefe0334 Binary files /dev/null and b/docs/docs/concepts/img/agent_types.png differ diff --git a/docs/docs/concepts/img/challenge.png b/docs/docs/concepts/img/challenge.png new file mode 100644 index 000000000..f131c2d03 Binary files /dev/null and b/docs/docs/concepts/img/challenge.png differ diff --git a/docs/docs/concepts/img/human_in_the_loop/approval.png b/docs/docs/concepts/img/human_in_the_loop/approval.png new file mode 100644 index 000000000..6c94d31fd Binary files /dev/null and b/docs/docs/concepts/img/human_in_the_loop/approval.png differ diff --git a/docs/docs/concepts/img/human_in_the_loop/edit_graph_state.png b/docs/docs/concepts/img/human_in_the_loop/edit_graph_state.png new file mode 100644 index 000000000..1d4cf4d1a Binary files /dev/null and b/docs/docs/concepts/img/human_in_the_loop/edit_graph_state.png differ diff --git a/docs/docs/concepts/img/human_in_the_loop/wait_for_input.png b/docs/docs/concepts/img/human_in_the_loop/wait_for_input.png new file mode 100644 index 000000000..94b211c95 Binary files /dev/null and b/docs/docs/concepts/img/human_in_the_loop/wait_for_input.png differ diff --git a/docs/docs/concepts/img/langgraph.png b/docs/docs/concepts/img/langgraph.png new file mode 100644 index 000000000..a821f8a13 Binary files /dev/null and b/docs/docs/concepts/img/langgraph.png differ diff --git a/docs/docs/concepts/img/lg_studio.png b/docs/docs/concepts/img/lg_studio.png new file mode 100644 index 000000000..5d41830fc Binary files /dev/null and b/docs/docs/concepts/img/lg_studio.png differ diff --git a/docs/docs/concepts/img/multi_agent/collaboration.png b/docs/docs/concepts/img/multi_agent/collaboration.png new file mode 100644 index 000000000..3da47ba9b Binary files /dev/null and b/docs/docs/concepts/img/multi_agent/collaboration.png differ diff --git a/docs/docs/concepts/img/multi_agent/hierarchical.png b/docs/docs/concepts/img/multi_agent/hierarchical.png new file mode 100644 index 000000000..3cee3b663 Binary files /dev/null and b/docs/docs/concepts/img/multi_agent/hierarchical.png differ diff --git a/docs/docs/concepts/img/multi_agent/subgraph.png b/docs/docs/concepts/img/multi_agent/subgraph.png new file mode 100644 index 000000000..29401933c Binary files /dev/null and b/docs/docs/concepts/img/multi_agent/subgraph.png differ diff --git a/docs/docs/concepts/img/multi_agent/supervisor.png b/docs/docs/concepts/img/multi_agent/supervisor.png new file mode 100644 index 000000000..898f753b0 Binary files /dev/null and b/docs/docs/concepts/img/multi_agent/supervisor.png differ diff --git a/docs/docs/concepts/img/persistence/checkpoints.jpg b/docs/docs/concepts/img/persistence/checkpoints.jpg new file mode 100644 index 000000000..61ccf5549 Binary files /dev/null and b/docs/docs/concepts/img/persistence/checkpoints.jpg differ diff --git a/docs/docs/concepts/img/persistence/checkpoints_full_story.jpg 
b/docs/docs/concepts/img/persistence/checkpoints_full_story.jpg new file mode 100644 index 000000000..691d17c18 Binary files /dev/null and b/docs/docs/concepts/img/persistence/checkpoints_full_story.jpg differ diff --git a/docs/docs/concepts/img/persistence/get_state.jpg b/docs/docs/concepts/img/persistence/get_state.jpg new file mode 100644 index 000000000..7355a1429 Binary files /dev/null and b/docs/docs/concepts/img/persistence/get_state.jpg differ diff --git a/docs/docs/concepts/img/persistence/re_play.jpg b/docs/docs/concepts/img/persistence/re_play.jpg new file mode 100644 index 000000000..a4927ac76 Binary files /dev/null and b/docs/docs/concepts/img/persistence/re_play.jpg differ diff --git a/docs/docs/concepts/img/tool_call.png b/docs/docs/concepts/img/tool_call.png new file mode 100644 index 000000000..9301ee607 Binary files /dev/null and b/docs/docs/concepts/img/tool_call.png differ diff --git a/docs/docs/concepts/index.md b/docs/docs/concepts/index.md deleted file mode 100644 index ea6140c8d..000000000 --- a/docs/docs/concepts/index.md +++ /dev/null @@ -1,57 +0,0 @@ -# Conceptual Guides - -In this guide we will explore the concepts behind build agentic and multi-agent systems with LangGraph. We assume you have already learned the basic covered in the [introduction tutorial](../tutorials/introduction.ipynb) and want to deepen your understanding of LangGraph's underlying design and inner workings. - -There are three main parts to this concept guide. First, we'll discuss at a very high level what it means to be agentic. Next, we'll look at lower-level concepts in LangGraph that are core for understanding how to build your own agentic systems. Finally, we'll discuss common agentic patterns and how you can achieve those with LangGraph. These will be mostly conceptual guides - for more technical, hands-on guides see our [how-to guides](../how-tos/index.md) - - -LangGraph for Agentic Applications - -- [What does it mean to be agentic?](high_level.md#what-does-it-mean-to-be-agentic) -- [Why LangGraph](high_level.md#why-langgraph) -- [Deployment](high_level.md#deployment) - -Low Level Concepts - -- [Graphs](low_level.md#graphs) - - [StateGraph](low_level.md#stategraph) - - [MessageGraph](low_level.md#messagegraph) - - [Compiling Your Graph](low_level.md#compiling-your-graph) -- [State](low_level.md#state) - - [Schema](low_level.md#schema) - - [Reducers](low_level.md#reducers) - - [MessageState](low_level.md#working-with-messages-in-graph-state) -- [Nodes](low_level.md#nodes) - - [`START` node](low_level.md#start-node) - - [`END` node](low_level.md#end-node) -- [Edges](low_level.md#edges) - - [Normal Edges](low_level.md#normal-edges) - - [Conditional Edges](low_level.md#conditional-edges) - - [Entry Point](low_level.md#entry-point) - - [Conditional Entry Point](low_level.md#conditional-entry-point) -- [Send](low_level.md#send) -- [Checkpointer](low_level.md#checkpointer) -- [Threads](low_level.md#threads) -- [Checkpointer states](low_level.md#checkpointer-state) - - [Get state](low_level.md#get-state) - - [Get state history](low_level.md#get-state-history) - - [Update state](low_level.md#update-state) -- [Configuration](low_level.md#configuration) -- [Visualization](low_level.md#visualization) -- [Streaming](low_level.md#streaming) - -Common Agentic Patterns - -- [Structured output](agentic_concepts.md#structured-output) -- [Tool calling](agentic_concepts.md#tool-calling) -- [Memory](agentic_concepts.md#memory) -- [Human in the loop](agentic_concepts.md#human-in-the-loop) - - 
[Approval](agentic_concepts.md#approval) - - [Wait for input](agentic_concepts.md#wait-for-input) - - [Edit agent actions](agentic_concepts.md#edit-agent-actions) - - [Time travel](agentic_concepts.md#time-travel) -- [Map-Reduce](agentic_concepts.md#map-reduce) -- [Multi-agent](agentic_concepts.md#multi-agent) -- [Planning](agentic_concepts.md#planning) -- [Reflection](agentic_concepts.md#reflection) -- [Off-the-shelf ReAct Agent](agentic_concepts.md#react-agent) diff --git a/docs/docs/concepts/low_level.md b/docs/docs/concepts/low_level.md index 7dcb58c8b..47043bcdc 100644 --- a/docs/docs/concepts/low_level.md +++ b/docs/docs/concepts/low_level.md @@ -1,4 +1,4 @@ -# Low Level Conceptual Guide +# LangGraph Glossary ## Graphs @@ -30,7 +30,7 @@ The `MessageGraph` class is a special type of graph. The `State` of a `MessageGr To build your graph, you first define the [state](#state), you then add [nodes](#nodes) and [edges](#edges), and then you compile it. What exactly is compiling your graph and why is it needed? -Compiling is a pretty simple step. It provides a few basic checks on the structure of your graph (no orphaned nodes, etc). It is also where you can specify runtime args like [checkpointers](#checkpointer) and [breakpoints](#breakpoints). You compile your graph by just calling the `.compile` method: +Compiling is a pretty simple step. It provides a few basic checks on the structure of your graph (no orphaned nodes, etc). It is also where you can specify runtime args like [checkpointers](./persistence.md) and [breakpoints](#breakpoints). You compile your graph by just calling the `.compile` method: ```python graph = graph_builder.compile(...) @@ -48,7 +48,30 @@ The main documented way to specify the schema of a graph is by using `TypedDict` By default, the graph will have the same input and output schemas. If you want to change this, you can also specify explicit input and output schemas directly. This is useful when you have a lot of keys, and some are explicitly for input and others for output. See the [notebook here](../how-tos/input_output_schema.ipynb) for how to use. -By default, all nodes in the graph will share the same state. This means that they will read and write to the same state channels. It is possible to have nodes write to private state channels inside the graph for internal node communication - see [this notebook](../how-tos/pass_private_state.ipynb) for how to do that. +#### Multiple schemas + +Typically, all graph nodes communicate with a single schema. This means that they will read and write to the same state channels. But, there are cases where we may want a bit more control over this: + +* Internal nodes may pass information that is not required in the graph's input / output. +* We may also want to use different input / output schemas for the graph. The output might, for example, only contain a single relevant output key. + +It is possible to have nodes write to private state channels inside the graph for internal node communication. We can simply define a private schema and use a type hint -- e.g., `state: PrivateState` as shown below -- to specify it as the node input schema. See [this notebook](../how-tos/pass_private_state.ipynb) for more detail. + +```python +class OverallState(TypedDict): + foo: int + +class PrivateState(TypedDict): + baz: int + +def node_1(state: OverallState) -> PrivateState: + ... + +def node_2(state: PrivateState) -> OverallState: + ... +``` + +It is also possible to define explicit input and output schemas for a graph. 
In these cases, we define an "internal" schema that contains *all* keys relevant to graph operations. But, we also define `input` and `output` schemas that are sub-sets of the "internal" schema to constrain the input and output of the graph. See [this notebook](../how-tos/input_output_schema.ipynb) for more detail. ### Reducers @@ -266,100 +289,9 @@ def continue_to_jokes(state: OverallState): graph.add_conditional_edges("node_a", continue_to_jokes) ``` -## Checkpointer - -LangGraph has a built-in persistence layer, implemented through [checkpointers][basecheckpointsaver]. When you use a checkpointer with a graph, you can interact with the state of that graph. When you use a checkpointer with a graph, you can interact with and manage the graph's state. The checkpointer saves a _checkpoint_ of the graph state at every super-step, enabling several powerful capabilities: - -First, checkpointers facilitate [human-in-the-loop workflows](agentic_concepts.md#human-in-the-loop) workflows by allowing humans to inspect, interrupt, and approve steps.Checkpointers are needed for these workflows as the human has to be able to view the state of a graph at any point in time, and the graph has to be to resume execution after the human has made any updates to the state. - -Second, it allows for ["memory"](agentic_concepts.md#memory) between interactions. You can use checkpointers to create threads and save the state of a thread after a graph executes. In the case of repeated human interactions (like conversations) any follow up messages can be sent to that checkpoint, which will retain its memory of previous ones. - -See [this guide](../how-tos/persistence.ipynb) for how to add a checkpointer to your graph. - -## Threads - -Threads enable the checkpointing of multiple different runs, making them essential for multi-tenant chat applications and other scenarios where maintaining separate states is necessary. A thread is a unique ID assigned to a series of checkpoints saved by a checkpointer. When using a checkpointer, you must specify a `thread_id` or `thread_ts` when running the graph. - -`thread_id` is simply the ID of a thread. This is always required - -`thread_ts` can optionally be passed. This identifier refers to a specific checkpoint within a thread. This can be used to kick of a run of a graph from some point halfway through a thread. - -You must pass these when invoking the graph as part of the configurable part of the config. - -```python -config = {"configurable": {"thread_id": "a"}} -graph.invoke(inputs, config=config) -``` - -See [this guide](../how-tos/persistence.ipynb) for how to use threads. - -## Checkpointer state - - When interacting with the checkpointer state, you must specify a [thread identifier](#threads).Each checkpoint saved by the checkpointer has two properties: - -- **values**: This is the value of the state at this point in time. -- **next**: This is a tuple of the nodes to execute next in the graph. - -### Get state - -You can get the state of a checkpointer by calling `graph.get_state(config)`. The config should contain `thread_id`, and the state will be fetched for that thread. - -### Get state history - -You can also call `graph.get_state_history(config)` to get a list of the history of the graph. The config should contain `thread_id`, and the state history will be fetched for that thread. - -### Update state - -You can also interact with the state directly and update it. 
This takes three different components: - -- config -- values -- `as_node` - -**config** - -The config should contain `thread_id` specifying which thread to update. - -**values** - -These are the values that will be used to update the state. Note that this update is treated exactly as any update from a node is treated. This means that these values will be passed to the [reducer](#reducers) functions that are part of the state. So this does NOT automatically overwrite the state. Let's walk through an example. - -Let's assume you have defined the state of your graph as: - -```python -from typing import TypedDict, Annotated -from operator import add - -class State(TypedDict): - foo: int - bar: Annotated[list[str], add] -``` - -Let's now assume the current state of the graph is - -``` -{"foo": 1, "bar": ["a"]} -``` - -If you update the state as below: - -``` -graph.update_state(config, {"foo": 2, "bar": ["b"]}) -``` - -Then the new state of the graph will be: - -``` -{"foo": 2, "bar": ["a", "b"]} -``` - -The `foo` key is completely changed (because there is no reducer specified for that key, so it overwrites it). However, there is a reducer specified for the `bar` key, and so it appends `"b"` to the state of `bar`. - -**`as_node`** +## Persistence -The final thing you specify when calling `update_state` is `as_node`. This update will be applied as if it came from node `as_node`. If `as_node` is not provided, it will be set to the last node that updated the state, if not ambiguous. - -The reason this matters is that the next steps in the graph to execute depend on the last node to have given an update, so this can be used to control which node executes next. +LangGraph has a built-in persistence layer, implemented through [checkpointers][basecheckpointsaver]. When you use a checkpointer with a graph, you can interact with and manage the graph's state after the execution. The checkpointer saves a _checkpoint_ (a snapshot) of the graph state at every superstep, enabling several powerful capabilities, including human-in-the-loop, memory and fault-tolerance. See this [conceptual guide](./persistence.md) for more information. ## Graph Migrations @@ -417,7 +349,7 @@ Read [this how-to](https://langchain-ai.github.io/langgraph/how-tos/recursion-li It can often be useful to set breakpoints before or after certain nodes execute. This can be used to wait for human approval before continuing. These can be set when you ["compile" a graph](#compiling-your-graph). You can set breakpoints either _before_ a node executes (using `interrupt_before`) or after a node executes (using `interrupt_after`.) -You **MUST** use a [checkpoiner](#checkpointer) when using breakpoints. This is because your graph needs to be able to resume execution. +You **MUST** use a [checkpoiner](./persistence.md) when using breakpoints. This is because your graph needs to be able to resume execution. In order to resume execution, you can just invoke your graph with `None` as the input. @@ -431,208 +363,22 @@ graph.invoke(None, config=config) See [this guide](../how-tos/human_in_the_loop/breakpoints.ipynb) for a full walkthrough of how to add breakpoints. -## Visualization - -It's often nice to be able to visualize graphs, especially as they get more complex. LangGraph comes with several built-in ways to visualize graphs. See [this how-to guide](../how-tos/visualization.ipynb) for more info. - -## Streaming - -LangGraph is built with first class support for streaming. 
There are several different ways to stream back results - -### `.stream` and `.astream` - -`.stream` and `.astream` are sync and async methods for streaming back results. -There are several different modes you can specify when calling these methods (e.g. `graph.stream(..., mode="...")): - -- [`"values"`](../how-tos/stream-values.ipynb): This streams the full value of the state after each step of the graph. -- [`"updates"`](../how-tos/stream-updates.ipynb): This streams the updates to the state after each step of the graph. If multiple updates are made in the same step (e.g. multiple nodes are run) then those updates are streamed separately. -- `"debug"`: This streams as much information as possible throughout the execution of the graph. - -The below visualization shows the difference between the `values` and `updates` modes: - -![values vs updates](../static/values_vs_updates.png) +### Dynamic Breakpoints - -### `.astream_events` (for streaming tokens of LLM calls) - -In addition, you can use the [`astream_events`](../how-tos/streaming-events-from-within-tools.ipynb) method to stream back events that happen _inside_ nodes. This is useful for [streaming tokens of LLM calls](../how-tos/streaming-tokens.ipynb). - -This is a standard method on all [LangChain objects](https://python.langchain.com/v0.2/docs/concepts/#runnable-interface). This means that as the graph is executed, certain events are emitted along the way and can be seen if you run the graph using `.astream_events`. - -All events have (among other things) `event`, `name`, and `data` fields. What do these mean? - -- `event`: This is the type of event that is being emitted. You can find a detailed table of all callback events and triggers [here](https://python.langchain.com/v0.2/docs/concepts/#callback-events). -- `name`: This is the name of event. -- `data`: This is the data associated with the event. - -What types of things cause events to be emitted? - -* each node (runnable) emits `on_chain_start` when it starts execution, `on_chain_stream` during the node execution and `on_chain_end` when the node finishes. Node events will have the node name in the event's `name` field -* the graph will emit `on_chain_start` in the beginning of the graph execution, `on_chain_stream` after each node execution and `on_chain_end` when the graph finishes. Graph events will have the `LangGraph` in the event's `name` field -* Any writes to state channels (i.e. anytime you update the value of one of your state keys) will emit `on_chain_start` and `on_chain_end` events - -Additionally, any events that are created inside your nodes (LLM events, tool events, manually emitted events, etc.) will also be visible in the output of `.astream_events`. - -To make this more concrete and to see what this looks like, let's see what events are returned when we run a simple graph: +It may be helpful to **dynamically** interrupt the graph from inside a given node based on some condition. In `LangGraph` you can do so by using `NodeInterrupt` -- a special exception that can be raised from inside a node. 
```python -from langchain_openai import ChatOpenAI -from langgraph.graph import StateGraph, MessagesState, START, END - -model = ChatOpenAI(model="gpt-3.5-turbo") - - -def call_model(state: MessagesState): - response = model.invoke(state['messages']) - return {"messages": response} +def my_node(state: State) -> State: + if len(state['input']) > 5: + raise NodeInterrupt(f"Received input that is longer than 5 characters: {state['input']}") -workflow = StateGraph(MessagesState) -workflow.add_node(call_model) -workflow.add_edge(START, "call_model") -workflow.add_edge("call_model", END) -app = workflow.compile() - -inputs = [{"role": "user", "content": "hi!"}] -async for event in app.astream_events({"messages": inputs}, version="v2"): - kind = event["event"] - print(f"{kind}: {event['name']}") -``` -```shell -on_chain_start: LangGraph -on_chain_start: __start__ -on_chain_end: __start__ -on_chain_start: call_model -on_chat_model_start: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_stream: ChatOpenAI -on_chat_model_end: ChatOpenAI -on_chain_start: ChannelWrite -on_chain_end: ChannelWrite -on_chain_stream: call_model -on_chain_end: call_model -on_chain_stream: LangGraph -on_chain_end: LangGraph -``` - -We start with the overall graph start (`on_chain_start: LangGraph`). We then write to the `__start__` node (this is special node to handle input). -We then start the `call_model` node (`on_chain_start: call_model`). We then start the chat model invocation (`on_chat_model_start: ChatOpenAI`), -stream back token by token (`on_chat_model_stream: ChatOpenAI`) and then finish the chat model (`on_chat_model_end: ChatOpenAI`). From there, -we write the results back to the channel (`ChannelWrite`) and then finish the `call_model` node and then the graph as a whole. - -This should hopefully give you a good sense of what events are emitted in a simple graph. But what data do these events contain? -Each type of event contains data in a different format. Let's look at what `on_chat_model_stream` events look like. This is an important type of event -since it is needed for streaming tokens from an LLM response. - -These events look like: - -```shell -{'event': 'on_chat_model_stream', - 'name': 'ChatOpenAI', - 'run_id': '3fdbf494-acce-402e-9b50-4eab46403859', - 'tags': ['seq:step:1'], - 'metadata': {'langgraph_step': 1, - 'langgraph_node': 'call_model', - 'langgraph_triggers': ['start:call_model'], - 'langgraph_task_idx': 0, - 'checkpoint_id': '1ef657a0-0f9d-61b8-bffe-0c39e4f9ad6c', - 'checkpoint_ns': 'call_model', - 'ls_provider': 'openai', - 'ls_model_name': 'gpt-3.5-turbo', - 'ls_model_type': 'chat', - 'ls_temperature': 0.7}, - 'data': {'chunk': AIMessageChunk(content='Hello', id='run-3fdbf494-acce-402e-9b50-4eab46403859')}, - 'parent_ids': []} -``` -We can see that we have the event type and name (which we knew from before). - -We also have a bunch of stuff in metadata. Noticeably, `'langgraph_node': 'call_model',` is some really helpful information -which tells us which node this model was invoked inside of. - -Finally, `data` is a really important field. This contains the actual data for this event! Which in this case -is an AIMessageChunk. 
This contains the `content` for the message, as well as an `id`. -This is the ID of the overall AIMessage (not just this chunk) and is super helpful - it helps -us track which chunks are part of the same message (so we can show them together in the UI). - -This information contains all that is needed for creating a UI for streaming LLM tokens. You can see a -guide for that [here](../how-tos/streaming-tokens.ipynb). - - -!!! warning "ASYNC IN PYTHON<=3.10" - You may fail to see events being emitted from inside a node when using `.astream_events` in Python <= 3.10. If you're using a Langchain RunnableLambda, a RunnableGenerator, or Tool asynchronously inside your node, you will have to propagate callbacks to these objects manually. This is because LangChain cannot automatically propagate callbacks to child objects in this case. Please see examples [here](../how-tos/streaming-content.ipynb) and [here](../how-tos/streaming-events-from-within-tools.ipynb). - -#### Only stream tokens from specific nodes/LLMs - - -There are certain cases where you have multiple nodes in your graph that make LLM calls, and you do not wish to stream the tokens from every single LLM call. For example, you may use one LLM as a planner for the next steps to take, and another LLM somewhere else in the graph that actually responds to the user. In that case, you most likely WON'T want to stream tokens from the planner LLM but WILL want to stream them from the respond to user LLM. Below we show two different ways of doing this, one by streaming from specific nodes only and the second by streaming from specific LLMs only. - -First, let's define our graph: - -```python -from langchain_openai import ChatOpenAI -from langgraph.graph import StateGraph, MessagesState, START, END - -model_1 = ChatOpenAI(model="gpt-3.5-turbo", name="model_1") -model_2 = ChatOpenAI(model="gpt-3.5-turbo", name="model_2") - -def call_first_model(state: MessagesState): - response = model_1.invoke(state['messages']) - return {"messages": response} - -def call_second_model(state: MessagesState): - response = model_2.invoke(state['messages']) - return {"messages": response} - -workflow = StateGraph(MessagesState) -workflow.add_node(call_first_model) -workflow.add_node(call_second_model) -workflow.add_edge(START, "call_first_model") -workflow.add_edge("call_first_model", "call_second_model") -workflow.add_edge("call_second_model", END) -app = workflow.compile() -``` - -**Streaming from specific node** - -In the case that we only want the output from a single node, we can use the event metadata to filter node names: - -```python -inputs = [{"role": "user", "content": "hi!"}] - -async for event in app.astream_events({"messages": inputs}, version="v2"): - # Get chat model tokens from a particular node - if event["event"] == "on_chat_model_stream" and event['metadata'].get('langgraph_node','') == "call_second_model": - print(event["data"]["chunk"].content, end="|", flush=True) -``` - -```shell -|Hello|!| How| can| I| help| you| today|?|| + return state ``` -As we can see only the response from the second LLM was streamed (you can tell because we only received a single response, if we had streamed both we would have received two "Hello! How can I help you today?" messages). - -**Streaming from specific LLM** - -Sometimes you might want to stream from specific LLMs instead of specific nodes. 
This could be the case if you have multiple LLM calls inside a single node, and only want to stream the output of a specific one or if you use the same LLM in different nodes and want to stream it's output anytime it is called. We can do this by using the `name` parameter for LLMs and events: +## Visualization -```python -inputs = [{"role": "user", "content": "hi!"}] -async for event in app.astream_events({"messages": inputs}, version="v2"): - # Get chat model tokens from a particular LLM inside a particular node - if event["event"] == "on_chat_model_stream" and event['name'] == "model_2": - print(event["data"]["chunk"].content, end="|", flush=True) -``` +It's often nice to be able to visualize graphs, especially as they get more complex. LangGraph comes with several built-in ways to visualize graphs. See [this how-to guide](../how-tos/visualization.ipynb) for more info. -```shell -|Hello|!| How| can| I| assist| you| today|?|| -``` +## Streaming -As expected, we only see a single LLM response since the response from `model_1` was not streamed. \ No newline at end of file +LangGraph is built with first class support for streaming, including streaming updates from graph nodes during the execution, streaming tokens from LLM calls and more. See this [conceptual guide](./streaming.md) for more information. \ No newline at end of file diff --git a/docs/docs/concepts/multi_agent.md b/docs/docs/concepts/multi_agent.md new file mode 100644 index 000000000..073c49a94 --- /dev/null +++ b/docs/docs/concepts/multi_agent.md @@ -0,0 +1,138 @@ +# Multi-agent Systems + +A multi-agent system is a system with multiple independent actors powered by LLMs that are connected in a specific way. These actors can be as simple as a prompt and an LLM call, or as complex as a [ReAct](./agentic_concepts.md#react-implementation) agent. + +The primary benefits of this architecture are: + +* **Modularity**: Separate agents facilitate easier development, testing, and maintenance of agentic systems. +* **Specialization**: You can create expert agents focused on specific domains, and compose them into more complex applications +* **Control**: You can explicitly control how agents communicate (as opposed to relying on function calling) + +## Multi-agent systems in LangGraph + +### Agents as nodes + +Agents can be defined as nodes in LangGraph. As any other node in the LangGraph, these agent nodes receive the graph state as an input and return an update to the state as their output. + +* Simple **LLM nodes**: single LLMs with custom prompts +* **Subgraph nodes**: complex graphs called inside the orchestrator graph node + +![](./img/multi_agent/subgraph.png) + +### Agents as tools + +Agents can also be defined as tools. In this case, the orchestrator agent (e.g. ReAct agent) would use a tool-calling LLM to decide which of the agent tools to call, as well as the arguments to pass to those agents. + +You could also take a "mega-graph" approach – incorporating subordinate agents' nodes directly into the parent, orchestrator graph. However, this is not recommended for complex subordinate agents, as it would make the overall system harder to scale, maintain and debug – you should use subgraphs or tools in those cases. + +## Communication in multi-agent systems + +A big question in multi-agent systems is how the agents communicate amongst themselves and with the orchestrator agent. This involves both the schema of how they communicate, as well as the sequence in which they communicate. 
LangGraph is perfect for orchestrating these types of systems and allows you to define both. + +### Schema + +LangGraph provides a lot of flexibility for how to communicate within multi-agent architectures. + +* A node in LangGraph can have a [private input state schema](https://langchain-ai.github.io/langgraph/how-tos/pass_private_state/) that is distinct from the graph state schema. This allows passing additional information during the graph execution that is only needed for executing a particular node. +* Subgraph node agents can have independent [input / output state schemas](https://langchain-ai.github.io/langgraph/how-tos/input_output_schema/). In this case it’s important to [add input / output transformations](https://langchain-ai.github.io/langgraph/how-tos/subgraph-transform-state/) so that the parent graph knows how to communicate with the subgraphs. +* For tool-based subordinate agents, the orchestrator determines the inputs based on the tool schema. Additionally, LangGraph allows passing state to individual tools at runtime, so subordinate agents can access parent state, if needed. + +### Sequence + +LangGraph provides multiple methods to control agent communication sequence: + +* **Explicit control flow (graph edges)**: LangGraph allows you to define the control flow of your application (i.e. the sequence of how agents communicate) explicitly, via [graph edges](./low_level.md#edges). + +```python +from langchain_openai import ChatOpenAI +from langchain_core.messages import SystemMessage +from langgraph.graph import StateGraph, MessagesState, START, END + +model = ChatOpenAI(model="gpt-4o-mini") + +def research_agent(state: MessagesState): + """Call research agent""" + messages = [SystemMessage(content="You are a research assistant. Given a topic, provide key facts and information.")] + state["messages"] + response = model.invoke(messages) + return {"messages": [response]} + +def summarize_agent(state: MessagesState): + """Call summarization agent""" + messages = [SystemMessage(content="You are a summarization expert. Condense the given information into a brief summary.")] + state["messages"] + response = model.invoke(messages) + return {"messages": [response]} + +graph = StateGraph(MessagesState) +graph.add_node("research", research_agent) +graph.add_node("summarize", summarize_agent) + +# define the flow explicitly +graph.add_edge(START, "research") +graph.add_edge("research", "summarize") +graph.add_edge("summarize", END) +``` + +* **Dynamic control flow (conditional edges)**: LangGraph also allows you to define [conditional edges](./low_level.md#conditional-edges), where the control flow is dependent on satisfying a given condition. In such cases, you can use an LLM to decide which subordinate agent to call next. + + +* **Implicit control flow (tool calling)**: if the orchestrator agent treats subordinate agents as tools, the tool-calling LLM powering the orchestrator will make decisions about the order in which the tools (agents) are being called. + +```python +from typing import Annotated +from langchain_core.messages import SystemMessage, ToolMessage +from langchain_openai import ChatOpenAI +from langgraph.prebuilt import ToolNode, InjectedState, create_react_agent + +model = ChatOpenAI(model="gpt-4o-mini") + +def research_agent(state: Annotated[dict, InjectedState]): + """Call research agent""" + messages = [SystemMessage(content="You are a research assistant. 
Given a topic, provide key facts and information.")] + state["messages"][:-1]
+    response = model.invoke(messages)
+    tool_call = state["messages"][-1].tool_calls[0]
+    return {"messages": [ToolMessage(response.content, tool_call_id=tool_call["id"])]}
+
+def summarize_agent(state: Annotated[dict, InjectedState]):
+    """Call summarization agent"""
+    messages = [SystemMessage(content="You are a summarization expert. Condense the given information into a brief summary.")] + state["messages"][:-1]
+    response = model.invoke(messages)
+    tool_call = state["messages"][-1].tool_calls[0]
+    return {"messages": [ToolMessage(response.content, tool_call_id=tool_call["id"])]}
+
+tool_node = ToolNode([research_agent, summarize_agent])
+graph = create_react_agent(model, [research_agent, summarize_agent], state_modifier="First research and then summarize information on a given topic.")
+```
+
+## Example architectures
+
+Below are several examples of complex multi-agent architectures that can be implemented in LangGraph.
+
+### Multi-Agent Collaboration
+
+In this example, different agents collaborate on a **shared** scratchpad of messages (i.e. shared graph state). This means that all the work any of them do is visible to the other ones. The benefit is that the other agents can see all the individual steps done. The downside is that it is sometimes overly verbose and unnecessary to pass ALL this information along, and sometimes only the final answer from an agent is needed. We call this **collaboration** because of the shared nature of the scratchpad.
+
+In this case, the independent agents are actually just a single LLM call with a custom system message.
+
+Here is a visualization of how these agents are connected:
+
+![](./img/multi_agent/collaboration.png)
+
+See the full code example in this [tutorial](https://langchain-ai.github.io/langgraph/tutorials/multi_agent/multi-agent-collaboration/).
+
+### Agent Supervisor
+
+In this example, multiple agents are connected, but compared to the example above they do NOT share a scratchpad. Rather, they have their own independent scratchpads (i.e. their own state), and then their final responses are appended to a global scratchpad.
+
+In this case, each independent agent is a LangGraph ReAct agent (graph). This means they have their own individual prompt, LLM, and tools. When called, it's not just a single LLM call, but rather an invocation of the graph powering the ReAct agent.
+
+![](./img/multi_agent/supervisor.png)
+
+See the full code example in this [tutorial](https://langchain-ai.github.io/langgraph/tutorials/multi_agent/agent_supervisor/).
+
+### Hierarchical Agent Teams
+
+What if the job for a single worker in the agent supervisor example becomes too complex? What if the number of workers becomes too large? For some applications, the system may be more effective if work is distributed hierarchically. You can do this by creating an additional level of subgraphs and creating a top-level supervisor, along with mid-level supervisors:
+
+![](./img/multi_agent/hierarchical.png)
+
+See the full code example in this [tutorial](https://langchain-ai.github.io/langgraph/tutorials/multi_agent/hierarchical_agent_teams/).
\ No newline at end of file
diff --git a/docs/docs/concepts/persistence.md b/docs/docs/concepts/persistence.md
new file mode 100644
index 000000000..23a50c502
--- /dev/null
+++ b/docs/docs/concepts/persistence.md
@@ -0,0 +1,264 @@
+# Persistence
+
+LangGraph has a built-in persistence layer, implemented through checkpointers. When you compile a graph with a checkpointer, the checkpointer saves a `checkpoint` of the graph state at every super-step. Those checkpoints are saved to a `thread`, which can be accessed after graph execution. Because `threads` allow access to the graph's state after execution, several powerful capabilities, including human-in-the-loop, memory, time travel, and fault-tolerance, are all possible. See [this how-to guide](../how-tos/persistence.ipynb) for an end-to-end example on how to add and use checkpointers with your graph. Below, we'll discuss each of these concepts in more detail.
+
+![Checkpoints](img/persistence/checkpoints.jpg)
+
+## Threads
+
+A thread is a unique ID or [thread identifier](#threads) assigned to each checkpoint saved by a checkpointer. When invoking a graph with a checkpointer, you **must** specify a `thread_id` as part of the `configurable` portion of the config:
+
+```python
+{"configurable": {"thread_id": "1"}}
+```
+
+## Checkpoints
+
+A checkpoint is a snapshot of the graph state saved at each super-step and is represented by a `StateSnapshot` object with the following key properties:
+
+- `config`: Config associated with this checkpoint.
+- `metadata`: Metadata associated with this checkpoint.
+- `values`: Values of the state channels at this point in time.
+- `next`: A tuple of the node names to execute next in the graph.
+- `tasks`: A tuple of `PregelTask` objects that contain information about next tasks to be executed. If the step was previously attempted, it will include error information. If a graph was interrupted [dynamically](../how-tos/human_in_the_loop/dynamic_breakpoints.ipynb) from within a node, tasks will contain additional data associated with interrupts.
+
+Let's see what checkpoints are saved when a simple graph is invoked as follows:
+
+```python
+from langgraph.graph import StateGraph, START, END
+from langgraph.checkpoint.memory import MemorySaver
+from typing import TypedDict, Annotated
+from operator import add
+
+class State(TypedDict):
+    foo: int
+    bar: Annotated[list[str], add]
+
+def node_a(state: State):
+    return {"foo": "a", "bar": ["a"]}
+
+def node_b(state: State):
+    return {"foo": "b", "bar": ["b"]}
+
+
+workflow = StateGraph(State)
+workflow.add_node(node_a)
+workflow.add_node(node_b)
+workflow.add_edge(START, "node_a")
+workflow.add_edge("node_a", "node_b")
+workflow.add_edge("node_b", END)
+
+checkpointer = MemorySaver()
+graph = workflow.compile(checkpointer=checkpointer)
+
+config = {"configurable": {"thread_id": "1"}}
+graph.invoke({"foo": ""}, config)
+```
+
+After we run the graph, we expect to see exactly 4 checkpoints:
+
+* empty checkpoint with `START` as the next node to be executed
+* checkpoint with the user input `{'foo': '', 'bar': []}` and `node_a` as the next node to be executed
+* checkpoint with the outputs of `node_a` `{'foo': 'a', 'bar': ['a']}` and `node_b` as the next node to be executed
+* checkpoint with the outputs of `node_b` `{'foo': 'b', 'bar': ['a', 'b']}` and no next nodes to be executed
+
+Note that the `bar` channel values contain outputs from both nodes, as we have a reducer for the `bar` channel.
+
+### Get state
+
+When interacting with the saved graph state, you **must** specify a [thread identifier](#threads). You can view the *latest* state of the graph by calling `graph.get_state(config)`. This will return a `StateSnapshot` object that corresponds to the latest checkpoint associated with the thread ID provided in the config, or a checkpoint associated with a checkpoint ID for the thread, if provided.
+ +```python +# get the latest state snapshot +config = {"configurable": {"thread_id": "1"}} +graph.get_state(config) + +# get a state snapshot for a specific checkpoint_id +config = {"configurable": {"thread_id": "1", "checkpoint_id": "1ef663ba-28fe-6528-8002-5a559208592c"}} +graph.get_state(config) +``` + +In our example, the output of `get_state` will look like this: + +``` +StateSnapshot( + values={'foo': 'b', 'bar': ['a', 'b']}, + next=(), + config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28fe-6528-8002-5a559208592c'}}, + metadata={'source': 'loop', 'writes': {'node_b': {'foo': 'b', 'bar': ['b']}}, 'step': 2}, + created_at='2024-08-29T19:19:38.821749+00:00', + parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f9-6ec4-8001-31981c2c39f8'}}, tasks=() +) +``` + +### Get state history + +You can get the full history of the graph execution for a given thread by calling `graph.get_state_history(config)`. This will return a list of `StateSnapshot` objects associated with the thread ID provided in the config. Importantly, the checkpoints will be ordered chronologically with the most recent checkpoint / `StateSnapshot` being the first in the list. + +```python +config = {"configurable": {"thread_id": "1"}} +list(graph.get_state_history(config)) +``` + +In our example, the output of `get_state_history` will look like this: + +``` +[ + StateSnapshot( + values={'foo': 'b', 'bar': ['a', 'b']}, + next=(), + config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28fe-6528-8002-5a559208592c'}}, + metadata={'source': 'loop', 'writes': {'node_b': {'foo': 'b', 'bar': ['b']}}, 'step': 2}, + created_at='2024-08-29T19:19:38.821749+00:00', + parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f9-6ec4-8001-31981c2c39f8'}}, + tasks=(), + ), + StateSnapshot( + values={'foo': 'a', 'bar': ['a']}, next=('node_b',), + config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f9-6ec4-8001-31981c2c39f8'}}, + metadata={'source': 'loop', 'writes': {'node_a': {'foo': 'a', 'bar': ['a']}}, 'step': 1}, + created_at='2024-08-29T19:19:38.819946+00:00', + parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f4-6b4a-8000-ca575a13d36a'}}, + tasks=(PregelTask(id='6fb7314f-f114-5413-a1f3-d37dfe98ff44', name='node_b', error=None, interrupts=()),), + ), + StateSnapshot( + values={'foo': '', 'bar': []}, + next=('node_a',), + config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f4-6b4a-8000-ca575a13d36a'}}, + metadata={'source': 'loop', 'writes': None, 'step': 0}, + created_at='2024-08-29T19:19:38.817813+00:00', + parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f0-6c66-bfff-6723431e8481'}}, + tasks=(PregelTask(id='f1b14528-5ee5-579c-949b-23ef9bfbed58', name='node_a', error=None, interrupts=()),), + ), + StateSnapshot( + values={'bar': []}, + next=('__start__',), + config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1ef663ba-28f0-6c66-bfff-6723431e8481'}}, + metadata={'source': 'input', 'writes': {'foo': ''}, 'step': -1}, + created_at='2024-08-29T19:19:38.816205+00:00', + parent_config=None, + tasks=(PregelTask(id='6d27aa2e-d72b-5504-a36f-8620e54a76dd', name='__start__', error=None, interrupts=()),), + ) +] +``` + +![State](img/persistence/get_state.jpg) + +### 
Replay
+
+It's also possible to play back a prior graph execution. If we invoke a graph with a `thread_id` and a `checkpoint_id`, then we will *re-play* the graph from a checkpoint that corresponds to the `checkpoint_id`.
+
+* `thread_id` is simply the ID of a thread. This is always required.
+* `checkpoint_id`: This identifier refers to a specific checkpoint within a thread.
+
+You must pass these when invoking the graph as part of the `configurable` portion of the config:
+
+```python
+# {"configurable": {"thread_id": "1"}} # valid config
+# {"configurable": {"thread_id": "1", "checkpoint_id": "0c62ca34-ac19-445d-bbb0-5b4984975b2a"}} # also valid config
+
+config = {"configurable": {"thread_id": "1"}}
+graph.invoke(inputs, config=config)
+```
+
+Importantly, LangGraph knows whether a particular checkpoint has been executed previously. If it has, LangGraph simply *re-plays* that particular step in the graph and does not re-execute the step. See this [how-to guide on time-travel to learn more about replaying](../how-tos/human_in_the_loop/time-travel.ipynb).
+
+![Replay](img/persistence/re_play.jpg)
+
+### Update state
+
+In addition to re-playing the graph from specific `checkpoints`, we can also *edit* the graph state. We do this using `graph.update_state()`. This method takes three different arguments:
+
+#### `config`
+
+The config should contain `thread_id` specifying which thread to update. When only the `thread_id` is passed, we update (or fork) the current state. Optionally, if we include the `checkpoint_id` field, then we fork that selected checkpoint.
+
+#### `values`
+
+These are the values that will be used to update the state. Note that this update is treated exactly as any update from a node is treated. This means that these values will be passed to the [reducer](./low_level.md#reducers) functions, if they are defined for some of the channels in the graph state. This means that `update_state` does NOT automatically overwrite the channel values for every channel, but only for the channels without reducers. Let's walk through an example.
+
+Let's assume you have defined the state of your graph with the following schema (see the full example above):
+
+```python
+from typing import TypedDict, Annotated
+from operator import add
+
+class State(TypedDict):
+    foo: int
+    bar: Annotated[list[str], add]
+```
+
+Let's now assume the current state of the graph is
+
+```
+{"foo": 1, "bar": ["a"]}
+```
+
+If you update the state as below:
+
+```
+graph.update_state(config, {"foo": 2, "bar": ["b"]})
+```
+
+Then the new state of the graph will be:
+
+```
+{"foo": 2, "bar": ["a", "b"]}
+```
+
+The `foo` key (channel) is completely changed (because there is no reducer specified for that channel, so `update_state` overwrites it). However, there is a reducer specified for the `bar` key, and so it appends `"b"` to the state of `bar`.
+
+#### `as_node`
+
+The final thing you can optionally specify when calling `update_state` is `as_node`. If you provide it, the update will be applied as if it came from node `as_node`. If `as_node` is not provided, it will be set to the last node that updated the state, if not ambiguous. The reason this matters is that the next steps to execute depend on the last node to have given an update, so this can be used to control which node executes next. See this [how-to guide on time-travel to learn more about forking state](../how-tos/human_in_the_loop/time-travel.ipynb).
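+
+For example, here is a minimal sketch (continuing the example above and assuming the same `graph` and `config`) of applying an update as if it came from `node_a`, so that `node_b` is the next node scheduled to run:
+
+```python
+# A sketch continuing the example above: attribute the update to node_a,
+# so the graph resumes as if node_a had just finished running.
+graph.update_state(config, {"foo": 3, "bar": ["c"]}, as_node="node_a")
+
+print(graph.get_state(config).next)  # expected: ('node_b',)
+```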
+
+![Update](img/persistence/checkpoints_full_story.jpg)
+
+## Checkpointer libraries
+
+Under the hood, checkpointing is powered by checkpointer objects that conform to the [BaseCheckpointSaver][basecheckpointsaver] interface. LangGraph provides several checkpointer implementations, all implemented via standalone, installable libraries:
+
+* `langgraph-checkpoint`: The base interface for checkpoint savers ([BaseCheckpointSaver][basecheckpointsaver]) and the serialization/deserialization interface ([SerializerProtocol][serializerprotocol]). Includes an in-memory checkpointer implementation ([MemorySaver][memorysaver]) for experimentation. LangGraph comes with `langgraph-checkpoint` included.
+* `langgraph-checkpoint-sqlite`: An implementation of the LangGraph checkpointer that uses a SQLite database ([SqliteSaver][sqlitesaver] / [AsyncSqliteSaver][asyncsqlitesaver]). Ideal for experimentation and local workflows. Needs to be installed separately.
+* `langgraph-checkpoint-postgres`: An advanced checkpointer that uses a Postgres database ([PostgresSaver][postgressaver] / [AsyncPostgresSaver][asyncpostgressaver]), used in LangGraph Cloud. Ideal for use in production. Needs to be installed separately.
+
+### Checkpointer interface
+
+Each checkpointer conforms to the [BaseCheckpointSaver][basecheckpointsaver] interface and implements the following methods:
+
+* `.put` - Store a checkpoint with its configuration and metadata.
+* `.put_writes` - Store intermediate writes linked to a checkpoint (i.e. [pending writes](#pending-writes)).
+* `.get_tuple` - Fetch a checkpoint tuple for a given configuration (`thread_id` and `checkpoint_id`). This is used to populate `StateSnapshot` in `graph.get_state()`.
+* `.list` - List checkpoints that match a given configuration and filter criteria. This is used to populate the state history in `graph.get_state_history()`.
+
+If the checkpointer is used with asynchronous graph execution (i.e. executing the graph via `.ainvoke`, `.astream`, `.abatch`), asynchronous versions of the above methods will be used (`.aput`, `.aput_writes`, `.aget_tuple`, `.alist`).
+
+!!! note "Note"
+    For running your graph asynchronously, you can use `MemorySaver`, or the async versions of the SQLite/Postgres checkpointers -- `AsyncSqliteSaver` / `AsyncPostgresSaver`.
+
+### Serializer
+
+When checkpointers save the graph state, they need to serialize the channel values in the state. This is done using serializer objects.
+`langgraph-checkpoint` defines the [protocol][serializerprotocol] for implementing serializers and provides a default implementation ([JsonPlusSerializer][jsonplusserializer]) that handles a wide variety of types, including LangChain and LangGraph primitives, datetimes, enums and more.
+
+## Capabilities
+
+### Human-in-the-loop
+
+First, checkpointers facilitate [human-in-the-loop workflows](agentic_concepts.md#human-in-the-loop) by allowing humans to inspect, interrupt, and approve graph steps. Checkpointers are needed for these workflows, as the human has to be able to view the state of a graph at any point in time, and the graph has to be able to resume execution after the human has made any updates to the state. See [these how-to guides](../how-tos/human_in_the_loop/breakpoints.ipynb) for concrete examples.
+
+### Memory
+
+Second, checkpointers allow for ["memory"](agentic_concepts.md#memory) between interactions. In the case of repeated human interactions (like conversations), any follow-up messages can be sent to that thread, which will retain its memory of previous ones. See [this how-to guide](../how-tos/memory/manage-conversation-history.ipynb) for an end-to-end example on how to add and manage conversation memory using checkpointers.
+
+### Time Travel
+
+Third, checkpointers allow for ["time travel"](../how-tos/human_in_the_loop/time-travel.ipynb), allowing users to replay prior graph executions to review and/or debug specific graph steps. In addition, checkpointers make it possible to fork the graph state at arbitrary checkpoints to explore alternative trajectories.
+
+### Fault-tolerance
+
+Lastly, checkpointing also provides fault-tolerance and error recovery: if one or more nodes fail at a given superstep, you can restart your graph from the last successful step.
+
+#### Pending writes
+
+Additionally, when a graph node fails mid-execution at a given superstep, LangGraph stores pending checkpoint writes from any other nodes that completed successfully at that superstep, so that whenever we resume graph execution from that superstep we don't re-run the successful nodes.
\ No newline at end of file
diff --git a/docs/docs/concepts/streaming.md b/docs/docs/concepts/streaming.md
new file mode 100644
index 000000000..564144e25
--- /dev/null
+++ b/docs/docs/concepts/streaming.md
@@ -0,0 +1,133 @@
+# Streaming
+
+LangGraph is built with first class support for streaming. There are several different ways to stream back outputs from a graph run.
+
+## Streaming graph outputs (`.stream` and `.astream`)
+
+`.stream` and `.astream` are sync and async methods for streaming back outputs from a graph run.
+There are several different modes you can specify when calling these methods (e.g. `graph.stream(..., mode="...")`):
+
+- [`"values"`](../how-tos/stream-values.ipynb): This streams the full value of the state after each step of the graph.
+- [`"updates"`](../how-tos/stream-updates.ipynb): This streams the updates to the state after each step of the graph. If multiple updates are made in the same step (e.g. multiple nodes are run) then those updates are streamed separately.
+- `"debug"`: This streams as much information as possible throughout the execution of the graph.
+
+The below visualization shows the difference between the `values` and `updates` modes:
+
+![values vs updates](../static/values_vs_updates.png)
+
+
+## Streaming LLM tokens and events (`.astream_events`)
+
+In addition, you can use the [`astream_events`](../how-tos/streaming-events-from-within-tools.ipynb) method to stream back events that happen _inside_ nodes. This is useful for [streaming tokens of LLM calls](../how-tos/streaming-tokens.ipynb).
+
+This is a standard method on all [LangChain objects](https://python.langchain.com/v0.2/docs/concepts/#runnable-interface). This means that as the graph is executed, certain events are emitted along the way and can be seen if you run the graph using `.astream_events`.
+
+All events have (among other things) `event`, `name`, and `data` fields. What do these mean?
+
+- `event`: This is the type of event that is being emitted. You can find a detailed table of all callback events and triggers [here](https://python.langchain.com/v0.2/docs/concepts/#callback-events).
+- `name`: This is the name of the event.
+- `data`: This is the data associated with the event.
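+
+As a quick sketch of how these fields are typically used together (this assumes `app` is a compiled graph and `inputs` is a list of messages, as in the example below), you can filter the event stream down to just the LLM token chunks:
+
+```python
+# Minimal sketch -- assumes `app` and `inputs` are defined as in the example below.
+# Prints only the streamed LLM token chunks, ignoring all other events.
+async for event in app.astream_events({"messages": inputs}, version="v1"):
+    if event["event"] == "on_chat_model_stream":
+        print(event["data"]["chunk"].content, end="", flush=True)
+```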
+ +What types of things cause events to be emitted? + +* each node (runnable) emits `on_chain_start` when it starts execution, `on_chain_stream` during the node execution and `on_chain_end` when the node finishes. Node events will have the node name in the event's `name` field +* the graph will emit `on_chain_start` in the beginning of the graph execution, `on_chain_stream` after each node execution and `on_chain_end` when the graph finishes. Graph events will have the `LangGraph` in the event's `name` field +* Any writes to state channels (i.e. anytime you update the value of one of your state keys) will emit `on_chain_start` and `on_chain_end` events + +Additionally, any events that are created inside your nodes (LLM events, tool events, manually emitted events, etc.) will also be visible in the output of `.astream_events`. + +To make this more concrete and to see what this looks like, let's see what events are returned when we run a simple graph: + +```python +from langchain_openai import ChatOpenAI +from langgraph.graph import StateGraph, MessagesState, START, END + +model = ChatOpenAI(model="gpt-4o-mini") + + +def call_model(state: MessagesState): + response = model.invoke(state['messages']) + return {"messages": response} + +workflow = StateGraph(MessagesState) +workflow.add_node(call_model) +workflow.add_edge(START, "call_model") +workflow.add_edge("call_model", END) +app = workflow.compile() + +inputs = [{"role": "user", "content": "hi!"}] +async for event in app.astream_events({"messages": inputs}, version="v1"): + kind = event["event"] + print(f"{kind}: {event['name']}") +``` +```shell +on_chain_start: LangGraph +on_chain_start: __start__ +on_chain_end: __start__ +on_chain_start: call_model +on_chat_model_start: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_stream: ChatOpenAI +on_chat_model_end: ChatOpenAI +on_chain_start: ChannelWrite +on_chain_end: ChannelWrite +on_chain_stream: call_model +on_chain_end: call_model +on_chain_stream: LangGraph +on_chain_end: LangGraph +``` + +We start with the overall graph start (`on_chain_start: LangGraph`). We then write to the `__start__` node (this is special node to handle input). +We then start the `call_model` node (`on_chain_start: call_model`). We then start the chat model invocation (`on_chat_model_start: ChatOpenAI`), +stream back token by token (`on_chat_model_stream: ChatOpenAI`) and then finish the chat model (`on_chat_model_end: ChatOpenAI`). From there, +we write the results back to the channel (`ChannelWrite`) and then finish the `call_model` node and then the graph as a whole. + +This should hopefully give you a good sense of what events are emitted in a simple graph. But what data do these events contain? +Each type of event contains data in a different format. Let's look at what `on_chat_model_stream` events look like. This is an important type of event +since it is needed for streaming tokens from an LLM response. 
+ +These events look like: + +```shell +{'event': 'on_chat_model_stream', + 'name': 'ChatOpenAI', + 'run_id': '3fdbf494-acce-402e-9b50-4eab46403859', + 'tags': ['seq:step:1'], + 'metadata': {'langgraph_step': 1, + 'langgraph_node': 'call_model', + 'langgraph_triggers': ['start:call_model'], + 'langgraph_task_idx': 0, + 'checkpoint_id': '1ef657a0-0f9d-61b8-bffe-0c39e4f9ad6c', + 'checkpoint_ns': 'call_model', + 'ls_provider': 'openai', + 'ls_model_name': 'gpt-4o-mini', + 'ls_model_type': 'chat', + 'ls_temperature': 0.7}, + 'data': {'chunk': AIMessageChunk(content='Hello', id='run-3fdbf494-acce-402e-9b50-4eab46403859')}, + 'parent_ids': []} +``` +We can see that we have the event type and name (which we knew from before). + +We also have a bunch of stuff in metadata. Noticeably, `'langgraph_node': 'call_model',` is some really helpful information +which tells us which node this model was invoked inside of. + +Finally, `data` is a really important field. This contains the actual data for this event! Which in this case +is an AIMessageChunk. This contains the `content` for the message, as well as an `id`. +This is the ID of the overall AIMessage (not just this chunk) and is super helpful - it helps +us track which chunks are part of the same message (so we can show them together in the UI). + +This information contains all that is needed for creating a UI for streaming LLM tokens. You can see a +guide for that [here](../how-tos/streaming-tokens.ipynb). + + +!!! warning "ASYNC IN PYTHON<=3.10" + You may fail to see events being emitted from inside a node when using `.astream_events` in Python <= 3.10. If you're using a Langchain RunnableLambda, a RunnableGenerator, or Tool asynchronously inside your node, you will have to propagate callbacks to these objects manually. This is because LangChain cannot automatically propagate callbacks to child objects in this case. Please see examples [here](../how-tos/streaming-content.ipynb) and [here](../how-tos/streaming-events-from-within-tools.ipynb). \ No newline at end of file diff --git a/docs/docs/reference/checkpoints.md b/docs/docs/reference/checkpoints.md index a7031613d..f8f914f2b 100644 --- a/docs/docs/reference/checkpoints.md +++ b/docs/docs/reference/checkpoints.md @@ -1,13 +1,15 @@ -# Checkpoints +# Checkpointers -You can [compile][langgraph.graph.MessageGraph.compile] any LangGraph workflow with a [CheckPointer][basecheckpointsaver] to give your agent "memory" by persisting its state. This permits things like: +You can [compile][langgraph.graph.MessageGraph.compile] any LangGraph workflow with a [Checkpointer][basecheckpointsaver] to give your agent "memory" by persisting its state. This permits things like: - Remembering things across multiple interactions - Interrupting to wait for user input - Resilience for long-running, error-prone agents - Time travel retry and branch from a previous checkpoint -Key checkpointer interfaces and primitives are defined in [`langgraph_checkpoint`](https://github.com/langchain-ai/langgraph/tree/main/libs/checkpoint) library. +Key checkpointer interfaces and primitives are defined in [`langgraph_checkpoint`](https://github.com/langchain-ai/langgraph/tree/main/libs/checkpoint) library. Additional checkpointer implementations are also available as installable libraries: +* [`langgraph-checkpoint-sqlite`](https://github.com/langchain-ai/langgraph/tree/main/libs/checkpoint-sqlite): An implementation of LangGraph checkpointer that uses SQLite database. Ideal for experimentation and local workflows. 
+* [`langgraph-checkpoint-postgres`](https://github.com/langchain-ai/langgraph/tree/main/libs/checkpoint-postgres): An advanced checkpointer that uses Postgres database, used in LangGraph Cloud. Ideal for using in production. ### Checkpoint @@ -21,11 +23,17 @@ Key checkpointer interfaces and primitives are defined in [`langgraph_checkpoint ::: langgraph.checkpoint.base.BaseCheckpointSaver +## Serialization / deserialization + ### SerializerProtocol ::: langgraph.checkpoint.base.SerializerProtocol -## Implementations +### JsonPlusSerializer + +::: langgraph.checkpoint.serde.jsonplus.JsonPlusSerializer + +## Checkpointer Implementations LangGraph also natively provides the following checkpoint implementations. diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 5e913c940..84cf27326 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -189,10 +189,13 @@ nav: - Add Human-in-the-loop to a ReAct agent: how-tos/create-react-agent-hitl.ipynb - Create prebuilt ReAct agent from scratch: how-tos/react-agent-from-scratch.ipynb - "Conceptual Guides": - - "concepts/index.md" - - LangGraph for Agentic Applications: concepts/high_level.md - - Low Level LangGraph Concepts: concepts/low_level.md + - Why LangGraph?: concepts/high_level.md + - LangGraph Glossary: concepts/low_level.md - Common Agentic Patterns: concepts/agentic_concepts.md + - Human-in-the-Loop: concepts/human_in_the_loop.md + - Multi-Agent Systems: concepts/multi_agent.md + - Persistence: concepts/persistence.md + - Streaming: concepts/streaming.md - FAQ: concepts/faq.md - Reference: - Graphs: reference/graphs.md