Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(max): Conversational assistant #28154

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

Twixes
Copy link
Member

@Twixes Twixes commented Jan 31, 2025

Changes

This PR rearchitects the assistant into a more capable agent:

  1. The Router node is replaced with Root. Root is still capable of choosing a relevant insight type to create, via tool calling, but it's only part of its job. Most importantly, Root is now what the user actually talks with at all times. Its prompt includes lots of personality nudges (lifted from @slshults's work on docs Max) as well as orientation towards data analysis.
  2. The Summarizer node is replaced with QueryExecutor. Like Summarizer, QueryExecutor takes the result of the preceding TrendsGenerator/FunnelGenerator/RetentionGenerator step, but only adds the results to the message history instead of calling an LLM on top of them.

Combined, the key architectural change is added recursion: QueryExecutor goes back to Root for either summarization of results or a decision to analyze deeper. This is currently rather simple, and we're still missing instrumentation on query planning/generation nodes to e.g. edit queries from the past – but this architecture should take us far, as we can easily fan out into multiple sub-questions from the root, to combine them when all are done.

How did you test this code?

TODO

@Twixes Twixes marked this pull request as draft January 31, 2025 17:59
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR significantly restructures the assistant framework by replacing the Router and Summarizer nodes with Root and QueryExecutor nodes, creating a more conversational and capable AI assistant named Max.

  • Introduced new Root node in /ee/hogai/root/nodes.py that handles both conversation flow and insight type selection through tool calling
  • Added QueryExecutor node in /ee/hogai/query_executor/nodes.py that processes query results without LLM summarization, returning control to Root for interpretation
  • Several test files still reference old architecture (e.g. test_eval_router.py using RouterMessage), requiring updates for consistency
  • Patch paths in /ee/hogai/root/test/test_nodes.py incorrectly reference old router module instead of new root module
  • Missing instrumentation for query planning/generation nodes as noted in PR description, potentially impacting debugging capabilities

26 file(s) reviewed, 17 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +226 to 231
def add_query_executor(self, next_node: AssistantNodeName = AssistantNodeName.ROOT):
builder = self._graph
summarizer_node = SummarizerNode(self._team)
builder.add_node(AssistantNodeName.SUMMARIZER, summarizer_node.run)
builder.add_edge(AssistantNodeName.SUMMARIZER, next_node)
query_executor_node = QueryExecutorNode(self._team)
builder.add_node(AssistantNodeName.QUERY_EXECUTOR, query_executor_node.run)
builder.add_edge(AssistantNodeName.QUERY_EXECUTOR, next_node)
return self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: add_query_executor() should validate that next_node is not END since the new architecture requires returning to ROOT for summarization

Suggested change
def add_query_executor(self, next_node: AssistantNodeName = AssistantNodeName.ROOT):
builder = self._graph
summarizer_node = SummarizerNode(self._team)
builder.add_node(AssistantNodeName.SUMMARIZER, summarizer_node.run)
builder.add_edge(AssistantNodeName.SUMMARIZER, next_node)
query_executor_node = QueryExecutorNode(self._team)
builder.add_node(AssistantNodeName.QUERY_EXECUTOR, query_executor_node.run)
builder.add_edge(AssistantNodeName.QUERY_EXECUTOR, next_node)
return self
def add_query_executor(self, next_node: AssistantNodeName = AssistantNodeName.ROOT):
if next_node == AssistantNodeName.END:
raise ValueError("Query executor must return to ROOT for summarization")
builder = self._graph
query_executor_node = QueryExecutorNode(self._team)
builder.add_node(AssistantNodeName.QUERY_EXECUTOR, query_executor_node.run)
builder.add_edge(AssistantNodeName.QUERY_EXECUTOR, next_node)
return self

Comment on lines +267 to 268
elif node_name in STREAMING_NODES and langchain_message.content: # Also checking content to ignore tools
self._chunks += langchain_message # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Type ignore on line 268 should be replaced with proper type handling for AIMessageChunk concatenation

Comment on lines +14 to +15
The current date and time is {{{utc_datetime_display}}} UTC, which is {{{project_datetime_display}}} in this project's timezone ({{{project_timezone}}}).
It's expected that the data point for the current period can have a drop in value, as it's not complete yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider adding a more descriptive explanation of why data points might drop. For example, mention that data collection is still in progress for the current period.

Here's the query I came up with:
```json
{{{query}}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The triple mustache syntax {{{query}}} will not escape HTML entities. Ensure that the query content is properly sanitized before insertion to prevent XSS vulnerabilities.

@@ -26,8 +26,8 @@ class TestSummarizerNode(ClickhouseTestMixin, APIBaseTest):

@patch("ee.hogai.summarizer.nodes.process_query_dict", side_effect=process_query_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The import path in the patch decorator still references 'summarizer.nodes' instead of 'query_executor.nodes'

Suggested change
@patch("ee.hogai.summarizer.nodes.process_query_dict", side_effect=process_query_dict)
@patch("ee.hogai.query_executor.nodes.process_query_dict", side_effect=process_query_dict)

"""

POST_QUERY_USER_PROMPT = """
Okay, so let's get back what I was a asking.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: grammar error in prompt text

Suggested change
Okay, so let's get back what I was a asking.
Okay, so let's get back to what I was asking.

Comment on lines +28 to +30
You have access to data retrieval tools. When a question is data-related, proactively call a single tool to retrieve concrete results.
If the user asked for a tweak to an earlier query, call that tool as well to apply necessary changes.
When calling a tool, ALWAYS first tell the user you're doing so, very briefly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: no clear guidance on what constitutes a 'single tool' or how to handle cases where multiple data points are needed for a complete answer

"""


ROOT_INSIGHT_DESCRIPTION_PROMPT = f"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: f-string is used but no string interpolation is present - this could cause issues if variables are added later

Suggested change
ROOT_INSIGHT_DESCRIPTION_PROMPT = f"""
ROOT_INSIGHT_DESCRIPTION_PROMPT = """

Comment on lines +45 to +47
If information is missing or there is a potential data issue, retrieve a different new analysis instead of giving a subpar answer.
Any time you're about to retrieve more data, SAY SO FIRST. And NEVER retrieve data more than 3 times in a row.
Avoid generic advice. Take into account what you know about the product. Your answer needs to be super high-impact, no more than a few sentences.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: the '3 times in a row' limit needs clarification on timeframe and what constitutes a reset of this counter

@@ -56,7 +56,7 @@ class AssistantNodeName(StrEnum):
MEMORY_ONBOARDING = "memory_onboarding"
MEMORY_INITIALIZER = "memory_initializer"
MEMORY_INITIALIZER_INTERRUPT = "memory_initializer_interrupt"
ROUTER = "router"
ROOT = "router"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Inconsistent naming - node is called ROOT but string value is still 'router'. This could cause confusion and should be updated to 'root' for consistency.

Copy link
Contributor

Size Change: +771 B (+0.07%)

Total Size: 1.17 MB

ℹ️ View Unchanged
Filename Size Change
frontend/dist/toolbar.js 1.17 MB +771 B (+0.07%)

compressed-size-action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant