-
Notifications
You must be signed in to change notification settings - Fork 16.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
partners/ollama: Enabled Token Level Streaming when Using Bind Tools for ChatOllama #27689
partners/ollama: Enabled Token Level Streaming when Using Bind Tools for ChatOllama #27689
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does tool calling still work following this change?
I'm finding issues with tool calling integration tests when running locally (e.g., this test).
Unfortunately we don't run these in CI for Ollama. You can run locally via
cd libs/partners/ollama
poetry run python -m pytest tests/integration_tests/test_chat_models.py
Hey, the tool calling does pass now, we will fix the out of date branches soon! |
…to bug/chatollam-token-level-streaming-bindtools-unavailable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience with the review. Only blocking question is on async streaming case.
@@ -476,15 +476,16 @@ async def _acreate_chat_stream( | |||
|
|||
params["options"]["stop"] = stop | |||
if "tools" in kwargs: | |||
yield await self._async_client.chat( | |||
async for part in await self._async_client.chat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to also fix the async streaming case? We unfortunately don't cover async streaming with tools in our standard tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I believe we did. It was because when testing it manually the async streaming case was not returning chunks, it was just returning one big chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a standard test for async tool calling this morning in #28133. You can run it against Ollama via
python -m pytest tests/integration_tests/test_chat_models.py::TestChatOllama::test_tool_calling_async
It appeared broken here due to changing stream
to True in the async case for tool calling. I pushed an update to fix the test. Confirmed no new failures across standard tests.
format=params["format"], | ||
tools=kwargs["tools"], | ||
) | ||
if len(kwargs["tools"]) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Wondering if there are any options for simplifying this:
- change
if "tools" in kwargs
toif kwargs.get("tools")
, which is Falsey for empty list; or - There's a method _should_stream which is called on both
.stream
and.astream
that controls whether we delegate to.invoke
. Could potentially override this on ChatOllama. See example here:langchain/libs/partners/openai/langchain_openai/chat_models/base.py
Lines 982 to 1002 in 00e7b2d
def _should_stream( self, *, async_api: bool, run_manager: Optional[ Union[CallbackManagerForLLMRun, AsyncCallbackManagerForLLMRun] ] = None, response_format: Optional[Union[dict, type]] = None, **kwargs: Any, ) -> bool: if isinstance(response_format, type) and is_basemodel_subclass(response_format): # TODO: Add support for streaming with Pydantic response_format. warnings.warn("Streaming with Pydantic response_format not yet supported.") return False if self.model_name is not None and self.model_name.startswith("o1"): # TODO: Add support for streaming with o1 once supported. return False return super()._should_stream( async_api=async_api, run_manager=run_manager, **kwargs )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the async case. Let me know if you see any issues.
Thanks!
Description: The issue concerns the unexpected behavior observed using the bind_tools method in LangChain's ChatOllama. When tools are not bound, the llm.stream() method works as expected, returning incremental chunks of content, which is crucial for real-time applications such as conversational agents and live feedback systems. However, when bind_tools([]) is used, the streaming behavior changes, causing the output to be delivered in full chunks rather than incrementally. This change negatively impacts the user experience by breaking the real-time nature of the streaming mechanism.
Issue: #26971