
Streaming with OpenAIAgent is slower than using the same logic in Python #1555

Open
tintotechbee opened this issue Dec 8, 2024 · 0 comments
Labels
bug Something isn't working

Comments


tintotechbee commented Dec 8, 2024

Introduction

Hello everyone, I am here again to report an issue with the speed of receiving the streamed response when using OpenAIAgent.chat() with a function call and the stream option. It is noticeably slower than implementing the same logic in Python.

I wrote code to compare the performance of the Node.js and Python implementations of the same logic:

Implement with Python

Code

from llama_index.core import Settings
from llama_index.core.chat_engine.types import StreamingAgentChatResponse
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool
import time
from dotenv import load_dotenv
load_dotenv()

Settings.llm = OpenAI(model="gpt-4o-mini")

def get_content():
    return """
        Seasonal influenza (influenza) is an acute respiratory infection caused by influenza viruses. 
        The disease is common worldwide and most people recover without treatment. 
        The flu virus spreads easily from person to person when they cough or sneeze. 
        Vaccines are the best way to prevent disease. 
        Symptoms of flu include sudden fever, cough, sore throat, muscle aches and fatigue. 
        Symptoms and signs: Flu symptoms usually begin about 2 days after exposure to an infected person.
    """

get_content_tool = FunctionTool.from_defaults(
    fn=get_content,
    name="get_content",
    description="Use this tool to get information about Seasonal influenza"
)

agent = OpenAIAgent.from_tools(
    [get_content_tool],
    verbose=True
)

start_time = time.time()
print(f"Start to get answer at {start_time}")

response: StreamingAgentChatResponse = agent.stream_chat("Tell me about Seasonal influenza")
result = ''

for chunk in response.response_gen:
    print(f"Chunk: {chunk}, After: {time.time()-start_time} seconds")
    result += chunk

print(result)

Log

# Result (truncated in the middle of the chunk output)

Start to get answer at 1733665853.995188
Added user message to memory: Tell me about Seasonal influenza
=== Calling Function ===
Calling function: get_content with args: {}
Got output:
        Seasonal influenza (influenza) is an acute respiratory infection caused by influenza viruses.    
        The disease is common worldwide and most people recover without treatment.
        The flu virus spreads easily from person to person when they cough or sneeze.
        Vaccines are the best way to prevent disease.
        Symptoms of flu include sudden fever, cough, sore throat, muscle aches and fatigue.
        Symptoms and signs: Flu symptoms usually begin about 2 days after exposure to an infected person.

========================

Chunk: Season, After: 1.679663896560669 seconds
Chunk: al, After: 1.6801247596740723 seconds        
Chunk:  influenza, After: 1.6924858093261719 seconds
Chunk: ,, After: 1.6924858093261719 seconds
Chunk:  commonly, After: 1.7254602909088135 seconds
Chunk:  known, After: 1.7262098789215088 seconds
Chunk:  as, After: 1.7298998832702637 seconds   
Chunk:  the, After: 1.7298998832702637 seconds  
Chunk:  flu, After: 1.7382407188415527 seconds  
Chunk: ,, After: 1.7382407188415527 seconds     
Chunk:  is, After: 1.7382407188415527 seconds   
Chunk:  an, After: 1.7382407188415527 seconds   
Chunk:  acute, After: 1.765491008758545 seconds
Chunk:  respiratory, After: 1.765491008758545 seconds
Chunk:  infection, After: 1.7720947265625 seconds
Chunk:  caused, After: 1.7734062671661377 seconds
Chunk:  by, After: 1.7734062671661377 seconds
Chunk:  influenza, After: 1.7734062671661377 seconds
Chunk:  viruses, After: 1.7985939979553223 seconds
Chunk: ., After: 1.7985939979553223 seconds
Chunk:  It, After: 1.8924853801727295 seconds
Chunk:  is, After: 1.8926935195922852 seconds
...
Chunk:  high, After: 2.730426549911499 seconds
Chunk: -risk, After: 2.7313318252563477 seconds
Chunk:  groups, After: 2.736332416534424 seconds
Chunk: ,, After: 2.737325429916382 seconds
Chunk:  many, After: 2.737325429916382 seconds
Chunk:  people, After: 2.737325429916382 seconds
Chunk:  recover, After: 2.7418887615203857 seconds
Chunk:  with, After: 2.742650032043457 seconds
Chunk:  proper, After: 2.745450019836426 seconds
Chunk:  care, After: 2.745450019836426 seconds
Chunk: ., After: 2.7464444637298584 seconds

Seasonal influenza, commonly known as the flu, is an acute respiratory infection caused by influenza viruses. It is prevalent worldwide, and most individuals recover without the need for treatment. The flu virus spreads easily from person to person, particularly when an infected person coughs or sneezes.

### Key Points:
- **Prevention**: Vaccines are the most effective way to prevent seasonal influenza.
- **Symptoms**: Common symptoms include:
  - Sudden fever
  - Cough
  - Sore throat
  - Muscle aches
  - Fatigue
- **Onset**: Symptoms typically begin about 2 days after exposure to an infected person.

Overall, while seasonal influenza can be serious, especially for certain high-risk groups, many people recover with proper care.

Conclusion: Reviewing the log, the first chunk arrived after 1.67 seconds and the stream finished after 2.746 seconds. Note that the gaps between chunks are noticeable; keep this point in mind for the comparison with Node.js.

Implement with NodeJs

Code

// Only the imports actually used below
import { FunctionTool, OpenAIAgent } from "llamaindex";

const miliToSec = (mili: number) => mili / 1000

const getContent = () => {
        return `
           Seasonal influenza (influenza) is an acute respiratory infection caused by influenza viruses. 
          The disease is common worldwide and most people recover without treatment. 
          The flu virus spreads easily from person to person when they cough or sneeze. 
          Vaccines are the best way to prevent disease. 
          Symptoms of flu include sudden fever, cough, sore throat, muscle aches and fatigue. 
          Symptoms and signs: Flu symptoms usually begin about 2 days after exposure to an infected person.
        `
}

const getContentTool = FunctionTool.from(
        getContent,
        {
          name:'get_content',
          description:'Use this tool to get information about the Seasonal influenza',
          parameters:{}
        }
)

const agent = new OpenAIAgent({
        tools:[getContentTool],
        verbose:true
})

const startTime = new Date().getTime()
console.log(`Start to get answer at ${miliToSec(startTime)}`)

const response = await agent.chat({
        message:'Tell me about Seasonal influenza',
        stream:true,
})

let result = ''

for await (const chunk of response) {
        console.log(`Chunk: ${chunk.message.content}, After: ${miliToSec(new Date().getTime()-startTime)} seconds`)
        result += chunk.message.content // appending `chunk` itself would concatenate "[object Object]"
}

console.log(result)

Log

# Result (truncated in the middle of the chunk output)

Start to get answer at 1733666845.355

=== Calling Function ===
Calling function: get_content with args: {}

Got output:
Seasonal influenza (influenza) is an acute respiratory infection caused by influenza viruses.
The disease is common worldwide and most people recover without treatment.
The flu virus spreads easily from person to person when they cough or sneeze.
Vaccines are the best way to prevent disease.
Symptoms of flu include sudden fever, cough, sore throat, muscle aches and fatigue.
Symptoms and signs: Flu symptoms usually begin about 2 days after exposure to an infected person.

Chunk: Season, After: 2.68 seconds
Chunk: al, After: 2.681 seconds
Chunk:  influenza, After: 2.681 seconds
Chunk: ,, After: 2.681 seconds
Chunk:  commonly, After: 2.682 seconds
Chunk:  known, After: 2.682 seconds
Chunk:  as, After: 2.682 seconds
Chunk:  the, After: 2.683 seconds
Chunk:  flu, After: 2.683 seconds
Chunk: ,, After: 2.684 seconds
Chunk:  is, After: 2.684 seconds
Chunk:  an, After: 2.684 seconds
Chunk:  acute, After: 2.684 seconds
Chunk:  respiratory, After: 2.685 seconds
Chunk:  infection, After: 2.685 seconds
Chunk:  caused, After: 2.685 seconds
Chunk:  by, After: 2.686 seconds
Chunk:  influenza, After: 2.686 seconds
Chunk:  viruses, After: 2.686 seconds
Chunk: ., After: 2.686 seconds
...

Chunk:  after, After: 2.724 seconds
Chunk:  exposure, After: 2.724 seconds
Chunk:  to, After: 2.724 seconds
Chunk:  an, After: 2.725 seconds
Chunk:  infected, After: 2.725 seconds
Chunk:  person, After: 2.725 seconds
Chunk: ., After: 2.725 seconds
Chunk: , After: 2.726 seconds

Seasonal influenza, commonly known as the flu, is an acute respiratory infection caused by influenza viruses. It is prevalent worldwide, and most individuals recover without the need for treatment. The flu virus spreads easily from person to person, primarily through respiratory droplets when an infected person coughs or sneezes.

The best way to prevent seasonal influenza is through vaccination. Common symptoms of the flu include:

- Sudden fever
- Cough
- Sore throat
- Muscle aches
- Fatigue

Typically, flu symptoms begin about two days after exposure to an infected person.

Conclusion: The first chunk arrived after 2.68 seconds and the stream finished after 2.726 seconds.

Problems

  • Compared with Python, the first chunk appeared after 2.68 seconds versus 1.67 seconds, so Node.js is slower to produce the first chunk.
  • Also, the gaps between chunks are tiny; effectively, all chunks arrive at the same time. This is not the case in the Python result.
  • Therefore, I believe the Node.js agent did not emit the first chunk as soon as it was produced; it may have buffered all chunks and emitted them together at the end.
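One way to check the buffering hypothesis is to wrap the async iterable and record per-chunk arrival times, then look at the gaps. A minimal sketch (the `timeChunks` helper and `fakeStream` are illustrative, not llamaindex APIs; for a real test, pass the agent's stream instead of `fakeStream()`):

```typescript
// Hypothetical helper: wraps any async iterable and records each chunk's
// arrival time relative to a start time. Buffered delivery shows up as
// all gaps being near zero; true streaming shows non-trivial gaps.
async function timeChunks<T>(
  stream: AsyncIterable<T>,
): Promise<{ chunks: T[]; arrivalsMs: number[] }> {
  const start = Date.now();
  const chunks: T[] = [];
  const arrivalsMs: number[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
    arrivalsMs.push(Date.now() - start);
  }
  return { chunks, arrivalsMs };
}

// Fake stream for demonstration: emits one chunk every 50 ms,
// so arrival times should be spaced roughly 50 ms apart.
async function* fakeStream(): AsyncGenerator<string> {
  for (const word of ["Seasonal", " influenza"]) {
    await new Promise((r) => setTimeout(r, 50));
    yield word;
  }
}
```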

What I expect

  • I expect the first chunk of the response to arrive as soon as possible, as it does in Python.
  • I am not sure whether my Node.js code for reading chunks from the stream is correct. If I have implemented it incorrectly, that could be why the streamed response arrives so slowly.
  • This may not be related to the issue, but if I have a streamed response and want to send each chunk to the client, how can I do that in Node.js, especially with ExpressJS or NestJS? I went through all the examples at https://stackblitz.com/github/run-llama/LlamaIndexTS/tree/main/examples, but none of them cover this. They only show the for-loop and console.log way of checking the result.
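For the last question, one common pattern is Server-Sent Events: set the `text/event-stream` headers and `res.write()` each chunk as it arrives, so the server flushes incrementally instead of waiting for the full answer. A minimal sketch; `streamToSse` is a hypothetical helper, and the chunk shape (`chunk.message.content`) follows the log above:

```typescript
// Hypothetical helper: forwards each streamed chunk to an HTTP response
// as one Server-Sent Events message ("data: ...\n\n" per chunk).
// `res` is anything write()/end()-compatible, e.g. an Express response.
async function streamToSse(
  stream: AsyncIterable<{ message: { content: string } }>,
  res: { write(s: string): void; end(): void },
): Promise<void> {
  for await (const chunk of stream) {
    // JSON.stringify keeps newlines inside a chunk from breaking SSE framing.
    res.write(`data: ${JSON.stringify(chunk.message.content)}\n\n`);
  }
  res.end();
}

// With Express, usage would look roughly like this (untested sketch):
//
// app.get("/chat", async (req, res) => {
//   res.setHeader("Content-Type", "text/event-stream");
//   res.setHeader("Cache-Control", "no-cache");
//   const response = await agent.chat({ message: req.query.q as string, stream: true });
//   await streamToSse(response, res);
// });
```

On the client, an `EventSource` (or a `fetch` reader) can consume these events one by one; NestJS can do the same through its `@Sse()` decorator or by writing to the underlying response directly.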

Environment

  • OS: Windows 10
  • Node.js: 21.4.0
  • llamaindex: 0.8.22
  • OpenAI model: gpt-4o-mini

Thank you for taking the time to read my long post. I am very grateful.

@tintotechbee tintotechbee added the bug Something isn't working label Dec 8, 2024