
[Question]: How to pass images into llama index query engine. #17309

Open
AyushParikh opened this issue Dec 18, 2024 · 2 comments
Labels
question Further information is requested

Comments

@AyushParikh

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hey, I want to use multi-modal input with a LlamaIndex query. How can I pass images into my query to get a response? I checked the documentation but couldn't find a clear answer.

    query_engine = index.as_query_engine()
    response = query_engine.query(query)
@AyushParikh added the question (Further information is requested) label Dec 18, 2024

dosubot bot commented Dec 18, 2024

Hey @AyushParikh, I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. Let's tackle this challenge together!

To pass images into the Llama Index query engine for multi-modal functionality and get a response, you can use the image_query method. Here's a basic example of how you can achieve this:

import os

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core import PromptTemplate

# Read the API key from the environment
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Define the prompt template
qa_tmpl_str = (
    "Given the images provided, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)

# Initialize the OpenAI multi-modal model
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4o", api_key=OPENAI_API_KEY, max_new_tokens=1500
)

# Create the query engine; `index` is assumed to be a multi-modal index
# (e.g. a MultiModalVectorStoreIndex) built beforehand
query_engine = index.as_query_engine(
    llm=openai_mm_llm, image_qa_template=qa_tmpl
)

# Define the query string
query_str = "Tell me more about the relationship between those paintings."

# Perform the image query
response = query_engine.image_query("./mixed_wiki/2.jpg", query_str)

# Print the response
print(response)

This setup allows you to provide an image path and a prompt string to query the engine. Make sure to replace "./mixed_wiki/2.jpg" with the path to your image and adjust the query_str as needed for your specific query [1].


@logan-markewich
Collaborator

Images cannot be inputs to query engines, at least not directly

There is a multi-modal retriever that can take in a query and return both text and images. From there, you can call an LLM to synthesize a response

Feel free to check out the many multimodal pages in the docs

https://docs.llamaindex.ai/en/stable/use_cases/multimodal/
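The retrieve-then-synthesize pattern described above can be sketched as follows. This is a minimal illustration, not the real LlamaIndex API: `FakeNode`, `FakeRetriever`, and `split_nodes` are hypothetical stand-ins for the retriever you would get from `index.as_retriever()` on a multi-modal index; the actual calls are noted in comments and should be checked against the docs linked above.

```python
class FakeNode:
    """Stand-in for a retrieved node; real nodes are TextNode/ImageNode."""
    def __init__(self, text=None, image_path=None):
        self.text = text
        self.image_path = image_path

class FakeRetriever:
    """Stand-in for index.as_retriever(...) on a multi-modal index."""
    def retrieve(self, query):
        # A real retriever would embed the query and return the
        # top-k matching text and image nodes from the index.
        return [
            FakeNode(text="caption of painting A"),
            FakeNode(image_path="./sample_images/painting_a.jpg"),
        ]

def split_nodes(nodes):
    """Separate retrieved text from retrieved images before synthesis."""
    texts = [n.text for n in nodes if n.text is not None]
    images = [n.image_path for n in nodes if n.image_path is not None]
    return texts, images

retriever = FakeRetriever()
nodes = retriever.retrieve("Tell me about the paintings")
texts, images = split_nodes(nodes)
# With the real API you would now pass the text context and images to a
# multi-modal LLM to synthesize the answer, e.g. something along the lines of
# openai_mm_llm.complete(prompt=..., image_documents=...)
print(texts)   # ['caption of painting A']
print(images)  # ['./sample_images/painting_a.jpg']
```

The key design point is that retrieval and synthesis are separate steps: the retriever only finds relevant text and image nodes, and you decide how to hand them to the LLM.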
