
Add Multi Modal Bedrock Integration #17451

Open · wants to merge 4 commits into base: main
Conversation

@dnandha dnandha commented Jan 7, 2025

Description

This PR adds support for AWS Bedrock multi-modal models in LlamaIndex. The integration allows users to interact with Bedrock's multi-modal models (currently Claude 3 family) for image and text analysis tasks.

Key features:

  • Support for all Claude 3 multi-modal models in Bedrock
  • Proper handling of image inputs through both file paths and base64 encoding
  • Token count tracking from Bedrock API responses
  • Comprehensive AWS authentication methods
  • Retry logic for API resilience
  • Both synchronous and asynchronous APIs
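To make the "base64 encoding" and "message formatting" features concrete, here is a minimal stdlib-only sketch of the request body that the Claude 3 Messages API on Bedrock expects for one image plus a text prompt. The helper name `build_claude3_payload` and the defaults are illustrative, not the PR's actual API:

```python
import base64
import json


def build_claude3_payload(image_bytes: bytes, prompt: str,
                          media_type: str = "image/png",
                          max_tokens: int = 512) -> str:
    """Serialize a Bedrock Claude 3 Messages API request body with one image."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    body = {
        # Required version marker for the Claude Messages API on Bedrock.
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image content block: base64-encoded bytes plus media type.
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": encoded,
                        },
                    },
                    # Text content block with the user's prompt.
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }
    return json.dumps(body)
```

The returned string is what would be passed as the `body` argument of a `bedrock-runtime` `invoke_model` call.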

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating?

  • Yes
  • No

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Tests include:

  • Class inheritance and initialization
  • Model validation
  • Synchronous and asynchronous completion
  • Image input handling
  • Response parsing
  • Token count tracking

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

Implementation Details

The implementation includes:

  1. BedrockMultiModal class that inherits from MultiModalLLM
  2. Support for both file-based and base64-encoded image inputs
  3. Proper message formatting for Bedrock API
  4. AWS credential resolution with multiple authentication methods
  5. Retry logic for API resilience
  6. Token count tracking from API response headers
  7. Comprehensive documentation and examples
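Item 6 above (token counts from response headers) can be sketched as follows. boto3 exposes the raw HTTP headers under `ResponseMetadata` → `HTTPHeaders`, and Bedrock reports token usage in the `x-amzn-bedrock-input-token-count` / `x-amzn-bedrock-output-token-count` headers; the helper name is hypothetical:

```python
def extract_token_counts(response_metadata: dict) -> dict:
    """Pull prompt/completion token counts from a Bedrock invoke_model
    response's HTTP headers, defaulting to 0 when a header is absent."""
    headers = response_metadata.get("HTTPHeaders", {})
    return {
        "prompt_tokens": int(headers.get("x-amzn-bedrock-input-token-count", 0)),
        "completion_tokens": int(headers.get("x-amzn-bedrock-output-token-count", 0)),
    }
```

Defaulting missing headers to 0 keeps the helper safe for models or code paths that do not report usage.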

Testing

The package includes unit tests that cover:

  • Model initialization and validation
  • Image input handling
  • API request formatting
  • Response parsing
  • Error handling
  • Token count tracking

All tests are passing and the code follows LlamaIndex's coding standards.

Solves #13507.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 7, 2025
@logan-markewich
Collaborator

I appreciate the PR @dnandha -- but actually, I'm inclined not to merge this. We are in the middle of deprecating our multi-modal LLM classes and building multi-modal capabilities directly into the existing LLM class.

For example, anthropic:

OpenAI:

Usage:
[screenshot]

@logan-markewich
Collaborator

Issue we are tracking the migration with: #15949

@dnandha
Author

dnandha commented Jan 8, 2025

Hey @logan-markewich. Thanks for looking into the PR so quickly and sharing your concern. I wasn't aware of the issue you mentioned, but I see your point. Even so, my suggestion would still be to merge the PR:

  • This integration can serve as a basis for the migration to LLMs with ImageBlock
  • It gives users something to work with; currently it's not possible to use Bedrock multi-modal models with images
  • The tutorials all build on multi-modal LLMs (e.g. https://docs.llamaindex.ai/en/stable/examples/multi_modal/anthropic_multi_modal/), so it would be easy to add a new page for Bedrock. Later, after the deprecation of multi-modal LLMs, the tutorials could redirect readers to the LLM integration

Since the work has already been done, my suggestion would be the following:

  1. Review / merge the PR
  2. Add another checkbox for Bedrock in Consolidate MultiModal LLMs with Base LLMs #15949
  3. Work on the migration into LLMs (I can assist)
  4. Add a "TO BE DEPRECATED" note with a link to Consolidate MultiModal LLMs with Base LLMs #15949 to all existing multi-modal integration folders, so that other developers are aware

Let me know your thoughts on this.

@logan-markewich
Collaborator

@dnandha the unit tests are failing -- can you take a look? If they rely on actual API calls, you might have to mock them.
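One way to mock the API calls, as suggested above, is to fake the `bedrock-runtime` client with `unittest.mock` so no real AWS call is made. The helper `complete_with_client` below is a hypothetical stand-in for the PR's completion path, and the model ID is illustrative:

```python
import json
from unittest.mock import MagicMock


def complete_with_client(client, body: str) -> str:
    """Hypothetical completion path: invoke the model and pull the text
    out of a Claude 3 Messages API response body."""
    response = client.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=body,
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]


def test_complete_parses_response():
    # Fake the streaming body returned by invoke_model.
    fake_body = MagicMock()
    fake_body.read.return_value = json.dumps(
        {"content": [{"type": "text", "text": "a red square"}]}
    ).encode("utf-8")

    # Fake the boto3 bedrock-runtime client itself.
    client = MagicMock()
    client.invoke_model.return_value = {"body": fake_body}

    assert complete_with_client(client, body="{}") == "a red square"
    client.invoke_model.assert_called_once()
```

Because the mock replaces the client object rather than patching the network layer, the test runs without AWS credentials or network access.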
