Review the Media APIs for multimodality #1562

Open
ThomasVitale opened this issue Oct 18, 2024 · 0 comments

The current Media API provides a generic way of adding multimedia content to a prompt when calling models with multimodality support.

So far, the Media API has been used for images. Starting with #1560, it's also used for audio files. It's working correctly, but there are two points that might be improved:

  1. MimeTypeUtils from Spring Framework doesn't define constants for any audio-related MIME types. Developers therefore have to parse one explicitly, for example MimeTypeUtils.parseMimeType("audio/mp3"). Perhaps we can introduce an audio-specific utility in Spring AI?

  2. When media content is extracted from the Spring AI UserMessage and passed to provider-specific APIs (such as OpenAI), there's no direct way to filter it by whether it's image or audio content. For now, audio support exists only in the OpenAI integration, where each media item is checked individually (see: OpenAI - Support audio input modality #1561) and there's the additional challenge of mapping the MIME type to an OpenAI-specific enum. There might be room for streamlining this logic.
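
To make the two points concrete, here is a rough sketch of what such helpers could look like. It is plain Java 17 with no Spring dependencies, and every name in it (MediaTypeSketch, MediaItem, AudioFormat, filterByType, toAudioFormat, the AUDIO_* constants) is hypothetical: none of them exist in Spring Framework or Spring AI today, and the enum is only a stand-in for a provider-specific one. MIME types are kept as strings to stay self-contained; a real version would presumably work with Spring's MimeType instead.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;

public class MediaTypeSketch {

    // Hypothetical audio constants (point 1); MimeTypeUtils has no equivalent.
    static final String AUDIO_MP3 = "audio/mp3";
    static final String AUDIO_WAV = "audio/wav";

    // Hypothetical stand-in for a provider-specific audio format enum.
    enum AudioFormat { MP3, WAV }

    // Minimal stand-in for a media item: a MIME type plus a label for the payload.
    record MediaItem(String mimeType, String name) {}

    // Drops parameters such as "; codecs=1", trims, and lower-cases the type.
    static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String essence = semicolon < 0 ? mimeType : mimeType.substring(0, semicolon);
        return essence.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the part before the slash, e.g. "audio" for "audio/mp3".
    static String topLevelType(String mimeType) {
        String essence = normalize(mimeType);
        int slash = essence.indexOf('/');
        return slash < 0 ? essence : essence.substring(0, slash);
    }

    // Point 2: keep only the media items of one top-level type ("image", "audio", ...).
    static List<MediaItem> filterByType(List<MediaItem> media, String type) {
        return media.stream()
                .filter(item -> topLevelType(item.mimeType()).equals(type))
                .collect(Collectors.toList());
    }

    // Point 2: map a MIME type to the provider enum; empty if the format is unsupported.
    static Optional<AudioFormat> toAudioFormat(String mimeType) {
        switch (normalize(mimeType)) {
            case "audio/mp3":
            case "audio/mpeg":
                return Optional.of(AudioFormat.MP3);
            case "audio/wav":
            case "audio/x-wav":
                return Optional.of(AudioFormat.WAV);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        List<MediaItem> media = List.of(
                new MediaItem("image/png", "diagram"),
                new MediaItem(AUDIO_MP3, "question"),
                new MediaItem("Audio/WAV; codecs=1", "answer"));

        // Only the two audio items are printed, each with its mapped format.
        for (MediaItem item : filterByType(media, "audio")) {
            System.out.println(item.name() + " -> " + toAudioFormat(item.mimeType()));
        }
    }
}
```

Returning an Optional from the mapping leaves it to each provider integration to decide whether an unsupported audio format should be skipped or rejected with an error.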
