Review the Media APIs for multimodality #1562

ThomasVitale · 2024-10-18T22:05:50Z

The current Media API provides a generic way of adding multimedia content to a prompt when calling models with multimodality support.

So far, the Media API has been used for images. Starting with #1560, it's also used for audio files. It's working correctly, but there are two points that might be improved:

1. MimeTypeUtils from Spring Framework doesn't include any audio-related MIME types, so developers have to parse one explicitly, for example MimeTypeUtils.parseMimeType("audio/mp3"). Perhaps we could introduce an audio-specific utility in Spring AI?
2. When media content is extracted from a Spring AI UserMessage and converted into a provider-specific API (such as OpenAI's), there's no direct way to filter it by kind, that is, image versus audio. For now, audio support exists only in the OpenAI integration, where each media item is checked individually (see OpenAI - Support audio input modality #1561). That integration also has to map the MIME type to an OpenAI-specific enum. There might be room to streamline this logic.
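
The two points above can be illustrated together with a minimal, stdlib-only Java sketch. It is hypothetical and is not Spring AI or Spring Framework code: Mime stands in for Spring's MimeType, MediaItem stands in for Spring AI's Media, AudioMimeTypes is the kind of audio-specific utility the first point suggests, and ProviderAudioFormat is an invented enum standing in for a provider-specific type such as the one the OpenAI integration maps to (assumed here to cover MP3 and WAV). Every name in it is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not Spring AI or Spring Framework code.
final class MediaTypeSketch {

    // Stand-in for a MIME type like Spring's MimeType: a "type/subtype" pair.
    record Mime(String type, String subtype) {
        // Drops parameters (e.g. "; codecs=1") and normalizes case.
        static Mime parse(String value) {
            String base = value.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String[] parts = base.split("/", 2);
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Invalid MIME type: " + value);
            }
            return new Mime(parts[0], parts[1]);
        }
    }

    // Hypothetical audio-specific constants (first point of the issue).
    // "audio/mpeg" is the registered type for MP3; "audio/mp3" is a common alias.
    static final class AudioMimeTypes {
        static final Mime AUDIO_MP3 = Mime.parse("audio/mp3");
        static final Mime AUDIO_MPEG = Mime.parse("audio/mpeg");
        static final Mime AUDIO_WAV = Mime.parse("audio/wav");

        private AudioMimeTypes() {
        }
    }

    // Stand-in for Spring AI's Media; the name is a placeholder for the content.
    record MediaItem(Mime mimeType, String name) {
    }

    // Generic filter by top-level type, e.g. "image" or "audio" (second point).
    static List<MediaItem> ofType(List<MediaItem> media, String topLevelType) {
        return media.stream()
                .filter(m -> m.mimeType().type().equals(topLevelType))
                .toList();
    }

    // Hypothetical provider enum; assumed to cover the provider's MP3 and WAV input formats.
    enum ProviderAudioFormat { MP3, WAV }

    // The only provider-specific step: map an audio MIME type to the provider enum.
    static Optional<ProviderAudioFormat> toProviderFormat(Mime mime) {
        if (!mime.type().equals("audio")) {
            return Optional.empty();
        }
        return switch (mime.subtype()) {
            case "mp3", "mpeg" -> Optional.of(ProviderAudioFormat.MP3);
            case "wav", "x-wav", "wave" -> Optional.of(ProviderAudioFormat.WAV);
            default -> Optional.empty();
        };
    }

    public static void main(String[] args) {
        List<MediaItem> media = List.of(
                new MediaItem(Mime.parse("image/png"), "photo.png"),
                new MediaItem(AudioMimeTypes.AUDIO_MP3, "question.mp3"));
        for (MediaItem audio : ofType(media, "audio")) {
            // Prints "question.mp3 -> MP3"
            System.out.println(audio.name() + " -> " + toProviderFormat(audio.mimeType()).orElseThrow());
        }
    }
}
```

Splitting image and audio content on the generic MIME type keeps the filtering reusable across integrations; only the final mapping step is provider-specific, so each integration would supply its own equivalent of toProviderFormat.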


ThomasVitale mentioned this issue on Oct 19, 2024, in spring-projects/spring-framework#33754, "Reflect well-known MediaTypes intent in Javadoc" (open).
