The current Media API provides a generic way of adding multimedia content to a prompt when calling models with multimodality support.
So far, the Media API has been used for images. Starting with #1560, it's also used for audio files. It's working correctly, but there are two points that might be improved:
The MimeTypeUtils class from Spring Framework doesn't define any audio-related MIME type constants. Therefore, developers need to parse one explicitly, such as MimeTypeUtils.parseMimeType("audio/mp3"). Perhaps we can introduce an audio-specific utility in Spring AI?
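As a rough illustration of what such a utility could look like, here is a minimal, dependency-free sketch. The class name `AudioMimeTypes`, its constants, and the `isAudio` helper are all hypothetical and are not part of Spring Framework or Spring AI. The `audio/wav` constant is an assumption added for illustration; only `audio/mp3` appears in this issue. A real implementation would presumably expose `org.springframework.util.MimeType` constants rather than plain strings.

```java
import java.util.Locale;

// Hypothetical utility, sketched with plain strings so it runs without Spring.
// In Spring AI the constants would more likely be MimeType instances.
final class AudioMimeTypes {

    // Value taken from the example in this issue.
    static final String AUDIO_MP3 = "audio/mp3";

    // Assumed additional format, included only for illustration.
    static final String AUDIO_WAV = "audio/wav";

    private AudioMimeTypes() {
    }

    // Returns true when the top-level type of the given MIME type is "audio".
    static boolean isAudio(String mimeType) {
        return mimeType != null
                && mimeType.trim().toLowerCase(Locale.ROOT).startsWith("audio/");
    }
}
```

With something like this, callers could reference a named constant instead of repeating the string literal at every call site.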
When the media content is extracted from the Spring AI UserMessage into the provider-specific APIs (such as OpenAI), there's no immediate way to filter the media content based on whether it's image or audio content. For now, audio support only exists in the OpenAI integration, where each media item is checked individually (see: OpenAI - Support audio input modality #1561) and where there's the additional challenge of mapping the MIME type to an OpenAI-specific enum. There might be room for streamlining this logic.
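One possible way to streamline this is to filter on the top-level MIME type and keep the provider-specific mapping in a single place. The sketch below is only an illustration under stated assumptions: `MediaItem` is a stand-in for Spring AI's `Media` class, `AudioFormat` is a stand-in for the OpenAI-specific enum, and the helper names `topLevelType`, `ofType`, and `toAudioFormat` are hypothetical. It does not reflect how #1561 actually implements the check.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class MediaFilterSketch {

    // Stand-in for Spring AI's Media type: a MIME type plus opaque content.
    record MediaItem(String mimeType, Object data) {
    }

    // Stand-in for a provider-specific enum of supported audio formats.
    enum AudioFormat {
        MP3, WAV
    }

    private MediaFilterSketch() {
    }

    // Extracts the part before the slash, e.g. "audio" from "audio/mp3".
    static String topLevelType(String mimeType) {
        int slash = mimeType.indexOf('/');
        String type = slash < 0 ? mimeType : mimeType.substring(0, slash);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Keeps only the media items whose top-level type matches, e.g. "image" or "audio".
    static List<MediaItem> ofType(List<MediaItem> media, String type) {
        return media.stream()
                .filter(m -> topLevelType(m.mimeType()).equals(type))
                .toList();
    }

    // Maps a MIME type to the provider enum; empty when the format is unsupported.
    static Optional<AudioFormat> toAudioFormat(String mimeType) {
        return switch (mimeType.trim().toLowerCase(Locale.ROOT)) {
            case "audio/mp3" -> Optional.of(AudioFormat.MP3);
            case "audio/wav" -> Optional.of(AudioFormat.WAV);
            default -> Optional.empty();
        };
    }
}
```

Returning an `Optional` from the mapping leaves it to each provider integration to decide whether an unsupported format should be skipped or rejected with an error.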