Offline Speech Recognition #2089 #2242

VladislavAntonyuk · 2024-10-01T11:14:19Z

Description of Change

Added two new methods for offline speech recognition and removed the ListenAsync method, because with the current implementation it is impossible to stop the recognition correctly. I also added a new state that indicates when listening is active but silent.
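To illustrate the reasoning above, here is a minimal sketch of an explicit start/stop pair with a state that distinguishes active listening from silence. `RecognitionSession` and `ListeningState` are hypothetical names used only for illustration, not the toolkit's API, and the background loop is a stand-in that never captures audio: it simply reports silence after a short delay. The point is that `StartListenAsync` returns once listening has begun, so a separate `StopListenAsync` call can cancel the session deterministically, which a single awaited `ListenAsync` call makes awkward.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical states for illustration only (not the toolkit's enum).
public enum ListeningState { Stopped, Listening, Silence }

// Hypothetical session: start and stop are separate calls, so stopping
// does not depend on a long-running ListenAsync call completing.
public sealed class RecognitionSession
{
    CancellationTokenSource? cts;
    Task? loop;

    public ListeningState State { get; private set; } = ListeningState.Stopped;

    public Task StartListenAsync()
    {
        if (State != ListeningState.Stopped)
        {
            throw new InvalidOperationException("Already listening.");
        }

        cts = new CancellationTokenSource();
        State = ListeningState.Listening;
        loop = RunAsync(cts.Token); // keeps running in the background until stopped
        return Task.CompletedTask;
    }

    public async Task StopListenAsync()
    {
        if (cts is null || loop is null)
        {
            return; // not listening, nothing to stop
        }

        cts.Cancel();
        try
        {
            await loop;
        }
        catch (OperationCanceledException)
        {
            // Expected: cancellation is how the background loop ends.
        }

        cts.Dispose();
        cts = null;
        loop = null;
        State = ListeningState.Stopped;
    }

    // Stand-in for a platform recognizer: no audio is captured, so after a
    // short delay the session reports Silence while it is still listening.
    async Task RunAsync(CancellationToken token)
    {
        await Task.Delay(50, token);
        State = ListeningState.Silence;
        await Task.Delay(Timeout.Infinite, token);
    }
}
```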

Linked Issues

Fixes Offline SpeechToText #2089, [BUG] SpeechToText throws Objective-C exception on iOS #1779, and [BUG] SpeechToText on iOS 17 cuts off or fails to recognize many words #1966

PR Checklist

Has a linked Issue, and the Issue has been approved (bug) or championed (feature/proposal)
Has tests (if omitted, state reason in description)
Has samples (if omitted, state reason in description)
Rebased on top of main at time of PR
Changes adhere to coding standard
Documentation created or updated: https://github.com/MicrosoftDocs/CommunityToolkit/pulls

Additional information

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

bijington

@VladislavAntonyuk thanks for this. I think it looks great! I only had a few comments to get your thoughts on some things.

Finally, this is probably outside the scope of this PR, but I wanted to raise it as a possible follow-up: I would love the ability to chain the two implementations together when registering them with DI, something like:

builder.Services.AddSpeechToText(SpeechToText.Default).WithFallback(OfflineSpeechToText.Default);

And in theory developers could chain the other way round:

builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default);

What do you think of the above? We might have to wrap this in another class rather than complicating the flow of the two current implementations, so it is probably best left for a follow-up PR.

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/OfflineSpeechToText.shared.cs

...mmunityToolkit.Maui.Core/Essentials/SpeechToText/OfflineSpeechToTextImplementation.shared.cs

VladislavAntonyuk · 2024-10-14T14:01:09Z

Thank you @bijington.
If a user registers SpeechToText like this: builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default);
they still need to resolve the implementation somehow.

Also, we need to provide the lifetime of the service. And what is under the hood of AddSpeechToText? We'll still have the same AddSingleton(OfflineSpeechToText.Default) call in that method.

bijington · 2024-10-15T18:14:41Z


Yes, I agree that the developer will need to define the lifetime of the service, which increases the complexity.

Perhaps we could make WithFallback an extension method on the ISpeechToText interface instead, so the developer could write something like:

builder.Services.AddSingleton(OfflineSpeechToText.Default.WithFallback(SpeechToText.Default));

Then WithFallback could look something like:

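public static class ISpeechToTextExtensions
{
    // Wraps the two services so the primary is tried first.
    public static ISpeechToText WithFallback(this ISpeechToText primary, ISpeechToText secondary)
    {
        return new PriorityBasedSpeechToText(primary, secondary);
    }
}

internal class PriorityBasedSpeechToText : ISpeechToText
{
    readonly ISpeechToText primaryService;
    readonly ISpeechToText secondaryService;

    public PriorityBasedSpeechToText(ISpeechToText primary, ISpeechToText secondary)
    {
        primaryService = primary;
        secondaryService = secondary;
    }

    public async Task<SpeechToTextRecognitionResult> StartListenAsync()
    {
        try
        {
            // Attempt the primary service first...
            return await primaryService.StartListenAsync();
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // ...and fall back to the secondary service if it fails.
            return await secondaryService.StartListenAsync();
        }
    }

    // Remaining ISpeechToText members omitted; they would delegate in the same way.
}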

It could well become more complicated with things like permissions, so we may have to request permissions for both the primary and the secondary service.

What do you think?
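A self-contained sketch of the fallback idea above, including the permissions concern, assuming a hypothetical trimmed-down `IRecognizer` interface in place of `ISpeechToText` (the real interface has more members). `FallbackRecognizer` and `FakeRecognizer` are also hypothetical. The wrapper tries the primary recognizer and falls back to the secondary one on any failure except cancellation, and it requests permissions from both because either may end up running.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down stand-in for ISpeechToText.
public interface IRecognizer
{
    Task<bool> RequestPermissions(CancellationToken token);
    Task<string> RecognizeAsync(CancellationToken token);
}

// Tries the primary recognizer first; cancellation is propagated, not swallowed.
public sealed class FallbackRecognizer : IRecognizer
{
    readonly IRecognizer primary;
    readonly IRecognizer secondary;

    public FallbackRecognizer(IRecognizer primary, IRecognizer secondary)
    {
        this.primary = primary ?? throw new ArgumentNullException(nameof(primary));
        this.secondary = secondary ?? throw new ArgumentNullException(nameof(secondary));
    }

    // Either implementation may run, so permissions are requested for both;
    // the result is true only if both were granted.
    public async Task<bool> RequestPermissions(CancellationToken token)
    {
        var primaryGranted = await primary.RequestPermissions(token);
        var secondaryGranted = await secondary.RequestPermissions(token);
        return primaryGranted && secondaryGranted;
    }

    public async Task<string> RecognizeAsync(CancellationToken token)
    {
        try
        {
            return await primary.RecognizeAsync(token);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return await secondary.RecognizeAsync(token);
        }
    }
}

// Test double: a null text simulates an unavailable recognizer.
public sealed class FakeRecognizer : IRecognizer
{
    readonly string? text;
    readonly bool grantPermission;

    public FakeRecognizer(string? text, bool grantPermission = true)
    {
        this.text = text;
        this.grantPermission = grantPermission;
    }

    public Task<bool> RequestPermissions(CancellationToken token) =>
        Task.FromResult(grantPermission);

    public Task<string> RecognizeAsync(CancellationToken token) =>
        text is null
            ? Task.FromException<string>(new InvalidOperationException("Recognizer unavailable."))
            : Task.FromResult(text);
}
```

Requiring both permissions is the conservative choice; a variant could succeed when either is granted and then skip the implementation that was denied.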

VladislavAntonyuk · 2024-10-15T19:17:30Z

I see pros and cons to this approach.
For developers, registering the service is much easier, but the strategy for choosing the right implementation may be complicated (user preferences may forbid online recognition, the internet connection may be unstable, etc.).
Also, because WithFallback still receives an interface as a parameter, developers could write code like this:
services.AddSpeechToText(SpeechToText.Default).WithFallback(SpeechToText.Default);
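Because `WithFallback` takes an interface, the compiler cannot reject a call like the one above; the best an extension method can do is a runtime check. A minimal sketch, with `IRecognizer`, `FallbackRecognizer`, `RecognizerExtensions`, `OnlineRecognizer`, and `OfflineRecognizer` as hypothetical names rather than toolkit API:

```csharp
using System;

// Hypothetical stand-in for ISpeechToText.
public interface IRecognizer
{
    string Name { get; }
}

public sealed class FallbackRecognizer : IRecognizer
{
    public FallbackRecognizer(IRecognizer primary, IRecognizer secondary)
    {
        Primary = primary;
        Secondary = secondary;
    }

    public IRecognizer Primary { get; }
    public IRecognizer Secondary { get; }
    public string Name => $"{Primary.Name} -> {Secondary.Name}";
}

public static class RecognizerExtensions
{
    // Rejects a fallback that is the same instance, or the same concrete
    // type, as the primary (e.g. X.WithFallback(X)) at runtime.
    public static IRecognizer WithFallback(this IRecognizer primary, IRecognizer secondary)
    {
        if (primary is null) throw new ArgumentNullException(nameof(primary));
        if (secondary is null) throw new ArgumentNullException(nameof(secondary));

        if (ReferenceEquals(primary, secondary) || primary.GetType() == secondary.GetType())
        {
            throw new ArgumentException(
                "The fallback must be a different implementation than the primary.",
                nameof(secondary));
        }

        return new FallbackRecognizer(primary, secondary);
    }
}

public sealed class OnlineRecognizer : IRecognizer { public string Name => "online"; }
public sealed class OfflineRecognizer : IRecognizer { public string Name => "offline"; }
```

Comparing concrete types also rejects two distinct instances of the same implementation, which is almost certainly a mistake here, but it remains a heuristic rather than a compile-time guarantee.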

We could technically hide the online/offline recognition inside the implementation and keep a single service. We had such an implementation for Windows in our initial release.

The main idea of OfflineSpeechToText is to allow developers to explicitly specify the required implementation.
Also, I don't want developers to inject two separate interfaces into a service (MyService(ISpeechToText s1, IOfflineSpeechToText s2)), because using both simultaneously may be unpredictable.
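One way to keep the choice explicit while giving each consumer a single dependency is to resolve the implementation in one place, for example from a user preference and the current connectivity. This is a sketch under assumptions: `IRecognizer`, `RecognitionPreference`, `RecognizerSelector`, and `DictationService` are hypothetical, and in a real app the connectivity flag would come from the platform rather than a method parameter.

```csharp
using System;

// Hypothetical stand-ins for the online and offline implementations.
public interface IRecognizer
{
    string Name { get; }
}

public sealed class OnlineRecognizer : IRecognizer { public string Name => "online"; }
public sealed class OfflineRecognizer : IRecognizer { public string Name => "offline"; }

// Hypothetical user preference.
public enum RecognitionPreference { PreferOnline, OfflineOnly }

// Resolves exactly one implementation up front, so a consumer never
// holds both an online and an offline instance at the same time.
public static class RecognizerSelector
{
    public static IRecognizer Select(RecognitionPreference preference, bool isOnline) =>
        preference switch
        {
            RecognitionPreference.OfflineOnly => new OfflineRecognizer(),
            RecognitionPreference.PreferOnline when isOnline => new OnlineRecognizer(),
            RecognitionPreference.PreferOnline => new OfflineRecognizer(),
            _ => throw new ArgumentOutOfRangeException(nameof(preference))
        };
}

// A consumer with a single recognizer dependency.
public sealed class DictationService
{
    readonly IRecognizer recognizer;

    public DictationService(IRecognizer recognizer) => this.recognizer = recognizer;

    public string Describe() => $"Using {recognizer.Name} recognition";
}
```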

We can open a discussion about this next month.


Offline Speech Recognition #2089

2d8bb93

VladislavAntonyuk self-assigned this Oct 1, 2024

Merge branch 'main' into 2089-offline-speech-recognition

67f44a3

VladislavAntonyuk added the breaking change and area/essentials labels Oct 1, 2024

VladislavAntonyuk requested a review from brminnick October 1, 2024 11:16

bijington reviewed Oct 1, 2024


src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs (outdated)

VladislavAntonyuk added the needs discussion label Oct 1, 2024

brminnick added the hacktoberfest-accepted label Oct 3, 2024

Offline Speech Recognition #2089 (#2258)

9b7e48d

VladislavAntonyuk requested a review from bijington October 5, 2024 14:35

VladislavAntonyuk removed the needs discussion label Oct 5, 2024

VladislavAntonyuk and others added 2 commits October 7, 2024 11:42

Fix build

07c4ac8

Merge branch 'main' into 2089-offline-speech-recognition

f3c1497

bijington reviewed Oct 14, 2024


VladislavAntonyuk and others added 2 commits October 14, 2024 16:53

Update according to comments

6c52500

Merge branch 'main' into 2089-offline-speech-recognition

27bf6ae

Fix tizen

e8a28b8

VladislavAntonyuk added 3 commits October 19, 2024 14:17

Merge branch 'main' into 2089-offline-speech-recognition

02d322a

Discard changes to samples/CommunityToolkit.Maui.Sample/CommunityToolkit.Maui.Sample.csproj

14facc0

Discard changes to global.json

4e8b436

VladislavAntonyuk requested a review from bijington October 19, 2024 11:19

Merge branch 'main' into 2089-offline-speech-recognition

285e477
