Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offline Speech Recognition #2089 #2242

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

VladislavAntonyuk
Copy link
Collaborator

@VladislavAntonyuk VladislavAntonyuk commented Oct 1, 2024

Description of Change

Added 2 new methods for Offline Speech Recognition, Removed ListenAsync method, as it is impossible (with current implementation) to correctly Stop Listening the recognition. I also added a new State, allowing us to see if Listening is Active, but Silience.

Linked Issues

PR Checklist

  • Has a linked Issue, and the Issue has been approved(bug) or Championed (feature/proposal)
  • Has tests (if omitted, state reason in description)
  • Has samples (if omitted, state reason in description)
  • Rebased on top of main at time of PR
  • Changes adhere to coding standard
  • Documentation created or updated: https://github.com/MicrosoftDocs/CommunityToolkit/pulls

Additional information

@VladislavAntonyuk VladislavAntonyuk self-assigned this Oct 1, 2024
@VladislavAntonyuk VladislavAntonyuk added breaking change This label is used for PRs that include a breaking change area/essentials Issue/Discussion/PR that has to do with Essentials labels Oct 1, 2024
@VladislavAntonyuk VladislavAntonyuk added the needs discussion Discuss it on the next Monthly standup label Oct 1, 2024
@brminnick brminnick added the hacktoberfest-accepted A PR that has been approved during Hacktoberfest label Oct 3, 2024
@VladislavAntonyuk VladislavAntonyuk removed the needs discussion Discuss it on the next Monthly standup label Oct 5, 2024
Copy link
Contributor

@bijington bijington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VladislavAntonyuk thanks for this. I think it looks great! I only had a few comments to get your thoughts on some things.

Finally, this might be outside of the scope of this PR but I wanted to raise it because I think it should be in a follow-up PR; I would love the ability to chain the 2 implementations together when registering with DI, something like:

builder.Services.AddSpeechToText(SpeechToText.Default).WithFallback(OfflineSpeechToText.Default);

And in theory developers could chain the other way round:

builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default);

What do you think to the above? We might have to wrap this in another class rather than complicating the flow of the 2 current implementations so it probably is best in a follow-up PR.

@VladislavAntonyuk
Copy link
Collaborator Author

Thank you @bijington.
if user registers SpeechToText like this builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default);
he still needs to somehow resolve the implementation.

Also, we need to provide the lifetime of the service. And what is under the hood of AddSpeechToText? We'll still have the same AddSingleton(OfflineSpeechToText.Default) call in that method.

@bijington
Copy link
Contributor

bijington commented Oct 15, 2024

Thank you @bijington. if user registers SpeechToText like this builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default); he still needs to somehow resolve the implementation.

Also, we need to provide the lifetime of the service. And what is under the hood of AddSpeechToText? We'll still have the same AddSingleton(OfflineSpeechToText.Default) call in that method.

Yes I agree the developer will need to define the lifetime of the service which increases the complexity.

Perhaps we could move the WithFallback method onto the ISpeechToText interface instead, then the developer could write something like:

builder.Services.AddSingleton(OfflineSpeechToText.Default.WithFallback(SpeechToText.Default));

Then WithFallback could look something like:

public static void ISpeechToTextExtensions
{
    public ISpeechToText WithFallback(this ISpeechToText primary, ISpeechToText secondary)
    {
        return new PriorityBasedSpeechToText(primary, secondary);
    }
}

internal class PriorityBasedSpeechToText : ISpeechToText
{
    readonly ISpeechToText primaryService;
    readonly ISpeechToText secondaryService;

    public PriorityBasedSpeechToText(this ISpeechToText primary, ISpeechToText secondary)
    {
        primaryService = primary;
        secondaryService = secondary;
    }

    public Task<SpeechToTextRecognitionResult> StartListenAsync()
    {
        // attempt primary, if fails fallback to secondary...
    }
}

It could well become more complicated with things like permissions, so we may have to request all permissions for both primary and secondary.

What do you think?

@VladislavAntonyuk
Copy link
Collaborator Author

I see pros and cons of such approach.
As for developers it is much easier registering the service, but the strategy of choosing the right implementation maybe complicated (Users preferences may forbid online recognition, unstable internet connection, etc).
Also as WithFallback still receives interface as a parameter, developers may write such code:
services.AddSpeechToText(SpeechToText.Default).WithFallback(SpeechToText.Default);

We could technically hide the online/offline recognition in the implementation and keep single service. We had such implementation for Windows in our initial release.

The main idea of Offline Speech To Text to allow developers explicitly specify the required implementation.
Also I don't want they inject 2 separate interfaces in the service (MyService(ISpeechToText s1, IOfflineSpeechToText s2)), because there simultaneous usage may be unpredictable.

We can open a discussion for the next month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/essentials Issue/Discussion/PR that has to do with Essentials breaking change This label is used for PRs that include a breaking change hacktoberfest-accepted A PR that has been approved during Hacktoberfest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Offline SpeechToText
3 participants