automatic voice recognition/text ocr/attachment to text preview with sist2 #4291

Open
ghost opened this issue Jan 12, 2025 · 6 comments
Labels
enhancement New feature or request

Comments

@ghost

ghost commented Jan 12, 2025

Describe the solution you'd like

Automatic voice recognition, text OCR, and attachment-to-text preview.
When a user uploads an audio file, automatically transcribe it with Whisper.

Type of feature

User Experience (UX)

Additional context

Use https://github.com/sist2app/sist2 as an attachment analyzer and read its SQLite index file.

@ghost ghost added the enhancement New feature or request label Jan 12, 2025
@ghost ghost changed the title from "automatic voice recognition with whisper" to "automatic voice recognition/text ocr with http api" Jan 13, 2025
@ghost ghost changed the title from "automatic voice recognition/text ocr with http api" to "automatic voice recognition/text ocr with sist2" Jan 13, 2025
@massimo-ua

Hey @finch71, nice feature to work on. Let me know if you need some help.

@ghost
Author

ghost commented Jan 14, 2025

I can set up sist2. It works nicely in its own GUI (it can also be started from the CLI), and I can see the file path in its SQLite database, but I don't know how to program in Go. Ideally, one could get all this information either from this external SQLite database or by querying the sist2 GUI via HTTP requests (which may require authentication).
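
A minimal sketch of the first option, reading the external SQLite file directly. The sist2 index schema is not described in this thread, so the snippet assumes nothing about it and only lists whatever tables and columns it finds. `describe_index` is a hypothetical helper name, and the file is opened read-only so the index is never modified.

```python
import sqlite3
from pathlib import Path


def describe_index(db_path: str) -> dict[str, list[str]]:
    """Return {table_name: [column names]} for a SQLite file, opened read-only."""
    # mode=ro guarantees this tool never writes to the indexer's database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()
```

Once the table holding file paths and extracted text is identified this way, an ordinary `SELECT` against it would be the next step.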

@massimo-ua

massimo-ua commented Jan 14, 2025

So sist2 is meant to serve as an attachment content indexer, isn't it? And Whisper is in the loop only because sist2 can't handle audio transcription.
As I understand the current memos architecture, OCR and voice transcription would require a separate sist2 setup with its own SQLite or other storage instance. So IMHO it could be implemented as a plugin, so that users are free to choose whether to use it. Once enabled, the plugin could run as a sidecar to the main app and interact with memos via its API.
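
A rough sketch of what one pass of such a sidecar could look like. Everything here is hypothetical: `Attachment`, `run_once`, and the three callables are illustrative names, and no real memos or sist2 endpoints are shown because none are specified in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Attachment:
    id: str
    path: str
    mime_type: str


def run_once(
    list_pending: Callable[[], Iterable[Attachment]],
    extract_text: Callable[[Attachment], str],
    save_text: Callable[[str, str], None],
) -> int:
    """Process every pending attachment once; return how many succeeded."""
    done = 0
    for att in list_pending():
        try:
            text = extract_text(att)
        except Exception as exc:
            # One unreadable file should not stop the whole sidecar.
            print(f"skipping {att.id}: {exc}")
            continue
        save_text(att.id, text)
        done += 1
    return done
```

In a real deployment, `list_pending` and `save_text` would wrap calls to the memos API and `extract_text` would wrap sist2 or Whisper. Keeping them as injected callables is what lets the main app stay free of the extraction dependencies.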

@ghost
Author

ghost commented Jan 14, 2025

Yes, but there is no standardized API for "any attachment to text", like the OpenAI API.

However, many programs rely on this kind of functionality and have to bundle libraries with huge model files, for example sist2 and memos.
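
To make the idea concrete, here is a small sketch of what a uniform "any attachment to text" entry point might look like. This is not an existing standard or library: `register` and `attachment_to_text` are hypothetical names, and only a trivial plain-text backend is wired in.

```python
from typing import Callable, Dict, Optional

Extractor = Callable[[bytes], str]

# Hypothetical registry: MIME type prefix -> backend that turns bytes into text.
_EXTRACTORS: Dict[str, Extractor] = {}


def register(mime_prefix: str, extractor: Extractor) -> None:
    _EXTRACTORS[mime_prefix] = extractor


def attachment_to_text(data: bytes, mime_type: str) -> Optional[str]:
    """Use the backend with the longest matching prefix, or return None."""
    matches = [prefix for prefix in _EXTRACTORS if mime_type.startswith(prefix)]
    if not matches:
        return None
    return _EXTRACTORS[max(matches, key=len)](data)


# Trivial built-in backend; audio or image backends would be registered the same way.
register("text/", lambda data: data.decode("utf-8", errors="replace"))
```

An `audio/` backend could forward to a Whisper service and an `image/` backend to an OCR service, so the large model files live in those services rather than in every program that needs text extraction.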

@ghost ghost changed the title from "automatic voice recognition/text ocr with sist2" to "automatic voice recognition/text ocr/attachment to text preview with sist2" Jan 14, 2025
@massimo-ua

But memos positions itself as a self-hosted solution, so I don't think it should embed an LLM. That's why I'm suggesting a plugin.

@massimo-ua

massimo-ua commented Jan 14, 2025

[image] Does this seem relevant?
