automatic voice recognition/text ocr/attachment to text preview with sist2 #4291

ghost · 2025-01-12T01:42:42Z

Describe the solution you'd like

automatic voice recognition/text ocr/attachment to text preview
When a user uploads an audio file, automatically transcribe it with whisper

Type of feature

User Experience (UX)

Additional context

use https://github.com/sist2app/sist2 as an attachment analyzer, and read its sqlite index file

massimo-ua · 2025-01-13T18:50:58Z

Hey @finch71 Nice feature to work on. Let me know if you need some help

ghost · 2025-01-14T01:49:47Z

I can set up sist2. It works nicely in its own GUI (it can also be started from the CLI), and I can see the file path in the sqlite database, but I don't know how to program in Go. Ideally one could get all this information from the external sqlite database, or by querying the sist2 GUI via HTTP requests (which may require authentication).
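
As a rough illustration of the "read the external sqlite database" idea, here is a minimal Python sketch. It does not use sist2's actual schema: the table name `document`, the columns `path` and `content`, and the helpers `find_attachment_text` and `build_demo_index` are all assumptions invented for this example, and an in-memory database stands in for the real index file.

```python
import sqlite3

# Hypothetical schema: NOT taken from sist2. Inspect the real index file
# (for example with `.schema` in the sqlite3 shell) before relying on any name.
ASSUMED_SCHEMA = """
CREATE TABLE document (
    path    TEXT PRIMARY KEY,  -- assumed: path of the indexed file
    content TEXT               -- assumed: text extracted from the file
)
"""


def find_attachment_text(conn, filename):
    """Return (path, content) rows whose path ends with "/" + filename."""
    cur = conn.execute(
        "SELECT path, content FROM document WHERE path LIKE ? ORDER BY path",
        ("%/" + filename,),
    )
    return cur.fetchall()


def build_demo_index():
    """Create an in-memory stand-in for the external index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ASSUMED_SCHEMA)
    conn.executemany(
        "INSERT INTO document (path, content) VALUES (?, ?)",
        [
            ("/data/memos/assets/scan.png", "text recognized in the image"),
            ("/data/memos/assets/notes.pdf", "text parsed from the PDF"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_index()
    for path, content in find_attachment_text(conn, "scan.png"):
        print(path, "->", content)
```

Against a real index file, opening it read-only (`sqlite3.connect("file:/path/to/index?mode=ro", uri=True)`) would avoid interfering with the indexer; the path here is a placeholder.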

massimo-ua · 2025-01-14T08:24:47Z

So sist2 is meant to be the attachment content indexer, isn't it? And whisper is in the loop only because sist2 can't handle audio transcription.
As I understand the current memos architecture, OCR/voice transcription will require a separate sist2 setup with its own sqlite or another storage instance. So IMHO it might be implemented as a plugin (so users are free to choose whether to use it). Once enabled, the plugin could run as a sidecar to the main app and interact with memos via its API.
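
The sidecar idea can be sketched as a small polling step, shown below in Python. This is only an assumption about how such a plugin might be shaped, not an existing memos plugin interface: `Attachment`, `run_once`, and the three injected callables are hypothetical names, and no real memos, sist2, or whisper endpoint is called. The callables are where a real API client, indexer lookup, or transcriber would be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Attachment:
    id: str
    filename: str


def run_once(
    list_attachments: Callable[[], Iterable[Attachment]],
    extract_text: Callable[[Attachment], Optional[str]],
    save_preview: Callable[[str, str], None],
    seen: set,
) -> int:
    """Process attachments not handled yet; return how many previews were saved."""
    saved = 0
    for att in list_attachments():
        if att.id in seen:
            continue  # already processed in an earlier pass
        seen.add(att.id)
        text = extract_text(att)  # OCR or transcription would happen here
        if text:
            save_preview(att.id, text)  # e.g. write back through the app's API
            saved += 1
    return saved


if __name__ == "__main__":
    # Stand-ins for the real integrations (all hypothetical).
    attachments = [Attachment("1", "voice.mp3"), Attachment("2", "archive.bin")]
    previews = {}

    def fake_extract(att):
        if att.filename.endswith(".mp3"):
            return "transcript of " + att.filename
        return None

    seen = set()
    print(run_once(lambda: attachments, fake_extract, previews.__setitem__, seen))
    print(previews)
```

Keeping the extractor behind a plain callable means the main app never has to bundle model files; users who do not enable the sidecar pay no cost.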

ghost · 2025-01-14T09:27:27Z

yes, but there is no standardized API for "any attachment to text", like the OpenAI API

however, many programs rely on this kind of conversion and have to bundle libraries with huge model files into the program, for example sist2 and memos

massimo-ua · 2025-01-14T13:15:23Z

But memos positions itself as a self-hosted solution, so I don't think it should embed an LLM. That's why I'm suggesting a plugin.

massimo-ua · 2025-01-14T14:35:26Z

Does this seem relevant?

ghost added the enhancement New feature or request label Jan 12, 2025

ghost changed the title ~~automatic voice recognition with whisper~~ automatic voice recognition/text ocr with http api Jan 13, 2025

ghost changed the title ~~automatic voice recognition/text ocr with http api~~ automatic voice recognition/text ocr with sist2 Jan 13, 2025

ghost changed the title ~~automatic voice recognition/text ocr with sist2~~ automatic voice recognition/text ocr/attachment to text preview with sist2 Jan 14, 2025
