Initial native message iface #93
Conversation
I will develop a more complete native message interface, with JSON communication that will accept commands from the browser. For now, though, it is a simple stdin listener...
To test you can use the following python code:
Then you get to experience some buffer overflow, because it's 1:30 AM and I will fix it tomorrow...
Needless to say, this is an early draft and there is a lot of room for improvement.
I propose that this client will accept the following JSON-encoded commands:
It will also respond to all of them with some well-formatted JSON that we can read in the browser.
@kpu @jelmervdl @jerinphilip comment on the JSON.
I really appreciate the initiative and first draft here, which gives us something to begin with, discuss and synchronize ideas around. I have tried to respond in a more general sense below, as this is still an early stage. This will need to be coordinated with at least @jelmervdl. There needs to be parity with the existing WebAssembly extension code (we can have both with minimal overhead; we don't need to sunset one and then try other stuff). I'm certain we can drop the remote models feature in the first iteration.
A desired feature in translateLocally is translation by language code (mentioned by @kpu already). The extension will most likely communicate in terms of language codes (@jelmervdl has expressed interest in doing so on Slack if I remember correctly; if I were writing the JS, this is how I'd go). A "query by language to obtain model" feature with a wider scope is not strictly required when focusing on the extension, but is perhaps the one that fits best (browsermt/bergamot-translator#320, and we can pool our efforts to make something query-able and better).
On message content, we need to discuss both directions. For an MVP, I think we only need translate, and translateLocally should do the rest (model loading, going from lang-code to model, etc.). Note that we will still need the download etc. you've proposed in code, just that we're not allowing extension client interaction with those verbs.
Input:
{
  "src": "en",
  "tgt": "de",
  "id": "<id>",
  "input": "The quick brown fox jumps over the lazy dog.",
  "options": {
    "html": false,
    "quality": true,
    "alignments": true
  }
}
Output, on success:
{
  "status": "ok",
  "id": "<id>",
  "response": "<toJSON(Response)>"
}
Output, on failure:
{
  "status": "error",
  "id": "<id>",
  "message": "<err.what()>"
}
Edit: Included
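For illustration, a minimal sketch of how such a request could be parsed and answered on the translateLocally side, assuming Qt's QJsonDocument is used. translateText() is a hypothetical stand-in, and in the real protocol "response" would carry the serialised Response object rather than a plain string; only the field names follow the proposal above.

#include <QJsonDocument>
#include <QJsonObject>
#include <QJsonParseError>
#include <QString>

// Hypothetical stand-in for whatever the actual translation call ends up being.
QString translateText(const QString &src, const QString &tgt, const QString &input, bool html);

QByteArray handleRequest(const QByteArray &raw) {
    QJsonParseError parseError;
    const QJsonObject request = QJsonDocument::fromJson(raw, &parseError).object();

    QJsonObject reply;
    reply["id"] = request["id"];  // echo the id so the client can match replies to requests

    if (parseError.error != QJsonParseError::NoError) {
        reply["status"] = "error";
        reply["message"] = parseError.errorString();
    } else {
        const QJsonObject options = request["options"].toObject();
        reply["status"] = "ok";
        // In the real protocol "response" would be the serialised Response object.
        reply["response"] = translateText(request["src"].toString(),
                                          request["tgt"].toString(),
                                          request["input"].toString(),
                                          options["html"].toBool(false));
    }
    return QJsonDocument(reply).toJson(QJsonDocument::Compact);
}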
src/cli/NativeMsgIface.cpp
std::cin.read(input.get(), ilen);
std::cerr << "Received message size of: " << ilen << " with content: " << input.get() << std::endl;
translator_->translate(QString(input.get()));
eventLoop_.exec(); // Do not read any input before this translation is done
So we're doing serial? Perhaps we will allow multiple inputs and opts in this case then?
I was imagining JS doing the batching, but as you and Jelmer pointed out, it's probably best to do it on the C++ side. Will change.
In my extension I skipped the "load model" and "download & load model" messages because that's state that would need to be synced between both sides that are talking. However, it is useful to have these to "pre-load" the model before you start sending translation input. So let's keep them, but make sure that multiple calls to them, i.e. when the model is already loaded, don't cause trouble. Same for calling download & load model when that specific model (and version) is already downloaded.
Some message for getting a list of available models is great. If that doesn't scale (because there are too many models when you include remote repositories, maybe?) you could limit it by searching for source and target languages (lists for each side though, because the extension often has a list of languages it thinks the page can be in, and same for languages the user might like).
I don't think there is need for a
Let's not use
I would also want to propose making all messages async. I.e. each message received by this endpoint has a message id, and the response will also have that same message id, but there is no guarantee that the order of requests and responses is the same. This way we can
Yes, async is better. While there should be a managed default per language pair, language codes shouldn't be the sole identifier of a model: we already have speed variants and there will soon be OPUS overlap too. Search by pair makes sense here; that way pivoting and multilingual models can be hidden from the client rather than enumerating a long list of possibilities. A load message is good to reduce latency when typing happens. I recommend combining the "load model" and "download & load model" messages so they look identical from the client's side, and the message should be idempotent as Jelmer suggests. The response could be download progress, loaded, or error.
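A rough sketch of what that idempotent behaviour could look like; every name in here (handleLoadModel, sendMessage, downloadModel, isDownloaded, Model, …) is hypothetical and only meant to illustrate the repeat-call and progress/loaded/error semantics.

#include <QHash>
#include <QJsonObject>
#include <QString>
#include <functional>

struct Model {};  // placeholder for a loaded model handle
Model loadModel(const QString &id);
bool isDownloaded(const QString &id);
void downloadModel(const QString &id, std::function<void(int)> onProgress);
void sendMessage(const QJsonObject &msg);

QHash<QString, Model> loadedModels;

void handleLoadModel(const QString &modelID) {
    if (loadedModels.contains(modelID)) {
        // Idempotent: a repeated "load" for the same model gets the same answer and does no work.
        sendMessage({{"status", "loaded"}, {"model", modelID}});
        return;
    }
    if (!isDownloaded(modelID)) {
        // Assume this blocks until the download is done and reports progress via the callback.
        downloadModel(modelID, [&](int percent) {
            sendMessage({{"status", "progress"}, {"model", modelID}, {"percent", percent}});
        });
    }
    loadedModels.insert(modelID, loadModel(modelID));
    sendMessage({{"status", "loaded"}, {"model", modelID}});
}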
Suggestions heard, stand by later today for async variants. @jelmervdl I will keep using marianIface for now and replace it at a later point.
The python tester allows quickly testing some changes. Extra work to be done. Pending.
The old translation method will not work and I am not sure why. We need to create a new service anyway, but this is a problem for the Nick of tomorrow.
…n requests. Lock on stdout writing via callback
Early versions. Several issues. Number one is termination: is there a way of knowing when all service work is done, so that we can safely die rather than exiting while we still have queued translations pending?
First MVP-able version. It reads and writes JSON, including a message id (although I think I don't set the id yet, but that's easy to fix). What is missing:
AsyncService's destructor will block until all translation requests are processed and all workers have finished their tasks. So you can just let it go out of scope (or reset a unique_ptr or something, anything that causes its destructor to be called) and after that you're safe. Think:
int main() {
  {
    AsyncService service;
    std::map<std::string, Model> models;  // cache of already-loaded models
    std::mutex stdoutMutex;               // callbacks run on worker threads
    Message message;
    while (getMessage(std::cin, message)) {
      if (!models.count(message.model))
        models[message.model] = loadModel(message.model);
      // Capture what the callback needs by value; it runs asynchronously, long after
      // `message` has been overwritten by the next iteration.
      service.translate(models[message.model], message.source,
                        [&stdoutMutex, id = message.id](Response &&response) {
        std::scoped_lock lock(stdoutMutex);
        std::cout << toJSON(id, response);
      });
    }
    // now service will be deallocated, which will block until it has finished.
    // At that point all requests are handled, all callbacks called.
  }
  std::cerr << "Done!\n";
  return 0;
}
With AsyncService you don't need to wait for the service to finish. You can just load up another model, and use that one when queueing new translation requests. If you want to release models that are no longer used, you can release their shared pointers (all pending requests that still need a model will hold a shared_ptr to it, so it will be released once that's safe to do). (But how do you know that the next request won't need them?)
For practical purposes, I'd only need
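For the "release models that are no longer used" part, a tiny self-contained sketch of the shared_ptr lifetime argument; Model here is just a placeholder type, not the real bergamot-translator class.

#include <map>
#include <memory>
#include <string>

struct Model {};  // stand-in for a loaded translation model

std::map<std::string, std::shared_ptr<Model>> modelCache;

// Each queued translation request copies the shared_ptr and keeps it until its
// callback has run.
std::shared_ptr<Model> acquireModel(const std::string &name) {
    return modelCache.at(name);
}

// Erasing the cache entry does not free the model immediately: the last
// shared_ptr, held by any still-pending request, keeps it alive until that
// request has finished.
void releaseModel(const std::string &name) {
    modelCache.erase(name);
}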
I have the same inputs as @jelmervdl. However, I want to do something "incremental" for the following:
I gave this some thought while initially referring to https://github.com/mozilla/bergamot-translator/wiki/Unified-API-Example when designing:
{
  "source": {"text": "..."},
  "target": {"text": "..."}
}
We may drop the "source" key in the first iteration, but the structure needs to incrementally grow and mimic the C++ Response, staying consistent across the JSON, C++, WebAssembly and Python bindings. I have also opened browsermt/bergamot-translator#350 to make WebAssembly consistent with the JSON so both can operate easily.
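A minimal sketch of serialising into that shape, assuming Qt's QJsonDocument; responseToJSON() is a hypothetical helper, and the source/target texts would come from the C++ Response (response.source.text / response.target.text, if I read the struct right).

#include <QJsonDocument>
#include <QJsonObject>
#include <QString>
#include <string>

// sourceText/targetText would be taken from bergamot-translator's Response.
QByteArray responseToJSON(int id, const std::string &sourceText, const std::string &targetText) {
    QJsonObject message{
        {"id", id},
        {"source", QJsonObject{{"text", QString::fromStdString(sourceText)}}},
        {"target", QJsonObject{{"text", QString::fromStdString(targetText)}}},
    };
    return QJsonDocument(message).toJson(QJsonDocument::Compact);
}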
@jerinphilip, we had a discussion offline with @jelmervdl about the stdout buffer size: https://stackoverflow.com/questions/10904067/in-c-whats-the-size-of-stdout-buffer This is why I am questioning what strictly needs to be in the JSON. It is otherwise relatively trivial for me to put extra things in there.
Oh but it is more fun than that! I just noticed:
Source: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_messaging#app_side
At least Firefox seems to be doing concurrent reading & writing using async IO, so I expect no deadlock on our side because "it can't read from stdin while it is still trying to write to stdout". (I haven't looked at other browsers.)
this.readPromise = this.proc.stdout
.readUint32()
.then(len => {
if (len > NativeApp.maxRead) {
throw new ExtensionError(
`Native application tried to send a message of ${len} bytes, which exceeds the limit of ${NativeApp.maxRead} bytes.`
);
}
return this.proc.stdout.readJSON(len);
})
.then(…)
Source: https://searchfox.org/mozilla-central/source/toolkit/components/extensions/NativeMessaging.jsm#165
So it seems that long term we might need to add some sort of multipart response messaging, which I'm really dreading. How are you going to break up a large alignments table that by itself exceeds 1 MB when encoded as JSON? Because the messages still have to each be valid JSON 😭 For an MVP I doubt that a Firefox extension will hit that limit; it wants small chunks of translated text anyway to make it feel more like loading a progressive JPEG.
Edit: to clarify, for an MVP I think we can get away with locking/writing to stdout from the callback directly. Long term we'll need to move to some queue-based thing where there is a thread that reads the queue, encodes the responses into one or more messages, etc. We can then decide whether we want a bound or unbound queue (bound -> stop translating if nobody reads stdout, unbound -> continue translating, hoping that someone starts reading stdout before we run out of memory).
Edit 2: I do agree with Jerin that it would be good to have matching output across most APIs. I don't see any problems with basically following bergamot-translator's Response object. But it looks like the native messaging API might need an extra layer of protocol on top of it to deal with the message size limits. I'll also have a look around at how others deal with this issue.
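A minimal sketch of the MVP approach described above (lock and write from the callback directly), following the native messaging framing of a 4-byte length in native byte order followed by the UTF-8 JSON; writeMessage() is a hypothetical helper, not the PR's actual code.

#include <cstdint>
#include <iostream>
#include <mutex>
#include <string>

std::mutex stdoutMutex;

// Browsers cap incoming native messages at 1 MB, so anything larger would need
// to be split (or trimmed) before this point.
void writeMessage(const std::string &json) {
    std::scoped_lock lock(stdoutMutex);  // callbacks may run on several worker threads
    uint32_t length = static_cast<uint32_t>(json.size());
    std::cout.write(reinterpret_cast<const char *>(&length), sizeof(length));
    std::cout.write(json.data(), json.size());
    std::cout.flush();  // without this the message may sit in the buffer and never reach Firefox
}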
To be clear, I am ideally not looking at exporting the hard-alignments table. But life is simpler if I have a JSON representation of the struct so I quickly know what to edit. Alignments are in the Response when it is consumed for HTML; we're just choosing not to clear them out because the WebAssembly VM would have already allocated them. The simpler version of what I'm trying to say is: keep the keys you want from the existing composition of
If we want the NativeInterface response json to 100% resemble response itself, the proper thing to do would be for
True, except that ideally we don't want to deal with JSON strings in most interface implementations. The Python bindings convert a response to native Python objects. WASM hides the serialisation/deserialisation in the emscripten bindings (which do a lot on demand instead of up front). The part where we communicate with a JSON string is specific to this interface. Just the names of things, and how they are nested, are the same across all interfaces.
@jelmervdl I see. The Response struct has multiple sub-structs and some of them have sub-structs. Is there one place where I can view the whole schema?
I don't have one 😅 Maybe @jerinphilip does. Things like the Annotation part of an AnnotatedText I have never used outside C++, so I don't know what the interface to that would be. Internally it works with a sort of byte offset array, but I'm not sure we would want to expose that implementation detail via JSON… I suppose figuring this out is worth an issue on its own. For now, really the only important bit is
@jelmervdl currently it looks like:
./native_client.py
(None, None)
b'/\x00\x00\x00{"text": "Hello world!", "id": 0, "die": false}e\x00\x00\x00{"text": "Sticks and stones may break my bones but words WILL NEVER HURT ME!", "id": 1, "die": false}?\x00\x00\x00{"text": "Why does it not work well...", "id": 2, "die": false}'
Loading model: English-German tiny
{
"id": 2,
"target": {
"text": "Warum funktioniert es nicht gut..."
}
}
{
"id": 0,
"target": {
"text": "Hallo Welt!"
}
}
{
"id": 1,
"target": {
"text": "Stöcke und Steine können mir die Knochen brechen, aber Worte werden NIEMALS HURT ME!"
}
}
Tomorrow, I will add model loading functionality.
Removed service shared pointer and put it inside the thread. Clean shutdown. Removed synchronisation with C style iostream. Modified the json
…ifest.json extension-id
No way to change that, so an easy hack for now: if you see the extension-id as an argument, switch to nativeMessaging.
And flush, otherwise the message won't make it directly to Firefox
I need HTML in the extension because there is no non-HTML mode anymore
I've pushed some additions that were necessary to get it to work with Firefox, but this now works with https://github.com/jelmervdl/firefox-translations/tree/native-messaging (given you go to a page that matches the model that translateLocally selected)
Wrapped one string in tr, removed unneeded headers and forward declarations, and used a constexpr if template instead of overloaded functions
…stModels and DownloadModel success messages
This code should now work with the latest version of my Firefox plugin. Be sure to switch to TranslateLocally in the extension settings. (I have only tested it on macOS so far, but will test it with Windows and Ubuntu when I have the time for those VMs to boot up, install Firefox in them, etc.)
… I think
Native messaging looks broken on Windows. If I try the python test script there, it does not read the full input. So either the length at the start of the message is incorrect, or it writes more bytes than the reported length. I don't think it is an issue on the Python side, since Firefox also reports not being able to parse the returned JSON data. std::cout doesn't automatically try to convert newlines or anything, right?
Might it be an issue due to Windows using UTF-16 by default?
Apparently Windows does do automatic line ending conversion on files opened in text mode, and stdin/stdout/stderr are all opened in text mode by default. See also https://stackoverflow.com/questions/16888339/what-is-the-simplest-way-to-write-to-stdout-in-binary-mode#16888382
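For illustration, the usual fix is to switch the standard streams to binary mode on Windows; the actual fix here went through browsermt/bergamot-translator#396, so this is only a sketch of the underlying issue, not what this PR does.

#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Put stdin/stdout into binary mode on Windows so the C runtime stops
// translating \n to \r\n, which corrupts the length-prefixed framing.
void setBinaryStdio() {
#ifdef _WIN32
    _setmode(_fileno(stdin), _O_BINARY);
    _setmode(_fileno(stdout), _O_BINARY);
#endif
}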
Okay, found that as well. Waiting for browsermt/bergamot-translator#396 to be merged, and then I'll pull that into one final commit to get browser translations on Windows to work out of the box.
Documentation is explicit about only calling it with unsigned char, and Windows' C++ runtime is checking for that.
I've confirmed that native client now works with Firefox and the browser extension out of the box (well, you have to start TranslateLocally once) in Windows 10, macOS 12.3 and Ubuntu 20.10. 🎊
We should add the native message configuration to the README.
@jelmervdl, could you please add instructions to the README.md on how to use the native message interface, and then we can merge this.
So nobody gets hurt
I added a brief description of the native messaging functionality. It is mostly just pointers to where to find the information you'd need if you want to develop for it. Let me know whether you think anything is missing.
I think this is fine for now. We can do further PRs, like the one that would fix #99, later on. I'm in favour of merging it.