- Type: Design notes
- Author: Ilmir Usmanov
- Contributors: Anton Banykh, Zalim Bashorov, Ilya Gorbunov, Svyatoslav Kuzmich, Roman Elizarov.
- Status: Under consideration
- Discussion and feedback: #241
While other Kotlin back-ends support multithreading natively, Kotlin/JS does not. There are two reasons for the limitations. First, the code that goes to the worker thread needs to be known at compile-time so that the compiler puts it in either a separate JavaScript file or in a blob. Second, the worker thread does not have access to the main thread's address space. The only possible way of communication between threads is message passing. Thus, all the data needs to be sent to the worker. In other words, worker threads are more like processes in other back-ends, rather than threads.
Currently, if one wants to create a worker, there are two ways to do this.
Either create a separate Kotlin/JS module, compile it into a separate JavaScript
file, and call the Worker
constructor, passing the file's path as a parameter.
Or, wrap the worker code in Blob
and pass the blob to the Worker
constructor. Additionally, Mozilla's workers and Node.js's worker threads
provide different APIs.
All in all, comparing to other back-ends, the support of web workers is minimal.
In Kotlin/JVM, for example, the thread
function provides much more
flexibility.
Before I propose a solution, I would like to list the limitations, which cannot be avoided and should be taken into account.
All the communication between main and worker threads are done via message
passing or SharedArrayBuffer
. I will stick to message passing and not mention
SharedArrayBuffer
since it is not required. That, however, can change in the
future.
Not everything can be sent to the worker. In short, only objects and primitives, which can be copied using structured cloning algorithm can be sent to the worker. The most notable objects, which are not supported, are functions, the prototype chain, and DOM nodes.
To send the object, one should call the Worker.postMessage
function. On the
worker side, one should register an event listener for the message
event and
return a result to the main thread via the global postMessage
function (I will
use modern JavaScript for clarity):
// main.js
const worker = Worker("worker.js")
worker.addEventListener("message", message => {
const { data } = message
console.log(data.value)
}
worker.postMessage({value: "Ping"})
// worker.js
addEventListener("message", message => {
const { data } = message
if (data.value == "Ping")
postMessage("Pong")
}
the example will print Pong
, as expected.
Since there are limitations on what can be sent to the worker, and since the
worker does not have access to the main thread's address space, we cannot run
any code on the worker. Most notable, unlike the thread
function, our yet
theoretical worker
function cannot accept any lambda. The lambda is not
copyable, and, thus, the lambda's code should be known at compile-time so that
the compiler can put it in a separate JavaScript file.
With these limitations in mind, let us tackle how we can solve the problem.
First, let us see what we want to get. In Kotlin/JVM, we would write something like
fun main() {
var result: String? = null
thread {
result = "Pong"
}.join()
println(result!!)
}
In Kotlin/JS, however, we cannot block the main thread. Thus, the Kotlin/JS version of the same code would look like this:
fun main() {
worker {
"Pong"
}.then {
println(it)
}
}
So, in the stdlib, there will be an intrinsic worker.
@ImplicitCodeColoring(forbidsPackages = ["org.w3c.dom.**"])
fun <T> worker(block: WorkerGlobalScope.() -> T): Worker<T> = error("intrinsic")
The compiler puts the argument of the intrinsic to a separate JavaScript file. The intrinsic accepts only block lambdas, i.e., the lambdas like
worker {
// code
}
and passing a lambda stored in a variable is an error:
val c: WorkerGlobalScope.() -> Unit = {}
worker(c) // compilation error
This restriction exists since the compiler should be able to determine which code goes to a separate JavaScript file, and we cannot send functions and lambdas to a worker.
While WorkerGlobalScope
comes from JavaScript, Worker
is added to stdlib:
interface Worker<out T> {
fun then(block: (T) -> Unit): Worker<T>
fun catch(block: (Throwable) -> Unit): Worker<T>
}
then
and catch
, unlike the worker
itself, accept variables, not just
blocks. These functions are transformed into an event listener (see
Compilation Model). Initially, Worker
's instances will not be able to be
stored in variables because of captured variables. More on that later.
Just an intrinsic, however, is not enough for anything useful. If we do not allow any code transfer, we can still copy all functions called inside the worker block to the JavaScript file, and functions, which they call transitively, as long as they do not capture lambdas or call forbidden APIs.
To do that, the compiler marks the functions in the call graph and checks whether they are safe to use in a worker. As long as they do not manipulate DOM or call lambdas (except parameter lambdas, since they come from the caller, and the compiler already colored them), they are considered safe. For example,
fun foo(): Int {
return bar
}
val bar = 1
// ....
worker {
foo()
bar
}
compiles and works fine. However,
val c = { foo() }
worker {
c()
::foo
}
results is a compilation error since functional types are not copyable. This compiler check does not require explicitly marking every function, which can be used in a worker with a modifier or an annotation. Instead, the compiler implicitly colors the call graph. Hence, the name, implicit code coloring, as opposing to explicit code coloring, which coroutines use.
The compiler can check not every function. External functions, for example,
cannot be analyzed. Thus, to use them in a worker, one should annotate them with
@WorkerSafe
annotation:
@WorkerSafe
external fun foo(): Int
worker {
foo()
}
master
intrinsic is a companion of the worker
intrinsic. It transfers
execution back to the main thread:
fun <T> master(block: Window.() -> T): Worker<T> = error("intrinsic")
Instead of putting its argument to a separate JavaScript file, the compiler puts it to the main one.
With this intrinsic, one can write
val hello = "Hello"
worker {
val message = "$hello World!"
master {
window.alert(message)
}
}
and the program will show an alert. The compiler does not check functions used
in master
's block.
Initially, workers will not be able to create other workers. So,
master { worker { } }
workaround can be used.
Again, postMessage
can transfer not every data type.
Initially, a prototype will support only @Serializable
classes, primitives,
including strings and arrays, and collections of these types.
Capturing any other type into the colored function will result in a compilation
error.
After inlining, the compiler colors call graph, including virtual functions. Because of the closed world model, all overrides of the function are reachable and can be checked for forbidden APIs usage. Then it copies all the code required for the worker in a separate JavaScript file.
The worker
's block is transformed:
- All
return
s are replaced withpostMessage
calls. - All captured variable accesses are replaced with message content accesses.
- The block is wrapped in a try block and all caught exceptions are sent to the main thread.
- The block itself is placed inside an event listener.
- Event listener for
error
event is added, which always throwsIllegalStateException
.
Since all user-defined exceptions are passed via messages, error
event means,
that the compiler has generated invalid code.
In the main script
worker
intrinsic call is replaced withWorker
constructor.then
andcatch
calls are replaced with event listeners.- Captured variables are placed into the message.
For example (all example are just illustrations of the idea, in reality, it is likely, that variables will be serialized and deserialized, at least, on the prototype stage):
worker {
if (foo == bar) return@worker "Equal"
return@worker "Not equal"
}.then { println (it)}.catch { throw it }
is transformed into
// worker_0.js
addEventListener("message", ($message) => {
const { data: { $kind, $captured } } = $message
if ($kind == "START") {
if ($captured.foo == $captured.bar) {
postMessage({
$kind: "RETURN",
$result: "Equal",
$captured: {foo: $captured.foo, bar: $captured.bar}
})
return
}
postMessage({
$kind: "RETURN",
$result: "Not equal",
$captured: {foo: $captured.foo, bar: $captured.bar}
})
return
}
}
// main.js
const $worker_0 = Worker("worker_0.js")
$worker_0.addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "RETURN") {
foo = $captured.foo
bar = $captured.bar
println($result)
} else if ($kind == "THROW") {
foo = $captured.foo
bar = $captured.bar
throw $error
}
})
$worker_0.addEventListener("error", (err) => {
throw IllegalStateException(
"Internal worker error: " + err.message +
" at " + err.filename + ":" + err.lineno
)
})
$worker_0.postMessage({
$kind: "START",
$captured: {foo: foo, bar: bar}
})
I will omit error
's event listener in later examples, since its content is
always the same.
You might have noticed that captured variables are sent back to the main script. This is because updates to them should be visible in the main script.
For example:
val result = mutableListOf<String>()
worker {
result.add("OK")
}.then {
println(result[0])
}
is transformed into
// worker_0.js
addEventListener("message", ($message) => {
const { data: { $kind, $captured } } = $message
if ($kind == "START") {
$captured.result += "OK"
postMessage({
$kind: "RETURN",
$result: Unit,
$captured: {result: $captured.result}
})
return
}
}
// main.js
var result = //...
const $worker_0 = Worker("worker_0.js")
$worker_0.addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "RETURN") {
result = $captured.result
println(result[0])
}
})
$worker_0.postMessage({
$kind: "START",
$captured: {result: result}
})
This, of course, can lead to data races if multiple workers are involved.
If the worker throws an exception, the exception is passed as message as well.
worker {
throw IllegalStateException("Boo")
}.catch {
throw it
}
becomes
// worker_0.js
addEventListener("message", ($message) => {
const { data: { $kind, $captured } } = $message
if ($kind == "START") {
try {
throw IllegalStateException("Boo")
} catch (e) {
postMessage({
$kind: "THROW",
$exception: e
})
}
}
}
// main.js
const $worker_0 = Worker("worker_0.js")
$worker_0.addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "THROW") {
throw $exception
}
})
$worker_0.postMessage({
$kind: "START",
$captured: {}
})
If the worker contains master
call, its content goes to main JavaScript file.
For example:
val o = "O"
var result = ""
worker {
val k = "K"
master {
result = o + k
}
}.then {
println(result)
}
becomes:
// worker_0.js
addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "START") {
const k = "K"
var { o, result } = $captured
postMessage({
$kind: "MASTER_0_START",
$captured: {o: o, k: k}
})
} else if ($kind == "MASTER_0_RETURN") {
postMessage({
$kind: "RETURN",
$result: $result,
$captured: {o: $captured.o, result: $captured.result}
})
} else if ($kind == "MASTER_0_THROW") {
postMessage({
$kind: "THROW",
$exception: $exception,
$captured: {o: $captured.o, result: $captured.result}
})
}
}
// main.js
var o = "O"
var result = ""
const $worker_0 = Worker("worker_0.js")
$worker_0.addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "RETURN") {
o = $captured.o
result = $captured.result
} else if ($kind == "MASTER_0_START") {
var { o, k, result } = $captured
result = o + k
worker_0.postMessage({
$kind = "MASTER_0_RETURN",
$result = Unit,
$captured = {o: o, k: k, result: result}
})
}
})
$worker_0.postMessage({
$kind: "START",
$captured: {o: o, result: result}
})
These examples show, which fields present in data
object:
$kind
specifies the state in this worker state-machine$result
contains return value ofworker
ormaster
intrinsic, if the intrinsic finished its execution.$exception
contains the exception, thrown inworker
ormaster
block$captured
is a map, containing all captured variables.
Where $kind
contains not only the kind of the message, but also a unique
identifier in case of master
.
The compiler might optimize away $result
and $exception
in the worker
file, if there is no master
call in the worker-colored part of the
call-graph.
In addition, the last example shows, that exceptions propagate from master
block to the worker
one and back to the main thread. More on that in
"Error Handling" section.
then
and catch
intrinsics can accept variables, not only blocks. These
functional objects are called after updating captured variables. For example
val c: (String) -> Unit = {
println(it)
}
val o = "O"
worker {
o + "K"
}.then(c)
becomes
// main.js
// Worker creation and `postMessage` are omitted
$worker_0.addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "RETURN") {
o = $captured.o
c($result)
}
})
In prototype stage, the compiler is likely to generate blocks, passed to then
and catch
as separate lambda objects and call them inside the event listener,
instead of inlining them.
I have already touched upon error handling in the previous section, so this section elaborates on the topic.
I have said, that every exception in worker block is passed to the main thread
via postMessage
:
worker {
throw IllegalStateException("Boo")
}
becomes
// worker_0.js
addEventListener("message", ($message) => {
const { data: { $kind, $captured } } = $message
if ($kind == "START") {
try {
throw IllegalStateException("Boo")
} catch(e) {
postMessage({
$kind: "THROW",
$exception: e
})
}
}
}
but this is not the whole story.
In addition to exceptions, thrown in the worker
block, we have exceptions,
thrown in functions, called in the worker
block. To pass the exception to the
main thread, the compiler wraps the worker
block in try
block, passing
everything it caught to postMessage
:
worker {
foo()
}
becomes
// worker_0.js
addEventListener("message", ($message) => {
const { data: { $kind, $captured } } = $message
if ($kind == "START") {
try {
var $result = foo()
} catch (e) {
postMessage({
$kind: "THROW",
$exception: e
})
}
postMessage({
$kind: "RETURN",
$result: $result
})
}
}
The same happens when worker
uses master
intrinsic - the compiler wraps
the master
block is try
and any caught exception is sent to the worker.
We cannot simply transfer continuation object to the worker, since it holds a
link to its suspend function, and we cannot transfer functions. So, if we want
to support coroutines, we need to transfer COROUTINE_SUSPENDED
and value to
resume the worker with, and inside the worker resume correct continuation,
not the worker
block's one.
The example shows the idea
var c: Continuation<Any?>? = null
suspend fun foo() {}
suspend fun returnsInt(): Int = suspendCoroutine {
saveWorkerContinuation(it) {
c = it
}
}
suspend fun test() {
suspendWorker {
foo()
returnsInt()
}.then {
println(it)
}
}
fun builder(c: suspend () -> Unit) {
c.startCoroutine(Continuation(EmptyCoroutineContext) {})
}
fun main() {
builder {
test()
}
c?.resume(42)
}
We cannot capture continuation objects, unless they are used in color-changing
intrinsics saveWorkerContinuation
and saveMasterContinuation
.
The compiler generates the following for the worker script:
// worker_0.
var c = null;
var $savedContinuation = null;
function returnsInt($completion) {
c = $completion;
$savedContinuation = $completion;
postMessage({$kind: "SAVE_CONTINUATION"})
return COROUTINE_SUSPENDED;
}
var $continuation = null;
addEventListener("message", ($message) => {
var { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "RESUME") {
// 1
if ($continuation == null) {
$continuation = createContinuation(EmptyContinuation)
} else if ($exception != undefined) {
if ($savedContinuation != $continuaion) {
c.resumeWithException($exception)
return
}
} else {
if ($savedContinuation != $continuaion) {
c.resume($result)
return
}
}
// 2
while(true) {
try {
if ($continuation.$label == 0) {
$continuation.$label++
$result = foo()
if ($result == COROUTINE_SUSPENDED) {
postMessage({$kind: "SUSPEND"})
break;
}
} else if ($continuation.$label == 1) {
$continuation.$label++
$result = returnsInt($continuation)
if ($result == COROUTINE_SUSPENDED) {
postMessage({$kind: "SUSPEND"})
break;
}
postMessage({
$kind: "RETURN",
$result: $result
})
break;
} else {
throw new IllegalStateException("Unreachable")
}
} catch (e) {
postMessage({
$kind: "THROW",
$exception: e
})
break;
}
}
}
})
as you can see, START
signal is replaced with RESUME
. Additionally,
SUSPEND
signal it sent when worker suspends. Since master
intrinsic turns worker and main threads' code into state-machines, coroutines
simply extend the state-machines with their owns (// 2
).
In // 1
we create a continuation object to pass it as completion to suspend
functions, and check, whether we have suspended. If so, resume saved
continuation, unless saved continuation is worker
block's continuation.
We also send signal to the main thread to save worker continuation via
SAVE_CONTINUATION
signal. More on that later.
Here is the main script:
var c = null;
function test($completion) {
const $worker_0 = Worker("worker_0.js")
var $continuation = // ...
$worker_0.addEventListener("message", ($message) => {
const { data: { $kind, $result, $exception, $captured } } = $message
if ($kind == "THROW") {
throw $exception
} else if ($kind == "RETURN") {
println($result)
} else if ($kind == "SUSPEND") {
$continuation.suspend()
} else if ($kind == "SAVE_CONTINUATION") {
c = wrapWorkerContinuation($worker_0)
}
})
$worker_0.postMessage({$kind: "RESUME"})
}
// The rest of the script is not changed and omitted
where $continuation.suspend()
simply tells test
to return
COROUTINE_SUSPENDED
.
Unlike worker
and master
intrinsics, their suspend counterparts accept
suspend lambdas:
@ImplicitCodeColoring(forbidsPackages = ["org.w3c.dom.**"])
suspend fun <T> suspendWorker(block: suspend WorkerGlobalScope.() -> T): Worker<T> = error("intrinsic")
suspend fun <T> suspendMaster(block: suspend Window.() -> T): Worker<T> = error("intrinsic")
allowing them to call suspend functions.
saveWorkerContinuation
has several purposes
- it saves current continuation
- it notifies the main thread, that lambda of
saveWorkerContinuation
should be executed - it changes the color of its lambda back to worker-unsafe, allowing one to save continuation object in, for example, collection. Remember, we cannot transfer continuation objects from one color to another.
Saved continuation is later used to resume coroutines on RESUME
signal.
It is defined as
fun <T, R> saveWorkerContinuation(c: Continuation<T>, block: (Continuation<T>) -> R): R = error("intrinsic")
saveMasterContinuation
is the twin of saveWorkerContinuation
, performing
the same tasks, but when master
intrinsic is used.
These internal functions simply wrap a worker object and then send it a RESUME
signal in their invokeSuspend
method, thus, bridging the gap between main and
worker threads from the caller side, telling the worker or the master to resume
coroutines.
then
and catch
look ugly to me. We cannot, however, simply assume, that
we should wait for the worker to finish its executions - it beats the whole
purpose of web workers.