UInput Orchestrator #5
Reserved
This seems so simple and intuitive as a solution. As you mentioned, the only immediate concern is that it might require long chains of virtual devices. Honestly, I'd say that any inherent performance issue with that is probably unintentional. If I were a kernel dev and someone asked me whether there were performance issues, I'd probably say, "I don't know, are there performance issues?" 😆 I guess we'll just have to try it and find out... but opening a simple, static chain of uinput devices and passing events between them should hopefully be fairly simple. I'm actually thinking that Interception might be able to do it out of the box?
I had a look at interception. It seems to do some of those things, but not everything. The biggest difference is that interception starts processes and pipes them together, which has some limitations/problems, e.g.:
But their udevmon code might prove valuable for understanding the udev APIs :) -- I don't find the official docs very helpful.
Sorry, what I meant was that Interception might be useful to test the effect of having many uinput devices open... as in, maybe Interception can help to answer your question:
I agree, it would be too limited to reach your intended goal.
I suppose that the biggest problem that needs to be solved is indeed making several input mappers use each other's output, in a way that does not require the end user to manually configure input and output devices for every single mapper they use. A daemon which a mapper could ask "I want to map keyboard devices to keyboard devices. Give me the input and output devices I should use." would indeed solve that problem. Without any configuration on the user's side, the daemon could ensure that each mapper gets put on a single deterministic part of the chain, and if the user doesn't like the order the daemon automatically chose, they can reorder it easily in a single GUI written for the UIO daemon, without having to reconfigure each mapper manually. That shifts the task from convincing the Wayland crew to adopt a new protocol to:
Which may potentially be easier, but it really depends on how willing the majority of the input mapper developers are to go along with it.
I do have several thoughts regarding whether it is possible to create a sufficiently transparent wrapper like UIOInput that does not require big changes to existing mappers, but no coherent conclusion regarding that yet. Currently my biggest worry is how this is going to affect the event loop: on a low level, mappers would now need to maintain an open communication channel with the UIOInput daemon (whether over D-Bus or a Unix socket) and may occasionally need to change which event devices they have open, and thus change which file descriptors they poll/epoll. I think that abstracting that away would significantly decrease performance, requiring the high-performance oriented mappers to do some nontrivial plumbing around their event loop. But I'm not wholly sure of that yet. There are many options to consider here.
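To make the fd-swapping concern concrete, here is a minimal sketch of an event loop that can change which "device" file descriptor it polls without being torn down. This is my own illustration, not any real UIO API; pipes stand in for event devices.

```python
import os
import selectors

# Sketch (hypothetical, not a real UIO API): an event loop that swaps
# which device fd it polls when the daemon asks the mapper to switch
# devices. Pipes stand in for event devices.

sel = selectors.DefaultSelector()

def swap_device(old_fd, new_fd):
    # Drop the old event device and start polling the new one without
    # tearing down the rest of the event loop.
    sel.unregister(old_fd)
    os.close(old_fd)
    sel.register(new_fd, selectors.EVENT_READ)

# Simulate two generations of a virtual device with pipes.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
sel.register(r1, selectors.EVENT_READ)

os.write(w1, b"event-from-old-device")
key, _ = sel.select(timeout=1)[0]
first = os.read(key.fd, 64)

swap_device(r1, r2)   # e.g. triggered by a message from the daemon
os.write(w2, b"event-from-new-device")
key, _ = sel.select(timeout=1)[0]
second = os.read(key.fd, 64)
print(first, second)
```

The loop itself never restarts; only the registered fd changes, which is roughly the plumbing a high-performance mapper would need to add around its existing epoll loop.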
I think that UIO should be designed such that a crashing mapper cannot crash UIO. I think it would be acceptable for a crashing mapper to crash UIO if mappers were written as shared objects (.so) that are dynamically loaded into UIO's memory space, kind of like a kernel module getting loaded into the kernel. That would greatly increase performance at the cost of making mappers harder to write and allowing one of them to bring down the whole house of cards. As long as we do not make the tradeoff of allowing mappers to enter UIO's memory space, crashing mappers should not crash UIO.
I've written a small benchmark with python-evdev to check how fast my program evsieve can grab and mirror an input device 750 times:

```python
#!/usr/bin/env python3
import asyncio
import evdev
import evdev.ecodes as e
import os
import subprocess as sp
import time

ALPHABET = list("abcdefghijklmnopqrstuvwxyz")
NUM_KEYS_TO_SEND = 200
TIME_BETWEEN_KEYS = 0.1

# Create a device that we will send events into.
capabilities = {
    e.EV_KEY: [
        e.ecodes["KEY_" + key.upper()]
        for key in ALPHABET
    ]
}
input_device = evdev.UInput(capabilities, name="virtual-keyboard")

INPUT_DEVICE_SYMLINK = "/dev/input/by-id/benchmark-0"
if os.path.islink(INPUT_DEVICE_SYMLINK):
    os.unlink(INPUT_DEVICE_SYMLINK)
sp.run(["ln", "-s", "--", input_device.device.path, INPUT_DEVICE_SYMLINK])

# Creates one layer that clones the previous layer's input device.
def create_layer(index: int):
    input_path = f"/dev/input/by-id/benchmark-{index}"
    output_path = f"/dev/input/by-id/benchmark-{index+1}"
    args = ["systemd-run", "--service-type=notify", "--collect", "evsieve"]
    args += ["--input", "grab", "persist=exit", input_path]
    args += ["--output", f"create-link={output_path}"]
    sp.run(args)

# Create all layers.
NUM_LAYERS = 750
for i in range(NUM_LAYERS):
    print(f"Creating device {i+1}/{NUM_LAYERS}")
    create_layer(i)

# Then open the device created by the last layer.
output_device = evdev.InputDevice(f"/dev/input/by-id/benchmark-{NUM_LAYERS}")
output_device.grab()

# Sends events to the input device, then closes the input device when done.
async def send_events_then_close(device):
    timestamps_of_sending_events = []
    for event_index in range(NUM_KEYS_TO_SEND):
        keycode = e.ecodes[f"KEY_{ALPHABET[event_index % len(ALPHABET)].upper()}"]
        timestamps_of_sending_events.append(time.time())
        device.write(e.EV_KEY, keycode, 1)
        device.syn()
        await asyncio.sleep(TIME_BETWEEN_KEYS / 2)
        timestamps_of_sending_events.append(time.time())
        device.write(e.EV_KEY, keycode, 0)
        device.syn()
        await asyncio.sleep(TIME_BETWEEN_KEYS / 2)
    # Give the other tasks some time to finish reading events before we exit.
    await asyncio.sleep(1.0)
    device.close()
    return timestamps_of_sending_events

# Record the times at which events become observable on the output device.
async def read_events(device):
    timestamps_of_reading_events = []
    try:
        async for event in device.async_read_loop():
            if event.type == e.EV_KEY:
                timestamps_of_reading_events.append(time.time())
    except OSError:
        return timestamps_of_reading_events

# Tell the user what the average difference between the input and output events is.
def present_report(timestamps_in, timestamps_out):
    total_delta = 0
    count = 0
    assert len(timestamps_in) == len(timestamps_out)
    # Measure the total difference between the time at which we wrote events
    # to the input device and the time the event showed up at the output
    # device after being mapped through NUM_LAYERS layers.
    for time_in, time_out in zip(timestamps_in, timestamps_out):
        total_delta += (time_out - time_in)
        count += 1
    MICROSECONDS_PER_SECOND = 1000000
    print("")
    print(f"Average delay of {round(total_delta/count/NUM_LAYERS * MICROSECONDS_PER_SECOND * 10)/10} microseconds per layer per event over {count} events and {NUM_LAYERS} layers.")

async def main():
    timestamps_in, timestamps_out = await asyncio.gather(
        send_events_then_close(input_device),
        read_events(output_device),
    )
    present_report(timestamps_in, timestamps_out)

asyncio.run(main())
```

On my system, it outputs:
There does not appear to be any worse-than-linear scaling as the chain of input devices becomes longer, at least as far as event latency is concerned. Maybe some other programs are poorly equipped to handle a large number of input devices; for example, libinput will probably need to open every single input device even if most of them are grabbed.

Also, another thing I ran into: there was a limit to how many layers I could use in the above benchmark. Specifically, 776 layers was the maximum my system could handle, and I'm not sure why that specific number. It does seem to be possible to create more UInput devices than said arbitrary limit, but those devices do not show up under /dev/input/event*, and as such are practically invisible to the rest of the system.

A maximum of ~1024 event devices is not an unreachable cap, but it is one that will in practice probably not be hit very often. Maybe the cap is arbitrary and could be raised by the kernel devs if there is a need to, or maybe there are more fundamental reasons for the cap, like a limited amount of device node numbers in some POSIX standard.
First, thanks for the feedback and the performance testing :). So, this is what I did so far with UIO (which is far less than I had hoped to do in that timeframe :/): I started writing a prototype for UIO. What I got so far is a daemon and a little test client. The client can request a specific input device from the server and gets a file descriptor from which it can read events. A good amount of time also went into rethinking details (multiple times). So far I learned...
Because of 2 & 3 I have my doubts that using uinput devices really is a good idea. So far the only real advantage compared to a daemon that facilitates shared memory between mappers, or simply forwards the events, seems to be that the kernel will take care of some things (e.g. filter out invalid events). Both of these approaches might be much simpler to implement. What do you think?
This is a good insight. I do want to emphasize that groups of matching in/out devices do not have to mean pairs of in/out devices. It is for example imaginable that some mapper wants to take a joystick as input and generate both a keyboard and a mouse device as output.
This is a good point. The thought of using uinput devices was that mappers could simply keep using the evdev interfaces they already use.

It's good to be aware that there is no need to stick to uinput devices and that we have other options available. At the same time, I haven't found a clearly better alternative yet:

**Loading the mappers as shared objects into the memory space of UIO**

This is also only a viable option for mappers that are written in system languages without a runtime (i.e. C, C++, Rust, or Zig). Even for such mappers, there would be additional development overhead because each mapper needs to be able to clean up all its own memory if the mapper gets unloaded. Even languages like Rust do AFAIK not provide such functionality automatically, because Rust does not drop static variables. (That does not mean that it would be impossible to write mappers in non-system languages like Python; we could offer multiple options like "either get loaded into UIO's memory space, or communicate over pipes", with mappers written in Python choosing the latter option.)

**Running mappers as separate processes with shared memory**

Fortunately, this seems to be achievable using POSIX semaphores, which are basically slightly generalized mutexes sharable between multiple processes. Mapper B can use `sem_wait` to block until Mapper A posts the semaphore. However, that still means that after Mapper A writes events to shared memory, we still need to wait until the kernel wakes up Mapper B before the processing continues. I don't have proof for this, but I believe that waiting for the scheduler to give a piece of time to an idle mapper process is the biggest source of latency. The kernel knows that Mapper B can be scheduled immediately because of a semaphore, but that is also true for other communication methods: if a virtual input device or pipe is used for communication between mappers, then the kernel would also know that Mapper B can run as soon as something is written to the input device or pipe.
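The shared-memory handoff described above can be sketched like this. This is my own illustration of the idea, not UIO's actual protocol; Python's `multiprocessing.Semaphore` wraps the same POSIX semaphore primitives (`sem_wait`/`sem_post`) under the hood.

```python
import multiprocessing as mp

# Sketch (assumed protocol, not UIO's actual one): Mapper A writes an
# event into shared memory and posts a semaphore; Mapper B sleeps in
# sem_wait (acquire) until the post arrives, then reads the event.

def mapper_b(sem, buf, result):
    sem.acquire()            # blocks until Mapper A posts the semaphore
    result.value = buf[0]    # consume the "event" A left in shared memory

sem = mp.Semaphore(0)
buf = mp.Array("i", 1)       # stand-in for a shared event buffer
result = mp.Value("i", 0)

b = mp.Process(target=mapper_b, args=(sem, buf, result))
b.start()

buf[0] = 42                  # Mapper A "writes an event"...
sem.release()                # ...and wakes Mapper B (sem_post)
b.join(timeout=5)
print(result.value)
```

The semaphore only makes Mapper B runnable; as noted above, the scheduler still decides when it actually runs, which is the same situation as with a pipe or virtual input device.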
I haven't benchmarked this, but it is very possible that waiting on a semaphore has the same latency as waiting on an input device or pipe. Also, the above method allows the kernel scheduler to immediately start the process that was waiting, but does not require it to do so. Even if Mapper A immediately calls `sem_post`, the scheduler may still let Mapper B sit idle for a while. According to a discussion I found, there may be some way to get something done using custom scheduling groups and Linux-specific scheduler APIs.

**Letting mappers communicate using pipes**

Furthermore, if there is only a single mapper running (the most common usecase!), then there would be significant overhead because UIO would have to translate an input device to a pipe, the mapper would translate a pipe of input events to a pipe of output events, and then UIO would have to translate the pipe of output events to a uinput device. That requires three read/write cycles, whereas only a single read/write cycle would be necessary if the mapper read directly from the real input device and wrote to the real output device.
Another option could be to not reuse event devices that were allocated to mappers that quit. E.g. if the chain is (A → B → C), and B quits to reduce the chain to (A → C), then we could give Mapper A a brand new output device and Mapper C a brand new input device, and close the devices that were previously used for the (A → B) and (B → C) transitions. This does have the disadvantage that the state of the input devices would be lost, e.g. if a user was pressing and holding the A key, then that key might get released if any mapper quits. This matters for the usecase of transient mappers, like some xdotool-like program typing a few keys and then quitting.
An additional bonus is that it is possible to ask the kernel about the current state of an event device: e.g. you can tell where an absolute axis currently is, or whether a certain key is pressed, before you receive any events related to them. This is handy when a device gets handed over to a different process. I suppose that that could also be achieved if all events were routed through the UIO daemon instead of using mapper-to-mapper communication, but it would be harder to ensure in the case of mapper-to-mapper communication. (A mapper could announce the current state of each device upon exit, but that requires cooperation of the mappers, and fails to work if the mapper crashes or is programmed to just call `exit()`.)

But the main reason to stick to uinput devices is just a baseline reluctance to invent a new event protocol when the current protocol isn't broken, unless there are clear advantages to the new protocol. If communication through shared memory can be shown to indeed have lower latency than communication through virtual input devices, then that would be a good reason to switch to a new protocol using shared memory.

I suppose that using uinput devices is somewhat broken in that it pollutes /dev/input with virtual devices. (If that is the only issue, it may be possible to convince the kernel devs to add an API for creating a uinput device that does not show up in /dev/input.)
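The "announce state on exit" cooperation mentioned above could look roughly like this. The class and method names are hypothetical illustrations, not part of any real UIO or evdev API; the tracked key set mirrors what the kernel would report via the `EVIOCGKEY` ioctl for a real event device.

```python
import json

# Hypothetical sketch of a cooperative mapper tracking and announcing
# device state on exit. DeviceState/track_event/snapshot are made-up
# names for illustration only.

class DeviceState:
    """Tracks which keys are currently down, mirroring what the kernel
    reports for a real event device (EVIOCGKEY)."""
    EV_KEY = 0x01  # event type constant from <linux/input-event-codes.h>

    def __init__(self):
        self.pressed = set()

    def track_event(self, ev_type, code, value):
        if ev_type != self.EV_KEY:
            return
        if value:            # key down (or autorepeat)
            self.pressed.add(code)
        else:                # key up
            self.pressed.discard(code)

    def snapshot(self):
        # What a cooperative mapper could hand to UIO before quitting,
        # so the next mapper in the chain can inherit the key state.
        return json.dumps(sorted(self.pressed))

state = DeviceState()
state.track_event(0x01, 30, 1)   # KEY_A down
state.track_event(0x01, 48, 1)   # KEY_B down
state.track_event(0x01, 48, 0)   # KEY_B up
print(state.snapshot())  # → "[30]"
```

As the comment notes, this only works if every mapper cooperates; a crashing mapper takes its state with it, which is exactly the failure mode uinput devices avoid because the kernel keeps the state.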
Yes, but...
I think this one has too many pitfalls for now. Maybe it could be implemented as an option later.
I had not heard of POSIX semaphores yet - thanks for bringing them up.
I wonder how much of the initial state we actually need to handle (assuming you mean things like which keys are pressed and which LEDs are on). This might be a good candidate for not being necessary for the first release, but I need to think about it some more. Other than that, some custom protocol is necessary anyway; it would be nice to keep it simple, of course. I think what actually bothers me about uinput the most right now is the need to ask processes to switch to another device. This means
In between, new devices need to be created or destroyed. Of course, having shared memory directly between mappers might lead to similar needs. If everything between the real device and the virtual output that is read by Wayland is done by UIO simply forwarding events to the next mapper (either by Unix pipes or shared memory between mapper and UIO), any such change would just be an adjustment of inner state in UIO.

As for xdotool and similar tools - I think these simply want to insert something that no other mapper cares about just before Wayland. So why not just reserve some independent output devices for them?

But let's get back to the "what's best to do now": my plan was (and still is) to provide a library that wraps any communication to UIO. This has one major advantage: whatever approach is selected, ideally it should be possible to implement the others later and maybe even support a mix of all of them without changing the client code. In the case of a single mapper, UIO could be configured so that the mapper simply gets direct access to the input & output devices. So what I currently guess to be the best plan is:
The big question (2): what is the easiest solution to implement? My guess is UIO passing events between mappers, using Unix sockets for communication. I also plan to put the code on GitHub soon, but need to think a bit about licensing (probably LGPL).
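A Unix socket can do more than carry serialized events: it can also hand a mapper a real file descriptor via `SCM_RIGHTS`, which is presumably how the prototype's "client requests a device, gets a file descriptor" flow works. The sketch below is my assumption of such a handoff, not the prototype's actual protocol; a pipe stands in for the device, and `socket.send_fds`/`socket.recv_fds` require Python 3.9+ on a Unix platform.

```python
import os
import socket

# Sketch (assumed protocol): a UIO-like daemon hands a device file
# descriptor to a mapper over a Unix socket using SCM_RIGHTS.
# The "device" here is just a pipe; "keyboard0" is a made-up name.

daemon_sock, mapper_sock = socket.socketpair(socket.AF_UNIX)

# Daemon side: open the "device" and pass its read fd to the mapper.
device_read_fd, device_write_fd = os.pipe()
socket.send_fds(daemon_sock, [b"keyboard0"], [device_read_fd])

# Mapper side: receive the fd and read events from it directly,
# without the daemon sitting in the data path.
msg, fds, flags, addr = socket.recv_fds(mapper_sock, 1024, 1)
os.write(device_write_fd, b"EV_KEY KEY_A 1")   # simulate an event
event = os.read(fds[0], 64)
print(msg, event)
```

The appeal of this design is that the socket only carries control messages; once the fd is transferred, events flow directly between the kernel and the mapper.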
UInput Orchestrator (UIO)
What is the idea?
A daemon that manages connections between mappers by creating, assigning and re-assigning multiple uinput devices, but could be extended to something else later. This should result in a stable path where multiple input mappers can process an input device in a deterministic order.
Mappers connect through a few functions in a library and can request file descriptors to matching (virtual) input devices. UIO's job is to ensure that
Disclaimer: this is an early draft and I have not done enough research to be sure that it is technically feasible (or even possible).
Let's start with a few diagrams.
The first is about order
Each mapper has contexts, identified by its path (abbreviated here to M#) and role. Roles can be requested by the mapper or configured by the user. If configured by the user, the mapper can request available roles from UIO.
If a mapper wants to create a context, a GUI asks the user to confirm. "New" is where new contexts pop up by default.
All contexts are specific to an input device (although input devices could be grouped for easier configuration).
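The ordering and assignment described above can be modeled with a small piece of bookkeeping. This is a hypothetical sketch of UIO's core data structure; the role names, mapper ids, and `/dev/uio/...` paths are all made-up illustrations.

```python
# Hypothetical model of UIO's bookkeeping: mappers register with a role,
# and the daemon assigns each one a deterministic slot in the chain of
# virtual devices. All names and paths here are illustrative only.

ROLE_ORDER = ["remap", "macro", "layout"]  # user-configurable priority

class Chain:
    def __init__(self):
        self.mappers = []  # list of (role, mapper_id) in chain order

    def register(self, mapper_id, role):
        self.mappers.append((role, mapper_id))
        # Deterministic order: sort by role priority, then by id, so the
        # chain is the same regardless of mapper startup order.
        self.mappers.sort(key=lambda m: (ROLE_ORDER.index(m[0]), m[1]))

    def devices_for(self, mapper_id):
        # Slot i reads from virtual device i and writes to device i+1.
        i = [m[1] for m in self.mappers].index(mapper_id)
        return (f"/dev/uio/virtual{i}", f"/dev/uio/virtual{i+1}")

chain = Chain()
chain.register("evsieve", "remap")
chain.register("macro-tool", "macro")
print(chain.devices_for("evsieve"))
print(chain.devices_for("macro-tool"))
```

Because the order is derived from roles rather than startup order, the chain stays stable across reboots, and a GUI reordering roles only needs to update `ROLE_ORDER` and reassign devices.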
Startup and first events
I hope this is somewhat clear..
UIO makes sure there is a chain of (for now!) uinput devices. It can open and create input devices, then share the FDs with mappers through UIOInput/UIOOutput. Virtual devices can be kept open as long as it deems necessary (e.g. for short-lived scripts, or for a short while after a mapper exits in case it's just restarting).
I hope it is possible to manage access rights to the virtual devices in a safe and stable way.
UIOInput and UIOOutput offer transparent read/write functions. My plan is to use uinput for now, but this may be extended and configured to support other ways of communication between mappers like a direct shared buffer (for performance) or one that is managed by UIO and keeps state of all keys (lower performance, but safer handling of some cases).
`read_evdev()` means that it returns the event as evdev would. We could add transformations to libinput structs, etc. later.

Window change
We may have options to handle cases where a keycode changes while it's pressed. But I'm not sure how/where to do that yet.
Advantages
Disadvantages (for now)
Some open questions
Implementation details
UIOOutputRequirements - a struct holding the parameters by which a fitting output is chosen.