
Call for possible collaboration #2

kermitfrog opened this issue Dec 2, 2022 · 78 comments

@kermitfrog
Owner

Hi,

I'm the developer of a tool called inputMangler, which transforms input events on linux.
After a few years of other priorities I want to continue development (well.. rewrite it from scratch actually..).
As I like to avoid duplicate work, I had a look around the net to see if someone else had started a project like mine. I found a few that at least do something similar and, if you're mentioned at the end of this post, one of them is yours.

While all those projects seem to have more or less different goals and approaches, there still might be enough common ground for collaboration.
So this thread is about exploring possibilities to work together.

In the next post, I will write an overview of my goals. I invite everyone interested to do the same.

Afterwards we can compare those and discuss if it would make sense to
• put some base code in a common library
• merge projects (may be unlikely, but .. maybe)
• just share experience on strange input-related problems ;D

Links to the projects:
https://github.com/kermitfrog/inputmangler
https://github.com/sezanzeb/input-remapper
https://github.com/samvel1024/kbct
https://github.com/shiro/map2
https://github.com/rvaiya/keyd
https://github.com/snyball/Hawck
https://github.com/KarsMulder/evsieve

And the people that I hope will have a look at this after receiving a notification for being mentioned:
@sezanzeb, @samvel1024, @shiro, @rvaiya, @snyball, @KarsMulder

@kermitfrog
Owner Author

Goal overview of inputmangler

Working in current version

  • direct remapping of linux input events (keys, mouse wheel, etc.) for multiple devices
  • hierarchy of presets, which activate depending on window class and window title (no need for manual change of presets)
  • support for triggering combos (e.g. shift-a), macros, mouse wheel acceleration
  • dbus interface (change preset, update config, create event, print current window information)
  • localized key names (kind of - they're actually custom configured)

Planned

  • Easy to use UI
  • system service which uses something like policykit to determine the current user
  • plugins

Nice to have if feasible

  • handle more complex input like key combinations
  • possibility to run external commands or send dbus events on certain conditions
  • config format that can easily be edited manually. This was a priority before and is working in the current version, but I would change it in favor of a cleaner format.

Frame

  • linux first
  • focus on reliability, performance and security
    • rewrite in Rust
    • experimental features that are in conflict with above requirements:
      • have to be explicitly allowed in service config
      • have to be explicitly activated by the user
      • trigger a warning in UI if activated

@sezanzeb

sezanzeb commented Dec 2, 2022

Have you considered contributing to the existing ones?

I'll probably invest my time into maintaining the 2.0 (beta right now) release coming in February

@kermitfrog
Owner Author

Have you considered contributing to the existing ones?

I am still considering that, and it's part of what this thread is about.
We all have certain ideas about what the program should do and how best to achieve that. So the question is: is there a project where I can realize my goals by contributing (this would clearly be the preferred way), or are the projects' goals incompatible with mine?

Project descriptions say something about the current state, but little about what is planned.

Having everyone summarize their goals and priorities would help a lot to clear this up.

@sezanzeb

sezanzeb commented Dec 2, 2022

Ok

For input-remapper, the current goal is to finish up 2.0. The current work is happening on the beta branch: https://github.com/sezanzeb/input-remapper/tree/beta. After that, pretty much any input can be mapped to anything, for example mouse movements to joysticks. It will feature an overhaul of the GUI to support all that without editing configs. After the release, people might discover bugs, since a lot of new stuff will be released.

See sezanzeb/input-remapper#177 for information about contributing, and https://github.com/sezanzeb/input-remapper/blob/beta/readme/development.md for some technical details.

Works:

  • direct remapping of linux input events (keys, mouse wheel, etc.) for multiple devices
  • support for triggering combos (e.g. shift-a), macros, mouse wheel acceleration
  • dbus interface (change preset)
  • system service which uses something like policykit to determine the current user
  • handle more complex input like key combinations
  • config format that can easily be edited manually. This was a priority before and is working in the current version, but I would change it in favor of a cleaner format.

reliability

If input-remapper doesn't work, then it is usually because something is fundamentally broken or impossible as of now. But it seems to be quite stable during operation. There are tons of automated tests.

  • Easy to use UI

I like to think it is

  • linux first

input-remapper will probably never support anything other than linux

Somewhat works:

  • hierarchy of presets, which activate depending on window class and window title (no need for manual change of presets)

Via third-party software: https://github.com/DreadPirateLynx/input-remapper-xautopresets. This needs to be individual for each desktop environment. There is no solution that works for all Wayland DEs. It's easy to do in X11, apparently.

  • localized key names (kind of - they're actually custom configured)

This works for X11, gnome on wayland, plasma on wayland, but other DEs that run on wayland may not support it properly. Input-remapper has to rely on using xmodmap -pke to get that information, which is part of xorg. Right now this is not causing trouble.

performance

Not causing any issues, but CPU usage can go up to 5% on my computer during usage (on a single core). input-remapper-service has never been profiled properly, there might be potential for optimization.

security

Key logging is possible for a few minutes while the GUI is open. There is no way around that, because information has to go from a privileged service to the unprivileged GUI via a pipe in order to record input. Other than that, I don't think input-remapper is leaking input anywhere during normal operation.

Doesn't work:

  • plugins
  • possibility to run external commands or send dbus events on certain conditions

Because the daemon runs as root, running commands from mappings is a security problem, and sandboxing it properly is challenging. I'd like to avoid those things. Running external commands is often possible via the DE's settings, and that is probably sufficient for most users.

  • dbus interface (update config, create event, print current window information)

Updating the config is done via the GUI, which just writes to a JSON file.

@kermitfrog
Owner Author

Thanks for the info :)

Easy to use UI

I like to think it is

I tried your beta and think the UI has a solid concept and is well done (although some polish is needed - which is to be expected in beta).
However, it follows a different logic than I would like (Device > Preset > Mapping vs. Preset > Device > Mapping).

I find defining output combos difficult though. Autocompletion is a great idea, but recording the output sequence should be better in most cases.

Not causing any issues, but CPU usage can go up to 5% on my computer during usage (on a single core).

For comparison, I tried it on my computer and it goes up to 9% for mouse movement and 12% with a SpaceMouse, compared to 1.3% / 2% with inputmangler, so there is clearly room for improvement.
Are you using Cython?

@sezanzeb

sezanzeb commented Dec 5, 2022

Are you using Cython?

Yes, not much difference with PyPy once the JIT compilation has started to optimize it

I find defining output combos difficult though. Autocompletion is a great idea, but recording the output sequence should be better in most cases.

Have you seen the information at the bottom of the output editor?

[screenshot: recording hint at the bottom of the output editor]

It was added for that purpose. If there is no device available that can record the output the user wants, it might be difficult to set certain mappings.

@sezanzeb

sezanzeb commented Dec 5, 2022

although some polish is needed

You are very welcome to tweak it in Glade and to make a PR

Device > Preset > Mapping

and also to create a new issue to discuss this. Showing how the GUI would have to change, and explaining how the workflow of recording input would change would be helpful there :)

@KarsMulder

My two cents: I think that the hardest part of a keymapper project is actually not the implementation, but the design.

If the user wanted to have full control over how input maps to output, then there is already python-evdev for that. The disadvantage of python-evdev is that it requires some boilerplate and it is difficult to write scripts that do not suffer from many different edge cases.

(Particularly, before I started on evsieve, I had about two dozen python scripts for different things, and regularly observed that writing a script that did thing A was relatively simple, writing a script that did thing B was relatively simple, but writing one that did both A and B was really difficult due to edge cases introduced by their interaction. Relatedly, the big time sink for adding new features to evsieve is not figuring out some way to implement it, but deciding on how that feature should interact with the other features that are already there, and figuring out what would be the most sensible behaviour for every edge case that could come up.)
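
To illustrate the kind of boilerplate involved, a minimal python-evdev passthrough script looks roughly like this (device path and key choice are placeholders):

```python
# Minimal python-evdev remapper (illustrative sketch; adjust the device path).
from evdev import InputDevice, UInput, ecodes

dev = InputDevice('/dev/input/event0')
dev.grab()  # exclusive access: the rest of the system no longer sees this device
ui = UInput.from_device(dev)  # virtual output device with the same capabilities

SWAP = {ecodes.KEY_CAPSLOCK: ecodes.KEY_ESC, ecodes.KEY_ESC: ecodes.KEY_CAPSLOCK}

for event in dev.read_loop():
    if event.type == ecodes.EV_KEY and event.code in SWAP:
        ui.write(event.type, SWAP[event.code], event.value)
    else:
        ui.write_event(event)  # pass everything else through, including EV_SYN
```

Even this simple version already has edge cases: a key that was held down before the grab, for example, stays pressed from the system's point of view.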

Several projects have started to search for a higher-level way to describe how to transform events. These higher-level configurations tend to make simple things easier but difficult things harder or impossible. The big question is how flexible you want your configuration language to be: if your configuration is too simple, many things users might want become impossible. If it is too complex, it ceases to offer much advantage over just writing a python-evdev script.

Many different projects have struck the balance between simplicity of configuration and versatility at different points. Before you start working on implementation details, I think it is important to first figure out exactly which kinds of transformations you intend to support and how you intend to present them in a user-friendly way to the user.

Since we can't beat python-evdev on versatility, we need to beat it on ease of use and user-friendliness. Having a user-friendly interface for your targeted level of versatility is where the value of keymapping programs lies.

In particular,

Nice to have if feasable

  • handle more complex input like key combinations

I think that whether, how, and which "complex input" you intend to support—along with how you intend to present that configurability to the user—is a fundamental question that needs to be considered before anything else, rather than treated as an afterthought. The answer to this question will impact just about every other part of the development process.

@sezanzeb

sezanzeb commented Dec 6, 2022

In our case it would be the "mapping handler" architecture, which is like a pipeline composed of multiple handlers that can do different things. As far as we know, it is finished on the beta branch. We'll have to wait and see if someone raises issues about certain things not being possible. It allows, for example, combining mouse movements with button clicks to produce some other output.
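
As a toy sketch of the pipeline idea (not input-remapper's actual API; the event encoding and handler here are made up):

```python
# Toy pipeline: each handler consumes one event and returns zero or more events.
# Handlers that drop an event simply return an empty list.
from typing import Callable, Iterable

Event = tuple[int, int, int]  # (type, code, value), stand-in for evdev events
Handler = Callable[[Event], Iterable[Event]]

def run_pipeline(handlers: list[Handler], event: Event) -> list[Event]:
    events = [event]
    for handler in handlers:
        # Feed every event produced so far into the next handler.
        events = [out for e in events for out in handler(e)]
    return events

# Example handler: map BTN_LEFT (0x110) presses to KEY_A (30).
def left_click_to_a(event: Event) -> list[Event]:
    etype, code, value = event
    return [(etype, 30, value)] if code == 0x110 else [event]
```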

@kermitfrog
Owner Author

@sezanzeb

although some polish is needed

You are very welcome to tweak it in Glade and to make a PR

Learning GTK / Glade is out of scope for me, but I'll create issues so you know what I mean.

Device > Preset > Mapping

and also to create a new issue to discuss this. Showing how the GUI would have to change, and explaining how the workflow of recording input would change would be helpful there :)

My plans for the UI are incomplete, but it will involve a TreeView to represent the hierarchy (group -> [subgroups ..] -> window -> title), while mappings for all devices for that preset should be visible in the same view.
There is also the matter of "can be made to work" vs. "works well". If I were to transfer my current inputmangler configuration to input-remapper (using xautopresets), I'd probably end up with the configuration spread over more than a hundred files. Also, in inputmangler mappings can be inherited - I don't think that can currently be represented in input-remapper.
So the config format would have to change to make it work well.
I have to think a bit more about it..

Speaking of UI - I'm currently learning QML/Kirigami and have plenty of experience with the rest of Qt (mostly in C++ though), so I might help a bit with your Qt port. More by answering questions though, as it's not a high priority for me right now.

@KarsMulder

My two cents: I think that the hardest part of a keymapper project is actually not the implementation, but the design.

Yeah, I totally agree. One thing I'd like to explore here is which projects have (partially) compatible designs.

Since we can't beat python-evdev on versatiliy, we need to beat it on ease-of-use and user-friendliness.

Yep. But don't forget about performance. I don't like having tools running in the background that use up more resources than they need.

Do any of you have detailed documentation describing your project's design?

@snyball

snyball commented Dec 6, 2022

If I were to create Hawck from scratch, this is the architecture I'd probably go with:

A single input-capture-redirect service with a small custom sandboxed VM. It runs as a user with access to input, and accepts literally any piece of code given to it on a UNIX socket from the desktop user. This is safe because the VM cannot interact with the outside world, which was rarely needed in practice anyway. And similar to @kermitfrog, I'd want to write this new version in Rust rather than C++.

The system should have access not just to keyboard/mouse/controller input but also to many xdg-desktop-portal extensions, preferably the portable ones, and should include some wm-specific functionality that doesn't exist portably for Wayland compositors right now (like the currently focused window). Also a random number generator, tty-detection, open-in-browser, etc.

As for launching things, I think we could provide functionality for launching .desktop files, but only from /usr/share (and never using the user's $PATH or really any of their env) without thwarting the Wayland security model.

Then any GUI-based thing can just talk to this input service, which should be flexible enough to do whatever one of those GUIs might want to do, and any text-based system can be compiled to the VM's bytecode.

I've been thinking about building this service just for fun, but it has ended up on the back-burner for a while because low-level Linux input stuff can be kinda frustrating due to a lack of documentation in a few areas.

If anyone else thinks this is a good idea, I'd write a spec for this architecture for reuse in other projects.

Of course, 99.9% of users are looking for one of a few select specific things like replacing caps-lock with ctrl/escape, but I still think a highly generic but safe and fast keyboard remapping system is a nice-to-have for the platform.

@jonasBoss

I started work on the InputRemapper beta branch a year ago in order to solve my personal needs (using a 3DConnexion SpaceMouse as a joystick), which somehow escalated into reinventing the whole architecture. That pretty much confirms the concerns raised by @KarsMulder:

I think that the hardest part of a keymapper project is actually not the implementation, but the design.

That said, I think the current approach can accomplish almost any reasonable remapping (mouse/joystick -> keyboard, and mouse <-> joystick) with support for combinations in each case, plus macro support to generate complex input sequences (I think it is possible to make keyboard -> joystick/mouse mappings with macros).

There are some limitations:

  • combinations across different physical devices are not possible, as they run in different processes
  • inputs need to be processed as whole frames (EV_SYN - EV_SYN), not on a per event basis

In general I think it is quite possible to design a common service which is simple to use for simple tasks, e.g. remapping n inputs to one output, but also provides an API for user scripts and more complex behavior. Implementing a good

dbus interface (change preset, update config, create event ...)

will make it possible to develop different GUIs or simple scripts, which may or may not maintain their own configurations and translate them for the service.

@sezanzeb

sezanzeb commented Dec 6, 2022

combinations across different physical devices are not possible, as they run in different processes

I sometimes wonder if this limitation can be avoided. Soon mappings will hold the information of their source device, so we could just as well record from all devices at once, I guess. Idk.

Python is slow. InputRemapper has quite some overhead. It was never properly profiled, so there might be potential to optimize it. But there is no arguing that Rust or C++ would be much faster.

For performance, if there really are no good optimizations possible, I'd not be very opposed to translating everything to a different language. It probably doesn't matter which one, because Python is pretty much one of the slowest widespread languages. Translating the tests could be a bit tricky sometimes, but they cover a lot of edge cases and past bug reports, so it would be really nice to be able to keep them. But anyway, if someone could do some profiling, that would be great.

@snyball's suggestion of a sandboxed process could be a good solution to support proper scripting in macros.

a small custom sandboxed VM

Also see sezanzeb/input-remapper#500. I thought Lua doesn't require a VM to sandbox it, or does it?

@KarsMulder

KarsMulder commented Dec 13, 2022

Do any of you have detailed documentation describing your project's design?

I do not have such documentation written other than the comments interspersed through the source code, but I can give a quick rundown of the major parts:

The input system
The input system uses the C library libevdev to open and read events from devices. It uses the Linux epoll syscalls to wait until an event becomes available on any of the devices.

I have benchmarked epoll vs poll and was not able to find any measurable difference in performance.

I have not benchmarked how the performance would compare against using LIBEVDEV_READ_FLAG_BLOCKING. I wasn't even aware that was possible when I started writing, and at this point it would be too much hassle to implement it.

Argument parsing
The command line arguments are parsed by the arguments module in two steps: a parse() function turns the textual commands into structs, and an implement() function turns those structs into variants of the enum StreamEntry and enumerates which input/output devices need to be opened. Those StreamEntrys are kind of like an internal bytecode used to process events. Many of the arguments map to a single StreamEntry, but not necessarily so. Some similar arguments like --map and --copy are both mapped to the same StreamEntry, and some arguments are mapped to a combination of multiple StreamEntrys.

Event propagation
All StreamEntries are expected, if applicable, to define two methods similar to the following: apply_to_all(&[Event], &mut Vec<Event>) and apply_to_all_caps(&[Capability], &mut Vec<Capability>). The first function takes a vector of events as input and should write whatever those events map to into the output vector. Events that the entry does not interact with should be written to the output vector as-is. An entry can drop events by not writing them to the output vector. Each entry is supposed to preserve the order of the events handed to it.

At first glance, you may think that this use of out-pointers looks like a bad practice originating from the days of C, and that modern programs should just return Vec<Event> instead. However, I have found that the use of out-pointers not only measurably increases performance, but surprisingly is also easier to work with. For example, if apply_to_all wants to delegate its task to another function (e.g. apply(Event, &mut Vec<Event>)), it can just pass its out-pointer to that function and things will magically go well, which is easier and more performant than having to do an .into_iter().flat_map(_) every time the task gets delegated. Furthermore, most of the reasons that out-pointers are a pain to work with in C are avoided in Rust, since there are no buffer overflows, no buffer size limits, and &mut clearly marks which variables will be modified.
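
A rough transliteration of the pattern (the real code is Rust; entry.matches and entry.transform are stand-ins): the output list is threaded through every call, so delegation is just passing the same list along.

```python
# Out-parameter style: append to a shared output list instead of returning
# a fresh list at every level of delegation.
def apply_to_all(entry, events, output):
    for event in events:
        apply(entry, event, output)  # delegation: hand the out-parameter along

def apply(entry, event, output):
    if entry.matches(event):                   # stand-in predicate
        output.extend(entry.transform(event))  # one event may map to several
    else:
        output.append(event)                   # untouched events pass through as-is
    # dropping an event = simply not appending it
```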

That said, in hindsight I think that processing multiple events at the same time was a bad design decision that is making some new arguments (most importantly, --hook --withhold) a pain to implement. I would redesign this internal model if that didn't break backwards compatibility in some edge cases.

The apply_to_all_caps function works similarly to the apply_to_all function, except that instead of events, it works with capabilities: it takes a list of all events that might possibly reach this StreamEntry and should generate a list of all events that it could possibly emit on the basis of that. This is how evsieve can automatically decide which capabilities its output devices should have.

The output system
First the input devices get opened, their capabilities get read, their capabilities get propagated through the stream, and then output devices are created based on which capabilities come out at the end of the stream.

(Also, if an input device marked with --persist=reopen disconnects and reconnects and then turns out to have more capabilities, the capabilities are propagated again. If it turns out that some output device now needs more capabilities, that output device is destroyed and recreated with the new set of capabilities. I suppose nobody actually needs this, but I am picky about having evsieve work correctly in every single situation.)

Threading structure
There is one main thread which does the event handling and uses epoll to wait for new events. New events are read from an input device, sent through all StreamEntrys, written to the right output device, and then we wait for epoll() again.

If needed, some additional background threads may be spawned for tasks that I do not want to delay event handling (i.e. garbage-collecting subprocesses that were spawned using --hook exec-shell= and monitoring the filesystem to see if previously disconnected event devices become available again in case of --input persist=reopen). These threads communicate with the main thread using an internal UNIX pipe that is also monitored using epoll.

The code is written synchronously (i.e. without using the async feature), for two reasons: (1) in a previous development version that was based on python-evdev before I rewrote it in Rust, I found that using epoll to wait for events had half the latency of using Python's async, and (2) at the time, I heard that the Rust async ecosystem still had several rough edges. I have not benchmarked whether the Rust async feature has the same performance overhead as the Python one, but in the end I think it was the right decision to write synchronous code, because I cannot imagine the codebase becoming cleaner if async was involved.
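
In Python terms (python-evdev exposes each device's file descriptor, so it plugs directly into select.epoll; the paths are placeholders), the single-threaded loop looks roughly like this:

```python
# Single-threaded event loop over several devices using epoll.
import select
from evdev import InputDevice

devices = {dev.fd: dev for dev in (InputDevice(p) for p in
           ('/dev/input/event0', '/dev/input/event1'))}

epoll = select.epoll()
for fd in devices:
    epoll.register(fd, select.EPOLLIN)

while True:
    for fd, _mask in epoll.poll():        # block until any device has events
        for event in devices[fd].read():  # drain everything that is ready
            print(event)                  # process/map/write the event here
```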

@kermitfrog
Owner Author

Hm.. I sense wide agreement on Rust - no real surprise here :)

Maybe I should write a bit about inputmangler's current architecture (which isn't exactly how I would do it now):

  • The whole process runs in user-space and device access is managed through group permissions, which are set by udev. The important security concept is not to remap your main keyboard (which I never really needed to do, so that's not a problem).
  • The configuration is split into 2 sections (actually 3, but I'm omitting the mappings part, as it only concerns how events are configured in the other sections).
    • The first section defines which input events on which devices are available for remapping in the second section.
    • The second section defines the generated events for those input events per window/windowtitle.
  • uinput devices are created to be able to generate all possible events that are configured in section 2.
  • For each input device there is a thread waiting for input. When input is read, the thread checks whether it matches an event configured in section 1. If so, the corresponding event for the currently active window configuration (aka preset) is generated. Otherwise, the event is passed on.
  • There are different types of generated events (single event transformations, macros, wheel acceleration, ..), so each can be generated with a minimal code path.
    • e.g: single events from keyboard to keyboard just modify the input code, then pass the event on.

Things I would like to change:

  • use a system service instead of group permissions.

Things I would like to keep:

  • the configuration logic. This does not have to restrict the backend, but would strongly influence how I'd design a GUI.
  • generate events with minimal code path.
  • no need to enter a password for config changes. Not sure if this is possible without compromising security if the main keyboard can be read, though.

@snyball
A VM sounds interesting for the purpose of writing complex macros.
The question is: can it be done without compromising performance for simple use cases?

literally any piece of code given to it on a UNIX socket from the desktop user.

I assume you mean that the user process passes code to the service, which then executes it there on a given event. Not that events are passed from system space to user space, which then sends something back to system space. Right?

@jonasBoss

.. using a 3DConnexion SpaceMouse as Joystick ..

I sometimes wonder how many people use these things for gaming, compared to those who use them for their intended purpose of 3D-Modelling..

Inputs need to be processed as whole frames (EV_SYN - EV_SYN), not on a per event basis

Inputmangler has the same problem. Doing this per event has worked perfectly fine for me for a long time. But recently, mouse wheels have started sending normal wheel events alongside hi-resolution events, causing double scroll events.
I worked around this by suppressing the hi-res events, which has only caused problems in QtCreator so far.
But this definitely needs to be taken care of.

@KarsMulder
Thanks for the detailed description. I think I'm going to have a closer look at your code when I have more time.

The input system uses the C library libevdev to open and read events from devices.

I wonder if libevdev causes any measurable overhead compared to direct ioctl calls / device read. This would be an interesting thing to profile.

@pallaswept

Of course, 99.9% of users are looking for one of a few select specific things like replacing caps-lock with ctrl/escape

Since a lot of relevant people are here, this might be a good place to discuss this.

I'd agree with the above-quoted assertion that 99.9% of users just want to do that one thing and be done with it, but that's because the roughly 25% of users who need more (there's a lot of us crippled dudes around) just can't use linux, so they don't. Back in windows-land, it's not even a bat of an eyelid to be running 5 or 6 input handling tools like this simultaneously. Nobody talks about it because it's normal. In linux-land, nobody talks about it because it's impossible. I mistakenly thought that problem had been solved, moved back to linux, and found out I was wrong. It's a physically painful mistake, but I'm too far in to go back to windows now, so I want to do what I can to get this sorted.

Since X doesn't support a lot of video features I need, I had been waiting for Wayland tools to mature so that I could do all of the input mangling I need, which I could do in Windows. I kept my ear to the ground, and over time I heard about many new projects that were wayland-compatible replacements for existing X tools I used to use in linux. xdotool gave birth to ydotool, some KWin shortcut features offered the ability to bind keys to scripts (that's AutoHotKey taken care of), and finally, the most important one for me, mouse-actions came along to replace easystroke(X)/StrokesPlus(Windows) for mouse gestures. So I figured it was a safe time to jump ship back to Linux (I can't stand Windows, so this was exciting for me!)

I need to rebind and disable keys and key-combos, bind key combos to external commands, adjust analog input (joystick) sensitivity curves, re-map mouse buttons, map foot switches to scripts, and mouse gestures are an absolute MUST. Why? Because I'm physically disabled. So all these accessibility tools aren't just "nice-to-haves", they're "must-haves". And each of the presently available tools on linux/wayland works fantastically. But once I tried to use more than one, I hit a wall, and it's a hard one. While everyone was talking about how the lack of Wayland replacements for classic tools like xdotool had been solved, nobody was talking about the fact that you can't use them all. Only one.

Pretty much (actually I think it's literally) every Wayland input device handler takes the same approach - go a layer lower in the stack than X11 did, and exclusively grab the evdev devices. It's a simple solution to the problem, but short-sighted in that you get to choose one and only one accessibility tool, because one effectively locks out all the others. It doesn't seem practical or realistic that any single tool should be the all-singing, all-dancing solution to every input device accessibility requirement, so the thing that is really needed from all of you tagged in this thread is to find a way to get your tools to play nicely together.

I'm not really sure of the right way to go about resolving this issue, but I am sure it means that, at least in its present form, Wayland is an accessibility failure from the get-go. And I see a lot of people who should be involved in a conversation about this in this one thread, so I'd be interested to hear your thoughts. Because if you're going to discuss collaboration, this is the first thing that needs to be addressed. None of you could be expected to write a single tool that does everything, nor should the user be limited to that one tool, so finding a way to make them all work simultaneously is step 1 in collaborating (I mean, the word 'collaborating' literally means 'working together', and most of your apps won't work together 😄)

Since it's been almost a year, I'll do that ping again, apologies if this causes you any consternation: @sezanzeb @samvel1024 @shiro @rvaiya @snyball @KarsMulder @jersou

Speaking in terms of the solution to this problem... it strikes me that what's required here is a new layer between evdev and these applications, which would exclusively grab the evdev device as these apps do, and then allow these 'client' applications - rather than exclusively locking the devices - to subscribe to callbacks from the intermediary layer to handle input events, thus allowing a single input event to be handled by multiple applications. Perhaps there's a better way to deal with it, which is why I'm asking for your thoughts.

@sezanzeb

sezanzeb commented Nov 9, 2023

Pretty much (actually I think it's literally) every Wayland input device handler takes the same approach - go a layer lower in the stack than X11 did, and exclusively grab the evdev devices

a new layer between evdev and these applications

I like this idea

[diagram of the proposed intermediary layer]

This avoids grabbing, while still allowing applications to hide events from the desktop-environment. This way, multiple mapping-tools can map the same event to whatever they like.

Those new pipes that applications read from could be compatible with tools like python-evdev by behaving exactly like uinputs/InputDevices; they would just live at a different path, and they would ignore requests for grabbing. This would allow existing mapping tools to continue to work, as long as they discover those new devnodes.

The new layer has to wait for each mapping tool to report that it is done handling the event, and only then decide whether the event should be forwarded. It won't forward the event if one of the tools reported that it is suppressed.

If a service/layer like this is written, then please

  • dependency injection architecture
  • well commented and documented code
  • unittests
  • low-level language (probably Rust, I guess; beware though that I can't write code in any low-level language, and I'm not a very active programmer in my free time anymore)

@shiro

shiro commented Nov 9, 2023

I would like to see someone make a proof of concept for this to test performance; there is a lot of piping/polling going on, and I'm not sure how much latency this adds.

Maybe a wayland protocol would be a good place to put this, not sure if gnome/kde would pick it up though.

@sezanzeb

sezanzeb commented Nov 9, 2023

Given that no one ever complained about input-remapper adding too much latency, even though it's written in Python and has never seen any sort of optimization, I doubt it will be significant. But that is just my gut feeling.

@sezanzeb

sezanzeb commented Nov 9, 2023

I had to add to the proposal above that the new layer has to keep track of the tools that are reading, in order to wait for each of them to finish processing and thus know whether the event is suppressed. I don't know if this is possible. Do owners of pipes have a way of knowing which processes are reading from them?

@KarsMulder

KarsMulder commented Nov 9, 2023

I had to add to the proposal above that the new layer has to keep track of the tools that are reading, in order to wait for each of them to finish processing and thus know whether the event is suppressed. I don't know if this is possible. Do owners of pipes have a way of knowing which processes are reading from them?

I think it is not possible to accomplish the above with just pipes, because anything you write to a pipe can only be read by a single process anyway. Those "new readable pipes" would have to become Unix domain sockets instead. With sockets, it also becomes possible to track which processes are listening, as a nice side-effect.
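
A minimal sketch of why sockets help here (the socket path and event framing are assumptions): the broker accepts connections, so it always knows the full set of listeners and can fan each event out to all of them - something a pipe cannot do.

```python
# Socket-based broadcaster sketch: tracks listeners, fans out raw events.
import os, selectors, socket

SOCK_PATH = '/tmp/input-layer-keyboard-1.sock'  # hypothetical path
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen()
server.setblocking(False)

selector = selectors.DefaultSelector()
selector.register(server, selectors.EVENT_READ)
listeners = set()  # unlike a pipe, we know exactly who is connected

def broadcast(raw_event: bytes):
    for conn in set(listeners):
        try:
            conn.sendall(raw_event)
        except OSError:              # listener disconnected
            listeners.discard(conn)

while True:
    for key, _ in selector.select(timeout=1.0):
        if key.fileobj is server:
            conn, _addr = server.accept()
            listeners.add(conn)
    # ...read events from the real device here and call broadcast()...
```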

Would like to see someone make a proof of concept for this to test performance, lots of piping/polling going on, not sure how much latency this adds.

I haven't tried implementing the proposed scheme, but based on how quickly I've managed to get event round-trips to work in evsieve, I'd expect a latency of ~0.15ms when zero input remappers are in use (one round-trip; I assume most of that latency comes from waiting for the scheduler to give the program some CPU time), and at least ~0.45ms when one or more input remappers are in use (which involves three round-trips). An inefficient Python implementation takes ~0.5ms for a single round-trip.

Assuming you're gaming on a cutting-edge 240 Hz monitor, a latency of 0.45ms would mean there is about an 11% chance that an input event gets delayed by a single frame (0.45ms out of a ~4.17ms frame). That is an acceptable delay if you're actually using remappers.

For users not using remappers, however, I can imagine that any scheme proposing to add 0.15ms of latency to Wayland as a background service would receive more flak than dbus. Some people still don't accept that dbus adds enough value to be worth the couple of megabytes of memory it uses. If we want to go with the above scheme, I think it would greatly help adoption if it were a dynamic service that could be started on demand when the first program needs it, rather than something the operating system is expected to keep alive whether the user wants it or not.

The protocol
Note that it is not possible to just replay the evdev protocol as-is over a pipe, because event devices support more operations than just read(). For example, it is possible to query the capabilities of a device, query the current position of a joystick without having seen any event for it, and of course grab the device. We would need a new protocol that either works fundamentally differently from evdev, or encodes all actions that are possible in evdev over some bidirectional communication protocol.

Of course, libevdev (and python-evdev?) would need patches to be able to work on those sockets.

I personally think that the evdev protocol is a bit painful to work with. However, we must remember that the evdev protocol has been crafted by kernel developers who have seen every crazy input device hardware manufacturers have devised, and the evdev protocol has stood the test of time for quite a while now.

I would be skeptical about proposals to replace the input stack that the kernel has built up with a new protocol in a userspace daemon just to make keymapping possible.

Alternative solution: can't we solve this in the kernel instead?

It is easy to jump to the idea of writing another userspace daemon, because you can "just do it" and it does not require anyone's approval, but I wonder if our effort is better spent submitting patches to the kernel instead?

So far, a lot of event mappers for Wayland have decided that grabbing event devices and creating new event devices is a good idea. However, we're discussing creating an abstraction layer over them. This makes sense, because there are several drawbacks to the approach of creating new event devices. Off the top of my head, the big pain points are:

  1. It takes a while for the new device to be recognized by programs, which is painful for short-lived scripts that just want to send some keys and then quit.
  2. The new device does not take over the state of the old device. If the user was holding a key on the keyboard when that device got grabbed, the key will remain permanently pressed and cannot be released by a KEY_UP event on the virtual device.
  3. Any configuration options the user has applied to the old device are not taken over by the new device. For example, if the user changed the mouse acceleration of their physical mouse, they need to re-configure that acceleration for the virtual mouse.
  4. Event devices need to announce all event codes they can produce when they are created. When user keymapping scripts can be Turing-complete, it becomes impossible to reliably predict what those codes can be.
  5. Event devices are valid for the whole system rather than a single user, and therefore require root-level permissions.

The kernel folks have already been kind to the keymapping community by giving us tools like uinput and grabbing event devices. And looking at the above list, I think all except the last pain point could be fixed if we had an additional ioctl (say, EVIOCSHADOW) which did the following:

  • Create a shadow device identical to an opened event device (say, shadow-1 and keyboard-1 respectively.)
  • Redirect all events originating from the kernel that would be sent to keyboard-1 to shadow-1;
  • Give a file descriptor of shadow-1 to the program that issued EVIOCSHADOW, and not to any other part of the system;
  • Allow the program that issued EVIOCSHADOW to treat keyboard-1 as if it were an uinput device;

In other words, a kernel ioctl that makes it possible for a program to change the events on an event device without the rest of the system having to notice that event devices are getting created, grabbed, or destroyed.

It would solve issues 1, 2, and 3. Issue 4 would remain; solving it would require some kind of extension to the evdev protocol to allow devices to change their capabilities, but that might run into backwards-compatibility issues. Issue 5 would remain as well, but it is more of a theoretical issue, since most computers are single-user nowadays.

This way it would also make it easily possible to run 5 or 6 input handling tools simultaneously, since each tool can shadow the input device that was already shadowed by the previous tool, without those tools even needing to be aware that other tools are running as well.

Thinking about it, most of the pain points related to grabbing event devices for keymapping stem from the newly created uinput device being a distinct entity from the original device. If we could get a new ioctl that would allow us to sidestep that issue, about 60% of our problems would be solved without requiring a new userspace daemon.
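
To make it concrete, a mapper's main loop could look roughly like this. Everything below is hypothetical: EVIOCSHADOW does not exist today, and the request number is made up.

```python
# Hypothetical sketch only: EVIOCSHADOW is a proposed ioctl, not a real one.
import fcntl, os, struct

EVIOCSHADOW = 0x0  # made-up request number, for illustration

EVENT_FORMAT = 'llHHi'  # struct input_event on 64-bit Linux
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

kb_fd = os.open('/dev/input/event0', os.O_RDWR)
shadow_fd = fcntl.ioctl(kb_fd, EVIOCSHADOW)  # hypothetically returns the shadow fd

while True:
    raw = os.read(shadow_fd, EVENT_SIZE)  # events the system used to see on keyboard-1
    sec, usec, etype, code, value = struct.unpack(EVENT_FORMAT, raw)
    # ...remap here; dropping an event means simply not writing it back...
    os.write(kb_fd, struct.pack(EVENT_FORMAT, sec, usec, etype, code, value))
```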

@kermitfrog
Owner Author

[design-diagram]

I'm also worried that the whole extra layer might add too much latency and complexity. If we do that, it should ideally be optional, in the sense that it's only used when multiple tools are actually trying to grab the same device.

My gut feeling says that the kernel approach is probably the better idea, but I have to think about that some more...

As for issue number 4:

Event devices need to announce all event codes they can produce when they are created. When user keymapping scripts can be Turing-complete, it becomes impossible to reliably predict what those codes can be.

I don't think it's such a big problem. In the first version of inputMangler, I didn't know that uinput could do everything I needed, so I wrote my own kernel module which simply announced all the events that could make sense for that type of device - no matter whether those events were ever generated or not.
As far as I remember, the only real issue was that the capabilities determine which kind of device the system believes it to be. If you do it wrong, a virtual joystick might be recognised as a tablet. But it's not too hard to figure out.

SDL (and by extension Steam) seems to differentiate between joystick and controller by looking up the vendor/product id in a database first, then defaulting to controller if the device has 6 axes. So it might be good to convince the SDL devs to reserve certain ranges of product ids under vendor 0x0000 for certain types of virtual devices, to prevent issues (I've had enough of those with the SpaceMouse).

Of course, if we were to actually make one backend service that handles all possible input transformations, has great performance, and so on, all of that might not even be necessary... well... if only it were that easy..

Until any of this is implemented, maybe there is a workaround... the question is:

@pallaswept: do you need to have the same events processed by multiple tools that grab a device?

If, e.g., you just need tool A to process mouse movements and tool B to process its buttons, this might be solvable by splitting the events into 2 virtual devices (see the sketch at the end of this comment). I think evsieve is currently the only tool that supports this, so that would be the lowest layer. Then tool A could grab the movement device and tool B the button device.

There might be some problems with tools reading the virtual devices if all of them have the same vendor/product-id, but uinput allows that to be changed. @KarsMulder does evsieve support setting those ids?

It might also be necessary to unite those devices later, but I believe most of the tools here do that anyway.
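
A sketch of that splitting workaround using python-evdev (device path and names are placeholders; the vendor/product arguments show the ids being set, as mentioned above):

```python
# Split one grabbed mouse into a movement-only and a buttons-only virtual device.
from evdev import InputDevice, UInput, ecodes

mouse = InputDevice('/dev/input/event5')  # adjust path
mouse.grab()

caps = mouse.capabilities()
move_caps = {ecodes.EV_REL: caps.get(ecodes.EV_REL, [])}
button_caps = {ecodes.EV_KEY: caps.get(ecodes.EV_KEY, [])}

# Distinct vendor/product ids so tools can tell the two devices apart.
moves = UInput(move_caps, name='virtual-mouse-motion', vendor=0x0000, product=0x0001)
buttons = UInput(button_caps, name='virtual-mouse-buttons', vendor=0x0000, product=0x0002)

for event in mouse.read_loop():
    if event.type == ecodes.EV_REL:
        moves.write_event(event); moves.syn()      # syncing per event keeps the
    elif event.type == ecodes.EV_KEY:              # sketch short; forwarding EV_SYN
        buttons.write_event(event); buttons.syn()  # to both would be cleaner
```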

@sezanzeb

sezanzeb commented Nov 9, 2023

I think it is not possible to accomplish the above with just pipes, because anything you write to a pipe can only be read by a single process anyway. Those "new readable pipes" would have to become Unix domain sockets instead. With sockets, it also becomes possible to track which processes are listening, as a nice side-effect.

Thanks for the clarification

As far as I remember, the only real issue was that the capabilities determine which kind of device the system believes it to be. If you do it wrong, a virtual joystick might be recognised as a tablet. But it's not too hard to figure out.

Couldn't get it to work for a stylus, but yeah, it can usually be figured out somehow. I wish it were determined by some sort of enum value reported by the device instead.

Issue 4 would remain; solving it would require some kind of extension to the evdev protocol to allow devices to change their capabilities

If events contained that enum, the system could decide to treat an event as joystick movement while ignoring any device capabilities, couldn't it?

This way it also would make it easily possible to run 5 or 6 input handling tools simultaneously, since each tool can shadow the input device that was already shadowed by the previous tool without needing those tools to even be aware that there are other tools running as well.

[diagram of chained shadow devices]

@KarsMulder something like this? When the hardware reports "a", each shadowed device receives the event for "a", and each tool reads "a". What if each tool decides not to map this key and just forwards it - will "aa" be written to "Keyboard"?

@KarsMulder

KarsMulder commented Nov 10, 2023

Something like this. Imagine that the default setup is like this:

[diagram: default setup, no mappers]

(I suppose this is slightly oversimplified since the read() system call that Wayland makes needs to pass through the kernel as well, but anyway.)

A physical keyboard emits events to the kernel. The kernel sends those events to an event device keyboard-1. Wayland and other processes on the system can read those events.

Now suppose a program "Mapper #1" comes along which issues the hypothetical EVIOCSHADOW on keyboard-1. The kernel will then adjust the topology to become like the following:

[diagram: one mapper shadowing keyboard-1]

The kernel stops writing the events from the physical keyboard to keyboard-1. Instead, it writes them to shadow-1, an event device that is only accessible to Mapper #1 and no other part of the system. Mapper #1 gets a file descriptor for shadow-1, but shadow-1 does not show up in /dev/input or anywhere else. The state of shadow-1 is identical to the state of keyboard-1 at the time EVIOCSHADOW was issued, e.g. any keys that were pressed on keyboard-1 are also considered to be pressed on shadow-1.

The program Mapper #1 can now read events from shadow-1 like it can read them from any other event device. If Mapper #1 does nothing, then no events get written to keyboard-1 and the whole system loses access to the keyboard just as if it had been grabbed. Mapper #1 can write events to keyboard-1 like it can write events to any other uinput device. The events it writes to keyboard-1 can be read by Wayland and so on.

The mapper scripts do not explicitly announce that they want to drop any particular event, events can simply be dropped as consequence of a mapper script reading an event from a shadow device and then not writing that event to its output device.

This is basically the trick of "grab an input device and create another uinput device", except this whole process is invisible to Wayland. Wayland can just keep reading events from keyboard-1 as if nothing happened, whereas with the old method Wayland would have to notice that another input device was created and open it, without even knowing that this new input device was related to another device.

When another script, say Mapper #2 also issues EVIOCSHADOW on keyboard-1, the event chain becomes:

[diagram: two mappers chained via shadow devices]

Just like the events from the keyboard got redirected to shadow-1 when Mapper #1 issued EVIOCSHADOW, a second invocation of EVIOCSHADOW causes the events that Mapper #1 writes to be redirected to shadow-2. This means that all events from the physical keyboard first pass through Mapper #1, then through Mapper #2, and finally back to Wayland and the rest of the system.

@KarsMulder

There might be some problems with tools reading the virtual devices if all of them have the same vendor/product-id, but uinput allows that to be changed. @KarsMulder does evsieve support setting those ids?

It currently doesn't, because I actually wasn't even aware that event devices had vendor and product ids. I thought that was something that only existed at the USB-device level, but I guess I was wrong.

It doesn't seem like a difficult feature to add. I'll get around to it when I figure out what the CLI arguments should be.

(Should --output accept a clause like device-id=045e:082c or should that require two clauses like vendor-id=045e product-id=082c? The latter seems unnecessarily verbose, but the former gives the impression that vid:pid are the only two things that matter for a device ID and forgets about the bus number and version number. I suppose I'm going to need clauses like bus=3 and version=111 too, unless there is some standard format to make them fit in a single clause like device-id=3:045e:082c:111. Also, should bus and version number be specified in decimal or hexadecimal format? Usually you think about those things as decimal, but evtest reports them as hexadecimal and the vid:pid are hexadecimal too...)

@kermitfrog
Owner Author

Couldn't get it to work for a stylus, but yeah, it can be figured out somehow usually. I wish it was determined by some sort of enum value instead that is being reported by a device.
[..]
If events contain that enum, the system could decide to treat is as joystick movement, while ignoring any device capabilities, couldn't it?

An enum for the device type would be nice :)
But I wouldn't send it with every event. If I had to handle a specific device in my end-user application/GUI library, I'd want my handler to be able to rely on all events belonging to one device type, and to use a different handler for a different device class. An enum per event would just add bloat at multiple levels. Mixing such stuff is what we do ;)

Hm.. that makes me wonder if we could speed up input events on linux by truncating the timestamp to the final 16 bits. I'm not sure the rest is really needed anyway.

[..] When another script, say Mapper #2 also issues EVIOCSHADOW on keyboard-1, the event chain becomes: [..]

This would also reduce the system's number of virtual devices. We would still need them for events that don't fit into existing ones..

Some things to decide on:

  • does Mapper #2 see the original events, or those sent by Mapper #1?
    • if the latter: let's assume Mapper #1 translates a keyboard event to a mouse event.. does that affect Mapper #2?
      • do we inject the new event into a shadowed mouse if capabilities match, or always into a virtual one?

I'll get around to it when I figure out what the CLI arguments should be.

I'd use device-id=045e:082c, with a possible shorthand device-id=:082c when the vendor-id is 0000, as both are always needed to identify a device.
Hex seems the better choice -- all tools seem to report these ids in hex, and deviating from that would just make it harder for the user.

@KarsMulder

does Mapper #2 see the original events, or those sent by Mapper #1?

Those sent by Mapper #1. The effect is the same as if Mapper #1 created a virtual device shadow-2 which was subsequently grabbed by Mapper #2.

if the latter: let's assume Mapper #1 translates a keyboard event to a mouse event.. does that affect Mapper #2?

The shadow-* devices all must have the same capabilities as the original keyboard-1 device. Assuming that keyboard-1 didn't just happen to have an integrated mouse, it is not possible for Mapper #1 to write mouse events to keyboard-1.

When Mapper #2 starts and silently replaces keyboard-1 with shadow-2, this transition is supposed to be invisible to both Mapper #1 and to Wayland. As such, Mapper #1 can still not write mouse events to keyboard-1/shadow-2.

It would be possible for Mapper #1 to shadow another mouse device (or create a new virtual one) and write mouse events to that device.

Either way, Mapper #2 will not be able to observe any mouse events getting emitted by the keyboard device. If Mapper #2 does want to observe mouse events, it should listen to or shadow a mouse device as well.

do we inject the new event into a shadowed mouse if capabilities match or always into a virtual one?

I imagine that writing events to keyboard-1/shadow-2/whatever would follow the same rules as writing events to any other virtual device: events that do not match the capabilities of the virtual device get silently dropped. It is the job of the mapper script to ensure that it writes its events to devices that are capable of them.

@pallaswept

pallaswept commented Nov 10, 2023

This thread is pure gold so far and I want to thank you all sincerely for your input. I hoped but never imagined I'd have such a positive response, thanks so much.

@pallaswept: do you need to have the same events processed by multiple tools that grab a device?

Yes, it's pretty frequent. Just to make matters worse, it's also fairly common to need the same event (say, pressing the ctrl key) processed by multiple tools, from multiple devices. Like say, maybe one day I can't use my left hand, so I'll rebind a mouse button to a ctrl key, and I'll need the footswitch to read the ctrl keypress regardless of where it came from - the keyboard, some other device (bluetooth keyboard), the re-bound mouse button, on-screen keyboard, etc. - to modify the footswitch's behaviour, or I might use that same ctrl key to modify the behaviour of some other keybind in another tool. Just to give a curly example.

I'd echo everything that's been said ITT so far. A middle layer is the least reliant on outside support, but it does have shortcomings. I also feel like the kernel is the best place to be doing this, from a functional point of view. A Wayland protocol might be just as functional, but then there's a reliance on its implementation by every compositor, and it might take a very long time to become a reality, or just never happen. I do have similar fears about doing this in the kernel, though. I wonder how hard it would be to get the kernel maintainers interested in such a thing, enough that it could become a reality. Requiring a custom patched kernel would make it somewhat prohibitive. That being said, if it could be a kernel module, that would make things a lot simpler for the end-user.

And yeh, thanks again for this amazing conversation. Your input is priceless. Please forgive my lack of input, I'm mostly just trying to stay out of the way right now :)

@sezanzeb

sezanzeb commented Nov 19, 2023

"read this keymap, add a character あ corresponding to a currently unused scancode, and then serialize the result as a new XKB file"

Mapper A can associate character あ with the previously-unused scancode 250

I did that at some point, more or less: https://github.com/sezanzeb/input-remapper/blob/xkb/keymapper/injection/xkb.py

However, see xkbcommon/libxkbcommon#223 (comment):

Wayland does not implement a generic way for clients to change the keymapping; you'd have to work through environment-specific configuration API.


This does mean that each mapper should know ahead of time which keys it could possibly desire to generate

Which might be impossible with mapper scripts that allow sophisticated user-defined custom scripts

@KarsMulder

KarsMulder commented Nov 19, 2023

According to The Wayland Book [CC-BY-SA 4.0]:

Note that the server can send a new keymap at any time, and all future key events should be interpreted in that light.

So it seems like the protocol already expects clients to be able to deal with changes in the keymap. All we need is a new protocol to tell the compositor that a new keymap should be used.

@sezanzeb

sezanzeb commented Nov 20, 2023

"read this keymap, add a character あ corresponding to a currently unused scancode, and then serialize the result as a new XKB file"

So it seems like the protocol already expects clients to be able to deal with changes in the keymap. All we need is a new protocol to tell the compositor that a new keymap should be used.

That would be nice. Simple wayland input remapping utilities might already exist at this point if this were possible.

Regardless of how more sophisticated mapping tools can be made to work with wayland, the above might already be an improvement.

So, should we ask wayland developers to consider this? Mailing list idk?


The clients in turn are expected to individually link to libxkbcommon to turn those scancodes into characters.

Other events like joysticks are probably not passing through libxkbcommon? Because maybe one could hook a mapping script into libxkbcommon somehow. Maybe libxkbcommon could be modified to provide something like a devnode for reading scancodes and writing characters. But it's probably synchronous, and you can't really do anything funky with it (like writing multiple characters with a delay), can you?

@KarsMulder
Copy link

Regardless of how more sophisticated mapping tools can be made to work with wayland, the above might already be an improvement.

So, should we ask wayland developers to consider this? Mailing list idk?

According to the original draft image, the programs that are likely to change the keymap are also the programs that are likely to change the events themselves:

[image: original draft diagram showing the proposed chain of event-mapping and keymap-changing programs]

The programs that map the events need to be ordered in a consistent way, lest there is only some chance that your whole setup works after any given reboot. To that end, we need to figure out a way to order the event-mapping programs, and that same ordering system will probably be reused for ordering the keymap changes when multiple programs want to modify the keymap.

And, of course, we still need to figure out how to authenticate the programs that are or are not allowed to map events or change the keyboard layout. The Wayland devs don't want programs running in a sandbox to become able to keylog the whole system just because they're allowed to display stuff on screen.

The point is that the event-mapping protocol and the keymap-changing protocol will probably end up relying on some common basis. I think it is better to tackle the event-mapping problem and the keymap-changing problem at the same time than to rush one part of the solution, only to later discover that it doesn't interact nicely with the other half.

In case we end up giving up on finding an event-mapper protocol, then it may be a good time to propose an independent keymap-changing protocol.

@KarsMulder
Copy link

Maybe libxkbcommon can be modified to provide something like a devnode for reading scancodes and writing characters. But it's probably synchronous and you can't really do anything funky with it (like writing multiple characters with delay), isn't it?

libxkbcommon's API involves the programmer passing individual scancodes to functions like xkb_state_key_get_one_sym or xkb_state_key_get_utf8, and those functions return whatever corresponds to that specific scancode given the state of the keyboard. I don't think we should mess with those functions.
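For illustration, the typical usage looks roughly like this (a sketch; note that XKB keycodes are evdev scancodes plus 8):

#include <xkbcommon/xkbcommon.h>

/* Translate one evdev key event through an existing xkb_state. */
void on_key(struct xkb_state *state, uint32_t evdev_code, int pressed)
{
    xkb_keycode_t keycode = evdev_code + 8; /* historical offset */

    if (pressed) {
        /* Both lookups take a single scancode and return whatever the
         * current keymap and modifier state say it means right now. */
        xkb_keysym_t sym = xkb_state_key_get_one_sym(state, keycode);
        char utf8[8];
        xkb_state_key_get_utf8(state, keycode, utf8, sizeof utf8);
        (void)sym;
    }
    /* Modifier/group bookkeeping happens client-side, in this call: */
    xkb_state_update_key(state, keycode,
                         pressed ? XKB_KEY_DOWN : XKB_KEY_UP);
}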

@kermitfrog
Copy link
Owner Author

Wayland (..) would have to recognize that a unicode character is included in the event payload and use that, instead of looking into the keymap.

Considering the issue with special characters and application-preset-switching, I doubt that a solution that only works via evdev will be really satisfying for everyone. Would it make sense to open an issue on their repo and ask them what they think? https://gitlab.freedesktop.org/wayland/wayland

&

Mapping to keys not on the keyboard

I think the big question is where to inject unicode. Kernel-level makes no sense to me; XKB-level/libinput might work. Some things to think about:

  • when writing applications with Qt, I can also get the kernel-level keycode of the event.. what keycode would I get for a unicode character?
  • will such events cause problems for some applications?
  • how to deal with unicode extended grapheme clusters? (https://tonsky.me/blog/unicode/)

I think what we should maybe do first is find out how IMEs work and whether we can use them to write arbitrary characters.

Long story short, it seems like we need to escalate this no matter what, and I think the kernel is the place to go, because Wayland is definitely not.

I agree on kernel vs. wayland, but after reading all the new stuff by @KarsMulder, I wonder if we might need to contact the libinput developers as well.

Wayland protocol-level mapping vs Input Daemon

&

Getting a new Kernel API would certainly solve issues regarding to two evdev mapper scripts working together, but there are some problems that we're simply not getting solved on an evdev level:
[..]
And probably some more. Frankly, if we intend to keep our Wayland keymappers working on evdev level, we will never get close to the amount of input customization that Windows offers.

As far as I understand, the basic input stuff on Wayland is handled by libinput. Compositors are free to implement more complicated stuff, but that would lead to fragmentation and is therefore probably better left to another library or to the end-user applications. Otherwise some stuff will only work on certain DEs.

There is one big exception though: libinput does not handle joysticks. Games (and possibly other applications) usually just grab joysticks, bypassing libinput/Wayland.

For our purposes, I think libinput is a much better layer than Wayland for a mapping API. It also reads directly from evdev, so it might be the one place where pretty much anything an input mapper would do comes together.
It is supposed to be boring and to leave complicated stuff to other programs by design, which might make the devs resist the addition of the protocol we need... but I believe we have good arguments to justify it.

Integration with accessibility features

[..] I wonder what kind of accessibility Qt and GTK already have built in. Do they already expose it to the compositor, or would they be able to do so with another protocol extension?
&
I wonder if we can design the event mapper protocol to be able to take advantage of any accessibility information available, or become able to do so in the future.

I tried to find out how this works; so far I understand that there is a protocol called AT-SPI2 (https://www.freedesktop.org/wiki/Accessibility/) which seems to be implemented by Qt, GTK and possibly other toolkits.

When an application starts while some accessibility program is running (not sure how this is checked), that application exports a DBUS interface which exposes metainformation about the GUI as well as some ways to interact with it. See here (https://doc.qt.io/qt-6/qml-qtquick-accessible.html#details) for some information on how it's done in Qt.

I don't think the compositor has anything to do with it.

@pallaswept
Copy link

pallaswept commented Nov 23, 2023

Just wanted to say that if I seem quiet it's not because I'm ignoring all this, it's because you guys are like, light years ahead of me on this, and I'm kinda following along behind you. I really appreciate all the effort you're putting in and sharing your experience and know-how on this. If I'm quiet, it's not because I'm ignoring all that you're giving, it's because I'm standing in awe and appreciation <3

@KarsMulder
Copy link

KarsMulder commented Nov 23, 2023

The current Input Method Editor protocols

It appears that fcitx5 uses the zwp_input_method_context protocol to communicate with the Wayland compositor. The Wayland compositor then allows it to take control of textfields in client applications that use the zwp_text_input protocol.

If all we cared about was mapping text input (which is not the case), then the zwp_text_input protocol does offer some nice features:

  • It allows you to read and change the content of the text fields; (You know, the thing IME's are supposed to do.)
  • It allows you to send a grab_keyboard request to block the client from receiving keyboard keys;
  • It allows you to send a keysym request to directly send an XKB keysym, which can be a Unicode codepoint and does not undergo any transformation based on the active keymap.

As far as keyboard mapping goes, this sounds good so far. Now we get to the bad parts:

  • The protocol does not appear to be designed for composability with multiple IME's. My compositor (kwin) asks you to choose one, and only one, IME in the configuration menu;
  • It only works if a text field is in focus. It is not the kind of thing that would be useful for things like "press down to scroll the webpage down";

And last but not least: as you can see in the protocol of zwp_input_method_context_v1, there is a keysym request. However, if you search through the corresponding zwp_text_input_v3 protocol, you may notice a distinctive lack of a corresponding keysym event that would notify the client that the IME sent a keysym request. So what actually happens to the keysym requests that the IME sends?

It turns out zwp_text_input_v1 and zwp_text_input_v2 used to have a keysym event, but that event got removed from zwp_text_input_v3. After digging through the source code of Kwin, it seems that when zwp_text_input_v3 is in use, then the compositor will "convert" all keysyms received from zwp_input_method_context_v1::keysym back to scancodes ("like this"), and then forward that scancode to the application. Scancodes are again subject to the XKB keymap, so we lost the ability to send keysyms that are not on the active keymap.

It seems that zwp_text_input_v3 originates from gtk_text_input [source], but I haven't been able to find anything about the related discussion that went into it beyond being mentioned in this short article. Anyway, I imagine that the keysym event got removed because receiving keysyms directly does not interface nicely with the rest of libxkbcommon and hence is a pain to deal with for the client.
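For reference, the IME-side request in question. This is a sketch based on my reading of the input-method-unstable-v1 XML; the generated C function and the XKB Unicode-keysym encoding (0x01000000 | codepoint) should look roughly like this:

/* Sketch: an IME sending あ (U+3042) as a direct keysym, bypassing
 * the keymap. Assumes the client header generated from
 * input-method-unstable-v1.xml. */
#include <wayland-client.h>
#include "input-method-unstable-v1-client-protocol.h"

void send_hiragana_a(struct zwp_input_method_context_v1 *ctx,
                     uint32_t serial, uint32_t time)
{
    uint32_t sym = 0x01000000 | 0x3042; /* XKB Unicode keysym for あ */
    zwp_input_method_context_v1_keysym(ctx, serial, time, sym,
                                       WL_KEYBOARD_KEY_STATE_PRESSED, 0);
    zwp_input_method_context_v1_keysym(ctx, serial, time, sym,
                                       WL_KEYBOARD_KEY_STATE_RELEASED, 0);
}

Under zwp_text_input_v3, that keysym then gets squashed back into a scancode as described above.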

@KarsMulder
Copy link

KarsMulder commented Nov 23, 2023

There is one big exception though: libinput does not handle joysticks. Games (and possibly other applications) usually just grab joysticks, bypassing libinput/Wayland.

I wonder how games read those joysticks. You generally need to be root or member of the input group in order to directly read event devices. Do games (or Proton?) escalate to root privileges just to be able to read joysticks?

For our purposes, I think libinput is a much better layer to put a mapping API than wayland. It also reads directly from evdev, so it might be the one place where pretty much anything an input mapper would do comes together.

I think that it is kinda unfortunate that Wayland simultaneously handles display and input. Why should it be the display server's job to decide what input reaches the applications? Why can't the user be free to choose their display and input server separately? If only the libinput part that's baked into all Wayland compositors were dynamically swappable...

(Without LD_PRELOAD please.)

But that's pretty much the "Input Daemon" suggestion I made, and that has its drawback too.

(But I still sometimes feel like brazenly suggesting a protocol that basically says "The compositor is no longer in charge of the wl_seat global; all requests to it must be relayed to another application and all events from it will come from another application." With some adjustments the idea might not even be as insane as it sounds at first.)

Anyway, even if we did decide that libinput got extended to work nicely with keymapping, we would still need to figure out the protocol that multiple applications could use to simultaneously keymap. And if we had such a protocol, we could think about why it shouldn't just be a Wayland extension protocol.

(Also, is libinput even aware to which window its input goes? I got sidetracked by the IME's and still haven't gotten to the bottom of that.)

@kermitfrog
Copy link
Owner Author

As far as keyboard mapping goes, this sounds good so far. Now we get to the bad parts:

  • It only works if a text field is in focus. It is not the kind of thing that would be useful for things like "press down to scroll the webpage down";

I only thought about using IME for injecting unicode text that is not in the keymap anyway, so that's not really a problem (I really can't think of a use case for that other than writing text).

  • The protocol does not appear to be designed for composability with multiple IME's. My compositor (kwin) asks you to choose one, and only one, IME in the configuration menu;

This one is :(. Maybe there is a way to put an injection layer in between the IME and the compositor? Otherwise we would need to extend the protocol to use IME for unicode injection.

There is one big exception though: libinput does not handle joysticks. Games (and possibly other applications) usually just grab joysticks, bypassing libinput/Wayland.

I wonder how games read those joysticks. You generally need to be root or member of the input group in order to directly read event devices. Do games (or Proton?) escalate to root privileges just to be able to read joysticks?

No, it's simpler: device nodes handling a joystick get different permissions.

crw-rw----  1 root input        13, 81 Nov 24 09:08 event17  <-- Mouse
crw-rw----+ 1 root input        13, 82 Nov 24 09:23 event18  <-- PS4 Controller joystick part
crw-rw----  1 root input        13, 82 Nov 24 09:23 event19  <-- PS4 Controller Motion Sensors
crw-rw----  1 root input        13, 83 Nov 24 09:23 event20  <-- PS4 Controller Touchpad
crw-rw-r--+ 1 root input        13,  0 Nov 24 09:23 js0      <-- PS4 Controller joystick (old?) protocol

getfacl event18 prints:

# file: event18
# owner: root
# group: input
user::rw-
user:arek:rw-
group::rw-
mask::rw-
other::---

When switching to a different user (without logging out the first), the username gets changed to that one (user:tmp:rw-), so I guess polkit or something similar is involved.

Also, if you're wondering: grabbing event18 will block events at js0.
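(For anyone following along: the grab in question is just an ioctl on the event node. A minimal sketch:)

#include <fcntl.h>
#include <linux/input.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Grab an event device exclusively. While the grab is held, no other
 * reader (including the js0 interface backed by the same hardware)
 * receives its events. */
int grab_device(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    if (ioctl(fd, EVIOCGRAB, 1) < 0) { /* pass 0 to release the grab */
        close(fd);
        return -1;
    }
    return fd; /* now read struct input_event from fd as usual */
}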

Anyway, even if we did decide that libinput got extended to work nicely with keymapping, we would still need to figure out the protocol that multiple applications could use to simultaneously keymap. And if we had such a protocol, we could think about why it shouldn't just be a Wayland extension protocol.

Yeah, maybe we should focus on defining the protocol first..

(Also, is libinput even aware to which window its input goes? I got sidetracked by the IME's and still haven't gotten to the bottom of that.)

I'm pretty sure it is not. That's the thing we really need the compositor to provide and it would be great if we could make it part of the wayland core protocol. Until then, kwin might be the only choice for people who need this.

@sezanzeb
Copy link

I only thought about using IME for injecting unicode text that is not in the keymap anyway, so that's not really a problem (I really can't think of a use case for that other than writing text).

I remember that sometimes applications have keyboard shortcuts that use special characters, which aren't accessible on my german layout without using modifiers.

@KarsMulder
Copy link

KarsMulder commented Nov 26, 2023

After looking at libinput some more, it does seem to have some seriously useful features, such as button debouncing and palm detection, which filter out events sent by the hardware that were never intended by the user. You generally want your keymapping scripts to skip over those as well.

If we were to map after libinput, then we run into the problem that libinput merges all input devices into seats, where all similar devices get merged together into one device. This would make it impossible to apply different mappings to different keyboards, which is a use case that is sufficiently real that I'm doing it right now.

However, taking a closer look at the libinput source code, the situation may not be that bad: libinput does report for each event which device it originates from (libinput_event_get_device), and as far as I can see, it does generate multiple KEY_DOWN events if the same key is pressed on multiple keyboards; it just also sends a seat_button_count along with each event, telling you how often that particular key has been pressed across all devices belonging to that seat.
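A sketch of what per-device handling looks like with today's libinput API (context/seat setup omitted):

#include <libinput.h>
#include <stdint.h>
#include <stdio.h>

/* Drain pending libinput events, noting which physical device each
 * key event came from. */
void drain(struct libinput *li)
{
    struct libinput_event *ev;
    libinput_dispatch(li);
    while ((ev = libinput_get_event(li)) != NULL) {
        if (libinput_event_get_type(ev) == LIBINPUT_EVENT_KEYBOARD_KEY) {
            struct libinput_event_keyboard *kev =
                libinput_event_get_keyboard_event(ev);
            struct libinput_device *dev = libinput_event_get_device(ev);

            printf("key %u (%s) from '%s', seat-wide count %u\n",
                   libinput_event_keyboard_get_key(kev),
                   libinput_event_keyboard_get_key_state(kev) ==
                           LIBINPUT_KEY_STATE_PRESSED ? "down" : "up",
                   libinput_device_get_name(dev),
                   libinput_event_keyboard_get_seat_key_count(kev));
        }
        libinput_event_destroy(ev);
    }
}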

However, if we add our mappers after libinput, then we do have to (?) map libinput events. The problem with mapping libinput events is that they're kind of unwieldy. For example, this is the libinput event for pointer events:

// Code taken from libinput. Copyright © 2013 Jonas Ådahl, © 2013-2018 Red Hat, Inc.
// Licensed under MIT. See the header of the original file for the full license:
// https://gitlab.freedesktop.org/libinput/libinput/-/blob/b600cc35c5b001cbc6685d4d95ce2f3d36fb3ae4/src/libinput.c

struct libinput_event_pointer {
	struct libinput_event base;
	uint64_t time;
	struct normalized_coords delta;
	struct device_float_coords delta_raw;
	struct device_coords absolute;
	struct discrete_coords discrete;
	struct wheel_v120 v120;
	uint32_t button;
	uint32_t seat_button_count;
	enum libinput_button_state state;
	enum libinput_pointer_axis_source source;
	uint32_t axes;
};

That's quite a lot more than what Wayland reports to applications. Some of it is redundant, like the same coords in different coordinate formats; a mapper script would have to take care of modifying all of them at once. It also contains a painful seat_button_count, which tells you how often a particular key or button has been pressed across all devices assigned to a seat. If you were to map only one device, you'd mess up the seat_button_count on all other devices. And last but not least, I feel like this kind of event leaks too many implementation details to be a good candidate for standardization.

The ideal solution would involve rewriting libinput with a more modular architecture where the various features it provides are implemented as different layers, and where third party modules can be inserted in the middle of the processing chain (e.g. after filtering out palms, before gesture detection and before the coordinates are formatted in a bazillion different ways), but I have my doubts that we can get the original libinput developers to go along with that plan.

The Wayland protocol does not send the entirety of the libinput events to applications either. Maybe we can get away with simplifying the event format after it leaves libinput? [Edit: this sentence is false, Wayland does send the approximate entirety of the libinput events to applications.]

@pallaswept
Copy link

to apply different mappings to different keyboards, which is a use case that is sufficiently real that I'm doing it right now.

FWIW, this is a thing disabled users need, too. Definitely a real use-case.

@kermitfrog
Copy link
Owner Author

I remember that sometimes applications have keyboard shortcuts that use special characters, which aren't accessible on my german layout without using modifiers.

Now that you mention it, it seems soooo obvious... As someone who uses the programmer dvorak layout, I have often run into programs (mostly games) which expect me to press things like '1', '2' or '3' without a modifier, and it's a real pain in the a.. :(

Although I have a rough plan on how I can avoid most of these issues in the future, it would be great to have a proper solution that does not involve editing xkb layouts.

But considering how many different approaches to handling keys there seem to be in different programs / frameworks, I am very doubtful about how much events with unicode codepoints will be able to help with this mess.
But it's still an approach worth thinking about..

to apply different mappings to different keyboards, which is a use case that is sufficiently real that I'm doing it right now.

FWIW, this is a thing disabled users need, too. Definitely a real use-case.

I think multiple keyboards might not be uncommon among users who use event mapping.
I myself have a keyboard, a footswitch and a keypad, all of which register as a keyboard.

However, if we add our mappers after libinput, then we do have to (?) map libinput events. The problem with mapping libinput events is that they're kind of unwieldy. For example, this is the libinput event for pointer events:

I would not have thought the output struct is that big O.O

The ideal solution would involve rewriting libinput with a more modular architecture where the various features it provides are implemented as different layers, and where third party modules can be inserted in the middle of the processing chain (e.g. after filtering out palms, before gesture detection and before the coordinates are formatted in a bazillion different ways), but I have my doubts that we can get the original libinput developers to go along with that plan.

Yes, I believe that would be best, too. And I share your doubts as well :/.

But what are the alternatives (at least if we want a full-parts-mapping-protocol)?
Some ideas:

  • Fork libinput and hope that wayland makes it easy to choose which one is used.
  • Remap at different levels, meaning pre and post libinput.
  • Maybe libinput-devs would agree to make things like palm detection callable via API. Then we could call these for preprocessing as needed. I'm not sure how much this would help though..

Maybe we should start by compiling a list of features we need from libinput.. I need to think about this some more..

@pallaswept
Copy link

pallaswept commented Nov 27, 2023

I have my doubts that we can get (people) to go along with
The ideal solution....

When I first read this I wrote and then deleted a few angry responses.

Nobody can be forced to help, but nobody should be allowed to stand in the way of fixing this. If somebody prevents fixing this, they are as much a cause of the problem, and their removal from the system is as much a part of the solution, as any code, protocol, or design concept.

I can't stand it when people fork or build alternatives rather than improving existing solutions; it usually just creates a mess and makes it harder for end users to have a coherent system, since they usually end up having to choose between two incomplete solutions. I really dislike forks in general... but if one is not allowed to improve existing solutions, one has little choice but to build an alternative, be it from a fork or from scratch.

I like to hope that the devs of any project which would be involved, will recognise any shortcomings in their implementation and not only be willing to take contributions, but also to assist in contributing themselves. I mean, if you built a thing, you'd surely want it to be the best thing it could be, and not have giant problems that make the entire operating system unusable for a significant percentage of human beings. I would like to remain optimistic that the libinput devs would take all of this on board with a positive response.

If it's just some random crippled greybeard retired dev and a small handful of FOSS-enthusiast disabled folk having a cry about it, while all their friends, fellow cripples and demanding high end gamers alike, joke about what a nerd they are and just use Windows or iOS, then I can see it going nowhere - because that's what's happened so far!

However, with knowledgeable and experienced input (pardon the pun) from experts, which moves from just having problems towards building solutions, like you all are contributing, I think this thread amounts to the beginnings of a very convincing proposal to improve existing solutions, and I like to think (hope...pray.......) that the devs of whatever project might need enhancements, would take it seriously and view it as constructive, and not be defensive about it.

@kermitfrog
Copy link
Owner Author

[..] I think this thread amounts to the beginnings of a very convincing proposal to improve existing solutions, and I like to think (hope...pray.......) that the devs of whatever project might need enhancements, would take it seriously and view it as constructive, and not be defensive about it.

From the libinput docs:

What libinput is not
[..] There are plenty of use-cases to provide niche features, but libinput is not the place to support these. [..] libinput is boring. It does not intend to break new grounds on how devices are handled. [..]

I think these are the descriptions that make us sceptical about acceptance of a big change in libinput. But you are right: what we are preparing here is constructive and needed for various reasons, and we shouldn't make the mistake of letting (possibly unwarranted) worries of rejection slow us down.
The libinput devs might just as well embrace the new stuff, or at least participate in an alternative solution. In any case it won't hurt to ask!

I think the next steps should be:

  1. Collect what needs to be changed -- I started a new issue for this: Protocol requirements for multiple input mappers #3
  2. Write rough proposal
  3. Send it to the libinput devs

I probably won't have enough time for this before wednesday (or friday) though.

[..] Back in windows-land, it's not even a bat of an eyelid to be running 5 or 6 input handling tools like this simultaneously. Nobody talks about it because it's normal. [..]

Out of curiosity I looked at the windows input API docs. From maybe an hour of reading, this is what I understand:

There seems to be only one input stream which only distinguishes devices between Mouse, Keyboard and other.
There is one call BlockInput, which seems to block all keyboard and mouse input from reaching other applications.
The thread that used this to block input can then still receive physical events and inject new events into the input stream.

Also: keyboard events can carry unicode characters (16-bit, I'm sure it means UCS-2 encoding). If that happens, it generates a virtual event (I think).
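If I read the docs correctly, injecting such a character looks something like this (a sketch using SendInput with KEYEVENTF_UNICODE):

#include <windows.h>

/* Inject one UTF-16 code unit as a "unicode" key press and release.
 * The receiving application gets the character regardless of the
 * active keyboard layout; wVk stays 0 for unicode injection. */
void send_unicode_char(WCHAR ch)
{
    INPUT in[2] = {0};
    in[0].type = INPUT_KEYBOARD;
    in[0].ki.wScan = ch;
    in[0].ki.dwFlags = KEYEVENTF_UNICODE;
    in[1] = in[0];
    in[1].ki.dwFlags = KEYEVENTF_UNICODE | KEYEVENTF_KEYUP;
    SendInput(2, in, sizeof(INPUT));
}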

But I really don't understand how multiple applications are supposed to work together - from what I read so far I'd expect the situation to be a lot worse than on linux. I'm most likely missing some knowledge about how the input stream works.

@pallaswept
Copy link

I think these are the descriptions that make us sceptical about acceptance of a big change in libinput

Yeh I kinda honestly feel like there's a strong likelihood they'll vehemently "nope" this, on the spot. Then again, I've heard a bit recently about them adding support for IIO (as in, Industrial IO; accelerometers, light sensors, weird input devices) and there's the very closely related libei they've recently added to their stack, so ...mehhh I dunno I have strong doubts for the same reasons you mentioned, but kinda also feel like maybe they'll be really feeling all this and might just get involved. I really wish some of those devs were in this thread right now. I feel like even if they "nope" it for their own project, they'd tell us how to make it happen in some other way. Even in the worst case scenario, they say "hell no, and the only way it would happen is if we say yes, so you'll have to fork libinput, now stop wasting our time and don't talk about it any more" at least we know what's in front of us. It feels like there's a pool of knowledge among those devs, that we're missing out on....so

In any case it won't hurt to ask!

Yeh! 🙂 I feel like getting their input is definitely on the cards. Thanks so much for getting the ball rolling on that one. I'm glad you started a nice new clean issue for it too.

Also I might just tag @MerlijnWajer here, who hasn't updated uinput-mapper in a decade but was very early in this game and might have some interesting thoughts here. Sorry if @'ing you was an annoyance, Merlijn! I just thought you could be a valuable player here :)

@KarsMulder
Copy link

I've been thinking about the new Wayland protocol and posted my current (incomplete) draft in a new issue: #4

I've got a good feeling about this one, but there's still quite some work that I need to do. It is neither fully implemented nor fully documented yet, some parts of the current spec are broken, et cetera. Anyway, I just wanted to post my current progress to show that something is getting done.

@KarsMulder
Copy link

libei

This is pretty big.

It is basically an API for creating virtual input devices. Combined with our ability to just grab all input devices, we basically have the necessary APIs for creating an "input daemon" as I mentioned earlier.

While an input daemon is not the perfect solution, it does provide a big possibility: suppose we create some Wayland protocol and write a library that implements it, but compositors are reluctant to implement it. Then we could write a daemon which grabs all event devices, processes those events through libinput and our library, and then makes the resulting events available through libei.

Mapper scripts could then check if the compositor natively supports the protocol, and if the compositor doesn't but does support libeis, start the daemon as fallback.

The daemon approach still has disadvantages such as requiring another process to run, another program to install, would make all devices show up as "virtual devices", prevents other applications like Qemu from grabbing the evdev devices, may not be able to change the keymap, may not be able to perfectly switch the active mapper based on the active window, et cetera. But it could provide a somewhat suitable fallback for users who are stuck on a compositor that does not support the new protocol but does contain libeis.

@kermitfrog
Copy link
Owner Author

libei

This is pretty big.

Yes, it could be useful.
In addition to creating the daemon, it offers possibilities to directly transform evdev-level events to post-libinput level events. I wonder if there are good use cases for that..

@pallaswept
Copy link

pallaswept commented Jan 9, 2024

I hope nobody minds, but I came across a related thread, where the above issues were discussed (well, brought up but not discussed much), so I linked this thread in the hopes that some of the (very important, respected, and capable) individuals there might perhaps weigh in on the conversation. The thread is over here https://discuss.kde.org/t/new-ideas-using-wayland-input-methods/3444/19. I just thought I should let this end of the conversation know that I'd linked it. Again, I hope this is OK, apologies if I've done the wrong thing.... Just.... a lot of the people in that thread are a pretty big deal and they're all working in this field at a fairly high level.

@kermitfrog
Copy link
Owner Author

I'm back! Well.. at least I should have some capacity for input stuff again :)

One of the time-consuming things I did in the last weeks was to switch my keyboard to a Dygma Defy. This made me re-evaluate how I use my keyboard, and I ended up modifying my layout (xkb-wise) as well. This gave me an idea for a (partial) workaround to the type-arbitrary-unicode-symbols problem.

Let's start with a few often overlooked facts about xkb:

  • You can map any keycode that a keyboard could send. (in contrast to windows, which limits you to the visible-characters-without-numpad part of a 105-key keyboard)
  • It features key composition, which can be used to type characters that are not directly found on the layout.
    • + Composition does include dead keys as well as the Compose key (aka Multi-Key)
    • + The Multi-Key is a mappable key itself, which should mean that we can place it on any key.
    • + It seems that composition sequences can include characters that require a modifier to type.
    • + Compositions are defined system-wide under /usr/share/X11/locale/*/Compose. Users can have their own compositions in ~/.XCompose
      • - changing these settings might require a re-login
    • - typing arbitrary unicode values is not an xkb/compose feature! Mentions on the internet of doing that with Ctrl+Shift+U seem to rely on ibus being set as the input method.
    • - this will probably not work in every application.

Now one sometimes forgotten fact about input remapping via uinput:

  • The keys supported on the virtual keyboard (output) are not limited by the keys supported by any real input devices.

That means: as long as we know which unicode characters can possibly occur (defined by user configuration) and the mapper is aware of the current layout (I already wrote some working proof-of-concept code for this last year), we can:

  • Find the Multi-Key on the current layout or help the user to map it to some extra key that is only present on the virtual keyboard.
  • Generate compositions for all missing characters (see the example after this list).
  • Trigger the necessary key combinations.
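As an illustration, a generated ~/.XCompose could look like this (the <h> <a> sequence is just an example choice by the mapper):

include "%L"   # keep the system-wide compositions

# Generated by the mapper: Multi_key, h, a => HIRAGANA LETTER A
<Multi_key> <h> <a> : "あ" U3042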

@pallaswept
Copy link

pallaswept commented Feb 13, 2024

Good to see you back, Kermit :) And Hi,all! This tab stays open in my browser for the time being... I think we're on a long road, here....

I came across a reddit post about this article, entitled, "Input method on Wayland is broken and it's my fault" which rang some bells here. The reddit thread mentioned ibus-typing-booster which I've tried out lately, and it's promising on a few fronts discussed above, but rather bug-prone at the moment. I leave it installed on my machine, in hope, but presently, it remains disabled.

Thought I might share the article in case it might be food for thought, or perhaps bring a 'new recruit' to this issue 😆 At least maybe if the author were to see this thread, they would not feel quite so much the lone bearer of fault in this situation... I don't think it's anyone's fault really. We are in need of a hero, or ten 😉 Do you think we should maybe send them a message?

@KarsMulder
Copy link

Thought I might share the article in case it might be food for thought, or perhaps bring a 'new recruit' to this issue 😆 At least maybe if the author were to see this thread, they would not feel quite so much the lone bearer of fault in this situation... I don't think it's anyone's fault really. We are in need of a hero, or ten 😉 Do you think we should maybe send them a message?

It seems we've found a real expert here. According to one of their other articles, we're talking about the person who designed all the Wayland extension protocols around input methods. Feel free to message them.


To make visiting this thread maybe worth their time, here are some of my thoughts about "Mistake 2: bad synchronization" mentioned in the linked article.

(I am not actually sure if I understood the problem correctly. Does "commit" mean that the preedit string is to be turned into definitive content of the text box? If yes, then why does the second preedit string "Mo" still contain the "M", which should have become permanent content already? If no, why did the "M" character get reported as content of the text box due to lag? Anyway, here are my thoughts, for as far as I think I understand the article.)

I get the impression that the fundamental problem is that the IME does not know which of its proposed changes were accepted and which were rejected. If it does not resend changes that fail to show up, input can get lost when a web document is edited by somebody else. If the IME aggressively resends any change that it does not observe showing up in the text box, then a whole other can of bugs is about to spring open.

If changing the protocol is still on the table (the protocol is still unstable after all), then I think this could be solved by making the "commit" message include both the state it starts from and the state it results in, which makes it possible for the input method to figure out which of its actions were discarded.

Both the application and IME start at state 0. When either of them wants to change the content of the textbox, it must include both the old state number and the new state number as part of the commit message. The IME always uses even numbers for states that it creates, whereas the application always uses odd numbers, to avoid clashing state numbers.

So, typing "Mo" would result in the following exchange of messages:

[diagram: uncontested exchange; the IME commits 0 → 2 ("M") and 2 → 4 ("Mo"), and the application acknowledges both transitions]

All of these states were created upon initiative of the IME, so they all use even numbers. The application acknowledges each state transition explicitly, so the IME knows that all of its keys were accepted.

Now let's consider the laggy situation where the user is trying to type "Mo", but a collaborator on a web document types an "a" while the IME is still busy composing:

[diagram: contested exchange; the application applies 0 → 2 ("M"), then a collaborator's "a" creates 2 → 3 ("Ma"), so the IME's 2 → 4 ("Mo") request is rejected]

While the IME was trying to compose "Mo", the application received some TCP packet telling it to insert an "a" key after it read the "M" key from the IME but before it read the "o" key. From the application's perspective, two state transitions have happened:

  • 0 → 2, the transition that added the "M" due to user input
  • 2 → 3, the transition that added the "a" by a collaborator across the internet

At this point, the application is in state 3. It then receives a request from the IME to transition from state 2 to 4, but the application rejects it because it is not in state 2. The application informs the IME that it has observed two state transitions: 0 → 2 → 3.

The IME sees that the transition 0 → 2 was acknowledged by the application and thus the "M" key was accepted, but it has also sent a request to transition "2 → 4". Because the application moved from "2 → 3" instead, the IME knows that its second request has been or will be rejected, and thus that the application has not received the "o" key.

Knowing that the "o" key has been rejected, it then replays all rejected requests, this time based on the last state reported by the application. The user tried to type "Mo"; the text now shows "Ma". If the "o" key had gone through, the text would be showing "Moa", so it sends a new request to transition 3 → 6 and change the text to "Moa".
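The IME-side bookkeeping of this scheme could look roughly like the following sketch (of the idea, not of any existing protocol):

#include <stddef.h>
#include <stdint.h>

/* One commit: "transform the text from state `from` into state `to`".
 * IME-created states are even, application-created states are odd. */
struct commit {
    uint32_t from_state;
    uint32_t to_state;
    const char *text; /* the full new content, for simplicity */
};

/* Called when the application reports an observed transition
 * (observed_from -> observed_to). Returns the pending IME commit that
 * branched off the same state but was not taken, i.e. the one whose
 * input must be replayed on top of observed_to, or NULL if all
 * pending commits were acknowledged. */
const struct commit *find_rejected(const struct commit *pending,
                                   size_t npending,
                                   uint32_t observed_from,
                                   uint32_t observed_to)
{
    for (size_t i = 0; i < npending; i++) {
        if (pending[i].from_state == observed_from &&
            pending[i].to_state != observed_to)
            return &pending[i];
    }
    return NULL; /* every pending commit matched what the app observed */
}

In the example above, the pending IME commits are 0 → 2 ("M") and 2 → 4 ("Mo"); when the application reports 2 → 3, find_rejected matches the 2 → 4 commit, so the IME replays the "o" as a new 3 → 6 commit.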


A minor thought: maybe the state should not be assumed to start at 0, but instead at a value declared by the compositor. Furthermore, the split between the state numbers allocable by the IME/the application should maybe not be even/odd, but "within a range that is allocated by the compositor". This could maybe make it possible to use multiple IME's at once if they are all allocated distinct ranges, but that's another can of worms I haven't fully thought through.

@kermitfrog
Copy link
Owner Author

I read through the messages we wrote here and got inspired with a new idea. For now I call it UInput Orchestrator (UIO).
In short, it is a daemon that manages connections between mappers by creating and assigning multiple uinput devices, but could be extended to something else later.

It is not meant as the final solution, but rather an extensible starting point that we could implement without changing anything in evdev, libinput or wayland.

As this will likely be a longer topic, I created a new issue here: #5

@pallaswept
Copy link

Saw some news today about the KDE Plasma 6.1 release and they mentioned the "Input Capture Portal" which immediately captured (heh) my attention. Apparently its intended use-case is allowing software which shares keyboard/mouse between PC's, but perhaps we might find some way to use it to get our keyboards working locally?

Left this here in case UIO is not the final direction (although, it looks like it might be!)

@kermitfrog
Copy link
Owner Author

I discovered another promising remapper: https://github.com/xremap/xremap and would like to invite its main developer to the discussion.

@k0kubun : if you are wondering why you are mentioned in an unknown project: this is an invitation to take part in our discussion.
I originally started this to see if we could avoid duplicate work by working together and possibly merge some input remappers. This is probably best described in the original post: #2 (comment)

In time we got more focussed on making it possible for multiple remappers to work together. The relevant post that started this is: #2 (comment)
