Wayland communication protocol draft, v0.2 #4
Layers

Like the filter protocol at kernel level I proposed earlier, I think that event processing should be separated into layers. Here is a simplified image with only four layers drawn, but I think we should have about 2^32 of them: Unlike my previous proposal, this time layers are a concept that exists on a compositor level, not on a device level. Devices do not have layers; if anything, layers have devices. Or more accurately, they all have their own view of the properties and state of the devices. Before doing anything, a mapper script must attach itself to one of those layers (or to multiple if it wants to, but that's potentially a really bad idea). Let's say that our hypothetical script attaches itself to layer 1000. Once attached to a layer, the script can query the following:
Devices are layer dependent

A script at layer 1000 can create a virtual input device. That device will be visible to everything attached to layer 1001 or higher, and invisible to everything at a lower layer.

Device properties are layer dependent
These properties can depend on the layer as well. Suppose that a device named "Plain Keyboard" exists at layer 0. Then a script running at layer 1000 is allowed to change this name to "Fancy Keyboard". Any script running at layer 1001 and higher will see this device's name as "Fancy Keyboard" whereas any script at lower layer will still see it as "Plain Keyboard". Unless a script explicitly changes the properties of a device, all devices at later layers are assumed to have the same properties as those on earlier layers. There are some use cases for wanting to change the device properties:
But most interestingly, the associated XKB keymap is treated as a device property. Mapper scripts are able to change the associated keymap in a way that later layers will observe their changes without interfering with previous layers that may be changing the keymap as well.

Events

I haven't figured out the right model for events to use. Maybe we can find some way to make it work with the evdev format. Maybe we'll just have to use the libinput format. Whatever format we end up using, I want to define some new concepts.

States

Some events are persistent, meaning that their last reported value remains meaningful until a new value is reported (e.g. EV_KEY events). Other events are transient, such as EV_REL events. There is no significant semantic meaning associated with the value of the last EV_REL event you saw. The collection of the last observed value of every persistent event of a particular device is called the state of that device. For a keyboard, the state might be "the K key is pressed, everything else is released". For a joystick, the state might be "the X axis is 94 and the Y axis is -4". Anyway, remember that at every layer, a set of input devices exists. The compositor should keep track of the state of each input device at each layer, even (in theory) of layers to which no scripts are attached. Of course it does so sparsely. If there is a script attached to layer 1000 and the next occupied layer is 50000, then it should just know that the devices and their states are identical at layers 1000–49999. To recap:
Event flow

Listening

Unlike grabbing, listening to a device does not make it unavailable at later layers; it just means that later layers receive no events from it. A script listening to a device is free to write events back to the device it is listening to, and those events will appear again in the event stream at the next layer. This means that there is no need to create virtual devices unless you really want to. Writing back to the original device is usually a good choice because that means that two scripts that are configured to listen specifically to a device called "Fancy Keyboard" can both work without having to agree on the name of a virtual device. Most mapper scripts that were not configured otherwise should probably just use a policy of "listen to all keyboard-type devices", so it won't matter whether those keyboard devices were "real" or "virtual". Unresolved question:
Initializing listening and unlistening

Whenever a mapper listens to a new device, the compositor will send the mapper the current state of all input events (probably assuming that all keys that do not get sent are released, for efficiency). The mapper then sends output events telling the compositor what the state should have been if the mapper had been here earlier. For example, if the mapper maps K→Q and the current state says that K is pressed, then the mapper sends an input event "press Q" and tells the compositor it is done initializing. The compositor then checks for the difference between the current state and what the mapper thinks the state should've been, and inserts events "release K, press Q" into the event chain to bridge the difference. Whenever a device gets unlistened, the compositor should know what its state was on the previous layer and what its state is on the current layer. It then bridges the difference, for example by inserting "release Q, press K" into the event chain. (A small sketch of this bridging logic follows at the end of this post.) The ability for the compositor to handle these state transitions is important for maintaining the integrity of the event chain in case a script crashes.

The first layer

Maybe they come directly from event devices. Maybe they're the output of libinput. Maybe libinput's internal structure gets split into layers so that libinput internally uses the described event flow as well, although with a more efficient internal API that allows it to skip over the "communicate with the compositor" part.

State synchronisation

Sometimes the environment of a layer changes. Changes in environment may include:
In these cases, it is possible that the state of the input devices (= the last value of all persistent events) needs to change even if the user didn't actually input any events. For example, suppose there are two open windows, Firefox and Dolphin. There is a script that maps the key K to the key Q on Dolphin but not on Firefox. Suppose that the Firefox window is active and the user is currently holding the key K down. Then the user clicks the Dolphin window. This means that by the time the Dolphin window gets activated, it should look like the key "Q" has been pressed since before this window got the focus. It should not first observe "key K is down, release K, press Q". Nor should the Firefox window receive the events "release K, press Q" before it loses focus. For such cases, we need to change the state atomically. To facilitate this, there are two Wayland events and two Wayland requests available:
Whenever the environment changes, say the active window has changed, the compositor should first send an event announcing the change. The mapper must reply to it once it has finished updating its state. There are some differences between the "normal event sending mode" and the "sync mode" which is used in between.
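As an aside (not part of the draft protocol itself), here is a minimal C++ sketch of the state-bridging idea from the "Initializing listening and unlistening" section above: the compositor compares the state it saw on the previous layer with the state the mapper claims, and emits whatever events bridge the difference. The state representation and event struct are hypothetical stand-ins for whatever event format the protocol ends up using.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical persistent-event state: last seen value per key code.
// 0 = released, 1 = pressed (mirroring EV_KEY semantics).
using KeyState = std::map<uint16_t, int32_t>;

struct Event {
    uint16_t code;
    int32_t value;
};

// Compute the events the compositor would have to insert into the event
// chain so that a consumer that saw `before` now observes `after`.
std::vector<Event> bridge(const KeyState& before, const KeyState& after) {
    std::vector<Event> out;
    // Keys whose value changed, or that only appear in `after`.
    for (const auto& [code, value] : after) {
        auto it = before.find(code);
        if (it == before.end() ? value != 0 : it->second != value)
            out.push_back({code, value});
    }
    // Keys held in `before` but absent from `after` are assumed released
    // (the sparse-state convention: anything unsent is in its default state).
    for (const auto& [code, value] : before) {
        if (value != 0 && after.find(code) == after.end())
            out.push_back({code, 0});
    }
    return out;
}

// Example: the previous layer holds K; a freshly attached mapper that maps
// K->Q reports Q held. The compositor bridges with "press Q, release K".
int main() {
    constexpr uint16_t KEY_Q = 16, KEY_K = 37;  // evdev key codes
    KeyState previousLayer = {{KEY_K, 1}};
    KeyState mapperClaims  = {{KEY_Q, 1}};
    std::vector<Event> fixup = bridge(previousLayer, mapperClaims);
    return fixup.size() == 2 ? 0 : 1;
}
```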
Reserved.
Here is the protocol in the standard XML format that Wayland protocols use. Compatible with wayland-scanner:
Here is a sample C++ program that uses the above protocol to transpose the A and B keys with each other. Or that's what it would do if only the compositor supported it.
Reserved.
Changelog
I didn't read all the code parts yet. So far it sounds like a good start :)
That would open interesting possibilities, but I worry that it will be difficult for existing UI-toolkits to adapt to this. Maybe simply sending unicode data would be easier.
This would solve problems. I hope it can be efficiently implemented.
At first I wondered why you bother with layers for this. A protocol that mappers (as well as every other interested application) can subscribe to and then change the configuration accordingly seemed sufficient. But it might make sense if you send information about states (e.g. keys) that are currently triggered, so the mapper knows which keys to release/press. There is also the question of "What is a change to a surface?" or rather "What triggers may a mapper want to react to?". So I request that changes to the window title be part of the protocol, too. There might be other interesting stuff to react to that is best obtained from wayland.
I agree that sending Unicode would be better, but the issue is that the current Wayland input protocol (the one that applications use to talk with the compositor) simply expects an XKB map plus scancodes. I think that was a mistake during the design of the protocol, but it's probably hard to fix now. The workaround would be for the input_mapper module to translate scancodes into Unicode characters, feed those to the mapper scripts, and at the end translate those Unicode characters back to scancodes again. This does have some drawbacks though.

First, it may not always be possible to translate from Unicode characters back to scancodes. For example, if Unicode gets sent, then "lowercase a" and "uppercase A" become two different characters. What is the compositor supposed to do if an application wants to send "lowercase a" while the user is holding the shift key? None of the keys on the standard XKB layout will correspond to "lowercase a" while shift is held. Should the compositor send some different grapheme (uppercase A) than requested? Should it briefly release and repress shift? Should it modify the XKB layout to add an unconditional "lowercase a" button to it?

Second, it does not necessarily make writing scripts easier. For example, if a mapper now wants to press the shift key, it must not only map to the shift event, but it is now also responsible for turning all "lowercase a" graphemes into "uppercase A" graphemes. Similarly, if a script wants to map the key K to Q, it must now apply two maps: "k→q" and "K→Q", otherwise either the lowercase variant or the uppercase variant would get skipped over. (Also, the uppercase-lowercase character pairs depend on your locale. Have fun!)

More fundamentally, it adds another layer of abstraction, increasing the difference between how mappers handle input and how applications handle input. Mapper scripts can add this abstraction themselves by using a library, but it is harder to remove that layer of abstraction in case a mapper script really needs fine-grained control; for example, it might start fighting against the compositor because the script really wants to use a specific scancode but the compositor decided on a wholly different scancode↔Unicode map.

The current solution

But more fundamentally, the current Wayland setup is mostly designed to have only a single XKB layout active at once. If different keyboards have different layouts, the compositor is allowed to switch the active map based on which keyboard was last typed on, but that has its obvious drawbacks as well. Libinput doesn't seem to actually do that, so I doubt many compositors go that far either. I think this is another big mistake in the design of the Wayland input protocol. But either way, if we go with "the XKB layout is a property of the input device", then which layout should the compositor use when multiple devices end up with different layouts? That question could be answered by "pick one arbitrarily" or "pick the biggest one", and then scripts could deal with it by always making sure that they modify all keyboard layouts in the same way, but that is still a non-ideal solution.
I could change the protocol to make the active XKB keymap a global property instead of a device-dependent property, but then we'd be further entrenching the current status quo and would actually say that at a Wayland protocol level, there must be a single keymap that applies to all input devices, rather than merely "to keep the implementation simple, the compositors have decided to assume that there is only a single keymap for all input devices." Tl;dr: the Wayland keyboard input protocol is causing trouble no matter from which angle I approach this issue.
Yes, that should be possible. (On a side note, I've read a bit of the source code of libevdev and libxkbcommon, and both of them keep track of the states in a dense array rather than a sparse one. They can get away with that because there are only a thousand-or-so keys, so at one bit per key that takes something like 128 bytes of memory. If we want to keep track of all 2^21 Unicode characters, it will have to be done sparsely. But that's an implementation detail.)
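To illustrate that side note, a sparse representation could be little more than a hash set of the codes that are currently in a non-default state; anything absent is assumed released. A rough sketch, with Unicode code points as an example key type (the class and its API are purely illustrative):

```cpp
#include <unordered_set>

// Sparse "pressed" state: only non-default entries are stored, so tracking a
// handful of held keys out of 2^21 possible Unicode code points stays cheap.
class SparseKeyState {
public:
    void set(char32_t code, bool pressed) {
        if (pressed) pressed_.insert(code);
        else         pressed_.erase(code);
    }
    bool isPressed(char32_t code) const {
        return pressed_.count(code) != 0;  // absent => default (released)
    }
private:
    std::unordered_set<char32_t> pressed_;
};

int main() {
    SparseKeyState state;
    state.set(U'ß', true);  // works for any code point, no dense array needed
    return state.isPressed(U'ß') ? 0 : 1;
}
```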
The window title seems to be part of the XDG shell extension protocol rather than the core Wayland protocol. I don't suppose we'll see a compositor which will implement input mapping but not XDG shell, but who knows? Maybe some future mobile operating system will use Wayland without XDG Shell. It may be prudent to implement it in such a way that "window title" is an optional addition to this protocol rather than a core feature of it. Relatedly, I think that mappers should only receive *That means that applications that don't care about
Before I go on to write my post below, I'd like to clear up any confusion on why I decided to include active window changes in the protocol itself rather than expecting scripts to rely on some standard dbus protocol. Here are my reasons:

So why is changing the active window part of the protocol anyway?

You could wonder why mapper scripts are not supposed to just listen to that by means of some dbus event, which would also be viable for most purposes, but integrating it in the protocol has two advantages:
Atomic window changes

The user is currently browsing the web and Firefox has the focus. He then decides he wants to ctrl+click some button in Krita, which transfers the focus to Krita and clicks a button in it with a single action. The user would expect that if he has the foot button (mapped to ctrl while Krita is focused) pressed when he clicks, Krita will interpret it as a ctrl+click. When active window changes are part of the protocol, this can be taken care of. These are roughly the steps that happen:
Now, if the mapper script were to update on the basis of a dbus event, it would be likely that Krita already received focus and the click event before the dbus signal managed to reach your mapper and get the ctrl key pressed. This would mean that the first interaction with any program might be interpreted under the keymap that was active for the previous program. When changing the active window is part of the protocol itself, however, the compositor will know when all active mapper scripts have updated their state for the program that is about to receive focus, and it ensures that the active maps update seamlessly when switching between windows.

Deterministic order

Suppose that you have Mapper A which maps a foot button to leftshift for Firefox and to leftctrl for Krita. You also have Mapper B at a later layer which maps shift+K to "X" on Firefox and ctrl+K to "Y" on Krita, also blocking the respective modifier keys. When the focus shifts from Firefox to Krita while the foot button and the K key are held, the desired effect is "release X, press Y", which should of course be processed atomically such that Firefox never saw the X key get released, and Krita thinks Y had been pressed since before it got focus. If both mappers update independently due to dbus, it would be possible for Mapper B to update first, resulting in "shift+K" getting passed on to Krita, before Mapper A also updates to make that "ctrl+K", which Mapper B turns into "Y" again. Krita would get to see the following barrage of events: key X is held since before receiving focus. Release X. Press shift. Press K. Release K. Release shift. Press Y. When the compositor is in charge of notifying mapper scripts that the active window has changed, it can ensure that it starts with the lowest-layer mapper, and only inform later-layer mappers about changes when the lower layers have replied.
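For illustration only, here is a rough compositor-side sketch of that ordering guarantee. The names (notifyPendingFocus, waitForAck) are hypothetical placeholders for whatever events and replies the real protocol would define:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical mapper handle. In a real compositor these calls would write to
// and read from the mapper's Wayland socket; here they are empty placeholders.
struct MapperConnection {
    void notifyPendingFocus(const std::string& surface) { (void)surface; }
    void waitForAck() { /* block until the mapper replies that it is done */ }
};

// Mappers attached to layers, keyed by layer number (ordered map => lowest first).
void propagateFocusChange(std::map<uint32_t, MapperConnection>& layers,
                          const std::string& newSurface) {
    // Lower layers update first, so a later layer only ever sees events that
    // earlier layers have already remapped for the window about to get focus.
    for (auto& [layer, mapper] : layers) {
        (void)layer;
        mapper.notifyPendingFocus(newSurface);
        mapper.waitForAck();
    }
    // Only after every attached mapper has acknowledged does the compositor
    // actually move keyboard focus to the new surface.
}

int main() {
    std::map<uint32_t, MapperConnection> layers;
    layers[1000]  = MapperConnection{};
    layers[50000] = MapperConnection{};
    propagateFocusChange(layers, "Krita");
    return 0;
}
```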
Anyway, no update to the protocol yet. I'm just writing what I'm currently thinking about. There is one problem that I'm currently having with both the protocol regarding the device properties, and the protocol regarding updating the state based on dbus notifications: the Wayland data types do not include any kind of dynamic type or sum type. (By a sum type, I mean something like a tagged union.) Basically, there are a variety of types that a device property might have:
For compactness in the protocol definition, I might want to have just a single event and request, e.g.
... but that is pretty stupidly roundabout. If we receive such an event, then libwayland would parse the bytes received from the Wayland socket as semantic data, and that semantic data once again contains plain bytes which would then have to be parsed as semantic data again by the application. It feels like a real abuse of the Wayland protocol. So the other workaround would be to have several events and requests, one for each datatype: Also, that set of functions makes it possible to use different types for device property values, but not yet for keys. So far I've been assuming that device property names should be strings, but are we sure that a property name like "ABS_X_MAX_VALUE" is best encoded as a string? I'd prefer for that to be a tuple ("max_axis", ABS_X). Now if I want to be able to use tuples as property names, things get even more complicated... The same problem also shows up for custom update triggers. In case the state update protocol starts because the active window changed, you'd want to send at least a string to the application containing the name of the window that is now active, and possibly some more information. But what can we send for custom triggers like a "dbus signal received"? Do those get to send no information whatsoever, or would that also require some way to send a string or int or two strings or whatever along with the custom signals?
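For what it's worth, a client-side library could still present the "several events, one for each datatype" workaround to scripts as a single sum type. A sketch using std::variant; the property names, key encoding, and the idea of per-type events are illustrative assumptions, not part of the draft:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

// The kinds of values a device property might carry, expressed as a sum type
// on the client side. On the wire this would still have to be several distinct
// events/requests (one per datatype), since Wayland itself has no sum types.
using PropertyValue = std::variant<int32_t, std::string, std::vector<uint8_t>>;

// A property key could likewise be a tuple like ("max_axis", ABS_X) instead of
// a plain string; modelled here as a name plus an optional numeric argument.
struct PropertyKey {
    std::string name;
    int32_t arg = -1;  // -1 means "no argument"
    bool operator<(const PropertyKey& o) const {
        return std::tie(name, arg) < std::tie(o.name, o.arg);
    }
};

int main() {
    std::map<PropertyKey, PropertyValue> props;
    props[{"name", -1}]    = std::string("Fancy Keyboard");
    props[{"max_axis", 0}] = int32_t{32767};  // 0 = ABS_X in evdev
    // A client library would dispatch each incoming per-type event
    // (e.g. a hypothetical set_property_int, set_property_string, ...)
    // into this one map, so scripts only ever see the sum type.
    return std::holds_alternative<std::string>(props[{"name", -1}]) ? 0 : 1;
}
```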
Yet Another Alternative Approach

The difference between how events are being handled and how state updates are getting handled feels a bit inelegant. I feel like there is some elegant unified solution out there, and I am somehow missing it. Or maybe there's not. Here's what I'm thinking about right now.

Suppose state changes become events
When a mapper script is already running, it will receive the above information like it receives all other events. When a mapper script just launches, it is informed about the current state of all persistent events, and thus naturally receives the current properties of all devices. But how do we then ensure proper synchronisation on state changes? Suppose that the compositor sends these events to the mapper:
In case the mapper script wants to pass these events on as-is, it should write the same events to its Wayland socket with the same sequence numbers. If it wants to replace KEY_K with KEY_Q, it should write a KEY_Q event with the same sequence number as the KEY_K event. If it wants to map the KEY_K event to Ctrl+Shift+Q, it should write these events:
Notice how both Ctrl and Shift have the same sequence number, which is an odd number equal to the sequence of KEY_K minus one. This tells the compositor that both of these events happened before the KEY_K event, but after whatever event came before the KEY_K event. In case it wants to send some event after the KEY_K event rather than before it, e.g. mapping K to K+R, it should assign the sequence number of KEY_K to the last event that was generated in response to it, so the output in response to receiving KEY_A+KEY_K as above should be:
In case the mapper wants to drop those events, it must send some drop notification with those sequence numbers:
Lastly, sequence numbers are specific to each mapper script. Different mapper scripts may see different sequence numbers attached to the same events. This is necessary in case a mapper outputs an event with an odd sequence number, because the compositor guarantees that the next mapper script will only receive events with unique, even sequence numbers.

Changing the active window with sequence numbers
The mapper must now respond to these sequence numbers. And unless that script wants to break everything, it should just send these events back as-is. However, it is allowed to send events with sequence number 1129 in between those two events, to indicate that its state has changed when the focus changed. At the end of the last layer, the compositor will discard all events that were sent in between DEACTIVATE_WINDOW and ACTIVATE_WINDOW, considering those events to have been addressed to nothing. Essentially, this replaces the sync mode described earlier.

Filters

Those are a lot of input events. And since the compositor needs to maintain a strict order between the events, all events need to be fed into and back from all active mapper scripts. With a few scripts, that could mean something like a millisecond of delay. To slightly improve efficiency, I propose that we allow mappers to declare filters, where they can choose to ignore certain events. Here are some examples of filters in a provisional scheme that I spent almost 30 seconds of thought on:
Or maybe instead of "ignoring" it should be the other way around: everything is ignored by default, except the events you explicitly unignored. I'll think more about that when I have more time. Anyway, whenever the mapper has told the compositor that it wants to ignore events matching certain criteria, the compositor will assume that the mapper will pass on all those events as-is. Instead of writing those events to the mapper and waiting for the mapper to write an identical event back, the compositor will do that in the mapper's stead, reducing the latency. But why the sequence numbers? Can't the compositor figure out the order of the events based on the order that the mapper wrote them?
Now the first three options may look indistinguishable from an end-user perspective, but it starts to matter in case another ignored event happened in between KEY_A and KEY_C. Suppose that the mapper has announced that it would ignore all KEY_B events, and the user actually pressed the sequence ABC, then what would the output be? Without sequence numbers, the compositor would have no way of knowing whether it has to be XBZ (option 1), BXZ (option 2), or XZB (option 3). This is why sequence numbers are important: as soon as the mapper writes an event with the same sequence number as the KEY_A event back to the compositor, the compositor knows that KEY_B will follow immediately after that event.

Difficulties: Some sort of state bridging will still have to be implemented in the compositor to handle cases of new mapper scripts showing up and disappearing. In case a mapper script adjusts its filters in response to events, strange things may happen if the compositor already wrote subsequent events to the mapper's input socket. If the mapper script starts ignoring additional events, it may still receive events it already declared it wanted to ignore. If it un-ignores events, it may receive its subsequent events out of order. If the compositor waits to write the next event to the mapper's socket until the mapper has responded to the last event, performance will measurably degrade. This could probably be fixed by the mapper script using a wrapper library a la libevdev that automatically reorders events in such cases.
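To make the sequence-number bookkeeping concrete, here is a sketch of how a mapper might tag its output for the K → Ctrl+Shift+Q example above: events meant to land before the incoming event get the odd number just below its sequence number, and the incoming event's own even number goes on the last event generated in response to it. The struct and helper are illustrative, not a proposed API:

```cpp
#include <cstdint>
#include <vector>

struct SeqEvent {
    uint64_t seq;    // compositor hands the mapper unique, even sequence numbers
    uint16_t code;   // evdev-style key code
    int32_t  value;  // 1 = press, 0 = release
};

// Map a single incoming KEY_K press (with even sequence number `in.seq`)
// to Ctrl+Shift+Q, preserving its order relative to neighbouring events.
std::vector<SeqEvent> mapKtoCtrlShiftQ(const SeqEvent& in) {
    constexpr uint16_t KEY_LEFTCTRL = 29, KEY_LEFTSHIFT = 42, KEY_Q = 16;
    return {
        // seq - 1 is odd: "these happened after whatever came before the
        // event that carried `seq`, but before that event itself".
        {in.seq - 1, KEY_LEFTCTRL,  in.value},
        {in.seq - 1, KEY_LEFTSHIFT, in.value},
        // The incoming event's own sequence number goes on the last event
        // generated in response to it.
        {in.seq,     KEY_Q,         in.value},
    };
}

int main() {
    SeqEvent k_press{42 /* even, compositor-assigned */, 37 /* KEY_K */, 1};
    auto out = mapKtoCtrlShiftQ(k_press);
    return out.size() == 3 ? 0 : 1;
}
```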
Looking at the problem with a higher level of abstraction

Like before, I'm going to use the term "persistent event" for events of types like EV_KEY, EV_ABS, and other such types which are assumed to maintain their last value until a different value gets reported, whereas "transient events" are events like EV_REL for which the last value is of no importance. I'm going to ignore the existence of transient events for now. Basically, at each layer and each point in time, there is a certain "state", which encompasses the devices that currently exist, which properties those devices have, and what the last seen value of all persistent events is. Basically the following:
(Like I said in my last post, you may or may not want to treat device properties as a special kind of persistent event. Whether or not you do so is irrelevant for this post.) The state can change with time; for example, when the user presses a key on a real keyboard, the state on layer 0 changes to set that key from a "key up" state to a "key down" state.

Layers are basically things that map one state to another state

When a mapper receives a "key down" event, what it really observes is that the state of the previous layer has changed. If it then writes another "key down" event to its output, what it really is doing is changing the state of the current layer. These events are basically a scheme to relay changes of the state without having to copy the entire state each time any part of it changes.

Events are basically a compression scheme

Sometimes you do have to relay the whole state, for example when a new mapper script starts up, it needs to know the full state of the previous layer. However, even the full state can be relayed by having the mapper script assume a certain default state involving things like "all keys are released, all absolute axes are 0" and then sending a series of events that represent the difference between the default state and the actual state.

A change in the active window is basically a single atomic change in the state

Now suppose that the name of the active window was also part of the state somewhere. (Maybe a property of a certain special device.) When the active window changes and doing so causes a key K to be pressed and another key Q to be released, we want a single transition to the state to happen:
And we do not want three different transitions to the state to happen:
Now if you were to express the first way (single change) and second way (three changes) in events, then both of them would correspond to three events, but semantically they mean something different. If there is a single atomic change to the state, then Firefox could be informed the key K was already pressed since before it got focus, but with three subsequent changes Firefox would have to be informed that Q was pressed since before it got focus and was quickly released thereafter. So what did we just learn?
The difference between these two is that in the first case, you do want the UI to actively respond to the events, but in the second case you don't. The fundamental reason that these two look different is because I had been treating each event as a "state update" by itself, and then required additional methods to mark which updates belong together. However, instead of seeing each event as a separate state change, it would be much more natural to consider each state change to consist of any number of events; the state changes are the things that happen to which the UI should respond, and the events are merely a compact way of communicating state changes. And look: this has already been solved elegantly in the evdev protocol with EV_SYN. EV_SYN is currently used to group events that you want to happen atomically: for example, if you tap a part of the touch screen, then you want the active X and Y coordinates to update together, not one after another. This could be reused for the active window changing scenario. If you were to send the event that changes the active window in the same EV_SYN report as the keys that got pressed/released due to that change, then the compositor should be able to see that those presses/releases should happen atomically with the change in focus, and ensure that they happen after the last window lost focus and before the new window gains focus. Problem solved?

Unlistened devices / ignored events are basically default maps on the state

I have lumped all devices together into a single state because some libinput features like disable-while-typing treat some events on some devices differently based on what other devices are currently doing, demonstrating that there are practical use cases wherein we cannot treat all devices as independent from each other. If two devices emit two events, we need an order in which those events occurred. However, being used to the current evdev API, it is pretty counterintuitive to automatically listen to all events on all devices; that would e.g. increase latency on joysticks even if you only wanted to map keyboard keys. As such, I have been trying to think of a way to declare some kind of filter that says "I only want to be informed of events on these devices." Such a filter is basically a way to tell the compositor:
In this example, the "specified way" could be "any event on any device except these devices." This basically carves a blind spot in the mapper's view of the input state, and delegates control over part of the output state back to the compositor.

So what did we just learn?

Now the reason that this is difficult is because a change in the filters actually changes the blind spots of the application. Usually, events are used to relay changes in the state, but in this case they're used for a completely different purpose: the input events do not reflect "the state changed from this to that", but instead reflect "you didn't know the state but it was this all along". In other words, events are used to encode information about the state, but they do not semantically represent changes in the state. Now I do not have a solution for an easier way to handle the listen/unlisten protocol, but I at least have more insight into why the protocol ends up like a mess every time I try. (Frankly, everything would become simpler if I were to remove the ability to listen to only certain devices and required all mapper scripts to relay all events on all devices, whether they want to or not.)
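As a sketch of the EV_SYN idea above: the "active window changed" marker and the key changes it implies are grouped into a single SYN_REPORT frame, signalling one atomic state transition. EV_ACTIVE_WINDOW is invented purely for illustration (it is not a real evdev type); the EV_KEY/EV_SYN codes are the usual evdev ones:

```cpp
#include <linux/input-event-codes.h>
#include <cstdint>
#include <vector>

struct Frame {          // one entry of an event report
    uint16_t type;
    uint16_t code;
    int32_t  value;
};

// Hypothetical event type carrying "the active window changed" as part of the
// state. Chosen outside the real evdev type range; illustration only.
constexpr uint16_t EV_ACTIVE_WINDOW = 0x20;

// Build the single atomic transition from the example above: focus moves to
// the new window, Q is released and K is pressed, all in one EV_SYN report.
std::vector<Frame> focusChangeReport(int32_t newWindowId) {
    return {
        {EV_ACTIVE_WINDOW, 0, newWindowId},
        {EV_KEY, KEY_Q, 0},           // release Q
        {EV_KEY, KEY_K, 1},           // press K
        {EV_SYN, SYN_REPORT, 0},      // everything above happens atomically
    };
}

int main() { return focusChangeReport(1).size() == 4 ? 0 : 1; }
```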
Just a quick note: You might have noticed that I'm not very active lately. That's because of a pile of stuff I need to take care of. January will add more to the pile, so I don't expect to have much capacity left for another 4 weeks or so.
I'm relieved to hear that; I was worried that the whole project was getting slowed down because of my own tardiness. My current thoughts:

Device properties

If device properties were required to remain static, then it would be easy for a script to just declare "I want to listen to all keyboard devices" and we'd never have to deal with the hassle that ensues when a non-keyboard device changes into a keyboard device or vice versa. It really simplifies many things. In case you want to change the device properties, the same good old method remains: create a virtual device and redirect all events from the old device to the new device. The problem with the old method is that this makes it harder to configure devices in configuration screens, because the device you want to configure is now "virtual device #n" instead of "Dell Keyboard". This problem could be somewhat mitigated by being able to declare hints like "this virtual device originates from that other device", but that does not solve all issues.

Keyboard layout

If two different keyboards want a different layout, then a certain layer is free to change that active layout property every time it receives an event from each keyboard. E.g. if there is a US keyboard and a German keyboard active, then a layer (either a mapper script or a built-in part of libinput) can set the active layout to "German" whenever the German keyboard presses a key, and set it to "US" whenever the US keyboard presses a key. Having a global "active keyboard layout" property is better because it lies closer to how the Wayland input protocol actually works, enabling a lower level of control over the input. E.g. if the user is dissatisfied with the logic libinput uses for when to switch layouts, they will be able to put a mapper script in charge of switching layouts instead of libinput. (What if the US keyboard and the German keyboard press a key simultaneously? There is no correct answer because the Wayland input protocol is simply not capable of dealing with such situations.)

Gestures
The ideal solution would, once again, be to have a modular libinput where an earlier layer does the essential filtering and a later layer does gesture interpretation, but that would require a significant refactor of libinput and the cooperation of the libinput developers.

Purpose of the protocol
Purpose #1 could be roughly achieved by listening to D-Bus signals. It's not perfect because the switch in the mappings will happen slightly later than the switch in focus, so the first event being sent to the new active application will always use the old mapping instead of the new one. This is mostly relevant if you want to map tap/click events to something else. In practice, it probably won't matter that much for most purposes, especially if hovering the mouse over a window already activates the new mappings for pointers. Purpose #2 could be achieved with a standard protocol for setting the keyboard layout. Purpose #3 is not even achievable with the protocol I'm drafting right now. It is possible on the IME layer, but the IME layer of the Wayland protocol is not composable yet. Purpose #4 is as mentioned above. In short, I'm starting to think that I might be looking in completely the wrong direction. I've been trying to find an elegant protocol to map events over Wayland, but in the end I might currently be adding a whole lot of abstraction in order to solve problems that didn't really need all that abstraction, while not solving the actual difficult problems.
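A sketch of the layout-switching layer described under "Keyboard layout" above: on every key event, the layer sets the global "active keyboard layout" property to the layout associated with the device that produced the event. The device IDs and the property-setting call are hypothetical placeholders, not part of any existing protocol:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Layout preferred for each keyboard, keyed by some device identifier.
// (Which identifier the protocol would actually use is an open question.)
std::map<uint32_t, std::string> layoutForDevice = {
    {1, "us"},   // the US keyboard
    {2, "de"},   // the German keyboard
};

// Hypothetical: would issue the "set global layout property" request.
void setActiveLayout(const std::string& layout) { (void)layout; }

// Called by this layer for every key event it passes through.
void onKeyEvent(uint32_t deviceId) {
    static std::string current;
    auto it = layoutForDevice.find(deviceId);
    if (it != layoutForDevice.end() && it->second != current) {
        current = it->second;
        setActiveLayout(current);   // switch layout before forwarding the key
    }
}

int main() {
    onKeyEvent(2);  // German keyboard pressed a key -> layout becomes "de"
    onKeyEvent(1);  // US keyboard pressed a key     -> layout becomes "us"
    return 0;
}
```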
Ok.. now I finally managed to read all of this thread. For now some thoughts:
I can already send scroll events from the mouse to a different window than the one where my keyboard has focus. A possibility to send any event to any window would be nice. But I would not assign it a high priority... Assuming we actually solve it via a wayland-protocol-extension, there are a few questions that need to be figured out at some point:
As for sending information about the title: on second thought I think the best solution would be to send a Map<String, String> along with the window change event. The map could be filled with additional information that either the compositor is configured to send or values sent by the user. That is: the compositor should support a dbus signal that, when triggered, would trigger enter_surface(surface=[whatever it was before], focus_type=USER). Mappers could ignore any values that do not apply to them - or even all user events.
Don't worry - I think you've put a lot of good work into this (more than me). Besides we all have limited capacity and doing this kind of thing right needs a lot of concentrated thought.
I agree that the difference of having this as a wayland protocol vs. a DBUS signal probably won't matter that much for most purposes.
For most purposes it might suffice to use Compose as a workaround. Interfacing with an IME would probably be even better.
Assuming I understand these correctly: doing this on the IME layer might be best, but I don't see why it can't be implemented anywhere... sure, it requires withholding input until you're sure whether it is a hotstring or not (and yes: no preview), and it might not be a good idea to have multiple mappers doing this at the same time... but I don't see how any problems with this can be avoided by any protocol.
Modular libinput would clearly be the way to go.. At least until some programs decide that libinput is not powerful enough and decide to do that part on their own anyway...
Overengineering is a danger that comes with the profession ;P My gut feeling tells me that a double approach might be in order: ask wayland & libinput developers for comments while simultaneously going back to the lower level. Or maybe we end up with a mixture of all these..
That's because the keyboard and mouse have separate focuses: mouse focus is determined by wl_pointer::enter and wl_pointer::leave, but keyboard focus is determined by wl_keyboard::enter and wl_keyboard::leave.
Since those devices are not handled by Wayland, I suppose the only real answer is to map them on the evdev level the way you used to: by grabbing the event devices and creating a shadow uinput device. That does mean that the joysticks do not get to enjoy the composability benefits that this Wayland protocol is supposed to bring. That could be used as an argument to favor an evdev-level solution over a Wayland solution: a Wayland solution is only applicable to keyboards, mice and similar, whereas an evdev-level solution would also benefit joysticks.
I think that the compositor should be allowed to kick out mappers that take an unreasonably long time to process events. When any ordinary GUI or CLI application stops responding, the user will eventually kill it. If a mapper script is blocking the user from sending the input commands to kill that script, we may as well automate the process of killing that mapper script. I think that the protocol should include a way for the compositor to tell the script "I've decided to start ignoring you for whatever reason kthxbai", which may be sent for reasons such as
I do not think that there is much point in standardizing a maximum time delay, because that standard would be non-actionable for anyone trying to implement a standard-compliant mapper. For example, imagine that the standard said "the mapper script MUST respond within one second" or "the maximum average response delay MUST be less than 10ms", then what should somebody trying to write a standard-compliant mapper do to ensure compliance with the standard? How long a particular operation takes is highly dependent on the speed of the user's system, how many other processes are competing for the same resources, and the mercy of the kernel's scheduler. In the worst case, an otherwise responsive mapper script could gain a latency of several seconds if the kernel decided to swap out some of its memory pages to a hard drive made of spinning rust. If we were to standardize a maximum, then an especially bad implementation could decide to send a dummy response after 0.9s if no proper response has been formulated yet in order to ensure standard-compliance — but that kind of compliance would absolutely not improve the user experience. Any kind of maximum time written into the protocol would restrict the freedom of compositor implementations while being pretty much non-actionable for those who write mapper scripts. The best that a script can do is to just try to handle events as quickly as possible, but that would be a good policy regardless of what the protocol standard says anyway. Therefore, I propose that we do not write any explicit maximum time into the protocol. We should write that scripts are expected to provide timely responses to all events they read (even if their response is just "I discard this event"), but let the compositors decide what the maximum is and how they measure it (maximum delay? average delay? root mean square delay?)
I'm not sure I completely follow this, so could you please clarify this?
Probably best handled at the IME-layer indeed. The problem is that there can currently be only a single IME active at any time and the IME protocol does not seem to be intended to help multiple IME's cooperate, so if somebody were to write a fancy IME that allows for AHK-style hotstrings and whatever, then Japanese and Chinese users would not be able to benefit from it. Then again, thinking some more about it, it may be just conceptually difficult for different IME's to work together, since they both assume they have control over the text field. Furthermore, even if it was possible to simultaneously run some AHK-style mapper together with a Japanese IME, a workflow of "first you type characters, then the IME converts it, THEN the hotstring maps apply" would simply be bad UX. Imagine a hotstring that adds the current date like this:
The second step should be skipped. The best UX involves immediately jumping from step 1 to step 3 using the same UI as the rest of the IME uses. In fact, any Japanese IME worth its salt should support custom user-defined dictionary entries; this is traditionally used to make it possible to type names, but I think you could abuse it to at least map some arbitrary fixed input to some arbitrary fixed output. (Also, the state of the Linux-compatible Japanese IME's is currently so bad that the lack of hotstrings is nowhere near the biggest issue.) In short, maybe we do not actually need composability of multiple IME's. Maybe we just need an English/non-CJK language IME that supports hotstrings? I'm actually starting to feel inspired to try writing one. I'm not sure if it will actually be useful to anyone, but the experience of actually implementing an IME may turn out to be valuable. I'll look into this if I find some time.
Good idea. I think we should resume the plan of improving the kernel API for shadowing input devices. If nothing else, it will still be handy for composably mapping joysticks even after we get a Wayland API.
Ok, point taken.
What about a man-in-the-middle type IME that forwards things between one real active IME and the application, and only intervenes to do some extra transformations? I don't know if that is possible, but if so it might be sufficient to enable those features for any IME. So one of the next steps is to contact the wayland/libinput devs.. I guess the right place would either be the wayland mailing list or the wayland issue tracker on gitlab.freedesktop.org. The initial message should give an overview of the problem (multiple input mappers are not working together well, see #3) as well as the possible solutions we discussed so far (this issue and the multitude of posts in #2 regarding evdev-level solutions). It should probably link to #3 as well as this thread. Or maybe the whole discussion (including #2)? Which important parts am I forgetting? And: am I overthinking this? While looking at the wayland issue tracker I came across this WIP-MR: https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/62
Okay, I understand it now. I don't know what the performance implications would be, but it does seem to make sense: compositors can volunteer to send any additional information they want without requiring a new protocol version. Mappers could probe which information the current compositor sends, and if a mapper knows that the current compositor sends a particular piece of information X (e.g. It may be prudent to include an event ID or surface ID in both the Even if there is a 0% dropout rate on both the D-Bus and the Wayland socket, there is a bit of a tricky situation right at the startup of the script, since there will be a slight delay between the point at which you connect to the wayland socket and the point at which you connect to the D-Bus socket. If right after the script starts it for example receives both a D-Bus event and a
An intermediate layer for the IME already exists. The IME does not usually talk to the compositor directly, but through an intermediary IME framework such as Fcitx 5 or IBus. In fact, there are so many intermediaries available that the Arch Wiki needs a whole table to explain which IME can work with which framework. Fcitx 5 is an "input method framework with extension support" [source]. In particular, it already appears to have a plugin QuickPhrase that functions like a lightweight AutoHotkey. This means that we may not really need a composable Wayland protocol for IME's, since we can just write Fcitx 5 addons or contribute patches to make your favourite IME framework even more extensible than it already is.
I think the three most important things to post are: (1) what is the problem that needs to be solved, (2) why it must be solved, (3) why we think Wayland is the right place to solve that problem. We should of course link to the discussion we had here, but I do not expect many (any?) of them to read through everything we've posted here. I think too many ideas may have flown around to accurately summarize them in the post we make at the Wayland channels, but it may be worth it to provide a short summary of the most major roadblocks we've encountered while trying to solve the problem and post that on the Wayland channels. (And "provide a short summary" probably means "I should write a summary", and "should write a summary" probably means "should've written a summary during the past week". I apologize again for my decreased amount of engagement with this issue and thank @kermitfrog for his continued effort.)
It's a real pity that this never got merged, because a lot of the issues we're dealing with would become solvable if Wayland didn't require us to have only a single keyboard with a single layout. The good news is that I don't see any hard "this won't happen" comments on the proposed protocol. I wonder if it is possible to make the proposal get taken seriously again?
I've compiled a list of problems that are still difficult or finicky to solve, or "this is why I still do not have a full proposal for a new protocol yet". The basic idea is that we have a set of mapper scripts, each of which processes the events generated by the previous mapper script: Now this is just a simple concept. More involved schemes have been proposed already, but the basic idea of "one mapper script reads the events generated by the previous mapper script" still feels like a good idea. However, there are still problems for which we have no solution or an overly complicated solution, or otherwise points that require consideration when designing a protocol. Here's a compressed list of problems that are not easy to solve:

Monolithic libinput

It is important for users to be able to feed the output of one mapper script into another. As such, the best model is probably to have the events from the physical event input devices flow first into Mapper 1, then to Mapper 2, etc., and finally to the application. At some point between "event device" and "application", the events have to go through libinput. However, libinput currently fits well neither before the mapper scripts nor after them. The first issue is that libinput does several different things at the same time. It simultaneously applies fundamental filters such as button debouncing and palm detection, which you would probably want to apply before doing any kind of event mapping, as well as high-level filters like gesture detection, which you'd probably want to have applied after any kind of event mapping. Ideally, libinput would be refactored to be more modular, so it becomes possible to insert a script after the fundamental filters like button debouncing yet before gesture detection.

Gesture support

In case a script wants to handle a gesture similar to a libinput gesture, then we may also need a way to disable a particular gesture detection in libinput.

Event format

Somewhere in the chain from "event device" to "application", events will have to be converted from the evdev format to the libinput format, but it is not clear whether that should happen at the start, the end, or somewhere in the middle.

Having only a single keyboard layout active at any time

I suppose that this situation was deemed acceptable since it was considered unlikely that a user would ever actually press keys on multiple keyboards at the same time, but when virtual input devices become involved, the situation becomes pricklier. Does each virtual input device get its own keyboard layout, or are they supposed to send scancodes according to whatever is the currently active layout of the system? If each device gets its own layout, then that could lead to problematic switching of layouts, for example a foot button trying to execute the sequence Ctrl+S on a QWERTY layout while the user is typing something with his hands on an AZERTY layout. If each virtual device mimics the currently active layout, then that complicates the implementation of both mappers and the mapping protocol, especially if we want to properly handle cases where a virtual keyboard is holding a key while the active keyboard layout changes.

Mapping to keys not on the active keyboard layout

Synchronisation between multiple input devices

Within each individual input device, the compositor should obviously maintain the order of the events, e.g. if "ABC" gets typed, then the events A, B, and C should be relayed in that order, and not in the order "A, C, B".
We should consider whether we also want to consistently maintain a global order of events. For example, if the combined event stream from the keyboard and mouse is "[ctrl] [left click]", would it be permissible for the compositor to reorder that to "[left click] [ctrl]"? If not, then we do need additional synchronisation primitives. For example, if a script has elected to map the events of the keyboard but not the mouse, and the user does "[ctrl] [left click]", then the event "[left click]" cannot be processed further until the mapper script has decided how to handle the [ctrl] event, so the mapper needs some way to tell the compositor "I have handled the [ctrl] event". On a side note, for some features such as palm detection, it is important for scripts to be able to listen to multiple devices simultaneously. (Many parts of the protocol would become simpler if all mapper scripts had to process all events from all input devices, whether they care about those events or not. But that would result in lower performance.)

Active-window-dependent mappings

There are various models that can be used for this, such as informing scripts that the window has changed, running a separate instance of each script for each window, or using a different virtual event device for each window. The first option probably has the least overhead. It does however require us to figure out how to communicate changes in the active window to mapper scripts. This could be done either over the Wayland protocol itself, or over another communication layer like D-Bus. Using D-Bus has the advantage that there is more freedom in what kind of information the compositor wants to transmit, but we will still need some kind of synchronisation primitives in the Wayland protocol to make sure that the state of the mapper scripts changes atomically with which window (or surface) currently has focus. Ideally, we'd also have a way for mapper scripts to change their current mapping without generating input events upon a change in the active window, e.g. if the key "A" is being held, the map "A→X" is used for the current window, and the focus changes to a window where the map "A→Y" should apply, then the window that receives focus should ideally see the key Y as being pressed in the keys argument of the wl_keyboard::enter event, rather than seeing X as being pressed and then receiving "release X, press Y" events.
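As a sketch of the synchronisation primitive mentioned under "Synchronisation between multiple input devices": events from devices the mapper does not listen to are held back until the mapper has acknowledged the preceding event it did receive, so a global order like "[ctrl] [left click]" survives. All names here are illustrative, not proposed protocol terms:

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct Ev { uint32_t device; uint16_t code; int32_t value; };

class OrderingGate {
public:
    // True if the mapper listens to this device (events must go through it).
    bool mapperWants(uint32_t device) const { return device == kKeyboard; }

    // Feed an incoming event; returns the events that may be forwarded now.
    std::vector<Ev> push(const Ev& e) {
        if (mapperWants(e.device)) {
            pendingAcks_++;                  // goes to the mapper; await its reply
            toMapper_.push_back(e);
            return {};
        }
        if (pendingAcks_ == 0) return {e};   // nothing ahead of it: pass through
        held_.push_back(e);                  // e.g. [left click] waits for [ctrl]
        return {};
    }

    // The mapper acknowledged (or replaced) its oldest outstanding event.
    std::vector<Ev> ack() {
        if (pendingAcks_ > 0) pendingAcks_--;
        if (pendingAcks_ > 0) return {};
        std::vector<Ev> out(held_.begin(), held_.end());
        held_.clear();
        return out;                          // release the held mouse events
    }

private:
    static constexpr uint32_t kKeyboard = 1;
    uint32_t pendingAcks_ = 0;
    std::deque<Ev> toMapper_, held_;
};

int main() {
    OrderingGate gate;
    gate.push({1, 29, 1});                  // [ctrl] from the keyboard -> to mapper
    auto now   = gate.push({2, 0x110, 1});  // [left click]: held while ctrl pending
    auto later = gate.ack();                // mapper handled ctrl -> click released
    return (now.empty() && later.size() == 1) ? 0 : 1;
}
```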
Ah, good to know :)
Hm.. yes, desynchronisation could be a problem.. But what if the mapper ignores
(1) & (2): absolutely yes!
Really? My impression is that I'm the one blocking this with weeks of inactivity in between, while you're quickly reacting. In the end, we all have the non-digital-life that tends to cause interruptions ;) In any case I'm planning to put a stronger focus on this, so we might soon reach that consensus that we once wanted to have before escalating it to all the other relevant people.
And here are some even more compressed comments (based on the goal of making things as simple as possible).
I wonder if it might suffice to simply be able to call individual parts of libinput (e.g. "debounce this button") and maybe configure it to disable some features for specific devices. That shouldn't be too hard to do (I hope).
As long as we can somehow convert them to evdev events, we should be able to feed them back to an evdev-based mapper. That should take care of most use cases. For the rest (e.g. transforming gestures), mappers need to be adapted anyway.
Would be nice, but getting support for multiple layouts at the same time might take a while..
Arbitrary codes can be handled by the compose workaround.
That is something I need to think about more.. but in the best case it might not actually matter, as the processing should be faster than the user triggers the events. Cases I imagine:
Overview
So I have been thinking about a new protocol for a while now, and have written down quite a bit about it. It is still pretty half-baked, insufficiently worked out, and several parts are not even implemented yet, but I thought it would be a good idea to post my current progress.
This protocol focuses entirely on how the communication between the Wayland compositor and a mapper script running as Wayland client could work. Libinput may or may not be able to be refactored to use the same structure internally as the compositor is supposed to use according to this protocol, but I haven't checked the feasibility of that yet; this is not a proposal on how libinput should be refactored.
There are several changes that still need to be made to this protocol, so I've created a new issue for it where I can update the top few posts as I progress.