device identity #614

Open
wants to merge 69 commits into
base: main

piotrbartman
Member

@marmarek
Member

Can you drop the commits adding the GUI, instead of adding and then removing it? The PR is quite big already...

codecov bot commented Aug 28, 2024

Codecov Report

Attention: Patch coverage is 74.94253% with 218 lines in your changes missing coverage. Please review.

Project coverage is 69.39%. Comparing base (7b755c7) to head (727133f).
Report is 9 commits behind head on main.

Files with missing lines    Patch %   Lines
qubes/device_protocol.py    74.12%    104 Missing ⚠️
qubes/ext/pci.py            64.70%    30 Missing ⚠️
qubes/ext/admin.py          22.22%    28 Missing ⚠️
qubes/ext/block.py          84.72%    22 Missing ⚠️
qubes/devices.py            82.75%    10 Missing ⚠️
qubes/vm/__init__.py        62.96%    10 Missing ⚠️
qubes/ext/utils.py          90.66%    7 Missing ⚠️
qubes/vm/qubesvm.py         0.00%     4 Missing ⚠️
qubes/api/admin.py          92.30%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #614      +/-   ##
==========================================
+ Coverage   69.32%   69.39%   +0.07%     
==========================================
  Files          58       58              
  Lines       11993    12388     +395     
==========================================
+ Hits         8314     8597     +283     
- Misses       3679     3791     +112     
Flag Coverage Δ
unittests 69.39% <74.94%> (+0.07%) ⬆️

@piotrbartman piotrbartman marked this pull request as ready for review August 30, 2024 06:09
Member

@marmarek marmarek left a comment

Just one thing noticed so far (not a full review yet)

Comment on lines 143 to 161
def confirm_device_attachment(device, frontends) -> str:
    try:
        # pylint: disable=consider-using-with
        proc = subprocess.Popen(
            ["attach-confirm", device.backend_domain.name,
             device.port_id, device.description,
             *[f.name for f in frontends.keys()]],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        (target_name, _) = proc.communicate()
        return target_name.decode()
    except Exception as exc:
        print("attach-confirm", exc, file=sys.stderr)
        return ""
Member

@marmarek marmarek Sep 17, 2024

There are a few problems with this function:

  1. It must be async, since it may block for some time, and it's not acceptable to block the whole qubesd for this time. In fact, attach-confirm probably won't work this way at all if it tries to talk to qubesd, since it's blocked on waiting for attach-confirm...
  2. subprocess.Popen -> asyncio variant
  3. The tool name is IMO too generic for a tool in a common /usr/bin/
  4. The tool belongs to desktop-linux-manager repo, which looks like a layering violation - dom0 code should also work without any of the GUI frontends installed in dom0.
  5. Extension of the above: this also will need adjustment to the GUI domain threat model: verify the response is one of allowed ones (on the frontends list?)
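
Points 1, 2 and 5 could be combined roughly as in the sketch below. It is only an illustration under assumptions: the function takes a ready-made command line and a set of allowed names instead of the PR's `device` and `frontends` objects, and the demo uses the Python interpreter as a stand-in for the real helper.

```python
import asyncio
import sys


async def confirm_device_attachment(argv, allowed_targets):
    """Run a confirmation helper without blocking the event loop.

    argv            -- command line of the helper (hypothetical here)
    allowed_targets -- names the helper is permitted to answer with
    Returns the chosen name, or "" to deny the attachment.
    """
    try:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
    except OSError as exc:
        # Helper missing or not executable: treat as "deny".
        print("confirmation helper failed:", exc, file=sys.stderr)
        return ""
    answer = stdout.decode(errors="replace").strip()
    # Never trust the reply blindly: accept only an expected target.
    return answer if answer in allowed_targets else ""


def demo():
    # Stand-in helper that always answers "work".
    helper = [sys.executable, "-c", "print('work')"]
    accepted = asyncio.run(
        confirm_device_attachment(helper, {"work", "personal"}))
    rejected = asyncio.run(
        confirm_device_attachment(helper, {"personal"}))
    return accepted, rejected
```

While the helper runs, the event loop stays free to serve other requests, and a reply outside the allowed set is treated the same as no reply.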

Member

I see the code of attach-confirm is already async. Maybe simply put its code here instead of calling an external program? It means you will need to keep the params dict format in sync, but changes there need to be made in a compatible way anyway (due to the GUI domain case, where both ends may be updated independently). Plus, you won't need to make the external get_system_info() call, since by running inside qubesd you already have all the info here.

Member Author

  • The tool belongs to desktop-linux-manager repo, which looks like a layering violation - dom0 code should also work without any of the GUI frontends installed in dom0.

I agree that this to some level introduces a dependency of the lower layer on the upper one, and that's why I decided to implement it as an independent program. If we can think of another way to confirm device attaching, it should be easy to implement. Currently, however, the GUI is required to implement ask-to-attach, and if the package is not available, such an assignment will simply be ignored. So the implementation is now: for ask-to-attach, ask the user for confirmation if possible, otherwise ignore the assignment.

Member

For the sys-gui case, since this is already a socket protocol, it's very easy to pipe it to a qrexec service. As said above, I'd prefer the interface between those two packages to be "send this to a socket, read the response in this format" instead of "call this executable". That will allow adding qrexec support without still needing the GUI-related package installed in dom0.

And yes, for now the approach "deny if confirmation not available" is fine.
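
A minimal sketch of that "send this to a socket, read response" interface, with "deny if confirmation not available" as the fallback. The wire format (one JSON line out, one plain-text line back), the `targets` key and all names are assumptions made for illustration; the thread does not specify the actual format.

```python
import asyncio
import json
import os
import tempfile


async def ask_over_socket(path, request, allowed, timeout=5.0):
    """Send one JSON line to a UNIX socket and read a one-line answer.

    Returns the answer if it is in `allowed`, otherwise "" (deny).
    """
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_unix_connection(path), timeout)
    except (OSError, asyncio.TimeoutError):
        return ""  # no confirmation service available: deny
    try:
        writer.write(json.dumps(request).encode() + b"\n")
        await writer.drain()
        line = await asyncio.wait_for(reader.readline(), timeout)
    except (OSError, asyncio.TimeoutError):
        return ""
    finally:
        writer.close()
    answer = line.decode(errors="replace").strip()
    return answer if answer in allowed else ""


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "confirm.sock")

        async def fake_gui(reader, writer):
            # Stand-in for the GUI side: always picks the first target.
            request = json.loads(await reader.readline())
            writer.write(request["targets"][0].encode() + b"\n")
            await writer.drain()
            writer.close()

        server = await asyncio.start_unix_server(fake_gui, path)
        try:
            answered = await ask_over_socket(
                path, {"targets": ["work"]}, {"work"})
        finally:
            server.close()
        missing = await ask_over_socket(
            os.path.join(tmp, "absent.sock"), {"targets": ["work"]}, {"work"})
        return answered, missing
```

Because the caller only depends on a socket path and a message format, the other end could equally be a local GUI process or a qrexec pipe.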

Member

@marmarek marmarek left a comment

This is a fuller review of this part. I haven't done more testing since last time.
Also, it got a few conflicts in the meantime...

There needs to be documentation somewhere about the API changes (compared to R4.2). It should include the following:

  1. API for device class extensions (like the one for mic or usb)
  2. Extension controlling Admin API access to devices (extension that handles admin-permission:... events)
  3. Usage of the client part (that's more relevant for the core-admin-client part)
  4. Changes to qrexec methods (in case somebody was using Admin API directly, or has alternative client implementation)

It doesn't need to be very detailed, more like a checklist of changes to look into when updating 3rd-party code interacting with devices.

f"when expected port: {expected.port_id}.")
properties.pop('port_id', None)

if expected.devclass == 'peripheral':
Member

What is the use case for this special value? When denying attaching any device?

self.notify_auto_attached(vm, device, options)
device = assignment.device
if assignment.mode.value == "ask-to-attach":
if vm.name != confirm_device_attachment(device, {vm: assignment}):
Member

as noted before, this needs to be async

if len(frontends) > 1:
# unique
device = tuple(frontends.values())[0].device
target_name = confirm_device_attachment(device, frontends)
Member

and this needs to be async too

Member

There should also be Admin API methods for reading/writing the deny list. If nothing fancier, then at least something similar to the qrexec policy ones, which:

  1. Verify syntax when writing
  2. Allow race-free edits (read returns a "token" being hash of the file, and write gets the token to verify if nobody changed it in the meantime).

If you prefer to add it in a separate PR, convert this note to an issue.
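
The token scheme from point 2 could look roughly like this. It is an in-memory sketch under assumptions: `DenyListStore`, `StaleTokenError` and the placeholder syntax check are hypothetical names, and the real deny-list grammar is not shown in this thread.

```python
import hashlib


class StaleTokenError(Exception):
    """The content changed since the caller's last read."""


def content_token(data: bytes) -> str:
    # The token is simply a hash of the current content.
    return hashlib.sha256(data).hexdigest()


def validate_placeholder(data: bytes) -> None:
    # Placeholder syntax check (hypothetical): ASCII only, and no line
    # with leading or trailing whitespace.
    for number, line in enumerate(data.decode("ascii").splitlines(), 1):
        if line.strip() and line != line.strip():
            raise ValueError(f"line {number}: stray whitespace")


class DenyListStore:
    """In-memory stand-in for the file behind the read/write methods."""

    def __init__(self, content: bytes = b""):
        self._content = content

    def read(self):
        return self._content, content_token(self._content)

    def write(self, new_content: bytes, token: str) -> str:
        validate_placeholder(new_content)           # 1. verify syntax
        if token != content_token(self._content):   # 2. detect races
            raise StaleTokenError("deny list changed since it was read")
        self._content = new_content
        return content_token(new_content)
```

A client reads the content together with its token, edits locally, and writes back with the same token; if anyone else wrote in between, the write is refused and the client has to re-read.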

@marmarek
Member

As for the API changes, looks good. Should it be a new file in doc/ here (linked from index and doc/qubes-devices.rst)? Or maybe added to the existing doc/qubes-devices.rst directly?

Just one minor thing:

admin.vm.device.{endpoint}.Set.persistent - renamed too, and {endpoint} remained

Comment on lines +205 to +206
while not socked_call.done():
    await asyncio.sleep(0.1)
Member

Why this loop? the await socked_call below should be enough, no?

Member Author

If we don't wait for the task to complete, using await in this case results in tasks being destroyed (including resolve_conflicts_and_attach) so attachment is skipped. I'm not entirely sure what causes this.

Member

Weird, I'll try to find out what is going on.

Member

I've tried and it works for me just fine without this part. I tried assigning a block device with ask, then connecting and disconnecting it: I got the prompt, and when accepted, the device got attached.

BTW, attaching a device could use a log entry, especially when automatic or with ask (so you have a trail showing which qube it got attached to).

Member

Ok, I once got "Task was destroyed but it is pending". Skipping task creation (ask_response = await call_socket_service(...)) helped.

Member Author

In my case, skipping task creation always ended up with "Task was destroyed but it is pending".
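
One well-known way to get "Task was destroyed but it is pending" is a task with no strong reference left: the event loop itself keeps only weak references to tasks, so a pending task can be garbage-collected. Whether that is the cause here has not been established in this thread. The sketch below only shows the usual pattern of holding a reference until the task finishes; `call_socket_service_stub` and `spawn` are hypothetical names.

```python
import asyncio

# Strong references to in-flight tasks, dropped when each one finishes.
_background_tasks = set()


def spawn(coro):
    """Schedule `coro` as a task and keep it referenced until it is done."""
    task = asyncio.ensure_future(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def call_socket_service_stub():
    # Stand-in for the real socket call discussed above.
    await asyncio.sleep(0.01)
    return "work"


async def main():
    # Variant 1: await the coroutine directly, no task involved.
    direct = await call_socket_service_stub()
    # Variant 2: run it as a task, but keep a reference while it is pending.
    via_task = await spawn(call_socket_service_stub())
    return direct, via_task


def demo():
    return asyncio.run(main())
```

With either variant, a polling loop on `done()` is not needed to keep the call alive.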

"default_target": front_names[0] if number_of_targets == 1 else "",
"icons": {
(
dom.name if dom.klass != "DispVM" else f"@dispvm:{dom.name}"
Member

Shouldn't that be a getattr(dom, "template_for_dispvms", False) check instead? The DispVM class is an already running one, not a template to create a new one from.

Member Author

Yes, it's for running DispVMs, to indicate in the VM list which VMs are disposable.

If a connected device has multiple assignments to different `frontend_domain`
instances, the user will be asked to choose which domain connect the device to.
If no GUI client is available, the device will not be connected to any domain.
beacon how to get mev
Member

Cat on keyboard pressed "paste"? ;)

@qubesos-bot

qubesos-bot commented Oct 26, 2024

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024111214-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024091704-4.3&flavor=update

  • system_tests_extra

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      AssertionError: 12.919985863592906 not less than 2.0
  • system_tests_usbproxy

  • system_tests_devices

    • TC_00_List_whonix-workstation-17: test_001_list_loop_mounted (failure)
      AssertionError: Device test-inst-vm:loop0::0 (/tmp/test.img) should...
  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...
  • system_tests_gui_tools@hw7

    • qubesmanager_vmsettings: unnamed test (unknown)
    • qubesmanager_vmsettings: Failed (test died)
      # Test died: no candidate needle with tag(s) 'desktop' matched...
  • system_tests_gui_tools

    • qubesmanager_vmsettings: unnamed test (unknown)
    • qubesmanager_vmsettings: Failed (test died)
      # Test died: no candidate needle with tag(s) 'desktop' matched...

Failed tests

63 failures
  • system_tests_extra

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      AssertionError: 12.919985863592906 not less than 2.0
  • system_tests_usbproxy

  • system_tests_devices

    • TC_00_List_whonix-workstation-17: test_001_list_loop_mounted (failure)
      AssertionError: Device test-inst-vm:loop0::0 (/tmp/test.img) should...
  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...
  • system_tests_basic_vm_qrexec_gui_zfs

    • switch_pool: Failed (test died)
      # Test died: command 'dnf install -y ./zfs-release.rpm' failed at /...
  • system_tests_gui_tools@hw7

    • qubesmanager_vmsettings: unnamed test (unknown)
    • qubesmanager_vmsettings: Failed (test died)
      # Test died: no candidate needle with tag(s) 'desktop' matched...
  • system_tests_gui_tools

    • qubesmanager_vmsettings: unnamed test (unknown)
    • qubesmanager_vmsettings: Failed (test died)
      # Test died: no candidate needle with tag(s) 'desktop' matched...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/112766#dependencies

201 fixed

Unstable tests

  • system_tests_suspend

    suspend/ (1/5 times with errors)
    suspend/Failed (1/5 times with errors)
    • job 115081 # Test died: no candidate needle with tag(s) 'xscreensaver-prompt' ...
    suspend/wait_serial (1/5 times with errors)
    • job 115081 # wait_serial expected: qr/2E8vz-\d+-/...
  • system_tests_basic_vm_qrexec_gui

    TC_20_NonAudio_whonix-workstation-17/test_140_qrexec_filecopy_unsafe_name (1/5 times with errors)
    • job 115635 libvirt.libvirtError: internal error: libxenlight failed to create ...
  • system_tests_pvgrub_salt_storage

    TC_41_HVMGrub_debian-12-xfce/test_000_standalone_vm (1/5 times with errors)
    • job 115648 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    TC_41_HVMGrub_fedora-40-xfce/test_000_standalone_vm (2/5 times with errors)
    • job 114628 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115648 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    TC_41_HVMGrub_debian-12-xfce/test_010_template_based_vm (1/5 times with errors)
    • job 115648 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    TC_41_HVMGrub_fedora-40-xfce/test_010_template_based_vm (3/5 times with errors)
    • job 114628 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115078 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115648 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
  • system_tests_extra

    TC_00_QVCTest_whonix-workstation-17/test_020_webcam (1/5 times with errors)
    • job 115072 AssertionError: 'qubes-video-companion webcam' exited early (0): b'...
  • system_tests_usbproxy

    TC_20_USBProxy_core3_fedora-40-xfce/test_070_attach_not_installed_front (1/5 times with errors)
    • job 117582 NameError: name 'santizied_stderr' is not defined
  • system_tests_qrexec

    TC_00_Qrexec_fedora-40-xfce/test_065_qrexec_exit_code_vm (1/5 times with errors)
    • job 115649 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_Qrexec_fedora-40-xfce/test_080_qrexec_service_argument_allow_default (1/5 times with errors)
    • job 115649 libvirt.libvirtError: internal error: libxenlight failed to create ...
  • system_tests_network_ipv6

    VmIPv6Networking_fedora-40-xfce/test_520_ipv6_simple_proxyvm_nm (1/5 times with errors)
    • job 115076 AssertionError: 1 != 0 : nm-applet window not found
  • system_tests_network_updates

    VmUpdates_fedora-40-xfce/test_000_simple_update (1/5 times with errors)
    • job 116867 AssertionError: 1 not found in [0, 100] : dnf clean all; dnf check-...
    TC_10_QvmTemplate_whonix-gateway-17/test_000_template_list (1/5 times with errors)
    • job 115077 qvm-template: error: No matching templates to list
    VmUpdates_debian-12-xfce/test_020_updates_available_notification (1/5 times with errors)
    • job 117610 subprocess.CalledProcessError: Command '/usr/lib/qubes/upgrades-sta...
    VmUpdates_debian-12-xfce/test_120_updates_available_notification_qubes_vm_update (1/5 times with errors)
    • job 115077 subprocess.CalledProcessError: Command '/usr/lib/qubes/upgrades-sta...
    VmUpdates_debian-12-xfce/test_121_updates_available_notification_qubes_vm_update_cli (1/5 times with errors)
    • job 116867 subprocess.CalledProcessError: Command '/usr/lib/qubes/upgrades-sta...
  • system_tests_audio

    TC_20_AudioVM_Pulse_fedora-40-xfce/test_223_audio_play_hvm (1/5 times with errors)
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_debian-12-xfce/test_224_audio_rec_muted_hvm (2/5 times with errors)
    • job 115053 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_whonix-workstation-17/test_224_audio_rec_muted_hvm (2/5 times with errors)
    • job 115053 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_debian-12-xfce/test_225_audio_rec_unmuted_hvm (1/5 times with errors)
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_whonix-workstation-17/test_225_audio_rec_unmuted_hvm (2/5 times with errors)
    • job 115053 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_228_audio_rec_unmuted_pipewire (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 9.41970521541950...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_250_audio_playback_audiovm_pipewire (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 9.06471655328798...
    TC_20_AudioVM_PipeWire_debian-12-xfce/test_251_audio_playback_audiovm_pipewire_late_start (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 9.2878231292517,...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_251_audio_playback_audiovm_pipewire_late_start (1/5 times with errors)
    • job 115623 AssertionError: too short audio, expected 10s, got 9.34507936507936...
    TC_20_AudioVM_Pulse_debian-12-xfce/test_252_audio_playback_audiovm_switch_hvm (1/5 times with errors)
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_fedora-40-xfce/test_252_audio_playback_audiovm_switch_hvm (2/5 times with errors)
    • job 115053 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_whonix-workstation-17/test_252_audio_playback_audiovm_switch_hvm (2/5 times with errors)
    • job 115053 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_PipeWire_debian-12-xfce/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (2/5 times with errors)
    • job 116847 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 115623 AssertionError: too short audio, expected 10s, got 9.05353741496598...
  • system_tests_basic_vm_qrexec_gui_btrfs

    TC_30_Gui_daemon/test_002_clipboard_300k (1/5 times with errors)
    • job 116856 : Clipboard copy operation failed - content...
  • system_tests_basic_vm_qrexec_gui_ext4

    TC_20_NonAudio_debian-12-xfce-pool/test_105_qrexec_filemove (1/5 times with errors)
    • job 115067 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_NonAudio_whonix-gateway-17-pool/test_105_qrexec_filemove (1/5 times with errors)
    • job 115067 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_NonAudio_whonix-workstation-17-pool/test_130_qrexec_filemove_disk_full (1/5 times with errors)
    • job 115067 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_NonAudio_debian-12-xfce-pool/test_141_qrexec_filecopy_unsafe_symlink (1/5 times with errors)
    • job 115067 libvirt.libvirtError: internal error: libxenlight failed to create ...
  • system_tests_basic_vm_qrexec_gui@hw1

    TC_20_NonAudio_whonix-workstation-17/test_140_qrexec_filecopy_unsafe_name (1/5 times with errors)
    • job 115635 libvirt.libvirtError: internal error: libxenlight failed to create ...
  • system_tests_suspend@hw1

    suspend/ (1/5 times with errors)
    suspend/Failed (1/5 times with errors)
    • job 115081 # Test died: no candidate needle with tag(s) 'xscreensaver-prompt' ...
    suspend/wait_serial (1/5 times with errors)
    • job 115081 # wait_serial expected: qr/2E8vz-\d+-/...
  • system_tests_audio@hw1

    TC_20_AudioVM_Pulse_fedora-40-xfce/test_223_audio_play_hvm (1/5 times with errors)
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_debian-12-xfce/test_224_audio_rec_muted_hvm (2/5 times with errors)
    • job 115053 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_whonix-workstation-17/test_224_audio_rec_muted_hvm (2/5 times with errors)
    • job 115053 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_debian-12-xfce/test_225_audio_rec_unmuted_hvm (1/5 times with errors)
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_whonix-workstation-17/test_225_audio_rec_unmuted_hvm (2/5 times with errors)
    • job 115053 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_228_audio_rec_unmuted_pipewire (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 9.41970521541950...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_250_audio_playback_audiovm_pipewire (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 9.06471655328798...
    TC_20_AudioVM_PipeWire_debian-12-xfce/test_251_audio_playback_audiovm_pipewire_late_start (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 9.2878231292517,...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_251_audio_playback_audiovm_pipewire_late_start (1/5 times with errors)
    • job 115623 AssertionError: too short audio, expected 10s, got 9.34507936507936...
    TC_20_AudioVM_Pulse_debian-12-xfce/test_252_audio_playback_audiovm_switch_hvm (1/5 times with errors)
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_fedora-40-xfce/test_252_audio_playback_audiovm_switch_hvm (2/5 times with errors)
    • job 115053 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_Pulse_whonix-workstation-17/test_252_audio_playback_audiovm_switch_hvm (2/5 times with errors)
    • job 115053 qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
    • job 115623 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_20_AudioVM_PipeWire_debian-12-xfce/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 115053 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (2/5 times with errors)
    • job 116847 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    TC_20_AudioVM_PipeWire_whonix-workstation-17/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 115623 AssertionError: too short audio, expected 10s, got 9.05353741496598...

@marmarek
Member

marmarek commented Oct 26, 2024

First start of sys-usb fails:

Oct 26 22:49:26 dom0 kernel: xhci_hcd 0000:05:00.3: USB bus 1 deregistered
Oct 26 22:49:26 dom0 kernel: pciback 0000:05:00.3: xen_pciback: seizing device
Oct 26 22:49:26 dom0 kernel: xen: registering gsi 37 triggering 0 polarity 1
Oct 26 22:49:26 dom0 kernel: Already setup the GSI :37
Oct 26 22:49:26 dom0 kernel: xhci_hcd 0000:05:00.4: remove, state 4
Oct 26 22:49:26 dom0 kernel: usb usb4: USB disconnect, device number 1
Oct 26 22:49:26 dom0 kernel: xhci_hcd 0000:05:00.4: USB bus 4 deregistered
Oct 26 22:49:26 dom0 kernel: xhci_hcd 0000:05:00.4: remove, state 1
Oct 26 22:49:26 dom0 kernel: usb usb3: USB disconnect, device number 1
Oct 26 22:49:26 dom0 kernel: usb 3-3: USB disconnect, device number 2
Oct 26 22:49:26 dom0 kernel: audit: type=1137 audit(1729982966.997:111): pid=2308 uid=0 auid=4294967295 ses=4294967295 msg='op="removed-device" device="/devices/pci0000:00/0000:00:08.1/0000:05:00.4/usb4" device_rule=616C6C6F7720696420316436623A30303033206E616D652022784843492048>
Oct 26 22:49:26 dom0 audit[2308]: USER_DEVICE pid=2308 uid=0 auid=4294967295 ses=4294967295 msg='op="removed-device" device="/devices/pci0000:00/0000:00:08.1/0000:05:00.4/usb4" device_rule=616C6C6F7720696420316436623A30303033206E616D6520227848434920486F737420436F6E74726F6C6C657>
Oct 26 22:49:27 dom0 audit[2308]: USER_DEVICE pid=2308 uid=0 auid=4294967295 ses=4294967295 msg='op="removed-device" device="/devices/pci0000:00/0000:00:08.1/0000:05:00.4/usb3/3-3" device_rule=616C6C6F7720696420316436623A30313034206E616D652022436F6D706F73697465204B564D204465766>
Oct 26 22:49:27 dom0 audit[2308]: USER_DEVICE pid=2308 uid=0 auid=4294967295 ses=4294967295 msg='op="removed-device" device="/devices/pci0000:00/0000:00:08.1/0000:05:00.4/usb3/3-4" device_rule=626C6F636B20696420303430383A35333433206E616D65202248502048442043616D65726122207669612>
Oct 26 22:49:27 dom0 audit[2308]: USER_DEVICE pid=2308 uid=0 auid=4294967295 ses=4294967295 msg='op="removed-device" device="/devices/pci0000:00/0000:00:08.1/0000:05:00.4/usb3" device_rule=616C6C6F7720696420316436623A30303032206E616D6520227848434920486F737420436F6E74726F6C6C657>
Oct 26 22:49:27 dom0 kernel: usb 3-4: USB disconnect, device number 3
Oct 26 22:49:27 dom0 kernel: xhci_hcd 0000:05:00.4: USB bus 3 deregistered
Oct 26 22:49:27 dom0 kernel: pciback 0000:05:00.4: xen_pciback: seizing device
Oct 26 22:49:27 dom0 kernel: xen: registering gsi 38 triggering 0 polarity 1
Oct 26 22:49:27 dom0 kernel: Already setup the GSI :38
Oct 26 22:49:27 dom0 qubesd[2321]: vm.sys-usb: Start failed: Node device not found: no node device with matching name 'usb_usb3'
Oct 26 22:49:27 dom0 qubesd[2321]: vm.sys-usb: start failed
Oct 26 22:49:27 dom0 qubesd[2321]: Traceback (most recent call last):
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/api/admin.py", line 868, in vm_start
Oct 26 22:49:27 dom0 qubesd[2321]:     await self.dest.start()
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 253, in start
Oct 26 22:49:27 dom0 qubesd[2321]:     await super().start(**kwargs)
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/vm/qubesvm.py", line 1184, in start
Oct 26 22:49:27 dom0 qubesd[2321]:     for device in ass.devices:
Oct 26 22:49:27 dom0 qubesd[2321]:                   ^^^^^^^^^^^
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/device_protocol.py", line 1285, in devices
Oct 26 22:49:27 dom0 qubesd[2321]:     for dev in self.backend_domain.devices[self.devclass]:
Oct 26 22:49:27 dom0 qubesd[2321]:                ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/devices.py", line 438, in get_exposed_devices
Oct 26 22:49:27 dom0 qubesd[2321]:     yield from self._vm.fire_event("device-list:" + self._bus)
Oct 26 22:49:27 dom0 qubesd[2321]:                ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/events.py", line 195, in fire_event
Oct 26 22:49:27 dom0 qubesd[2321]:     sync_effects, async_effects = self._fire_event(event, kwargs,
Oct 26 22:49:27 dom0 qubesd[2321]:                                   ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
Oct 26 22:49:27 dom0 qubesd[2321]:         pre_event=pre_event)
Oct 26 22:49:27 dom0 qubesd[2321]:         ^^^^^^^^^^^^^^^^^^^^
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/events.py", line 168, in _fire_event
Oct 26 22:49:27 dom0 qubesd[2321]:     effects.extend(effect)
Oct 26 22:49:27 dom0 qubesd[2321]:     ~~~~~~~~~~~~~~^^^^^^^^
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib/python3.13/site-packages/qubes/ext/pci.py", line 358, in on_device_list_pci
Oct 26 22:49:27 dom0 qubesd[2321]:     if "pci" not in dev.listCaps():
Oct 26 22:49:27 dom0 qubesd[2321]:                     ~~~~~~~~~~~~^^
Oct 26 22:49:27 dom0 qubesd[2321]:   File "/usr/lib64/python3.13/site-packages/libvirt.py", line 6624, in listCaps
Oct 26 22:49:27 dom0 qubesd[2321]:     raise libvirtError('virNodeDeviceListCaps() failed')
Oct 26 22:49:27 dom0 qubesd[2321]: libvirt.libvirtError: Node device not found: no node device with matching name 'usb_usb3'
Oct 26 22:49:27 dom0 qvm-start[3421]: Error: Start failed: Node device not found: no node device with matching name 'usb_usb3', see /var/log/libvirt/libxl/libxl-driver.log for details

Retrying sys-usb start works.

This one is weird, but I think it's related to some caching, maybe on the libvirt side? Note that the USB controllers were disconnected from dom0 just above, yet "usb_usb3" was still listed there. I think the cleanest way is to catch this exception in the dev.listCaps() call and skip the device (following the logic that if caps can't be listed, it isn't "pci"). Alternatively, maybe there is some cache flush to be done, but that may still be racy (if you have two usbvms, for example)...
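
The "catch and skip" idea can be sketched as follows. To keep the example runnable without libvirt, `NodeDeviceError` and `FakeNodeDev` are stand-ins; in the traceback above the exception is `libvirt.libvirtError`, raised from `listCaps()`. The function name `pci_devices` is hypothetical.

```python
class NodeDeviceError(Exception):
    """Stand-in for libvirt.libvirtError so the sketch runs without libvirt."""


class FakeNodeDev:
    """Minimal stand-in for a libvirt node device object."""

    def __init__(self, name, caps=None):
        self._name = name
        self._caps = caps  # None simulates a device that just vanished

    def name(self):
        return self._name

    def listCaps(self):
        if self._caps is None:
            raise NodeDeviceError(
                f"no node device with matching name '{self._name}'")
        return self._caps


def pci_devices(devices, error=NodeDeviceError):
    """Yield only PCI node devices, skipping any that vanished meanwhile."""
    for dev in devices:
        try:
            caps = dev.listCaps()
        except error:
            # If caps cannot be listed, treat the device as "not pci".
            continue
        if "pci" not in caps:
            continue
        yield dev


def demo():
    devices = [
        FakeNodeDev("pci_0000_05_00_4", ["pci"]),
        FakeNodeDev("usb_usb3"),            # stale entry, listCaps() fails
        FakeNodeDev("scsi_host0", ["scsi_host"]),
    ]
    return [dev.name() for dev in pci_devices(devices)]
```

A stale entry such as "usb_usb3" then no longer aborts the whole listing, and with it the VM start.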

nodedev list

taken before starting sys-usb

root@dom0:~# virsh -c xen  nodedev-list
block_sda_SanDisk_SSD_PLUS_240GB_22106H803526
computer
drm_card0
drm_renderD128
net_lo_00_00_00_00_00_00
pci_0000_00_00_0
pci_0000_00_00_2
pci_0000_00_01_0
pci_0000_00_01_2
pci_0000_00_01_3
pci_0000_00_02_0
pci_0000_00_02_1
pci_0000_00_02_4
pci_0000_00_08_0
pci_0000_00_08_1
pci_0000_00_08_2
pci_0000_00_14_0
pci_0000_00_14_3
pci_0000_00_18_0
pci_0000_00_18_1
pci_0000_00_18_2
pci_0000_00_18_3
pci_0000_00_18_4
pci_0000_00_18_5
pci_0000_00_18_6
pci_0000_00_18_7
pci_0000_01_00_0
pci_0000_02_00_0
pci_0000_03_00_0
pci_0000_04_00_0
pci_0000_04_00_1
pci_0000_04_00_2
pci_0000_04_00_3
pci_0000_05_00_0
pci_0000_05_00_1
pci_0000_05_00_2
pci_0000_05_00_3
pci_0000_05_00_4
pci_0000_05_00_5
pci_0000_05_00_6
pci_0000_06_00_0
pci_0000_06_00_1
scsi_2_0_0_0
scsi_generic_sg0
scsi_host0
scsi_host1
scsi_host2
scsi_host3
scsi_target2_0_0
usb_1_0_1_0
usb_1_3
usb_1_4
usb_2_0_1_0
usb_3_0_1_0
usb_3_3
usb_3_3_1_0
usb_3_3_1_1
usb_3_4
usb_4_0_1_0
usb_usb1
usb_usb2
usb_usb3
usb_usb4

@marmarek
Member

Looking at the traceback a bit closer, here:

Oct 26 22:49:27 dom0 qubesd[2321]: for device in ass.devices:

it tries to get device for a specific assignment (so, it knows which device it wants). Yet it results in

Oct 26 22:49:27 dom0 qubesd[2321]: File "/usr/lib/python3.13/site-packages/qubes/devices.py", line 438, in get_exposed_devices
Oct 26 22:49:27 dom0 qubesd[2321]: yield from self._vm.fire_event("device-list:" + self._bus)

... listing all of them. This sounds quite bad performance-wise (quadratic number of calls to whatever device backend it has)

@piotrbartman
Member Author

... listing all of them. This sounds quite bad performance-wise (quadratic number of calls to whatever device backend it has)

This is not entirely true. The devices property searches all devices only if the assignment requires it (i.e. it includes * as port_id). I guess that is not the case here, so the last commit should help.
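
The distinction can be illustrated with a toy model: only a wildcard port forces a full enumeration, while a concrete port is a single lookup. `Backend`, `list_all`, `get` and `matching_devices` are hypothetical names and do not mirror the actual code in qubes/device_protocol.py.

```python
class Backend:
    """Toy device backend that counts how often it is fully enumerated."""

    def __init__(self, devices):
        self._devices = dict(devices)  # port_id -> device description
        self.enumerations = 0

    def list_all(self):
        self.enumerations += 1
        return list(self._devices.values())

    def get(self, port_id):
        return self._devices.get(port_id)


def matching_devices(backend, port_id):
    """Return the devices an assignment with this port_id refers to."""
    if port_id == "*":
        # Wildcard assignment: every exposed device has to be inspected.
        return backend.list_all()
    # Concrete port: one direct lookup, no enumeration.
    device = backend.get(port_id)
    return [] if device is None else [device]


def demo():
    backend = Backend({"1-1": "keyboard", "1-2": "camera"})
    exact = matching_devices(backend, "1-2")
    after_exact = backend.enumerations
    wildcard = matching_devices(backend, "*")
    return exact, after_exact, sorted(wildcard), backend.enumerations
```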

@marmarek
Member

Ok, so the remaining issue is about the unit tests, for example:

  File "templates/libvirt/xen.xml", line 160, in block 'devices'
    {% for device in assignment.devices %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
jinja2.exceptions.UndefinedError: 'qubes.device_protocol.DeviceAssignment object' has no attribute 'devices'

And there are also 2 minor pylint complaints.

@fepitre
Member

fepitre commented Oct 30, 2024

Please note that, following our previous diagnosis, mypy complains about your * fix for positional arguments:

qubes/ext/utils.py:51: error: Too many positional arguments for "DeviceInfo"  [misc]
qubes/ext/utils.py:59: error: Too many positional arguments for "DeviceInfo"  [misc]
qubes/ext/utils.py:62: error: Too many positional arguments for "DeviceInfo"  [misc]
qubes/ext/utils.py:65: error: Too many positional arguments for "DeviceInfo"  [misc]
Found 4 errors in 1 file (checked 60 source files)
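
For context, a bare `*` in a Python signature makes every parameter after it keyword-only, so callers that still pass those values positionally are rejected, both by mypy and at runtime. The class below is a hypothetical stand-in; the real `DeviceInfo` signature is not shown in this thread.

```python
class DeviceInfoSketch:
    """Hypothetical stand-in; not the real DeviceInfo signature."""

    def __init__(self, port_id, *, vendor=None, product=None):
        # Everything after the bare * must be passed by keyword.
        self.port_id = port_id
        self.vendor = vendor
        self.product = product


def build_positionally():
    try:
        # mypy flags this kind of call as "Too many positional arguments".
        DeviceInfoSketch("1-1", "ACME", "Widget")
    except TypeError as exc:
        return type(exc).__name__
    return "accepted"


def build_with_keywords():
    info = DeviceInfoSketch("1-1", vendor="ACME", product="Widget")
    return info.port_id, info.vendor, info.product
```

Under that assumption, the usual fix is to switch the four flagged call sites in qubes/ext/utils.py to keyword arguments.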

@marmarek
Member

Note to self: check what happens when starting VM with PCI device that got removed
