Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixup! drm/asahi: Add the Asahi driver for Apple AGX GPUs #135

Open
wants to merge 562 commits into
base: gpu/rust-wip
Choose a base branch
from

Conversation

jannau
Copy link
Member

@jannau jannau commented Apr 11, 2023

No description provided.

ojeda and others added 30 commits March 5, 2023 23:16
Upgrade the Rust version from 1.62.0 to 1.66.0.

The overwhelming majority of the commit is about upgrading our `alloc`
fork to the new version from upstream [1]. License has not changed [2][3]
(there were changes in the `COPYRIGHT` file, but unrelated to `alloc`).

As in the previous version upgrades (done out of tree so far), upgrading
`alloc` requires checking that our small additions (`try_*`) still match
their original (non`-try_*`) versions.

With this version upgrade, the following unstable Rust features were
stabilized: `bench_black_box` (1.66.0), `const_ptr_offset_from` (1.65.0),
`core_ffi_c` (1.64.0) and `generic_associated_types` (1.65.0). Thus
remove them.

This also implies that only two unstable features remain allowed for
non-`rust/` code: `allocator_api` and `const_refs_to_cell`.

There are some new Clippy warnings that we are triggering (i.e.
introduced since 1.62.0), as well as a few others that were not
triggered before, thus allow them in this commit and clean up or
remove them as needed later on:

  - `borrow_deref_ref` (new in 1.63.0).
  - `explicit_auto_deref` (new in 1.64.0).
  - `bool_to_int_with_if` (new in 1.65.0).
  - `needless_borrow`.
  - `type_complexity`.
  - `unnecessary_cast` (allowed only on `CONFIG_ARM`).

Furthermore, `rustdoc` lint `broken_intra_doc_links` is triggering on
links pointing to `macro_export` `macro_rules` defined in the same
module (i.e. appearing in the crate root).

However, even if the warning appears, the link still gets generated
like in previous versions, thus it is a bit confusing. An issue has
been opened upstream [4], and it appears that the link still being
generated was a compatibility measure, thus we will need to adapt
to the new behavior (done in the next patch).

In addition, there is an added `#[const_trait]` attribute in
`RawDeviceId`, due to 1.66.0's PR "Require `#[const_trait]` on `Trait`
for `impl const Trait`") [5].

Finally, the `-Aunused-imports` was added for compiling `core`. This was
fixed upstream for 1.67.0 in PR "Fix warning when libcore is compiled
with no_fp_fmt_parse" [6], and prevented for the future for that `cfg`
in PR "core: ensure `no_fp_fmt_parse builds` are warning-free" [7].

Reviewed-by: Björn Roy Baron <[email protected]>
Reviewed-by: Martin Rodriguez Reboredo <[email protected]>
Tested-by: Martin Rodriguez Reboredo <[email protected]>
Reviewed-by: Gary Guo <[email protected]>
Reviewed-by: Vincenzo Palazzo <[email protected]>
Reviewed-by: Alice Ferrazzi <[email protected]>
Tested-by: Alice Ferrazzi <[email protected]>
Reviewed-by: Neal Gompa <[email protected]>
Tested-by: Neal Gompa <[email protected]>
Link: Rust-for-Linux#947
Link: https://github.com/rust-lang/rust/tree/1.66.0/library/alloc/src [1]
Link: https://github.com/rust-lang/rust/blob/1.66.0/library/alloc/Cargo.toml#L4 [2]
Link: https://github.com/rust-lang/rust/blob/1.66.0/COPYRIGHT [3]
Link: rust-lang/rust#106142 [4]
Link: rust-lang/rust#100982 [5]
Link: rust-lang/rust#105434 [6]
Link: rust-lang/rust#105811 [7]
Signed-off-by: Miguel Ojeda <[email protected]>
Commit reference: 3dfc5eb

Signed-off-by: Asahi Lina <[email protected]>
Co-developed-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Asahi Lina <[email protected]>
This abstraction enables Rust drivers to walk Device Tree nodes and
query their properties.

Signed-off-by: Asahi Lina <[email protected]>
…ation

In order for modpost to work and correctly generate module aliases from
device ID tables, it needs those tables to exist as global symbols with
a specific name. Additionally, modpost checks the size of the symbol, so
it cannot contain trailing data.

To support this, split IdArrayIds out of IdArray. The former contains
just the IDs. Then split out the device table definition macro from the
macro that defines the device table for a given bus driver, and add
another macro to declare a device table as a module device table.
Drivers can now define their ID table once, and then specify that it
should be used for both the driver and the module:

// Generic OF Device ID table.
kernel::define_of_id_table! {ASAHI_ID_TABLE, &'static hw::HwConfig, [
    (of::DeviceId::Compatible(b"apple,agx-t8103"), Some(&hw::t8103::HWCONFIG)),
    (of::DeviceId::Compatible(b"apple,agx-t8112"), Some(&hw::t8112::HWCONFIG)),
    // ...
]}

/// Platform Driver implementation for `AsahiDriver`.
impl platform::Driver for AsahiDriver {
    /// Data associated with each hardware ID.
    type IdInfo = &'static hw::HwConfig;

    // Assign the above OF ID table to this driver.
    kernel::driver_of_id_table!(ASAHI_ID_TABLE);

    // ...
}

// Export the OF ID table as a module ID table, to make modpost/autoloading work.
kernel::module_of_id_table!(MOD_TABLE, ASAHI_ID_TABLE);

Signed-off-by: Asahi Lina <[email protected]>
This patch adds a logic similar to `devm_platform_ioremap_resource`
function adding:
  - `IoResource` enumerated type that groups the `IORESOURCE_*` macros.
  - `get_resource()` method that is a binding of `platform_get_resource`
  - `ioremap_resource` that is newly written method similar to
    `devm_platform_ioremap_resource`.

Lina: Removed `bit` dependency and rebased

Co-developed-by: Asahi Lina <[email protected]>
Signed-off-by: Maciej Falkowski <[email protected]>
Allows drivers to configure the DMA masks for a device. Implemented
here, not in device, because it requires a mutable platform device
reference this way (device::Device is a safely clonable reference).

Signed-off-by: Asahi Lina <[email protected]>
Apple SoCs require non-posted mappings for MMIO, and this is
automatically handled by devm_ioremap_resource() and friends via a
resource flag. Implement the same logic in kernel::io_mem, so it can
work the same way.

Signed-off-by: Asahi Lina <[email protected]>
TODO: This isn't abstracted properly yet

Signed-off-by: Asahi Lina <[email protected]>
Apple Silicon SoCs (M1, M2, etc.) have a GPU with an ARM64 firmware
coprocessor. The firmware and the GPU share page tables in the standard
ARM64 format (the firmware literally sets the base as its TTBR0/1
registers). TTBR0 covers the low half of the address space and is
intended to be per-GPU-VM (GPU user mappings and kernel-managed
buffers), while TTBR1 covers the upper half and is global (firmware
code, data, management structures shared with the AP, and a few
GPU-accessible data structures).

In typical Apple fashion, the permissions are interpreted differently
from traditional ARM PTEs. By default, firmware mappings use Apple SPRR
permission remapping. The firmware only uses that for its own
code/data/MMIO mappings, and those pages are not accessible by the GPU
hardware. We never need to touch/manage these mappings, so this patch
does not support them.

When a specific bit is set in the PTEs, permissions switch to a
different scheme which supports various combinations of firmware/GPU
access. This is the mode intended to be used by AP GPU drivers, and what
we implement here.

The prot bits are interpreted as follows:

- IOMMU_READ and IOMMU_WRITE have the usual meaning.

- IOMMU_PRIV creates firmware-only mappings (no GPU access)
- IOMMU_NOEXEC creates GPU-only structures (no FW access)
- Otherwise structures are accessible by both GPU and FW

- IOMMU_MMIO creates Device mappings for firmware
- IOMMU_CACHE creates Normal-NC mappings for firmware (cache-coherent
  from the point of view of the AP, but slower)
- Otherwise creates Normal mappings for firmware (this requires manual
  cache management on the firmware side, as it is not coherent with the
  SoC fabric)

GPU-only mappings (textures/etc) are expected to use IOMMU_CACHE and are
seemingly coherent with the CPU (or otherwise the firmware/GPU already
issue the required cache management operations when correctly
configured).

There is a GPU-RO/FW-RW mode, but it is not currently implemented (it
doesn't seem to be very useful for the driver). There seems to be no
real noexec control (i.e. for shaders) on the GPU side. All of these
mappings are implicitly noexec for the firmware.

Drivers are expected to fully manage per-user (TTBR0) page tables, but
ownership of shared kernel (TTBR1) page tables is shared between the
firmware and the AP OS. We handle this by simply using a smaller IAS to
drop down one level of page tables, so the driver can install a PTE in
the top-level (firmware-initialized) page table directly and just add an
offset to the VAs passed into the io_pgtable code. This avoids having to
have any special handling for this here. The firmware-relevant data
structures are small, so we do not expect to ever require more VA space
than one top-level PTE covers (IAS=36 for the next level, 64 GiB).

Only 16K page mode is supported. The coprocessor MMU supports huge pages
as usual for ARM64, but the GPU MMU does not, so we do not enable them.

Signed-off-by: Asahi Lina <[email protected]>
DRM drivers need to be able to declare which driver-specific ioctls they
support. This abstraction adds the required types and a helper macro to
generate the ioctl definition inside the DRM driver.

Note that this macro is not usable until further bits of the
abstraction are in place (but it will not fail to compile on its own, if
not called).

Signed-off-by: Asahi Lina <[email protected]>
Add the initial abstractions for DRM drivers and devices. These go
together in one commit since they are fairly tightly coupled types.

A few things have been stubbed out, to be implemented as further bits of
the DRM subsystem are introduced.

Signed-off-by: Asahi Lina <[email protected]>
A DRM File is the DRM counterpart to a kernel file structure,
representing an open DRM file descriptor. Add a Rust abstraction to
allow drivers to implement their own File types that implement the
DriverFile trait.

Signed-off-by: Asahi Lina <[email protected]>
The DRM GEM subsystem is the DRM memory management subsystem used by
most modern drivers. Add a Rust abstraction to allow Rust DRM driver
implementations to use it.

Signed-off-by: Asahi Lina <[email protected]>
There doesn't seem to be a way for the Rust bindings to get a
compile-time constant reference to drm_gem_shmem_vm_ops, so we need to
duplicate that structure in Rust... this isn't nice...

Signed-off-by: Asahi Lina <[email protected]>
@asahilina asahilina force-pushed the gpu/rust-wip branch 10 times, most recently from 69330aa to 98f5852 Compare May 28, 2023 14:15
@asahilina asahilina force-pushed the gpu/rust-wip branch 2 times, most recently from 51a96a2 to 89651c2 Compare June 15, 2023 04:11
@asahilina asahilina force-pushed the gpu/rust-wip branch 2 times, most recently from b3002e4 to 5c5368d Compare August 3, 2023 10:24
@asahilina asahilina force-pushed the gpu/rust-wip branch 2 times, most recently from 77166d0 to 9c6bcb8 Compare September 9, 2023 03:33
@asahilina asahilina force-pushed the gpu/rust-wip branch 2 times, most recently from 499c582 to 5ccaf76 Compare January 17, 2024 08:25
marcan pushed a commit that referenced this pull request Jan 19, 2024
syzkaller report:

 kernel BUG at net/core/skbuff.c:3452!
 invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135
 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452)
 Call Trace:
 icmp_glue_bits (net/ipv4/icmp.c:357)
 __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165)
 ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341)
 icmp_push_reply (net/ipv4/icmp.c:370)
 __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772)
 ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577)
 __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295)
 ip_output (net/ipv4/ip_output.c:427)
 __ip_queue_xmit (net/ipv4/ip_output.c:535)
 __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
 __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387)
 tcp_retransmit_skb (net/ipv4/tcp_output.c:3404)
 tcp_retransmit_timer (net/ipv4/tcp_timer.c:604)
 tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)

The panic issue was trigered by tcp simultaneous initiation.
The initiation process is as follows:

      TCP A                                            TCP B

  1.  CLOSED                                           CLOSED

  2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...

  3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT

  4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED

  5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...

  // TCP B: not send challenge ack for ack limit or packet loss
  // TCP A: close
	tcp_close
	   tcp_send_fin
              if (!tskb && tcp_under_memory_pressure(sk))
                  tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
           TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;  // set FIN flag

  6.  FIN_WAIT_1  --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...

  // TCP B: send challenge ack to SYN_FIN_ACK

  7.               ... <SEQ=301><ACK=101><CTL=ACK>   <-- SYN-RECEIVED //challenge ack

  // TCP A:  <SND.UNA=101>

  8.  FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic

	__tcp_retransmit_skb  //skb->len=0
	    tcp_trim_head
		len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
		    __pskb_trim_head
			skb->data_len -= len // skb->len=-1, wrap around
	    ... ...
	    ip_fragment
		icmp_glue_bits //BUG_ON

If we use tcp_trim_head() to remove acked SYN from packet that contains data
or other flags, skb->len will be incorrectly decremented. We can remove SYN
flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
can fix the problem mentioned above.

Fixes: 1da177e ("Linux-2.6.12-rc2")
Co-developed-by: Eric Dumazet <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Dong Chenchen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
@asahilina asahilina force-pushed the gpu/rust-wip branch 3 times, most recently from 22f3e5c to 1213cf2 Compare May 10, 2024 12:25
@asahilina asahilina force-pushed the gpu/rust-wip branch 2 times, most recently from 38f6258 to 01f6f6f Compare July 17, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.