Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[xdma] alonbl's stable patchset #240

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

[xdma] alonbl's stable patchset #240

wants to merge 15 commits into from

Conversation

alonbl
Copy link

@alonbl alonbl commented Nov 10, 2023

Xilinx developers do not maintain the xdma driver and does not cooperate on github as well, I collected stable fixes of the xdma driver for people use and review.

WARNING: The xdma driver is far from being production grade, it crashes the kernel when aio is used, crashes when readv/writev is used, it have fundamental issues that are solved in this series, it loses sync at random time splitting buffers into parts, it also does not respect small buffer boundary.

XDMA DMA engine was merged into mainline since linux-6.3, staging is at https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/tree/drivers/dma/xilinx/xdma.c a user bridge is missing so we can switch.

The following patchset at least provides some sanity for EOP and stream based synchronous continues allocation operations.

Thanks for all contributors.

Notice

The name of the module is renamed from xdma.ko to xdma-chr.ko due to a conflict with latest merge of xdma.ko dmaengine into upstream.

Patchset includes

Reference Description
xdma: rename module to xdma-chr
xdma: build improvements
#319 xdma: cdev_events: Do not leak ERESTARTSYS leak to userspace
#144 xdma: explicitly stop engine on transfer completion
#161 xdma: fix decs_cmpl and flush when eop received
#190 xdma: fixed PCIe domain and bus values in info ioctl
#92 xdma: simplify next_adj setting
#158 xmda: set module parameter timeouts as milliseconds
#313 xdma: libxdma: handle ERESTARTSYS for both wait_ and swait_*, with and without timeout
#315 xdma: libxdma: engine_status_dump: avoid buffer overflow
xdma: libxdma: fix resource free on open
#226 xdma: cdev_ctrl: use resource_size_t for pci_resource_start()
#218 xdma: correct typos in xdma_mod.c
#315 libxdma: improve messages
#315 libxdma: fix message typo

@hmaarrfk
Copy link

I remember when I was looking at options for XDMA, I ran into https://gitlab.com/WZab/Artix-DMA1

We have been maintaining a significant set of patches for QDMA (not XDMA), at the time required to avoid crashes, and kernel panics.

Over the many years, Xilinx did address some of the crashes, we still avoid things like aoi, and maybe preadv.

Xilinx also tended to break compatibility by changing the order inside enums. This has caused us great headaches with backward compatibility. Ultimately, we have decided to simply undo their ordering changing, for our own compatibility purposes.

We periodically rebase with Xilinx's patches, but their refusal to acknowledge anything posted here has been very frustrating.

@alonbl
Copy link
Author

alonbl commented Nov 10, 2023

Thank you @hmaarrfk,
Thank you for your insights, yes, Xilinx is doing a very bad job in supporting their own product, I am unsure what they gain provided it simply does not work.
Alon

@hmaarrfk
Copy link

Working in a small company, and I presume the team in charge of support is quite small, I believe there is a directive to simply ignore whatever is posted here.

They do however, monitor their forum. That isn't to say it is particularly amazing advice they gave, mostly due to the fact that sharing code, designs, and environment is inherently hard with hardware, but I have been able to get replies from people that were knowledgeable about the problems by posting there.

Its juts a shame that they won't take a look here when specific resolutions, less than 10 lines long each, would fix real problems in the code.

@eniv
Copy link

eniv commented Feb 13, 2024

Hi Alon,
Thanks a lot for maintaining your fork!
Can I suggest that you avoid using forced-push and simply commit in a conventional way?
When you force push the commit hashes change. This make it harder for someone who wants to keep up with the new changes and just cherry pick them.

@alonbl
Copy link
Author

alonbl commented Feb 13, 2024

@eniv this is a downstream patchset, it is not a development, the patchset is a set of patches each is performing a specific change on top of upstream, it will always be rebased (forced push) and ordered by the most trivial to the most intrusive to ease upstream merge. this behavior is same as any downstream distribution/subsystem/maintainer.

@dmitrym1
Copy link

dmitrym1 commented Jan 17, 2025

Please consider merging #313 #315 #316 #319
And thanks for your effort!

dmaleev and others added 9 commits January 17, 2025 23:42
transfer_desc_init already sets the control field to
cpu_to_le32(DESC_MAGIC) so it is useless to mask and rewrite the magic.
Just write next_adj at the right location in the control field.
struct pci_dev->devfn has the PCIe device and function values only,
and not the PCIe bus. Hence the previous bus value was always zero.

Signed-off-by: Tal Zilcer <[email protected]>
When used with the eop_flush flag, the driver may be interrupted before
all descriptors has been processed by the XDMA core, and it may so
happen that during the interrupt processing, the engine has already
processed all descriptors, but the driver is not aware of it.

In libxdma.c:1139 there is a check if the engine is busy
    if ((engine->running && !(engine->status & XDMA_STAT_BUSY))
at the engine is stopped if it not busy.

When that code is executed, the engine may indeed be busy, but by the
time we get the descriptor count on line 1155, all descriptors may have
already been processed, and the engine is not busy anymore.

Since the running status of the engine in the driver has not changed
(engine->running == 1), subsequent transfers are only *queued*, but not
processed by the engine, which leads to an eventual timeout and error
512.

The proposed change explicitly stops the engine when a transfer is
complete.
Replace it with the appropriate for userspace error codes
tools:

* Support program prefix.
* Support install target.
* Support CFLAGS and LDFLAGS from build system.

module:

* depmod should be optional.
* Accept installation target.
* Support modules_install target (avoid removal of leftovers)
the xdma upstream driver conflict with this driver with different
functionality.
@alonbl
Copy link
Author

alonbl commented Jan 17, 2025

Please consider merging #313 #315 #316 #319 And thanks for your effort!

Done, except of #316

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants