
Request filter for SocketCAN device #1247

Open
CWNE88 opened this issue Dec 6, 2023 · 15 comments

Comments

@CWNE88

CWNE88 commented Dec 6, 2023

I'd like to set a capture filter for the CAN ID.
Either a specific CAN ID variable, or the ability to specify which bytes of the payload to match.
Something similar to this, for selecting ID 5c2:

ether[2:2] == 0x05c2

That “ether” filter will not work, because the device type is SocketCAN.
However, the above filter does work when reading from a capture file.
I would like to be able to filter at the capture stage.

From current code:

case DLT_CAN_SOCKETCAN:
	bpf_error(cstate, "CAN link-layer type filtering not implemented");

CAN only has an ID, length, and payload (of up to 8 bytes).
Even if we could only filter on specific bytes, that would be enough.

For instance, something like this maybe:

can[2:2] == 0x05c2

At least that way, we could define what bytes in the packet to filter for the CAN type.

Thanks
Paul.
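For reference, a minimal sketch of the byte layout being discussed, assuming the DLT_CAN_SOCKETCAN encoding as it appears in savefiles (and in the hex dumps later in this thread): bytes 0-3 hold the CAN ID plus flag bits in network byte order, byte 4 holds the payload length, bytes 5-7 are padding/reserved, and the data starts at byte 8. This is why link[2:2] (or ether[2:2]) reads the low 16 bits of the ID when filtering a file. The struct and function names are illustrative only, not libpcap API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Decoded view of a classic CAN frame in the DLT_CAN_SOCKETCAN layout
 * (illustrative type, not part of libpcap). */
struct can_record {
    uint32_t id_and_flags; /* CAN ID plus EFF/RTR/ERR flag bits */
    uint8_t  len;          /* payload length, 0..8 for classic CAN */
    uint8_t  data[8];
};

/* Read a 32-bit big-endian (network byte order) value. */
uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode one record; returns 0 on success, -1 if it is truncated or the
 * length field exceeds the classic CAN maximum of 8 bytes. */
int decode_can_record(const uint8_t *pkt, size_t caplen, struct can_record *out)
{
    if (caplen < 8)
        return -1;                       /* the header alone is 8 bytes */
    out->id_and_flags = get_be32(pkt);   /* bytes 0..3 */
    out->len = pkt[4];                   /* byte 4; bytes 5..7 are padding/reserved */
    if (out->len > 8 || caplen < 8u + out->len)
        return -1;
    memset(out->data, 0, sizeof out->data);
    memcpy(out->data, pkt + 8, out->len);
    return 0;
}

/* What a filter expression link[off:2] reads: a 16-bit big-endian load. */
unsigned link_half(const uint8_t *pkt, size_t off)
{
    return ((unsigned)pkt[off] << 8) | pkt[off + 1];
}
```

For the frame 00 00 01 23 08 00 00 00 01 02 ... 08, decode_can_record() reports ID 0x123 with 8 data bytes, and link_half(pkt, 2) returns 0x0123.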

@infrastation
Member

ether[2:2] is the same as link[2:2], although it is not immediately obvious why offline filtering works and online filtering does not. What would be the cheapest piece of hardware to test these code paths and to reproduce the problem?

@CWNE88
Author

CWNE88 commented Dec 6, 2023

I did originally try link[2:2] as well, but nothing got captured.
The cheap device I'm using is the isolated (although you probably don't need isolation for a desk test) version of this:
https://www.aliexpress.com/item/1005006032351087.html

@CWNE88
Author

CWNE88 commented Dec 6, 2023

I've also just noticed (it could be unrelated, though) that when capturing with Wireshark, an ID sent as 0x0ABC shows up correctly as 0abc in the packet bytes pane, but as 0x2bc in the packet details pane. A lower ID of 0x123 seems fine.
Screenshot from 2023-12-06 21-51-11

@CWNE88
Author

CWNE88 commented Dec 6, 2023

can-request.pcapng.gz

@infrastation
Member

Thank you; with the file, post-capture filtering works as expected. I could get the USB dongle; would it need to be connected to something to capture any packets?

The Wireshark decoder uses only the low 11 bits for the ID, so 0x0abc (0b101010111100) becomes 0x2bc (0b001010111100). I am not familiar with the packet format to confirm whether that's the correct behaviour.
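A short illustration of that masking. SocketCAN keeps flag bits in the top of the 32-bit can_id; the constant values below are copied from Linux <linux/can.h> so the sketch builds on any platform, and can_identifier() is a hypothetical helper, not Wireshark's code.

```c
#include <stdint.h>

/* Values as defined in Linux <linux/can.h>, copied so the sketch is portable. */
#define CAN_EFF_FLAG 0x80000000U /* extended (29-bit) frame format */
#define CAN_RTR_FLAG 0x40000000U /* remote transmission request */
#define CAN_ERR_FLAG 0x20000000U /* error message frame */
#define CAN_SFF_MASK 0x000007FFU /* standard (11-bit) identifier */
#define CAN_EFF_MASK 0x1FFFFFFFU /* extended (29-bit) identifier */

/* Return the identifier part of a SocketCAN can_id: 29 bits if the
 * extended-format flag is set, otherwise only the low 11 bits. */
uint32_t can_identifier(uint32_t can_id)
{
    if (can_id & CAN_EFF_FLAG)
        return can_id & CAN_EFF_MASK;
    return can_id & CAN_SFF_MASK;
}
```

can_identifier(0x0abc) returns 0x2bc, the value Wireshark shows, whereas 0x123 already fits in 11 bits and is unchanged. An ID wider than 11 bits would have to be sent as an extended (29-bit) frame, which sets CAN_EFF_FLAG.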

@CWNE88
Author

CWNE88 commented Dec 6, 2023

Okay, it seems the CAN ID must be a value that fits within 11 bits (0x7FF at most), so that explains the Wireshark information.

To set this up, you don't need any other equipment, just the adapter.
Install can-utils to get the cansend command.
When you connect the device, it will come up as can0 (in the down state).

Bring it up with:

ip link set can0 type can bitrate 500000
ip link set up dev can0

Send a message with:

cansend can0 123#11223344

where 123 is the ID and 11223344 is the payload (8 bytes at most).

You can run the capture in a different terminal to test filtering.
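As a rough illustration of how the cansend argument maps onto the bytes a filter sees in a DLT_CAN_SOCKETCAN capture, here is a hypothetical helper that handles only the simplest form of the syntax: a 3-hex-digit standard ID, '#', and up to 8 data bytes as hex pairs. cansend itself accepts more (extended IDs, RTR and CAN FD frames), which this sketch does not attempt.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: converts "123#11223344"-style input into the 16-byte
 * DLT_CAN_SOCKETCAN layout. Returns 0 on success, -1 for anything outside
 * the supported subset. */
int cansend_to_record(const char *s, uint8_t rec[16])
{
    uint32_t id = 0;
    size_t i, n;

    memset(rec, 0, 16);
    for (i = 0; i < 3; i++) {            /* exactly 3 hex digits of ID */
        int v = hexval(s[i]);
        if (v < 0)
            return -1;
        id = (id << 4) | (uint32_t)v;
    }
    if (s[3] != '#')
        return -1;

    s += 4;
    n = strlen(s);
    if (n % 2 != 0 || n / 2 > 8)         /* whole bytes, classic CAN max 8 */
        return -1;
    for (i = 0; i < n / 2; i++) {
        int hi = hexval(s[2 * i]), lo = hexval(s[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        rec[8 + i] = (uint8_t)((hi << 4) | lo);   /* data starts at byte 8 */
    }

    /* Bytes 0..3: ID in network byte order; byte 4: payload length. */
    rec[0] = (uint8_t)(id >> 24);
    rec[1] = (uint8_t)(id >> 16);
    rec[2] = (uint8_t)(id >> 8);
    rec[3] = (uint8_t)id;
    rec[4] = (uint8_t)(n / 2);
    return 0;
}
```

For "123#11223344" the record starts 00 00 01 23 04 00 00 00 11 22 33 44, so in a savefile link[2:2] reads 0x0123.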

@infrastation
Member

Thank you for the comments. I have ordered a very similar USB dongle and hopefully will be able to reproduce this problem soonish. Meanwhile, other developers may want to provide additional input.

@infrastation
Member

This is how my USB CAN device identifies itself:

ID 1d50:606f OpenMoko, Inc. Geschwister Schneider CAN adapter
Product: candleLight USB to CAN adapter
Manufacturer: bytewerk

On Debian 12/AMD64 the problem seems to reproduce as described, in that link[2:2] = 0x0123 works as expected when reading from a savefile, but during the live capture no packets pass the filter:

# tcpdump -vni can0 'link[2:2] = 0x0123'
tcpdump: listening on can0, link-type CAN_SOCKETCAN (CAN-bus with SocketCAN headers), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

Without the filter the capture has the packets of interest (which take 2-3 seconds to appear in the capture after the cansend command) plus sometimes other packets. The DLT is CAN_SOCKETCAN in both cases and the filter looks correct:

(000) ldh      [2]
(001) jeq      #0x123           jt 2	jf 3
(002) ret      #262144
(003) ret      #0

So, on the surface, everything looks fine on the user-space side of the capture, which suggests that something may be wrong in the kernel.
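The listing above can be checked by hand with a tiny interpreter. This is a sketch, not libpcap's bpf_filter(): struct insn and run_filter() are hypothetical names, only the opcodes involved are supported, the encoding follows classic BPF (the listing prints absolute jump targets, while the encoded jt/jf are relative to the next instruction), and loads are big-endian as in classic BPF.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors the classic BPF instruction layout (struct bpf_insn). */
struct insn {
    uint16_t code;
    uint8_t  jt, jf;   /* relative jump offsets for conditional jumps */
    uint32_t k;
};

/* Classic BPF opcode values for the instructions used here. */
#define LDH_ABS 0x28 /* BPF_LD  | BPF_H   | BPF_ABS */
#define LDB_ABS 0x30 /* BPF_LD  | BPF_B   | BPF_ABS */
#define JEQ_K   0x15 /* BPF_JMP | BPF_JEQ | BPF_K   */
#define RET_K   0x06 /* BPF_RET | BPF_K             */

/* Returns the number of bytes to accept, or 0 to reject the packet. */
uint32_t run_filter(const struct insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t a = 0;   /* accumulator */
    size_t pc = 0;

    while (pc < n) {
        const struct insn *ins = &prog[pc++];
        switch (ins->code) {
        case LDH_ABS:
            if (ins->k + 2 > len)
                return 0;                     /* out of bounds: reject */
            a = ((uint32_t)pkt[ins->k] << 8) | pkt[ins->k + 1];
            break;
        case LDB_ABS:
            if (ins->k + 1 > len)
                return 0;
            a = pkt[ins->k];
            break;
        case JEQ_K:
            pc += (a == ins->k) ? ins->jt : ins->jf;
            break;
        case RET_K:
            return ins->k;
        default:
            return 0;                         /* unsupported opcode */
        }
    }
    return 0;
}

/* The program printed above for link[2:2] = 0x0123. */
const struct insn link2_eq_0123[] = {
    { LDH_ABS, 0, 0, 2 },       /* (000) ldh [2]                    */
    { JEQ_K,   0, 1, 0x123 },   /* (001) jeq #0x123 jt 2 jf 3       */
    { RET_K,   0, 0, 262144 },  /* (002) ret #262144                */
    { RET_K,   0, 0, 0 },       /* (003) ret #0                     */
};
```

Fed the savefile bytes for ID 0x123 (00 00 01 23 08 ...), run_filter() returns 262144, i.e. the packet is accepted; fed the same frame with its first four bytes in little-endian order (23 01 00 00 08 ...), it returns 0.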

@infrastation
Member

Using the previously discussed setup, I tried to debug this problem and started three processes:

  • cangen can0 to generate some random packets
  • candump -x can0 to see what the packets look like
  • tcpdump -evni can0 'link[0:1] == 0' to see if the filter matches anything at all

candump was steadily receiving a few packets per second. tcpdump seemed to be stuck, but after a few minutes I noticed it had actually captured a few packets (below are the hex dumps only):

0x0000:  0000 0400 0800 0000 d1fc 7929 fcf1 5346  ..........y)..SF

0x0000:  0000 0400 0000 0000 0000 0000 0000 0000  ................

0x0000:  0000 0100 0400 0000 6dea 312e 0000 0000  ........m.1.....

0x0000:  0000 0300 0400 0000 925a 4847 0000 0000  .........ZHG....

0x0000:  0000 0400 0100 0000 ee00 0000 0000 0000  ................

candump printed these packets as follows:

  can0  TX - -  400   [8]  D1 FC 79 29 FC F1 53 46
  can0  TX - -  400   [0]
  can0  TX - -  100   [4]  6D EA 31 2E
  can0  TX - -  300   [4]  92 5A 48 47
  can0  TX - -  400   [1]  EE

As far as I understand this observed behaviour, link[0:1] tests the least significant 8 bits of the ID, not the first octet of the message (the term "payload" would need to be carefully defined to be able to reason about it). As far as I understand the expected behaviour, it should test the first octet of the link-layer header, but the low 8 bits of the ID are three octets past the beginning of the link-layer header, so the observed behaviour is not consistent with the spec either.

@infrastation
Member

Apparently, this bug does not even require CAN hardware to reproduce, e.g. on Debian 12:

ip li add type vcan name vcan0
ip li set dev vcan0 up

Then run the three programs as shown above using vcan0 as the device. Also, the filter expression link[0:1] % 16 <= 3 could be used to match 1/4 of the random packets rather than 1/256.
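The 1/4 figure follows from simple counting: link[0:1] % 16 <= 3 keeps a byte whose low nibble is 0, 1, 2 or 3, i.e. 4 of 16 values for every high nibble. A quick sketch (helper names are illustrative) confirms that 64 of the 256 byte values pass, compared with 1 of 256 for link[0:1] == 0, assuming the tested byte is uniformly distributed.

```c
#include <stdint.h>

/* Mirrors the filter expression link[0:1] % 16 <= 3 for one byte. */
int nibble_filter(uint8_t b)
{
    return b % 16 <= 3;
}

/* Count how many of the 256 possible byte values pass the filter. */
int count_passing(void)
{
    int n = 0;
    for (int b = 0; b < 256; b++)
        n += nibble_filter((uint8_t)b);
    return n;   /* 4 low-nibble values x 16 high nibbles */
}
```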

@CWNE88
Copy link
Author

CWNE88 commented Dec 3, 2024

Thanks for confirming that and looking into it.

@infrastation
Member

The problem reproduces only for a live capture that uses kernel filtering. With the change below link[0:1] means the very first octet of the captured packet (which in my test seems always to be 0) and link[3:1] means the low 8 bits of the ID. This behaviour is consistent with reading from a file (as discussed earlier), which uses userland filtering too.

--- a/pcap-linux.c
+++ b/pcap-linux.c
@@ -4598,7 +4598,7 @@ pcap_setfilter_linux(pcap_t *handle, struct bpf_program *filter)
 
        /* Install kernel level filter if possible */
 
-       if (handle->fcode.bf_len > USHRT_MAX) {
+       if (handle->fcode.bf_len > 0) {
                /*
                 * fcode.len is an unsigned short for current kernel.
                 * I have yet to see BPF-Code with that much

@infrastation
Member

The problem reproduces exactly the same way using two USB-CAN adapters plugged into two separate hosts and connected with a wire, where on one host cangen is sending and on the other tcpdump/libpcap are capturing.

@infrastation
Member

As it turns out, capturing on the "any" pseudo-interface in Linux captures CAN frames as well, but the encoding is different and the difference looks related to this problem. This is what the remote end sends:

# cansend can0 123#0102030405060708

This is what local end receives on the SocketCAN device:

# tcpdump -eni can0 -c 1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on can0, link-type CAN_SOCKETCAN (CAN-bus with SocketCAN headers), snapshot length 262144 bytes
00:26:28.282106 UNSUPPORTED
	0x0000:  0000 0123 0800 0000 0102 0304 0506 0708  ...#............
1 packet captured
1 packet received by filter
0 packets dropped by kernel

The same frame looks different on the "any" pseudo-interface:

# tcpdump -eni any -c 1 ifindex 13
tcpdump: WARNING: any: That device doesn't support promiscuous mode
(Promiscuous mode not supported on the "any" device)
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
00:37:14.430497 can0  B   ifindex 13 ethertype Unknown (0x000c), length 36: 
	0x0000:  2301 0000 0800 0000 0102 0304 0506 0708  #...............
1 packet captured
1 packet received by filter
0 packets dropped by kernel

The first four bytes look endian-swapped, which looks consistent with the way Linux kernel source stores these bytes:

typedef __u32 canid_t;
struct can_frame {
        canid_t can_id;  /* 32 bit CAN_ID + EFF/RTR/ERR flags */
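To illustrate the byte-order point: a canid_t stored in host byte order on a little-endian machine such as AMD64 puts the low byte first, whereas the capture encoding puts it last. The sketch below uses a hypothetical helper, can_id_bytes(), to produce both layouts for a given ID; htonl() is the standard POSIX conversion.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* htonl() */

/* Fill host[] with can_id as struct can_frame stores it (host byte order)
 * and net[] with the network byte order used by the capture encoding. */
void can_id_bytes(uint32_t can_id, uint8_t host[4], uint8_t net[4])
{
    uint32_t be = htonl(can_id);

    memcpy(host, &can_id, 4); /* in-memory layout on this machine */
    memcpy(net, &be, 4);      /* big-endian layout */
}
```

On a little-endian host, can_id_bytes(0x123, ...) gives host bytes 23 01 00 00 and network bytes 00 00 01 23, matching the "any" and "can0" hex dumps respectively; on a big-endian host the two layouts are identical.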

A test confirms that the encoding seen on the "any" pseudo-interface is the encoding the kernel mode filter gets on input when capturing on the "can0" interface: link[0:1] == 0x23 && link[1:1] == 0x01 && link[2:1] == 0x00 && link[3:1] == 0x00 && link[4:1] == 0x08 captures the above packet on "can0", contrary to what the hex dump shows. In other words, the root cause is the order of the first four bytes rather than an offset. Note that the same expression does not match the packet on "any", because there the link offsets start at the SLL2 header rather than at the CAN header.
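On "any", each record starts with the 20-byte LINKTYPE_LINUX_SLL2 header (protocol type, reserved, interface index, ARPHRD type, packet type, address length, address), so the CAN frame begins 20 octets into link. The ethertype 0x000c that tcpdump prints as Unknown is ETH_P_CAN, and the reported length of 36 is 20 header bytes plus the 16-byte CAN frame. The sketch below, with a hypothetical sll2_can_payload() helper, locates the CAN frame in such a record.

```c
#include <stdint.h>
#include <stddef.h>

#define SLL2_HDR_LEN  20      /* size of the LINKTYPE_LINUX_SLL2 header */
#define ETH_P_CAN_VAL 0x000C  /* value of ETH_P_CAN in <linux/if_ether.h> */

/* Returns a pointer to the CAN frame that follows the "Linux cooked v2"
 * header, or NULL if the record is too short or its protocol type is not
 * CAN; on success also stores the interface index (bytes 4..7). */
const uint8_t *sll2_can_payload(const uint8_t *pkt, size_t caplen,
                                uint32_t *ifindex)
{
    unsigned proto;

    if (caplen < SLL2_HDR_LEN)
        return NULL;
    proto = ((unsigned)pkt[0] << 8) | pkt[1];   /* bytes 0..1, big-endian */
    if (proto != ETH_P_CAN_VAL)
        return NULL;
    *ifindex = ((uint32_t)pkt[4] << 24) | ((uint32_t)pkt[5] << 16) |
               ((uint32_t)pkt[6] << 8) | pkt[7];
    /* A filter's link[0] is pkt[0] here, so the CAN bytes start at link[20]. */
    return pkt + SLL2_HDR_LEN;
}
```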

One other odd thing is that when a filter accesses the first four octets, the first CAN packet does not match the filter, but subsequent identical packets do:

3 packets captured
4 packets received by filter
0 packets dropped by kernel

It is not clear whether this effect is related.

@infrastation
Member

This seems most likely related to the byte order processing in pcap_handle_packet_mmap(), which converts the host byte order can_id that comes from Linux kernel to the network byte order nominal link-layer encoding, see the earlier commits 4357b2d and fe47b89 for more detail.

One issue here is that this conversion does not occur consistently for both DLT_CAN_SOCKETCAN and DLT_LINUX_SLL2, which causes the difference in the CAN message contents in the two live capture results just before. I suppose it would be more consistent to deliver the same data if possible.

Another issue is that the kernel filter runs on the CAN message before the conversion, so as a long-term solution the filter bytecode would need to be rewritten to implement the reverse conversion on little-endian hosts. In particular, a new function similar to the existing fix_offset() would need to modify the load instructions such that byte 3 becomes byte 0, byte 2 becomes byte 1, byte 1 becomes byte 2 and byte 0 becomes byte 3. This would not be as trivial as it seems if an instruction loads a 16-bit or a 32-bit integer that spans both the byte-swapped and non-byte-swapped parts of the packet, e.g. link[3:2] or link[2:4]; for such filters the only practicable workaround could be userland filtering. A short-term workaround could be to use userland filtering always, or only when a load instruction accesses one of the first 4 bytes. In other words, CAN filtering should default to userland and should use kernel mode only if there is proof (with or without rewriting the bytecode) that the filter will work correctly in the kernel.

@guyharris, does it make sense? Are there any other factors?
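A rough sketch of the bytecode rewrite proposed above, under stated assumptions: this is not libpcap code, struct insn and swap_can_id_loads() are hypothetical, it assumes a little-endian host and the struct can_frame layout (can_id in bytes 0..3, length at byte 4), and it is deliberately more conservative than the proposal. It mirrors only single-byte absolute loads of bytes 0..3 and falls back to userland for any 16- or 32-bit load touching those bytes, since classic BPF has no byte-swap instruction. Note that this includes the link[2:2] filter from this issue; indexed loads, whose offsets are only known at run time, also fall back.

```c
#include <stdint.h>
#include <stddef.h>

/* Classic BPF opcode fields; values match <pcap/bpf.h>, names are local. */
#define INSN_CLASS(c) ((c) & 0x07)
#define INSN_SIZE(c)  ((c) & 0x18)
#define INSN_MODE(c)  ((c) & 0xe0)
#define CLASS_LD  0x00
#define CLASS_LDX 0x01
#define SIZE_B    0x10
#define MODE_ABS  0x20
#define MODE_IND  0x40
#define MODE_MSH  0xa0

struct insn {
    uint16_t code;
    uint8_t  jt, jf;
    uint32_t k;
};

/* 1 if the instruction can stay as-is or be fixed by mirroring its offset,
 * 0 if the program must be filtered in userland instead. */
static int load_is_fixable(const struct insn *ins)
{
    uint16_t c = ins->code;

    if (INSN_CLASS(c) == CLASS_LD && INSN_MODE(c) == MODE_ABS)
        /* Past byte 3 nothing is swapped; inside, only byte loads are handled. */
        return ins->k >= 4 || INSN_SIZE(c) == SIZE_B;
    if (INSN_CLASS(c) == CLASS_LD && INSN_MODE(c) == MODE_IND)
        return 0;   /* offset depends on the X register at run time */
    if (INSN_CLASS(c) == CLASS_LDX && INSN_MODE(c) == MODE_MSH)
        return 0;   /* likewise */
    return 1;       /* not a packet load */
}

/* Returns 0 if the program was rewritten for host-order can_id, or -1
 * (leaving it untouched) if userland filtering should be used. */
int swap_can_id_loads(struct insn *prog, size_t n)
{
    size_t i;

    /* First pass: decide without modifying anything. */
    for (i = 0; i < n; i++)
        if (!load_is_fixable(&prog[i]))
            return -1;

    /* Second pass: byte k of the network-order ID is byte 3 - k in host
     * order. The instruction count is unchanged, so jumps stay valid. */
    for (i = 0; i < n; i++) {
        uint16_t c = prog[i].code;
        if (INSN_CLASS(c) == CLASS_LD && INSN_MODE(c) == MODE_ABS &&
            INSN_SIZE(c) == SIZE_B && prog[i].k < 4)
            prog[i].k = 3 - prog[i].k;
    }
    return 0;
}
```

For example, link[0:1] == 0x23 compiles to a byte load of offset 0, which this sketch turns into a load of offset 3, while link[2:2] == 0x0123 makes it return -1 so the caller would keep the filter in userland.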

Development

No branches or pull requests

2 participants