Refactor some SPI transactions #988
I came across this tactic when writing the Rust implementation. By telling `read_register()` to read 0 bytes from a register that is actually a reserved SPI command byte, we don't need to use a special form of `write_register(uint8_t, uint8_t)`. The same transaction still occurs with less code executed to do it. This only affects `get_status()`, `flush_tx()`, and `flush_rx()`. Luckily, those SPI commands already have the 0x20 bit asserted, so there's no need to mask the command byte with `W_REGISTER` as is done in `write_register()`. This needs testing on the various supported platforms to ensure a 1-byte buffer behaves as expected.
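To make the idea concrete, here is a minimal sketch of a zero-length read used as a 1-byte command. It is not the library's actual code: `MockSpi`, the simplified `read_register()` signature, and the fixed status reply are assumptions made for illustration. The command byte values are the ones listed in the nRF24L01 datasheet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Command bytes as listed in the nRF24L01 datasheet.
constexpr uint8_t W_REGISTER = 0x20;
constexpr uint8_t FLUSH_TX = 0xE1;
constexpr uint8_t FLUSH_RX = 0xE2;
constexpr uint8_t RF24_NOP = 0xFF;

// Hypothetical stand-in for the SPI bus: records every byte clocked out
// and always answers with a fixed placeholder STATUS value.
struct MockSpi {
    std::vector<uint8_t> sent;
    uint8_t status_reply = 0x0E;

    uint8_t transfer(uint8_t out) {
        sent.push_back(out);
        return status_reply;
    }
};

// Simplified model of a buffer read: clock out the command byte, then
// clock in `len` more bytes. The return value is the byte received while
// the command byte was being sent (the radio's STATUS byte).
uint8_t read_register(MockSpi& spi, uint8_t reg, uint8_t* buf, uint8_t len) {
    uint8_t status = spi.transfer(reg);
    while (len--) {
        *buf++ = spi.transfer(0xFF);
    }
    return status;
}

// With len == 0 the loop never runs, so only the command byte is sent
// and `buf` is never dereferenced.
uint8_t flush_tx(MockSpi& spi) {
    return read_register(spi, FLUSH_TX, nullptr, 0);
}

// The 1-byte commands already carry the 0x20 bit, so OR-ing them with
// W_REGISTER would not change them.
static_assert((FLUSH_TX | W_REGISTER) == FLUSH_TX, "FLUSH_TX has 0x20 set");
static_assert((FLUSH_RX | W_REGISTER) == FLUSH_RX, "FLUSH_RX has 0x20 set");
static_assert((RF24_NOP | W_REGISTER) == RF24_NOP, "RF24_NOP has 0x20 set");
```

In this model, calling `flush_tx()` puts exactly one byte (0xE1) on the bus and hands back whatever was shifted in during that byte, which is the same transaction a dedicated 1-byte write would produce.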
Memory usage change @ 0829b87
Hmm, flash size on ATmega328P increased by 2 bytes. It's probably because a pointer data type is slightly larger than a single byte. I still expect speed-ups though because I could try using
    // only the following 1-byte SPI commands have 0xE0 bits asserted
    // FLUSH_TX, FLUSH_RX, RF24_NOP, REUSE_TX_PL
    if ((reg & 0xE0) != 0xE0) {
        result = _SPI.transfer(0xff);
    }
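The claim in that snippet's comment can be checked against the command set. The following is only a sketch: `is_one_byte_command()` is a hypothetical helper name, and the command values are taken from the nRF24L01 datasheet rather than from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when all three top bits (0xE0) are set,
// which is the condition the snippet uses to skip the second transfer.
constexpr bool is_one_byte_command(uint8_t reg) {
    return (reg & 0xE0) == 0xE0;
}

// 1-byte commands (nRF24L01 datasheet values).
static_assert(is_one_byte_command(0xE1), "FLUSH_TX");
static_assert(is_one_byte_command(0xE2), "FLUSH_RX");
static_assert(is_one_byte_command(0xE3), "REUSE_TX_PL");
static_assert(is_one_byte_command(0xFF), "RF24_NOP");

// Commands that carry data bytes do not match.
static_assert(!is_one_byte_command(0x61), "R_RX_PAYLOAD");
static_assert(!is_one_byte_command(0xA0), "W_TX_PAYLOAD");
static_assert(!is_one_byte_command(0xA8), "W_ACK_PAYLOAD");
static_assert(!is_one_byte_command(0x60), "R_RX_PL_WID");
static_assert(!is_one_byte_command(0x1D), "highest register offset (FEATURE)");
static_assert(!is_one_byte_command(0x20 | 0x1D), "W_REGISTER | FEATURE");
```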
9c0ba58 to ba12195
I don't particularly like this idea. It basically goes back to having a specialized While there is a 2-byte increase in flash on ATmega328P, there is also a not-insignificant decrease in flash on ATSAMD21. I think this change is still worth it considering Cortex-M chips are a much more desirable purchase nowadays; the price is almost the same as the older ATmega328 but with a lot more benefits (like extra RAM and native USB support).
I'd also like to refactor Although, I'm afraid such a refactor might have an undesirable change in compile size. At this point, I'm only concerned with possible speed-ups without increasing the compile size (preferably reducing it). |
I would add my opinion here, but I don't have one yet. I need to look over the code etc and understand what you are doing. 😄 I haven't looked at the RF24 code base for quite some time now, so gimme some time to review here. |
I'm not adamant about these changes. I just opened the PR as a draft to see compile size reports. Compile size aside, I believe these changes should speed up the SPI transactions a little. |
Testing looks good on Linux. I'm seeing a noticeable speed-up during the streamingData example. The ackPayload example seems to be running about the same speed. It's hard to measure because the speed-up may be a matter of nanoseconds (dependent on CPU cycles).
This is just weird. How you figured out to use a read instead of a write is beyond me. It seems to work too, so I am again very impressed. |
It was just from taking a fresh look. It's kinda fun to write something from scratch, especially in a new/different language. When I first joined up with you here, I had just finished writing the pure Python implementation, so I brought my "improvement" ideas with me then. Now that I wrote a pure Rust implementation, I have some work (related to these changes) to do in the CirPy lib... The "read" and "write" terms are more of a human perception when every transaction over the SPI bus is full duplex (both read and write). I lucked out with the 0x20 bit being asserted in the 1-byte SPI commands.
Yes, but one needs a complete understanding of SPI and RF24 behaviors along with putting two-and-two together to pull something like this off! |
`R_REGISTER` is defined as 0. `reg | 0` always results in `reg`.
`REGISTER_MASK` is defined as 0x1F. All register offsets are under 0x1F. Any register offset masked with `REGISTER_MASK` results in the same register offset value, so we don't need to compute this at runtime. Note: all register offsets used are known numbers. There is no public API that allows users to read a register offset larger than 0x1F.
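A quick way to convince yourself that both operations are no-ops is to run them over every register offset. This is only a sketch: `legacy_read_command()` and `masking_is_noop()` are hypothetical names, and 0x1D (`FEATURE`) is assumed to be the highest register offset, per the nRF24L01 datasheet's register map.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t R_REGISTER = 0x00;
constexpr uint8_t REGISTER_MASK = 0x1F;
constexpr uint8_t FEATURE = 0x1D;  // assumed highest register offset

// The bit-wise pattern the text describes: mask the offset, then OR in
// the read mnemonic.
constexpr uint8_t legacy_read_command(uint8_t reg) {
    return static_cast<uint8_t>(R_REGISTER | (REGISTER_MASK & reg));
}

// Returns true if the pattern leaves every valid register offset unchanged.
bool masking_is_noop() {
    for (unsigned reg = 0; reg <= FEATURE; ++reg) {
        if (legacy_read_command(static_cast<uint8_t>(reg)) != reg) {
            return false;
        }
    }
    return true;
}
```

Because the result always equals the input for these offsets, sending `reg` directly produces the same command byte without the extra AND and OR.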
When writing an ACK payload to TX FIFO, static payload size is not applicable. So, bypass the additional checks in `write_payload()` about ensuring static payload size is satisfied. Note, the `W_ACK_PAYLOAD` command already has 0x20 bit (`W_REGISTER`) asserted.
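The note about `W_ACK_PAYLOAD` can be illustrated the same way. In this sketch, `write_command()` models a write helper that ORs `W_REGISTER` into the byte it sends, and `ack_payload_command()` models building the command for a given pipe; both names, and the 3-bit pipe mask, are assumptions for illustration rather than the library's exact code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t W_REGISTER = 0x20;
constexpr uint8_t W_ACK_PAYLOAD = 0xA8;  // nRF24L01 datasheet value

// Hypothetical model of the command byte a write helper would send.
constexpr uint8_t write_command(uint8_t reg) {
    return static_cast<uint8_t>(W_REGISTER | reg);
}

// Hypothetical model of the ACK payload command for a pipe (low 3 bits).
constexpr uint8_t ack_payload_command(uint8_t pipe) {
    return static_cast<uint8_t>(W_ACK_PAYLOAD | (pipe & 0x07));
}

// OR-ing in W_REGISTER leaves the ACK payload command untouched for
// every pipe, so routing it through the write helper is safe.
bool ack_command_survives_write_helper() {
    for (uint8_t pipe = 0; pipe < 6; ++pipe) {
        if (write_command(ack_payload_command(pipe)) != ack_payload_command(pipe)) {
            return false;
        }
    }
    return true;
}
```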
7abc73d to 0829b87
Finally had some time to test this more, and my testing shows a speed-up of at least 4-6 KB/s transferring data from a Nano to a Due at 1 Mbps. That's pretty significant!
I want to check that this still works with the Pico SDK. I doubt there will be any problems, but it doesn't hurt to verify anyway. |
This works fine with the Pico SDK. Seeing as how I've tested Linux and the Pico SDK (and the Arduino-pico core for good measure), are there any objections to merging this?
Nope, I’m pretty happy about these changes.
We should probably publish a new release to get these optimizations out into the Arduino world. This idea also beckons a release crusade because of the change |
Sounds good to me! In looking at it, there are a lot of changes all the way through the stack that probably should have been pushed out sooner. I haven't done much coding through the summertime though... been slacking lol |
It is a bit difficult to remember what changes are unreleased; we currently need to look at the git history since the last tag. If the imminent release crusade goes well, then I could set up a CI workflow to output unreleased changes in a job summary (like what I did here for a different project). The unreleased changes are not going to be kept in the CHANGELOGs because that would be rather cumbersome on local git clones.
Hmm, well, it's not that hard to do a compare as in v1.4.9...master
It is a little annoying to get to that page. Usually I have to type it manually or first compare a release and change the comparison once the page loads. I'm sure there's a |
Release crusade completed
There's going to be some mis-categorization on behalf of
I'm still not sure why the word "Add" (or the PR label ) didn't cause it to be categorized properly into "Added". 🤷🏼♂️ |
Refactor 1-byte SPI commands
I came across this tactic when writing the Rust implementation. By telling `read_register()` to read 0 bytes from a register that is actually a reserved SPI command byte, we don't need to use a special form of `write_register(uint8_t, uint8_t)`. The same transaction still occurs with less code executed to do it.
This only affects `get_status()`, `flush_tx()`, `flush_rx()`, and `reUseTX()`.
Luckily, those SPI commands already have the 0x20 bit asserted, so there's no need to mask the command byte with `W_REGISTER` as is done in `write_register()`.
This needs testing on the various supported platforms to ensure a 1-byte buffer behaves as expected.

Removed mnemonics bit-wise operations
`R_REGISTER` is defined as 0. So, `reg | 0` always results in `reg`. We don't need to compute this at runtime.
`REGISTER_MASK` is defined as 0x1F. All register offsets are under 0x1F. Any register offset masked with `REGISTER_MASK` results in the same register offset value, so we don't need to compute this at runtime.
Note, all register offsets used are known numbers. There is no public API that allows users to read a register offset larger than 0x1F.

Additional SPI refactoring
When writing an ACK payload to TX FIFO, static payload size is not applicable. So, bypass the additional checks in `write_payload()` about ensuring static payload size is satisfied; instead let `writeAckPayload()` use `write_register()` directly.
Note, the `W_ACK_PAYLOAD` command already has the 0x20 bit (`W_REGISTER`) asserted.