Architectures used to drive WS2812 and other one-wire pixels

In each design, there is an array of target intensity values (e.g., RGB, RGBW, etc.) to be transmitted to the pixels. However, there is no hardware unit that converts those intensity values directly into the transmission format that the pixels require, so the values have to be converted in some form to allow the hardware to "do the right thing". Here are three models (architectures) that I've seen fairly commonly used to drive strands of pixels.

Option 1: bit-banging

If you have an AVR, or other microcontroller where you can disable interrupts, then this is definitely an option. It tends to require disabling interrupts while sending data, so that they cannot disrupt the very sensitive timing requirements ... the timing of T0H and T1H is entirely dependent on the clock cycles of the processor. On the positive side, these microcontrollers tend to enable cycle-accurate deterministic execution. RAM constraints and input options tend to be the limiting factors. This design is not a good fit for any microcontroller with real-time interrupts that cannot be disabled, as those would interfere with the bit-banging timing.

Option 2: Pre-convert the data, then transmit in one DMA operation

In this design, the entire set of intensity values is pre-converted into the format that the hardware needs. Quite often, this results in RAM requirements of 3x-10x the size of the original intensity array, which reduces the maximum number of pixels that can be controlled accordingly. In addition, because the pixel data is processed in its entirety before any transmission begins, it adds latency that scales with the number of pixels. At the same time, once the buffer is prepared, the entire pixel strand's data is sent as a single DMA operation. This gives significant flexibility and, if the signal to start the strand's transmission is decoupled from when the data arrives, can provide rock-solid results even under otherwise difficult conditions.

Option 3: Real-time conversion of small chunks, with multiple hardware operations

In this design, a fixed-size chunk (as small as a few bits at a time) is converted from the intensity values into the form used for programming the hardware to transmit those bits. Generally, this includes setting up an ISR (Interrupt Service Routine) that fires after the hardware transmits data, with the ISR doing the following:

This design is outstanding when RAM is the constraining factor, such as when controlling long strings of pixels. At the same time, it imposes hard real-time processing requirements on the overall system, which increases the complexity.

ESPixelStick

The ESPixelStick firmware uses the third model. For example, the v3 firmware on v3 ForkInEye hardware uses option 3 (using the UART for the hardware control). The v4 firmware (at time of writing) continues to use the same model (at least for UART output). It therefore has hard real-time requirements. At the same time, at least for WS2812 (e.g., NeoPixels), some tricks are used (whether they knew it at the time or not) that greatly reduce the difficulty of meeting the real-time requirements (see later post). Firmware v4 using RMT is very similar; see MartinMueller2003's response.
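To make the RAM trade-off between options 2 and 3 a bit more concrete, here is a rough sketch. The function names, the RAM budget, and the expansion factor are illustrative assumptions, not figures taken from any firmware; the sketch only counts the intensity array plus the conversion buffer.

```cpp
#include <cassert>
#include <cstddef>

// Option 2: every pixel needs its intensity bytes PLUS the pre-converted
// copy, which is `expansion` times larger than the intensity data.
std::size_t maxPixelsPreconverted(std::size_t ramBytes,
                                  std::size_t bytesPerPixel,
                                  std::size_t expansion) {
    assert(bytesPerPixel > 0);
    return ramBytes / (bytesPerPixel * (1 + expansion));
}

// Option 3: only a fixed-size conversion chunk is needed, regardless of
// how many pixels are on the strand.
std::size_t maxPixelsChunked(std::size_t ramBytes,
                             std::size_t bytesPerPixel,
                             std::size_t chunkBytes) {
    assert(bytesPerPixel > 0);
    if (ramBytes <= chunkBytes) return 0;
    return (ramBytes - chunkBytes) / bytesPerPixel;
}
```

With a hypothetical 32768-byte budget and RGB pixels, a 4x expansion allows `maxPixelsPreconverted(32768, 3, 4)` = 2184 pixels, while a 128-byte chunk allows `maxPixelsChunked(32768, 3, 128)` = 10880 pixels.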
-
WS2812 - Actual Timing Requirements

Protocol flexibility

See this excellent four-part series, which describes the REAL timing requirements of WS2812 pixels (e.g., NeoPixels). That series exposed some really useful information for anyone wanting to build rock-solid controllers. In particular, some of the timing requirements are much looser than they appear in the specification. The only ones that really appear to have tight tolerances are T0H and T1H ... how many nanoseconds the line is driven high. Moreover, each WS2812 pixel (due to manufacturing tolerances, heat, wire capacitance, etc.) will have slightly different timing behavior. This suggests the following goals for designing a "solid" controller:

Real-time requirements driving WS2812 using ESP8266's UART

While T0H and T1H have extremely tight timing requirements, the ESPixelStick firmware relies on hardware to ensure that both are met accurately. In particular, it (ab)uses the UART hardware, such that both T0H and T1H timings are enforced by the UART itself. Using a D1 Mini (ESP8266)'s UART to drive WS2812:

Thus, with a full FIFO buffer used entirely for TX, the firmware can tolerate over 300us of jitter (e.g., caused by other higher-priority ISRs).
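The "over 300us" figure can be sanity-checked with a little arithmetic. The sketch below is only an estimate under assumed values that are not stated in this post: a 128-entry TX FIFO, two pixel bits carried per UART frame, and a 1.25us WS2812 bit period. The function name is hypothetical.

```cpp
#include <cassert>

// Time (in microseconds) a full TX FIFO keeps the wire busy, i.e. roughly
// how long the ISR may be delayed before the FIFO runs dry.
double fifoDrainTimeUs(unsigned fifoFrames,
                       unsigned pixelBitsPerFrame,
                       double pixelBitPeriodUs) {
    assert(pixelBitPeriodUs > 0.0);
    return fifoFrames * pixelBitsPerFrame * pixelBitPeriodUs;
}
```

Under those assumptions, `fifoDrainTimeUs(128, 2, 1.25)` gives 320us, which is consistent with the jitter budget quoted above.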
-
ESP, ISRs, and flash memory

In trying to figure out the project coding standards, I tried to summarize the apparent preference for

MartinMueller2003 wrote:

This post provides some of the gory details, to help understand why it's ... complex ... to use ISRs to get near-deterministic timing, as would be useful for driving WS2812 pixels, for example.

ESP technical restrictions

Limitations on ISRs placed in IRAM

When there is an ISR that uses the

Summary

Ensuring deterministic execution of ISRs on the ESP platform requires significant additional thought and work to ensure that only RAM (and not flash) is accessed, and pitfalls (such as jump tables, switch/case lookup tables, and compiler bugs) can make it hard to get right the first few times around.
-
A "timeslot" view of one-wire pixel protocols

This overview will gloss over some details, such as when chipsets require

A look at a typical one-wire pixel protocol

With this focus, there are only four numbers needed to define a

When the values for

Example chipset timings + timeslot view

Timeslot-based depiction using bit patterns

Using the timeslot view, there are two possibilities for the signal being sent

NOTE: for wire transmission, the bits will be listed left-to-right as they

Three-timeslot protocol (UCS8903) example
Four-timeslot protocol (SK6812) example
Five-timeslot protocol (UCS1903) example
| UCS1903 | 250ns | 750ns | 250ns | 250ns | 1/3/1 | 5 |
As one can see, for a 3-timeslot protocol, it's possible to define the output for any three-bit pattern using only nine (9) bits. Similarly, for 4-timeslot and 5-timeslot protocols, it's possible to define the output for any two-bit pattern using only eight (8) or ten (10) bits, respectively. A particularly astute reader would also note that the first on-wire bit for these bit patterns will always be one (1), and the last on-wire bit for such sequences will always be zero (0). This is a key insight to realizing how ESPixelStick uses the UART hardware buffers efficiently, to ensure accurate timing (see later posts).
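As a small illustration of the timeslot view, the following sketch expands a few protocol bits into their on-wire timeslot pattern. The function name and parameters are hypothetical, and the per-chipset slot counts used in the examples (one high slot for a 0 and two for a 1 in the 3- and 4-timeslot cases; one and four high slots in the 5-timeslot case) are assumptions read off the bit patterns in this thread.

```cpp
#include <cassert>
#include <string>

// Expand `bitCount` protocol bits (most-significant bit first) into a
// string of timeslots: '1' = line high, '0' = line low.
std::string wirePattern(unsigned value, unsigned bitCount,
                        unsigned slotsPerBit,
                        unsigned highSlots0, unsigned highSlots1) {
    // Every bit must start high and end low.
    assert(highSlots0 >= 1 && highSlots0 < slotsPerBit);
    assert(highSlots1 >= 1 && highSlots1 < slotsPerBit);
    std::string wire;
    for (unsigned i = bitCount; i-- > 0;) {
        const unsigned high = ((value >> i) & 1u) ? highSlots1 : highSlots0;
        wire.append(high, '1');
        wire.append(slotsPerBit - high, '0');
    }
    return wire;
}
```

For example, `wirePattern(0b10, 2, 4, 1, 2)` returns `"11001000"` and `wirePattern(0b101, 3, 3, 1, 2)` returns `"110100110"`. In every case the first slot is high and the last slot is low, which is the property noted above.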
-
From timeslot-based protocol to hardware UART output

A little background on UART protocol

The UART protocol sends data in UART frames. A UART frame has the following format:

One might see the variables defined in shorthand, with the first character being a digit representing the number of data bits, the second being either N, E, or O (for no parity, even parity, or odd parity), and the last being either a 1 or 2 for the number of stop bits. For ESPixelStick purposes, parity will always be N (no parity).

UART protocol encoding shorthand

Example UART frame encodings

The following table shows how a random set of binary data would be encoded into a UART frame. For the UART frame, the start and stop bits will be shown in parentheses for clarity. The inverted UART frame will also be shown, as the ESP allows the UART output to be inverted by its hardware.
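The framing just described (start bit, data bits sent least-significant first, stop bit, optional hardware inversion) can also be sketched in a few lines. This is an illustrative helper rather than ESPixelStick code; the name `uartFrameBits` is made up here, and it assumes no parity and a single stop bit.

```cpp
#include <cassert>
#include <string>

// Returns the on-wire bits of one UART frame, in transmission order.
// Start bit is 0, data bits follow least-significant first, stop bit is 1.
// With `inverted` set, every bit is flipped, as the ESP hardware can do.
std::string uartFrameBits(unsigned data, unsigned dataBits, bool inverted) {
    assert(dataBits >= 5 && dataBits <= 8);
    std::string bits = "0";                       // start bit
    for (unsigned i = 0; i < dataBits; ++i) {
        bits += ((data >> i) & 1u) ? '1' : '0';   // LSB first
    }
    bits += '1';                                  // stop bit
    if (inverted) {
        for (char &c : bits) c = (c == '0') ? '1' : '0';
    }
    return bits;
}
```

For example, `uartFrameBits(0b110111, 6, false)` yields `"01110111"`, and the same call with inversion enabled yields `"10001000"`.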
As noted in the prior post, various pixel protocols' transmission of a 0 vs. 1 bit can be defined via a small number of fixed-length "timeslots". The next sections provide concrete examples of how multi-bit compact tables (for outputting multiple bits of a protocol via a single UART frame) can be constructed for 3-timeslot, 4-timeslot, and 5-timeslot protocols.

4-timeslot Example (SK6812)

Any two-bit transmission of a four-timeslot protocol can be defined using eight bits, the first of which is always a 1 and the last of which is always a 0 (see the prior post).

NOTE: Wire output is read left-to-right; UART sends least-significant data bit first.

The UART frame is thus encoded as 6N1 (six data bits between the start and stop bits):

uint8_t SK6812_2bit_to_UART_data[4] = {
    0b110111, // (0) 111 011 (1) == (1) 000 100 (0) == transmit 0b00
    0b100111, // (0) 111 001 (1) == (1) 000 110 (0) == transmit 0b01
    0b110110, // (0) 011 011 (1) == (1) 100 100 (0) == transmit 0b10
    0b100110, // (0) 011 001 (1) == (1) 100 110 (0) == transmit 0b11
};

Sending an entire byte of intensity data then becomes a matter of splitting the intensity byte into four (4x) two-bit chunks, and "sending" the corresponding byte from the lookup table.

5-timeslot Example (UCS1903)
Two bits of a five-timeslot protocol can be defined using ten bits, the first of which is always a 1 and the last of which is always a 0 (see the prior post).

NOTE: Wire output is read left-to-right; UART sends least-significant data bit first.

The UART frame is thus encoded as 8N1 (eight data bits between the start and stop bits):

uint8_t UCS1903_2bit_to_UART_data[4] = {
    0b11101111, // (0) 1111 0111 (1) == (1) 0000 1000 (0) == transmit 0b00
    0b00001111, // (0) 1111 0000 (1) == (1) 0000 1111 (0) == transmit 0b01
    0b11101000, // (0) 0001 0111 (1) == (1) 1110 1000 (0) == transmit 0b10
    0b00001000, // (0) 0001 0000 (1) == (1) 1110 1111 (0) == transmit 0b11
};

Sending an entire byte of intensity data then becomes a matter of splitting the intensity byte into four (4x) two-bit chunks, and "sending" the corresponding byte from the lookup table.

3-timeslot Example (UCS8903)
Three bits of a three-timeslot protocol can be defined using nine bits, the first of which is always a 1 and the last of which is always a 0 (see the prior post).

NOTE: Wire output is read left-to-right; UART sends least-significant data bit first.

The UART frame is thus encoded as 7N1 (seven data bits between the start and stop bits):

uint8_t UCS8903_3bit_to_UART_data[8] = {
    0b1011011, // (0) 11 011 01 (1) == (1) 00 100 10 (0) == transmit 0b000
    0b0011011, // (0) 11 011 00 (1) == (1) 00 100 11 (0) == transmit 0b001
    0b1010011, // (0) 11 001 01 (1) == (1) 00 110 10 (0) == transmit 0b010
    0b0010011, // (0) 11 001 00 (1) == (1) 00 110 11 (0) == transmit 0b011
    0b1011010, // (0) 01 011 01 (1) == (1) 10 100 10 (0) == transmit 0b100
    0b0011010, // (0) 01 011 00 (1) == (1) 10 100 11 (0) == transmit 0b101
    0b1010010, // (0) 01 001 01 (1) == (1) 10 110 10 (0) == transmit 0b110
    0b0010010, // (0) 01 001 00 (1) == (1) 10 110 11 (0) == transmit 0b111
};

Because the total number of intensity bits to be sent is not always divisible by three (e.g., if using RGBW pixels), two additional tables are needed for when there are one or two leftover bits to be sent. These must be encoded with the same 7N1 encoding as the protocol uses for the bulk intensity bits:
uint8_t UCS8903_3bit_to_UART_single_trailing_bit[2] = {
    0b1111111, // (0) 11 111 11 (1) == (1) 00 000 00 (0) == transmit 0b0
    0b1111110, // (0) 01 111 11 (1) == (1) 10 000 00 (0) == transmit 0b1
};
uint8_t UCS8903_3bit_to_UART_two_trailing_bits[4] = {
    0b1111011, // (0) 11 011 11 (1) == (1) 00 100 00 (0) == transmit 0b00
    0b1110011, // (0) 11 001 11 (1) == (1) 00 110 00 (0) == transmit 0b01
    0b1111010, // (0) 01 011 11 (1) == (1) 10 100 00 (0) == transmit 0b10
    0b1110010, // (0) 01 001 11 (1) == (1) 10 110 00 (0) == transmit 0b11
};

When the number of bytes of intensity data to be transmitted is known ahead of time, the choice of table can be pre-computed and stored for use in the ISR. This adds a subtraction, comparison, and conditional jump to the ISR. For example, the jump may be based on whether at least three more bytes (24 bits) of intensity data are to be sent. If yes, sending those three bytes requires eight (8) UART frames. When only 1 or 2 bytes remain, a separate code path is followed, which sends all but the last 1 or 2 bits of data using the normal translation table. The final 1-2 bits of data instead use the corresponding "trailing bit" table.

Summary

By using the above methods, the length of all high values is enforced by hardware, and the minimum length of the trailing low values is also enforced by hardware. There is strong evidence that the trailing low bits can be significantly extended (e.g., doubling ...).

This is one basis for the rock-solid performance ESPixelStick provides.
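The lookup tables in this post can also be derived mechanically from the timeslot counts, which is a handy cross-check. The sketch below is an illustration, not the firmware's code: the function names are hypothetical, and it assumes inverted UART output, no parity, one stop bit, and that the start and stop bits supply the first (high) and last (low) timeslots.

```cpp
#include <cassert>
#include <cstdint>

// Compute the UART data byte whose inverted frame emits `bitCount`
// protocol bits (most-significant first) as timeslots on the wire.
std::uint8_t uartByteForBits(unsigned value, unsigned bitCount,
                             unsigned slotsPerBit,
                             unsigned highSlots0, unsigned highSlots1) {
    const unsigned totalSlots = bitCount * slotsPerBit;
    assert(totalSlots >= 3 && totalSlots <= 10);  // start + <=8 data + stop
    assert(highSlots0 >= 1 && highSlots0 < slotsPerBit);
    assert(highSlots1 >= 1 && highSlots1 < slotsPerBit);
    std::uint8_t data = 0;
    // Slot 0 is the inverted start bit, the last slot the inverted stop bit.
    for (unsigned slot = 1; slot + 1 < totalSlots; ++slot) {
        const unsigned bitIndex = bitCount - 1 - slot / slotsPerBit;
        const unsigned high = ((value >> bitIndex) & 1u) ? highSlots1 : highSlots0;
        const bool wireHigh = (slot % slotsPerBit) < high;
        // Inverted output: a low wire slot is a 1 data bit; LSB is sent first.
        if (!wireHigh) data = static_cast<std::uint8_t>(data | (1u << (slot - 1)));
    }
    return data;
}

// Split one intensity byte into four two-bit chunks (most-significant
// chunk first) and look each one up in a four-entry table.
void encodeIntensityByte(std::uint8_t intensity,
                         const std::uint8_t table[4], std::uint8_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = table[(intensity >> (6 - 2 * i)) & 0x3];
    }
}
```

For instance, `uartByteForBits(0b00, 2, 4, 1, 2)` returns `0b110111`, and `uartByteForBits(0b10, 2, 5, 1, 4)` returns `0b11101000`, matching the four- and five-timeslot table entries shown earlier in this post.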
-
RMT output from pixel protocol timing

This post will be glossing over implementation details, because

Quick summary of RMT peripheral

The ESP has a remote control module (RMT). The TRM states:

In layman's terms, the RMT allows hardware-generated signals of

Although typically the bit is high for one setting, and low for

RMT specific capabilities

The RMT has eight (8x) output channels, and each output channel has a set

Each of those 32-bit slots stores two 16-bit entries, each entry indicating:

Because the RMT peripheral was designed for transmitting sequences of

When controlling all eight output channels, the RMT provides each

The RMT peripheral can be configured to continuously loop through its

(*) "dedicated" ... but some configurability

(*) When using fewer than eight output channels, it becomes

As another example, if only using RMT channels 0, 2, and 6,

ESPixelStick firmware does not currently modify the

Example chipset timings where RMT is well-suited

Here are some chipset timings that would be difficult to convert into

Example RMT mappings:
// In practice, the APB_CLK is typically 80MHz (12.5ns per clock), so
// the datasheet values could be off by 6.25ns when encoded ... which
// is a very tight tolerance for pixel protocols!
//
// However, for ease of illustration and comparison with the above
// table, these examples will use a fictional RMT clock of 1GHz (1ns).
rmt_item32_t RMT_PL9823_BIT[2] = {
{ .duration0 = 350, .level0 = 1, .duration1 = 1360, .level1 = 0 }, // 0-bit
{ .duration0 = 1360, .level0 = 1, .duration1 = 350, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_SK6822_BIT[2] = {
{ .duration0 = 375, .level0 = 1, .duration1 = 1375, .level1 = 0 }, // 0-bit
{ .duration0 = 1375, .level0 = 1, .duration1 = 375, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_SM16703_BIT[2] = {
{ .duration0 = 300, .level0 = 1, .duration1 = 900, .level1 = 0 }, // 0-bit
{ .duration0 = 900, .level0 = 1, .duration1 = 300, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_TM1809_BIT[2] = {
{ .duration0 = 350, .level0 = 1, .duration1 = 800, .level1 = 0 }, // 0-bit
{ .duration0 = 800, .level0 = 1, .duration1 = 350, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_TM1829_BIT[2] = { // No simple T1/T2/T3 conversion
{ .duration0 = 100, .level0 = 1, .duration1 = 500, .level1 = 0 }, // 0-bit
{ .duration0 = 400, .level0 = 1, .duration1 = 200, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_UCS1903B_BIT[2] = { // No simple T1/T2/T3 conversion
{ .duration0 = 400, .level0 = 1, .duration1 = 900, .level1 = 0 }, // 0-bit
{ .duration0 = 850, .level0 = 1, .duration1 = 450, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_UCS1904_BIT[2] = { // No simple T1/T2/T3 conversion
{ .duration0 = 400, .level0 = 1, .duration1 = 850, .level1 = 0 }, // 0-bit
{ .duration0 = 800, .level0 = 1, .duration1 = 450, .level1 = 0 } // 1-bit
};
rmt_item32_t RMT_WS2812_BIT[2] = { // No simple T1/T2/T3 conversion
{ .duration0 = 250, .level0 = 1, .duration1 = 1000, .level1 = 0 }, // 0-bit
{ .duration0 = 875, .level0 = 1, .duration1 = 375, .level1 = 0 } // 1-bit
};
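The examples above use a fictional 1ns clock. To see what happens with the 12.5ns tick mentioned in the comments, the following sketch rounds datasheet nanosecond values to the nearest tick and reports the resulting error. The helper names are hypothetical and nothing here calls the ESP-IDF RMT driver; durations are handled in picoseconds so that a 12.5ns tick stays an integer.

```cpp
#include <cassert>
#include <cstdint>

// Round a duration in nanoseconds to the nearest whole number of ticks.
// `tickPs` is the tick length in picoseconds (12.5ns == 12500ps).
std::uint32_t nsToTicks(std::uint32_t ns, std::uint32_t tickPs) {
    assert(tickPs > 0);
    return (ns * 1000u + tickPs / 2u) / tickPs;
}

// Signed error (in picoseconds) introduced by that rounding:
// positive means the encoded pulse is longer than requested.
std::int64_t tickErrorPs(std::uint32_t ns, std::uint32_t tickPs) {
    const std::int64_t encoded =
        static_cast<std::int64_t>(nsToTicks(ns, tickPs)) * tickPs;
    return encoded - static_cast<std::int64_t>(ns) * 1000;
}
```

For example, 350ns maps to exactly 28 ticks, while 1360ns maps to 109 ticks (1362.5ns), an error of 2.5ns. The worst case for nearest-tick rounding is half a tick, which is the 6.25ns figure noted in the comments above.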
-
This thread will contain posts with some more technical deep dives into ESPixelStick internals, WS2812 timing requirements, etc.