Technical Deep Dives #521

henrygab · 2022-04-28T00:01:55Z

This thread will contain posts with some more technical deep dives into ESPixelStick internals, WS2812 timing requirements, etc.

henrygab · 2022-04-28T01:57:56Z

Architectures used to drive WS2812 and other one-wire pixels

In each design, there is an array of the target intensity (e.g., RGB, RGBW, etc.) values to be transmitted to the pixels. However, there is no hardware unit that will convert those intensity values into the transmission format that the pixels require, so the intensity values have to be converted in some form to allow the hardware to "do the right thing". Here are three models (architectures) that I've seen fairly commonly used to drive strands of pixels.

Option 1: bit-banging

If you have an AVR, or another microcontroller where you can disable interrupts, then this is definitely an option. It tends to require disabling interrupts while sending data, to prevent interrupts from disrupting the very sensitive timing requirements ... the timing of T0H and T1H is entirely dependent on the clock cycles of the processor. On the positive side, these microcontrollers tend to enable cycle-accurate deterministic execution. RAM constraints and input options tend to be the limiting factors.

This design is not a good fit for any microcontroller with real-time interrupts that cannot be disabled, because those interrupts would interfere with the bit-banging timing.

Option 2: Pre-convert the data, then transmit in one DMA operation

In this design, the entire set of intensity values is pre-converted into the format that the hardware needs. Quite often, this can result in RAM requirements of 3x-10x the size of the original intensity array. This RAM requirement reduces the maximum number of pixels that can be controlled accordingly. In addition, because the pixel data is processed entirely before any transmission begins, it adds latency that scales with the number of pixels.

At the same time, once the buffer is prepared, the entire pixel strand's data is sent as a single DMA operation. This gives significant flexibility, and if the signal to start the strand's transmission is distinct from when the data arrives, can provide rock-solid results, in the face of otherwise difficult conditions.

Option 3: Real-time conversion of small chunks, with multiple hardware operations

In this design, a fixed-size chunk (as small as a few bits at a time) is converted from the intensity values into the form used for programming the hardware to transmit those bits. Generally, this includes setting up an ISR (Interrupt Service Routine) that fires after the hardware transmits data, with the ISR doing the following:

Convert the next chunk of data into the transmission format.
Update the hardware to send that next chunk of data when any prior chunk(s) are complete.

This design is outstanding when RAM is the constraining factor, such as controlling long strings of pixels. At the same time, it imposes hard-real-time processing requirements on the overall system, which increases the complexity.
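
As a rough illustration of the ISR's job in this third model, here is a host-side sketch. Everything in it is an assumption made for illustration: `FakeFifo` stands in for a hardware transmit buffer, `kSlotsPerByte` is an arbitrary expansion factor, and the "conversion" is a placeholder rather than a real protocol table. None of these names come from the ESPixelStick source.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a hardware TX FIFO (not a real ESP API).
struct FakeFifo {
    std::size_t capacity;
    std::vector<uint8_t> slots;
    std::size_t free_slots() const { return capacity - slots.size(); }
    void push(uint8_t b) { slots.push_back(b); }
};

// State shared between the main code and the (simulated) ISR.
struct TxState {
    const uint8_t* intensity;  // next intensity byte to convert
    std::size_t    remaining;  // intensity bytes not yet converted
};

// Assumed expansion factor: one intensity byte becomes four FIFO bytes.
constexpr std::size_t kSlotsPerByte = 4;

// Body of the "FIFO has room" ISR: convert and queue as many whole
// intensity bytes as currently fit, then return how many were converted.
std::size_t refill(FakeFifo& fifo, TxState& st) {
    std::size_t converted = 0;
    while (st.remaining > 0 && fifo.free_slots() >= kSlotsPerByte) {
        const uint8_t value = *st.intensity++;
        for (std::size_t i = 0; i < kSlotsPerByte; ++i) {
            // Take two bits at a time, most-significant pair first (assumed order).
            const uint8_t two_bits = (value >> (6 - 2 * i)) & 0x3;
            fifo.push(two_bits);  // placeholder for a protocol lookup table
        }
        --st.remaining;
        ++converted;
    }
    return converted;
}
```

The point of the sketch is only that the ISR does a small, bounded amount of work per invocation, and that the size of the hardware buffer determines how late that work is allowed to start.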

ESPixelStick

The ESPixelStick firmware uses the third model. For example, the v3 firmware on v3 ForkInEye hardware uses option 3 (using UART for the hardware control). The v4 firmware (at time of writing) continues to use the same model (at least for UART output). It therefore has hard real-time requirements. At the same time, at least for WS2812 (e.g., NeoPixels), some tricks are used (whether they knew it at the time or not) that greatly reduced the difficulty of meeting the real-time requirements. (see later post)

Firmware v4 using RMT is very similar; see MartinMueller2003's response.

1 reply

MartinMueller2003 May 1, 2022
Collaborator

The RMT and UART outputs use a common method for obtaining the next set of bits to transmit and both use tight hardware timing to achieve consistent output timing.

henrygab · 2022-04-28T07:25:32Z

WS2812 - Actual Timing Requirements

Protocol flexibility

See this excellent four-part series, which describes the REAL timing requirements that WS2812 (e.g., NeoPixels) have.

That series of articles exposed some really useful information, when desiring to create rock-solid controllers. In particular, some of the timing requirements are much looser than they appear in the specification. The only ones that really appear to have tight tolerances are T0H and T1H ... how many nanoseconds the line is driven high. Moreover, each of the WS2812 pixels (due to manufacturing tolerances, heat, wire capacitance, etc.) will have slightly different timing behavior. This suggests the following goals for designing a "solid" controller:

T1H is squarely in the middle of the specified range
T0H is squarely in the middle of the specified range
Trying to have fastest per-pixel time should explicitly NOT be a goal
Going slower (overall) per bit is better than going too fast per bit
Intentionally increasing the T0L/T1L times may help drive really long strands

Real-time requirements driving WS2812 using ESP8266's UART

While T0H and T1H have extremely tight timing requirements, the ESPixelStick firmware uses hardware to ensure the timing for those two are accurately met. In particular, it (ab)uses the UART hardware, such that both T0H and T1H timings are ensured by the UART hardware.

Using a d1 Mini (ESP8266)'s UART to drive WS2812:

ESP8266 and ESP32 UART TX FIFO is 0x80 slots (bytes)
It takes four UART slots to send a single "intensity" value (one byte).
Thus, the 128-byte FIFO buffer means up to 32 intensity bytes can be buffered.
Each bit takes 1250ns, so sending one intensity value takes 10,000ns (10us).

Thus, with a full FIFO buffer used entirely for TX, that allows over 300us of jitter (e.g., caused by other higher-priority ISRs).
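
The arithmetic behind that figure, written out as compile-time constants. The constant names are mine; the numbers are the ones listed above.

```cpp
#include <cstdint>

// Figures taken from the list above.
constexpr uint32_t kFifoSlots         = 0x80;  // UART TX FIFO depth, in bytes
constexpr uint32_t kSlotsPerIntensity = 4;     // UART bytes per intensity byte
constexpr uint32_t kNsPerPixelBit     = 1250;  // WS2812 bit time
constexpr uint32_t kBitsPerIntensity  = 8;

constexpr uint32_t kBufferedIntensityBytes = kFifoSlots / kSlotsPerIntensity;
constexpr uint32_t kNsPerIntensityByte     = kNsPerPixelBit * kBitsPerIntensity;
constexpr uint32_t kMaxJitterNs = kBufferedIntensityBytes * kNsPerIntensityByte;

static_assert(kBufferedIntensityBytes == 32, "32 intensity bytes fit in the FIFO");
static_assert(kNsPerIntensityByte == 10000, "10us per intensity byte");
static_assert(kMaxJitterNs == 320000, "320us of slack with a full FIFO");
```

So a completely full FIFO takes 320us to drain, which is where the "over 300us" of tolerable ISR latency comes from.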

10 replies

MartinMueller2003 May 2, 2022
Collaborator

Now, which pixel protocol did you get to a 3:1 level? 1/4-3/4 protocols look like likely candidates.

henrygab May 3, 2022
Author

Using FastLED notation, this should work for any chip / protocol where T1 == T2 == T3.
Specifically, GW6205 (800ns for 400kHz, 400ns for 800kHz).

Using 7N1 framing, you get the start bit (0), seven user-controlled bits, and the stop bit (1). Inverting the output, this gives nine effective bits, with the only restriction that the first bit is a 1 and the last bit is a 0. Nine bits provide exactly the split required.

Example mapping tables follow. This is from memory, so I may have reversed bit orders somewhere, but the concept should be clear:

// Inverted UART output should look similar to:
// (1) X0 1Y0 1Z (0)  // for input bits 0bXYZ

From memory, I think the following are the (primary) eight mappings:
uint8_t GW6205_to_Mapping [8] = {
    // byte       3-bit         UART frame       inverted frame
    0b1011011, // 0b000 --> (0) 11 011 01 (1) --> 100 100 100
    0b0011011, // 0b001 --> (0) 11 011 00 (1) --> 100 100 110
    0b1010011, // 0b010 --> (0) 11 001 01 (1) --> 100 110 100
    0b0010011, // 0b011 --> (0) 11 001 00 (1) --> 100 110 110
    0b1011010, // 0b100 --> (0) 01 011 01 (1) --> 110 100 100
    0b0011010, // 0b101 --> (0) 01 011 00 (1) --> 110 100 110
    0b1010010, // 0b110 --> (0) 01 001 01 (1) --> 110 110 100
    0b0010010, // 0b111 --> (0) 01 001 00 (1) --> 110 110 110
};

At the end of the transmission, there may be 1 or 2 additional bits to be sent. In that case, it would take one additional byte from one of the following tables:

uint8_t GW6205_to_SingleRemainingBit[2] = {
    // byte       1-bit         UART frame       inverted frame
    0b1111111, // 0b0   --> (0) 11 111 11 (1) --> 100 000 000
    0b1111110, // 0b1   --> (0) 01 111 11 (1) --> 110 000 000
};
uint8_t GW6205_to_TwoRemainingBit[4] = {
    // byte       2-bit         UART frame       inverted frame
    0b1111011, // 0b00  --> (0) 11 011 11 (1) --> 100 100 000
    0b1110011, // 0b01  --> (0) 11 001 11 (1) --> 100 110 000
    0b1111010, // 0b10  --> (0) 01 011 11 (1) --> 110 100 000
    0b1110010, // 0b11  --> (0) 01 001 11 (1) --> 110 110 000
};

Let me know if that makes sense. That's the maximum ratio for this type of protocol (excluding DMX or serial outputs) that I can get. A 1/4-2/4 protocol requires four timeslots per bit, so encoding three bits would require twelve slots. The longest UART frames are 8E2 and 8O2, at twelve bits, but the parity bit is computed by the hardware from the data bits, so no single parity setting would produce the required value for every input. Without parity, the longest frame (8N2) is only eleven bits. Therefore, there is no way to encode the twelve slots in a single UART frame.

In contrast, because the above is a 1/3-2/3 protocol, it requires only three timeslots. Therefore, to send three bits, it takes nine UART bits, which fits neatly into the 7N1 framing.
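
For anyone who would rather derive the table than trust my memory, here is a small sketch that computes the 7N1 data byte for any three input bits. The function name is made up for this example; it assumes inverted UART output and least-significant-bit-first transmission, as described above.

```cpp
#include <cstdint>

// Build the 7 UART data bits for three protocol bits (0bXYZ) of a
// three-timeslot (1/1/1) protocol, assuming inverted UART output.
// Wire order is (1) X 0 1 Y 0 1 Z (0): the leading 1 comes from the
// inverted start bit and the trailing 0 from the inverted stop bit.
constexpr uint8_t encode_three_bits_7n1(uint8_t xyz) {
    const uint8_t x = (xyz >> 2) & 1;
    const uint8_t y = (xyz >> 1) & 1;
    const uint8_t z = xyz & 1;
    // The seven on-wire slots between start and stop, in send order.
    const uint8_t wire[7] = { x, 0, 1, y, 0, 1, z };
    uint8_t data = 0;
    for (int i = 0; i < 7; ++i) {
        // The UART sends data bit 0 first and the output is inverted,
        // so data bit i is the complement of wire slot i.
        data = static_cast<uint8_t>(data | ((wire[i] ^ 1) << i));
    }
    return data;
}

static_assert(encode_three_bits_7n1(0b000) == 0b1011011, "matches table entry 0b000");
static_assert(encode_three_bits_7n1(0b111) == 0b0010010, "matches table entry 0b111");
```

Running it over all eight inputs reproduces the byte values in the mapping table above.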

MartinMueller2003 May 3, 2022
Collaborator

Thanks for the explanation. Darn it. Now I have another protocol to add :)

henrygab May 4, 2022
Author

NOTE: The GW6205 uses 12-bits per pixel, and has a bunch of other special transmission requirements. Getting the 3:1 ratio driver created is a neat trick, but without hardware to validate it, it's hard to claim support for this chipset. Also, I don't know of any current source for this pixel type, so ... 😞

Still, it's a neat trick, and maybe a next-gen LED protocol will again use this nice, consistent timing.

henrygab May 15, 2022
Author

This can be used with more pixel protocols than the above-mentioned restriction suggests (that all three timeslots are equal in the specification, T1 == T2 == T3).

In reality, most pixel protocols allow significantly longer low times than the specification states. Thus, while T1 == T2 is still required to be within specifications, T3 simply needs to be less than (or equal to) those other two values. The code would then ignore that the specification indicates T3 should be shorter, and simply treat T3 "as if" it were specified to be equal to T1 and T2.

Sadly, I don't know of any additional chipsets, even with the above loosening of the restriction for T3, which fit this timing. Still, there's bound to be a chip like this someday. And testing can be done with an oscilloscope, even in the absence of hardware.

henrygab · 2022-04-28T07:47:17Z

ESP, ISRs, and flash memory

In trying to figure out the project coding standards, I tried to summarize the apparent preference for #define over a const integer as "discouraged".

MartinMueller2003 wrote:

It is not that const int is discouraged, you need to be careful that const vars are not used in an ISR. That means knowing when they are being used which can be a bit confusing and potentially a pitfall.

This post provides some of the gory details, to help understand why it's ... complex ... to use ISRs to get near-deterministic timing, as would be useful for driving WS2812 pixels, for example.

ESP technical restrictions

ESP stores most functions / instructions in the flash, loading instructions to an instruction cache on demand.
As a consequence, when the cache doesn't contain the next instruction, execution stalls while the instruction is loaded from the flash.
The Espressif SDK allows ISRs to access code/data from flash by default ... to make development easier at the cost of potentially non-deterministic ISR timing.
- ESP_Sprite (Espressif guru) has noted the following:
  
  If you use esp_intr_alloc to set up your interrupt, you normally don't need [IRAM_ATTR]: interrupts allocated using esp_intr_alloc will normally be disabled when flash is not available unless you pass it the ESP_INTR_FLAG_IRAM flag.
Functions (such as ISRs) can be marked with IRAM_ATTR, which ensures the instructions are always in RAM.

Limitations on ISRs placed in IRAM

When an ISR uses the ESP_INTR_FLAG_IRAM flag, this imposes significant limitations on both the code used by the ISR and any data touched by the ISR.

Code (functions) is, by default, loaded from the flash into the processor's instruction cache on demand. Therefore, any function called from an ISR must be placed in IRAM (e.g., via IRAM_ATTR on each such function).
const variables may also be stored in flash ... by default, the compiler chooses. Therefore, constant variables (including global constants) must be explicitly marked to be stored in RAM (e.g., via DRAM_ATTR, or DRAM_STR for string literals).
Per Espressif, GCC optimizations that automatically generate jump tables or switch/case lookup tables place these tables in flash. By default, IDF builds all files with -fno-jump-tables and -fno-tree-switch-conversion flags to avoid this.
- For ESP32, Arduino in PlatformIO also sets these flags by default
- TODO: Check ESP8266...

Summary

Ensuring deterministic execution of ISRs on the ESP platform requires significant additional thought and work to ensure only RAM (and not flash) is accessed, and pitfalls (such as jump tables, switch/case lookup tables, compiler bugs) can make it hard to get right the first few times around.
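
To make the rules above concrete, here is a minimal sketch of what "everything the ISR touches lives in RAM" looks like in source. The function and table names are hypothetical. On an ESP32 build, `IRAM_ATTR` and `DRAM_ATTR` come from the SDK (`esp_attr.h` in ESP-IDF); the empty fallback definitions are only there so the sketch also compiles on a host, where they have no effect.

```cpp
#include <cstdint>

// Empty fallbacks so this compiles off-target; on ESP32 the SDK defines these.
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif
#ifndef DRAM_ATTR
#define DRAM_ATTR
#endif

// Lookup table read by the ISR: explicitly placed in RAM rather than flash.
static DRAM_ATTR const uint8_t kTwoBitLookup[4] = { 0x37, 0x27, 0x36, 0x26 };

// Helper called from the ISR: it must be placed in IRAM as well.
static uint8_t IRAM_ATTR lookup_two_bits(uint8_t two_bits) {
    return kTwoBitLookup[two_bits & 0x3];
}

// Hypothetical ISR body: replaces the byte pointed to by `arg`
// with its table translation.
void IRAM_ATTR example_isr(void* arg) {
    uint8_t* value = static_cast<uint8_t*>(arg);
    *value = lookup_two_bits(*value);
}
```

Note that the attribute has to be repeated on every helper the ISR calls; marking only the top-level ISR is not enough.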

0 replies

henrygab · 2022-05-05T00:05:17Z

A "timeslot" view of one-wire pixel protocols

This overview will gloss over some details, such as when chipsets require
specific sequences of bits to be sent before/after all the intensity data,
and focus on the transmission of the intensity values for each LED of
the pixels.

A look at a typical one-wire pixel protocol

With this focus, there are only four numbers needed to define a
typical one-wire pixel protocol: T0H, T0L, T1H, T1L. Most times,
this can be further simplified to three numbers: T1, T2, and T3.
The following chart defines these numbers and their relationships:

Timing	Datasheet	Meaning
`T1`	`T0H`	How many ns with high (1) signal for zero bit
`T2 + T3`	`T0L`	How many ns with low (0) signal for zero bit
`T1 + T2`	`T1H`	How many ns with high (1) signal for one bit
`T3`	`T1L`	How many ns with low (0) signal for one bit

When the values for T1, T2, and T3 are small multiples of a
common factor (e.g., 250ns, 375ns, 375ns share a common factor of 125ns),
then the protocol can be described in terms of timeslots, where
a single timeslot corresponds to that common factor. (e.g., 125ns
per timeslot, with T1=2, T2=3, and T3=3).
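
A small sketch of that reduction: take T1/T2/T3 in nanoseconds, find their greatest common divisor, and express the protocol in timeslots. The struct and function names are invented for this example, and the inputs are assumed to be non-zero.

```cpp
#include <cstdint>

struct TimeslotView {
    uint32_t slot_ns;     // duration of one timeslot
    uint32_t t1, t2, t3;  // T1/T2/T3 expressed in timeslots
    uint32_t count;       // timeslots per protocol bit
};

// Euclid's algorithm for the greatest common divisor.
constexpr uint32_t gcd_u32(uint32_t a, uint32_t b) {
    return b == 0 ? a : gcd_u32(b, a % b);
}

// Assumes all three times are non-zero.
constexpr TimeslotView to_timeslots(uint32_t t1_ns, uint32_t t2_ns, uint32_t t3_ns) {
    const uint32_t slot = gcd_u32(gcd_u32(t1_ns, t2_ns), t3_ns);
    return TimeslotView{ slot, t1_ns / slot, t2_ns / slot, t3_ns / slot,
                         (t1_ns + t2_ns + t3_ns) / slot };
}

// 250ns / 375ns / 375ns reduces to 125ns timeslots.
static_assert(to_timeslots(250, 375, 375).slot_ns == 125, "common factor is 125ns");
```

The greatest common divisor gives the coarsest timeslot that still represents all three times exactly; any divisor of it would also work, at the cost of more timeslots per bit.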

Example chipset timings + timeslot view

chipset	T1	T2	T3	Timeslot	Timeslot-based timing	Timeslot Count
SK6812	300ns	300ns	600ns	300ns	1/1/2	4
GS8208	250ns	750ns	250ns	250ns	1/3/1	5
UCS1903	250ns	750ns	250ns	250ns	1/3/1	5
UCS8903	420ns	420ns	420ns	420ns	1/1/1	3
WS2811	250ns	375ns	375ns	125ns	2/3/3	8

Timeslot-based depiction using bit patterns

Using the timeslot view, there are two possibilities for the signal being sent
across the wire, corresponding to sending a zero-bit and a one-bit. This can
be compactly represented by a single bit per timeslot, so that the "timeslot count"
column defines the number of bits required to accurately reproduce the
protocol's timing.

NOTE: for wire transmission, the bits will be listed left-to-right as they
would be output on the GPIO pin. Thus, 01011 would indicate a 0
is sent, then a 1, then a 0, and finally two 1s.

Three-timeslot protocol (UCS8903) example

Single bit	on-wire
`0b0`	`100`
`0b1`	`110`

Three bits	on-wire
`0b000`	`100 100 100`
`0b001`	`100 100 110`
`0b010`	`100 110 100`
`0b011`	`100 110 110`
`0b100`	`110 100 100`
`0b101`	`110 100 110`
`0b110`	`110 110 100`
`0b111`	`110 110 110`

Four-timeslot protocol (SK6812) example

Single bit	on-wire
`0b0`	`1000`
`0b1`	`1100`

Two bits	on-wire
`0b00`	`1000 1000`
`0b01`	`1000 1100`
`0b10`	`1100 1000`
`0b11`	`1100 1100`

Five-timeslot protocol (UCS1903) example

Single bit	on-wire
`0b0`	`10000`
`0b1`	`11110`

Two bits	on-wire
`0b00`	`10000 10000`
`0b01`	`10000 11110`
`0b10`	`11110 10000`
`0b11`	`11110 11110`

As one can see, for a 3-timeslot protocol, it's possible to define the output for any three-bit pattern, using only nine (9) bits. Similarly, for 4-timeslot and 5-timeslot protocols, it's possible to define the output for any two-bit pattern, using only eight (8) or ten (10) bits, respectively.

A particularly astute reader would also note that the first on-wire bit for these bit patterns will always be one (1), and the last on-wire bit for such sequences will always be zero (0). This is a key insight to realizing how ESPixelStick uses the UART hardware buffers efficiently, to ensure accurate timing (see later posts).
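
The on-wire patterns in the tables above can be generated mechanically from T1/T2/T3 (in timeslots). Here is a sketch; the function names are made up, and it assumes bits are sent most-significant first.

```cpp
#include <cstdint>
#include <string>

// On-wire timeslot pattern for one protocol bit: high for T1 (zero bit)
// or T1+T2 (one bit) timeslots, then low for the rest of the bit time.
std::string wire_bit(bool bit, unsigned t1, unsigned t2, unsigned t3) {
    const unsigned high  = bit ? t1 + t2 : t1;
    const unsigned total = t1 + t2 + t3;
    return std::string(high, '1') + std::string(total - high, '0');
}

// Pattern for the lowest `nbits` bits of `value`, most-significant bit
// first, with a space between protocol bits (as in the tables above).
std::string wire_bits(uint32_t value, unsigned nbits,
                      unsigned t1, unsigned t2, unsigned t3) {
    std::string out;
    for (unsigned i = nbits; i-- > 0;) {
        if (!out.empty()) out += ' ';
        out += wire_bit(((value >> i) & 1u) != 0, t1, t2, t3);
    }
    return out;
}
```

For example, `wire_bits(0b101, 3, 1, 1, 1)` gives `110 100 110`, and `wire_bits(0b10, 2, 1, 3, 1)` gives `11110 10000`. Every pattern starts with a 1 and ends with a 0 as long as T1 and T3 are at least one timeslot each.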

0 replies

henrygab · 2022-05-05T00:28:03Z

From timeslot-based protocol to hardware UART output

A little background on UART protocol

The UART protocol sends data in UART frames. A UART frame has the following format:

A single start bit (always low, aka 0)
A number of data bits (5, 6, 7, or 8), sent least-significant bit first
Optionally, a parity bit (N = none, E = even parity, O = odd parity)
One or two stop bits (always high, aka 1)

One might see the framing described in shorthand, with the first char being a digit representing the number of data bits, the second being either N, E, or O (for no parity, even parity, or odd parity), and the last being either a 1 or 2 for the number of stop bits.

For ESPixelStick purposes, parity will always be None.

UART protocol encoding shorthand

Encoding	Meaning	Bits per UART Frame
5N1	5 data bits, no parity, 1 stop bit	7 bits
6N1	6 data bits, no parity, 1 stop bit	8 bits
7N1	7 data bits, no parity, 1 stop bit	9 bits
8N1	8 data bits, no parity, 1 stop bit	10 bits
8N2	8 data bits, no parity, 2 stop bits	11 bits

Example UART frame encodings

The following table shows how a random set of binary data would be encoded into a UART frame. For the UART frame, the start and stop bits will be shown in parenthesis for clarity. The inverted UART frame will also be shown, as the ESP allows the UART output to be inverted by its hardware.

Encoding	binary data	UART Frame	Inverted UART Frame
8N1	`0b1010_0111`	`(0) 1110 0101 (1)`	`(1) 0001 1010 (0)`
7N1	`0b110_0111`	`(0) 1110 011 (1)`	`(1) 0001 100 (0)`
6N1	`0b10_0111`	`(0) 1110 01 (1)`	`(1) 0001 10 (0)`
5N1	`0b1_0111`	`(0) 1110 1 (1)`	`(1) 0001 0 (0)`
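
The rows of that table can be reproduced with a few lines of code. This is a sketch with a made-up function name; it handles only the no-parity, one-stop-bit framings used here, and it prints the data bits without the grouping spaces shown in the table.

```cpp
#include <cstdint>
#include <string>

// Render an N1 UART frame (no parity, one stop bit) in send order.
// `inverted` models the ESP's hardware output inversion.
std::string uart_frame(uint8_t data, unsigned data_bits, bool inverted) {
    auto level = [inverted](bool b) { return (b != inverted) ? '1' : '0'; };
    std::string out;
    out += '(';
    out += level(false);                        // start bit: 0 before inversion
    out += ") ";
    for (unsigned i = 0; i < data_bits; ++i) {
        out += level(((data >> i) & 1u) != 0);  // least-significant bit first
    }
    out += " (";
    out += level(true);                         // stop bit: 1 before inversion
    out += ')';
    return out;
}
```

For example, `uart_frame(0b10100111, 8, false)` gives `(0) 11100101 (1)`, and the same call with `inverted` set to `true` gives `(1) 00011010 (0)`.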

As noted in the prior post, various pixel protocols' transmission of a 0 vs. 1 bit can be defined via a small number of fixed-length "timeslots".

The next sections will provide concrete examples of how multi-bit compact tables (for outputting multiple bits of a protocol via a single UART frame) can be constructed for 3-timeslot, 4-timeslot, and 5-timeslot protocols.

4-timeslot Example (SK6812)

Any two bit transmission of a four-timeslot protocol can be defined using eight bits, the first of which is always a 1 and the last of which is always a 0.

NOTE: Wire output is read left-to-right, UART sends least-significant data bit first

Two bits	Target Wire Output	Inverted UART frame	UART Frame	Data Bits
`0b00`	`1000 1000`	`(1) 000 100 (0)`	`(0) 111 011 (1)`	`0b110_111`
`0b01`	`1000 1100`	`(1) 000 110 (0)`	`(0) 111 001 (1)`	`0b100_111`
`0b10`	`1100 1000`	`(1) 100 100 (0)`	`(0) 011 011 (1)`	`0b110_110`
`0b11`	`1100 1100`	`(1) 100 110 (0)`	`(0) 011 001 (1)`	`0b100_110`

The UART frame is thus encoded as 6N1, and the following translation table
allows efficient lookup from two-bit value to a single UART frame to be sent:

uint8_t SK6812_2bit_to_UART_data[4] = {
    0b110111, // (0) 111 011 (1) == (1) 000 100 (0) == transmit 0b00
    0b100111, // (0) 111 001 (1) == (1) 000 110 (0) == transmit 0b01
    0b110110, // (0) 011 011 (1) == (1) 100 100 (0) == transmit 0b10
    0b100110, // (0) 011 001 (1) == (1) 100 110 (0) == transmit 0b11
};

Sending an entire byte of intensity data then becomes a matter of splitting the intensity byte into four (4x) two-bit chunks, and "sending" the corresponding byte from the lookup table.
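
A sketch of that split, using the same four table values as above. The function name is invented for this example, and it assumes the most-significant bits of the intensity byte go out first.

```cpp
#include <array>
#include <cstdint>

// Same four values as the translation table above.
constexpr uint8_t kSk6812TwoBitToUart[4] = { 0b110111, 0b100111, 0b110110, 0b100110 };

// Expand one intensity byte into the four 6N1 data bytes to queue,
// most-significant pair of bits first (assumed bit order).
std::array<uint8_t, 4> sk6812_encode_byte(uint8_t intensity) {
    return {{
        kSk6812TwoBitToUart[(intensity >> 6) & 0x3],
        kSk6812TwoBitToUart[(intensity >> 4) & 0x3],
        kSk6812TwoBitToUart[(intensity >> 2) & 0x3],
        kSk6812TwoBitToUart[intensity & 0x3],
    }};
}
```

This is the "four UART slots per intensity byte" figure used earlier when estimating how much data the FIFO can hold.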

5-timeslot Example (UCS1903)

Two bits of a five-timeslot protocol can be defined using ten bits, the first of which is always a 1, and the last of which is always a 0.

NOTE: Wire output is read left-to-right, UART sends least-significant data bit first

Two bits	Target Wire Output	Inverted UART frame	UART Frame	Data Bits
`0b00`	`10000 10000`	`(1) 0000 1000 (0)`	`(0) 1111 0111 (1)`	`0b1110_1111`
`0b01`	`10000 11110`	`(1) 0000 1111 (0)`	`(0) 1111 0000 (1)`	`0b0000_1111`
`0b10`	`11110 10000`	`(1) 1110 1000 (0)`	`(0) 0001 0111 (1)`	`0b1110_1000`
`0b11`	`11110 11110`	`(1) 1110 1111 (0)`	`(0) 0001 0000 (1)`	`0b0000_1000`

The UART frame is thus encoded as 8N1, and the following translation table
allows efficient lookup from two-bit value to a single UART frame to be sent:

uint8_t UCS1903_2bit_to_UART_data[4] = {
    0b11101111, // (0) 1111 0111 (1) == (1) 0000 1000 (0) == transmit 0b00
    0b00001111, // (0) 1111 0000 (1) == (1) 0000 1111 (0) == transmit 0b01
    0b11101000, // (0) 0001 0111 (1) == (1) 1110 1000 (0) == transmit 0b10
    0b00001000, // (0) 0001 0000 (1) == (1) 1110 1111 (0) == transmit 0b11
};

Sending an entire byte of intensity data then becomes a matter of splitting the intensity byte into four (4x) two-bit chunks, and "sending" the corresponding byte from the lookup table.

3-timeslot Example (UCS8903)

Three bits of a three-timeslot protocol can be defined using nine bits, the first of which is always a 1, and the last of which is always a 0.

NOTE: Wire output is read left-to-right, UART sends least-significant data bit first

Three bits	Target Wire Output	Inverted UART frame	UART Frame	Data Bits
`0b000`	`100 100 100`	`(1) 00 100 10 (0)`	`(0) 11 011 01 (1)`	`0b10_110_11`
`0b001`	`100 100 110`	`(1) 00 100 11 (0)`	`(0) 11 011 00 (1)`	`0b00_110_11`
`0b010`	`100 110 100`	`(1) 00 110 10 (0)`	`(0) 11 001 01 (1)`	`0b10_100_11`
`0b011`	`100 110 110`	`(1) 00 110 11 (0)`	`(0) 11 001 00 (1)`	`0b00_100_11`
`0b100`	`110 100 100`	`(1) 10 100 10 (0)`	`(0) 01 011 01 (1)`	`0b10_110_10`
`0b101`	`110 100 110`	`(1) 10 100 11 (0)`	`(0) 01 011 00 (1)`	`0b00_110_10`
`0b110`	`110 110 100`	`(1) 10 110 10 (0)`	`(0) 01 001 01 (1)`	`0b10_100_10`
`0b111`	`110 110 110`	`(1) 10 110 11 (0)`	`(0) 01 001 00 (1)`	`0b00_100_10`

The UART frame is thus encoded as 7N1, and the following translation table allows efficient lookup from a three-bit value to a single UART frame to be sent:

uint8_t UCS8903_3bit_to_UART_data[8] = {
    0b1011011, // (0) 11 011 01 (1) == (1) 00 100 10 (0) == transmit 0b000
    0b0011011, // (0) 11 011 00 (1) == (1) 00 100 11 (0) == transmit 0b001
    0b1010011, // (0) 11 001 01 (1) == (1) 00 110 10 (0) == transmit 0b010
    0b0010011, // (0) 11 001 00 (1) == (1) 00 110 11 (0) == transmit 0b011
    0b1011010, // (0) 01 011 01 (1) == (1) 10 100 10 (0) == transmit 0b100
    0b0011010, // (0) 01 011 00 (1) == (1) 10 100 11 (0) == transmit 0b101
    0b1010010, // (0) 01 001 01 (1) == (1) 10 110 10 (0) == transmit 0b110
    0b0010010, // (0) 01 001 00 (1) == (1) 10 110 11 (0) == transmit 0b111
};

Because the number of total intensity bits to be sent is not always divisible by three (e.g., if using RGBW pixels), two additional tables are needed for when there are one or two leftover bits to be sent. These must be encoded with the same 7N1 encoding as the protocol uses for the bulk intensity bits:

Single bit	Target Wire Output	Inverted UART frame	UART Frame	Data Bits
`0b0`	`100 000 000`	`(1) 00 000 00 (0)`	`(0) 11 111 11 (1)`	`0b11_111_11`
`0b1`	`110 000 000`	`(1) 10 000 00 (0)`	`(0) 01 111 11 (1)`	`0b11_111_10`

Two bits	Target Wire Output	Inverted UART frame	UART Frame	Data Bits
`0b00`	`100 100 000`	`(1) 00 100 00 (0)`	`(0) 11 011 11 (1)`	`0b11_110_11`
`0b01`	`100 110 000`	`(1) 00 110 00 (0)`	`(0) 11 001 11 (1)`	`0b11_100_11`
`0b10`	`110 100 000`	`(1) 10 100 00 (0)`	`(0) 01 011 11 (1)`	`0b11_110_10`
`0b11`	`110 110 000`	`(1) 10 110 00 (0)`	`(0) 01 001 11 (1)`	`0b11_100_10`

uint8_t UCS8903_3bit_to_UART_single_trailing_bit[2] = {
    0b1111111, // (0) 11 111 11 (1) == (1) 00 000 00 (0) == transmit 0b0
    0b1111110, // (0) 01 111 11 (1) == (1) 10 000 00 (0) == transmit 0b1
};
uint8_t UCS8903_3bit_to_UART_two_trailing_bits[4] = {
    0b1111011, // (0) 11 011 11 (1) == (1) 00 100 00 (0) == transmit 0b00
    0b1110011, // (0) 11 001 11 (1) == (1) 00 110 00 (0) == transmit 0b01
    0b1111010, // (0) 01 011 11 (1) == (1) 10 100 00 (0) == transmit 0b10
    0b1110010, // (0) 01 001 11 (1) == (1) 10 110 00 (0) == transmit 0b11
};

When the number of bytes of intensity data to be transmitted is known ahead of time, it can be pre-computed and stored for use in the ISR. This adds a subtraction, comparison, and conditional jump to the ISR. For example, the jump may be based on whether at least three more bytes (24 bits) of intensity data are to be sent.

If yes, sending those three bytes would require eight (8) UART frames.

When only 1 or 2 bytes remain, a separate code path is followed, which sends all but the last 1 or 2 bits of data using the normal translation table. The final 1-2 bits of data instead use the corresponding "trailing bit" table.
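
Here is a host-side sketch of the overall encoding, including the trailing-bit handling. It is not written the way an ISR would be (it walks the data bit by bit instead of branching on three-byte groups), and the function names are hypothetical, but it produces the sequence of 7N1 data bytes described above. Bits are assumed to be taken most-significant first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Table values from above, as plain binary literals.
constexpr uint8_t kThreeBits[8] = { 0b1011011, 0b0011011, 0b1010011, 0b0010011,
                                    0b1011010, 0b0011010, 0b1010010, 0b0010010 };
constexpr uint8_t kOneTrailingBit[2]  = { 0b1111111, 0b1111110 };
constexpr uint8_t kTwoTrailingBits[4] = { 0b1111011, 0b1110011, 0b1111010, 0b1110010 };

// Read `n` bits starting at absolute bit offset `pos`, MSB of each byte first.
static unsigned read_bits(const uint8_t* data, std::size_t pos, unsigned n) {
    unsigned v = 0;
    for (unsigned i = 0; i < n; ++i, ++pos) {
        v = (v << 1) | ((data[pos / 8] >> (7 - pos % 8)) & 1u);
    }
    return v;
}

// Translate `len` intensity bytes into the 7N1 data bytes to transmit.
std::vector<uint8_t> encode_7n1(const uint8_t* data, std::size_t len) {
    const std::size_t total_bits = len * 8;
    std::vector<uint8_t> frames;
    std::size_t pos = 0;
    for (; pos + 3 <= total_bits; pos += 3) {
        frames.push_back(kThreeBits[read_bits(data, pos, 3)]);
    }
    const std::size_t left = total_bits - pos;  // 0, 1, or 2 bits remain
    if (left == 1) {
        frames.push_back(kOneTrailingBit[read_bits(data, pos, 1)]);
    } else if (left == 2) {
        frames.push_back(kTwoTrailingBits[read_bits(data, pos, 2)]);
    }
    return frames;
}
```

Three bytes produce exactly eight frames; a single byte produces two full frames plus one two-trailing-bits frame; two bytes produce five full frames plus one single-trailing-bit frame.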

Summary

By using the above methods, the length of all high values is enforced by hardware, and the minimum length of the trailing low values is also enforced by hardware. There is strong evidence that the trailing low bits can be significantly extended (e.g., doubling T3 generally continues to work fine). Therefore, all the time-critical aspects of the signal are fully enforced by the UART hardware.

This is one basis for the rock-solid performance ESPixelStick provides.

0 replies

henrygab · 2022-05-15T21:58:29Z

RMT output from pixel protocol timing

This post will be glossing over implementation details, because
the mapping from a pixel protocol to RMT control entries is so
straightforward.

Quick summary of RMT peripheral

The ESP32 has a remote control module (RMT). The TRM states:

The RMT (Remote Control) module is primarily designed
to send and receive infrared remote control signals that
implement on-off keying in a carrier frequency

In layman's terms, the RMT allows hardware-generated signals of
essentially arbitrary length (both high and low). When programming
the hardware, each 32-bit entry is actually two 16-bit settings.
Each setting has one bit to indicate if the output should be high
or low, and the remaining 15 bits indicate how many RMT clock
cycles it should hold that output for.

Although typically the bit is high for one setting, and low for
the second setting, in any given entry, the hardware does not
require that they differ. This is important, since many pixel
protocols require long periods of time with the output driven low
at the end of transmission.

RMT specific capabilities

The RMT has eight (8x) output channels, and each output channel has a set
of sixty-four (64x) 32-bit slots of dedicated(*) RAM.

Each of those 32-bit slots stores two 16-bit entries, each entry indicating:

A single bit to indicate output as high or low
Fifteen bits to indicate how many RMT clocks to hold that output

Because the RMT peripheral was designed for transmitting sequences of
on/off, with varying timing per symbol, mapping a pixel protocol bit
to a 32-bit RMT slot is clean and simple. In other words,
the RMT peripheral provides a simple, direct means for translating
T1H / T1L, or T0H / T0L, as found in a datasheet directly into
a single 32-bit RMT slot (one 16-bit entry for the high time, one for the low time).

When controlling all eight output channels, the RMT provides each
output channel a hardware buffer sufficient for 64 intensity bits,
with all critical timing enforced in hardware.
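
As a sketch of how one protocol bit maps onto one 32-bit slot, here is the packing done by hand. The helper names are made up. The bit positions are an assumption: the level is taken to be the top bit of each 16-bit half, with the first entry in the low half, which matches the field order of ESP-IDF's `rmt_item32_t` (duration0, level0, duration1, level1).

```cpp
#include <cstdint>

// One 16-bit entry: 15-bit duration (in RMT clock ticks) plus a level bit.
// Assumption: the level is bit 15 of the entry.
constexpr uint16_t rmt_entry(uint16_t ticks, bool level) {
    return static_cast<uint16_t>((ticks & 0x7FFFu) | (level ? 0x8000u : 0u));
}

// One 32-bit slot for a protocol bit: high for `high_ticks`, then low for
// `low_ticks`. Assumption: the first entry occupies the low 16 bits.
constexpr uint32_t rmt_slot(uint16_t high_ticks, uint16_t low_ticks) {
    return static_cast<uint32_t>(rmt_entry(high_ticks, true)) |
           (static_cast<uint32_t>(rmt_entry(low_ticks, false)) << 16);
}

// Example: 28 ticks high, then 80 ticks low.
static_assert(rmt_slot(28, 80) == 0x0050801Cu, "high entry low half, low entry high half");
```

Durations longer than 15 bits can hold (32767 ticks) are silently truncated by the mask here; real code would want to reject or split them.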

The RMT peripheral can be configured to continuously loop through its
assigned RAM entries, and can be configured to fire an interrupt when N
of those slots have been transmitted. This allows batching updates,
rather than firing an interrupt for each entry.

(*) "dedicated" ... but some configurability

(*) When using fewer than eight output channels, it becomes
possible to re-allocate the unused RAM buffers (in 64-slot
chunks) to other RMT channels. For example, if only using
four output channels, each of the four channels can have 128
slots dedicated to its use.

As another example, if only using RMT channels 0, 2, and 6,
then they could be configured as follows:

Channel 0 configured to use 128 slots (from channels [0,1])
Channel 2 configured to use 256 slots (from channels [2,3,4,5])
Channel 6 configured to use 128 slots (from channels [6,7])

ESPixelStick firmware does not currently modify the
allocation of slots to each RMT channel, so each channel can buffer up
to 64 protocol bits.
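
A sketch of the bookkeeping behind such an allocation. The names are invented, and it assumes (as in the example above) that a channel can only borrow the memory blocks of the channels immediately following it.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kRmtChannels   = 8;
constexpr unsigned kSlotsPerBlock = 64;

// blocks[i] = number of 64-slot memory blocks assigned to channel i (0 = unused).
// A channel that owns N blocks uses its own block plus those of the next
// N-1 channels, so those channels must be unused and the range must not
// run past the last channel.
bool allocation_is_valid(const std::array<unsigned, kRmtChannels>& blocks) {
    unsigned next_free = 0;  // first block index not yet claimed
    for (unsigned ch = 0; ch < kRmtChannels; ++ch) {
        if (blocks[ch] == 0) continue;
        if (ch < next_free) return false;                  // block already borrowed
        if (ch + blocks[ch] > kRmtChannels) return false;  // runs off the end
        next_free = ch + blocks[ch];
    }
    return true;
}

// Number of 32-bit slots (protocol bits) a channel can buffer.
constexpr unsigned slots_for(unsigned blocks) { return blocks * kSlotsPerBlock; }
```

With the channel 0 / 2 / 6 example above, the block counts would be `{2, 0, 4, 0, 0, 0, 2, 0}`, which passes the check and gives 128, 256, and 128 slots respectively.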

Example chipset timings where RMT is well-suited

Here are some chipset timings that would be difficult to convert into
"timeslots", at least with a small multiplier. Configuring the RMT to
output these protocols, however, remains easy.

chipset	`T0H`	`T0L`	`T1H`	`T1L`	`Tbit`
PL9823	350ns	1360ns	1360ns	350ns	1710ns
SK6822	375ns	1375ns	1375ns	375ns	1750ns
SM16703	300ns	900ns	900ns	300ns	1200ns
TM1809	350ns	800ns	700ns	450ns	1150ns
TM1829 @ 1600	100ns	500ns	400ns	200ns	600ns
UCS1903B	400ns	900ns	850ns	450ns	1300ns
UCS1904	400ns	850ns	800ns	450ns	1250ns
WS2812	250ns	1000ns	875ns	375ns	1250ns

// In practice, the APB_CLK is typically 80MHz (12.5ns per clock), so
// the datasheet values could be off by 6.25ns when encoded ... which
// is a very tight tolerance for pixel protocols!
//
// However, for ease of illustration and comparison with the above
// table, these examples will use a fictional RMT clock of 1GHz (1ns).
rmt_item32_t RMT_PL9823_BIT[2] =   {
    {  .duration0 =  350, .level0 =    1,  .duration1 = 1360, .level1 =    0 }, // 0-bit
    {  .duration0 = 1360, .level0 =    1,  .duration1 =  350, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_SK6822_BIT[2] =   {
    {  .duration0 =  375, .level0 =    1,  .duration1 = 1375, .level1 =    0 }, // 0-bit
    {  .duration0 = 1375, .level0 =    1,  .duration1 =  375, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_SM16703_BIT[2] =  {
    {  .duration0 =  300, .level0 =    1,  .duration1 =  900, .level1 =    0 }, // 0-bit
    {  .duration0 =  900, .level0 =    1,  .duration1 =  300, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_TM1809_BIT[2] =   {
    {  .duration0 =  350, .level0 =    1,  .duration1 =  800, .level1 =    0 }, // 0-bit
    {  .duration0 =  700, .level0 =    1,  .duration1 =  450, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_TM1829_BIT[2] =   { // No simple T1/T2/T3 conversion
    {  .duration0 =  100, .level0 =    1,  .duration1 =  500, .level1 =    0 }, // 0-bit
    {  .duration0 =  400, .level0 =    1,  .duration1 =  200, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_UCS1903B_BIT[2] = { // No simple T1/T2/T3 conversion
    {  .duration0 =  400, .level0 =    1,  .duration1 =  900, .level1 =    0 }, // 0-bit
    {  .duration0 =  850, .level0 =    1,  .duration1 =  450, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_UCS1904_BIT[2] =  { // No simple T1/T2/T3 conversion
    {  .duration0 =  400, .level0 =    1,  .duration1 =  850, .level1 =    0 }, // 0-bit
    {  .duration0 =  800, .level0 =    1,  .duration1 =  450, .level1 =    0 }  // 1-bit
};
rmt_item32_t RMT_WS2812_BIT[2] =   { // No simple T1/T2/T3 conversion
    {  .duration0 =  250, .level0 =    1,  .duration1 = 1000, .level1 =    0 }, // 0-bit
    {  .duration0 =  875, .level0 =    1,  .duration1 =  375, .level1 =    0 }  // 1-bit
};
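
Since the tables above use a fictional 1GHz clock, here is a sketch of the conversion to real tick counts. It assumes an 80MHz RMT clock (APB_CLK with a divider of 1); the function names are made up for this example.

```cpp
#include <cstdint>

constexpr uint32_t kRmtClockHz = 80000000;  // assumed: APB_CLK, divider of 1

// Round a datasheet time in nanoseconds to the nearest whole RMT tick.
constexpr uint32_t ns_to_ticks(uint32_t ns) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(ns) * kRmtClockHz + 500000000ull) / 1000000000ull);
}

// Rounding error in tenths of a nanosecond at 12.5ns per tick
// (positive means the encoded time is longer than requested).
constexpr int32_t error_tenths_ns(uint32_t ns) {
    return static_cast<int32_t>(ns_to_ticks(ns) * 125) - static_cast<int32_t>(ns * 10);
}

static_assert(ns_to_ticks(350) == 28, "350ns is exactly 28 ticks");
static_assert(ns_to_ticks(1360) == 109, "1360ns rounds up to 109 ticks");
```

For PL9823, 350ns encodes exactly, while 1360ns becomes 109 ticks (1362.5ns), an error of 2.5ns. With round-to-nearest the error never exceeds half a tick, which is the 6.25ns mentioned in the comment above.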

0 replies
