- Brief Description
- Getting Started
- Lint and Formal Verification
- Simulation
- Demo Projects
- Other Open-Sourced DDR3 Controllers
- Developer Documentation
This DDR3 controller was originally designed for the 10-Gigabit Ethernet Project, targeting an 8-lane x8 DDR3 module running at DDR3-800, but it is now being developed into a more general DDR3 memory controller that supports multiple FPGA boards. It is a 4:1 memory controller with configurable timing parameters and mode registers, so it can be configured for any DDR3 memory device. The user interface is basic Wishbone.
This memory controller is optimized to maintain a high data throughput and continuous sequential burst operations. The controller handles the reset sequence, refresh sequence, mode register configuration, bank status tracking, timing delay tracking, command issuing, and the PHY's calibration. The PHY's calibration handles the bitslip training, read-DQ/DQS alignment via MPR (read calibration), write-DQ/DQS alignment via write leveling (write calibration), and also an optional comprehensive read/write test.
The optional comprehensive read/write tests make it easier to test the memory controller without needing an external CPU. These tests include burst access, random access, and alternating read-write access tests. Calibration ends, and the user can start accessing the wishbone interface, only if these tests find no errors.
This design is formally verified and simulated using the Micron DDR3 model.
The recommended way to instantiate this IP is to use the top module `rtl/ddr3_top.v`; a template for instantiation is also included in that file. The steps to include this DDR3 memory controller IP are: instantiate the design, create the constraint file, then edit the localparams.
The first things to edit are the top-level parameters:
Parameter | Function |
---|---|
CONTROLLER_CLK_PERIOD | clock period of the controller interface in picoseconds. Tested values range from 12_000 ps (83.33 MHz) to 10_000 ps (100 MHz). |
DDR3_CLK_PERIOD | clock period of the DDR3 RAM device in picoseconds, which must be 1/4 of the CONTROLLER_CLK_PERIOD. Tested values range from 3_000 ps (333.33 MHz) to 2_500 ps (400 MHz). |
ROW_BITS | width of row address. Use chapter 2.11 DDR3 SDRAM Addressing from JEDEC DDR3 doc (page 15) as a guide. Possible values range from 12 to 16. |
COL_BITS | width of column address. Use chapter 2.11 DDR3 SDRAM Addressing from JEDEC DDR3 doc (page 15) as a guide. Possible values range from 10 to 12. |
BA_BITS | width of bank address. Use chapter 2.11 DDR3 SDRAM Addressing from JEDEC DDR3 doc (page 15) as a guide. Usual value is 3. |
BYTE_LANES | number of bytes based on width of DQ. [1] |
AUX_WIDTH | width of auxiliary line. Value must be >= 4. [2] |
WB2_ADDR_BITS | width of 2nd wishbone address bus for debugging (only relevant if SECOND_WISHBONE = 1). |
WB2_DATA_BITS | width of 2nd wishbone data bus for debugging (only relevant if SECOND_WISHBONE = 1). |
MICRON_SIM | set to 1 if used in Micron DDR3 model to shorten power-on sequence, otherwise 0. |
ODELAY_SUPPORTED | set to 1 if ODELAYE2 primitive is supported by the FPGA, otherwise 0. [3] |
SECOND_WISHBONE | set to 1 if 2nd wishbone for debugging is needed, otherwise 0. |
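The 4:1 relationship between the two clock-period parameters can be checked before synthesis. The sketch below is only an illustration in Python; the helper name `check_clock_periods` is hypothetical and not part of this repository.

```python
def check_clock_periods(controller_clk_period_ps, ddr3_clk_period_ps):
    """Validate the 4:1 ratio and return (controller MHz, DDR3 MHz).

    Hypothetical helper: it only mirrors the rule from the parameter table
    that DDR3_CLK_PERIOD must be 1/4 of CONTROLLER_CLK_PERIOD.
    """
    if controller_clk_period_ps != 4 * ddr3_clk_period_ps:
        raise ValueError(
            "DDR3_CLK_PERIOD must be exactly 1/4 of CONTROLLER_CLK_PERIOD"
        )
    # A period in picoseconds converts to MHz as 1e6 / period.
    return 1e6 / controller_clk_period_ps, 1e6 / ddr3_clk_period_ps


# Fastest tested configuration from the table: 100 MHz controller, 400 MHz DDR3.
controller_mhz, ddr3_mhz = check_clock_periods(10_000, 2_500)
print(controller_mhz, ddr3_mhz)
```

The slowest tested pair from the table (12_000 ps and 3_000 ps) passes the same check, while a mismatched pair such as 10_000 ps and 3_000 ps raises an error.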
After the parameters, connect the ports of the top module to your design. Below are the ports for clocks and reset:
Ports | Function |
---|---|
i_controller_clk | clock of the controller interface with period of CONTROLLER_CLK_PERIOD |
i_ddr3_clk | clock of the DDR3 interface with period of DDR3_CLK_PERIOD |
i_ref_clk | reference clock for IDELAYCTRL primitive with frequency of 200 MHz |
i_ddr3_clk_90 | clock required only if ODELAY_SUPPORTED = 0, otherwise can be left unconnected. Has a period of DDR3_CLK_PERIOD with 90° phase shift. |
i_rst_n | Active-low synchronous reset for the entire DDR3 controller and PHY |
It is recommended to generate all these clocks from a single PLL or clock-generator.
Next are the main wishbone ports:
Ports | Function |
---|---|
i_wb_cyc | Indicates if a bus cycle is active. A high value (1) signifies normal operation, while a low value (0) signals the cancellation of all ongoing transactions. |
i_wb_stb | Strobe or transfer request signal. It's asserted (set to 1) to request a data transfer. |
i_wb_we | Write-enable signal. A high value (1) indicates a write operation, and a low value (0) indicates a read operation. |
i_wb_addr | Address bus. Used to specify the address for the current read or write operation. Formatted as {row, bank, column}. |
i_wb_data | Data bus for write operations. In a 4:1 controller, the data width is 8 times the number of DDR3 data pins: 8 x DQ_BITS x LANES. |
i_wb_sel | Byte select for write operations. Indicates which bytes of the data bus are to be overwritten for the write operation. |
o_wb_stall | Indicates if the controller is busy (1) and cannot accept any new requests. |
o_wb_ack | Acknowledgement signal. Indicates that a read or write request has been completed. |
o_wb_data | Data bus for read operations. Similar to i_wb_data, the data width for a 4:1 controller is 8 x DQ_BITS x LANES. |
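The factor of 8 in the data width comes from the 4:1 ratio combined with double data rate: each controller clock covers 4 DDR3 clocks, and each DDR3 clock transfers data on both edges. A small Python sketch of the arithmetic follows. The function name is hypothetical, DQ_BITS is assumed to be 8 per lane (as in footnote [1], where an x16 device counts as two lanes), and the select width assumes one select bit per data byte.

```python
def wishbone_widths(byte_lanes, dq_bits=8):
    """Return (data bus width, byte-select width) for a 4:1 controller.

    Assumptions: dq_bits is the number of DQ pins per byte lane, and the
    byte-select bus carries one bit per byte of the data bus.
    """
    transfers_per_controller_clk = 4 * 2  # 4 DDR3 clocks x 2 edges each
    data_bits = transfers_per_controller_clk * dq_bits * byte_lanes
    sel_bits = data_bits // 8
    return data_bits, sel_bits


print(wishbone_widths(byte_lanes=2))  # x16 device such as the one on Arty-S7
print(wishbone_widths(byte_lanes=8))  # 8 lanes of x8 DDR3
```

With these assumptions, 2 lanes give a 128-bit data bus and 8 lanes give a 512-bit data bus.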
Below are the auxiliary ports associated with the main wishbone. These are not required for normal operation, but are intended for AXI-interface compatibility, which is not yet available:
Ports | Function |
---|---|
i_aux | Request ID line with width of AUX_WIDTH. The Request ID is retrieved simultaneously with the strobe request. |
o_aux | Request ID line with width of AUX_WIDTH. The Request ID is sent back concurrently with the acknowledgement signal. |
After the main wishbone ports are the second-wishbone ports. This interface is only for debugging purposes and is normally not needed, so it can be left unconnected by setting `SECOND_WISHBONE` = 0. The ports of the second wishbone are much the same as those of the main wishbone.
Next are the DDR3 I/O ports. These connect directly to the top-level pins of your design, so the port names must match those in your constraint file. You do not need to understand what each DDR3 I/O port does, but if you're curious, each DDR3 I/O pin is described in 2.10 Pinout Description of the JEDEC DDR3 doc (page 13).
Finally are the debug ports. These are connected to relevant registers containing information on the current state of the controller. Trace each `o_debug_*` inside `ddr3_controller.v` to edit the registers to be monitored.
- One example of a constraint file is from the Kintex-7 Ethernet Switch Project [4]; highlighted are all the DDR3 pins. This constraint file assumes a dual-rank DDR3 RAM (thus 2 pairs of `o_ddr3_clk`, `o_ddr3_cke`, `o_ddr3_s_n`, and `o_ddr3_odt`) with 8 lanes of x8 DDR3 (thus 8 `o_ddr3_dm`, 8 `io_ddr3_dqs`, and 64 `io_ddr3_dq`). The constraint file also has the `set_property` commands required for proper operation. The property `INTERNAL_VREF` must be set to half of the bank voltage (1.5 V, thus set to `0.75`). The property `BITSTREAM.STARTUP.MATCH_CYCLE` (page 240 of UG628: Command Line Guide) is verified to work properly when its value is set to `6`. The Kintex-7 has an HP bank where the DDR3 is connected, which allows the use of DCI (Digitally-Controlled Impedance) for impedance matching by using the `SSTL15_T_DCI` type of `IOSTANDARD`.
- Another example of a constraint file is for the Arty-S7 project; highlighted are the DDR3 pins. The Arty-S7 has an x16 DDR3, which works like two x8 (thus 2 `ddr3_dm`, 2 `ddr3_dqs`, and 16 `io_ddr3_dq`) [1]. The Arty-S7 only has an HR bank where the DDR3 is connected, which restricts the design to on-chip split termination (UG471 7-Series SelectIO Guide page 33) for impedance matching instead of the DCI used in HP banks. `IN_TERM UNTUNED_SPLIT_50` signifies that the input termination is set to an untuned split termination of 50 ohms. The constraint file was easily created by retrieving the pin constraints generated by the Vivado Memory Interface Generator (MIG) together with the `.prj` file provided by Digilent for the Arty-S7. The `.xdc` file generated by the MIG can be located at `[vivado_proj].gen/sources_1/ip/mig_7series_0/mig_7series_0/user_design/constraints/mig_7series_0.xdc`.
The verilog file `rtl/ddr3_controller` contains the timing parameters that need to be configured by the user to match the DDR3 device. Base the timing values on Chapter 13 Electrical Characteristics and AC Timing of the JEDEC DDR3 doc (page 169). The default values in the verilog file should generally work for DDR3-800.
[1]: For x16 DDR3 like in Arty S7, use a `BYTE_LANES` of 2. If the memory configuration is a SO-DIMM with 8 DDR3 RAM modules, each being x8 to form a total of 64 bits of data, then `BYTE_LANES` would be 8.
[2]: The auxiliary line is intended for AXI-interface compatibility but is also utilized in the reset sequence, which is the origin of the minimum required width of 4.
[3]: ODELAYE2 is supported if the DDR3 device is connected to an HP (High-Performance) bank of the FPGA. HR (High-Range) banks do not support ODELAYE2, based on the UG471 7-Series SelectIO Guide (page 134).
[4]: This is the open-sourced 10Gb Ethernet Project.
The easiest way to compile, lint, and formally verify the design is to run `./run_compile.sh` in the top-level directory. This first runs Verilator lint.
Next is compilation with Yosys, which will show warnings:
Warning: Replacing memory ... with list of registers.
Disregard this kind of warning, as Yosys just converts small memory elements in the design into a series of register elements.
After Yosys compilation is Icarus Verilog compilation. This should not show any warnings or errors, but will display the `Test Functions` to verify that the verilog functions return the correct values, and the `Controller Parameters` to verify that the top-level parameters are set properly. Delay values for some timing parameters are also shown.
Last is the SymbiYosys formal verification, which runs both the single-configuration and multiple-configuration sby. A summary is shown at the end where all tasks passed:
For simulation, the DDR3 SDRAM Verilog Model from Micron is used. Import all simulation files under ./testbench to Vivado. `ddr3_dimm_micron_sim.sv` is the top-level module, which instantiates both the DDR3 memory controller and the Micron DDR3 model. This module issues read and write requests to the controller via the wishbone bus, then verifies that the data returned by read requests matches the data written. Both sequential and random accesses are tested.
Currently, there are 2 general options for running the simulation, selected by a `define` directive in the `ddr3_dimm_micron_sim.sv` file: `TWO_LANES_x8` and `EIGHT_LANES_x8`. `TWO_LANES_x8` simulates an Arty-S7 FPGA board, which has an x16 DDR3, while `EIGHT_LANES_x8` simulates an 8-lane x8 DDR3 module. Make sure to change the organization via a `define` directive in ddr3.sv (`TWO_LANES_x8` must use `define x8` while `EIGHT_LANES_x8` must use `define x16`).
After configuring, run the simulation. The file `ddr3_dimm_micron_sim_behav.wcfg` contains the waveform. Shown below are the clocks:
As shown below, `command_used` shows the command issued at a specific time. During reads, `dqs` should toggle and `dq` should have a valid value; otherwise they must be in high-impedance `Z`. Precharge and activate also happen between reads when the row addresses are different.
Part of the internal test is to alternate writes and reads consecutively, as shown below. The data written must match the data read. `dqs` should also toggle along with the data written and read.
There are counters for the number of correct and wrong read data during the internal read/write test: `correct_read_data` and `wrong_read_data`. As shown below, `wrong_read_data` must remain zero while `correct_read_data` must increment until it reaches the maximum (3499 in this example).
The simulation also reports the status of the simulation. For example, the report below:
[10000 ps] RD @ (0, 840) -> [10000 ps] RD @ (0, 848) -> [10000 ps] RD @ (0, 856) -> [10000 ps] RD @ (0, 864) -> [10000 ps] RD @ (0, 872) ->
The format is [`time_delay`] `command` @ (`bank`, `address`), so `[10000 ps] RD @ (0, 840)` means a 10000 ps delay before a read command to bank 0 and address 840. Notice that each read command comes 10000 ps (10 ns) after the previous one. Since the controller clock here is 100 MHz (10 ns clock period), this shows that there are no interruptions between sequential read commands: one read is issued every controller clock cycle, resulting in a very high throughput.
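To check a long log for gaps between commands, the report format can be parsed with a regular expression. This is a minimal Python sketch, assuming every entry follows exactly the `[delay ps] COMMAND @ (bank, address)` shape of the excerpt above; the names `ENTRY` and `parse_commands` are illustrative only.

```python
import re

# One log entry: "[<delay> ps] <COMMAND> @ (<bank>, <address>)"
ENTRY = re.compile(r"\[(\d+) ps\]\s+(\w+)\s+@\s+\((\d+),\s*(\d+)\)")


def parse_commands(line):
    """Return a list of (delay_ps, command, bank, address) tuples."""
    return [
        (int(delay), command, int(bank), int(address))
        for delay, command, bank, address in ENTRY.findall(line)
    ]


line = (
    "[10000 ps] RD @ (0, 840) -> [10000 ps] RD @ (0, 848) -> "
    "[10000 ps] RD @ (0, 856) -> [10000 ps] RD @ (0, 864) -> "
    "[10000 ps] RD @ (0, 872) ->"
)
commands = parse_commands(line)

# Back-to-back reads: every command follows the previous one by one
# 10 ns controller clock period.
back_to_back = all(delay == 10_000 for delay, _, _, _ in commands)
print(len(commands), back_to_back)
```

In this excerpt the address also advances by 8 between consecutive reads, which can be verified from the parsed tuples in the same way.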
A short report is also shown in each test section:
DONE TEST 1: LAST ROW
Number of Operations: 2304
Time Started: 363390 ns
Time Done: 387980 ns
Average Rate: 10 ns/request
This report follows a burst write and then a burst read. It shows 2304 write and read operations with an average time per request of 10 ns (one controller clock period at 100 MHz). The average rate is optimal since these are burst writes and reads. For random writes and reads:
DONE TEST 2: RANDOM
Number of Operations: 2304
Time Started: 387980 ns
Time Done: 497660 ns
Average Rate: 47 ns/request
Notice how the average rate increased to 47 ns/request. Random access requires occasional precharge and activate commands, which take time and thus prolong every read or write access. The very end of the report shows a summary:
TEST CALIBRATION
[-]: write_test_address_counter = 5000
[-]: read_test_address_counter = 2000
[-]: correct_read_data = 3499
[-]: wrong_read_data = 0
------- SUMMARY -------
Number of Writes = 4608
Number of Reads = 4608
Number of Success = 4604
Number of Fails = 4
Number of Injected Errors = 4
The results under `TEST CALIBRATION` come from the internal read/write test that is part of the internal calibration. These are the same counters as on the waveform shown before, where `wrong_read_data` should be zero. Under `SUMMARY` is the report from the external read/write test, where the top-level simulation file `ddr3_dimm_micron_sim.sv` sends read/write requests to the DDR3 controller via the wishbone bus. Notice that the number of fails (4) matches the number of injected errors (4), as expected.
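The numbers in the test reports and the final summary can be cross-checked with a little arithmetic. The Python sketch below uses the values quoted in the reports above. It assumes the reported average rate is the elapsed time divided by the number of operations, truncated to whole nanoseconds, and that every read is counted as either a success or a fail; the function names are hypothetical.

```python
def average_rate_ns(start_ns, done_ns, operations):
    """Average time per request, truncated to whole nanoseconds (assumed)."""
    return (done_ns - start_ns) // operations


def summary_is_consistent(reads, successes, fails, injected_errors):
    """Each read is a success or a fail, and only injected errors may fail."""
    return successes + fails == reads and fails == injected_errors


# TEST 1 (burst) and TEST 2 (random), using the timestamps from the reports.
burst_rate = average_rate_ns(363390, 387980, 2304)
random_rate = average_rate_ns(387980, 497660, 2304)

summary_ok = summary_is_consistent(
    reads=4608, successes=4604, fails=4, injected_errors=4
)
print(burst_rate, random_rate, summary_ok)
```

Under these assumptions the computed rates agree with the 10 ns/request and 47 ns/request figures in the reports.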
- The Arty-S7 demo project is a basic project for testing the DDR3 controller. The gist is that the 4 LEDs should light up, which means the reset sequence is done and all internal read/write tests passed during calibration. This project also uses a UART line: sending small letters via UART writes those letters to memory, while sending capital letters reads the corresponding small letters back from memory.
  - To run this project on your Arty-S7 board, import all verilog files and the xdc file under `example_demo/arty_s7/` and `rtl/`. Run synthesis-to-bitstream generation then upload the bitfile. After around 2 seconds, the 4 LEDs should light up, then you can start interacting with the UART line. The BTN0 button is for reset.
  - Or just upload the bitfile already given in the repo.
- The Nexys Video demo project utilizes the DDR3 chip on the Digilent Nexys Video board with xc7a200t. Only one lane is used for simplicity. It supports the OpenXC7 toolchain. Makefiles have been included for a quick start; just run the following commands in the root directory of the repo:
  - Vivado compilation: `source /opt/Xilinx/Vivado/2019.1/settings64.sh` then `make -C example_demo/nexys_video -f Makefile.vivado`
  - OpenXC7 compilation (using toolchain in Docker): `docker run --rm -v .:/mnt -v /chipdb:/chipdb regymm/openxc7 make -C /mnt/nexys_video -f Makefile.openxc7`
    The bitstream will be compiled as `nexys_video/build/top.bit`.
  - Board test: after programming the bitstream, the 8 LEDs will show some pattern, then all light up after calibration. When pressing BTND (D22), LD7/LD6 will blink, and LD5-LD0 will show 101110 after successful calibration. BTNC (B22) resets the controller, and calibration will be redone. The 9600 baud UART behaves the same as in the Arty-S7 case: type small `abcd` to write to memory, and type capital `ABCD` to read back. For example, typing `abcd` then `ABCDEFGH` will show `abcd����` (because the EFGH memory locations are not written yet).
- The QMTech Wukong demo project is the same as the Arty-S7 demo mentioned above.
  - To run this project on your QMTech Wukong board, import all verilog files and the xdc file under `example_demo/qmtech_wukong/` and `rtl/`. Run synthesis-to-bitstream generation then upload the bitfile. After around 2 seconds, the 2 LEDs should light up, then you can start interacting with the UART line. The SW2 button is for reset.
  - Or just upload the bitfile already given in the repo.
- The 10Gb Ethernet Switch project utilizes this DDR3 controller for accessing a single-rank DDR3 module (8 lanes of x8 DDR3) at DDR3-800 (100 MHz controller and 400 MHz PHY).
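The UART behaviour described for the Arty-S7, Nexys Video, and QMTech Wukong demos can be pictured with a small host-side model. This Python sketch is purely illustrative and does not talk to real hardware: the class name is hypothetical, each letter is assumed to map to its own memory location, and `?` stands in for whatever an unwritten location returns on the real board.

```python
class UartDemoModel:
    """Toy model of the demo: small letters write, capital letters read."""

    def __init__(self):
        self.memory = {}  # one assumed location per letter

    def send(self, text):
        """Feed characters as if typed on the UART; return the echoed reads."""
        echoed = []
        for ch in text:
            if ch.islower():
                # Small letter: store it at its own location.
                self.memory[ch] = ch
            elif ch.isupper():
                # Capital letter: read back the matching small letter,
                # or a placeholder if that location was never written.
                echoed.append(self.memory.get(ch.lower(), "?"))
        return "".join(echoed)


model = UartDemoModel()
model.send("abcd")                  # writes four locations
readback = model.send("ABCDEFGH")   # reads eight locations
print(readback)
```

Only the first four reads return stored letters here, matching the Nexys Video example where the EFGH locations have not been written yet.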
(soon...)
There is no developer documentation yet, but included here are the notes I compiled during an intensive study of DDR3 before starting this project.
This project is funded through NGI0 Entrust, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program. Learn more at the NLnet project page.