diff --git a/master/VexiiRiscv/BranchPrediction/index.html b/master/VexiiRiscv/BranchPrediction/index.html
index edb765b..36d1cb3 100644
@@ -87,7 +87,7 @@
During fetch, BTB / GShare / RAS memories are used to provide an early branch prediction (BtbPlugin / GSharePlugin)
In Decode, the DecodePredictionPlugin will ensure that no “non jump/branch instruction” predicted as a jump/branch continues down the pipeline.
In Execute, the prediction made is checked and eventually corrected. Also a stream of data is generated to feed the BTB / GShare memories with good data to learn.
Here is a diagram of the whole architecture :
@@ -266,11 +266,11 @@
- Version: master git~867f372 2024-09-05
+ Version: master git~d07b8dd 2024-09-08
diff --git a/master/VexiiRiscv/Debug/index.html b/master/VexiiRiscv/Debug/index.html
index e36466a..4a46418 100644
@@ -87,7 +87,7 @@
Decode the words from the fetch pipeline into aligned instructions in the decode pipeline. Its complexity mostly comes from the necessity to support RVC [and BTB], mostly by adding additional cases to handle.
RVC allows 32 bits instructions to be unaligned, meaning they can cross between 2 fetched words, so the plugin needs some internal buffers / states to work.
The BTB may have predicted (falsely) a jump instruction where there is none, which may cut the fetch of a 32 bits instruction in the middle.
The AlignerPlugin is designed as follows :
Has an internal fetch word buffer in order to support 32 bits instructions with RVC
First it scans every possible instruction position, ex : RVC with 64 bits fetch words => 2x64/16 scanners, extracting the instruction length, the presence of all the instruction data (slices) and the necessity to redo the fetch because of a bad BTB prediction.
Then it has one extractor per decoding lane. They will check the scanners for the first valid instructions.
Then each extractor is fed into the decoder pipeline.
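The scanner arithmetic above can be sketched in plain Scala. This is a conceptual model for illustration only, not the actual AlignerPlugin code; the names are made up:

```scala
// Conceptual model of the AlignerPlugin scanner count : one scanner per
// possible instruction position, i.e. one per slice across the 2 buffered
// fetch words (the second word covers instructions crossing a boundary).
object AlignerScanModel extends App {
  def scannerCount(fetchWordBits: Int, rvc: Boolean): Int = {
    val sliceBits = if (rvc) 16 else 32 // RVC makes 16 bits slices possible
    2 * fetchWordBits / sliceBits       // scan over 2 buffered fetch words
  }

  // RVC with 64 bits fetch words => 2x64/16 = 8 scanners, as in the text
  assert(scannerCount(64, rvc = true) == 8)

  // A RISC-V instruction is compressed (16 bits) when its 2 LSBs != "11"
  def isRvc(firstSlice: Int): Boolean = (firstSlice & 0x3) != 0x3
  assert(isRvc(0x0001))  // compressed encoding
  assert(!isRvc(0x0013)) // 32 bits encoding (addi x0,x0,0)
}
```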
Decode instruction
Generate illegal instruction exception
Generate “interrupt” instruction
An execute lane represents a path through which an instruction can be executed.
An execute lane can have one or many layers, which can be used to implement things such as early ALU / late ALU
Each layer will have a static scheduling priority
+The DispatchPlugin doesn’t require lanes or layers to be symmetric in any way.
diff --git a/master/VexiiRiscv/Execute/custom.html b/master/VexiiRiscv/Execute/custom.html
index 79b5212..2611434 100644
@@ -87,7 +87,7 @@
object VexiiSimdAddSim extends App {
val param = new ParamSimple()
val testOpt = new TestOptions()
@@ -495,7 +495,7 @@ Conclusion
diff --git a/master/VexiiRiscv/Execute/fpu.html b/master/VexiiRiscv/Execute/fpu.html
index ca97dbe..698f69a 100644
--- a/master/VexiiRiscv/Execute/fpu.html
+++ b/master/VexiiRiscv/Execute/fpu.html
@@ -87,7 +87,7 @@
@@ -251,24 +251,24 @@
FPU
The VexiiRiscv FPU has the following characteristics :
By default, it is fully compliant with the IEEE-754 spec (subnormals, rounding, exception flags, ..)
There are options to reduce its footprint at the cost of compliance (reduced FMA accuracy, dropped subnormal support)
It isn’t a single chunky module, instead it is composed of many plugins in the same way as the rest of the CPU.
It is tightly coupled to the execute pipeline
All operations can be issued at the rate of 1 instruction per cycle, except for FDIV/FSQRT/subnormals
By default, it is deeply pipelined to help with FPGA timings (10 stages FMA)
Multiple hardware resources are shared between multiple instructions (ex : rounding, adder (FMA+FADD))
The VexiiRiscv scheduler takes care to not schedule an instruction which would use the same resource as an older instruction
FDIV and FMUL reuse the integer pipeline DIV and MUL hardware
Subnormal numbers are handled by recoding/encoding them on operands and results of math instructions. This will trigger some little state machines which will halt the CPU for a few cycles (2-3 cycles)
Plugins architecture
There are a few foundation plugins that compose the FPU :
FpuUnpackPlugin : Will decode the RS1/2/3 operands (isZero, isInfinity, ..) as well as recode them in a floating point format which simplifies subnormals into regular floating point values
FpuPackPlugin : Will apply rounding to floating point results, recode them into IEEE-754 (including subnormal) before sending those to the WriteBackPlugin(float)
WriteBackPlugin(float) : Allows to write values back to the register file (it is the same implementation as the WriteBackPlugin(integer))
FpuFlagsWriteback : Allows instructions to set FPU exception flags
@@ -277,13 +277,17 @@ Plugins architecture
Area / Timings options
To improve the FPU area and timings (especially on FPGA), there are currently two main options implemented.
The first option is to reduce the FMA (Float Multiply Add instruction A*B+C) accuracy.
The reason is that the mantissa result of the multiply operation (for 64 bits float) is 2x(52+1)=106 bits,
then we need to take those bits and implement the floating point adder against the third operand.
So, instead of having to do a 52 bits + 52 bits floating point adder,
we need to do a 106 bits + 52 bits floating point adder, which is quite heavy,
increases the timings and latencies while being (very likely) overkill.
So this option throws away about half of the multiplication mantissa result.
The second option is to disable subnormal support, and instead consider those values as normal floating point numbers.
This reduces the area by not having to handle subnormals (it removes big barrel shifters), as well as improving timings.
The downside is that the floating point value range is slightly reduced,
and if the user provides floating point constants which are subnormal numbers,
they will be considered as 2^exp_subnormal numbers.
@@ -293,11 +297,11 @@ Area / Timings options
Optimized software
If you used the default FPU configuration (deeply pipelined), and you want to achieve a high FPU bandwidth,
your software needs to be careful about dependencies between instructions.
For instance, a FMA instruction will have around 10 cycle latency before providing its results,
+so if you want for instance to multiply 1000 values against some constants
and accumulate the results together, you will need to accumulate things using multiple accumulators and then, only at the end, aggregate the accumulators together.
So think about code pipelining. GCC will not necessarily do a good job of it,
as it may assume that the FPU has a much lower latency, or just optimize for code size.
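As a software-side illustration of the multiple-accumulators idea (plain Scala standing in for compiled RISC-V code; the 4-way split is an example, not VexiiRiscv code), splitting one long dependency chain into independent chains lets several FMAs be in flight at once instead of waiting ~10 cycles for each result:

```scala
// Why multiple accumulators help on a deeply pipelined FPU : a single
// accumulator serializes each multiply-add on the previous result, while
// 4 independent accumulators keep 4 operations in flight at once.
object AccumulatorDemo extends App {
  val values = Array.tabulate(1000)(i => i.toDouble)
  val constant = 2.0

  // Naive : one dependency chain, each iteration waits on the previous one
  var single = 0.0
  for (v <- values) single += v * constant

  // Pipelined : 4 independent chains, aggregated only at the end
  val acc = Array.fill(4)(0.0)
  for (i <- values.indices) acc(i % 4) += values(i) * constant
  val pipelined = acc.sum

  // Both orderings give the same result here (all values exact in double)
  assert(single == pipelined)
  assert(pipelined == 999000.0) // sum(0..999) * 2
}
```

On hardware, the second loop can approach 1 FMA per cycle while the first is bounded by the FMA latency.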
@@ -331,7 +335,7 @@ Optimized software
diff --git a/master/VexiiRiscv/Execute/index.html b/master/VexiiRiscv/Execute/index.html
index d916dd2..c5df48a 100644
--- a/master/VexiiRiscv/Execute/index.html
+++ b/master/VexiiRiscv/Execute/index.html
@@ -87,7 +87,7 @@
The main thing about it is that for every uop implementation in the pipeline, there is the elaboration time information for :
How/where to retrieve the result of the instruction (rd)
From which point in the pipeline it uses which register file (rs)
From which point in the pipeline the instruction can be considered as done (completion)
Until which point in the pipeline the instruction may flush younger instructions (mayFlushUpTo)
From which point in the pipeline the instruction should not be flushed anymore because it already had produced side effects (dontFlushFrom)
The list of decoded signals/values that the instruction is using (decodings)
VexiiRiscv has 2 implementations of LSU :
LsuCachelessPlugin for microcontrollers, which doesn’t implement any cache
LsuPlugin / LsuL1Plugin which can work together to implement load and store through an L1 cache
This LSU implementation is partitioned between 2 plugins :
The LsuPlugin :
Implement AGU (Address Generation Unit)
Cache miss, MMU miss
Refill / Writeback aliasing (4KB)
Unread data bank during load (ex : load during data bank refill)
Load which hit the store queue
Store miss while the store queue is full
…
read_cmd : To send memory block acquire requests (invalid/shared -> shared/exclusive)
read_rsp : For responses of the above requests
read_ack : To send acquire requests completion
write_cmd : To send release a memory block permission (shared/exclusive -> invalid)
write_rsp : For responses of the above requests
probe_cmd : To receive probe requests (toInvalid/toShared/toUnique)
probe_rsp : to send responses from the above requests (isInvalid/isShared/isUnique)
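The permission transitions carried by those channels can be sketched as a toy model. The semantics below are assumed for illustration only, not the actual memory bus API:

```scala
// Toy model of the coherence permission transitions : acquire raises the
// permission (read_cmd/read_rsp/read_ack), release drops it back to invalid
// (write_cmd/write_rsp), and probes force a downgrade (probe_cmd/probe_rsp).
object CoherencyModel extends App {
  sealed trait Perm
  case object Invalid extends Perm
  case object Shared extends Perm
  case object Exclusive extends Perm

  def acquire(from: Perm, wantExclusive: Boolean): Perm =
    if (wantExclusive) Exclusive else Shared

  def release(from: Perm): Perm = Invalid

  def probe(from: Perm, to: Perm): Perm = to // toInvalid / toShared

  assert(acquire(Invalid, wantExclusive = false) == Shared)
  assert(acquire(Shared, wantExclusive = true) == Exclusive)
  assert(release(Exclusive) == Invalid)
  assert(probe(Exclusive, Shared) == Shared)
}
```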
This prefetcher is capable of recognizing instructions which have a constant stride between their own previous accesses in order to prefetch multiple strides ahead.
Will learn memory access patterns from the LsuPlugin traces
Patterns need to have a constant stride in order to be recognized
By default, can keep track of the access patterns of up to 128 instructions (1 way * 128 sets, pc indexed)
This can improve performance dramatically (for some use cases).
For instance, on a 100 MHz SoC in an FPGA, equipped with a 16x800 MT/s DDR3,
the load bandwidth went from 112 MB/s to 449 MB/s (sequential load).
Here is a description of the table fields :
“Tag” : Allows to get a better idea if the given instruction (PC) is the one owning the table entry by comparing more of the PC’s MSB bits. An entry is “owned” by an instruction if its tag matches the given instruction PC’s MSB bits.
“Address” : Previous virtual address generated by the instruction
“stride” : Number of bytes expected between memory accesses
“Score” : Allows to know if the given entry is useful or not. Each time
the instruction is keeping the same stride, the score increases, else it
decreases. If another instruction (with another tag) wants to use an entry,
the score field has to be low enough.
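The table update rule described above could be sketched as follows. The field names come from the text, but the replacement threshold and the score decay on contention are assumptions made for illustration:

```scala
// Sketch of a stride-prefetcher table entry update (conceptual model,
// not the actual plugin code).
object StrideEntryModel extends App {
  case class Entry(tag: Int, address: Long, stride: Long, score: Int)

  def update(e: Entry, tag: Int, address: Long, replaceThreshold: Int): Entry = {
    val stride = address - e.address
    if (tag == e.tag) {
      // Same instruction : reward a constant stride, punish a changing one
      val score = if (stride == e.stride) e.score + 1 else (e.score - 1).max(0)
      e.copy(address = address, stride = stride, score = score)
    } else if (e.score <= replaceThreshold) {
      Entry(tag, address, stride = 0, score = 0) // steal the entry
    } else {
      e.copy(score = e.score - 1) // entry defended, just decay its score
    }
  }

  var e = Entry(tag = 1, address = 0x1000, stride = 64, score = 0)
  e = update(e, tag = 1, address = 0x1040, replaceThreshold = 0)
  assert(e.score == 1 && e.stride == 64) // constant stride rewarded

  e = update(e, tag = 2, address = 0x9000, replaceThreshold = 0)
  assert(e.tag == 1 && e.score == 0) // score was too high, entry kept
}
```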
diff --git a/master/VexiiRiscv/Execute/plugins.html b/master/VexiiRiscv/Execute/plugins.html
index 857e618..69490e0 100644
@@ -87,7 +87,7 @@
Implement one register file, with the possibility to create new read / write ports on demand
Allows plugins to write integer values back to the register file through an optional sign extender. It uses WriteBackPlugin as value backend.
Implement multiplication operation using partial multiplications and then summing their result
Done over multiple stages
Can optionally extend the last stage for one cycle in order to buffer the MULH bits
Implement load / store through a cacheless memory bus
Will fork the cmd as soon as fork stage is valid (with no flush)
Handle backpressure by using a little fifo on the response data
Implement a shared on chip ram
Provide an API which allows to statically allocate space on it
Provide an API to create read / write ports on it
Used by various plugins to store the CSR contents in a FPGA efficient way
diff --git a/master/VexiiRiscv/Fetch/index.html b/master/VexiiRiscv/Fetch/index.html
index 43dec55..4cdb881 100644
@@ -87,7 +87,7 @@
Note, for the best results, the FetchL1Plugin needs to have 2 hardware refill slots instead of 1 (default).
diff --git a/master/VexiiRiscv/Framework/index.html b/master/VexiiRiscv/Framework/index.html
index e2e40a0..8288444 100644
@@ -87,7 +87,7 @@
This combination allows going way beyond what regular HDL allows in terms of hardware description capabilities. You can find some documentation about SpinalHDL here :
One main design aspect of VexiiRiscv is that all its hardware is defined inside plugins.
When you want to instantiate a VexiiRiscv CPU, you “only” need to provide a list of plugins as parameters.
So, plugins can be seen as both parameters and hardware definition from a VexiiRiscv perspective.
So it is quite different from the regular HDL component/module paradigm. Here are the advantages of this approach :
The CPU can be extended without modifying its core source code, just add a new plugin in the parameters
You can swap a specific implementation for another just by swapping plugins in the parameter list (ex : branch prediction, mul/div, …)
It is decentralized by nature, you don’t have a fat toplevel of doom, software interface between plugins can be used to negotiate things during elaboration time.
The plugins can fork elaboration threads which cover 2 phases :
setup phase : where plugins can acquire elaboration locks on each other
build phase : where plugins can negotiate between each other and generate hardware
Here is an example with a plugin which counts the number of hardware events coming from other plugins :
import spinal.core._
import spinal.core.fiber.Retainer
import spinal.lib.misc.plugin._
class EventCounterPlugin extends FiberPlugin {
val lock = Retainer() // Will allow other plugins to block the elaboration of "logic" thread
val events = ArrayBuffer[Bool]() // Will allow other plugins to add event sources
  val logic = during build new Area {
lock.await() // Active blocking
val counter = Reg(UInt(32 bits)) init(0)
counter := counter + CountOne(events)
}
// For the demo we want to be able to instantiate this plugin multiple times, so we add a prefix parameter
class EventSourcePlugin(prefix : String) extends FiberPlugin {
withPrefix(prefix)
// Create a thread starting from the setup phase (this allow to run some code before the build phase, and so lock some other plugins retainers)
  val logic = during setup new Area {
val ecp = host[EventCounterPlugin] // Search for the single instance of EventCounterPlugin in the plugin pool
// Generate a lock to prevent the EventCounterPlugin elaboration until we release it.
// this will allow us to add our localEvent to the ecp.events list
  }
}
object Gen extends App {
  SpinalVerilog {
val plugins = ArrayBuffer[FiberPlugin]()
plugins += new EventCounterPlugin()
plugins += new EventSourcePlugin("lane0")
@@ -465,7 +467,7 @@ Database
In short, the design use a pipeline API in order to :
Propagate data into the pipeline automatically
Allow design space exploration with less pain (retiming, moving around the architecture)
Reduce boilerplate code
@@ -502,7 +504,7 @@ Pipeline API
test.fst : A wave file which can be opened with gtkwave. It shows all the CPU signals
konata.log : A log file which can be opened with https://github.com/shioyadan/Konata, it shows the pipeline behavior of the CPU
spike.log : The execution logs of Spike (golden model)
tracer.log : The execution logs of VexRiscv (Simulation model)
VexiiRiscv is designed in a way which should make it easy to deploy on all FPGA,
including the ones without support for asynchronous memory read
(LUT ram / distributed ram / MLAB).
The one exception is the MMU, but if configured to only read the memory on cycle 0
(no tag hit), then the synthesis tool should be capable of inferring that asynchronous
read into a synchronous one (RAM block, works on Efinix FPGA).
By default SpinalHDL will generate memories in a Verilog/VHDL inferable way.
Otherwise, for ASIC, you likely want to enable the automatic memory blackboxing,
which will instead replace all memories defined in the design by a consistent blackbox
module/component, the user having then to provide those blackbox implementations.
Currently all memories used are “simple dual port ram”. While this is the best for FPGA usages, on ASIC maybe some of those could be redesigned to be single port rams instead (todo).
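As a sketch of enabling that blackboxing (assuming SpinalHDL's standard `addStandardMemBlackboxing` API; `Toplevel` is a hypothetical stand-in for a real VexiiRiscv instance):

```scala
import spinal.core._

// Hypothetical top level with one memory, standing in for a real design.
class Toplevel extends Component {
  val mem = Mem(UInt(8 bits), wordCount = 256)
}

object GenAsic extends App {
  SpinalConfig()
    // Replace every Mem in the design by a blackbox module/component,
    // whose implementation the user then provides (ASIC flow).
    .addStandardMemBlackboxing(blackboxAll)
    .generateVerilog(new Toplevel)
}
```

Other policies than `blackboxAll` exist in SpinalHDL for blackboxing only what cannot be inferred.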
@@ -424,7 +424,7 @@
Here is a list of links to resources which present or document VexiiRiscv :
FSiC 2024 : https://wiki.f-si.org/index.php?title=Moving_toward_VexiiRiscv
COSCUP 2024 : https://coscup.org/2024/en/session/PVAHAS
VexiiRiscv is a from scratch second iteration of VexRiscv, with the following goals :
To implement RISC-V 32/64 bits IMAFDCSU
Could start around as small as VexRiscv, but could scale further in performance
Optional late-alu
Optional multi issue
RISC-V 32/64 IMAFDCSU supported (Multiply / Atomic / Float / Double / Supervisor / User)
Can run baremetal applications (2.50 dhrystone/MHz, 5.24 coremark/MHz)
Can run linux/buildroot/debian on FPGA hardware (via litex)
single/dual issue supported
late-alu supported
The CPU toplevel src/main/scala/vexiiriscv/VexiiRiscv.scala
A cpu configuration generator : dev/src/main/scala/vexiiriscv/Param.scala
Some globally shared definitions : src/main/scala/vexiiriscv/Global.scala
Integer ALU plugin : src/main/scala/vexiiriscv/execute/IntAluPlugin.scala
Also one quite important thing is to use a text editor / IDE which supports curly brace folding and to start with them fully folded, as the code extensively uses nested structures.
@@ -311,9 +311,9 @@
Here is a list of important assumptions and things to know about :
trap/flush/pc requests from the pipeline, once asserted one cycle, can not be undone. This also means that while a given instruction is stuck somewhere, if that instruction raised one of those requests, nothing should change the execution path. For instance, a sudden cache line refill completion should not lift the request from the LSU asking a redo (due to cache refill hazard).
In the execute pipeline, stage.up(RS1/RS2) is the value to be used, while stage.down(RS1/RS2) should not be used, as it implements the bypassing for the next stage
Fetch.ctrl(0) isn’t persistent.
It is still very early in the development, but here are some metrics :