Title | LUT/ALM | DSP | Year | Platform | Frequency (MHz) | Throughput (GOPs) | Power (W) | Energy Efficiency (GOPs/W) | Model | W_precision | A_precision |
---|---|---|---|---|---|---|---|---|---|---|---|
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks | 186251 | 2240 | 2015 | VC707 | 100 | 61.62 | 18.61 | 3.31112 | AlexNet | FP-32 | FP-32 |
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks | -- | -- | 2016 | P395-D8 | 120 | 117.8 | 19.1 | 6.16754 | VGG-16 | INT-8 | INT-16 |
Automatic Code Generation of Convolutional Neural Networks in FPGA Implementation | -- | -- | 2016 | VC709 | 100 | 222.1 | 24.8 | 8.95565 | AlexNet | INT-8 | INT-16 |
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network | 182616 | 780 | 2016 | ZC706 | 150 | 136.97 | 9.63 | 14.2233 | VGG-16 | INT-16 | INT-16 |
A High Performance FPGA-based Accelerator for Large-Scale Convolutional Neural Network | -- | -- | 2016 | VC709 | 156 | 565.94 | 30.2 | 18.7397 | AlexNet | INT-16 | INT-16 |
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+6*VC709 | 150 | 1280.3 | 160 | 8.00188 | VGG-16 | INT-16 | INT-16 |
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+4*VC709 | 150 | 825.6 | 126 | 6.55238 | AlexNet | INT-16 | INT-16 |
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+4*VC709 | 150 | 128.8 | 126 | 1.02222 | AlexNet | INT-16 | INT-16 |
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+VC709 | 150 | 290 | 35 | 8.28571 | VGG-16 | INT-16 | INT-16 |
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+VC709 | 150 | 203.9 | 35 | 5.82571 | VGG-16 | INT-16 | INT-16 |
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices | -- | -- | 2017 | Cyclone V 5CEA9 | 100 | 400 | 0.44 | 909.091 | AlexNet | INT-16 | INT-16 |
F-C3D: FPGA-based 3-Dimensional Convolutional Neural Network | -- | -- | 2017 | ZC706 | 176 | 144.5 | 9.7 | 14.8969 | C3D | INT-16 | INT-16 |
A Fully Connected Layer Elimination for a Binarized Convolutional Network on an FPGA | -- | -- | 2017 | Zedboard | 143 | 329.47 | 2.3 | 143.248 | VGG11 | Binary | Binary |
Accelerating Low Bit-Width Convolutional Neural Networks With Embedded FPGA | -- | -- | 2017 | Zynq XC7Z020 | 200 | 410.22 | 2.26 | 181.513 | DoReFa-Net | Binary | INT-2 |
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 42349 | 1036 | 2017 | Stratix-V GSMD5 | 150 | 364.36 | 25 | 14.5744 | VGG-19 | INT-16 | INT-16 |
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 164100 | 264 | 2017 | Stratix-V GSMD5 | 150 | 81 | 25 | 3.24 | VGG-19 | FxP-32 | FxP-32 |
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 42349 | 1036 | 2017 | Stratix-V GSMD5 | 150 | 315.85 | 25 | 12.634 | LSTM-LM | INT-16 | INT-16 |
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 164100 | 264 | 2017 | Stratix-V GSMD5 | 150 | 86 | 25 | 3.44 | LSTM-LM | FxP-32 | FxP-32 |
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 42349 | 1036 | 2017 | Stratix-V GSMD5 | 150 | 226.47 | 25 | 9.0588 | ResNet-152 | INT-16 | INT-16 |
FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 164100 | 264 | 2017 | Stratix-V GSMD5 | 150 | 73 | 25 | 2.92 | ResNet-152 | FxP-32 | FxP-32 |
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs | 600000 | 2520 | 2017 | ZCU102 | 200 | 2940.7 | 23.6 | 124.606 | VGG-16 | INT-16 | INT-16 |
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs | 600000 | 2520 | 2017 | ZCU102 | 200 | 854.6 | 23.6 | 36.2119 | AlexNet | INT-16 | INT-16 |
Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks | 161000 | 1518 | 2017 | Arria 10 GX1150 | 150 | 645.25 | 21.2 | 30.4363 | VGG-16 | INT-8 | INT-16 |
Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network | -- | 2756 | 2017 | Arria 10 GX1150 | 385 | 1790 | 37.46 | 47.7843 | VGG-16 | INT-16 | INT-16 |
Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network | -- | 1320 | 2017 | Arria 10 GX1150 | 370 | 866 | 41.73 | 20.7525 | VGG-16 | FxP-32 | FxP-32 |
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA | 293920 | 1504 | 2017 | KU060 | 200 | 282.2 | 41 | 6.88293 | LSTM | INT-12 | INT-16 |
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs | 155886 | 824 | 2017 | ZC706 | 100 | 229.5 | 9.4 | 24.4149 | VGG-16 | INT-16 | INT-16 |
An OpenCL Deep Learning Accelerator on Arria 10 | 246000 | 1476 | 2017 | Arria 10 GX1150 | 303 | 1382 | 45 | 30.7111 | AlexNet | FxP-16 | FxP-16 |
Fast and Efficient Implementation of Convolutional Neural Networks on FPGA | 196370 | 256 | 2017 | E5-2600+Stratix V | 200 | 229 | 8.04 | 28.4826 | VGG-16 | INT-32 | INT-32 |
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System | 200522 | 224 | 2017 | E5-2600+Stratix V | 200 | 123.5 | 13.18 | 9.37026 | VGG-16 | FP-32 | FP-32 |
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System | 200522 | 224 | 2017 | E5-2600+Stratix V | 200 | 83 | 13.18 | 6.29742 | AlexNet | FP-32 | FP-32 |
FPGA-based Accelerator for Long Short-Term Memory Recurrent Neural Networks | 198280 | 1176 | 2017 | VC707 | 150 | 7.26 | 19.63 | 0.369842 | LSTM-RNN | FP-32 | FP-32 |
Maximizing CNN accelerator efficiency through resource partitioning | 133854 | 3494 | 2017 | Xilinx Virtex-7 FPGA 690T | 170 | 909.7 | 7.2 | 126.347 | SqueezeNet | INT-16 | INT-16 |
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs | 46900 | 3 | 2017 | Zynq-7000 XC7Z020 | 143 | 207.8 | 4.7 | 44.2128 | BNN | Binary | Binary |
A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks | 342126 | 1096 | 2017 | Virtex 7 | 90 | 7663 | 8.2 | 934.512 | BNN | Binary | Binary |
Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs | 175000 | 1518 | 2018 | Arria 10 GX1150 | 240 | 1032 | 40 | 25.8 | SSD300 | INT-8 | INT-16 |
Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs | 532000 | 4363 | 2018 | Arria 10 GX2800 | 300 | 2178 | 100 | 21.78 | SSD300 | INT-8 | INT-16 |
VIBNN: Hardware Acceleration of Bayesian Neural Networks | 98006 | 342 | 2018 | Cyclone V | -- | 127.8 | 6.1 | 20.9508 | MNIST (RLF-Based) | Binary | Binary |
VIBNN: Hardware Acceleration of Bayesian Neural Networks | 91126 | 342 | 2018 | Cyclone V | -- | 127.8 | 8.52 | 15 | MNIST (BNNWallace) | Binary | Binary |
FBNA: A Fully Binarized Neural Network Accelerator | 29600 | 0 | 2018 | ZC702 | -- | 2236 | 3.2 | 698.75 | SVHN | Binary | Binary |
FBNA: A Fully Binarized Neural Network Accelerator | 29600 | 0 | 2018 | ZC702 | -- | 722 | 3.3 | 218.788 | CIFAR-10 | Binary | Binary |
RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks | 203000 | 0 | 2018 | ZC706 | 150 | 687.78 | 10.56 | 65.1307 | AlexNet | INT-4 | INT-8 |
RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks | 203000 | 0 | 2018 | ZC706 | 150 | 878.11 | 10.56 | 83.1544 | VGG-16 | INT-4 | INT-8 |
RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks | 203000 | 0 | 2018 | ZC706 | 150 | 804.03 | 10.56 | 76.1392 | ResNet-50 | INT-4 | INT-8 |
A Novel Low-Communication Energy Efficient Reconfigurable CNN Acceleration Architecture for Embedded Systems | 156859 | 612 | 2018 | ZC706 | 150 | 1249.7 | 9.82 | 127.261 | VGG-16 | INT-8 | INT-8 |
A Novel Low-Communication Energy Efficient Reconfigurable CNN Acceleration Architecture for Embedded Systems | 156859 | 612 | 2018 | ZC706 | 150 | 685.6 | 9.76 | 70.2459 | AlexNet | INT-8 | INT-8 |
A Novel Low-Communication Energy Efficient Reconfigurable CNN Acceleration Architecture for Embedded Systems | 156859 | 612 | 2018 | ZC706 | 150 | 507.2 | 9.72 | 52.1811 | ResNet-50 | INT-8 | INT-8 |
A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA | 105673 | 880 | 2018 | ZC706 | 200 | 1972 | 4.2 | 469.524 | AlexNet | hybrid int | INT-4 |
A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA | 103505 | 550 | 2018 | ZC706 | 200 | 1233 | 4.1 | 300.732 | AlexNet | hybrid int | INT-8 |
A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA | 124317 | 783 | 2018 | ZC706 | 200 | 2530 | 4.8 | 527.083 | AlexNet | hybrid int | INT-4 |
Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA | 242000 | 1536 | 2018 | VC709 | 150 | 430.7 | 25 | 17.228 | C3D | INT-16 | INT-16 |
Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA | 209000 | 1536 | 2018 | VUS440 | 200 | 784.7 | 26 | 30.1808 | C3D | INT-16 | INT-16 |
Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA | 29867 | 190 | 2018 | XC7z020 | 214 | 84.3 | 3.5 | 24.0857 | VGG-16 | INT-8 | INT-8 |
Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA | 85172 | 900 | 2018 | XC7z045 | 150 | 137 | 9.63 | 14.2264 | VGG-16 | INT-16 | INT-16 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 7000 | 75 | 93.3333 | ResNet-34 1x-wide | FP-32 | FP-32 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 8000 | 75 | 106.667 | ResNet-34 1x-wide | INT-8 | INT-8 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 43000 | 75 | 573.333 | ResNet-34 1x-wide | Ternary | INT-8 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 52000 | 75 | 693.333 | ResNet-34 1x-wide | Binary | INT-8 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 18000 | 75 | 240 | ResNet-34 1x-wide | INT-4 | INT-4 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 51000 | 75 | 680 | ResNet-34 1x-wide | INT-3 | INT-3 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 85000 | 75 | 1133.33 | ResNet-34 1x-wide | INT-2 | INT-2 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 98000 | 75 | 1306.67 | ResNet-34 1x-wide | Ternary | INT-2 |
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 267000 | 75 | 3560 | ResNet-34 1x-wide | Binary | Binary |
An Asynchronous Energy-Efficient CNN Accelerator with Reconfigurable Architecture | -- | -- | 2018 | VC707 | -- | 20.3 | 0.676 | 30.0296 | LeNet-5 | INT-16 | INT-16 |
DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator | 277440 | 2020 | 2018 | Zynq-7100 XC7Z100 | 125 | 1198 | 7.3 | 164.11 | GRU-RNN | INT-16 | INT-16 |
Accelerator Design with Effective Resource Utilization for Binary Convolutional Neural Networks on an FPGA | 61000 | 0 | 2018 | XCVU190 | 240 | 3756 | 5.9 | 636.61 | BNN | Binary | Binary |
A PYNQ-based Framework for Rapid CNN Prototyping | -- | -- | 2018 | XC7Z020 | -- | 2.56 | 1.896 | 1.35021 | CNN | INT-8 | INT-8 |
Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators | 261096 | 2800 | 2019 | Virtex-7 485T | 150 | 608.28 | 21.64 | 28.1091 | ResNet-152 | INT-16 | INT-16 |
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs | 496958 | 3435 | 2019 | ADM-PCIE-7V3 | 200 | 34529 | 24 | 1438.71 | LSTM on TIMIT (FFT8) | INT-12 | INT-12 |
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs | 411714 | 2866 | 2019 | ADM-PCIE-7V3 | 200 | 54943 | 25 | 2197.72 | LSTM on TIMIT (FFT16) | INT-12 | INT-12 |
Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs | 24130 | 37 | 2019 | Ultra96 | 250 | 47.09 | 5.5 | 8.56182 | DiracDeltaNet | INT-4 | INT-4 |
REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs | 637671 | 3456 | 2019 | ADM-PCIE-7V3 | 200 | 1967 | 21 | 93.6667 | YOLO tiny v2 | FxP-6 | FxP-6 |
Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity | 289000 | 1518 | 2019 | Arria 10 GX1150 | 200 | 2432.8 | 19.1 | 127.372 | LSTM | INT-16 | INT-16 |
Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs | 1512810 | 5349 | 2019 | VCU118 | 214 | 1828.61 | 49.25 | 37.1291 | VGG16 | INT-16 | INT-16 |
Automatic Compiler Based FPGA Accelerator for CNN Training | 208000 | 1699 | 2019 | Stratix 10 GX | 240 | 163 | 20.64 | 7.89729 | CIFAR-10; '1X'CNN | INT-16 | INT-16 |
Automatic Compiler Based FPGA Accelerator for CNN Training | 415000 | 3363 | 2019 | Stratix 10 GX | 240 | 282 | 32.83 | 8.5897 | CIFAR-10; '2X'CNN | INT-16 | INT-16 |
Automatic Compiler Based FPGA Accelerator for CNN Training | 720000 | 5760 | 2019 | Stratix 10 GX | 240 | 479 | 50.47 | 9.49079 | CIFAR-10; '4X'CNN | INT-16 | INT-16 |
Towards an Efficient Accelerator for DNN-based Remote Sensing Image Segmentation on FPGAs | 170906 | 1665 | 2019 | Intel Arria10 660 | 200 | 1578 | 32 | 49.3125 | U-Net | INT-8 | INT-8 |
An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs | 132344 | 364 | 2019 | ZCU102 | 200 | 291 | 23.6 | 12.3305 | ResNet | INT-16 | INT-16 |
FPGA-Based Sparsity-Aware CNN Accelerator for Noise-Resilient Edge-Level Image Recognition | -- | -- | 2019 | Intel Stratix-V | 100 | 57.6 | 2.03 | 28.3744 | VGG-16 | INT-13 | INT-13 |
Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 122500 | 793 | 2019 | ZC706 | 166 | 167.58 | 6.08 | 27.5625 | VGG16 | INT-8 | INT-16 |
Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 55500 | 84 | 2019 | ZC706 | 200 | 405.82 | 5.6 | 72.4679 | DoReFa-Net | Binary | INT-2 |
Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 51300 | 31 | 2019 | ZC706 | 200 | 441.95 | 4.88 | 90.5635 | XNOR-Net | Binary | Binary |
Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 100200 | 818 | 2019 | ZC706 | 200 | 124.9 | 7.31 | 17.0862 | ResNet-18 | INT-8 | INT-8 |
Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays | 241202 | 768 | 2019 | V-Ultra XCVU095 | 150 | 460.8 | 8.24 | 55.9223 | VGG16 | INT-8 | INT-8 |
Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays | 241202 | 768 | 2019 | V-Ultra XCVU095 | 150 | 230.4 | 4.12 | 55.9223 | VGG16 | INT-16 | INT-16 |
Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays | 241202 | 768 | 2019 | V-Ultra XCVU095 | 150 | 921.6 | 16.49 | 55.8884 | VGG16 | INT-8 | INT-8 |
A Fine-Grained Sparse Accelerator for Multi-Precision DNN | -- | -- | 2019 | Xilinx XCKU115 | 200 | 574.2 | 13.42 | 42.7869 | CNN | INT-4 | INT-4 |
A Fine-Grained Sparse Accelerator for Multi-Precision DNN | -- | -- | 2019 | Xilinx XCKU115 | 200 | 110.4 | 13.39 | 8.24496 | RNN | INT-4 | INT-4 |
A Fine-Grained Sparse Accelerator for Multi-Precision DNN | -- | -- | 2019 | Xilinx XCKU115 | 200 | 571.1 | 13.41 | 42.5876 | CNN+RNN | INT-4 | INT-4 |
InS-DLA: An In-SSD Deep Learning Accelerator for Near-Data Processing | 93232 | 0 | 2019 | Zynq XC7Z045 | 100 | 44.8 | 9.621 | 4.65648 | CNN | INT-8 | INT-8 |
A 307-fps 351.7-GOPs/W Deep Learning FPGA Accelerator for Real-Time Scene Text Recognition | -- | -- | 2019 | Virtex Ultrascale+ | 100 | 11973 | 34.04 | 351.733 | BSEG | Binary | Binary |
A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization | -- | -- | 2019 | Arria 10 | 150 | 2220 | 1.679 | 1322.22 | LSTM | INT-4 | INT-8 |
A 112-765 GOPS/W FPGA-based CNN Accelerator using Importance Map Guided Adaptive Activation Sparsification for Pix2pix Applications | -- | -- | 2020 | Zynq XC7Z035 | 100 | 2525 | 3.3 | 765.152 | SRResNet | INT-16 | INT-16 |
NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks | 20600 | -- | 2020 | Zynq7020 SoC | 200 | 324 | 2.72 | 119.118 | VGG16 | FP-32 | FP-32 |
When massive GPU parallelism ain’t enough: A Novel Hardware Architecture of 2D-LSTM Neural Network | 191449 | 440 | 2020 | ZCU102 | 300 | 5255.66 | 13.2 | 398.156 | 2D-LSTM | Binary | Binary |
When massive GPU parallelism ain’t enough: A Novel Hardware Architecture of 2D-LSTM Neural Network | 93324 | 234 | 2020 | ZCU102 | 240 | 3071.79 | 15.47 | 198.564 | 2D-LSTM | INT-4 | INT-8 |
Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 295.68 | 16.518 | 17.9005 | MobileNetV1 | INT-8 | INT-8 |
Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 197.12 | 17.14 | 11.5006 | MobileNetV2 | INT-8 | INT-8 |
Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 168.96 | 17.07 | 9.89807 | MobileNetV3-Large | INT-8 | INT-8 |
Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 352 | 6.7 | 52.5373 | DenseNet-161 | INT-8 | INT-8 |
Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 267.52 | 16.9 | 15.8296 | SqueezeNetV1.1 | INT-8 | INT-8 |
End-to-End Optimization of Deep Learning Applications | 1111980 | 3420 | 2020 | VCU1525 | 242.9 | 117 | 10 | 11.7 | OpenPose-V2 | FP-32 | FP-32 |
FTDL: An FPGA-tailored Architecture for Deep Learning Applications | -- | -- | 2020 | UltraScale | 650 | 1272.22 | 46.1 | 27.5969 | GoogLeNet, ResNet50 | INT-16 | INT-16 |
High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression | 274795 | 2370 | 2020 | VirtexUS+XCVU9P | 300 | 2419.2 | 75 | 32.256 | CNN | Binary | INT-8 |
Optimizing Reconfigurable Recurrent Neural Networks | 487232 | 4368 | 2020 | Stratix10 GX2800 | 260 | 8015 | 62.13 | 129.004 | LSTM | INT-8 | INT-8 |
A High Throughput MobileNetV2 FPGA Implementation Based on a Flexible Architecture for Depthwise Separable Convolution | 145000 | 1220 | 2020 | Arria 10 | 200 | 693 | 34 | 20.3824 | MobileNet-V2 | INT-16 | INT-16 |
A Reconfigurable Multithreaded Accelerator for Recurrent Neural Network | 522852 | 4368 | 2020 | Stratix 10 2800 | 260 | 7810 | 125 | 62.48 | LSTM | INT-8 | INT-8 |
Memory-Efficient Dataflow Inference Acceleration for Deep CNNs on FPGA | 1027000 | 1611 | 2020 | Alveo U250 | 195 | 18300 | 71 | 257.746 | ResNet-50 | Binary | INT-2 |
FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations | 50656 | 224 | 2021 | ZYNQ ZU3EG | 250 | 702 | 6.1 | 115.082 | ReActNet (ImageNet) | Binary | Binary |
FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations | 51444 | 126 | 2021 | ZYNQ ZU3EG | 250 | 401 | 4.1 | 97.8049 | ReActNet (CIFAR-10) | Binary | Binary |
Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory | 334000 | 1442 | 2021 | Intel Stratix 10 MX2100 | 257 | 980.344 | 79.98 | 12.2574 | MobileNet | FxP-16 | FxP-16 |
Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory | 334000 | 1442 | 2021 | Intel Stratix 10 MX2100 | 257 | 5071.24 | 79.99 | 63.3985 | ResNet-50 | FxP-16 | FxP-16 |
ESCA: Event-Based Split-CNN Architecture with Data-Level Parallelism on UltraScale+ FPGA (short) | 469288 | 2100 | 2021 | Virtex UltraScale+ xcvu9p | 320 | 49.92 | 10.68 | 4.67416 | VGG16 | INT-14 | INT-14 |
3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA (short) | -- | 1024 | 2021 | Xilinx ZCU102 | 200 | 1353 | 10.2 | 132.647 | C3D | INT-8 | INT-8 |
3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA (short) | -- | 1024 | 2021 | Xilinx ZCU102 | 200 | 1150 | 10.2 | 112.745 | VGG16 | INT-8 | INT-8 |
3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA (short) | -- | -- | 2021 | Xilinx ZCU102 | 200 | 1210 | 10.2 | 118.627 | 3D RESNET-18 | INT-8 | INT-8 |
Eciton: Very Low-Power LSTM Neural Network Accelerator for Predictive Maintenance at the Edge | 4987 | 6 | 2021 | Lattice iCE40 UP5K | 17 | 0.067 | 0.017 | 3.94118 | LSTM | INT-8 | INT-8 |
FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access | 1078800 | 1730 | 2021 | Stratix 10 GX 10M FPGA | 169.2 | 3990 | 28.06 | 142.195 | MobileNet-V1 (1.0) | INT-4 | INT-4 |
FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access | 993800 | 1730 | 2021 | Stratix 10 GX 10M FPGA | 196.89 | 2650 | 27.41 | 96.68 | MobileNet-V1 (0.75) | INT-4 | INT-4 |
FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access | 804500 | 1730 | 2021 | Stratix 10 GX 10M FPGA | 200.76 | 1240 | 27.08 | 45.7903 | MobileNet-V1 (0.5) | INT-4 | INT-4 |
An FPGA-based MobileNet Accelerator Considering Network Structure Characteristics | 308449 | 2160 | 2021 | Xilinx Virtex-7 XC7V690t | 150 | 181.8 | 11.35 | 16.0176 | MobileNet | INT-8 | INT-8 |
Leveraging Fine-grained Structured Sparsity for CNN Inference on Systolic Array Architectures | 336000 | 1352 | 2021 | Intel Arria 10 GX1150 | 242 | 1662 | 27.8 | 59.7842 | VGG-16 | INT-8 | INT-8 |
Leveraging Fine-grained Structured Sparsity for CNN Inference on Systolic Array Architectures | 336000 | 1352 | 2021 | Intel Arria 10 GX1150 | 242 | 495 | 22.6 | 21.9027 | ResNet-50 | INT-8 | INT-8 |
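
The energy-efficiency column appears to be throughput divided by power (for example, 61.62 GOPs / 18.61 W ≈ 3.311 GOPs/W in the first row). The sketch below is one way to re-derive that column from a row when adding or checking entries. It is not part of this repository: the field names in `FIELDS`, the helper functions, and the 0.1% tolerance are all assumptions made for illustration.

```python
import math

# Hypothetical field names for the 12 table columns, in table order.
FIELDS = ["title", "lut_alm", "dsp", "year", "platform", "freq_mhz",
          "throughput_gops", "power_w", "efficiency_gops_per_w",
          "model", "w_precision", "a_precision"]


def parse_row(line):
    """Split one pipe-delimited table row into a dict keyed by FIELDS."""
    cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    return dict(zip(FIELDS, cells))


def recompute_efficiency(row):
    """Return throughput / power in GOPs/W for a parsed row."""
    return float(row["throughput_gops"]) / float(row["power_w"])


def is_consistent(row, rel_tol=1e-3):
    """Check the listed efficiency against throughput / power.

    rel_tol is an assumed tolerance that absorbs rounding in the table.
    """
    listed = float(row["efficiency_gops_per_w"])
    return math.isclose(recompute_efficiency(row), listed, rel_tol=rel_tol)


# Two rows copied verbatim from the table above.
SAMPLE_ROWS = [
    "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks | 186251 | 2240 | 2015 | VC707 | 100 | 61.62 | 18.61 | 3.31112 | AlexNet | FP-32 | FP-32 |",
    "VIBNN: Hardware Acceleration of Bayesian Neural Networks | 91126 | 342 | 2018 | Cyclone V | -- | 127.8 | 8.52 | 15 | MNIST (BNNWallace) | Binary | Binary |",
]

for line in SAMPLE_ROWS:
    row = parse_row(line)
    print(row["platform"], row["model"], is_consistent(row))
```

Rows that use `--` for a missing value still parse, because only the throughput and power cells are converted to numbers.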
AnouarITI/FPGA-based-DNN-Accels