
Pre-synthesis failed in both my project and the part7a_bitstream tutorial #1157

Open

sdubey11 opened this issue Dec 19, 2024 · 2 comments

@sdubey11

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.

Quick summary

Hello, I have been having issues running hls_model.build(csim=False, export=True, bitfile=True) for the project I am working on. Pre-synthesis fails because some limit is exceeded. I'm somewhat new to this, so I'm not sure what the origin of the error is or what the best fix would be.

Details

I encounter the error when running hls_model.build(csim=False, export=True, bitfile=True).

ERROR: [XFORM 203-103] Array 'mult.V' (firmware/nnet_utils/nnet_dense_latency.h:17): partitioned elements number (4000) has exceeded the threshold (1024), which may cause long run-time.

I am loading a trained, pruned Keras MLP. It's a simple model with one hidden layer of 10 neurons, but the input shape is (10, 400). There have been previous discussions here and here on the same problem. Based on the latter I shrank my model to its current state, but I realize that the large input tensor, together with the fact that I'm trying to put this model on a pynq-z2, might still cause problems.
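For context, the conversion follows the standard hls4ml flow. The sketch below is only illustrative of my setup (the model path, board name, and output directory are placeholders, not my exact script):

```python
import hls4ml
from tensorflow.keras.models import load_model

# Load the trained, pruned Keras MLP (path is a placeholder).
model = load_model('my_pruned_mlp.h5')

# Default hls4ml configuration derived from the Keras model.
config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# Convert for the pynq-z2 using the VivadoAccelerator backend.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='VivadoAccelerator',
    board='pynq-z2',
    output_dir='my_prj',
)
hls_model.compile()

# This is the call whose pre-synthesis step fails.
hls_model.build(csim=False, export=True, bitfile=True)
```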

Steps to Reproduce

To test this, I simply opened the tutorials (which I had already gone through before) and re-ran parts 1-4, then ran part 7a. I changed nothing of substance in any of the notebooks, except that in parts 1-4 I replaced XILINX_VITIS with XILINX_VIVADO and Vitis with Vivado where relevant. Otherwise the code is the original checked-out code. I still get the error quoted above in part 7a (screenshots below). Parts 1-4 ran without issue. I am using hls4ml==1.0.0, my Vivado version is 2019.2, and the commit hash for the tutorial checkout is 29a7f7e7891ddc40c7feb2f9f9d7e116778785c1.
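The only substantive change in parts 1-4 was the environment setup cell; roughly speaking (assuming the tutorial's usual pattern of prepending the tool's bin directory to PATH, and with an illustrative install path):

```python
import os

# The tutorial cell originally points at Vitis; switched to Vivado HLS 2019.2 instead.
os.environ['XILINX_VIVADO'] = '/opt/Xilinx/Vivado/2019.2'  # install path is illustrative
os.environ['PATH'] = os.environ['XILINX_VIVADO'] + '/bin:' + os.environ['PATH']
```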

[Screenshot from 2024-12-19 15-29-12]

[Screenshot from 2024-12-19 15-38-07]

The only other issue that popped up in part 7a was the warning

WARNING:tensorflow:No training configuration found in the save file, so the model was not compiled. Compile it manually.

when running model = load_model('model_3/KERAS_check_best_model.h5', custom_objects=co). I'm not sure if that's relevant, since I haven't changed anything in the part 7a notebook.
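For completeness, the cell in question is essentially the following (co is the custom-objects dictionary built earlier in the notebook; the compile arguments are illustrative). As far as I understand, the warning only means that the compile/optimizer state was not stored in the .h5 file, which should not matter for inference-only use:

```python
from tensorflow.keras.models import load_model

co = {}  # populated earlier in the notebook with the custom objects the model needs
model = load_model('model_3/KERAS_check_best_model.h5', custom_objects=co)

# Compiling manually silences the warning; these settings only matter for (re)training.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```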

The tutorial error surprised me, since the same error appears for both my own model and the tutorial model, implying that the input tensor shape of my own model isn't the origin of the error (unless the same error can be generated by more than one thing).

Additional context

Originally, my personal project used a much larger MLP and I was getting errors similar to those discussed here. I did apply the suggested fixes, such as changing the ReuseFactor, setting Strategy to Resource, and setting io_type to io_stream, but the issue only went away once I shrank the model. That is when I started getting the error this post is about. I initially thought the large input tensor was the cause, but after running the hls4ml tutorial out of the box and getting essentially the same error, I am no longer sure that is the case. As such, I'm not certain whether the error thrown is a bug or whether I have overlooked something / done something impermissible.
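For reference, those earlier fixes were applied roughly as follows (the reuse factor, model path, board name, and output directory are illustrative values, not my exact settings):

```python
import hls4ml
from tensorflow.keras.models import load_model

model = load_model('my_pruned_mlp.h5')   # placeholder path for the larger MLP

config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['ReuseFactor'] = 64      # reuse multipliers over several clock cycles
config['Model']['Strategy'] = 'Resource' # resource-oriented dense implementation

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',                 # stream data between layers instead of fully parallel I/O
    backend='VivadoAccelerator',
    board='pynq-z2',
    output_dir='my_prj_resource',
)
```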

My own model is trained with Keras==2.15.0. I have also set up a pip venv with hls4ml==1.0.0 and all the necessary libraries, and I use it as the kernel when running the notebooks, including the tutorials. I am running Ubuntu 24.04.1 LTS (not sure if this matters).

If this seems to not be a bug and there is a more appropriate forum for this question, please let me know.

Thank you.

@bo3z (Contributor) commented Jan 7, 2025

Hi @sdubey11,

Array partitioning affects how arrays are stored in FPGA memory. Vivado offers various schemes (cyclic, complete, block, etc.), but in hls4ml we use the complete partitioning scheme, which essentially stores all the elements of the array in registers. By keeping the intermediate results in registers (rather than in on-chip memory, i.e. BRAM), we can access and propagate them faster and therefore achieve higher throughput and lower latency. However, registers on an FPGA are generally not meant for storing large arrays: they are a limited resource and, when used excessively, can cause significant routing complexity and timing closure issues.

Therefore, Vivado includes a setting called config_array_partition which determines the maximum number of elements of a given array that can be stored in registers. Heuristically, the hls4ml team found that for Vivado HLS a value of 4,096 usually works well and passes synthesis, i.e. any array you want to completely partition can have at most 4,096 elements (this is often repeated in the tutorials). So for Dense / fully connected layers, where the intermediate variable mult has dimensionality n_in * n_out, that product cannot exceed 4,096. Of course, this value of 4,096 is a bit arbitrary; depending on your design complexity, precision, etc., you may be able to use a larger limit. In the past I've successfully used up to 16,384.
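To make the numbers concrete: if the Dense layer that triggers the message has, say, n_in = 400 and n_out = 10, then mult holds 400 * 10 = 4,000 elements, which is below the 4,096 heuristic but above Vivado's default threshold of 1,024; this would match the "(4000) has exceeded the threshold (1024)" in the quoted error.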

In the most recent version of hls4ml, we've made it possible for users to modify this value, in case more complex designs need a larger upper limit. However, there was a bug where the default value of 4,096 wasn't propagated to the VivadoAccelerator backend, which therefore fell back to the Xilinx Vivado default of 1,024. This should now be fixed by PR #1160.

Can you please check out the branch from the PR and let us know if this fixes your problem?

@sdubey11 (Author)

Hi @bo3z ,

Thank you for your reply.

> Can you please check out the branch from the PR and let us know if this fixes your problem?

I checked this out. To be sure, print(hls4ml.__version__) returns 1.1.0.dev5+g27c7031b. Please let me know if this is incorrect.

When I now run hls_model.build(csim=False, export=True, bitfile=True), it no longer throws that error; however, it produces the following:

INTERNAL-INFO: never seen llvm instruction 'fexp'(507)

WARNING: [SCHED 204-69] Unable to schedule 'store' operation ('table_out_V_addr_3_write_ln155', firmware/nnet_utils/nnet_activation.h:155) of constant 131071 on array 'table_out_V' due to limited memory ports. Please consider using a memory core with more ports or partitioning the array 'table_out_V'.

I'm not sure if this is related to that limit, but the notebook I'm running for my model now seems to hang after this. It doesn't crash, but I left hls_model.build(csim=False, export=True, bitfile=True) running overnight and it was still running the next day. The model is the same as in the original post: an MLP with one hidden layer of 10 neurons.

When I apply this fix to the tutorials, the notebook kernel for part7a_bitstream crashes. Specifically, it crashes at y_hls = hls_model.predict(np.ascontiguousarray(X_test[:10])). Tutorial parts 1-4 run just fine with the checked-out branch. If I downgrade to hls4ml==0.8.0, part7a_bitstream does not crash and the tutorial runs fine.
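For reference, the failing cell is essentially the notebook's standard compile-then-predict pattern (hls_model and X_test come from earlier cells); as far as I understand, predict() calls into the compiled C++ emulation library, so a crash there takes the whole kernel down rather than raising a Python exception:

```python
import numpy as np

# hls_model and X_test are defined in earlier cells of part7a_bitstream.
hls_model.compile()  # builds the C simulation library that predict() calls into
y_hls = hls_model.predict(np.ascontiguousarray(X_test[:10]))  # the kernel dies here
```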

Are these related to the original problem? If not and they are new ones, please let me know if I should open a new issue.

Thank you.

@sdubey11 closed this as not planned on Jan 10, 2025
@sdubey11 reopened this on Jan 10, 2025