

Joint ICTP-IAEA School on Systems-on-Chip Based on FPGA for Scientific Instrumentation and Reconfigurable Computing

#### **HLS Demo**

#### Fernando Rincón

University of Castilla-La Mancha fernando.rincon@uclm.es



Smr3891 – ICTP (Nov. 2023)

## **Project Creation**

- File $\rightarrow$ New Project ...




- Add firTop.cpp as the top file
  - Then select the top module





- Skip the testbench
- Select the ZedBoard board or the xc7z020clg484-3 part

New Vitis HLS Project dialog, Solution Configuration page:

| Setting           | Value                  |
|-------------------|------------------------|
| Solution Name     | solution1              |
| Clock Period      | 10 ns                  |
| Clock Uncertainty | (left empty)           |
| Part              | <b>xc7z020clg484-3</b> |
| Flow Target       | Vivado IP Flow Target  |

## First synthesis

- Solution $\rightarrow$ Run C Synthesis $\rightarrow$ Active Solution
- Note down:
  - Estimated clock cycle
  - Latency of fir
  - BRAMs, DSPs, FFs and LUTs
- Note the issues reported for the module and for the loop
- Note the HW ports

## Schedule Analysis

- Loop latency = 5 cycles \* 181 iterations = 905 cycles
- Expand the loop and find where time is spent

Figure: Schedule Viewer (solution1) for fir, control steps 0 to 5. Operations listed before the loop: acc(alloca), i(alloca), data(read), i\_write\_ln72(write), acc\_write\_ln72(write), br\_ln72(br). Then the Shift\_Accum\_Loop row, followed by: acc\_load\_1(read), acc\_1(partselect), sext\_ln85(sext), y\_write\_ln86(write).

## Let's pipeline the loop

- Remove the pragma that disables pipelining:
  - #pragma HLS PIPELINE off
  - Loop pipelining is now the default in Vitis HLS synthesis, so removing the pragma is enough
- Alternatively, replace it with an explicit request:
  - #pragma HLS PIPELINE II=1
- Repeat the synthesis
- Note down the new latency and resource usage
- Did we achieve II=1?
  - Why not?

| Modules & Loops  | Issue Type   | Violation Type    | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF  | LUT | URAM |
|------------------|--------------|-------------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|-----|-----|------|
| fir              | II Violation | 1                 |          | -     | 366             | 3.660E3     | -                 | 367      | -          | no        | 3    | 2   | 206 | 299 | 0    |
| Shift_Accum_Loop | II Violation | Memory Dependency | 1        |       | 364             | 3.640E3     | 5                 | 2        | 181        | yes       |      |     |     |     | -    |
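
The latency figures in this demo can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the pipelined formula (including its trailing -1) is inferred from the numbers shown in the reports, not taken from the tool documentation.

```cpp
// Latency of a loop that is NOT pipelined: iterations run back to back.
constexpr int sequential_latency(int trip_count, int iteration_latency) {
    return trip_count * iteration_latency;
}

// Latency of a pipelined loop: a new iteration starts every `ii` cycles,
// and the last one still needs `iteration_latency` cycles to finish.
// ASSUMPTION: the trailing -1 is chosen to match the reported figures.
constexpr int pipelined_latency(int trip_count, int ii, int iteration_latency) {
    return (trip_count - 1) * ii + iteration_latency - 1;
}

// First synthesis: 5 cycles * 181 iterations.
static_assert(sequential_latency(181, 5) == 905, "unpipelined loop");
// Pipelined, but only II=2 was achieved (iteration latency 5).
static_assert(pipelined_latency(181, 2, 5) == 364, "pipelined, II=2");
// Pipelined with II=1 and iteration latency 4.
static_assert(pipelined_latency(181, 1, 4) == 183, "pipelined, II=1");
```

The second check matches the 364 cycles in the table above. The third matches the 183 cycles reported once II=1 is reached: halving the interval roughly halves the loop latency, because the trip count dominates the iteration latency.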

## Schedule Analysis

- Select Shift\_Accum\_Loop and choose Goto II Violation from the context menu




| Name             | BRAM | URAM | Pragma | Variable    | Storage | Impl | Latency |
|------------------|------|------|--------|-------------|---------|------|---------|
| fir              | 3    | -    |        |             |         |      |         |
| shift_reg_0_U    | 1    | -    |        | shift_reg_0 | ram_1p  | auto | 1       |
| shift_reg_1_U    | 1    | -    |        | shift_reg_1 | ram_1p  | auto | 1       |
| firCoeff_U       | 1    | -    |        | firCoeff    | rom_1p  | auto | 1       |
| Shift_Accum_Loop |      |      |        |             |         |      |         |

## shift\_reg partitioning

- Let's turn the shift\_reg memory into a real shift register:
  - #pragma HLS ARRAY\_PARTITION dim=1 type=complete variable=shift\_reg
- This should avoid read/write contention on the single-port memories
- But it will increase resource usage (compare FFs and LUTs with the previous run)

| Modules & Loops  | Issue Type | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
|------------------|------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
| fir              |            |                |          | -     | 185             | 1.850E3     | -                 | 186      | -          | no        | 1    | 2   | 5981 | 1163 | 0    |
| Shift_Accum_Loop |            |                |          | -     | 183             | 1.830E3     | 4                 | 1        | 181        | yes       | -    | -   | -    | -    | -    |
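
To show where the two pragmas go, here is a minimal sketch of a shift-and-accumulate FIR. It is not the actual firTop.cpp, which is not reproduced in these slides and may differ. The names fir, x, y, shift\_reg, firCoeff, acc, data and Shift\_Accum\_Loop are taken from the reports above; the tap count and the coefficient values are placeholders chosen for illustration (the demo loop has a trip count of 181).

```cpp
// ASSUMPTION: placeholder tap count and coefficients, not the demo's values.
const int N = 11;
static const int firCoeff[N] = {53, 0, -91, 0, 313, 500, 313, 0, -91, 0, 53};

void fir(int *y, int *x) {
    // Static so the delay line keeps its contents between calls
    // (static storage is zero-initialized).
    static int shift_reg[N];
    // Split the array into N separate registers, so every element can be
    // read and written in the same clock cycle.
#pragma HLS ARRAY_PARTITION dim=1 type=complete variable=shift_reg

    int data = *x;
    int acc = 0;

Shift_Accum_Loop:
    for (int i = N - 1; i >= 0; i--) {
        // Request a new loop iteration every clock cycle.
#pragma HLS PIPELINE II=1
        if (i == 0) {
            shift_reg[0] = data;              // newest sample enters the line
        } else {
            shift_reg[i] = shift_reg[i - 1];  // move older samples one slot
        }
        acc += shift_reg[i] * firCoeff[i];    // multiply-accumulate
    }
    *y = acc;
}
```

A regular C++ compiler ignores the HLS pragmas (possibly with a warning), so the same function can be exercised in software. Feeding it an impulse, a 1 followed by zeros, one sample per call, should return the coefficients in order.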

## Interfaces

- Observe the top-level interface
  - start/stop protocol (ap\_ctrl\_hs)
- Observe the input/output variables
  - y bus with valid signal (y\_ap\_vld)

TOP LEVEL CONTROL

| Interface | Type       | Ports                             |
|-----------|------------|-----------------------------------|
| ap_clk    | clock      | ap_clk                            |
| ap_rst    | reset      | ap_rst                            |
| ap_ctrl   | ap_ctrl_hs | ap_done ap_idle ap_ready ap_start |

SW I/O Information: Top Function Arguments

| Argument | Direction | Datatype |
|----------|-----------|----------|
| y        | out       | int*     |
| x        | in        | int*     |

SW-to-HW Mapping

| Argument | HW Interface | HW Type |
|----------|--------------|---------|
| y        | y            | port    |
| y        | y_ap_vld     | port    |
| x        | x            | port    |

## Interfaces

- Apply pragmas to turn the module into an endless stream processor
  - Remove the start/stop protocol:
    - #pragma HLS INTERFACE mode=ap\_ctrl\_none port=return
  - Set the x and y ports to AXI4-Stream (axis) interfaces:
    - #pragma HLS INTERFACE mode=axis register\_mode=both port=y register
    - #pragma HLS INTERFACE mode=axis register\_mode=both port=x register
- Resynthesize and check the new interface
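
The interface pragmas go at the top of the body of the top function. The sketch below only illustrates their placement: the body is a hypothetical 4-sample moving sum standing in for the real filter, and the delay array name is invented for this example. Note that ap\_ctrl\_none is the block-level mode that drops the start/done handshake; ap\_ctrl\_hs is the default handshake seen in the first report.

```cpp
// ASSUMPTION: the body is a stand-in (4-sample moving sum), not the demo FIR.
void fir(int *y, int *x) {
    // No block-level start/done handshake: the core runs continuously.
#pragma HLS INTERFACE mode=ap_ctrl_none port=return
    // x and y become AXI4-Stream ports, registered in both directions.
#pragma HLS INTERFACE mode=axis register_mode=both port=y register
#pragma HLS INTERFACE mode=axis register_mode=both port=x register

    static int delay[3];  // previous three samples, zero-initialized
    int in = *x;
    *y = in + delay[0] + delay[1] + delay[2];

    // Shift the delay line for the next sample.
    delay[2] = delay[1];
    delay[1] = delay[0];
    delay[0] = in;
}
```

In software, one call corresponds to one sample travelling through the stream, so calling the function with 1, 2, 3, 4, 5 yields 1, 3, 6, 10, 14.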

## Export RTL

- Builds the IP that will be integrated in Vivado
- Solution $\rightarrow$ Export RTL



- Add the path to the folder containing the exported IP as a new IP repository in Vivado