Transcription of Vitis High-Level Synthesis User Guide
Vitis High-Level Synthesis User Guide (UG1399), June 24, 2020

Revision History
The following table shows the revision history for this document.
06/24/2020: Vitis Kernel Flow; grammatical updates; cleanup of figures.

Chapter 1: Using Vitis HLS

Introduction to Vitis HLS

The Vitis HLS tool has been developed to simplify the use of C/C++ functions for implementation as hardware kernels in the Vitis application acceleration development flow.
The tool can also be used to develop RTL IP for FPGA designs from C/C++ code.

In the Vitis application acceleration flow, the Vitis HLS tool automates much of the code modification required to implement and optimize the C/C++ code in programmable logic, and to achieve low latency and high throughput. The inference of the pragmas required to produce the right interface for your function arguments, and to pipeline loops and functions within your code, is the foundation of Vitis HLS in the application acceleration flow. Vitis HLS also supports customization of your code to implement different interface standards, or specific optimizations to achieve your design objectives.

The following is the Vitis HLS design flow:
1. Compile, simulate, and debug the C/C++ algorithm.
2. View reports to analyze and optimize the design.
3. Synthesize the C algorithm into an RTL design.
4. Verify the RTL implementation using RTL co-simulation.
5. Compile the RTL implementation into a compiled object file (.xo), or export to an RTL IP.

Basics of High-Level Synthesis

The Xilinx Vitis HLS tool synthesizes a C or C++ function into RTL code for acceleration in programmable logic. Vitis HLS is tightly integrated with the Vitis core development kit and the application acceleration design flow.

The benefits of using a High-Level Synthesis (HLS) design methodology include:
- Developing and validating algorithms at the C level, to design at a level that is abstract from the hardware implementation details.
- Using C simulation to validate the design, and iterating more quickly than with traditional RTL design.
- Controlling the C synthesis process using optimization pragmas to create high-performance implementations.
- Creating multiple design solutions from the C source code and pragmas to explore the design space and find an optimal solution.
- Quickly recompiling the C source to target different platforms and hardware devices.

High-level synthesis includes the following stages:
1. Scheduling determines which operations occur during each clock cycle based on:
   - When an operation's dependencies have been satisfied or are available.
   - The length of the clock cycle or clock frequency.
   - The time it takes for the operation to complete, as defined by the target device.
   - The available resource allocation.
   - Incorporation of any user-specified optimization directives.
   Note: More operations can be completed in a single clock cycle for longer clock periods, or if a faster device is targeted, and all operations might complete in one clock cycle.
   However, for shorter clock periods, or when slower devices are targeted, HLS automatically schedules operations over more clock cycles. Some operations might need to be implemented as multi-cycle resources.
2. Binding determines which hardware resources implement each scheduled operation, and maps operators (such as addition, multiplication, and shift) to specific RTL implementations. For example, a mult operation can be implemented in RTL as a combinational or pipelined multiplier.
3. Control logic extraction creates a finite state machine (FSM) that sequences the operations in the RTL design according to the defined schedule.

Scheduling and Binding Example

The following figure shows an example of the scheduling and binding phases for this code example:

    int foo(char x, char a, char b, char c) {
      char y;
      y = x*a+b+c;
      return y;
    }
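The first step of the design flow, compiling and simulating the C code, can be sketched for this function with a small self-checking testbench. This is a minimal sketch rather than the guide's own testbench: `foo_golden` and `run_testbench` are hypothetical names introduced here, and the input sweep is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>

/* Function under test, as given in the text. */
int foo(char x, char a, char b, char c) {
    char y;
    y = x * a + b + c;
    return y;
}

/* Hypothetical golden model: the same arithmetic carried out in int,
   then narrowed to char in the same way as the assignment to y. */
static int foo_golden(int x, int a, int b, int c) {
    return (char)(x * a + b + c);
}

/* Hypothetical testbench body: sweeps a small input range and
   returns the number of mismatches (0 means the test passed). */
int run_testbench(void) {
    int mismatches = 0;
    for (int x = -4; x <= 4; x++) {
        for (int a = -4; a <= 4; a++) {
            int got = foo((char)x, (char)a, 1, 2);
            int want = foo_golden(x, a, 1, 2);
            if (got != want) {
                printf("mismatch: x=%d a=%d got=%d want=%d\n", x, a, got, want);
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

In a C simulation flow, `run_testbench` would be called from the testbench `main`, which would return a nonzero value when any mismatch is found.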
Figure 1: Scheduling and Binding Example
[Figure: the scheduling phase across clock cycles, the initial binding phase (Mul, AddSub, AddSub), and the target binding phase (DSP, AddSub).]

In the scheduling phase of this example, High-Level Synthesis schedules the following operations to occur during each clock cycle:
- First clock cycle: Multiplication and the first addition
- Second clock cycle: Second addition, if the result of the first addition is available in the second clock cycle, and output generation

Note: In the preceding figure, the square between the first and second clock cycles indicates when an internal register stores a variable. In this example, High-Level Synthesis only requires that the output of the addition is registered across a clock cycle.
The first cycle reads the x, a, and b data ports. The second cycle reads data port c and generates output y.

In the final hardware implementation, High-Level Synthesis implements the arguments to the top-level function as input and output (I/O) ports. In this example, the arguments are simple data ports. Because each input variable is a char type, the input data ports are all 8 bits wide. The function return is a 32-bit int data type, and the output data port is 32 bits wide.

Important: The advantage of implementing the C code in hardware is that all operations finish in fewer clock cycles. In this example, the operations complete in only two clock cycles. In a central processing unit (CPU), even this simple code example takes more clock cycles to complete.

In the initial binding phase of this example, High-Level Synthesis implements the multiplier operation using a combinational multiplier (Mul) and implements both add operations using a combinational adder/subtractor (AddSub).
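The two-cycle schedule and the port widths described above can be modeled in plain C. The sketch below is a behavioral illustration only, not generated RTL: `foo_regs_t`, `foo_cycle1`, `foo_cycle2`, and `schedule_matches_reference` are hypothetical names, and the intermediate register is modeled as an int for simplicity.

```c
#include <assert.h>

/* Function from the text: the 8-bit result y is widened to the int return. */
int foo(char x, char a, char b, char c) {
    char y;
    y = x * a + b + c;
    return y;
}

/* Hypothetical register between the two clock cycles
   (the square between cycles in Figure 1). */
typedef struct {
    int sum_reg;
} foo_regs_t;

/* Cycle 1: read ports x, a, and b; multiply and perform the first addition. */
void foo_cycle1(foo_regs_t *regs, char x, char a, char b) {
    regs->sum_reg = x * a + b;
}

/* Cycle 2: read port c, perform the second addition, and drive the output. */
int foo_cycle2(const foo_regs_t *regs, char c) {
    char y = (char)(regs->sum_reg + c);
    return y;
}

/* Returns 1 if the two-cycle model agrees with foo over a small input sweep. */
int schedule_matches_reference(void) {
    foo_regs_t regs;
    for (int x = -8; x <= 8; x++) {
        for (int a = -8; a <= 8; a++) {
            foo_cycle1(&regs, (char)x, (char)a, 5);
            if (foo_cycle2(&regs, 6) != foo((char)x, (char)a, 5, 6)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Splitting the expression at the register boundary does not change the result, which is why the scheduler is free to place the second addition in a later cycle.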
In the target binding phase, High-Level Synthesis implements both the multiplier and one of the addition operations using a DSP module resource. Some applications use many binary multipliers and accumulators that are best implemented in dedicated DSP resources. The DSP module is a computational block available in the FPGA architecture that provides the ideal balance of high performance and efficient implementation.

Extracting Control Logic and Implementing I/O Ports Example

The following figure shows the extraction of control logic and implementation of I/O ports for this code example:

    void foo(int in[3], char a, char b, char c, int out[3]) {
      int x,y;
      for(int i = 0; i < 3; i++) {
        x = in[i];
        y = a*x + b + c;
        out[i] = y;
      }
    }
Figure 2: Control Logic Extraction and I/O Port Implementation Example
[Figure: a finite state machine (FSM) with states C0 to C3 controlling the datapath, with ports in_data, in_addr, in_ce, out_data, out_addr, out_ce, and out_we.]

This code example performs the same operations as the previous example. However, it performs the operations inside a for-loop, and two of the function arguments are arrays. The resulting design executes the logic inside the for-loop three times when the code is scheduled. High-Level Synthesis automatically extracts the control logic from the C code and creates an FSM in the RTL design to sequence these operations. High-Level Synthesis implements the top-level function arguments as ports in the final RTL design.
Each scalar variable of type char maps into a standard 8-bit data bus port. Array arguments, such as in and out, contain an entire collection of data.

In High-Level Synthesis, arrays are synthesized into block RAM by default, but other options are possible, such as FIFOs, distributed RAM, and individual registers. When using arrays as arguments in the top-level function, High-Level Synthesis assumes that the block RAM is outside the top-level function and automatically creates ports to access a block RAM outside the design, such as data ports, address ports, and any required chip-enable or write-enable signals.

The FSM controls when the registers store data and controls the state of any I/O control signals. The FSM starts in the state C0.
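The interaction between the FSM and the external block RAM ports can be sketched as a behavioral C model. Everything below is an illustration under stated assumptions: `bram_t`, `bram_access`, and `foo_fsm` are hypothetical names, and the assignment of operations to states C0 through C3 is assumed for the sake of the example rather than taken from the tool's actual schedule.

```c
#include <assert.h>

#define DEPTH 3

/* Hypothetical model of a block RAM that sits outside the design. */
typedef struct {
    int mem[DEPTH];
} bram_t;

/* One memory access: nothing happens unless chip-enable (ce) is set;
   write-enable (we) selects a write, otherwise the addressed word is read. */
static int bram_access(bram_t *m, int ce, int we, unsigned addr, int wdata) {
    assert(addr < DEPTH);
    if (!ce) {
        return 0;
    }
    if (we) {
        m->mem[addr] = wdata;
        return wdata;
    }
    return m->mem[addr];
}

typedef enum { C0, C1, C2, C3 } state_t;

/* Steps the FSM from C0 until it returns to C0.
   Returns the number of state visits in this model. */
int foo_fsm(bram_t *in_mem, char a, char b, char c, bram_t *out_mem) {
    state_t state = C0;
    unsigned i = 0;
    int x = 0, bc = 0, steps = 0, done = 0;

    while (!done) {
        steps++;
        switch (state) {
        case C0: /* assumed: the loop-invariant sum b + c is computed once */
            bc = b + c;
            i = 0;
            state = C1;
            break;
        case C1: /* assumed: the read address (in_addr = i) is presented */
            state = C2;
            break;
        case C2: /* in_ce = 1: the input word arrives and is captured as x */
            x = bram_access(in_mem, 1, 0, i, 0);
            state = C3;
            break;
        case C3: /* out_ce = 1, out_we = 1: the result is written to out[i] */
            bram_access(out_mem, 1, 1, i, a * x + bc);
            i++;
            if (i < DEPTH) {
                state = C1;
            } else {
                state = C0;
                done = 1;
            }
            break;
        }
    }
    return steps;
}
```

The model visits C0 once and then cycles through C1, C2, and C3 three times, once per loop iteration, which mirrors the way a single FSM sequences both the datapath and the memory control signals.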
