mlir/docs/Tutorials/transform
1 file changed, +5 -4
@@ -583,10 +583,11 @@ LLVM IR and processed by the LLVM compiler to produce an executable or JITted.
 
 The generated code runs in ~420ms on an Intel processor with Skylake
 microarchitecture clocked at 2.0GHz. Given that the computation performs
-$5 * 80 * 100 * 128 * (2 * 3 * 3 * 128 + 2) ~= 5.9 * 10^9$ floating point operations, it
-reaches ~14 GFlops. With 1 FMA unit available, the single-core performance of
-the test processor is 64 GFlops $16 * 2 * 2 * 10^9$, where 16 is the vector
-width), so only 22% of the theoretical peak is achieved.
+$`5 \cdot 80 \cdot 100 \cdot 128 \cdot (2 \cdot 3 \cdot 3 \cdot 128 + 2) \approx 5.9 * 10^9`$
+floating point operations, it reaches ~14 GFlops. With 1 FMA unit available,
+the single-core performance of the test processor is 64 GFlops
+($`16 \cdot 2 \cdot 2 \cdot 10^9`$, where 16 is the vector width), so only
+22% of the theoretical peak is achieved.
 
 The code produced by Halide runs in ~120ms on the same processor, a 3.5x
 improvement and 77% of peak. Let us analyze the generated assembly to understand
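
The figures quoted in this hunk can be sanity-checked with a few lines of Python. This is only a sketch and not part of the patch: the variable names are hypothetical, and reading the peak expression as vector width times 2 flops per FMA times 2.0 GHz is an assumption (the text only identifies 16 as the vector width). Evaluated as written, the operation-count expression comes to roughly 1.18 * 10^9 * 10, about twice the quoted 5.9 * 10^9. The quoted figure, and the 14 GFlops, 22% and 77% derived from it, match a count in which each multiply-add pair is treated as one operation, which is also an assumption here.

```python
# Sketch: re-derive the numbers quoted in the hunk. All names are illustrative.

# Operation count exactly as written in the changed line.
ops_as_written = 5 * 80 * 100 * 128 * (2 * 3 * 3 * 128 + 2)

# Assumption: the same loop nest with each multiply-add pair counted once.
ops_fused = 5 * 80 * 100 * 128 * (3 * 3 * 128 + 1)

# Peak as written in the text; 16 is the vector width. Reading the other
# factors as 2 flops per FMA and a 2.0 GHz clock is an assumption.
peak_flops = 16 * 2 * 2 * 10**9


def gflops(ops: int, seconds: float) -> float:
    """Throughput in GFlops for `ops` operations completed in `seconds`."""
    return ops / seconds / 1e9


def percent_of_peak(ops: int, seconds: float) -> float:
    """Achieved throughput as a percentage of `peak_flops`."""
    return 100 * (ops / seconds) / peak_flops


if __name__ == "__main__":
    for label, seconds in (("generated code", 0.420), ("Halide", 0.120)):
        print(
            f"{label}: {gflops(ops_fused, seconds):.1f} GFlops, "
            f"{percent_of_peak(ops_fused, seconds):.0f}% of peak"
        )
    print(f"as written: {ops_as_written:.3e} ops, fused count: {ops_fused:.3e} ops")
```

With the fused count, the 420ms and 120ms timings reproduce the quoted throughput and peak percentages, so the prose is internally consistent under that reading even though the displayed product is not.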