| What you will learn | How to build and run a Fortran OpenMP application using Intel Fortran compiler
-| Time to complete | 10 minutes
+## Key Implementation Details
+The Intel® oneAPI Intel Fortran Compiler (Beta) includes all libraries and headers necessary to compile and run OpenMP* enabled Fortran applications. Users simply use the -qopenmp compiler option to compile and link their OpenMP enabled applications.
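As a rough illustration of the kind of program the -qopenmp option applies to, the sketch below parallelizes a simple reduction loop. It is a hypothetical example, not one of the sample's source files; the program name `omp_harmonic` and the loop it runs are assumptions made purely for illustration.

```fortran
! Hypothetical example; not one of the sample's source files.
program omp_harmonic
  use iso_fortran_env, only: real64
  use omp_lib, only: omp_get_max_threads
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real(real64) :: total

  total = 0.0_real64

  ! Each thread accumulates a private partial sum; the reduction
  ! clause combines the partial sums when the loop finishes.
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + 1.0_real64 / real(i, real64)
  end do
  !$omp end parallel do

  print *, 'threads available:', omp_get_max_threads()
  print *, 'harmonic sum     :', total
end program omp_harmonic
```

Because the program calls a routine from `omp_lib`, it needs to be compiled and linked with the -qopenmp option. The printed sum may differ in its last digits from run to run, since the order in which the partial sums are combined is not fixed.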
| What you will learn | Optimization using the Intel® Fortran compiler
+| Time to complete | 15 minutes
+
+## Purpose
+
+The Intel® Fortran Compiler can optimize applications for performance. The primary compiler option is -O followed by a numeric optimization "level" from 0, which requests no optimization, to 3, which requests all compiler optimizations for the application. The -O optimization levels are:
* O0 - No optimizations
* O1 - Enables optimizations for speed and disables some optimizations that increase code size and affect speed.
@@ -23,27 +31,21 @@ The source for this program also demonstrates recommended Fortran coding practic
Read the [Intel® Fortran Compiler Developer Guide and Reference][1]
[1]: https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-developer-guide-and-reference/top.html "Intel® Fortran Compiler Developer Guide and Reference"
-for more information about these options.
+for more information about these options.

-Some of these automatic optimizations use features and options that can
+Some of these compiler optimizations use features and options that can
restrict program execution to specific architectures.
| What you will learn | Vectorization using Intel Fortran compiler
| Time to complete | 15 minutes
-## License
-This code sample is licensed under MIT license
-
-### Introduction to Auto Vectorization
+## Purpose
+The Intel® Compiler has an auto-vectorizer that detects operations in the application
+that can be done in parallel and converts sequential operations
+to parallel operations by using the
+Single Instruction Multiple Data (SIMD) instruction set.
For the Intel® compiler, vectorization is the unrolling of a loop combined with the generation of packed SIMD instructions. Because the packed instructions operate on more than one data element at a time, the loop can execute more efficiently. It is sometimes referred to as auto-vectorization to emphasize that the compiler automatically identifies and optimizes suitable loops on its own.
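To make the idea concrete, here is a minimal sketch of a loop that an auto-vectorizer can usually handle: every iteration is independent and the arrays are accessed with unit stride. The program `saxpy_demo` and its array names are hypothetical and are not taken from matvec.f90 or driver.f90.

```fortran
! Hypothetical example; not one of the sample's source files.
program saxpy_demo
  implicit none
  integer, parameter :: n = 1024
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  do i = 1, n
    x(i) = real(i)
    y(i) = 1.0
  end do

  ! Each iteration reads and writes only element i, so several
  ! consecutive elements can be processed by one packed SIMD instruction.
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do

  print *, y(1), y(n)
end program saxpy_demo
```

Whether the compiler actually vectorizes this loop depends on the optimization level and the target architecture, so treat it as an illustration of a suitable loop shape rather than a guarantee.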
@@ -39,7 +36,7 @@ Vectorization is enabled with the compiler at optimization levels of O2 (default
4. improve performance using Interprocedural Optimization
-### Preparing the Sample Application
+## Key Implementation Details
In this sample, you will use the following files:
@@ -48,7 +45,20 @@ In this sample, you will use the following files:
matvec.f90
-### Establishing a Performance Baseline
+## License
+This code sample is licensed under MIT license
+
+
+## Building the `Fortran Vectorization` sample
+
+This sample contains 2 Fortran source files in the 'src/' subdirectory of the main sample root directory oneAPI-samples/DirectProgramming/Fortran/vec_samples
+
+1. matvec.f90 is a Fortran source file with a matrix-times-vector algorithm
+2. driver.f90 is a Fortran source file with the main program calling matvec
+
+## Running the `Fortran Vectorization` sample
+
+### Step 1 Establishing a Performance Baseline
To set a performance baseline for the improvements that follow in this sample, compile your sources from the src directory with these compiler options:
@@ -60,7 +70,7 @@ Execute 'MatVector'
and record the execution time reported in the output. This is the baseline against which subsequent improvements will be measured.
-### Generating a Vectorization Report
+### Step 2 Generating a Vectorization Report
A vectorization report shows which loops in your code were vectorized and explains why other loops were not. To generate a vectorization report, use the **qopt-report-phase=vec** compiler option together with **qopt-report=1** or **qopt-report=2**.
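The sketch below pairs a loop with independent iterations with a loop that carries a dependence from one iteration to the next. It is a hypothetical example, not part of the sample sources; the program name `report_demo` and its arrays are assumptions, and the exact wording of any report depends on the compiler version.

```fortran
! Hypothetical example; not one of the sample's source files.
program report_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n)
  integer :: i

  b = 1.0
  a = 0.0
  a(1) = 1.0

  ! Loop 1: iterations are independent, so this is a natural
  ! candidate for vectorization.
  do i = 1, n
    b(i) = 2.0 * b(i)
  end do

  ! Loop 2: a(i) needs the value of a(i-1) written by the previous
  ! iteration. This loop-carried dependence is the kind of obstacle
  ! a vectorization report is meant to point out.
  do i = 2, n
    a(i) = a(i-1) + b(i)
  end do

  print *, a(n)
end program report_demo
```

Compiling a file like this with the report options described above would be expected to produce different remarks for the two loops, which makes it a convenient way to practice reading a report before moving on to matvec.f90.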
@@ -149,7 +159,7 @@ For more information on the **qopt-report** and **qopt-report-phase** compiler o
The vectorizer can generate faster code when operating on aligned data. In this activity you will improve the vectorizer performance by aligning the arrays a, b, and c in **driver.f90** on a 16-byte boundary so the vectorizer can use aligned load instructions for all arrays rather than the slower unaligned load instructions and can avoid runtime tests of alignment. Using the ALIGNED macro will insert an alignment directive for a, b, and c in driver.f90 with the following syntax:
@@ -172,7 +182,7 @@ Recompile the program after adding the ALIGNED macro to ensure consistently alig
-### Improving Performance with Interprocedural Optimization
+### Step 4 Improving Performance with Interprocedural Optimization
The compiler may be able to perform additional optimizations if it can optimize across source file boundaries. These may include, but are not limited to, function inlining. This is enabled with the **-ipo** option.