Commit eaceca8

Merge branch 'oneapi-src:development' into development
2 parents 767bc89 + 399c2a9 commit eaceca8

788 files changed: +480 / -267030 lines

Lines changed: 9 additions & 9 deletions
@@ -1,10 +1,10 @@
-ipykernel
-matplotlib
-sentence_transformers
-transformers
-datasets
-accelerate
-wordcloud
-spacy
-jinja2
+ipykernel
+matplotlib
+sentence-transformers
+transformers
+datasets
+accelerate
+wordcloud
+spacy
+jinja2
 nltk
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 notebook
 Pillow
-tensorflow_hub==0.16
+tensorflow-hub==0.16
 requests
 py-cpuinfo

@@ -1,5 +1,5 @@
-neural_compressor==2.4.1
+neural-compressor==2.4.1
 Pillow
 py-cpuinfo
 requests
-tensorflow_hub==0.16.0
+tensorflow-hub==0.16.0

AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Enabling_Auto_Mixed_Precision_for_TransferLearning/scripts/plot.py

Lines changed: 7 additions & 5 deletions
@@ -8,8 +8,10 @@
 throughput = line.split(': ')[1]
 throughput_list.append(float(throughput))

-print("Throughput list: ", throughput_list)
-speedup = float(throughput_list[1])/float(throughput_list[0])
-print("Speedup : ", speedup)
-df = pd.DataFrame({'pretrained_model':['saved model', 'optimized model'], 'Speedup':[1, speedup]})
-ax = df.plot.bar( x='pretrained_model', y='Speedup', rot=0)
+if len(throughput_list) == 2:
+    speedup = float(throughput_list[1])/float(throughput_list[0])
+    print("Speedup : ", speedup)
+    df = pd.DataFrame({'pretrained_model':['saved model', 'optimized model'], 'Speedup':[1, speedup]})
+    ax = df.plot.bar( x='pretrained_model', y='Speedup', rot=0)
+else:
+    print("Incorrect data size to calculate speedup")
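The new guard only computes a speedup when exactly two throughput values (saved model, optimized model) were collected. A minimal stand-alone sketch of that logic, assuming each relevant log line has the form `<label>: <value>` as the `line.split(': ')[1]` parsing implies; the function names `parse_throughputs` and `compute_speedup` and the sample log lines are hypothetical, and the pandas plotting step is left out so the sketch runs with the standard library only.

```python
from typing import List, Optional


def parse_throughputs(lines: List[str]) -> List[float]:
    """Collect the number after ': ' on each line (assumed log format)."""
    throughput_list = []
    for line in lines:
        if ": " in line:
            throughput_list.append(float(line.split(": ")[1]))
    return throughput_list


def compute_speedup(throughput_list: List[float]) -> Optional[float]:
    """Return optimized/saved throughput, or None unless exactly two values exist."""
    if len(throughput_list) == 2:
        return throughput_list[1] / throughput_list[0]
    print("Incorrect data size to calculate speedup")
    return None


if __name__ == "__main__":
    # Hypothetical log: first line is the saved model, second the optimized model.
    log = ["Throughput: 100.0", "Throughput: 250.0"]
    print("Speedup : ", compute_speedup(parse_throughputs(log)))
```

Returning `None` instead of raising mirrors the diff's choice to print a message and skip the plot when the log is incomplete.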
Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 accelerate==0.29.3
 datasets==2.09.0
 intel-extension-for-transformers==1.4.1
-neural_speed==1.0
+neural-speed==1.0
 peft==0.10.0
 sentencepiece
 transformers==4.38.0
Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,3 @@
-neural_compressor==2.1
+neural-compressor==2.1
 transformers>=4.27.4
-datasets>=2.4.0
+datasets>=2.4.0
@@ -1,3 +1,3 @@
-tensorflow_hub
+tensorflow-hub
 ipykernel
 matplotlib
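Several requirements files above rename packages from underscore to hyphen spellings (`sentence_transformers`, `tensorflow_hub`, `neural_compressor`, `neural_speed`). The commit message does not state a reason. For context, pip compares project names after the normalization defined in PEP 503, so both spellings should resolve to the same project; the hyphenated form is the canonical one. A minimal sketch of that normalization rule (the helper name `normalize` is illustrative):

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: collapse runs of '-', '_', '.' to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Old and new spellings taken from the requirements hunks in this commit.
renamed = [
    ("sentence_transformers", "sentence-transformers"),
    ("tensorflow_hub", "tensorflow-hub"),
    ("neural_compressor", "neural-compressor"),
    ("neural_speed", "neural-speed"),
]

for old, new in renamed:
    # Both spellings map to the same canonical name.
    print(f"{old} -> {normalize(old)}; {new} -> {normalize(new)}")
```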

DirectProgramming/C++/ParallelPatterns/openmp_reduction/README.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ This example demonstrates how to perform reduction by using the CPU in serial mo

 This code shows how to use OpenMP on the CPU host as well as using target offload capabilities.

-The different modes use a simple calculation using the well known mathematical formula that states if one integrates from 0 to 1 over the function, $(4.0/(1+x*x))dx$, the answer is pi. One can approximate this integral by summing up the area of a large number of rectangles over this same range.
+The different modes use a simple calculation using the well known mathematical formula, $\int_{0}^{1} \frac{4}{1 + x^2}\\mathrm{d}x$, the answer is pi. One can approximate this integral by summing up the area of a large number of rectangles over this same range.

 Each of the different functions calculates pi by breaking the range into many tiny rectangles and then summing up the results.

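The README describes approximating the integral of 4/(1+x^2) from 0 to 1, which equals pi, by summing the areas of many thin rectangles. A minimal serial sketch of that idea, assuming each rectangle's height is taken at its midpoint (the sample's C++ code may choose sample points differently); `num_steps` is a hypothetical parameter name. The sample's other modes distribute this same sum across threads or an offload device as a reduction.

```python
import math


def approximate_pi(num_steps: int) -> float:
    """Midpoint-rule sum of 4/(1+x^2) over [0, 1] using num_steps equal-width rectangles."""
    step = 1.0 / num_steps
    total = 0.0
    for i in range(num_steps):
        x = (i + 0.5) * step          # midpoint of rectangle i
        total += 4.0 / (1.0 + x * x)  # rectangle height at the midpoint
    return total * step               # sum of heights times the common width


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        approx = approximate_pi(n)
        print(n, approx, abs(approx - math.pi))
```

Increasing `num_steps` shrinks the rectangles and the error; for the midpoint rule the error falls roughly with the square of the step width.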
DirectProgramming/C++SYCL/DenseLinearAlgebra/simple-add/README.md

Lines changed: 5 additions & 153 deletions
@@ -15,27 +15,16 @@ The `Simple Add` sample is a simple program that adds two large vectors of integ
 The basic SYCL implementations explained in the sample includes device selector,
 USM, buffer, accessor, kernel, and command groups.

->**Note**: See the `Base: Vector Add` sample to examine another getting started sample you can use to learn more about using the Intel® oneAPI Toolkits to develop SYCL-compliant applications for CPU, GPU, and FPGA devices.
+>**Note**: See the `Base: Vector Add` sample to examine another getting started sample you can use to learn more about using the Intel® oneAPI Toolkits to develop SYCL-compliant applications for CPU and GPU devices.

 ## Prerequisites

 | Optimized for | Description
 |:--- |:---
 | OS | Ubuntu* 18.04 <br> Windows* 10, 11
-| Hardware | GEN9 or newer <br> Intel® Agilex® 7, Arria® 10, and Stratix® 10 FPGAs
+| Hardware | GEN9 or newer
 | Software | Intel® oneAPI DPC++/C++ Compiler

-
-> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for CPU, GPU, FPGA emulation, generating FPGA reports and generating RTL for FPGAs, there are extra software requirements for the FPGA simulation flow and FPGA compiles.
->
-> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH:
-> - Questa*-Intel® FPGA Edition
-> - Questa*-Intel® FPGA Starter Edition
-> - ModelSim® SE
->
-> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
-> **Warning** Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
-
 ## Key Implementation Details

 This sample provides examples of both buffers and USM implementations for simple side-by-side comparison.
@@ -105,20 +94,7 @@ To learn more about the extensions and how to configure the oneAPI environment,
 cmake .. -DUSM=1
 ```

-> **Note**: When building for FPGAs, the default FPGA family will be used (Intel® Agilex® 7).
-> You can change the default target by using the command:
-> ```
-> cmake .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
-> ```
->
-> Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
-> ```
-> cmake .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
-> ```
->
-> You will only be able to run an executable on the FPGA if you specified a BSP.
-
-#### Build for CPU and GPU
+#### Build

 1. Build the program.
 ```
@@ -129,33 +105,6 @@ To learn more about the extensions and how to configure the oneAPI environment,
 make clean
 ```

-#### Build for FPGA
-
-1. Compile for FPGA emulation.
-```
-make fpga_emu
-```
-2. Compile for simulation (fast compile time, targets simulator FPGA device):
-```
-make fpga_sim
-```
-3. Generate HTML performance reports.
-```
-make report
-```
-The reports reside at `simple-add_report.prj/reports/report.html`.
-
-4. Compile the program for FPGA hardware. (Compiling for hardware can take a long
-time.)
-```
-make fpga
-```
-
-5. Clean the program. (Optional)
-```
-make clean
-```
-
 ### On Windows*

 #### Configure the build system
@@ -177,20 +126,7 @@ time.)
 cmake -G "NMake Makefiles" .. -DUSM=1
 ```

-> **Note**: When building for FPGAs, the default FPGA family will be used (Intel® Agilex® 7).
-> You can change the default target by using the command:
-> ```
-> cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
-> ```
->
-> Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
-> ```
-> cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
-> ```
->
-> You will only be able to run an executable on the FPGA if you specified a BSP.
-
-#### Build for CPU and GPU
+#### Build

 1. Build the program.
 ```
@@ -201,35 +137,6 @@ time.)
 nmake clean
 ```

-#### Build for FPGA
-
->**Note**: Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
-1. Compile for FPGA emulation.
-```
-nmake fpga_emu
-```
-2. Compile for simulation (fast compile time, targets simulator FPGA device):
-```
-nmake fpga_sim
-```
-3. Generate HTML performance reports.
-```
-nmake report
-```
-The reports reside at `simple-add_report.prj/reports/report.html`.
-
-4. Compile the program for FPGA hardware. (Compiling for hardware can take a long
-time.)
-```
-nmake fpga
-```
-
-5. Clean the program. (Optional)
-```
-nmake clean
-```
-
 #### Troubleshooting

 If an error occurs, you can get more details by running `make` with
@@ -243,39 +150,16 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic

 ### On Linux

-#### Run for CPU and GPU
-
 1. Change to the output directory.

 2. Run the program for Unified Shared Memory (USM) and buffers.
 ```
 ./simple-add-buffers
 ./simple-add-usm
 ```
-#### Run for FPGA
-
-1. Change to the output directory.
-
-2. Run for FPGA emulation.
-```
-./simple-add-buffers.fpga_emu
-./simple-add-usm.fpga_emu
-```
-3. Run on FPGA simulator.
-```
-CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./simple-add-buffers.fpga_sim
-CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./simple-add-usm.fpga_sim
-```
-4. Run on FPGA hardware (only if you ran `cmake` with `-DFPGA_DEVICE=<board-support-package>:<board-variant>`).
-```
-./simple-add-buffers.fpga
-./simple-add-usm.fpga
-```

 ### On Windows

-#### Run for CPU and GPU
-
 1. Change to the output directory.

 2. Run the program for Unified Shared Memory (USM) and buffers.
@@ -284,31 +168,9 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic
 simple-add-buffers.exe
 ```

-#### Run for FPGA
-
-1. Change to the output directory.
-
-2. Run for FPGA emulation.
-```
-simple-add-buffers.fpga_emu.exe
-simple-add-usm.fpga_emu.exe
-```
-3. Run on FPGA simulator.
-```
-set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
-simple-add-buffers.fpga_sim.exe
-simple-add-usm.fpga_sim.exe
-set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
-```
-4. Run on FPGA hardware (only if you ran `cmake` with `-DFPGA_DEVICE=<board-support-package>:<board-variant>`).
-```
-simple-add-buffers.fpga.exe
-simple-add-usm.fpga.exe
-```
-
 ### Build and Run the `Simple Add` Sample in Intel® DevCloud (Optional)

-When running a sample in the Intel® DevCloud, you must specify the compute node (CPU, GPU, FPGA) and whether to run in batch or interactive mode.
+When running a sample in the Intel® DevCloud, you must specify the compute node (CPU, GPU) and whether to run in batch or interactive mode.

 >**Note**: Since Intel® DevCloud for oneAPI includes the appropriate development environment already configured, you do not need to set environment variables.

@@ -328,19 +190,9 @@ qsub -I -l nodes=1:gpu:ppn=2 -d .
 |:--- |:---
 |GPU |`qsub -l nodes=1:gpu:ppn=2 -d .`
 |CPU |`qsub -l nodes=1:xeon:ppn=2 -d .`
-|FPGA Compile Time |`qsub -l nodes=1:fpga_compile:ppn=2 -d .`
-|FPGA Runtime (Arria 10) |`qsub -l nodes=1:fpga_runtime:arria10:ppn=2 -d .`
-

 >**Note**: For more information on how to specify compute nodes, read *[Launch and manage jobs](https://devcloud.intel.com/oneapi/documentation/job-submission/)* in the Intel® DevCloud for oneAPI Documentation.

-Only `fpga_compile` nodes support compiling to FPGA. When compiling for FPGA hardware, increase the job timeout to **24 hours**.
-
-Executing programs on FPGA hardware is only supported on `fpga_runtime` nodes of the appropriate type, such as `fpga_runtime:arria10`.
-
-Neither compiling nor executing programs on FPGA hardware are supported on the login nodes. For more information, see the Intel® DevCloud for oneAPI [*Intel® oneAPI Base Toolkit Get Started*](https://devcloud.intel.com/oneapi/get_started/) page.
-
-
 ## Example Output
 ```
 simple-add output snippet changed to:
