Add tutorial inductor on Windows CPU #3062

How to use TorchInductor on Windows CPU
=======================================

**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_

TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels.
This tutorial will guide you through the process of using TorchInductor on a Windows CPU.

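TorchInductor is also the backend that ``torch.compile`` selects by default, so enabling it is a one-line change. The snippet below is a minimal illustration; the explicit ``backend="inductor"`` argument is optional and shown only to make the choice visible:

.. code-block:: python

   import torch

   # torch.compile uses the TorchInductor backend by default;
   # passing backend="inductor" only makes that choice explicit.
   compiled_gelu = torch.compile(torch.nn.functional.gelu, backend="inductor")
   print(compiled_gelu(torch.randn(8)))
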
.. grid:: 2

    .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
       :class-card: card-prerequisites

       * How to compile and execute a Python function with PyTorch, optimized for Windows CPU
       * Basics of TorchInductor's optimization using C++/Triton kernels

    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
       :class-card: card-prerequisites

       * PyTorch v2.5 or later
       * Microsoft Visual C++ (MSVC)
       * Miniforge for Windows

Install the Required Software
-----------------------------

First, let's install the required software. A C++ compiler is required for TorchInductor optimization.
We will use Microsoft Visual C++ (MSVC) in this example.

1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.

2. During the installation, choose **Desktop Development with C++** in the **Desktop & Mobile** section of the **Workloads** tab, and then complete the installation.

.. note::

   For better performance, we recommend the C++ compilers `Clang <https://github.com/llvm/llvm-project/releases>`_ or `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`_.
   See `Using an Alternative Compiler for Better Performance`_ below.

3. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__.

Set Up the Environment
----------------------

#. Open the command line environment via ``cmd.exe``.
#. Activate ``MSVC`` with the following command:

   .. code-block:: sh

      "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

#. Activate ``conda`` with the following command:

   .. code-block:: sh

      "C:/ProgramData/miniforge3/Scripts/activate.bat"

#. Create and activate a custom conda environment:

   .. code-block:: sh

      conda create -n inductor_cpu_windows python=3.10 -y
      conda activate inductor_cpu_windows

#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later, for example:
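
   The selector on the linked page gives the exact command for your setup; as a rough sketch, a CPU-only build can typically be installed with ``pip`` from the CPU wheel index:

   .. code-block:: sh

      pip install torch --index-url https://download.pytorch.org/whl/cpu

   You can confirm the installed version with ``python -c "import torch; print(torch.__version__)"``, which should report 2.5 or later.
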

Using TorchInductor on Windows CPU
----------------------------------

Here’s a simple example to demonstrate how to use TorchInductor:

.. code-block:: python

   import torch

   def foo(x, y):
       # TorchInductor can fuse these pointwise ops into a single optimized kernel.
       a = torch.sin(x)
       b = torch.cos(y)
       return a + b

   opt_foo1 = torch.compile(foo)
   print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

Because the inputs are random, your exact values will differ, but the output will look similar to the following:

.. code-block:: sh

   tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
             1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
           [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
             5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
           [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
             6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
           [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
             8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
           [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
             8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
           [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
             9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
           [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
             1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
           [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
             9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
           [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
            -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
           [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
             1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])
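
To check that the compiled function matches eager execution, you can compare the two on identical inputs. This is a small, optional sanity check; the tolerance below is an arbitrary choice:

.. code-block:: python

   import torch

   def foo(x, y):
       a = torch.sin(x)
       b = torch.cos(y)
       return a + b

   opt_foo1 = torch.compile(foo)

   # Reuse the same inputs for both the eager and the compiled version.
   x, y = torch.randn(10, 10), torch.randn(10, 10)
   print(torch.allclose(foo(x, y), opt_foo1(x, y), atol=1e-6))  # expected: True
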
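One way to see the C++ kernels that TorchInductor generates on CPU is the ``output_code`` logging artifact. Setting the ``TORCH_LOGS`` environment variable before running the example prints the generated code; ``your_script.py`` below is a placeholder for the example above saved to a file:

.. code-block:: sh

   set TORCH_LOGS=output_code
   python your_script.py
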

Using an Alternative Compiler for Better Performance
----------------------------------------------------

To enhance the performance of TorchInductor on Windows, you can use the Intel Compiler or the LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first.

Intel Compiler
^^^^^^^^^^^^^^

#. Download and install the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ for Windows.
#. Set the Windows Inductor compiler via the ``CXX`` environment variable: ``set CXX=icx-cl``, as in the sketch below.
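
   Putting the steps together, a typical session might look like the following sketch. The oneAPI environment script path is an assumption based on a default installation, and ``your_script.py`` is a placeholder; adjust both to your setup:

   .. code-block:: sh

      :: Activate MSVC first; icx-cl relies on its runtime libraries.
      "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

      :: Assumed default oneAPI install location; adjust if installed elsewhere.
      "C:/Program Files (x86)/Intel/oneAPI/setvars.bat"

      :: Point TorchInductor at the Intel compiler, then run your script.
      set CXX=icx-cl
      python your_script.py
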

LLVM Compiler
^^^^^^^^^^^^^

#. Download and install the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and choose the win64 version.
#. Set the Windows Inductor compiler via the ``CXX`` environment variable: ``set CXX=clang-cl``, as in the sketch below.
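
   As with the Intel Compiler, activate MSVC first and then point TorchInductor at ``clang-cl``. This assumes the LLVM ``bin`` directory is on your ``PATH``, and ``your_script.py`` is again a placeholder:

   .. code-block:: sh

      :: Activate MSVC first; clang-cl relies on its runtime libraries.
      "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

      :: Point TorchInductor at the LLVM compiler, then run your script.
      set CXX=clang-cl
      python your_script.py
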

Conclusion
----------

In this tutorial, we learned how to use TorchInductor on a Windows CPU with PyTorch. In addition, we discussed
further performance improvements using the Intel Compiler and the LLVM Compiler.