Add tutorial inductor on Windows CPU #3062

Merged
merged 18 commits into from
Sep 30, 2024
Changes from 7 commits
108 changes: 108 additions & 0 deletions prototype_source/inductor_windows_cpu.rst
@@ -0,0 +1,108 @@
How to use TorchInductor on Windows CPU
=======================================

**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_



TorchInductor is a new compiler backend that compiles FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
This tutorial will guide you through the process of using TorchInductor on a Windows CPU.

.. grid:: 2

    .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
       :class-card: card-prerequisites

       * How to compile and execute a Python function with PyTorch, optimized for Windows CPU
       * Basics of TorchInductor's optimization using C++/Triton kernels

    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
       :class-card: card-prerequisites

       * PyTorch v2.5 or later
       * Microsoft Visual C++ (MSVC)
       * Miniforge for Windows

Install the Required Software
-----------------------------

First, let's install the required software. TorchInductor needs a C++ compiler to build its optimized kernels.
We will use Microsoft Visual C++ (MSVC) for this example.

#. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.

#. During the installation, choose **Desktop Development with C++** in the **Desktop & Mobile** section, then install the software.

   .. note::

      We recommend the C++ compilers `Clang <https://github.com/llvm/llvm-project/releases>`_ and `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`_.

#. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__.

Set Up Environment
^^^^^^^^^^^^^^^^^^

#. Open the command line environment via ``cmd.exe``.
#. Activate `MSVC` with the following command::

      "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

#. Activate `conda` with the following command:

   .. code-block:: sh

      "C:/ProgramData/miniforge3/Scripts/activate.bat"

#. Create and activate a customer conda environment:

Review comment (Contributor): It should be "custom" conda environment, shouldn't it?


   .. code-block:: sh

      conda create -n inductor_cpu_windows python=3.10 -y
      conda activate inductor_cpu_windows

#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later.
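
As an optional sanity check before moving on, the short script below is a minimal sketch (not part of the original tutorial) that reports whether the setup steps took effect. The helper name ``check_environment`` is hypothetical. It uses only the Python standard library and assumes that ``vcvars64.bat`` has put the MSVC compiler driver ``cl.exe`` on ``PATH`` and that conda exports ``CONDA_DEFAULT_ENV`` for the active environment.

.. code-block:: python

    import os
    import shutil
    import sys
    from importlib import metadata


    def check_environment():
        """Collect a few facts about the current shell and Python environment."""
        try:
            # Reads the installed package metadata without importing torch.
            torch_version = metadata.version("torch")
        except metadata.PackageNotFoundError:
            torch_version = None
        return {
            # Assumption: vcvars64.bat was run in this shell, so cl.exe is on PATH.
            "cl_path": shutil.which("cl"),
            # Assumption: conda sets this variable when an environment is active.
            "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
            "python_version": sys.version.split()[0],
            "torch_version": torch_version,
        }


    if __name__ == "__main__":
        for key, value in check_environment().items():
            print(f"{key}: {value if value is not None else 'not found'}")

If ``cl_path`` or ``torch_version`` is reported as ``not found``, it is probably worth revisiting the corresponding step above before calling ``torch.compile``.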

Using TorchInductor on Windows CPU
----------------------------------------

Here’s a simple example to demonstrate how to use TorchInductor:

.. code-block:: python

    import torch

    def foo(x, y):
        a = torch.sin(x)
        b = torch.cos(x)
        return a + b

    # torch.compile uses TorchInductor as its default backend.
    opt_foo1 = torch.compile(foo)
    print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

Review comment (Contributor, on lines +75 to +77): If function takes two arguments, should 2nd one be used somewhere? (i.e. ``y`` argument is currently unused in the codebase, is it?)

The code above returns the following output:

Review comment (Contributor): It would not, would it? As inputs are random


.. code-block:: sh

    tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
              1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
            [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
              5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
            [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
              6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
            [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
              8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
            [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
              8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
            [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
              9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
            [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
              1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
            [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
              9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
            [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
             -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
            [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
              1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])
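
Because the inputs come from ``torch.randn``, the printed values differ from run to run. One way to confirm that the compiled function behaves correctly is to compare it against eager execution on the same inputs. The following is a minimal sketch under that assumption, not part of the original tutorial: it fixes the seed with ``torch.manual_seed`` and checks the two results with ``torch.testing.assert_close``. The function ``foo`` is copied from the example, including its unused ``y`` argument.

.. code-block:: python

    import torch


    def foo(x, y):
        # Same function as in the example; y is accepted but not used.
        a = torch.sin(x)
        b = torch.cos(x)
        return a + b


    # Fix the seed so repeated runs draw the same inputs.
    torch.manual_seed(0)
    x = torch.randn(10, 10)
    y = torch.randn(10, 10)

    eager_out = foo(x, y)                    # plain eager execution
    compiled_out = torch.compile(foo)(x, y)  # default backend is TorchInductor

    # Compare numerically (within default float32 tolerances), not by printed digits.
    torch.testing.assert_close(compiled_out, eager_out)
    print("compiled output matches eager output")

The first call to the compiled function triggers compilation, so it can take noticeably longer than later calls.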

Conclusion
----------

In this tutorial, we have learned how to use Inductor on Windows CPU with PyTorch 2.5 or later.