Regional compilation recipe #3070
Merged

Commits (9):
- 215d81e Regional compilation recipe (anijain2305)
- e77a954 Merge branch 'main' into regional-compilation (svekars)
- 52fe948 Apply suggestions from code review (anijain2305)
- 937ae75 Move the file to recipes_source and change runner (anijain2305)
- acab392 Merge branch 'main' into regional-compilation (anijain2305)
- 05bee4b Remove toctree line (anijain2305)
- 84f11aa Formatting cleanup (svekars)
- d3e65ac Merge branch 'main' into regional-compilation (svekars)
- 6679a07 Merge branch 'main' into regional-compilation (svekars)
"""
Reducing torch.compile cold start compilation time with regional compilation
============================================================================

Introduction
------------
As deep learning models get larger, the compilation time of these models also
increases. This increase in compilation time can lead to a large startup time
in inference services or wasted resources in large-scale training. This recipe
shows an example of how to reduce the cold start compilation time by choosing
to compile a repeated region of the model instead of the entire model.

Setup
-----
Before we begin, we need to install ``torch`` if it is not already
available.

.. code-block:: sh

   pip install torch

"""
######################################################################
# Steps
# -----
#
# 1. Import all necessary libraries.
# 2. Define and initialize a neural network with repeated regions.
# 3. Understand the difference between the full model and the regional compilation.
# 4. Measure the compilation time of the full model and the regional compilation.
#
# 1. Import all necessary libraries
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#

import torch
import torch.nn as nn
from time import perf_counter
######################################################################
# 2. Define and initialize a neural network with repeated regions.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Typically, neural networks are composed of repeated layers. For example, a
# large language model is composed of many Transformer blocks. In this recipe,
# we will create a ``Layer`` ``nn.Module`` class as a proxy for a repeated
# region. We will then create a ``Model`` which is composed of 64 instances of
# this ``Layer`` class.
#
class Layer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(10, 10)
        self.relu1 = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(10, 10)
        self.relu2 = torch.nn.ReLU()

    def forward(self, x):
        a = self.linear1(x)
        a = self.relu1(a)
        a = torch.sigmoid(a)
        b = self.linear2(a)
        b = self.relu2(b)
        return b


class Model(torch.nn.Module):
    def __init__(self, apply_regional_compilation):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)
        # Apply compile only to the repeated layers.
        if apply_regional_compilation:
            self.layers = torch.nn.ModuleList(
                [torch.compile(Layer()) for _ in range(64)]
            )
        else:
            self.layers = torch.nn.ModuleList([Layer() for _ in range(64)])

    def forward(self, x):
        # In regional compilation, self.linear is outside the scope of ``torch.compile``.
        x = self.linear(x)
        for layer in self.layers:
            x = layer(x)
        return x
######################################################################
# 3. Understand the difference between the full model and the regional compilation.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# In full model compilation, the entire model is compiled as a whole. This is
# how most users use ``torch.compile``. In this example, we apply
# ``torch.compile`` to the ``model`` object. This effectively inlines the 64
# layers, producing a large graph to compile. You can look at the full graph by
# running this recipe with ``TORCH_LOGS=graph_code``.
#
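For reference, enabling the graph logging looks like the invocation below. The filename is illustrative, not one defined by this recipe:

```shell
# Assuming the recipe has been saved locally as regional_compilation.py
# (illustrative name), dump the compiled graph code while running it:
TORCH_LOGS=graph_code python regional_compilation.py
```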
model = Model(apply_regional_compilation=False).cuda()
full_compiled_model = torch.compile(model)
######################################################################
# Regional compilation, on the other hand, compiles only a region of the model.
# By wisely choosing to compile a repeated region of the model, we can compile
# a much smaller graph and then reuse the compiled graph for all the regions.
# We can apply regional compilation in the example as follows:
# ``torch.compile`` is applied only to the ``layers``, not to the full model.
#
regional_compiled_model = Model(apply_regional_compilation=True).cuda()

######################################################################
# Applying compilation to a repeated region, instead of the full model, leads
# to large savings in compile time. Here, we compile just one layer instance
# and then reuse it 64 times in the ``model`` object.
#
# Note that with repeated regions, some parts of the model might not be
# compiled. For example, ``self.linear`` in the ``Model`` is outside the scope
# of regional compilation.
#
# Also note that there is a tradeoff between performance speedup and compile
# time. Full model compilation produces a larger graph and therefore,
# theoretically, offers more scope for optimization. However, for practical
# purposes and depending on the model, we have observed many cases with minimal
# speedup differences between full model and regional compilation.
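The compile-once, reuse-many idea behind these savings can be sketched without ``torch``, using ``functools.cache`` as a stand-in for the compiler cache. All names in this sketch are illustrative and are not part of the recipe:

```python
from functools import cache

compile_count = 0


@cache
def compile_region(structure):
    # Stand-in for an expensive compile step: the body runs once per
    # unique structure, and later calls hit the cache.
    global compile_count
    compile_count += 1
    return lambda x: x + 1  # trivial "compiled" function


# 64 identical regions share one cache entry, so the compile cost is paid once.
layers = [compile_region("linear-relu-linear-relu") for _ in range(64)]

x = 0
for layer in layers:
    x = layer(x)

print(compile_count, x)  # → 1 64
```

Compiling the full model would correspond to one large cache entry per model, while the regional approach amortizes one small entry across all 64 regions.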
######################################################################
# 4. Measure the compilation time of the full model and the regional compilation.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ``torch.compile`` is a JIT compiler, which means that it compiles on the
# first invocation. Here, we measure the total time spent in the first
# invocation. This measurement is not precise, but it gives a good idea of the
# relative cost because the majority of the time is spent in compilation.
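The first-invocation measurement idea can be illustrated with a plain Python function whose first call pays a one-time setup cost, roughly analogous to JIT compilation. The names here are illustrative:

```python
from functools import cache
from time import perf_counter, sleep


@cache
def slow_setup():
    # One-time "compilation" cost, paid only on the first call.
    sleep(0.05)
    return 2


def fn(x):
    return slow_setup() * x


# First call: includes the setup cost.
start = perf_counter()
first_result = fn(10)
first_call = perf_counter() - start

# Second call: setup is cached, so only the cheap work remains.
start = perf_counter()
second_result = fn(10)
second_call = perf_counter() - start

assert first_result == second_result == 20
assert first_call > second_call  # the first invocation dominates
```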
def measure_latency(fn, input):
    # Reset the compiler caches to ensure no reuse between different runs
    torch.compiler.reset()
    with torch._inductor.utils.fresh_inductor_cache():
        start = perf_counter()
        fn(input)
        torch.cuda.synchronize()
        end = perf_counter()
        return end - start


input = torch.randn(10, 10, device="cuda")
full_model_compilation_latency = measure_latency(full_compiled_model, input)
print(f"Full model compilation time = {full_model_compilation_latency:.2f} seconds")

regional_compilation_latency = measure_latency(regional_compiled_model, input)
print(f"Regional compilation time = {regional_compilation_latency:.2f} seconds")
############################################################################
# This recipe shows how to control the cold start compilation time if your
# model has repeated regions. This approach requires user changes to apply
# ``torch.compile`` to the repeated regions, instead of the more commonly used
# full model compilation. We are continually working on reducing cold start
# compilation time, so please stay tuned for our next tutorials.
#
# This feature is available with the 2.5 release. If you are on 2.4, you can
# use the config flag ``torch._dynamo.config.inline_inbuilt_nn_modules=True``
# to avoid recompilations during regional compilation. In 2.5, this flag is
# turned on by default.
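On a 2.4 install, that flag would be set before constructing the regionally compiled model. A minimal sketch (a config fragment, assuming PyTorch 2.4; in 2.5 and later no action is needed):

```python
import torch

# PyTorch 2.4 only: inline nn.Module calls so that the 64 compiled Layer
# instances share compiled code instead of each triggering a recompilation.
# This flag is on by default starting with PyTorch 2.5.
torch._dynamo.config.inline_inbuilt_nn_modules = True
```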