
Commit c2f6f84

Added CUDA graph, Tensor Core, and core pinning explanation

1 parent a58f40f commit c2f6f84

File tree

1 file changed: +28 −0 lines changed

recipes_source/recipes/tuning_guide.py

Lines changed: 28 additions & 0 deletions
@@ -213,6 +213,7 @@ def gelu(x):
 
 ###############################################################################
 # Typically, the following environment variables are used to set CPU affinity with the GNU OpenMP implementation. ``OMP_PROC_BIND`` specifies whether threads may be moved between processors. Setting it to CLOSE keeps OpenMP threads close to the primary thread in contiguous place partitions. ``OMP_SCHEDULE`` determines how OpenMP threads are scheduled. ``GOMP_CPU_AFFINITY`` binds threads to specific CPUs.
+# An important tuning parameter is core pinning, which prevents threads from migrating between CPUs, enhancing data locality and minimizing inter-core communication.
 #
 # .. code-block:: sh
 #
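Beyond environment variables, core pinning can also be done from within Python. The following is a minimal sketch, assuming a Linux host (os.sched_setaffinity is Linux-only) and arbitrarily chosen CPU indices:

import os

# Restrict the current process (and all of its threads) to the first four
# logical CPUs, so threads cannot migrate beyond this set of cores.
os.sched_setaffinity(0, {0, 1, 2, 3})

# Confirm which CPUs the process may now run on.
print(os.sched_getaffinity(0))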
@@ -318,6 +319,33 @@ def gelu(x):
 # GPU specific optimizations
 # --------------------------
 
+###############################################################################
+# Enable Tensor Cores
+# ~~~~~~~~~~~~~~~~~~~~~~~
+# Tensor Cores are specialized hardware units for computing matrix-matrix
+# multiplication operations, which neural network operations can take
+# advantage of.
+#
+# Tensor Core operations tend to use a different floating point format,
+# which sacrifices some precision in exchange for speed gains.
+
+torch.backends.cuda.matmul.allow_tf32 = True
+
+# Prior to PyTorch 1.12 this was enabled by default, but since that version
+# it must be set explicitly, as it can conflict with some operations that do
+# not benefit from Tensor Core computations.
+
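For context, a minimal sketch of how these TF32 switches might be used, assuming an NVIDIA Ampere-or-newer GPU (TF32 has no effect on older hardware); the matrix sizes are arbitrary:

import torch

# Allow TF32 for matmuls and for cuDNN convolutions.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# float32 matmuls can now execute on Tensor Cores in the TF32 format.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b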
+
+###############################################################################
+# Use CUDA Graphs
+# ~~~~~~~~~~~~~~~~~~~~~~~
+# When using a GPU, work must first be launched from the CPU, and in some
+# cases the context switch between the CPU and the GPU can lead to poor
+# resource utilization. CUDA graphs are a way to keep computation within the
+# GPU without paying the extra cost of kernel launches and host
+# synchronization.
+#
+# They can be enabled using the `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html>`_ "reduce-overhead" and "max-autotune" modes.
+# Special care must be taken when using CUDA graphs, as they can lead to increased memory consumption and some models might not compile.
+
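A minimal sketch of enabling CUDA graphs through torch.compile, assuming PyTorch 2.x on a CUDA-capable machine; the toy model is only for illustration:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
).cuda()

# "reduce-overhead" lets torch.compile use CUDA graphs to cut kernel launch
# overhead: initial calls compile and warm up, subsequent calls can replay
# the captured graphs.
compiled_model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(64, 512, device="cuda")
y = compiled_model(x)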
 ###############################################################################
 # Enable cuDNN auto-tuner
 # ~~~~~~~~~~~~~~~~~~~~~~~
