1 parent 7e8ec2d commit 50ed60d
doc/api/training/smp_versions/latest/smd_model_parallel_pytorch.rst
@@ -502,7 +502,7 @@ smdistributed.modelparallel.torch.nn.FlashAttentionLayer
  This class supports
  `FlashAttention <https://github.com/HazyResearch/flash-attention>`_
  for PyTorch 2.0.
- It takes the ``qkv`` matrix as an argument through its ``forward`` class method,
+ It takes the ``qkv`` matrix as an argument through its ``forward`` class method,
  computes attention scores and probabilities,
  and then operates the matrix multiplication with value layers.
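
The three steps the changed docstring names (attention scores, probabilities, multiplication with the values) can be sketched in plain Python. This is only a reference illustration of scaled dot-product attention, not the library's implementation: the function name `reference_attention`, the single-head unbatched layout, and the per-token `(q, k, v)` packing are all assumptions made for the example, and the real layer uses fused FlashAttention kernels on tensors.

```python
import math


def reference_attention(qkv):
    """Naive single-head attention over a packed qkv sequence.

    Hypothetical layout (an assumption for this sketch): `qkv` is a list with
    one (q, k, v) triple per token, each element a list of floats.
    """
    q = [token[0] for token in qkv]
    k = [token[1] for token in qkv]
    v = [token[2] for token in qkv]
    scale = 1.0 / math.sqrt(len(q[0]))  # standard 1/sqrt(head_dim) scaling

    out = []
    for q_i in q:
        # Step 1: attention scores, the scaled dot product of the query with every key.
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        # Step 2: probabilities via softmax (max subtracted for numerical stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Step 3: multiply the probabilities with the values (weighted sum).
        out.append(
            [sum(p * v_j[c] for p, v_j in zip(probs, v)) for c in range(len(v[0]))]
        )
    return out


# Usage: two tokens whose keys are identical, so each query attends uniformly.
example_qkv = [
    ([1.0, 0.0], [1.0, 1.0], [2.0, 0.0]),
    ([0.0, 1.0], [1.0, 1.0], [0.0, 4.0]),
]
result = reference_attention(example_qkv)
```

With identical keys every query gives each token equal weight, so each output row is the average of the two value vectors.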