
Commit 4275c42

fix: typos
1 parent dcf02de commit 4275c42

File tree

1 file changed: +3 −3 lines


prototype_source/context_parallel.rst

Lines changed: 3 additions & 3 deletions
@@ -82,7 +82,7 @@ To better demonstrate the usage of this API, we start with a simple code snippet
     )
     for _ in range(3)
 ]
-# specify the SDPABackend to use
+# specify the SDPBackend to use
 with sdpa_kernel(backend):
     out = F.scaled_dot_product_attention(*qkv, is_causal=True)
@@ -148,7 +148,7 @@ shard to input and distribute the computation across ranks:
     )
     for _ in range(3)
 ]
-# specify the SDPABackend to use
+# specify the SDPBackend to use
 with sdpa_kernel(backend):
     out = F.scaled_dot_product_attention(*qkv, is_causal=True)
154154
@@ -191,7 +191,7 @@ shard to input and distribute the computation across ranks:


 You can use the command ``torchrun --standalone --nnodes=1 --nproc-per-node=4 cp_sdpa_example.py`` to launch the above context parallel
-SDPA on 4 GPUs. We demonstrate the nemuric correctness by comparing the output of Ring Attention to that of SDPA on a single GPU.
+SDPA on 4 GPUs. We demonstrate the numeric correctness by comparing the output of Ring Attention to that of SDPA on a single GPU.


 Select Rotation Approach
