
Commit 2d9ba14
Update on "[Executorch][llama] Allow custom sdpa op replacement pass to leverage attention mask"

Previously we assumed that the custom sdpa op always performs causal attention. This diff adds an option to the module-swap pass so the custom sdpa op can consume an explicit attention mask instead of assuming causal attention.

Differential Revision: [D73222736](https://our.internmc.facebook.com/intern/diff/D73222736/)

[ghstack-poisoned]
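As a rough illustration of what "leverage attention mask" means here, the sketch below shows a custom SDPA wrapper with an opt-in mask path. The names `CustomSDPA`, `use_attention_mask`, and `max_context_len` are illustrative assumptions based on the diff context, not a statement of the actual ExecuTorch API:

```python
# Hypothetical sketch, not the actual ExecuTorch implementation.
import torch
import torch.nn.functional as F


class CustomSDPA(torch.nn.Module):
    # `use_attention_mask` is the assumed name of the new option;
    # when False, behavior falls back to the old causal-only path.
    def __init__(self, max_context_len: int, use_attention_mask: bool = False):
        super().__init__()
        self.max_context_len = max_context_len
        self.use_attention_mask = use_attention_mask

    def forward(self, q, k, v, start_pos: int, mask: torch.Tensor):
        if self.use_attention_mask:
            # Same checks as in the diff: start_pos must be a valid,
            # in-range index for the narrow() below.
            torch._check_is_size(start_pos)
            torch._check(start_pos < self.max_context_len)
            seq_length = q.size(2)
            # Keep only the mask rows for the positions being decoded.
            mask = mask.narrow(0, start_pos, seq_length)
            return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        # Previous behavior: always-causal attention, mask ignored.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```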
2 parents: e7109f4 + 9435ba7

File tree

  • examples/models/llama/source_transformation/sdpa.py

1 file changed (0 additions, 1 deletion)

examples/models/llama/source_transformation/sdpa.py

Lines changed: 0 additions & 1 deletion
@@ -47,7 +47,6 @@ def forward(
             torch._check_is_size(start_pos)
             torch._check(start_pos < self.max_context_len)
             seq_length = q.size(2)
-            # pyre-ignore: Incompatible parameter type [6]
             mask = mask.narrow(0, start_pos, seq_length)
         else:
             mask = mask[input_pos]
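To make the `narrow()` call in the hunk above concrete, here is a small self-contained example; the causal-mask construction is illustrative, not taken from the file:

```python
import torch

# Build a toy causal mask the way decoder KV caches often do:
# -inf above the diagonal, 0 elsewhere.
max_context_len = 8
mask = torch.triu(
    torch.full((max_context_len, max_context_len), float("-inf")),
    diagonal=1,
)

# As in the hunk: slice `seq_length` rows starting at `start_pos`,
# so only the rows for the tokens currently being decoded remain.
start_pos, seq_length = 3, 2
sliced = mask.narrow(0, start_pos, seq_length)
print(sliced.shape)  # torch.Size([2, 8]): rows 3 and 4 of the full mask
```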
