
Commit e6b5f52

[Executorch][llama] Bug fix for custom SDPA with attention bias

When using an attention bias, don't override the sequence length for causal attention.

Differential Revision: [D73222733](https://our.internmc.facebook.com/intern/diff/D73222733/)

[ghstack-poisoned]
1 parent 97c6f04 commit e6b5f52

File tree

1 file changed (+2, -1 lines)


extension/llm/custom_ops/op_sdpa.cpp

Lines changed: 2 additions & 1 deletion
```diff
@@ -400,7 +400,8 @@ Tensor& custom_sdpa_out_impl(
 
   ET_CHECK_MSG(q.dim() == 4, "query must be a 4D tensor");
 
-  const int64_t num_keys_for_causal_attention = start_pos + seq_len;
+  const int64_t num_keys_for_causal_attention =
+      attn_mask.has_value() ? -1 : start_pos + seq_len;
 
   ET_KERNEL_CHECK(
       ctx,
```
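The ternary makes -1 act as a sentinel: when the caller passes an explicit attention bias via `attn_mask`, the kernel should not cap the number of visible keys at `start_pos + seq_len`, because the mask itself already encodes which key positions each query may attend to. A minimal sketch of how a downstream consumer might interpret such a sentinel (a hypothetical helper for illustration, not code from op_sdpa.cpp):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper illustrating the sentinel convention assumed above:
// num_keys_for_causal_attention == -1 means "no causal cap; the explicit
// attention mask governs visibility"; otherwise keys are limited to the
// causal window [0, start_pos + seq_len).
int64_t effective_num_keys(
    int64_t num_keys_for_causal_attention,
    int64_t total_kv_len) {
  return num_keys_for_causal_attention < 0
      ? total_kv_len // attention bias supplied: consider all cached keys
      : std::min(num_keys_for_causal_attention, total_kv_len);
}
```

Per the commit message, the cap was previously applied unconditionally, so an attention bias covering keys beyond the causal window could be silently truncated; the one-line change above avoids that.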
