
Commit 988485c

kimishpatel authored and larryliu0820 committed
[executorch][llama] support mqa
Summary: This diff adds support for multi-query attention (MQA) for SDPA with KV cache.

Reviewed By: iseeyuan

Differential Revision: D56212419
1 parent 203ae40 commit 988485c

File tree

1 file changed (+1, -0 lines)

examples/models/llama2/custom_ops/op_sdpa.cpp

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@ void cpu_flash_attention(
      " and num kv heads=%" PRId64,
      num_head,
      num_heads_kv);
+  int64_t num_reps = num_head / num_heads_kv;

  bool has_attn_mask = attn_mask.has_value() && attn_mask.value().numel();
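The single added line computes num_reps, the number of query heads that share each KV head. For context, here is a minimal sketch of how a GQA/MQA-aware attention loop can use num_reps to index a smaller KV cache; the function name, shapes, and memory layout below are illustrative assumptions, not code from op_sdpa.cpp:

// A minimal sketch (not the actual op_sdpa.cpp kernel): how num_reps lets a
// GQA/MQA attention loop share one KV head across several query heads.
// All names, shapes, and layouts here are illustrative assumptions.
#include <cstdint>
#include <vector>

// Computes raw (unscaled, pre-softmax) attention scores for one query token.
void gqa_scores(
    const std::vector<float>& q,       // [num_head, head_dim]
    const std::vector<float>& k_cache, // [num_heads_kv, seq_len, head_dim]
    int64_t num_head,
    int64_t num_heads_kv,
    int64_t head_dim,
    int64_t seq_len,
    std::vector<float>& scores) {      // [num_head, seq_len]
  // The one-line change from the diff: how many query heads map onto each
  // KV head. num_head == num_heads_kv gives plain MHA (num_reps == 1);
  // num_heads_kv == 1 gives MQA; anything in between is GQA.
  const int64_t num_reps = num_head / num_heads_kv;
  for (int64_t h = 0; h < num_head; ++h) {
    const int64_t kv_h = h / num_reps; // query head -> shared KV head
    for (int64_t t = 0; t < seq_len; ++t) {
      float dot = 0.0f;
      for (int64_t d = 0; d < head_dim; ++d) {
        dot += q[h * head_dim + d] *
               k_cache[(kv_h * seq_len + t) * head_dim + d];
      }
      scores[h * seq_len + t] = dot; // scaling and softmax omitted
    }
  }
}

The point of the mapping is that the KV cache only needs num_heads_kv heads rather than num_head, shrinking cache memory by a factor of num_reps while leaving the query side unchanged.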
