
Commit 988485c

kimishpatel authored and larryliu0820 committed
[executorch][llama] support mqa
Summary: This diff adds support for multi-query attention (MQA) for SDPA with KV cache.

Reviewed By: iseeyuan

Differential Revision: D56212419
1 parent 203ae40 commit 988485c

File tree

1 file changed (+1, -0 lines)

examples/models/llama2/custom_ops/op_sdpa.cpp

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@ void cpu_flash_attention(
      " and num kv heads=%" PRId64,
      num_head,
      num_heads_kv);
+  int64_t num_reps = num_head / num_heads_kv;

  bool has_attn_mask = attn_mask.has_value() && attn_mask.value().numel();
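The single added line computes num_reps, the number of query heads that share each KV head. For context, here is a minimal sketch of how a GQA/MQA-aware attention loop can use num_reps to index a smaller KV cache; the function name, shapes, and memory layout below are illustrative assumptions, not code from op_sdpa.cpp:

// A minimal sketch (not the actual op_sdpa.cpp kernel): how num_reps lets a
// GQA/MQA attention loop share one KV head across several query heads.
// All names, shapes, and layouts here are illustrative assumptions.
#include <cstdint>
#include <vector>

// Computes raw (unscaled, pre-softmax) attention scores for one query token.
void gqa_scores(
    const std::vector<float>& q,       // [num_head, head_dim]
    const std::vector<float>& k_cache, // [num_heads_kv, seq_len, head_dim]
    int64_t num_head,
    int64_t num_heads_kv,
    int64_t head_dim,
    int64_t seq_len,
    std::vector<float>& scores) {      // [num_head, seq_len]
  // The one-line change from the diff: how many query heads map onto each
  // KV head. num_head == num_heads_kv gives plain MHA (num_reps == 1);
  // num_heads_kv == 1 gives MQA; anything in between is GQA.
  const int64_t num_reps = num_head / num_heads_kv;
  for (int64_t h = 0; h < num_head; ++h) {
    const int64_t kv_h = h / num_reps; // query head -> shared KV head
    for (int64_t t = 0; t < seq_len; ++t) {
      float dot = 0.0f;
      for (int64_t d = 0; d < head_dim; ++d) {
        dot += q[h * head_dim + d] *
               k_cache[(kv_h * seq_len + t) * head_dim + d];
      }
      scores[h * seq_len + t] = dot; // scaling and softmax omitted
    }
  }
}

The point of the mapping is that the KV cache only needs num_heads_kv heads rather than num_head, shrinking cache memory by a factor of num_reps while leaving the query side unchanged.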
