
Commit 95c889a

use --use_sdpa_with_kv_cache for 1B/3B bf16
We should use this option when exporting the 1B/3B models as bf16 because the KV cache is always fp32; otherwise we see regressed performance for 1B/3B in bf16 format.

Differential Revision: [D63871048](https://our.internmc.facebook.com/intern/diff/D63871048/)

[ghstack-poisoned]
1 parent 70aee72 commit 95c889a

File tree

1 file changed: +3 −1 lines changed


examples/models/llama2/README.md

Lines changed: 3 additions & 1 deletion
@@ -142,7 +142,9 @@ LLAMA_PARAMS=path/to/params.json
 python -m examples.models.llama2.export_llama \
   --checkpoint "${LLAMA_CHECKPOINT:?}" \
   --params "${LLAMA_PARAMS:?}" \
-  -kv -X \
+  -kv \
+  --use_sdpa_with_kv_cache \
+  -X \
   -d bf16 \
   --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' \
   --output_name="llama3_2.pte"
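
For reference, the full export command as it reads after this change, reassembled from the diff above (LLAMA_CHECKPOINT and LLAMA_PARAMS are set earlier in the README, per the hunk's context line):

python -m examples.models.llama2.export_llama \
  --checkpoint "${LLAMA_CHECKPOINT:?}" \
  --params "${LLAMA_PARAMS:?}" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -d bf16 \
  --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' \
  --output_name="llama3_2.pte"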
