
Commit c092a77

Update on "[ExecuTorch][Llama] Decouple input sequence length from kv cache context length"
Decouple the max sequence length, used for shape dynamism in torch.export, from the sequence length used to size the KV cache.

Differential Revision: [D68448334](https://our.internmc.facebook.com/intern/diff/D68448334/)

cc mergennachin cccclai helunwencser dvorjackz

[ghstack-poisoned]
2 parents: 2ae1870 + 426aae0

File tree

1 file changed: +4 −2 lines changed

examples/models/llama/export_llama_lib.py

Lines changed: 4 additions & 2 deletions
@@ -1019,11 +1019,13 @@ def _load_llama_model(
             # pyre-fixme[6]: For 5th argument expected `ModelArgs` but got
             # `Union[Tensor, Module]`.
             model.max_seq_len,
+            # pyre-fixme[6]: For 6th argument expected `ModelArgs` but got
+            # `Union[Tensor, Module]`.
             model.max_context_len,
-            # pyre-fixme[6]: For 6th argument expected `int` but got `Union[Tensor,
+            # pyre-fixme[6]: For 7th argument expected `int` but got `Union[Tensor,
             # Module]`.
             model.n_layers,
-            # pyre-fixme[6]: For 7th argument expected `int` but got `Union[Tensor,
+            # pyre-fixme[6]: For 8th argument expected `int` but got `Union[Tensor,
             # Module]`.
             model.vocab_size,
             metadata_str,
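As a minimal sketch of the distinction this commit draws, assuming hypothetical names (SketchKVCache and its parameters are illustrative, not ExecuTorch's actual cache implementation): the KV cache is allocated from the context length, while torch.export's dynamic-shape bound on the input comes from the now-separate max sequence length.

import torch
from torch.export import Dim


class SketchKVCache(torch.nn.Module):
    # Hypothetical cache module: capacity is sized from max_context_len,
    # independent of how many tokens a single forward call may take as input.
    def __init__(self, max_context_len: int, n_heads: int, head_dim: int):
        super().__init__()
        self.register_buffer(
            "k_cache", torch.zeros(1, n_heads, max_context_len, head_dim)
        )
        self.register_buffer(
            "v_cache", torch.zeros(1, n_heads, max_context_len, head_dim)
        )


max_seq_len = 128       # bound on tokens per forward pass (shape dynamism in export)
max_context_len = 2048  # total tokens the KV cache can hold

cache = SketchKVCache(max_context_len, n_heads=8, head_dim=64)

# Only the input length is dynamic at export time, bounded by max_seq_len;
# the cache buffers keep their fixed max_context_len capacity.
seq_dim = Dim("seq_len", min=1, max=max_seq_len)
print(cache.k_cache.shape, seq_dim)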
