"""
Configurations for exporting Llama.

- Uses dataclases , which integrate with OmegaConf and Hydra.
+ Uses dataclasses , which integrate with OmegaConf and Hydra.
"""

import ast
@@ -45,7 +45,7 @@ class PreqMode(str, Enum):
If you are dealing with pre-quantized checkpoints, this used to
be the way to specify them. Now you don't need to specify these
options if you use a TorchAo-prequantized checkpoint, but they
- are still around to preservce backward compatibility.
+ are still around to preserve backward compatibility.
"""

PREQ_8DA4W = "8da4w"
@@ -65,17 +65,17 @@ class BaseConfig:
If left empty will use defaults specified in model_args.py.
checkpoint: Path to the checkpoint file.
If left empty, the model will be initialized with random weights.
- checkpoint_dir: Path to directory containt sharded checkpoint files.
+ checkpoint_dir: Path to directory containing sharded checkpoint files.
tokenizer_path: Path to the tokenizer file.
- metadata: Json string containining metadata information.
+ metadata: Json string containing metadata information.
e.g. '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
use_lora: Rank of the LoRA, if set to 0 then this means no LoRA. For use with QAT.
fairseq2: For legacy internal use cases, this is safe to ignore.
preq_mode: Legacy option to specify how prequantized weights are loaded.
Going forward, ExecuTorch supports loading weights prequantized through
TorchAo as-is, without any special handling.
- preq_group_size: Legacy option to specify the gropu size of prequantized weights.
- preq_embedding_quantize: Legacy option to specify how prequanitzed embeddings
+ preq_group_size: Legacy option to specify the group size of prequantized weights.
+ preq_embedding_quantize: Legacy option to specify how prequantized embeddings
are loaded.
"""

@@ -124,10 +124,10 @@ class ModelConfig:
token generation.
use_shared_embeddings: whether the embedding/output weights should be
shared. Only available with torchao kernels, e.g. when
- qmode set to use a "torchao:8da(\d+)w" pattern.
- use_sdpa_with_kv_cache: Whether to use flash attention by subtituting
+ qmode set to use a "torchao:8da(\\d+)w" pattern.
+ use_sdpa_with_kv_cache: Whether to use flash attention by substituting
for our custom SDPA op. Note that the naming is poor and this
- doesn't actually ahve anything to do with the kv_cache at the moment.
+ doesn't actually have anything to do with the kv_cache at the moment.
expand_rope_table: Temporary workaround to expand sin/cos table in head
dim to take vectorized path in optimized kernels.
use_attention_sink: Whether to use attention sink to support multi-round
@@ -140,7 +140,7 @@ class ModelConfig:
quantize_kv_cache: Whether to perform int8 per token quantization on the KV cache.
local_global_attention: List of integers specifying local and global attention pattern.
e.g., [0, 16, 0, 16] to specify that every other layer is sliding window of 16.
- [0, 16, 32] pattern specifes 2nd and 3rd layers have sliding windows of 16 and 32.
+ [0, 16, 32] pattern specifies 2nd and 3rd layers have sliding windows of 16 and 32.
[16] pattern specifies all layers have a sliding window of 16.
"""

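For context on the "dataclasses, which integrate with OmegaConf and Hydra" note in the module docstring above, here is a minimal, hypothetical sketch of that pattern. DemoBaseConfig and its field names are illustrative stand-ins (loosely mirroring the BaseConfig attributes in the diff), not the actual ExecuTorch classes: a structured config built from a dataclass is type-checked, and dotlist-style overrides are merged onto it the way Hydra/OmegaConf apply command-line arguments.

# Illustrative sketch only; DemoBaseConfig is a hypothetical stand-in,
# not the real BaseConfig from the file being edited above.
from dataclasses import dataclass
from typing import Optional

from omegaconf import OmegaConf


@dataclass
class DemoBaseConfig:
    checkpoint: Optional[str] = None      # empty -> model initialized with random weights
    checkpoint_dir: Optional[str] = None  # directory containing sharded checkpoint files
    tokenizer_path: Optional[str] = None
    metadata: Optional[str] = None        # JSON string, e.g. '{"get_bos_id":128000}'
    use_lora: int = 0                     # LoRA rank; 0 means no LoRA


# Build a typed ("structured") config from the dataclass, then merge overrides.
base = OmegaConf.structured(DemoBaseConfig)
overrides = OmegaConf.from_dotlist(["checkpoint=llama3.pt", "use_lora=16"])
cfg = OmegaConf.merge(base, overrides)
print(OmegaConf.to_yaml(cfg))  # unknown keys or wrong types would raise at merge time

In the real exporter, several such dataclasses (BaseConfig, ModelConfig, and so on) would presumably be composed into one top-level config and overridden from Hydra YAML or the CLI in the same way.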