
Commit e8db72a

Merge branch 'main' into bump_ao_2262025

2 parents 96afe1f + 8f509e1

File tree: 2 files changed, +2 -3 lines changed


docs/source/backends-xnnpack.md
Lines changed: 0 additions & 1 deletion

@@ -121,4 +121,3 @@ target_link_libraries(
 ```
 
 No additional steps are necessary to use the backend beyond linking the target. Any XNNPACK-delegated .pte file will automatically run on the registered backend.
-
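For context on the paragraph above: backend dispatch for a delegated .pte file happens when the program is loaded, so application code needs no XNNPACK-specific calls. A minimal Python sketch, assuming an ExecuTorch build whose pybindings include the XNNPACK backend; the model path and input shape are hypothetical:

```python
# Minimal sketch: running an XNNPACK-delegated .pte file.
# Assumes the Python runtime was built with the XNNPACK backend
# registered; dispatch to the backend is automatic at load time.
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("model_xnnpack.pte")  # hypothetical path
outputs = module.forward([torch.randn(1, 3, 224, 224)])  # hypothetical input shape
print(outputs[0].shape)
```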

examples/models/llama/source_transformation/quantize.py
Lines changed: 2 additions & 2 deletions

@@ -14,8 +14,6 @@
 import torch.nn as nn
 import torch.nn.functional as F
 
-from executorch.backends.vulkan._passes import VkInt4WeightOnlyQuantizer
-
 from executorch.extension.llm.export.builder import DType
 
 from sentencepiece import SentencePieceProcessor
@@ -180,6 +178,8 @@ def quantize( # noqa C901
         model = gptq_quantizer.quantize(model, inputs)
         return model
     elif qmode == "vulkan_4w":
+        from executorch.backends.vulkan._passes import VkInt4WeightOnlyQuantizer
+
         q_group_size = 256 if group_size is None else group_size
         model = VkInt4WeightOnlyQuantizer(groupsize=q_group_size).quantize(model)
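The net effect of this hunk: the VkInt4WeightOnlyQuantizer import moves from module scope into the `vulkan_4w` branch, so importing quantize.py no longer requires the Vulkan backend to be importable. A minimal sketch of the deferred-import pattern, with simplified stand-in names around the lines taken from the diff:

```python
import torch.nn as nn


def quantize_model(model: nn.Module, qmode: str, group_size: int | None = None) -> nn.Module:
    """Simplified stand-in for the quantize() entry point in the diff."""
    if qmode == "vulkan_4w":
        # Deferred import: only users of the vulkan_4w mode need the
        # Vulkan backend installed; other modes never trigger it.
        from executorch.backends.vulkan._passes import VkInt4WeightOnlyQuantizer

        q_group_size = 256 if group_size is None else group_size
        return VkInt4WeightOnlyQuantizer(groupsize=q_group_size).quantize(model)
    raise ValueError(f"Unsupported qmode: {qmode}")
```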
