Commit 80787b4

Creating an initial Quantization Directory (#863)
* Initial creation of a quantization directory
* Moving qops
* Updating import
* Updating lm_eval version (#865): fixes CI related to EleutherAI/wikitext_document_level changing its requirements away from HF Datasets
* Pinning numpy to under 2.0 (#867)
* Update quant call using llama.cpp (#868): llama.cpp did a BC-breaking refactor (ggml-org/llama.cpp@1c641e6) that broke some of our CI; this updates our CI to match llama.cpp's schema
* Updating torch nightly to pick up AOTI improvements in 128339 (#862)
* Updating torch nightly to pick up AOTI improvements in 128339
* Update the torch version to 2.5
* Updating lm_eval version (#865): fixes CI related to EleutherAI/wikitext_document_level changing its requirements away from HF Datasets
* Pinning numpy to under 2.0 (#867)
1 parent b9559ae commit 80787b4
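
The net effect for code that imports the quantization helpers is a package-qualified import path. A minimal before/after sketch, assuming the torchchat repository root is on `PYTHONPATH` (the names shown are taken from the diffs below):

```python
# Before this commit: quantize.py and qops.py lived at the repository root.
# from quantize import quantize_model
# from qops import LinearInt4

# After this commit: the same modules live under the quantization/ directory.
from quantization.quantize import quantize_model
from quantization.qops import LinearInt4 as WeightOnlyInt4Linear
```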

File tree

6 files changed: +6 additions, −6 deletions

build/builder.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,7 +17,7 @@
 import torch._inductor.config
 
 from config.model_config import resolve_model_config
-from quantize import quantize_model
+from quantization.quantize import quantize_model
 
 from build.model import Transformer
 from build.utils import device_sync, is_cpu_device, is_cuda_or_cpu_device, name_to_dtype
```

build/gguf_loader.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@
 import torch
 
 from gguf import GGUFValueType
-from quantize import pack_scales_and_zeros, WeightOnlyInt4Linear
+from quantization.quantize import pack_scales_and_zeros, WeightOnlyInt4Linear
 
 from build.gguf_util import Q4_0, to_float
 from build.model import ModelArgs, Transformer
```

build/gguf_util.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@
 
 import gguf
 import torch
-from quantize import group_dequantize_tensor_from_qparams
+from quantization.quantize import group_dequantize_tensor_from_qparams
 
 
 def to_float(t: gguf.gguf_reader.ReaderTensor):
```

docs/quantization.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -83,7 +83,7 @@ for valid `bitwidth` and `groupsize` values.
 | linear with GPTQ (asymmetric) | `'{"linear:int4-gptq" : {"groupsize" : <groupsize>}}'`|
 | embedding | `'{"embedding": {"bitwidth": <bitwidth>, "groupsize":<groupsize>}}'` |
 
-See the available quantization schemes [here](https://github.com/pytorch/torchchat/blob/main/quantize.py#L1260-L1266).
+See the available quantization schemes [here](https://github.com/pytorch/torchchat/blob/main/quantization/quantize.py#L1260-L1266).
 
 ## Examples
 We can mix and match weight quantization with embedding quantization.
```
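
As a concrete illustration of the "mix and match" sentence above, a combined configuration could pair the GPTQ linear scheme with embedding quantization. This is only a sketch assembled from the schema rows in the table; the `bitwidth` and `groupsize` values are illustrative placeholders, and how the JSON string is passed on the command line is documented in docs/quantization.md rather than in this diff:

```python
import json

# Illustrative only: combines two scheme entries from the table above.
# Valid bitwidth/groupsize values are listed in docs/quantization.md.
quantize_config = {
    "linear:int4-gptq": {"groupsize": 32},
    "embedding": {"bitwidth": 4, "groupsize": 32},
}

# Serialized form, matching the quoted-JSON style shown in the table.
print(json.dumps(quantize_config))
```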

qops.py renamed to quantization/qops.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -390,7 +390,7 @@ def _check_k(cls, *, k, groupsize=1, inner_k_tiles=1):
     def _prepare_weight_and_scales_and_zeros(
         cls, weight_bf16, groupsize, inner_k_tiles
     ):
-        from quantize import group_quantize_tensor
+        from quantization.quantize import group_quantize_tensor
 
         weight_int32, scales_and_zeros = group_quantize_tensor(
             weight_bf16, n_bit=4, groupsize=groupsize
```
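
For context on what the call above computes, the following is an illustrative group-wise affine quantization routine. It is a self-contained sketch of the general technique, not torchchat's `group_quantize_tensor` (whose exact signature, packing, and rounding choices live in `quantization/quantize.py`):

```python
import torch

def group_quantize_sketch(weight: torch.Tensor, n_bit: int = 4, groupsize: int = 32):
    """Illustrative group-wise affine quantization (not torchchat's implementation).

    Each row of `weight` is split into groups of `groupsize` values; every group
    gets its own scale and zero point, so an outlier only affects its own group.
    """
    assert weight.shape[-1] % groupsize == 0
    w = weight.reshape(-1, groupsize).to(torch.float32)
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    qmax = 2 ** n_bit - 1
    scales = (w_max - w_min).clamp(min=1e-6) / qmax   # per-group scale
    zeros = w_min                                      # per-group zero point
    q = ((w - zeros) / scales).round().clamp(0, qmax).to(torch.int32)
    return q.reshape(weight.shape), scales, zeros

# Quick round-trip check: dequantized weights stay within half a step of the input.
w_bf16 = torch.randn(8, 64)
q, scales, zeros = group_quantize_sketch(w_bf16, n_bit=4, groupsize=32)
w_hat = (q.reshape(-1, 32).float() * scales + zeros).reshape(w_bf16.shape)
print("max abs error:", (w_bf16 - w_hat).abs().max().item())
```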

quantize.py renamed to quantization/quantize.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@
     state_dict_device,
 )
 
-from qops import (
+from quantization.qops import (
     LinearAct8Int4DQ,
     LinearInt4 as WeightOnlyInt4Linear,
     LinearInt8 as WeightOnlyInt8Linear,
```
