
Commit daf1a14

Jack-Khuu authored and malfet committed
Removing GPTQ from all of torchchat (#864)
* Removing GPTQ from all of torchchat
* Updating lm_eval version (#865): fixes CI after EleutherAI/wikitext_document_level changed its requirements away from HF Datasets
* Pinning numpy to under 2.0 (#867)
* Rebase + add back an accidental deletion
* Update quant call using llama.cpp (#868): llama.cpp made a BC-breaking refactor (ggml-org/llama.cpp@1c641e6) that broke some of our CI; this updates our CI to match llama.cpp's schema
* Updating torch nightly to pick up AOTI improvements in 128339 (#862): updates the torch version to 2.5
* Creating an initial quantization directory (#863): initial creation of the directory, moving qops, and updating imports
* Removing all references to HQQ (#869)
1 parent 2ac37e2 · commit daf1a14
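For context, a minimal sketch of the quant CLI schema that survives this commit, based on the int4 invocations in the diff below; the checkpoint path and dtype here are placeholders, and the '{"linear:int4-gptq" : {...}}' config is the variant being removed.

# Hypothetical invocation; the --quant schema is taken from the CI lines below.
CHECKPOINT_PATH=checkpoints/model.pth   # placeholder path
python3 -W ignore generate.py \
    --dtype bfloat16 \
    --quant '{"linear:int4" : {"groupsize": 32}}' \
    --checkpoint-path "$CHECKPOINT_PATH" \
    --temperature 0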

File tree

7 files changed: +2 −766 lines

.ci/scripts/validate.sh

Lines changed: 0 additions & 8 deletions
@@ -99,10 +99,6 @@ function generate_compiled_model_output() {
         .ci/scripts/check_gibberish "$MODEL_DIR/output_eager"
         python3 -W ignore generate.py --dtype ${DTYPE} --compile --quant '{"linear:int4" : {"groupsize": 32}}' --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --device "$TARGET_DEVICE" > "$MODEL_DIR/output_compiled" || exit 1
         .ci/scripts/check_gibberish "$MODEL_DIR/output_compiled"
-        if [ "$TARGET_DEVICE" == "cuda" ]; then
-            python3 -W ignore generate.py --dtype ${DTYPE} --compile --quant '{"linear:int4-gptq" : {"groupsize": 32}}' --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --device "$TARGET_DEVICE" > "$MODEL_DIR/output_compiled" || exit 1
-            .ci/scripts/check_gibberish "$MODEL_DIR/output_compiled"
-        fi
     fi
     done
 }
@@ -181,14 +177,10 @@ function generate_aoti_model_output() {
         python3 -W ignore generate.py --dtype ${DTYPE} --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" > "$MODEL_DIR/output_aoti" || exit 1
         .ci/scripts/check_gibberish "$MODEL_DIR/output_aoti"
     fi
-
     echo "******************************************"
     echo "******** INT4 group-wise quantized *******"
     echo "******************************************"
     if [ "$TARGET_DEVICE" == "cuda" ]; then
-        python3 -W ignore export.py --dtype ${DTYPE} --quant '{"linear:int4-gptq" : {"groupsize": 32}}' --checkpoint-path "$CHECKPOINT_PATH" --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" || exit 1
-        python3 -W ignore generate.py --dtype ${DTYPE} --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" > "$MODEL_DIR/output_aoti" || exit 1
-        .ci/scripts/check_gibberish "$MODEL_DIR/output_aoti"
         if [ "$DTYPE" != "float16" ]; then
             python3 -W ignore export.py --dtype ${DTYPE} --quant '{"linear:int4" : {"groupsize": 32}}' --checkpoint-path "$CHECKPOINT_PATH" --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" || exit 1
             python3 -W ignore generate.py --dtype ${DTYPE} --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" > "$MODEL_DIR/output_aoti" || exit 1
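
The commit message also folds in two dependency changes (#867, #862). A rough sketch of what those pins amount to, assuming a pip-based setup; the requirements files they touch are not part of the diff shown above, and the nightly index URL is an assumption:

# Illustrative only; the actual requirements files may differ.
pip install "numpy<2.0"   # per #867: keep numpy below 2.0
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu   # per #862: torch nightly (2.5-era); URL assumed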
