-
Notifications
You must be signed in to change notification settings - Fork 250
add aoti c/c++ runner to hqq tests; check output for gibberish using spell #824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c193892
be8e226
7300c6d
8378ffc
c670f66
952e001
6d8c790
14794e1
26b71a3
d5feb4f
cdf359c
92b097e
e480bca
aa1774c
3e16ba2
e907196
94e236a
4dee526
4b367ef
1360ef9
2bd5dec
3246d8e
dbb35cd
3bc7218
faae387
7e67742
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
#! /bin/bash | ||
|
||
#!/bin/bash | ||
# Check the spelling of the specified file | ||
cat "$1" | ||
|
||
TMPFILE=/tmp/`basename "$1"`-sequence | ||
|
||
if [ "X$2" == "X--no-extract" ]; then | ||
cp "$1" $TMPFILE | ||
else | ||
# We extract only the sequence output and don't spell check status and performance stats | ||
python3 .ci/scripts/extract-sequence.py "$1" >$TMPFILE | ||
|
||
if [ $? -ne 0 ]; then | ||
echo "Sequence extraction failed. Exiting." | ||
exit 1 | ||
fi | ||
fi | ||
|
||
cat ${TMPFILE} | aspell -a -c | grep '^[\&#]' >/tmp/out.$$ | ||
# Exit with a non-zero status code if there were any spelling errors because: | ||
# * Finding one or more lines with & or # means we found a spelling error, might be gibberish | ||
if [ $? -ne 0 ]; then | ||
echo "No spelling errors found; likely correct operation. Success." | ||
exit 0 | ||
fi | ||
cat /tmp/out.$$ | ||
echo "Spelling errors found; might indicate garbage output. Failing." | ||
exit 1 | ||
Comment on lines
+21
to
+30
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we introduce some sort of a tolerance criteria here (i.e. total number of unknown words) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great idea -- I've had one instance where generate produced an invented non-dictionary work with and 'he calls it "whateversomething".' That being said it only occurred once, so I decided to not overindex on that. Allowing for some rate of output is a good solution to this. Should we proactively do that, or wait till we see if in actual test runs? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
import sys | ||
|
||
def print_until_equals(filename): | ||
output = False | ||
past_output = False | ||
with open(filename, "r") as f: | ||
for line in f: | ||
if line.startswith("-" * 8): | ||
output = True | ||
if output and line.startswith("=" * 8): | ||
if past_output: | ||
print("Double end-of-sequence line") | ||
exit(1) | ||
past_output = True | ||
output = False | ||
if output: | ||
print(line) | ||
|
||
if not past_output: | ||
print("Did find sequence to output") | ||
exit(1) | ||
|
||
|
||
if __name__ == "__main__": | ||
if len(sys.argv) < 2: | ||
print("Usage: python scriptname.py filename") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should have been something like `f"Usage:\n {sys.executable} {sys.argv[0]} filename") There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you! |
||
sys.exit(1) | ||
filename = sys.argv[1] | ||
print_until_equals(filename) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,7 +8,7 @@ on: | |
workflow_dispatch: | ||
|
||
jobs: | ||
test-cuda: | ||
test-hqq: | ||
uses: pytorch/test-infra/.github/workflows/linux_job.yml@main | ||
with: | ||
runner: linux.g5.4xlarge.nvidia.gpu | ||
|
@@ -28,8 +28,11 @@ jobs: | |
echo "::group::Download checkpoints" | ||
# Install requirements | ||
./install_requirements.sh cuda | ||
bash scripts/build_native.sh aoti | ||
pip3 list | ||
python3 -c 'import torch;print(f"torch: {torch.__version__, torch.version.git_version}")' | ||
# needed to check for gibberish | ||
yum install -y aspell aspell-en | ||
echo "::endgroup::" | ||
|
||
echo "::group::Download checkpoints" | ||
|
@@ -42,30 +45,43 @@ jobs: | |
|
||
echo "::group::Run inference" | ||
export MODEL_PATH=checkpoints/stories15M/stories15M.pt | ||
export TOKENIZER_PATH=checkpoints/stories15M/tokenizer.model | ||
export MODEL_NAME=stories15M | ||
export MODEL_DIR=/tmp | ||
|
||
for DTYPE in bfloat16 float16 float32; do | ||
export PROMPT="Once upon a time in a land far away" | ||
|
||
for DEVICE in cpu cuda; do | ||
for DTYPE in bfloat16 float16 float32; do | ||
|
||
python generate.py --dtype ${DTYPE} --device cuda --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_eager | ||
cat ./output_eager | ||
python generate.py --dtype ${DTYPE} --device cuda --compile --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_compiled | ||
cat ./output_compiled | ||
python export.py --dtype ${DTYPE} --device cuda --checkpoint-path ${MODEL_PATH} --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so | ||
python generate.py --dtype ${DTYPE} --device cuda --checkpoint-path ${MODEL_PATH} --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so > ./output_aoti | ||
cat ./output_aoti | ||
python generate.py --dtype ${DTYPE} --device ${DEVICE} --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_eager | ||
.ci/scripts/check_gibberish ./output_eager | ||
python generate.py --dtype ${DTYPE} --device ${DEVICE} --compile --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_compiled | ||
.ci/scripts/check_gibberish ./output_compiled | ||
python export.py --dtype ${DTYPE} --device ${DEVICE} --checkpoint-path ${MODEL_PATH} --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so | ||
python generate.py --dtype ${DTYPE} --device ${DEVICE} --checkpoint-path ${MODEL_PATH} --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so > ./output_aoti | ||
.ci/scripts/check_gibberish ./output_aoti | ||
|
||
./cmake-out/aoti_run ${MODEL_DIR}/${MODEL_NAME}.so -z ${TOKENIZER_PATH} -i "${PROMPT}" > ./output_runner_aoti | ||
cat ./output_runner_aoti | ||
# .ci/scripts/check_gibberish ./output_runner_aoti --no-extract | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why check is skipped there? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. because aoti_runner does not work properly and I can't run proper tests. Would love to add test here, and add aoti runner everywhere where we call |
||
|
||
echo "**********************************************" | ||
echo "******** INT4 HQQ group-wise quantized *******" | ||
echo "**********************************************" | ||
python generate.py --dtype ${DTYPE} --device cuda --quant '{"linear:hqq" : {"groupsize": 32}}' --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_eager | ||
cat ./output_eager | ||
python generate.py --dtype ${DTYPE} --device cuda --compile --quant '{"linear:hqq" : {"groupsize": 32}}' --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_compiled | ||
cat ./output_compiled | ||
python export.py --dtype ${DTYPE} --device cuda --quant '{"linear:hqq" : {"groupsize": 32}}' --checkpoint-path ${MODEL_PATH} --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so | ||
python generate.py --dtype ${DTYPE} --device cuda --checkpoint-path ${MODEL_PATH} --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so > ./output_aoti | ||
cat ./output_aoti | ||
python generate.py --dtype ${DTYPE} --device ${DEVICE} --quant '{"linear:hqq" : {"groupsize": 32}}' --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_eager | ||
.ci/scripts/check_gibberish ./output_eager | ||
python generate.py --dtype ${DTYPE} --device ${DEVICE} --compile --quant '{"linear:hqq" : {"groupsize": 32}}' --checkpoint-path ${MODEL_PATH} --temperature 0 > ./output_compiled | ||
.ci/scripts/check_gibberish ./output_compiled | ||
python export.py --dtype ${DTYPE} --device ${DEVICE} --quant '{"linear:hqq" : {"groupsize": 32}}' --checkpoint-path ${MODEL_PATH} --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so | ||
python generate.py --dtype ${DTYPE} --device ${DEVICE} --checkpoint-path ${MODEL_PATH} --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so > ./output_aoti | ||
.ci/scripts/check_gibberish ./output_aoti | ||
|
||
./cmake-out/aoti_run ${MODEL_DIR}/${MODEL_NAME}.so -z ${TOKENIZER_PATH} -i "${PROMPT}" > ./output_runner_aoti | ||
cat ./output_runner_aoti | ||
# .ci/scripts/check_gibberish ./output_runner_aoti --no-extract | ||
|
||
done | ||
done | ||
|
||
echo "tests complete" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -244,3 +244,4 @@ jobs: | |
echo "tests complete" | ||
echo "*******************************************" | ||
echo "::endgroup::" | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why? |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -441,6 +441,7 @@ def _initialize_model( | |
|
||
model.to(dtype=builder_args.precision) | ||
|
||
print("-----------------------------------------------------------") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this is needed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking for a delimiter for our gibberish (load, quantization and other time....). As far as model output it's as much gibberish as a hallucinating model (in terms of utlliity and connectedness to the model). That being said, there's a point where I strongly believe in dumping performance data -- we had missed regressions when we did not. |
||
return model | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -748,7 +748,7 @@ def callback(x): | |
aggregate_metrics["tokens_per_sec"].append(tokens_sec) | ||
|
||
if jit_compile: | ||
print(f"JIT compilation time (incl runtime): {compilation_time:.2} seconds") | ||
print(f"just-in-time compilation time (incl run time): {compilation_time:.2} seconds") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is fine, but shouldn't be part of the PR, should it? |
||
# Don't continue here.... because we need to report and reset | ||
# continue | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -550,9 +550,9 @@ def quantize(self, module): | |
inner_k_tiles=self.inner_k_tiles, | ||
): | ||
if self.padding_allowed: | ||
print( | ||
f"warning: {name} is padded to satisfy in_features % 1024 == 0" | ||
) | ||
# print( | ||
# f"warning: {name} is padded to satisfy in_features % 1024 == 0" | ||
# ) | ||
Comment on lines
+553
to
+555
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, how this change is related to PR in question? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When you output that, it enters the spell checked sequence, and fails! We are too wordy with debug messages, this is primarily my fault in the quantization logic. |
||
padded_in_features = find_multiple(in_features, 1024) | ||
weight = F.pad( | ||
weight, pad=(0, padded_in_features - in_features) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do it twice?