Multi Pin Bumps across PT/AO/tune/ET #1367

Merged: 42 commits, merged on Dec 14, 2024.

Changes from all commits (42 commits):
bcdfc54
Bump PyTorch pin to 20241111
Jack-Khuu Nov 12, 2024
a976734
bump to 1112
Jack-Khuu Nov 12, 2024
23b4536
Merge branch 'main' into pinbump1111
Jack-Khuu Nov 12, 2024
6328935
Update install_requirements.sh
Jack-Khuu Nov 13, 2024
7aa96d7
Update install_requirements.sh
Jack-Khuu Nov 14, 2024
4a977a5
Merge branch 'main' into pinbump1111
Jack-Khuu Nov 14, 2024
774ebb6
Update checkpoint.py typo
Jack-Khuu Nov 14, 2024
655dc4a
Merge branch 'main' into pinbump1111
Jack-Khuu Nov 15, 2024
a6cb90c
Update install_requirements.sh
Jack-Khuu Nov 18, 2024
8cb415d
Merge branch 'main' into pinbump1111
Jack-Khuu Nov 18, 2024
f9d0a29
Update install_requirements.sh
Jack-Khuu Nov 18, 2024
c3f18c6
Update install_requirements.sh
Jack-Khuu Nov 19, 2024
5b91d46
Merge branch 'main' into pinbump1111
Jack-Khuu Nov 22, 2024
bde427d
Merge branch 'main' into pinbump1111
Jack-Khuu Dec 2, 2024
7647d52
Bump pins, waiting for nvjit fix
Jack-Khuu Dec 2, 2024
bb6ca2a
Update install_requirements.sh
Jack-Khuu Dec 2, 2024
eb00467
bump tune
Jack-Khuu Dec 2, 2024
673f5ab
fix tune major version
Jack-Khuu Dec 2, 2024
da0a26d
Bump AO pin to pick up import fix
Jack-Khuu Dec 2, 2024
2530e71
misc
Jack-Khuu Dec 3, 2024
1ada559
Update linux_job CI to v2
Jack-Khuu Dec 3, 2024
f58c22e
Update install_requirements.sh PT pin to 1202
Jack-Khuu Dec 4, 2024
2ece601
Vision nightly is delayed
Jack-Khuu Dec 4, 2024
565338b
Bump Cuda version; drop PT version to one with vision nightly
Jack-Khuu Dec 5, 2024
7088e79
Bump to 1205 vision nightly
Jack-Khuu Dec 5, 2024
94aa9a8
Vision nightly 1205 needs 1204 torch(?)
Jack-Khuu Dec 5, 2024
6e54cba
Drop PT version to 1126 (friendly vision version), update devtoolset …
Jack-Khuu Dec 6, 2024
a05683d
Test download toolchain instead of binutils
Jack-Khuu Dec 6, 2024
411cf94
Test removing devtoolset
Jack-Khuu Dec 6, 2024
953a42e
Remove dep on devtoolset 11 that doesn't exist on the new machine
Jack-Khuu Dec 6, 2024
6e8bfb1
Bump ET pin
Jack-Khuu Dec 6, 2024
5a80f5f
Merge branch 'main' into pinbump1111
Jack-Khuu Dec 6, 2024
59e00d5
Test nightly with updated vision
Jack-Khuu Dec 6, 2024
d67eb86
Merge branch 'main' into pinbump1111
Jack-Khuu Dec 7, 2024
aae4eb3
Attempt to account for int4wo packing pt#139611
Jack-Khuu Dec 7, 2024
25da485
Naive gguf int4wo attempt
Jack-Khuu Dec 7, 2024
a9fa27e
Update install_requirements.sh to 1210
Jack-Khuu Dec 10, 2024
bdd2356
Merge branch 'main' into pinbump1111
Jack-Khuu Dec 10, 2024
bfe5826
Update install_requirements.sh to 20241213
Jack-Khuu Dec 13, 2024
02dc6a4
Merge branch 'main' into pinbump1111
Jack-Khuu Dec 13, 2024
dbb090f
Update torchvision minor version to 22
Jack-Khuu Dec 13, 2024
9579f18
Merge branch 'main' into pinbump1111
Jack-Khuu Dec 13, 2024
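
For context on what these bumps actually touch: the PyTorch/vision/tune pins live in install/install_requirements.sh as nightly date strings, and the commit titles above track that date from 20241111 up to 20241213, along with a torchvision minor-version bump to 0.22 and the move to CUDA 12.4 wheels. A rough sketch of what such a pin bump amounts to follows; the variable names and version prefixes are illustrative assumptions, not the file's actual contents.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: variable names and version prefixes are assumed,
# not copied from install/install_requirements.sh.

# Nightly date picked up by this PR's final bump
# (commit "Update install_requirements.sh to 20241213").
NIGHTLY_DATE=20241213

# Hypothetical pins derived from the commit titles: torch nightly,
# torchvision 0.22.x nightly ("Update torchvision minor version to 22"),
# torchtune nightly ("bump tune" / "fix tune major version").
TORCH_PIN="torch==2.6.0.dev${NIGHTLY_DATE}"
VISION_PIN="torchvision==0.22.0.dev${NIGHTLY_DATE}"
TUNE_PIN="torchtune==0.5.0.dev${NIGHTLY_DATE}"

# Install from the CUDA 12.4 nightly index, matching the CI bump to 12.4.
pip3 install --pre \
  --extra-index-url "https://download.pytorch.org/whl/nightly/cu124" \
  "${TORCH_PIN}" "${VISION_PIN}" "${TUNE_PIN}"
```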
10 changes: 2 additions & 8 deletions .github/workflows/more-tests.yml
@@ -9,23 +9,17 @@ on:

jobs:
test-cuda:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
Review comment (Contributor):
What's the difference between these two?

Reply (@Jack-Khuu, Contributor Author, Dec 6, 2024):
Former is being replaced in the new wheel build: pytorch/pytorch#123649

with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
uname -a
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"


echo "::group::Download checkpoints"
# Install requirements
./install/install_requirements.sh cuda
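The reviewer question above about linux_job.yml vs. linux_job_v2.yml applies to every workflow file in this diff: each caller switches to the v2 reusable workflow (the v1 workflow is being replaced by the new wheel build, per pytorch/pytorch#123649), bumps gpu-arch-version from "12.1" to "12.4", and, where present, drops the devtoolset-10-binutils/objcopy install step. Below is a minimal sketch of the resulting caller job, reusing the job name and runner from more-tests.yml above; it illustrates the recurring pattern rather than reproducing a literal file from the PR, and the note that the v2 runner image ships a new enough binutils is an inference from the removal, not something stated in the diff.

```yaml
jobs:
  test-cuda:
    # was: pytorch/test-infra/.github/workflows/linux_job.yml@main
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    with:
      runner: linux.g5.4xlarge.nvidia.gpu
      gpu-arch-type: cuda
      gpu-arch-version: "12.4"  # previously "12.1"
      timeout: 60
      script: |
        echo "::group::Install required packages"
        # The devtoolset-10-binutils install (newer objcopy with
        # --set-section-alignment) is no longer run here; the v2 runner image
        # is assumed to provide a recent enough toolchain.
        ./install/install_requirements.sh cuda
        echo "::endgroup::"
```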
4 changes: 2 additions & 2 deletions .github/workflows/periodic.yml
@@ -108,7 +108,7 @@ jobs:
set -eux
PYTHONPATH="${PWD}" python .ci/scripts/gather_test_models.py --event "periodic" --backend "gpu"
test-gpu:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
name: test-gpu (${{ matrix.platform }}, ${{ matrix.model_name }})
needs: gather-models-gpu
secrets: inherit
@@ -119,7 +119,7 @@
secrets-env: "HF_TOKEN_PERIODIC"
runner: ${{ matrix.runner }}
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
script: |
echo "::group::Print machine info"
nvidia-smi
42 changes: 11 additions & 31 deletions .github/workflows/pull.yml
@@ -215,7 +215,7 @@ jobs:
set -eux
PYTHONPATH="${PWD}" python .ci/scripts/gather_test_models.py --event "pull_request" --backend "gpu"
test-gpu-compile:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
name: test-gpu-compile (${{ matrix.platform }}, ${{ matrix.model_name }})
needs: gather-models-gpu
strategy:
@@ -224,7 +224,7 @@
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
script: |
echo "::group::Print machine info"
nvidia-smi
@@ -250,7 +250,7 @@
echo "::endgroup::"

test-gpu-aoti-bfloat16:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
name: test-gpu-aoti-bfloat16 (${{ matrix.platform }}, ${{ matrix.model_name }})
needs: gather-models-gpu
strategy:
@@ -259,18 +259,13 @@
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
nvidia-smi
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Install required packages"
./install/install_requirements.sh cuda
pip3 list
@@ -291,7 +286,7 @@
echo "::endgroup::"

test-gpu-aoti-float32:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
name: test-gpu-aoti-float32 (${{ matrix.platform }}, ${{ matrix.model_name }})
needs: gather-models-gpu
strategy:
@@ -300,17 +295,12 @@
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
script: |
echo "::group::Print machine info"
nvidia-smi
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Install required packages"
./install/install_requirements.sh cuda
pip list
@@ -337,7 +327,7 @@
echo "::endgroup::"

test-gpu-aoti-float16:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
name: test-gpu-aoti-float16 (${{ matrix.platform }}, ${{ matrix.model_name }})
needs: gather-models-gpu
strategy:
@@ -346,17 +336,12 @@
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
script: |
echo "::group::Print machine info"
nvidia-smi
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Install required packages"
./install/install_requirements.sh cuda
pip list
@@ -384,7 +369,7 @@
echo "::endgroup::"

test-gpu-eval-sanity-check:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
name: test-gpu-eval-sanity-check (${{ matrix.platform }}, ${{ matrix.model_name }})
needs: gather-models-gpu
strategy:
@@ -393,17 +378,12 @@
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
script: |
echo "::group::Print machine info"
nvidia-smi
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Install required packages"
./install/install_requirements.sh cuda
pip3 list
@@ -1031,7 +1011,7 @@
echo "Tests complete."

test-build-runner-et-android:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
runner: linux.4xlarge
script: |
27 changes: 6 additions & 21 deletions .github/workflows/run-readme-periodic.yml
@@ -10,24 +10,19 @@ on:

jobs:
test-readme:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
secrets: inherit
with:
runner: linux.g5.4xlarge.nvidia.gpu
secrets-env: "HF_TOKEN_PERIODIC"
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
uname -a
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Create script to run README"
python3 torchchat/utils/scripts/updown.py --create-sections --file README.md > ./run-readme.sh
# for good measure, if something happened to updown processor,
@@ -44,23 +39,18 @@


test-quantization-any:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
runner: linux.g5.4xlarge.nvidia.gpu
secrets: inherit
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
uname -a
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Create script to run quantization"
python3 torchchat/utils/scripts/updown.py --create-sections --file docs/quantization.md > ./run-quantization.sh
# for good measure, if something happened to updown processor,
@@ -76,24 +66,19 @@
echo "::endgroup::"

test-gguf-any:
-uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
secrets: inherit
with:
runner: linux.g5.4xlarge.nvidia.gpu
secrets-env: "HF_TOKEN_PERIODIC"
gpu-arch-type: cuda
-gpu-arch-version: "12.1"
+gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
uname -a
echo "::endgroup::"

echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"

echo "::group::Create script to run gguf"
python3 torchchat/utils/scripts/updown.py --file docs/GGUF.md > ./run-gguf.sh
# for good measure, if something happened to updown processor,