
Remove vLLM dependency for CUDA #2751

Merged: danieldk merged 2 commits into main from maintenance/cuda-remove-vllm-dep on Nov 17, 2024

Conversation

@danieldk (Member) commented on Nov 15, 2024

What does this PR do?

This change adds `attention-kernels` as a dependency for paged attention and cache reshaping. With that, we don't use vLLM anywhere for CUDA.
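
For context, here is a minimal sketch of what the CUDA call sites could look like after the switch. The `attention_kernels` module name and the `reshape_and_cache`/`paged_attention_v1` signatures below are assumptions for illustration, not the verified API touched by this PR.

```
# Illustrative sketch only: assumes the `attention-kernels` wheel exposes an
# `attention_kernels` module with vLLM-compatible kernel entry points.
# Function names and argument order are assumptions, not the exact API.
import attention_kernels  # replaces the previous vLLM import on CUDA


def store_kv(key, value, key_cache, value_cache, slots):
    # Scatter the new key/value tensors into the paged KV cache
    # ("cache reshaping").
    attention_kernels.reshape_and_cache(
        key, value, key_cache, value_cache, slots, "auto", 1.0
    )


def decode_attention(
    out, query, key_cache, value_cache, kv_head_mapping, softmax_scale,
    block_tables, input_lengths, block_size, max_s,
):
    # Decode-time paged attention over the block-structured KV cache.
    attention_kernels.paged_attention_v1(
        out, query, key_cache, value_cache, kv_head_mapping, softmax_scale,
        block_tables, input_lengths, block_size, max_s, None, "auto", 1.0,
    )
```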

Test run (since we don't have paged attention in CI):

```
❯ ATTENTION=paged python -m pytest integration-tests -k "llama and awq" --release
[...]
5 snapshots passed.
```
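
The `ATTENTION=paged` environment variable in the command above selects the paged-attention code path for the test. A rough sketch of that kind of backend selection follows; the default value and the alternative backend names are assumptions, only `"paged"` is confirmed by the command above.

```
# Rough sketch of environment-variable based attention backend selection.
# Assumption: the server reads ATTENTION at startup; the default and the
# other accepted values shown here are illustrative, not confirmed.
import os

attention_backend = os.environ.get("ATTENTION", "flashinfer")
if attention_backend not in {"paged", "flashinfer", "flashdecoding"}:
    raise ValueError(f"Unknown attention backend: {attention_backend}")

use_paged_kernels = attention_backend == "paged"
```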

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@danieldk force-pushed the maintenance/cuda-remove-vllm-dep branch from ec9933d to cea21a7 on November 15, 2024, 12:54
@danieldk mentioned this pull request on Nov 15, 2024
@danieldk force-pushed the maintenance/cuda-remove-vllm-dep branch from cea21a7 to e84bcd3 on November 15, 2024, 12:57
@danieldk force-pushed the maintenance/cuda-remove-vllm-dep branch from e84bcd3 to dfc00f7 on November 15, 2024, 13:07
@danieldk marked this pull request as ready for review on November 15, 2024, 14:53
@Narsil (Collaborator) left a comment

LGTM

@danieldk merged commit 52e4873 into main on Nov 17, 2024 (10 of 12 checks passed)
@danieldk deleted the maintenance/cuda-remove-vllm-dep branch on November 17, 2024, 16:34