Move quantized weight handling out of the `Weights` class #2194

danieldk · 2024-07-05T16:13:41Z

What does this PR do?

Quantized weights were loaded in the Weights class, but this was getting quite unwieldy, where every higher level method to load weights was a long conditional to cover all the different quantizers.

This change moves loading of quantized weights out of the Weights class. This is done by defining a simple WeightsLoader interface that is implemented by Exl2WeightsLoader, GPTQWeightsLoader, and MarlinWeightsLoader. These implementations are in the quantizers' respective modules. The Weights class provides the low-level load operations (such as loading tensors or sharded tensors), but delegates loads that need quantizer-specific weight processing to a loader. The loaders still use the low-level functionality provided by Weights.

I initially tried making a hierarchy where a class like GPTQWeights would inherit from Weights. But it is not very flexible (e.g. does not work well with the new weight storage mock used in tests) and the implicit indirections made the code harder to follow.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Narsil

Nice change.

I expected a net negative amount of lines from this (or neutral) but I guess the extra interface make it a bit more verbose.

Regardless it's a nice change.

I think we can make the interface slightly simpler and remove the need for default checking in the weights load.

server/text_generation_server/utils/weights.py

Quantized weights were loaded in the `Weights` class, but this was getting quite unwieldy, where every higher level method to load weights was a long conditional to cover all the different quantizers. This change moves loading of quantized weights out of the `Weights` class. This is done by defining a simple `WeightsLoader` interface that is implemented by `Exl2WeightsLoader`, `GPTQWeightsLoader`, and `MarlinWeightsLoader`. These implementations are in the quantizers' respective modules. The `Weights` class provides the low-level load operations (such as loading tensors or sharded tensors), but delegates loads that need quantizer-specific weight processing to a loader. The loaders still use the low-level functionality provided by `Weights`. I initially tried making a hierarchy where a class like `GPTQWeights` would inherit from `Weights`. But it is not very flexible (e.g. does not work well with the new weight storage mock used in tests) and the implicit indirections made the code harder to follow.

Narsil

LGTM

Quantized weights were loaded in the `Weights` class, but this was getting quite unwieldy, where every higher level method to load weights was a long conditional to cover all the different quantizers. This change moves loading of quantized weights out of the `Weights` class. This is done by defining a simple `WeightsLoader` interface that is implemented by `Exl2WeightsLoader`, `GPTQWeightsLoader`, and `MarlinWeightsLoader`. These implementations are in the quantizers' respective modules. The `Weights` class provides the low-level load operations (such as loading tensors or sharded tensors), but delegates loads that need quantizer-specific weight processing to a loader. The loaders still use the low-level functionality provided by `Weights`. I initially tried making a hierarchy where a class like `GPTQWeights` would inherit from `Weights`. But it is not very flexible (e.g. does not work well with the new weight storage mock used in tests) and the implicit indirections made the code harder to follow.

danieldk force-pushed the refactor/quantizer-weights branch 3 times, most recently from a93a398 to eca19d4 Compare July 8, 2024 11:26

Narsil reviewed Jul 8, 2024

View reviewed changes

danieldk force-pushed the refactor/quantizer-weights branch 2 times, most recently from 397657e to 8aaacc3 Compare July 9, 2024 07:37

danieldk marked this pull request as ready for review July 9, 2024 07:39

danieldk requested a review from Narsil July 9, 2024 07:39

danieldk force-pushed the refactor/quantizer-weights branch from 8aaacc3 to 2f19a24 Compare July 9, 2024 15:09

danieldk force-pushed the refactor/quantizer-weights branch from 2f19a24 to 800ee99 Compare July 9, 2024 15:39

Narsil approved these changes Jul 9, 2024

View reviewed changes

danieldk merged commit 8511669 into main Jul 9, 2024
14 of 15 checks passed

danieldk deleted the refactor/quantizer-weights branch July 9, 2024 18:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Move quantized weight handling out of the `Weights` class #2194

Move quantized weight handling out of the `Weights` class #2194

Uh oh!

danieldk commented Jul 5, 2024 •

edited

Loading

Uh oh!

Narsil left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Narsil left a comment

Uh oh!

Uh oh!

Uh oh!

Move quantized weight handling out of the Weights class #2194

Move quantized weight handling out of the Weights class #2194

Uh oh!

Conversation

danieldk commented Jul 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Before submitting

Who can review?

Uh oh!

Narsil left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Narsil left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Move quantized weight handling out of the `Weights` class #2194

Move quantized weight handling out of the `Weights` class #2194

danieldk commented Jul 5, 2024 •

edited

Loading