Skip to content

Move quantized weight handling out of the Weights class #2194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

danieldk
Copy link
Member

@danieldk danieldk commented Jul 5, 2024

What does this PR do?

Quantized weights were loaded in the Weights class, but this was getting quite unwieldy, where every higher level method to load weights was a long conditional to cover all the different quantizers.

This change moves loading of quantized weights out of the Weights class. This is done by defining a simple WeightsLoader interface that is implemented by Exl2WeightsLoader, GPTQWeightsLoader, and MarlinWeightsLoader. These implementations are in the quantizers' respective modules. The Weights class provides the low-level load operations (such as loading tensors or sharded tensors), but delegates loads that need quantizer-specific weight processing to a loader. The loaders still use the low-level functionality provided by Weights.

I initially tried making a hierarchy where a class like GPTQWeights would inherit from Weights. But it is not very flexible (e.g. does not work well with the new weight storage mock used in tests) and the implicit indirections made the code harder to follow.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@danieldk danieldk force-pushed the refactor/quantizer-weights branch 3 times, most recently from a93a398 to eca19d4 Compare July 8, 2024 11:26
Copy link
Collaborator

@Narsil Narsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change.

I expected a net negative amount of lines from this (or neutral) but I guess the extra interface make it a bit more verbose.

Regardless it's a nice change.

I think we can make the interface slightly simpler and remove the need for default checking in the weights load.

@danieldk danieldk force-pushed the refactor/quantizer-weights branch 2 times, most recently from 397657e to 8aaacc3 Compare July 9, 2024 07:37
@danieldk danieldk marked this pull request as ready for review July 9, 2024 07:39
@danieldk danieldk requested a review from Narsil July 9, 2024 07:39
@danieldk danieldk force-pushed the refactor/quantizer-weights branch from 8aaacc3 to 2f19a24 Compare July 9, 2024 15:09
Quantized weights were loaded in the `Weights` class, but this was
getting quite unwieldy, where every higher level method to load weights
was a long conditional to cover all the different quantizers.

This change moves loading of quantized weights out of the `Weights`
class. This is done by defining a simple `WeightsLoader` interface
that is implemented by `Exl2WeightsLoader`, `GPTQWeightsLoader`,
and `MarlinWeightsLoader`. These implementations are in the quantizers'
respective modules. The `Weights` class provides the low-level load
operations (such as loading tensors or sharded tensors), but delegates
loads that need quantizer-specific weight processing to a loader. The
loaders still use the low-level functionality provided by `Weights`.

I initially tried making a hierarchy where a class like `GPTQWeights`
would inherit from `Weights`. But it is not very flexible (e.g. does
not work well with the new weight storage mock used in tests) and
the implicit indirections made the code harder to follow.
@danieldk danieldk force-pushed the refactor/quantizer-weights branch from 2f19a24 to 800ee99 Compare July 9, 2024 15:39
Copy link
Collaborator

@Narsil Narsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danieldk danieldk merged commit 8511669 into main Jul 9, 2024
14 of 15 checks passed
@danieldk danieldk deleted the refactor/quantizer-weights branch July 9, 2024 18:04
ErikKaum pushed a commit that referenced this pull request Jul 26, 2024
Quantized weights were loaded in the `Weights` class, but this was
getting quite unwieldy, where every higher level method to load weights
was a long conditional to cover all the different quantizers.

This change moves loading of quantized weights out of the `Weights`
class. This is done by defining a simple `WeightsLoader` interface
that is implemented by `Exl2WeightsLoader`, `GPTQWeightsLoader`,
and `MarlinWeightsLoader`. These implementations are in the quantizers'
respective modules. The `Weights` class provides the low-level load
operations (such as loading tensors or sharded tensors), but delegates
loads that need quantizer-specific weight processing to a loader. The
loaders still use the low-level functionality provided by `Weights`.

I initially tried making a hierarchy where a class like `GPTQWeights`
would inherit from `Weights`. But it is not very flexible (e.g. does
not work well with the new weight storage mock used in tests) and
the implicit indirections made the code harder to follow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants