do weight transform on cpu #508
Conversation
Branch force-pushed from 8e60a72 to 59e6275
It works...
Branch force-pushed from 59e6275 to fe3e608
```diff
@@ -856,8 +870,8 @@ def create_quantized_state_dict(self):
                 weight.to(torch.float), self.groupsize, self.inner_k_tiles
             )
         )
-        weight_int4pack = weight_int4pack.to(device=self.device)
-        scales_and_zeros = scales_and_zeros.to(device=self.device)
+        weight_int4pack = weight_int4pack.to(device=dict_device)
```
CPU-packed weights and CUDA-packed weights call different int4mm kernels, and weights prepared on one device may not be compatible with the other (they give wrong results), as recently discovered by @HDCharles; right now this is a silent error. Have we done any accuracy evaluation of this change (on CUDA)?
We have not done this, but @malfet and I have discussed it, and Intel had previously promised us an unpack routine, so we would be able to unpack() the different formats. BTW, we need an unpack for the GPU packing format as well.
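For illustration, here is a minimal sketch of the kind of cross-device consistency check being discussed, assuming a PyTorch build where torch.ops.aten._convert_weight_to_int4pack and torch.ops.aten._weight_int4pack_mm have both CPU and CUDA implementations; the expected input dtypes for these ops have changed across releases, and the helper name below is made up for the example, so treat this as a sketch rather than a drop-in test:

```python
import torch

def int4_mm_agrees_across_pack_devices(n=64, k=128, groupsize=32, inner_k_tiles=8):
    # Fake already-quantized weight values in [0, 16); some releases expect
    # uint8 here instead of int32.
    weight_q = torch.randint(0, 16, (n, k), dtype=torch.int32)
    # Dummy scales/zeros in the (k // groupsize, n, 2) bfloat16 layout used by
    # the int4 kernels; the values only need to match between the two paths.
    scales_and_zeros = torch.ones(k // groupsize, n, 2, dtype=torch.bfloat16)
    x = torch.randn(4, k, dtype=torch.bfloat16)

    # Pack on CPU and move the packed tensor to CUDA ...
    packed_from_cpu = torch.ops.aten._convert_weight_to_int4pack(weight_q, inner_k_tiles).cuda()
    # ... versus packing directly on CUDA.
    packed_on_cuda = torch.ops.aten._convert_weight_to_int4pack(weight_q.cuda(), inner_k_tiles)

    y_cpu_pack = torch.ops.aten._weight_int4pack_mm(
        x.cuda(), packed_from_cpu, groupsize, scales_and_zeros.cuda()
    )
    y_cuda_pack = torch.ops.aten._weight_int4pack_mm(
        x.cuda(), packed_on_cuda, groupsize, scales_and_zeros.cuda()
    )
    # If the CPU and CUDA packing layouts are incompatible, this returns False.
    return torch.allclose(y_cpu_pack, y_cuda_pack)
```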
To avoid OOM situations, do the weight transform on CPU.
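As an illustration of the pattern (not the exact torchchat code), here is a minimal sketch; `pack_fn` stands in for whatever int4 packing helper the quantizer uses, and `dict_device` is the device the finished state dict should end up on, as in the diff above:

```python
import torch

def pack_on_cpu_then_move(weight, pack_fn, groupsize, inner_k_tiles, dict_device):
    # Run the memory-hungry packing step on CPU so the full-precision weight
    # and intermediates never have to live on the accelerator ...
    weight_int4pack, scales_and_zeros = pack_fn(
        weight.to(torch.float).cpu(), groupsize, inner_k_tiles
    )
    # ... then move only the much smaller packed tensors to the target device.
    return (
        weight_int4pack.to(device=dict_device),
        scales_and_zeros.to(device=dict_device),
    )
```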
load_state_dict does not have a map_location argument like torch.load() does.
However, a quick check on mps suggested that when the state dict is loaded back into a model, the tensors are placed on the device of the model's pre-existing parameters. If this does not work as expected in some scenarios, it should be quick to write a new version of load_state_dict that takes a map_location, or to amend the API of the existing function.
We might also avoid instantiating the state dict altogether; this pattern is inherited from gpt-fast to enable sharing of quantization algorithms between torchchat and gpt-fast.
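For reference, a minimal sketch of the loading path described above; the helper name is hypothetical, but torch.load already accepts map_location, and nn.Module.load_state_dict then copies each tensor into the module's pre-existing parameters and buffers, so the data ends up on whatever device the model already lives on:

```python
import torch

def load_checkpoint_with_map_location(model, checkpoint_path, map_location="cpu"):
    # Materialize the checkpoint on CPU first to avoid a full-size staging
    # copy on the accelerator.
    state_dict = torch.load(checkpoint_path, map_location=map_location)
    # load_state_dict copies into the model's existing (possibly CUDA/MPS)
    # parameters, so no explicit per-tensor .to(device) is needed here.
    model.load_state_dict(state_dict)
    return model
```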