
Use _weight_int8pack_mm for CPU + eager #472


Merged — 2 commits merged into main on Apr 25, 2024

Conversation

@malfet (Contributor) commented Apr 24, 2024

Restrict use of the op to eager mode + torch > 2.3 (i.e., nightlies only) and CPU.

This improves the perf for

% python3 torchchat.py generate --dtype float16 --device cpu --quant '{"linear:int8" : {}}' --checkpoint-path checkpoints/stories110M/model.pth

from 21 to 54 tokens/sec on M1 Max
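The gating described above (CPU device, eager mode, torch > 2.3) could be sketched as a small predicate like the one below. This is an illustrative sketch, not torchchat's actual implementation: the helper name `use_int8pack_mm` and its signature are hypothetical; only the gating conditions come from the PR description.

```python
def use_int8pack_mm(torch_version: str, device: str, compiling: bool) -> bool:
    """Hypothetical gate: dispatch to _weight_int8pack_mm only on CPU,
    in eager mode, and on PyTorch newer than 2.3 (i.e., nightlies)."""
    # Strip local build suffixes like "+cpu", then take (major, minor),
    # e.g. "2.4.0.dev20240424+cpu" -> (2, 4).
    major, minor = (int(x) for x in torch_version.split("+")[0].split(".")[:2])
    return device == "cpu" and not compiling and (major, minor) > (2, 3)
```

With a gate like this, the quantized-linear path falls back to the plain dequantize-then-matmul implementation whenever any condition fails, so released PyTorch 2.3/2.2 builds keep working.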

@malfet malfet requested a review from mikekgfb April 24, 2024 23:49
@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Apr 24, 2024
@mikekgfb (Contributor)

Is that because we are running with an old PyTorch on macOS 12?
(If so, we need to suppress this test on macos-12/x86 for now, but not on macos-14.) cc: @guangy10

https://github.com/pytorch/torchchat/actions/runs/8824786469/job/24227882131?pr=472

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_ops.py", line 854, in __call__
    return self_._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: _weight_int8pack_mm_cpu : expect A to be bfloat16 tensor.
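The failure above is from an older PyTorch build whose `_weight_int8pack_mm_cpu` kernel only accepted bfloat16 activations. The suppression mikekgfb suggests could look roughly like the version-conditional skip below; the test class and method names are illustrative, not the repo's actual tests.

```python
import unittest


def torch_newer_than_2_3(version: str) -> bool:
    """True for PyTorch builds newer than 2.3 (nightlies), where
    _weight_int8pack_mm accepts float16/float32 activations on CPU."""
    major, minor = (int(x) for x in version.split("+")[0].split(".")[:2])
    return (major, minor) > (2, 3)


class TestInt8Linear(unittest.TestCase):
    # Hypothetical skip guard; in practice the version string would come
    # from torch.__version__ on the CI runner.
    @unittest.skipUnless(
        torch_newer_than_2_3("2.4.0.dev20240425"),
        "requires PyTorch nightly (> 2.3); older CPU kernel is bfloat16-only",
    )
    def test_generate_linear_int8(self):
        ...  # run the generate command under test
```

This keeps the test running on macos-14 (which uses a nightly) while skipping it on the macos-12/x86 runner pinned to an older release.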

@malfet (Contributor, Author) commented Apr 25, 2024

@mikekgfb Sure, I can spend some cycles doing it, but I wonder what the goal is. Does torchchat claim to support PyTorch 2.2?

@mikekgfb mikekgfb merged commit e3c7007 into main Apr 25, 2024
@mikekgfb mikekgfb deleted the malfet/use-int8mm-on-cpu-eager branch April 25, 2024 02:58
malfet added a commit that referenced this pull request Jul 17, 2024
* Use _weight_int8pack_mm for CPU + eager

* Skip for older PyTorch versions