---
title: "High-Performance Low-Bit Operators for PyTorch"
author: Scott Roy, Digant Desai, Kimish Patel
---
We are excited to announce the addition of embedding operators with low-bit weights (1-8 bit) and linear operators with 8-bit dynamically quantized activations and low-bit weights (1-8 bit) for Arm CPUs in TorchAO, PyTorch’s native low-precision library. These operators work seamlessly across all PyTorch surfaces, including eager, torch.compile, AOTI, and ExecuTorch, and are [available to use in torchchat](https://github.com/pytorch/torchchat/blob/main/docs/quantization.md#experimental-torchao-lowbit-kernels).
In developing these linear operators, our focus was on **code sharing between PyTorch and ExecuTorch**, and establishing a clear boundary between the higher-level operator and the lower-level kernel. This design **allows third-party vendors to easily swap in their own kernels**. We also set out to **create a place and infrastructure to experiment** with new CPU quantization ideas and test those across the PyTorch ecosystem.
## Universal low-bit kernels
CPUs have no native hardware support for low-bit arithmetic. In what we call universal kernels, we explicitly separated the logic that unpacks low-bit values to int8 values from the int8 GEMV kernel logic, in a modular fashion. We started with an 8-bit kernel, for example, this [1x8 8-bit GEMV kernel](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/linear/channelwise_8bit_activation_groupwise_lowbit_weight_1x8x16_f32_neondot-impl.h#L64) that uses the Arm neondot instruction. Within the 8-bit kernel, we invoke an [inlined unpacking routine](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/linear/channelwise_8bit_activation_groupwise_lowbit_weight_1x8x16_f32_neondot-impl.h#L169) to convert low-bit values into int8 values. This unpacking routine is force-inlined and templated on the low-bit width. Our experiments showed no performance difference between using a separate force-inlined unpacking routine and directly embedding the unpacking code inline.
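To make the separation concrete, here is a scalar sketch of the idea (not the actual NEON kernels, and the names are illustrative, not the TorchAO API): a standalone routine unpacks low-bit weights to int8, and the arithmetic below it only ever sees int8 values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unpack two unsigned 4-bit values per byte into int8, recentering
// [0, 15] to [-8, 7]. In the real kernels this is force-inlined and
// templated on the bit width.
inline void unpack_uint4_to_int8(int8_t* out, const uint8_t* packed, size_t n) {
  for (size_t i = 0; i < n / 2; i++) {
    out[2 * i]     = static_cast<int8_t>(packed[i] & 0x0F) - 8;
    out[2 * i + 1] = static_cast<int8_t>(packed[i] >> 4) - 8;
  }
}

// Plain int8 dot product: once weights are unpacked to int8, the same
// arithmetic serves every bit width from 1 to 8.
inline int32_t dot_int8(const int8_t* a, const int8_t* b, size_t n) {
  int32_t acc = 0;
  for (size_t i = 0; i < n; i++) acc += static_cast<int32_t>(a[i]) * b[i];
  return acc;
}
```

In the real kernels the unpacking feeds NEON registers and the dot product uses the neondot instruction, but the division of labor is the same.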
The advantage of this modular design is improved development speed and code maintainability. After writing an 8-bit kernel, we quickly achieved full low-bit coverage by writing [simple bitpacking routines](https://github.com/pytorch/ao/tree/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/bitpacking). In fact, the developers who worked on the bitpacking routines did not need to be experts in GEMV/GEMM kernel writing. We also reused the same bitpacking routines from the linear kernels [within the embedding kernels](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/embedding/embedding.h#L161). In the future, we could reuse the same bitpacking routines in universal GEMM kernels or in kernels based on fma or i8mm instructions.
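A bitpacking routine is a small, self-contained pack/unpack pair. As a hedged illustration (2-bit values, scalar code, illustrative names rather than the TorchAO routines), the entire contract is that unpack inverts pack:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack four unsigned 2-bit values into each byte.
inline void pack_uint2(uint8_t* packed, const uint8_t* vals, size_t n) {
  for (size_t i = 0; i < n / 4; i++) {
    packed[i] = static_cast<uint8_t>((vals[4 * i] & 3)
              | ((vals[4 * i + 1] & 3) << 2)
              | ((vals[4 * i + 2] & 3) << 4)
              | ((vals[4 * i + 3] & 3) << 6));
  }
}

// Recover the four 2-bit values from each byte.
inline void unpack_uint2(uint8_t* vals, const uint8_t* packed, size_t n) {
  for (size_t i = 0; i < n / 4; i++) {
    vals[4 * i]     = packed[i] & 3;
    vals[4 * i + 1] = (packed[i] >> 2) & 3;
    vals[4 * i + 2] = (packed[i] >> 4) & 3;
    vals[4 * i + 3] = (packed[i] >> 6) & 3;
  }
}
```

Because the contract is this simple, the same routines can be shared by linear kernels, embedding kernels, and any future GEMM variants.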
## Shared code between PyTorch and ExecuTorch
To achieve shared code between PyTorch and ExecuTorch, we wrote kernels [using raw pointers instead of PyTorch tensors](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/linear/linear.h). Moreover, we implemented the [linear operator in a header](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/op_linear_8bit_act_xbit_weight-impl.h#L259) that is included in separate [PyTorch](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/op_linear_8bit_act_xbit_weight_aten.cpp) and [ExecuTorch](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/op_linear_8bit_act_xbit_weight_executorch/w4s.cpp) operator registration code. By using only features common to both ATen and ExecuTorch tensors, we ensured compatibility between the two frameworks. For multi-threaded compute, we introduced [torchao::parallel_1d](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/parallel.h#L13), which compiles to either [at::parallel_for](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/parallel-aten-impl.h) or [ExecuTorch’s threadpool](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/parallel-executorch-impl.h) based on compile-time flags.
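The compile-time dispatch pattern can be sketched as follows. This is a simplified illustration, not the actual torchao header; the `BUILD_WITH_ATEN` flag and serial fallback here are stand-ins for the real build flags and the ExecuTorch threadpool branch.

```cpp
#include <cassert>
#include <cstdint>

// One header, two runtimes: the flag selects the backend at build time,
// so the operator code above it never changes.
template <typename F>
void parallel_1d(int64_t begin, int64_t end, const F& f) {
#if defined(BUILD_WITH_ATEN)
  // In an ATen build, this branch would delegate to at::parallel_for.
  at::parallel_for(begin, end, 1, [&](int64_t b, int64_t e) {
    for (int64_t i = b; i < e; i++) f(i);
  });
#else
  // Serial fallback, standing in for the ExecuTorch threadpool branch.
  for (int64_t i = begin; i < end; i++) f(i);
#endif
}
```

The caller just writes `parallel_1d(0, n, [&](int64_t i) { ... });` and links against whichever runtime the build provides.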
## Swappable kernels
Our design for the higher-level multi-threaded linear operator is agnostic to the lower-level single-threaded kernels, allowing third-party vendors to swap in their own implementations. The interface between the operator and kernel is defined by a [ukernel config](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/linear_8bit_act_xbit_weight.h#L14), which specifies kernel function pointers for preparing activation data, preparing weight data, and running the kernel. The operator, responsible for tiling and scheduling, interacts with kernels solely through this config.
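In spirit, the boundary looks like the sketch below: a struct of function pointers that the operator calls through, so a vendor kernel slots in without touching the operator. The struct fields and the reference kernel are illustrative, not the exact TorchAO ukernel config.

```cpp
#include <cassert>

// Simplified stand-in for the ukernel config: the operator only ever
// sees these function pointers.
struct UKernelConfig {
  void (*kernel)(float* out, const float* a, const float* w,
                 int m, int n, int k);
};

// A trivial reference kernel a vendor might register: row-major A (m x k)
// times row-major W (n x k, i.e. transposed weights).
void naive_kernel(float* out, const float* a, const float* w,
                  int m, int n, int k) {
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      float acc = 0.f;
      for (int p = 0; p < k; p++) acc += a[i * k + p] * w[j * k + p];
      out[i * n + j] = acc;
    }
  }
}

// The operator (which in TorchAO also handles tiling and scheduling)
// interacts with kernels solely through the config.
void run_linear(const UKernelConfig& cfg, float* out, const float* a,
                const float* w, int m, int n, int k) {
  cfg.kernel(out, a, w, m, n, k);
}
```

Swapping kernels is then just constructing a different `UKernelConfig`; the real config also carries pointers for preparing (packing) activation and weight data.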
## Performance
We measured Llama3.1 8B token generation performance [using torchchat](https://github.com/pytorch/torchchat) with 6 CPU threads on an M1 MacBook Pro (8 performance cores, 2 efficiency cores, 32GB of RAM). In each test, 128 tokens were generated (max-seq-length 128). For each bit width x, the embedding layer was groupwise quantized to x bits with group size 32. In the linear layers, activations were dynamically quantized per token to 8 bits, and weights were groupwise quantized to x bits with group size 256. Our focus here is performance, and we do not report accuracy or perplexity numbers. Depending on the model, lower bit widths may require quantization-aware training, quantizing the model with a mixture of bit widths, or adjusting the group sizes to reach acceptable accuracy.
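For readers unfamiliar with the groupwise scheme above, here is a minimal sketch of symmetric groupwise weight quantization (one scale per group of `group_size` weights); this is a simplification for illustration, not the actual TorchAO quantizer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Symmetric x-bit groupwise quantization: each group of `group_size`
// weights shares one scale, chosen so the largest-magnitude weight in
// the group maps to the largest x-bit value.
void quantize_groupwise(int8_t* q, float* scales, const float* w,
                        int n, int group_size, int bits) {
  const int qmax = (1 << (bits - 1)) - 1;  // e.g. 7 for 4-bit
  for (int g = 0; g < n / group_size; g++) {
    float absmax = 0.f;
    for (int i = 0; i < group_size; i++)
      absmax = std::max(absmax, std::fabs(w[g * group_size + i]));
    const float scale = absmax > 0.f ? absmax / qmax : 1.f;
    scales[g] = scale;
    for (int i = 0; i < group_size; i++) {
      int v = static_cast<int>(std::lround(w[g * group_size + i] / scale));
      v = std::min(std::max(v, -qmax - 1), qmax);  // clamp to x-bit range
      q[g * group_size + i] = static_cast<int8_t>(v);
    }
  }
}
```

Smaller group sizes (like 32 for embeddings) track local weight statistics more closely at the cost of more scale metadata; larger group sizes (like 256 for linear weights) trade a little accuracy for less overhead.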
If you want to see the new low-bit kernels in action, give them a try by [setting up torchchat](https://github.com/pytorch/torchchat/tree/main) and [quantizing and running an LLM locally using the kernels](https://github.com/pytorch/torchchat/blob/main/docs/quantization.md#experimental-torchao-lowbit-kernels).
If you want to help contribute, consider adding support for one of the following areas:
* [Add universal low-bit GEMM kernels](https://github.com/pytorch/ao/issues/1394) for Arm CPU, reusing the same bitpacking routines from the universal GEMV kernels.
* [Improve runtime selection](https://github.com/pytorch/ao/issues/1376) of ukernel configs based on ISA, packing format, and activation shape.
* Add low-bit kernels for other CPU ISAs like x86.
* Integrate third-party libraries like [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) with the operator framework.