
Commit 010a578

Merge branch 'site' into patch-1

2 parents: 1ca7573 + f6fbed2

10 files changed: +398 −30 lines

.github/workflows/update-quick-start-module.yml

Lines changed: 2 additions & 0 deletions
@@ -66,6 +66,8 @@ jobs:
     runs-on: "ubuntu-20.04"
     environment: pytorchbot-env
     steps:
+      - name: Checkout pytorch.github.io
+        uses: actions/checkout@v2
       - name: Setup Python
         uses: actions/setup-python@v2
         with:
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
---
layout: blog_detail
title: "High-Performance Low-Bit Operators for PyTorch"
author: Scott Roy, Digant Desai, Kimish Patel
---

We are excited to announce the addition of embedding operators with low-bit weights (1-8 bit) and linear operators with 8-bit dynamically quantized activations and low-bit weights (1-8 bit) for Arm CPUs in TorchAO, PyTorch’s native low-precision library. These operators work seamlessly across all PyTorch surfaces, including eager, torch.compile, AOTI, and ExecuTorch, and are [available to use in torchchat](https://github.com/pytorch/torchchat/blob/main/docs/quantization.md#experimental-torchao-lowbit-kernels).
In developing these linear operators, our focus was on **code sharing between PyTorch and ExecuTorch**, and establishing a clear boundary between the higher-level operator and the lower-level kernel. This design **allows third-party vendors to easily swap in their own kernels**. We also set out to **create a place and infrastructure to experiment** with new CPU quantization ideas and test those across the PyTorch ecosystem.
## Universal low-bit kernels
There is no hardware support for low-bit arithmetic on CPUs. In what we call universal kernels, we explicitly separated the logic that unpacks low-bit values into int8 values from the int8 GEMV kernel logic, in a modular fashion. We started with an 8-bit kernel, for example, this [1x8 8-bit GEMV kernel](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/linear/channelwise_8bit_activation_groupwise_lowbit_weight_1x8x16_f32_neondot-impl.h#L64) that uses the Arm neondot instruction. Within the 8-bit kernel, we invoke an [inlined unpacking routine](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/linear/channelwise_8bit_activation_groupwise_lowbit_weight_1x8x16_f32_neondot-impl.h#L169) to convert low-bit values into int8 values. This unpacking routine is force-inlined and templated on the bit width. Our experiments showed no performance difference between using a separate force-inlined unpacking routine and directly embedding the unpacking code inline.
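
To make the division of labor concrete, here is a minimal scalar sketch of the pattern. This is our illustration, not TorchAO's actual code: the real kernels operate on Arm NEON registers with neondot and use different packing layouts, and every name below is hypothetical.

```cpp
// Scalar sketch of a "universal kernel": a force-inlined unpack routine,
// templated on the weight bit width, feeds a plain int8 dot-product loop.
// The kernel body only sees the bit width through the unpack call, so
// covering a new bit width needs only a new (un)packing routine.
#include <cstddef>
#include <cstdint>

// Unpack `count` x-bit codes (packed LSB-first) into signed int8 values.
// Assumes the packed buffer has at least one trailing padding byte.
template <int bit_width>
__attribute__((always_inline)) inline void unpack_lowbit_to_int8(
    int8_t* out, const uint8_t* packed, size_t count) {
  static_assert(bit_width >= 1 && bit_width <= 8, "1-8 bits only");
  constexpr uint32_t mask = (1u << bit_width) - 1;
  size_t bit_pos = 0;
  for (size_t i = 0; i < count; ++i) {
    const size_t byte = bit_pos >> 3;
    const size_t shift = bit_pos & 7;
    // Read 16 bits so a code straddling a byte boundary is handled.
    const uint32_t window = packed[byte] | (uint32_t(packed[byte + 1]) << 8);
    // Recenter the unsigned x-bit code to a signed value.
    const int v = int((window >> shift) & mask) - (1 << (bit_width - 1));
    out[i] = static_cast<int8_t>(v);
    bit_pos += bit_width;
  }
}

// int8 dot product of one GEMV row against k packed x-bit weights.
template <int bit_width>
int32_t dot_lowbit_row(const int8_t* activations,
                       const uint8_t* packed_weights, size_t k) {
  int8_t w[64];  // unpack in tiles; 64 * bit_width bits is byte-aligned
  int32_t acc = 0;
  for (size_t i = 0; i < k; i += 64) {
    const size_t tile = (k - i < 64) ? (k - i) : 64;
    unpack_lowbit_to_int8<bit_width>(w, packed_weights + (i * bit_width) / 8,
                                     tile);
    for (size_t j = 0; j < tile; ++j) {
      acc += int32_t(activations[i + j]) * int32_t(w[j]);
    }
  }
  return acc;
}
```

In the real kernels the unpack step produces NEON vectors rather than a stack buffer and the accumulation uses neondot, but the separation between unpacking logic and int8 kernel logic is the same.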
The advantage of this modular design is improved development speed and code maintainability. After writing an 8-bit kernel, we quickly achieved full low-bit coverage by writing [simple bitpacking routines](https://github.com/pytorch/ao/tree/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/bitpacking). In fact, the developers who worked on the bitpacking routines did not need to be experts in GEMV/GEMM kernel writing. We also reused the same bitpacking routines from the linear kernels [within the embedding kernels](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/embedding/embedding.h#L161). In the future, we could reuse the same bitpacking routines for universal GEMM kernels, or for kernels based on fma or i8mm instructions.
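
As a companion to the unpack sketch above, a hypothetical packing routine illustrates how little kernel expertise such a routine demands (again our illustration; TorchAO's actual bitpacking routines operate on NEON vectors):

```cpp
#include <cstddef>
#include <cstdint>

// Write `count` signed int8 values as x-bit codes, LSB-first, into a
// zero-initialized byte buffer. The inverse of unpack_lowbit_to_int8 above.
template <int bit_width>
inline void pack_int8_to_lowbit(uint8_t* packed, const int8_t* values,
                                size_t count) {
  static_assert(bit_width >= 1 && bit_width <= 8, "1-8 bits only");
  constexpr uint32_t mask = (1u << bit_width) - 1;
  size_t bit_pos = 0;
  for (size_t i = 0; i < count; ++i) {
    // Recenter the signed value to an unsigned x-bit code.
    const uint32_t code =
        uint32_t(int32_t(values[i]) + (1 << (bit_width - 1))) & mask;
    const size_t byte = bit_pos >> 3;
    const size_t shift = bit_pos & 7;
    packed[byte] |= uint8_t(code << shift);
    if (shift + bit_width > 8) {
      // Spill the high bits of a code that straddles a byte boundary.
      packed[byte + 1] |= uint8_t(code >> (8 - shift));
    }
    bit_pos += bit_width;
  }
}
```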
## Shared code between PyTorch and ExecuTorch
To achieve shared code between PyTorch and ExecuTorch, we wrote kernels [using raw pointers instead of PyTorch tensors](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/kernels/cpu/aarch64/linear/linear.h). Moreover, we implemented the [linear operator in a header](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/op_linear_8bit_act_xbit_weight-impl.h#L259) that is included in separate [PyTorch](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/op_linear_8bit_act_xbit_weight_aten.cpp) and [ExecuTorch](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/op_linear_8bit_act_xbit_weight_executorch/w4s.cpp) operator registration code. By using only features common to both ATen and ExecuTorch tensors, we ensured compatibility between the two frameworks. For multi-threaded compute, we introduced [torchao::parallel_1d](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/parallel.h#L13), which compiles to either [at::parallel_for](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/parallel-aten-impl.h) or [ExecuTorch's threadpool](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/parallel-executorch-impl.h), based on compile-time flags.
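
The sketch below conveys the idea under a simplified signature of our own (the real interface lives in `torchao/experimental/ops/parallel.h` and may differ; the flag name is illustrative): operator code calls a single `parallel_1d` name, and the build flags select the backend.

```cpp
// Simplified sketch of compile-time backend selection for parallel_1d.
// Operator code is written once against this name; the build system
// defines at most one backend flag.
#include <cstdint>

#if defined(TORCHAO_PARALLEL_ATEN)
#include <ATen/Parallel.h>

template <typename F>
void parallel_1d(int64_t begin, int64_t end, const F& f) {
  // Delegate to ATen's threadpool when building for PyTorch.
  at::parallel_for(begin, end, /*grain_size=*/1,
                   [&](int64_t chunk_begin, int64_t chunk_end) {
                     for (int64_t i = chunk_begin; i < chunk_end; ++i) {
                       f(i);
                     }
                   });
}

#else

template <typename F>
void parallel_1d(int64_t begin, int64_t end, const F& f) {
  // Serial fallback shown here; an ExecuTorch build would instead
  // dispatch the same loop onto ExecuTorch's threadpool.
  for (int64_t i = begin; i < end; ++i) {
    f(i);
  }
}

#endif
```

Operator code can then express, say, tiling over output channels as `parallel_1d(0, num_tiles, [&](int64_t tile_idx) { ... });` without caring which runtime executes the loop.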
## Swappable kernels
Our design for the higher-level multi-threaded linear operator is agnostic to the lower-level single-threaded kernels, allowing third-party vendors to swap in their own implementations. The interface between the operator and kernel is defined by a [ukernel config](https://github.com/pytorch/ao/blob/299aacd0ab0e0cce376f56e18e5bb585d517b2e1/torchao/experimental/ops/linear_8bit_act_xbit_weight/linear_8bit_act_xbit_weight.h#L14), which specifies kernel function pointers for preparing activation data, preparing weight data, and running the kernel. The operator, responsible for tiling and scheduling, interacts with kernels solely through this config.
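
The sketch below conveys the shape of such a config. The field names and signatures are our invention for illustration; see `linear_8bit_act_xbit_weight.h` in TorchAO for the real definition.

```cpp
// Illustrative ukernel config: the operator schedules tiles and calls
// kernels only through these function pointers, so a vendor kernel that
// fills in the same struct inherits tiling and multi-threading for free.
#include <cstddef>
#include <cstdint>

struct UKernelConfig {
  // Scratch bytes needed for packed activations: m rows, k columns.
  size_t (*activation_data_size)(int m, int k, int group_size);

  // Dynamically quantize and pack activations into `out`.
  void (*prepare_activation_data)(void* out, int m, int k, int group_size,
                                  const float* activations);

  // Bytes needed for packed weights: n output channels, k columns.
  size_t (*weight_data_size)(int n, int k, int group_size);

  // Pack quantized weights and groupwise scales into `out`.
  void (*prepare_weight_data)(void* out, int n, int k, int group_size,
                              const int8_t* qweights, const float* scales);

  // Compute the m x n output tile from the two packed buffers.
  void (*kernel)(float* output, int m, int n, int k, int group_size,
                 const void* weight_data, const void* activation_data);
};
```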
## Performance
In the table below, we show Llama3.1 8B token generation performance using 6 CPU threads on an M1 MacBook Pro with 32 GB of RAM.

| Bitwidth x | torch.compile (Decode tokens/sec) | ExecuTorch (Decode tokens/sec) | ExecuTorch PTE size (GiB) |
|---|---|---|---|
| 1 | 24.18 | 17.86 | 1.46 |
| 2 | 27.02 | 19.65 | 2.46 |
| 3 | 21.01 | 22.25 | 3.46 |
| 4 | 19.51 | 19.47 | 4.47 |
| 5 | 14.78 | 16.34 | 5.47 |
| 6 | 12.80 | 13.61 | 6.47 |
| 7 | 8.16 | 11.73 | 7.48 |
116+
117+
118+
Results were run on an M1 Macbook Pro (with 8 perf cores, and 2 efficiency cores) with 32GB of RAM and 6 threads [using torchchat](https://github.com/pytorch/torchchat). In each test, the max-seq-length of 128 tokens were generated. For each bit width x, the embedding layer was groupwise quantized to x-bits with group size 32. In the linear layers, activations were dynamically quantized per token to 8 bits and weights were groupwise quantized to x-bits with group size 256. Our focus here is performance and we do not report accuracy or perplexity numbers. Depending on the model, lower bit widths may require quantization-aware training, quantizing a model with a mixture of bit widths, or adjusting the group sizes for acceptable accuracy.
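
For reference, a groupwise symmetric scheme of this kind maps each group $g$ of weights to $x$-bit integers roughly as follows (a standard formulation; TorchAO's exact rounding, clamping, and zero-point handling may differ):

$$
s_g = \frac{\max_{w \in g} |w|}{2^{x-1} - 1},
\qquad
q_w = \mathrm{clamp}\!\left(\mathrm{round}\!\left(\frac{w}{s_g}\right),\ -2^{x-1},\ 2^{x-1} - 1\right),
\qquad
\hat{w} = s_g \, q_w.
$$

The int8 dot products accumulate the codes $q_w$ against the quantized activations, and the group scales are applied to the accumulated sums afterward.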
![Llama 3.1 chart](/assets/images/hi-po-low-bit.png){:style="width:100%"}
## Try them out and contribute!
If you want to see the new low-bit kernels in action, give them a try by [setting up torchchat](https://github.com/pytorch/torchchat/tree/main) and [quantizing and running an LLM locally using the kernels](https://github.com/pytorch/torchchat/blob/main/docs/quantization.md#experimental-torchao-lowbit-kernels).
If you want to help contribute, consider adding support for one of the following areas:

* [Add universal low-bit GEMM kernels](https://github.com/pytorch/ao/issues/1394) for Arm CPU, reusing the same bitpacking routines from the universal GEMV kernels.
* [Improve runtime selection](https://github.com/pytorch/ao/issues/1376) of ukernel configs based on ISA, packing format, and activation shape.
* Add low-bit kernels for other CPU ISAs like x86.
* Integrate third-party libraries like [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) with the operator framework.

announcement.html

Lines changed: 51 additions & 30 deletions
@@ -19,20 +19,12 @@ <h1>PyTorch<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Foundation</h1>
   <div class="container">
     <div class="row content">
       <div class="col-md-10 body-side-text">
-        <p class="lead">The PyTorch Foundation is a neutral home for the deep learning community to
-          collaborate on the open source PyTorch framework and ecosystem. The PyTorch Foundation is supported by
-          leading contributors to the PyTorch open source project. The Foundation leverages resources provided by members
-          and contributors to enable community discussions and collaboration.
+        <p class="lead">
+          Welcome to the PyTorch Foundation—a vibrant, collaborative hub created for and by the deep learning community. Here, developers, researchers, and industry leaders come together to shape and expand the open source PyTorch framework and ecosystem. Through a network of dedicated contributors to the PyTorch project, the PyTorch Foundation fuels discussion, innovation, and hands-on collaboration across the PyTorch landscape.
           <br />
           <br />
-          Community collaboration is critical for the framework’s evolution as well as the development of
-          associated projects that support using PyTorch in production and at scale. As part of The Linux Foundation, the PyTorch
-          community will also collaborate on training, local and regional events, open source developer tooling, academic research,
-          and guides to help new users and contributors have a productive experience.
+          Community-driven collaboration is at the heart of PyTorch's growth and evolution. From advancing the core framework to building essential tools that power PyTorch at a production scale, your contributions are key to moving this ecosystem forward. As part of the Linux Foundation, the PyTorch community also supports a variety of initiatives: developer training, regional and local events, open source tooling, research, and guides for both newcomers and seasoned contributors—all to make your journey with PyTorch more accessible and rewarding.
         </p>
-        <a href="https://jira.linuxfoundation.org/plugins/servlet/desk/portal/23?project=pytorch" class="btn mt-3 btn-lg with-right-arrow">
-          Member Support
-        </a>
       </div>
     </div>
   </div>
@@ -42,11 +34,9 @@ <h1>PyTorch<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Foundation</h1>
   <div class="container">
     <div class="row content">
       <div class="col-md-10 body-side-text">
-        <h2>Principles</h2>
+        <h2>Our Guiding Principles</h2>
         <span></span>
-        <p class="lead">The Foundation’s mission is to drive adoption of AI and deep learning tooling by fostering and sustaining
-          an ecosystem of open source, vendor-neutral projects with PyTorch. We democratize state-of-the-art tools, libraries and other
-          components to make these innovations accessible to everyone. Read more about the Role and Values of the PyTorch Foundation <a href="/assets/pytorch-foundation-principles.pdf" target="_blank">here</a>.</p>
+        <p class="lead">Our mission is to drive the adoption of AI and deep learning by supporting an open, vendor-neutral ecosystem built around PyTorch. By making state-of-the-art tools and libraries accessible to everyone, we aim to democratize innovation in AI and ML. Learn more about the mission and values that guide us in our <a href="/assets/pytorch-foundation-principles.pdf" target="_blank">PyTorch Foundation Principles</a>.</p>
       </div>
     </div>
   </div>
@@ -113,24 +103,52 @@ <h2>General Members</h2>
       <div class="col-md-10">
         <h2>Associate Members</h2>
         <div class="card-container">
+          <div class="card bayero">
+            <a href="https://buk.edu.ng/home" target="_blank">
+              <div class="card-body">
+                <img src="/assets/images/members/bayero-logo.svg" alt="Bayero University Kano logo"/>
+              </div>
+            </a>
+          </div>
+          <div class="card">
+            <a href="https://baai.ac.cn/" target="_blank">
+              <div class="card-body">
+                <img src="/assets/images/members/baai-logo.svg" alt="Beijing Academy of Artificial Intelligence logo"/>
+              </div>
+            </a>
+          </div>
+          <div class="card">
+            <a href="https://www.columbia.edu/" target="_blank">
+              <div class="card-body">
+                <img src="/assets/images/members/columbia-university-logo.svg" alt="Columbia University logo"/>
+              </div>
+            </a>
+          </div>
           <div class="card onefact">
             <a href="https://www.onefact.org/" target="_blank">
               <div class="card-body">
                 <img src="/assets/images/members/onefact-logo.svg" alt="onefact.org logo"/>
               </div>
             </a>
           </div>
-          <div class="card bayero">
-            <a href="https://buk.edu.ng/home" target="_blank">
+          <div class="card">
+            <a href="https://iitj.ac.in/" target="_blank">
              <div class="card-body">
-                <img src="/assets/images/members/bayero-logo.svg" alt="Bayero University Kano logo"/>
+                <h3 style="margin-top:3em; text-align:center;">PROM, IIT Rajasthan</h3>
              </div>
            </a>
          </div>
-          <div class="card wedf">
-            <a href="https://worldethicaldata.org/" target="_blank">
+          <div class="card texas">
+            <a href="https://www.tamu.edu/index.html" target="_blank">
              <div class="card-body">
-                <img src="/assets/images/members/wedf-logo.png" alt="WEDF logo"/>
+                <img src="/assets/images/members/texas-am-logo.svg" alt="Texas A&M University logo"/>
+              </div>
+            </a>
+          </div>
+          <div class="card">
+            <a href="https://uci.edu/" target="_blank">
+              <div class="card-body">
+                <img src="/assets/images/members/university-california-logo.svg" alt="University of California, Irvine logo"/>
              </div>
            </a>
          </div>
@@ -141,10 +159,10 @@ <h2>Associate Members</h2>
             </div>
           </a>
         </div>
-          <div class="card texas">
-            <a href="https://www.tamu.edu/index.html" target="_blank">
+          <div class="card wedf">
+            <a href="https://worldethicaldata.org/" target="_blank">
             <div class="card-body">
-              <img src="/assets/images/members/texas-am-logo.svg" alt="Texas A&M University logo"/>
+              <img src="/assets/images/members/wedf-logo.png" alt="WEDF logo"/>
             </div>
           </a>
         </div>
@@ -158,11 +176,14 @@ <h2>Associate Members</h2>
   <div class="container">
     <div class="row content">
       <div class="col-md-10 body-side-text">
-        <h2>Governance</h2>
+        <h2>Our Governance</h2>
         <span></span>
-        <p class="lead">The PyTorch Foundation’s governance structure establishes a Governing Board to oversee the Foundation’s activities
-          according to its Guiding Principles. The technical governance structure for the PyTorch open source project
-          is defined by the PyTorch maintainers and is available on <a href="https://pytorch.org/docs/master/community/governance.html">this page</a>.</p>
+        <p class="lead">
+          The PyTorch Foundation’s Governing Board oversees the Foundation’s activities according to its Guiding Principles and the <a href="/assets/pytorch-foundation-charter-04052023.pdf" target="_blank">PyTorch Foundation Charter</a>.
+          <br />
+          <br />
+          The technical governance structure for the PyTorch open source project is defined by the PyTorch maintainers and is available on our <a href="https://pytorch.org/docs/main/community/governance.html">PyTorch Technical Governance page</a>.
+        </p>
       </div>
     </div>
   </div>
@@ -173,9 +194,9 @@ <h2>Governance</h2>
   <div class="container">
     <div class="row content">
       <div class="col-md-10 body-side-text">
-        <h2>How to Contribute</h2>
+        <h2>How to Get Involved</h2>
         <span></span>
-        <p class="lead">Join the <a href="https://pytorch.org/#community-module">PyTorch developer community</a> to contribute, learn, and get your questions answered.</p>
+        <p class="lead">New to the PyTorch Foundation? Check out our guide to <a href="/new">getting started with the PyTorch Foundation</a> or join the PyTorch <a href="https://pytorch.org/#community-module">developer</a> or <a href="https://discuss.pytorch.org/">user</a> community to contribute, learn, and get your questions answered.</p>
       </div>
     </div>
   </div>

assets/images/governance.png

133 KB

assets/images/hi-po-low-bit.png

223 KB
