Commit 7c92551

Update on "[ET-VK] Simplifying conv1d op shader by changing it to process one output texel per thread."
This diff changes the conv1d shader to process one output texel per thread, increasing GPU occupancy and improving performance. Differential Revision: [D74097560](https://our.internmc.facebook.com/intern/diff/D74097560/) [ghstack-poisoned]
1 parent fac54f0 commit 7c92551

2 files changed: +1 -5 lines changed

backends/vulkan/runtime/graph/ops/glsl/conv1d.glsl

Lines changed: 0 additions & 4 deletions
@@ -59,10 +59,6 @@ const lowp ivec4 bias_axis_map = unhash_axis_map(bias_layout);
 // This implementation performs N x out_C x out_L shader invocations, where each invocation
 // calculates the rolling kernel of the length dimension for each batch, i.e.,
 // computes out_L results.
-//
-// Note that we can rewrite this implementation as out_L * out_C * ceil(N / 4)
-// shader invocations, where each invocation computes 1 result. But that
-// performs worse.
 void main() {
 const ivec3 lpos = ivec3(gl_GlobalInvocationID);
 

backends/vulkan/runtime/graph/ops/impl/Convolution.cpp

Lines changed: 1 addition & 1 deletion
@@ -520,7 +520,7 @@ void add_conv1d_node(
 // out channels
 static_cast<uint32_t>(out_channels),
 // out batches
-graph.size_at<uint32_t>(-3, out)};
+utils::div_up_4(graph.size_at<uint32_t>(-3, out))};
 const utils::uvec3 local_size = graph.create_local_wg_size(global_size);
 
 Kernel1dParams kernel_params = {
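
The change above shrinks the batch component of the global work group size from N to ceil(N / 4), which matches the `out_L * out_C * ceil(N / 4)` invocation count mentioned in the removed shader comment. The sketch below only illustrates that arithmetic. `div_up_4_sketch` and `conv1d_global_size_sketch` are hypothetical names, not the actual `utils::div_up_4` or any ExecuTorch API, and the ordering of the three components (length, channels, batches) is an assumption, since the diff shows only the last two.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for utils::div_up_4 (not the real implementation):
// returns ceil(n / 4) without overflowing for large n.
constexpr uint32_t div_up_4_sketch(uint32_t n) {
  return n / 4u + (n % 4u != 0u ? 1u : 0u);
}

// Hypothetical helper that mirrors the global size built in add_conv1d_node.
// Assumed component order: {out length, out channels, ceil(out batches / 4)}.
inline std::array<uint32_t, 3> conv1d_global_size_sketch(
    uint32_t out_length,
    uint32_t out_channels,
    uint32_t out_batches) {
  // Assumption: all output extents are non-zero.
  assert(out_length > 0 && out_channels > 0 && out_batches > 0);
  return {out_length, out_channels, div_up_4_sketch(out_batches)};
}
```

Under these assumptions, an output with 5 batches, 8 channels, and length 16 would dispatch 16 x 8 x 2 invocations instead of 16 x 8 x 5, with each invocation covering up to four batch entries.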