Commit 374f7cf
[ET-VK] Minor unroll tuning to improve conv2d pw perf. (#11187)
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11134 by @trivedivivek
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/trivedivivek/94/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/trivedivivek/94/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/trivedivivek/93/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/trivedivivek/94/orig
@diff-train-skip-merge

---------

Co-authored-by: Vivek Trivedi <[email protected]>
1 parent bc47f5a commit 374f7cf

1 file changed (+3, -1 lines)
backends/vulkan/runtime/graph/ops/glsl/conv2d_pw.glsl

Lines changed: 3 additions & 1 deletion
@@ -38,6 +38,8 @@ layout(push_constant) uniform restrict Block {
 
 layout(local_size_x_id = 0, local_size_y_id = 1, local_size_z_id = 2) in;
 
+#extension GL_EXT_control_flow_attributes : require
+
 /*
  * Computes a 2D pointwise convolution of an NxN output tile. Calculating an
  * output tile for pointwise convolution is more efficient because the kernel
@@ -105,7 +107,7 @@ void main() {
     float kernel_values[4 * 4]; // 4 channels, 4 elements per channel
 
     // Load kernel values from texels to array
-    for (int i = 0; i < 4; ++i) {
+    [[unroll]] for (int i = 0; i < 4; ++i) {
       const vec4 k_tex = texelFetch(t_kernel, ivec2(z + i, gpos.z), 0);
       kernel_values[i * 4 + 0] = k_tex.x;
       kernel_values[i * 4 + 1] = k_tex.y;
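
For readers unfamiliar with the attribute, the change above can be reproduced in isolation. The following is a minimal sketch, not the actual conv2d_pw.glsl: the bindings, the `OutBuf` block, the fixed texel coordinates, and the final copy loop are all assumptions made so the shader stands alone. It shows the two pieces the commit adds: enabling GL_EXT_control_flow_attributes, and marking a loop with a compile-time trip count of 4 with [[unroll]], which is a hint to the compiler rather than a guarantee.

```glsl
#version 450
// Enable the extension before any [[unroll]] attribute is used.
#extension GL_EXT_control_flow_attributes : require

layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

// Hypothetical bindings for illustration only; the real shader's layout differs.
layout(set = 0, binding = 0) uniform sampler2D t_kernel;
layout(set = 0, binding = 1, std430) writeonly buffer OutBuf {
  float values[];
} out_buf;

void main() {
  float kernel_values[4 * 4]; // 4 texels, 4 components per texel

  // The trip count is a constant, so the compiler can expand the loop
  // into four straight-line texel loads if it honors the hint.
  [[unroll]] for (int i = 0; i < 4; ++i) {
    const vec4 k_tex = texelFetch(t_kernel, ivec2(i, 0), 0);
    kernel_values[i * 4 + 0] = k_tex.x;
    kernel_values[i * 4 + 1] = k_tex.y;
    kernel_values[i * 4 + 2] = k_tex.z;
    kernel_values[i * 4 + 3] = k_tex.w;
  }

  // Write the values out so the loads are not optimized away.
  for (int j = 0; j < 16; ++j) {
    out_buf.values[j] = kernel_values[j];
  }
}
```

Whether the hint helps depends on the driver's shader compiler, so any speedup should be confirmed by profiling on the target device.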