Commit 544f409

smeso authored and ggerganov committed
vulkan : argsort barriers must be under uniform control flow (ggml/951)
A return before a barrier that is taken by only some threads in a workgroup leads to undefined behavior. The old code happens to work on some devices, but it fails on others (i.e. "smaller" GPUs).

BTW, I think it would be better to set specialization constants when the graph is built, so that the local workgroup could be sized appropriately. But that would take a lot of work.

Signed-off-by: Salvatore Mesoraca <[email protected]>
1 parent 6084bfb commit 544f409

1 file changed

+4
-6
lines changed

ggml/src/vulkan-shaders/argsort.comp

Lines changed: 4 additions & 6 deletions
@@ -29,20 +29,18 @@ void main() {
     const int col = int(gl_LocalInvocationID.x);
     const uint row = gl_WorkGroupID.y;
 
-    if (col >= p.ncols_pad) {
-        return;
-    }
-
     const uint row_offset = row * p.ncols;
 
     // initialize indices
-    dst_row[col] = col;
+    if (col < p.ncols_pad) {
+        dst_row[col] = col;
+    }
     barrier();
 
     for (uint k = 2; k <= p.ncols_pad; k *= 2) {
         for (uint j = k / 2; j > 0; j /= 2) {
             const uint ixj = col ^ j;
-            if (ixj > col) {
+            if (col < p.ncols_pad && ixj > col) {
                 if ((col & k) == 0) {
                     if (dst_row[col] >= p.ncols ||
                         (dst_row[ixj] < p.ncols && (p.order == ASC ?
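
To make the control-flow change easier to follow, here is a minimal CPU sketch in C++ of what the patched shader appears to do for a single row. It is an emulation, not the shader itself: the function name `argsort_row_emulated` and the `workgroup_size` parameter are hypothetical, only ascending order is handled, and the compare-exchange body past the point where the hunk above is cut off is assumed to be a standard bitonic step that treats padding indices (`>= ncols`) as larger than any real value. Each pass of a `for (col ...)` loop stands in for one invocation, and the end of that loop stands in for `barrier()`. The point of the sketch is that every invocation, including those with `col >= ncols_pad`, reaches every barrier, and only the work between barriers is guarded.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// CPU emulation of one workgroup arg-sorting one row in ascending order.
// Assumption: ncols_pad is a power of two, ncols <= ncols_pad <= workgroup_size.
std::vector<uint32_t> argsort_row_emulated(const std::vector<float>& row,
                                           uint32_t ncols_pad,
                                           uint32_t workgroup_size) {
    const uint32_t ncols = static_cast<uint32_t>(row.size());
    assert(ncols_pad != 0 && (ncols_pad & (ncols_pad - 1)) == 0);
    assert(ncols <= ncols_pad && ncols_pad <= workgroup_size);

    std::vector<uint32_t> dst_row(ncols_pad);

    // initialize indices: out-of-range invocations skip the store
    // instead of returning, so they still arrive at the barrier
    for (uint32_t col = 0; col < workgroup_size; ++col) {
        if (col < ncols_pad) {
            dst_row[col] = col;
        }
    }
    // barrier(): reached by all workgroup_size invocations

    for (uint32_t k = 2; k <= ncols_pad; k *= 2) {
        for (uint32_t j = k / 2; j > 0; j /= 2) {
            for (uint32_t col = 0; col < workgroup_size; ++col) {
                const uint32_t ixj = col ^ j;
                // same guard as the patched shader
                if (col < ncols_pad && ixj > col) {
                    const uint32_t a = dst_row[col];
                    const uint32_t b = dst_row[ixj];
                    bool do_swap;
                    if ((col & k) == 0) {
                        // ascending half: larger value (or padding) moves up
                        do_swap = a >= ncols || (b < ncols && row[a] > row[b]);
                    } else {
                        // descending half: larger value (or padding) moves down
                        do_swap = b >= ncols || (a < ncols && row[a] < row[b]);
                    }
                    if (do_swap) {
                        std::swap(dst_row[col], dst_row[ixj]);
                    }
                }
            }
            // barrier(): again reached by every invocation
        }
    }

    // keep only the real columns; padding indices have sorted to the end
    dst_row.resize(ncols);
    return dst_row;
}
```

Within one phase, each active invocation touches only its own pair `(col, ixj)`, so running the invocations one after another on the CPU gives the same result as running them in parallel between two barriers. With the old early `return`, the invocations with `col >= ncols_pad` would have left before the first barrier, which is the non-uniform case the commit message describes.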
