[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. #142687

charithaintc · 2025-06-03T23:00:38Z

Changes:

Decouple layout propagation from subgroup distribution and move it to an independent pass.
Refine layout assignment to handle control-flow ops correctly (scf.for).
Refine test cases.

charithaintc · 2025-06-16T17:20:27Z

@adam-smnk sorry to bother you. :-P Would you have some time to give this a general review or approve it? :-)

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

chencha3 · 2025-06-16T19:49:04Z

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

+                                              nullptr);
+  terminator.getSuccessorRegions(operands, successors);
+
+  for (mlir::RegionSuccessor &successor : successors) {


Does this also handle the block arguments of forOp, which are also handled by BranchOpInterface? If so, updateBranchOpInterface may not be needed in this case.

Great point. I checked this again and it seems you are right: BranchOpInterface is doing some redundant work here.

I removed updateBranchOpInterface and also added a comment with an example to clarify the logic.

One caveat is that we still need a check for region ops to filter them out (done in updateOp).
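To illustrate the caveat about filtering region ops, here is a minimal toy sketch in plain C++. It does not use the MLIR API: `ToyOp`, `updateToyOp`, `updateAll`, and the `layout_A` string are hypothetical names invented for illustration, and the behavior shown is only an assumption about what the filter in updateOp amounts to.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation; this is not mlir::Operation.
struct ToyOp {
  std::string name;
  bool hasRegions;          // true for ops that own regions, e.g. a loop
  std::string resultLayout; // empty until a layout is assigned
};

// Assumed behavior: ordinary ops get their result layout set directly, while
// region ops are skipped and left to the terminator-based update.
bool updateToyOp(ToyOp &op, const std::string &layout) {
  if (op.hasRegions)
    return false; // filtered out
  op.resultLayout = layout;
  return true;
}

// Apply updateToyOp to every op and return the names of the ops it skipped.
std::vector<std::string> updateAll(std::vector<ToyOp> &ops,
                                   const std::string &layout) {
  assert(!layout.empty() && "expected a non-empty placeholder layout");
  std::vector<std::string> skipped;
  for (ToyOp &op : ops)
    if (!updateToyOp(op, layout))
      skipped.push_back(op.name);
  return skipped;
}
```

In this sketch the skipped ops are exactly the ones whose results would be handled through their region terminators.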

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

adam-smnk

Minor comments
General structure looks good, I'll have to leave details to @chencha3 but I'll try to make another pass later

mlir/test/Dialect/XeGPU/layout-propagate.mlir

mlir/test/Dialect/XeGPU/subgroup-distribute.mlir

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

chencha3

LGTM with some nit suggestions.

chencha3 · 2025-06-17T14:12:37Z

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

+/// clang-format on
+/// In this example, at scf.yield, control-flow can transfer to successor
+/// regions. One is the ^bb0 (for loop body) and the other is the scf.for op
+/// itself (yield the results). So we update both the block arguments of the


This may need a double check. My understanding is that the other successor is not scf.for itself; it is the region right after the scf.for.

After our discussion I updated the comment again. Thanks for the suggestions.
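For readers following the successor discussion, the idea can be sketched with a toy model in plain C++. This is not the code from the PR and does not use the MLIR API: `Value`, `Successor`, `propagateFromTerminator`, `runLoopExample`, and the `layout_A` string are hypothetical names chosen for illustration. The sketch assumes the yield has two successors: the loop body, whose block arguments receive the yielded values, and the exit from the loop, where the yielded values become the loop results.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a value that carries a layout annotation.
struct Value {
  std::string name;
  std::string layout; // empty means "not assigned yet"
};

// Hypothetical stand-in for a control-flow successor of a terminator: the
// values that receive the terminator operands when control moves there.
struct Successor {
  std::string description;
  std::vector<Value *> inputs;
};

// Copy the layout of each yielded operand onto the matching input of every
// successor. Returns the number of values whose layout changed.
std::size_t propagateFromTerminator(const std::vector<Value> &yielded,
                                    std::vector<Successor> &successors) {
  std::size_t updated = 0;
  for (Successor &successor : successors) {
    assert(successor.inputs.size() == yielded.size() &&
           "each successor input must pair with one yielded operand");
    for (std::size_t i = 0; i < yielded.size(); ++i) {
      if (successor.inputs[i]->layout == yielded[i].layout)
        continue; // already up to date
      successor.inputs[i]->layout = yielded[i].layout;
      ++updated;
    }
  }
  return updated;
}

// A loop with one loop-carried value and the two successors of its yield.
struct LoopExample {
  Value iterArg{"%arg", ""};
  Value loopResult{"%res", ""};
  std::size_t updated = 0;
};

LoopExample runLoopExample() {
  LoopExample ex;
  std::vector<Value> yielded = {{"%next", "layout_A"}};
  std::vector<Successor> successors = {
      {"loop body (block arguments)", {&ex.iterArg}},
      {"exit from the loop (loop results)", {&ex.loopResult}}};
  ex.updated = propagateFromTerminator(yielded, successors);
  return ex;
}
```

Calling `runLoopExample()` updates both the block argument and the loop result from the single yielded value, which is the reason one terminator-driven update can cover both places in this model.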

chencha3 · 2025-06-17T14:18:41Z

mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp

@@ -12,6 +12,8 @@
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/GPU/Utils/DistributionUtils.h"


nit: some headers seem unnecessary. Could you clean them up? Keep the header includes to a minimum.

Thanks. I cleaned up the unused headers flagged by VS Code. Let me know if there is a better way to find them.

chencha3 · 2025-06-17T14:19:28Z

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Analysis/DataFlow/ConstantPropagationAnalysis.h"


Clean up unrelated header includes.

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

charithaintc · 2025-06-18T22:32:47Z

Hi @adam-smnk, I addressed your comments. Do you have any other concerns? If not please approve this PR. :-)

llvm-ci · 2025-06-20T17:49:24Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building mlir at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/14477

Here is the relevant piece of the build log for the reference

Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x56c9b7612df0 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x56c9b7612df0 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

charithaintc added 25 commits May 27, 2025 23:40

add bug fix

ff1012e

add test

c6eb53f

add comments

3bdb596

Merge branch 'main' into scf_for_bug

6d47e3f

remove unsused headers

fe3ab99

save work

f91b64c

Merge branch 'main' into scf_for_bug

cc621a1

Merge branch 'scf_for_bug' into fix_layout_assign

76f7d98

initial version

5cacace

working version

7d54194

working expect for unreal cast

b289399

some fixes

4318343

branch terminator iface

20a6415

save work

7bd0be2

working

00dc2b6

move out layout prop

35620ec

fix test

92c23f1

fix names

7b69082

func op iface support

5669616

fix test

71902aa

fix test

341daff

revert merge

fdacb63

add comment

57acc9e

fix

d7eaaa5

refactor

a99ee75

charithaintc marked this pull request as ready for review June 5, 2025 21:15

charithaintc requested review from nbpatel, chencha3, adam-smnk and fschlimb June 5, 2025 21:15

charithaintc added 2 commits June 13, 2025 22:51

Merge branch 'main' into fix_layout_assign

559bf74

Merge branch 'main' into fix_layout_assign

c34f401

chencha3 reviewed Jun 16, 2025

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

chencha3 reviewed Jun 16, 2025

Garra1980 reviewed Jun 16, 2025

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

Garra1980 reviewed Jun 16, 2025

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

charithaintc added 4 commits June 16, 2025 21:10

address comments

c4dd5a5

address comments

2c66eac

address comments

5705d74

Merge branch 'main' into fix_layout_assign

f73f237

adam-smnk reviewed Jun 17, 2025

mlir/test/Dialect/XeGPU/layout-propagate.mlir

mlir/test/Dialect/XeGPU/subgroup-distribute.mlir

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

chencha3 approved these changes Jun 17, 2025

charithaintc added 5 commits June 17, 2025 19:35

Merge branch 'main' into fix_layout_assign

10f9b49

chnage pass name

d842d3a

fix line breaks in test

f091519

fix comment in region ops

0ac7162

remove unused headers

caca184

adam-smnk reviewed Jun 18, 2025

mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutPropagate.cpp

charithaintc added 3 commits June 18, 2025 16:31

Merge branch 'main' into fix_layout_assign

e49fc63

fix conflict

0111b9f

add option to print layout results

4de7cab

adam-smnk approved these changes Jun 19, 2025

charithaintc and others added 4 commits June 20, 2025 16:46

Merge branch 'main' into fix_layout_assign

10936bc

fix conflict

3a26509

Merge branch 'main' into fix_layout_assign

cc956c6

Merge branch 'main' into fix_layout_assign

65dd453

charithaintc merged commit adc6228 into llvm:main Jun 20, 2025
5 of 7 checks passed
