[mlir][GPU] Fixes subgroup reduce lowering #141825


Merged
merged 1 commit into llvm:main from muzasyed/sub2 on May 28, 2025

Conversation

@Muzammiluddin-Syed-ECE (Contributor) commented May 28, 2025

Fixes the final reduction steps, which were taken from an implementation of scan, not reduction, causing lanes earlier in the wave to have incorrect results due to masking.

This now aligns more closely with the Triton implementation: triton-lang/triton#5019

Hypothetical example

To explain the issue with the current implementation, consider the simple example of performing a sum over 64 lanes where the first lane has value 1 and all other lanes have value 0:

[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

When performing a sum reduction over these 64 lanes, the current implementation issues six DPP instructions which, in sequential order, do the following:

  1. sum over clusters of 2 contiguous lanes
  2. sum over clusters of 4 contiguous lanes
  3. sum over clusters of 8 contiguous lanes
  4. sum over an entire row (16 lanes)
  5. broadcast the result of the last lane in each row to the next row; each lane sums its current value with the incoming value.
  6. broadcast the result of the 32nd lane (lane 31) to the last two rows; each lane sums its current value with the incoming value.

After step 4, the result for the example above looks like this:

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

After step 5, the result looks like this:

[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

After step 6, the result looks like this:

[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Note that the correct value here is always 1, yet after the dpp.broadcast ops some lanes have incorrect values. The reason is that for these incorrect lanes, like lanes 0-15 in step 5, the dpp.broadcast op doesn't provide them with incoming values from other lanes. Instead, these lanes receive either their own values or 0 (depending on whether bound_ctrl is true or false) as the values to sum with. Either way, these values are stale, and these lanes shouldn't be read from in general.
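
For reference, here is a minimal C++ simulation of the six steps above. It is a host-side sketch, not the MLIR lowering itself: the `Wave` type and the `reduceRows`/`broadcastAdd` helpers are invented for illustration, and the assumed semantics is that lanes a broadcast step does not target simply re-add their own stale value, as described above. It reproduces the arrays shown in this example.

```cpp
// Host-side sketch of the pre-fix 64-lane reduction (illustrative only).
#include <array>
#include <cstdio>

using Wave = std::array<int, 64>;

// Steps 1-4: reduce within each 16-lane row so every lane of a row holds the
// row sum (this part of the lowering is already correct).
Wave reduceRows(Wave w) {
  for (int row = 0; row < 4; ++row) {
    int sum = 0;
    for (int lane = row * 16; lane < (row + 1) * 16; ++lane) sum += w[lane];
    for (int lane = row * 16; lane < (row + 1) * 16; ++lane) w[lane] = sum;
  }
  return w;
}

// Steps 5 and 6: broadcast-and-add. srcLane(l) returns the lane whose value
// lane l receives, or -1 when there is no incoming value, in which case the
// lane re-adds its own (stale) value.
template <typename SrcFn>
Wave broadcastAdd(Wave w, SrcFn srcLane) {
  Wave out = w;
  for (int l = 0; l < 64; ++l) {
    int s = srcLane(l);
    out[l] = w[l] + (s < 0 ? w[l] : w[s]);
  }
  return out;
}

int main() {
  Wave w{};
  w[0] = 1; // first lane 1, all other lanes 0

  w = reduceRows(w);
  // Step 5: last lane of each row feeds the next row; row 0 has no source.
  w = broadcastAdd(w, [](int l) { return l < 16 ? -1 : (l / 16) * 16 - 1; });
  // Step 6: lane 31 feeds rows 2 and 3; rows 0 and 1 have no source.
  w = broadcastAdd(w, [](int l) { return l < 32 ? -1 : 31; });

  std::printf("lane 0 = %d (stale), lane 63 = %d (correct)\n", w[0], w[63]);
  // Prints: lane 0 = 4 (stale), lane 63 = 1 (correct)
}
```

Running this prints lane 0 = 4 (stale) and lane 63 = 1 (correct), which is why the fix reads the final value from lane 63 (or combines the readlane results of lanes 31 and 63 on the non-gfx9 path, as in the diff below) rather than relying on lane 0.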

So what this means:

  • For a subgroup reduce over 32 lanes (like step 5), the correct result is stored in lanes 16 to 31.
  • For a subgroup reduce over 64 lanes (like step 6), the correct result is stored in lanes 32 to 63.

However, the current implementation does not specifically read the value from one of the correct lanes when returning the final result. In some workloads, it seems that without this the stale value from the first lane is returned instead.

Actual failing test

For a specific example of how the current implementation causes issues, take a look at the IR below, which represents an additive reduction over a dynamic dimension.

!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
  %cst_1 = arith.constant 0.000000e+00 : f16
  %c2_i64 = arith.constant 2 : i64
  %0 = tensor.empty() : !matB
  %2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
  %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : !matA) outs(%2 : !matB)  {
  ^bb0(%in: f16, %out: f16):
    %7 = arith.addf %in, %out : f16
    linalg.yield %7 : f16
  } -> !matB
  return %4 : !matB
}

When provided an input of type tensor<1x2xf16> with values {0, 1} to reduce over, the value returned is consistently 4, even though the correct result is 1. By the same analysis as above, this shows that the returned value comes from one of the stale lanes and needs instead to be read from one of the lanes holding the correct result.
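
As a quick sanity check on the arithmetic (under the same assumed stale-lane behaviour as in the sketch above, and assuming the two elements land in the first two lanes): lane 0 holds the row sum 1 after step 4 and then doubles at each of the two broadcast-and-add steps because it never receives an incoming value, which is where the observed 4 comes from.

```cpp
#include <cassert>

int main() {
  float lane0 = 1.0f;     // row sum held by lane 0 after step 4 (0 + 1)
  lane0 += lane0;         // step 5: no incoming value, re-adds its own -> 2
  lane0 += lane0;         // step 6: no incoming value, re-adds its own -> 4
  assert(lane0 == 4.0f);  // the stale value the failing test returns

  float lane63 = 1.0f;    // lanes 32-63 hold the correct reduction result
  assert(lane63 == 1.0f); // reading lane 63, as the fix does, gives 1
}
```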

@llvmbot (Member) commented May 28, 2025

@llvm/pr-subscribers-mlir-gpu

Author: Muzammil (Muzammiluddin-Syed-ECE)

Changes

Fixes issues with subgroup reduce lowering producing erroneous results.

Aligning more closely with the Triton implementation: triton-lang/triton#5019


Full diff: https://github.com/llvm/llvm-project/pull/141825.diff

1 file affected:

  • (modified) mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp (+18-11)
diff --git a/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp b/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
index 74face4291353..b7a12763057f0 100644
--- a/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
@@ -447,29 +447,36 @@ createSubgroupDPPReduction(PatternRewriter &rewriter, gpu::SubgroupReduceOp op,
   if (ci.clusterSize >= 64) {
     if (chipset.majorVersion <= 9) {
       // Broadcast 31st lane value to rows 2 and 3.
-      // Use row mask to avoid polluting rows 0 and 1.
       dpp = rewriter.create<amdgpu::DPPOp>(
           loc, res.getType(), res, res, amdgpu::DPPPerm::row_bcast_31,
-          rewriter.getUnitAttr(), 0xc, allBanks,
-          /*bound_ctrl*/ false);
+          rewriter.getUnitAttr(), 0xf, allBanks,
+          /*bound_ctrl*/ true);
+      res = vector::makeArithReduction(
+          rewriter, loc, gpu::convertReductionKind(mode), dpp, res);
+      // Obtain reduction from last rows, the previous rows are polluted.
+      Value lane63 = rewriter.create<arith::ConstantOp>(
+          loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(63));
+      res = rewriter.create<ROCDL::ReadlaneOp>(loc, res.getType(), res, lane63);
 
     } else if (chipset.majorVersion <= 12) {
       // Assume reduction across 32 lanes has been done.
       // Perform final reduction manually by summing values in lane 0 and
       // lane 32.
-      Value lane0 = rewriter.create<arith::ConstantOp>(
-          loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(0));
-      Value lane32 = rewriter.create<arith::ConstantOp>(
-          loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(32));
-      dpp = rewriter.create<ROCDL::ReadlaneOp>(loc, res.getType(), res, lane32);
-      res = rewriter.create<ROCDL::ReadlaneOp>(loc, res.getType(), res, lane0);
+      Value lane31 = rewriter.create<arith::ConstantOp>(
+          loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(31));
+      Value lane63 = rewriter.create<arith::ConstantOp>(
+          loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(63));
+      lane31 =
+          rewriter.create<ROCDL::ReadlaneOp>(loc, res.getType(), res, lane31);
+      lane63 =
+          rewriter.create<ROCDL::ReadlaneOp>(loc, res.getType(), res, lane63);
+      res = vector::makeArithReduction(
+          rewriter, loc, gpu::convertReductionKind(mode), lane31, lane63);
     } else {
       return rewriter.notifyMatchFailure(
           op, "Subgroup reduce lowering to DPP not currently supported for "
               "this device.");
     }
-    res = vector::makeArithReduction(rewriter, loc,
-                                     gpu::convertReductionKind(mode), res, dpp);
   }
   assert(res.getType() == input.getType());
   return res;

@krzysz00 (Contributor) left a comment

Approved

@Muzammiluddin-Syed-ECE force-pushed the muzasyed/sub2 branch 2 times, most recently from a1ecf0e to 7b7e0e5 on May 28, 2025 at 19:44
@pashu123 (Member) commented:

> Fixes issues with subgroup reduce lowering producing erroneous results.
>
> Aligning more closely with the Triton implementation: triton-lang/triton#5019

Please explain what types of erroneous results can occur, providing an example.

@krzysz00 (Contributor) commented:

> Please explain what types of erroneous results can occur, providing an example.

To summarize the debugging situation - and to potentially provide language for the PR description - "the final reduction steps were taken from an implementation of scan, not reduction, causing lanes earlier in the wave to have incorrect results due to masking"

@Muzammiluddin-Syed-ECE (Contributor, Author) commented:

(See the "Hypothetical example" and "Actual failing test" sections in the PR description above, which were added in response.)

@krzysz00 krzysz00 merged commit 893ef7f into llvm:main May 28, 2025
9 of 10 checks passed
google-yfyang pushed a commit to google-yfyang/llvm-project that referenced this pull request May 29, 2025
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025