Skip to content

Commit dd099e9

Browse files
authored
[MLIR][NVVM] Fix links in OP definition (llvm#125865)
1 parent 34c7d89 commit dd099e9

File tree

1 file changed

+38
-44
lines changed

1 file changed

+38
-44
lines changed

mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td

Lines changed: 38 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -477,8 +477,7 @@ def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive">
477477
The default barrier id is 0 that is similar to `nvvm.barrier` Op. When
478478
`barrierId` is not present, the default barrier id is used.
479479

480-
[For more information, see PTX ISA]
481-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar)
480+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar)
482481
}];
483482

484483
let assemblyFormat = "(`id` `=` $barrierId^)? `number_of_threads` `=` $numberOfThreads attr-dict";
@@ -504,8 +503,7 @@ def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
504503

505504
The `aligned` attribute, when provided, generates the .aligned version of the PTX instruction.
506505

507-
[For more information, see PTX ISA]
508-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
506+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
509507
}];
510508

511509
string llvmBuilder = [{
@@ -531,8 +529,7 @@ def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed"> {
531529
ordering and visibility guarantees provided for the memory accesses performed prior to
532530
`cluster.arrive`.
533531

534-
[For more information, see PTX ISA]
535-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
532+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
536533
}];
537534

538535
string llvmBuilder = [{
@@ -553,8 +550,7 @@ def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait"> {
553550
of the cluster to perform `cluster.arrive`. The `aligned` attribute, when provided,
554551
generates the .aligned version of the PTX instruction.
555552

556-
[For more information, see PTX ISA]
557-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
553+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
558554
}];
559555

560556
string llvmBuilder = [{
@@ -606,8 +602,8 @@ def NVVM_FenceProxyOp : NVVM_PTXBuilder_Op<"fence.proxy">,
606602
let description = [{
607603
Fence operation with proxy to establish an ordering between memory accesses
608604
that may happen through different proxies.
609-
[For more information, see PTX ISA]
610-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
605+
606+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
611607
}];
612608

613609
let assemblyFormat = "attr-dict";
@@ -657,8 +653,8 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
657653
value for the `size` operand is 128 and must be an immediate. Generic Addressing
658654
is used unconditionally, and the address specified by the operand `addr` must
659655
fall within the `.global` state space. Otherwise, the behavior is undefined
660-
[For more information, see PTX ISA]
661-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
656+
657+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
662658
}];
663659

664660
let assemblyFormat = "$scope $addr `,` $size (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
@@ -685,8 +681,8 @@ def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
685681
subsequent memory access performed via the tensormap proxy. `fence.proxy.release`
686682
operation can form a release sequence that synchronizes with an acquire
687683
sequence that contains the fence.proxy.acquire proxy fence operation
688-
[For more information, see PTX ISA]
689-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
684+
685+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
690686
}];
691687

692688
let assemblyFormat = "$scope (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
@@ -724,8 +720,8 @@ def NVVM_FenceMbarrierInitOp : NVVM_PTXBuilder_Op<"fence.mbarrier.init"> {
724720
let arguments = (ins );
725721
let description = [{
726722
Fence operation that applies on the prior nvvm.mbarrier.init
727-
[For more information, see PTX ISA]
728-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
723+
724+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
729725
}];
730726

731727
let assemblyFormat = "attr-dict";
@@ -768,8 +764,8 @@ def NVVM_ShflOp :
768764
the source. The `mask_and_clamp` contains two packed values specifying
769765
a mask for logically splitting warps into sub-segments and an upper bound
770766
for clamping the source lane index.
771-
[For more information, refer PTX ISA]
772-
(https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)
767+
768+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)
773769
}];
774770
string llvmBuilder = [{
775771
auto intId = getShflIntrinsicId(
@@ -814,8 +810,7 @@ def NVVM_ElectSyncOp : NVVM_Op<"elect.sync">
814810
of this Op. The predicate result is set to `True` for the
815811
leader thread, and `False` for all other threads.
816812

817-
[For more information, see PTX ISA]
818-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync)
813+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync)
819814
}];
820815

821816
let results = (outs I1:$pred);
@@ -899,8 +894,8 @@ def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
899894
The `addr` operand specifies the address of the mbarrier object
900895
in generic address space. The `noinc` attr impacts how the
901896
mbarrier's state is updated.
902-
[For more information, refer PTX ISA]
903-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
897+
898+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
904899
}];
905900
let assemblyFormat = "$addr attr-dict `:` type(operands)";
906901

@@ -923,8 +918,9 @@ def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.share
923918
track all prior cp.async operations initiated by the executing thread.
924919
The `addr` operand specifies the address of the mbarrier object in
925920
shared memory. The `noinc` attr impacts how the mbarrier's state
926-
is updated. [For more information, refer PTX ISA]
927-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
921+
is updated.
922+
923+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
928924
}];
929925
let assemblyFormat = "$addr attr-dict `:` type(operands)";
930926

@@ -982,8 +978,8 @@ def NVVM_CvtFloatToTF32Op : NVVM_Op<"cvt.float.to.tf32"> {
982978
The `relu` attribute, when set, lowers to the '.relu' variant of
983979
the cvt instruction. The `rnd` and `sat` attributes specify the
984980
the rounding and saturation modes respectively.
985-
[For more information, see PTX ISA]
986-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt)
981+
982+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt)
987983
}];
988984

989985
let hasVerifier = 1;
@@ -1633,8 +1629,8 @@ def NVVM_StMatrixOp: NVVM_PTXBuilder_Op<"stmatrix">,
16331629
let description = [{
16341630
Collectively store one or more matrices across all threads in a warp to the
16351631
location indicated by the address operand $ptr in shared memory.
1636-
[For more information, see PTX ISA]
1637-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)
1632+
1633+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)
16381634
}];
16391635

16401636
let assemblyFormat = "$ptr `,` $sources attr-dict `:` type(operands)";
@@ -1846,8 +1842,7 @@ def NVVM_CpAsyncBulkCommitGroupOp : NVVM_Op<"cp.async.bulk.commit.group">,
18461842
This Op commits all prior initiated but uncommitted cp.async.bulk
18471843
instructions into a cp.async.bulk-group.
18481844

1849-
[For more information, see PTX ISA]
1850-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group)
1845+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group)
18511846
}];
18521847

18531848
string llvmBuilder = [{
@@ -1871,8 +1866,7 @@ def NVVM_CpAsyncBulkWaitGroupOp : NVVM_Op<"cp.async.bulk.wait_group">,
18711866
async operations in the specified bulk async-group have completed reading
18721867
from their source locations.
18731868

1874-
[For more information, see PTX ISA]
1875-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-wait-group)
1869+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-wait-group)
18761870
}];
18771871

18781872
string llvmBuilder = [{
@@ -1917,8 +1911,7 @@ def NVVM_CpAsyncBulkTensorGlobalToSharedClusterOp :
19171911
The `l2CacheHint` operand is optional, and it is used to specify cache
19181912
eviction policy that may be used during the memory access.
19191913

1920-
[For more information, see PTX ISA]
1921-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor)
1914+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor)
19221915
}];
19231916

19241917
let assemblyFormat = [{
@@ -2034,8 +2027,7 @@ def NVVM_CpAsyncBulkTensorPrefetchOp :
20342027
The `l2CacheHint` operand is optional, and it is used to specify cache
20352028
eviction policy that may be used during the memory access.
20362029

2037-
[For more information, see PTX ISA]
2038-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-prefetch-tensor)
2030+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-prefetch-tensor)
20392031
}];
20402032

20412033
let assemblyFormat = [{
@@ -2134,8 +2126,7 @@ def NVVM_CpAsyncBulkTensorReduceOp :
21342126
The `l2CacheHint` operand is optional, and it is used to specify cache
21352127
eviction policy that may be used during the memory access.
21362128

2137-
[For more information, see PTX ISA]
2138-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor)
2129+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor)
21392130
}];
21402131

21412132
let assemblyFormat = [{
@@ -2194,8 +2185,8 @@ def NVVM_CpAsyncBulkGlobalToSharedClusterOp :
21942185

21952186
The `l2CacheHint` operand is optional, and it is used to specify cache
21962187
eviction policy that may be used during the memory access.
2197-
[For more information, see PTX ISA]
2198-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk)
2188+
2189+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk)
21992190
}];
22002191

22012192
let arguments = (ins
@@ -2252,8 +2243,7 @@ def NVVM_CpAsyncBulkSharedCTAToSharedClusterOp :
22522243
Initiates an asynchronous copy operation from Shared CTA memory to Shared
22532244
cluster memory.
22542245

2255-
[For more information, see PTX ISA]
2256-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk)
2246+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk)
22572247
}];
22582248

22592249
let arguments = (ins
@@ -2283,8 +2273,8 @@ def NVVM_CpAsyncBulkSharedCTAToGlobalOp :
22832273

22842274
The `l2CacheHint` operand is optional, and it is used to specify cache
22852275
eviction policy that may be used during the memory access.
2286-
[For more information, see PTX ISA]
2287-
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk)
2276+
2277+
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk)
22882278
}];
22892279

22902280
let arguments = (ins
@@ -2524,6 +2514,8 @@ def NVVM_GriddepcontrolWaitOp : NVVM_IntrOp<"griddepcontrol.wait", [], 0> {
25242514
Causes the executing thread to wait until all prerequisite grids in flight
25252515
have completed and all the memory operations from the prerequisite grids
25262516
are performed and made visible to the current grid.
2517+
2518+
25272519
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol)
25282520
}];
25292521
}
@@ -2536,6 +2528,8 @@ def NVVM_GriddepcontrolLaunchDependentsOp
25362528
Signals that specific dependents the runtime system designated to react to
25372529
this instruction can be scheduled as soon as all other CTAs in the grid
25382530
issue the same instruction or have completed.
2531+
2532+
25392533
[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol)
25402534
}];
25412535
}

0 commit comments

Comments
 (0)