[mlir][vector] Update docs for scalable vectors (#101842)

banach-space · web-flow · commit 673604a5398d · 2024-08-06T11:52:55.000+01:00
Adds a few notes on scalable vectors in the docs for the Vector dialect. This is mostly "repeating" things from LLVM's LangRef. Additionally: * Adds a few basic tests with scalable vectors (those should've been added long time ago), * Updates a comment in "TypeConverter.cpp" (the current comment is out-of-date), * Includes small formatting edits in Vector.md. **NOTE** Depends on #101813 - only review the top commit
diff --git a/mlir/docs/Dialects/Vector.md b/mlir/docs/Dialects/Vector.md
@@ -74,12 +74,30 @@ following top-down rewrites and conversions:
 ### LLVM level
 
 On CPU, the `n-D` `vector` type currently lowers to `!llvm<array<vector>>`.
-More concretely, `vector<4x8x128xf32>` lowers to `!llvm<[4 x [ 8 x [ 128 x
-float ]]]>`. There are tradeoffs involved related to how one can access
-subvectors and how one uses `llvm.extractelement`, `llvm.insertelement` and
-`llvm.shufflevector`. The section on [LLVM Lowering
-Tradeoffs](#llvm-lowering-tradeoffs) offers a deeper dive into the current
-design choices and tradeoffs.
+More concretely,
+* `vector<4x8x128xf32>` lowers to `!llvm<[4 x [ 8 x < 128
+x float >]]>` (fixed-width vector), and
+* `vector<4x8x[128]xf32>` lowers to `!llvm<[4 x [ 8 x < vscale x 128
+x float >]]>` (scalable vector).
+
+There are tradeoffs involved related to how one can access subvectors and how
+one uses `llvm.extractelement`, `llvm.insertelement` and `llvm.shufflevector`.
+The section on [LLVM Lowering Tradeoffs](#llvm-lowering-tradeoffs) offers a
+deeper dive into the current design choices and tradeoffs.
+
+Note, while LLVM supports arrarys of scalable vectors, these are required to be
+fixed-width arrays of 1-D scalable vectors. This means scalable vectors with a
+non-trailing scalable dimension (e.g. `vector<4x[8]x128xf32`) are not
+convertible to LLVM.
+
+Finally, MLIR takes the same view on scalable Vectors as LLVM (c.f. (Vector
+Type)[https://llvm.org/docs/LangRef.html#vector-type]):
+> For scalable vectors, the total number of elements is a constant multiple
+> (called vscale) of the specified number of elements; vscale is a positive
+> integer that is unknown at compile time and the same hardware-dependent
+> constant for all scalable vectors at run time.  The size of a specific
+> scalable vector type is thus constant within IR, even if the exact size in
+> bytes cannot be determined until run time.
 
 ### Hardware Vector Ops
 
@@ -269,11 +287,6 @@ proposal for now, this assumes LLVM only has built-in support for 1-D vector.
 The relationship with the LLVM Matrix proposal is discussed at the end of this
 document.
 
-MLIR does not currently support dynamic vector sizes (i.e. SVE style) so the
-discussion is limited to static rank and static vector sizes (e.g.
-`vector<4x8x16x32xf32>`). This section discusses operations on vectors in LLVM
-and MLIR.
-
 LLVM instructions are prefixed by the `llvm.` dialect prefix (e.g.
 `llvm.insertvalue`). Such ops operate exclusively on 1-D vectors and aggregates
 following the [LLVM LangRef](https://llvm.org/docs/LangRef.html). MLIR
@@ -287,10 +300,11 @@ Consider a vector of rank n with static sizes `{s_0, ... s_{n-1}}` (i.e. an MLIR
 `vector<s_0x...s_{n-1}xf32>`). Lowering such an `n-D` MLIR vector type to an
 LLVM descriptor can be done by either:
 
-1.  Flattening to a `1-D` vector: `!llvm<"(s_0*...*s_{n-1})xfloat">` in the MLIR
+1.  Nested aggregate type of `1-D` vector:
+    `!llvm."[s_0x[s_1x[...<s_{n-1}xf32>]]]">` in the MLIR LLVM dialect (current
+    lowering in MLIR).
+2.  Flattening to a `1-D` vector: `!llvm<"(s_0*...*s_{n-1})xfloat">` in the MLIR
     LLVM dialect.
-2.  Nested aggregate type of `1-D` vector:
-    `!llvm."[s_0x[s_1x[...<s_{n-1}xf32>]]]">` in the MLIR LLVM dialect.
 3.  A mix of both.
 
 There are multiple tradeoffs involved in choosing one or the other that we
@@ -303,9 +317,11 @@ vector<4x8x16x32xf32> to vector<4x4096xf32>` operation, that flattens the most
 
 The first constraint was already mentioned: LLVM only supports `1-D` `vector`
 types natively. Additional constraints are related to the difference in LLVM
-between vector and aggregate types: `“Aggregate Types are a subset of derived
-types that can contain multiple member types. Arrays and structs are aggregate
-types. Vectors are not considered to be aggregate types.”.`
+between vector and
+[aggregate types](https://llvm.org/docs/LangRef.html#aggregate-types):
+> Aggregate Types are a subset of derived types that can contain multiple
+> member types. Arrays and structs are aggregate types. Vectors are not
+> considered to be aggregate types.
 
 This distinction is also reflected in some of the operations. For `1-D` vectors,
 the operations `llvm.extractelement`, `llvm.insertelement`, and
@@ -314,12 +330,15 @@ vectors with `n>1`, and thus aggregate types at LLVM level, the more restrictive
 operations `llvm.extractvalue` and `llvm.insertvalue` apply, which only accept
 static indices. There is no direct shuffling support for aggregate types.
 
-The next sentence illustrates a recurrent tradeoff, also found in MLIR, between
+The next sentence (cf. LangRef [structure
+type](https://llvm.org/docs/LangRef.html#structure-type)) illustrates a
+recurrent tradeoff, also found in MLIR, between
 “value types” (subject to SSA use-def chains) and “memory types” (subject to
-aliasing and side-effects): `“Structures in memory are accessed using ‘load’ and
-‘store’ by getting a pointer to a field with the llvm.getelementptr instruction.
-Structures in registers are accessed using the llvm.extractvalue and
-llvm.insertvalue instructions.”`
+aliasing and side-effects):
+> Structures in memory are accessed using ‘load’ and ‘store’ by getting a
+> pointer to a field with the llvm.getelementptr instruction. Structures in
+> registers are accessed using the llvm.extractvalue and llvm.insertvalue
+> instructions.
 
 When transposing this to MLIR, `llvm.getelementptr` works on pointers to `n-D`
 vectors in memory. For `n-D`, vectors values that live in registers we can use
diff --git a/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp b/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
@@ -509,8 +509,8 @@ Type LLVMTypeConverter::convertMemRefToBarePtr(BaseMemRefType type) const {
 ///  * 1-D `vector<axT>` remains as is while,
 ///  * n>1 `vector<ax...xkxT>` convert via an (n-1)-D array type to
 ///    `!llvm.array<ax...array<jxvector<kxT>>>`.
-/// Returns failure for n-D scalable vector types as LLVM does not support
-/// arrays of scalable vectors.
+/// As LLVM supports arrays of scalable vectors, this method will also convert
+/// n-D scalable vectors provided that only the trailing dim is scalable.
 FailureOr<Type> LLVMTypeConverter::convertVectorType(VectorType type) const {
   auto elementType = convertType(type.getElementType());
   if (!elementType)
@@ -521,7 +521,9 @@ FailureOr<Type> LLVMTypeConverter::convertVectorType(VectorType type) const {
                                     type.getScalableDims().back());
   assert(LLVM::isCompatibleVectorType(vectorType) &&
          "expected vector type compatible with the LLVM dialect");
-  // Only the trailing dimension can be scalable.
+  // For n-D vector types for which a _non-trailing_ dim is scalable,
+  // return a failure. Supporting such cases would require LLVM
+  // to support something akin "scalable arrays" of vectors.
   if (llvm::is_contained(type.getScalableDims().drop_back(), true))
     return failure();
   auto shape = type.getShape();
diff --git a/mlir/test/Dialect/LLVMIR/types.mlir b/mlir/test/Dialect/LLVMIR/types.mlir
@@ -91,6 +91,10 @@ func.func @array() {
   "some.op"() : () -> !llvm.array<10 x ptr<4>>
   // CHECK: !llvm.array<10 x array<4 x f32>>
   "some.op"() : () -> !llvm.array<10 x array<4 x f32>>
+  // CHECK: !llvm.array<10 x array<4 x vector<8xf32>>>
+  "some.op"() : () -> !llvm.array<10 x array<4 x vector<8xf32>>>
+  // CHECK: !llvm.array<10 x array<4 x vector<[8]xf32>>>
+  "some.op"() : () -> !llvm.array<10 x array<4 x vector<[8]xf32>>>
   return
 }
 
diff --git a/mlir/test/Target/LLVMIR/llvmir-types.mlir b/mlir/test/Target/LLVMIR/llvmir-types.mlir
@@ -99,6 +99,10 @@ llvm.func @return_a8_float() -> !llvm.array<8 x f32>
 llvm.func @return_a10_p_4() -> !llvm.array<10 x ptr<4>>
 // CHECK: declare [10 x [4 x float]] @return_a10_a4_float()
 llvm.func @return_a10_a4_float() -> !llvm.array<10 x array<4 x f32>>
+// CHECK: declare [10 x [4 x <4 x float>]] @return_a10_a4_v4_float()
+llvm.func @return_a10_a4_v4_float() -> !llvm.array<10 x array<4 x vector<4xf32>>>
+// CHECK: declare [10 x [4 x <vscale x 4 x float>]] @return_a10_a4_sv4_float()
+llvm.func @return_a10_a4_sv4_float() -> !llvm.array<10 x array<4 x vector<[4]xf32>>>
 
 //
 // Literal structures.