
Commit 8a6ae21

Update on "[ET-VK] Clean up shader library and introduce some new conventions"
## Context

This changeset introduces some fairly mechanical improvements to the Vulkan compute graph shader library in order to introduce some new conventions. **Note that backwards compatibility with existing shader authoring methods is preserved.**

### Only List `VALUE` in the `.yaml` files

Previously, to generate variants for a combination of values, the YAML file would contain

```
PACKING:
  - VALUE: CHANNELS_PACKED
    SUFFIX: C_packed
  - VALUE: WIDTH_PACKED
    SUFFIX: W_packed
  - VALUE: HEIGHT_PACKED
    SUFFIX: H_packed
```

However, the shader code generation script uses the `VALUE` as the `SUFFIX` if no `SUFFIX` is provided. Therefore, only the below is needed:

```
PACKING:
  - VALUE: C_packed
  - VALUE: W_packed
  - VALUE: H_packed
```

### Change indexing utility macros to lowercase

Indexing utility macros have been changed to lowercase, and the packing identifiers have been changed due to the change in YAML files. The change to lowercase makes calls to the macros read more like functions (and indeed they are typically used as functions), which helps make the code more readable.

```
POS_TO_COORD_${PACKING} -> pos_to_coord_${PACKING}
```

### Use convention of defining macros in order to reduce Python code block usage

Previously, Python code blocks were used in the GLSL code itself in order to vary the shader between different settings. However, usage of Python code blocks negatively impacts code readability. Therefore, this diff seeks to introduce a convention of defining macros near the top of the shader to reduce the usage of Python code blocks, i.e.

```
#define pos_to_coord pos_to_coord_${PACKING}
#define get_packed_dim get_packed_dim_${PACKING}
#define get_packed_stride get_packed_stride_${PACKING}
```

### Improve GLSL type definitions

Previously, the following Python code blocks were used to determine the appropriate vectorized and scalar types:

```
${VEC4_T[DTYPE]} texel = ...
${T[DTYPE]} scalar = ...
```

This changeset replaces that with:

```
#define BUF_T ${buffer_scalar_type(DTYPE)}
#define VEC4_T ${texel_type(DTYPE)}
#define SCALAR_T ${texel_component_type(DTYPE)}

layout(set = 0, binding = 1) buffer PRECISION restrict readonly Buffer {
  BUF_T data[];
}
buffer_in;

VEC4_T texel = ...
SCALAR_T scalar = ...
```

The main differences are as follows:

* `buffer_scalar_type()` produces the same result as `T[DTYPE]`
* `texel_type()` is not determined from a mapping with `DTYPE`, but is determined indirectly based on the image format that is associated with the `DTYPE`
* `texel_component_type()` is based on the result of `texel_type(DTYPE)`

Essentially, the mapping is more in line with what happens in code.

The reason for this change is to enable FP16 support, and it is a bit complicated. Basically, we need a way to distinguish the scalar type used for buffer storage from the scalar type used to store a component of a vec4 type (hence `BUF_T` vs. `SCALAR_T`). This is required because, to support half-precision tensors, the buffer representation will use a 16-bit float type, but textures will still extract to `vec4` (i.e. 4x 32-bit floats).

Differential Revision: [D56082461](https://our.internmc.facebook.com/intern/diff/D56082461/)

[ghstack-poisoned]
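To make the three type helpers concrete, here is a minimal Python sketch of the mapping they implement. The dtype and image-format tables below are illustrative assumptions, not the actual tables in `gen_vulkan_spv.py`:

```python
# Illustrative sketch only: the dtype/image-format tables here are assumed
# for demonstration and are not the authoritative ones in gen_vulkan_spv.py.
def buffer_scalar_type(dtype: str) -> str:
    # Scalar type used for buffer storage; half tensors can use a true
    # 16-bit float type in buffers (hence BUF_T vs SCALAR_T).
    return {"half": "float16_t", "float": "float", "int": "int"}[dtype]

def texel_type(dtype: str) -> str:
    # Determined indirectly via the image format associated with the dtype;
    # a half texture still extracts to a full-precision vec4.
    image_format = {"half": "rgba16f", "float": "rgba32f", "int": "rgba32i"}[dtype]
    return {"rgba16f": "vec4", "rgba32f": "vec4", "rgba32i": "ivec4"}[image_format]

def texel_component_type(dtype: str) -> str:
    # Derived from the result of texel_type: the scalar component of the vec4.
    return {"vec4": "float", "ivec4": "int"}[texel_type(dtype)]
```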
2 parents 2d454a6 + 858d6fa

File tree

14 files changed: +47 -24 lines


.ci/docker/ci_commit_pins/pytorch.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-0a038cf0cff2d071b7359ac0491fd2ba7798a438
+868e5ced5df34f1aef3703654f76e03f5126b534
```

backends/vulkan/runtime/api/Adapter.cpp

Lines changed: 1 addition & 2 deletions
```diff
@@ -401,8 +401,7 @@ std::string Adapter::stringize() const {
   ss << " Memory Info {" << std::endl;
   ss << " Memory Types [" << std::endl;
   for (size_t i = 0; i < mem_props.memoryTypeCount; ++i) {
-    ss << " "
-       << " [Heap " << mem_props.memoryTypes[i].heapIndex << "] "
+    ss << " " << " [Heap " << mem_props.memoryTypes[i].heapIndex << "] "
        << get_memory_properties_str(mem_props.memoryTypes[i].propertyFlags)
        << std::endl;
   }
```

backends/vulkan/runtime/api/gen_vulkan_spv.py

Lines changed: 16 additions & 0 deletions
```diff
@@ -98,6 +98,15 @@
 }
 
 
+def define_variable(name: str) -> str:
+    if name in locals():
+        return f"#define {name} {locals()[name]}"
+    elif name in globals():
+        return f"#define {name} {globals()[name]}"
+    else:
+        raise RuntimeError(f"{name} is not defined")
+
+
 def get_buffer_scalar_type(dtype: str) -> str:
     # TODO(ssjia): use float16_t for half types
     if dtype == "half":
@@ -120,6 +129,11 @@ def get_texel_type(dtype: str) -> str:
     raise AssertionError(f"Invalid image format: {image_format}")
 
 
+def get_gvec_type(dtype: str, n: int) -> str:
+    gvec4_type = get_texel_type(dtype)
+    return gvec4_type[:-1] + str(n)
+
+
 def get_texel_component_type(dtype: str) -> str:
     vec4_type = get_texel_type(dtype)
     if vec4_type[:3] == "vec":
@@ -132,12 +146,14 @@ def get_texel_component_type(dtype: str) -> str:
 
 
 UTILITY_FNS: Dict[str, Any] = {
+    "macro_define": define_variable,
     "get_pos": {
         3: lambda pos: pos,
         2: lambda pos: f"{pos}.xy",
     },
     "buffer_scalar_type": get_buffer_scalar_type,
     "texel_type": get_texel_type,
+    "gvec_type": get_gvec_type,
     "texel_component_type": get_texel_component_type,
 }
 
```
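The new `get_gvec_type` helper simply rewrites the component count of the 4-component texel type. A minimal Python sketch, assuming an illustrative texel-type mapping rather than the real one:

```python
# Sketch of get_gvec_type; the texel-type table is an assumption for
# illustration, not the real mapping in gen_vulkan_spv.py.
def get_texel_type(dtype: str) -> str:
    return {"float": "vec4", "half": "vec4", "int": "ivec4"}[dtype]

def get_gvec_type(dtype: str, n: int) -> str:
    # Drop the trailing "4" from the texel type and append n,
    # e.g. "ivec4" -> "ivec3".
    gvec4_type = get_texel_type(dtype)
    return gvec4_type[:-1] + str(n)

assert get_gvec_type("int", 3) == "ivec3"
assert get_gvec_type("float", 2) == "vec2"
```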

backends/vulkan/runtime/graph/ops/OperatorRegistry.cpp

Lines changed: 3 additions & 1 deletion
```diff
@@ -16,7 +16,9 @@ bool OperatorRegistry::has_op(const std::string& name) {
 
 OperatorRegistry::OpFunction& OperatorRegistry::get_op_fn(
     const std::string& name) {
-  return table_.find(name)->second;
+  const auto it = table_.find(name);
+  VK_CHECK_COND(it != table_.end(), "Could not find operator with name ", name);
+  return it->second;
 }
 
 void OperatorRegistry::register_op(const std::string& name, OpFunction& fn) {
```
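The old code dereferenced the result of `table_.find(name)` without checking for a miss, which is undefined behavior when the operator is absent. A Python sketch of the guarded-lookup pattern the new code adopts (names here are illustrative; the real code reports the failure through `VK_CHECK_COND`):

```python
# Illustrative analogue of the guarded lookup added above: fail loudly with
# the operator name instead of dereferencing a missing entry.
def get_op_fn(table: dict, name: str):
    fn = table.get(name)
    if fn is None:
        raise RuntimeError(f"Could not find operator with name {name}")
    return fn
```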

backends/vulkan/runtime/graph/ops/glsl/binary_op.glsl

Lines changed: 4 additions & 2 deletions
```diff
@@ -9,9 +9,11 @@
 #version 450 core
 
 #define PRECISION ${PRECISION}
+
 #define op(X, Y, A) ${OPERATOR}
 
 #define VEC4_T ${texel_type(DTYPE)}
+
 #define to_tensor_idx to_tensor_idx_${PACKING}
 #define to_texture_pos to_texture_pos_${PACKING}
 
@@ -59,13 +61,13 @@ void main() {
     return;
   }
 
-  ivec4 in_idx = broadcast(idx, in_sizes.data);
+  ivec4 in_idx = broadcast_indices(idx, in_sizes.data);
   VEC4_T in_texel = VEC4_T(texelFetch(
       image_in,
       to_texture_pos(in_idx, in_sizes.data),
       0));
 
-  ivec4 other_idx = broadcast(idx, other_sizes.data);
+  ivec4 other_idx = broadcast_indices(idx, other_sizes.data);
   VEC4_T other_texel = VEC4_T(texelFetch(
       image_other,
       to_texture_pos(other_idx, other_sizes.data),
```

backends/vulkan/runtime/graph/ops/glsl/broadcasting_utils.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@
  * LICENSE file in the root directory of this source tree.
  */
 
-ivec4 broadcast(const ivec4 out_idx, const ivec4 in_sizes) {
+ivec4 broadcast_indices(const ivec4 out_idx, const ivec4 in_sizes) {
   ivec4 in_idx = out_idx;
   for (int i = 0; i < 4; ++i) {
     if (out_idx[i] >= in_sizes[i]) {
```
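For intuition, here is a Python sketch of what the renamed `broadcast_indices` computes. The loop body is truncated in the hunk above; the sketch assumes it zeroes the index along broadcast (size-1) dimensions, per standard broadcasting semantics:

```python
# Python sketch of broadcast_indices: map an output tensor index back onto an
# input whose sizes may be 1 along broadcast dimensions. Assumes the elided
# GLSL loop body sets the index to 0 in that case.
def broadcast_indices(out_idx, in_sizes):
    return [0 if out_idx[i] >= in_sizes[i] else out_idx[i] for i in range(4)]

# e.g. reading a {1, 3, 1, 8} input at the position of a {2, 3, 4, 8} output:
assert broadcast_indices([1, 2, 3, 5], [1, 3, 1, 8]) == [0, 2, 0, 5]
```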

backends/vulkan/runtime/graph/ops/glsl/conv2d.glsl

Lines changed: 4 additions & 4 deletions
```diff
@@ -71,10 +71,10 @@ void main() {
   ivec2 kstart = (start - ipos) / params.dilation;
   // During prepacking, the weight tensor was rearranged in order to optimize
   // for data access linearity in this shader. Therefore we need to adjust the
-  // canonical idxinates to the corresponding index in the rearranged weight
-  // tensor. The x-idxinate is multipled by 4 since each group of 4 channels
-  // is folded into the X axis. The y-idxinate is offset based on the z-
-  // idxinate because the 2D planes were stacked atop each other vertically.
+  // canonical coordinates to the corresponding index in the rearranged weight
+  // tensor. The x-coordinate is multipled by 4 since each group of 4 channels
+  // is folded into the X axis. The y-coordinate is offset based on the z-
+  // coordinate because the 2D planes were stacked atop each other vertically.
   kstart.x *= 4;
   kstart.y += pos.z * params.kernel_size.y;
 
```

backends/vulkan/runtime/graph/ops/glsl/indexing_utils.h

Lines changed: 5 additions & 5 deletions
```diff
@@ -43,11 +43,11 @@
 // describe sizes. As an example, let's say we want to swap dimensions 0,1 for a
 // tensor of shape {4,3,2,24} to obtain {3,4,2,24}. Then, x=4, y=3 and
 // plane=2*24=48.
-#define swap_adj_dims(cur, x, y, plane) \
-  cur + \
-      plane*( \
-          (1 - y) * ((cur % (x * y * plane)) / (y * plane)) + \
-          (x - 1) * ((cur % (y * plane)) / plane))
+#define swap_adj_dims(cur, x, y, plane) \
+  cur + \
+      plane * \
+          ((1 - y) * ((cur % (x * y * plane)) / (y * plane)) + \
+           (x - 1) * ((cur % (y * plane)) / plane))
 
 // Kept for backwards compatibility
 // TODO(ssjia): remove once there are no shaders that use these macros
```
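The macro is easiest to sanity-check against the worked example in its comment (swapping dims 0,1 of a {4,3,2,24} tensor, so x=4, y=3, plane=2*24=48). A Python sketch that mirrors the macro and verifies the index arithmetic exhaustively for that case:

```python
# Python mirror of the swap_adj_dims macro above, used here only to verify
# the arithmetic; integer division matches GLSL's / on non-negative ints.
def swap_adj_dims(cur: int, x: int, y: int, plane: int) -> int:
    return cur + plane * (
        (1 - y) * ((cur % (x * y * plane)) // (y * plane))
        + (x - 1) * ((cur % (y * plane)) // plane)
    )

x, y, plane = 4, 3, 48
# An element at (i, j) in the leading two dims sits at i*y*plane + j*plane + k;
# after swapping those dims it must land at j*x*plane + i*plane + k.
for i in range(x):
    for j in range(y):
        for k in range(plane):
            cur = i * y * plane + j * plane + k
            assert swap_adj_dims(cur, x, y, plane) == j * x * plane + i * plane + k
```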

docs/README.md

Lines changed: 4 additions & 0 deletions
````diff
@@ -57,7 +57,11 @@ To build the documentation locally:
    ```bash
    pip3 install -r ./.ci/docker/requirements-ci.txt
    ```
+1. Update submodules
 
+   ```bash
+   git submodule sync && git submodule update --init
+   ```
 1. Run:
 
    ```bash
````

examples/models/llama2/runner/runner.cpp

Lines changed: 1 addition & 2 deletions
```diff
@@ -472,8 +472,7 @@ std::string statsToJsonString(const Runner::Stats& stats) {
      << "\"prompt_eval_end_ms\":" << stats.prompt_eval_end_ms << ","
      << "\"first_token_ms\":" << stats.first_token_ms << ","
      << "\"aggregate_sampling_time_ms\":" << stats.aggregate_sampling_time_ms
-     << ","
-     << "\"SCALING_FACTOR_UNITS_PER_SECOND\":"
+     << "," << "\"SCALING_FACTOR_UNITS_PER_SECOND\":"
      << stats.SCALING_FACTOR_UNITS_PER_SECOND << "}";
   return ss.str();
 }
```

kernels/portable/cpu/op_cumsum.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -11,8 +11,8 @@
 #include <executorch/runtime/platform/assert.h>
 #include <cmath>
 #include <cstddef>
-//#include <cstdint>
-//#include <type_traits>
+// #include <cstdint>
+// #include <type_traits>
 
 namespace torch {
 namespace executor {
```

runtime/core/portable_type/optional.h

Lines changed: 2 additions & 2 deletions
```diff
@@ -74,8 +74,8 @@ class optional final {
   }
 
   optional& operator=(optional&& rhs) noexcept(
-      std::is_nothrow_move_assignable<T>::value&&
-      std::is_nothrow_move_constructible<T>::value) {
+      std::is_nothrow_move_assignable<T>::value &&
+      std::is_nothrow_move_constructible<T>::value) {
     if (init_ && !rhs.init_) {
       clear();
     } else if (!init_ && rhs.init_) {
```

sdk/etdump/etdump_flatcc.cpp

Lines changed: 2 additions & 1 deletion
```diff
@@ -103,7 +103,8 @@ ETDumpGen::ETDumpGen(Span<uint8_t> buffer) {
     alloc.set_buffer(
         (uint8_t*)buffer_with_builder,
         buffer_size,
-        (size_t)((buffer_size / 4 > max_alloc_buf_size) ? max_alloc_buf_size : buffer_size / 4));
+        (size_t)((buffer_size / 4 > max_alloc_buf_size) ? max_alloc_buf_size
+                                                        : buffer_size / 4));
     et_flatcc_custom_init(builder, &alloc);
   } else {
     builder = (struct flatcc_builder*)malloc(sizeof(struct flatcc_builder));
```

third-party/pytorch

Submodule pytorch updated 589 files
