Update on "[ET-VK][int4] Wrap int4 linear calls with view_copy nodes to squeeze/unsqueeze inputs"

Nathanael See · commit 4eeea39109f4 · 2025-02-05T14:19:36.000-08:00
At torch.export graph tracing time, the inputs of full-precision linear/mm nodes are squeezed/unsqueezed automatically, but this is not done for the int4 op. The new pass adds view_copy nodes around int4 linear calls rather than squeeze/unsqueeze nodes directly, because subsequent passes can fuse redundant view_copy nodes and convert the remaining view_copy nodes to squeeze/unsqueeze nodes.

Differential Revision: [D69065866](https://our.internmc.facebook.com/intern/diff/D69065866/) [ghstack-poisoned]
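For reviewers, here is a rough sketch of the shape bookkeeping such a wrapping pass implies. It is not the code in `squeeze_int4_linear_inputs.py`: the helper names `squeezable` and `plan_views` are hypothetical, and the squeezability rule (a 3-D input with a leading dim of 1) is an assumption, since only the signature of `_squeezable` is visible in this diff. It uses plain Python lists instead of graph nodes so it runs without torch.

```python
from typing import List, Tuple


def squeezable(shape: List[int]) -> bool:
    # Assumption: only a 3-D input with a leading dim of 1 is squeezed.
    return len(shape) == 3 and shape[0] == 1


def plan_views(
    input_shape: List[int], out_features: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return (shape fed to the int4 linear op, shape the op produces,
    shape restored for downstream consumers)."""
    if not squeezable(input_shape):
        # No wrapping needed: the op sees the input as-is.
        out = input_shape[:-1] + [out_features]
        return input_shape, out, out

    # view_copy before the op: [1, M, K] -> [M, K]
    squeezed_in = input_shape[1:]
    # int4 linear on the 2-D input: [M, K] -> [M, N]
    linear_out = squeezed_in[:-1] + [out_features]
    # view_copy after the op: [M, N] -> [1, M, N]
    restored = [1] + linear_out
    return squeezed_in, linear_out, restored


if __name__ == "__main__":
    print(plan_views([1, 8, 64], 32))
    print(plan_views([8, 64], 32))
```

In this sketch the two reshapes are plain views, which is presumably why later passes can drop them when adjacent views cancel out or rewrite them as squeeze/unsqueeze.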
diff --git a/backends/vulkan/_passes/squeeze_int4_linear_inputs.py b/backends/vulkan/_passes/squeeze_int4_linear_inputs.py
@@ -62,4 +62,3 @@ def _squeezable(shape: List[int]) -> bool:
             kwargs,
             meta,
         )
-    
diff --git a/backends/vulkan/runtime/graph/ops/impl/QuantizedLinear.cpp b/backends/vulkan/runtime/graph/ops/impl/QuantizedLinear.cpp
@@ -352,8 +352,8 @@ void add_q_4w_linear_node(
       local_wg_size,
       // Inputs and Outputs
       {{out_W_packed, vkapi::MemoryAccessType::WRITE},
-       {{mat1_W_packed, mat2, scales_and_zeros}, 
-       vkapi::MemoryAccessType::READ}},
+       {{mat1_W_packed, mat2, scales_and_zeros},
+        vkapi::MemoryAccessType::READ}},
       // Shader params buffers
       ubos,
       // Specialization Constants
