
Commit aa58f62

Update on "[XNNPACK][Partitioner] SDPA Config"
We add the SDPA config for the partitioner here. There is currently an issue with SDPA when it is used from the FairSeq multihead attention models, so it is disabled in the base partitioner until that is resolved. Otherwise, our tests can exercise SDPA correctly from here. This is tracked in D60553559; will follow up on it later.

Differential Revision: [D60323285](https://our.internmc.facebook.com/intern/diff/D60323285/)

[ghstack-poisoned]
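For context, a minimal sketch (not from this diff; it only assumes the public PyTorch API) of the kind of SDPA call the partitioner config is meant to recognize. The optional `scale` kwarg is the same one the `op_sdpa` lowering below reads from `node.kwargs`:

```python
import torch
import torch.nn.functional as F

# Toy query/key/value tensors shaped (batch, heads, seq_len, embedding_dim).
q = torch.randn(1, 4, 16, 64)
k = torch.randn(1, 4, 16, 64)
v = torch.randn(1, 4, 16, 64)

# With no explicit scale, SDPA defaults to 1 / sqrt(embedding_dim).
out_default = F.scaled_dot_product_attention(q, k, v)

# An explicit scale kwarg overrides the default; the lowering below
# checks node.kwargs["scale"] for exactly this case.
out_scaled = F.scaled_dot_product_attention(q, k, v, scale=0.125)
```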
1 parent 94892f6 commit aa58f62

1 file changed: +3 −3


backends/xnnpack/operators/op_sdpa.py

Lines changed: 3 additions & 3 deletions
@@ -66,12 +66,12 @@ def define_node(
 
         # Hack to broadcast the scale
         q_shape = get_shape(get_input_node(node, 0))
-        C = q_shape[-1]
-        scale = 1 / (C**0.5)
+        embedding_dim = q_shape[-1]
+        scale = 1 / (embedding_dim**0.5)
         if "scale" in node.kwargs and node.kwargs["scale"]:
             scale = node.kwargs["scale"]
 
-        t = torch.full((C,), scale, dtype=mask_dtype)
+        t = torch.full((embedding_dim,), scale, dtype=mask_dtype)
         scale_node = self.get_fake_attr("scale", t)
         self.define_tensor(
             scale_node,
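As a standalone illustration of what the renamed lines compute, here is a minimal sketch. `broadcast_scale` is a hypothetical helper invented only for this example; the real code runs inside `define_node` with `get_shape`, `get_input_node`, and the visitor state, which are not reproduced here.

```python
import torch

def broadcast_scale(q_shape, mask_dtype, explicit_scale=None):
    # Hypothetical helper, for illustration only.
    # The default SDPA scale is 1 / sqrt(embedding_dim), where
    # embedding_dim is the last dimension of the query tensor.
    embedding_dim = q_shape[-1]
    scale = explicit_scale if explicit_scale else 1 / (embedding_dim**0.5)
    # "Hack to broadcast the scale": materialize it as a 1-D tensor of
    # length embedding_dim so it can be defined as a constant tensor input.
    return torch.full((embedding_dim,), scale, dtype=mask_dtype)

t = broadcast_scale((1, 4, 16, 64), torch.float32)          # scale = 1/8
t_explicit = broadcast_scale((1, 4, 16, 64), torch.float32, 0.125)
```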
