
Commit 1d74652

Michael Gschwind authored and facebook-github-bot committed

Resolve recurring errors where query is c10::Half and key and value float

Summary: Resolve recurring errors where query is c10::Half and key and value are float. This should ideally work from first principles, but somehow it does not. We need to fix this, but in the meantime this ugly hack will enable us to proceed and allow others to debug other aspects of ET lowering.

Reviewed By: mavlyutovr

Differential Revision: D54167581

fbshipit-source-id: 6cc4e76e3abbf107014b5b9da00e817ee3b2ab03
1 parent 566528f commit 1d74652

File tree

1 file changed: +4 −0 lines changed

examples/models/llama2/model.py

Lines changed: 4 additions & 0 deletions
@@ -262,6 +262,10 @@ def forward(
         # tensor will be 2-dimensional, regardless of the values of L & S
         mask = torch.squeeze(mask, [0, 1])
 
+        # FIXME: This should be so automatically! MKG
+        keys = keys.to(dtype=xq.dtype)
+        values = values.to(dtype=xq.dtype)
+
         output = F.scaled_dot_product_attention(
             xq, keys, values, attn_mask=mask, dropout_p=0.0
         )
