Fix BinaryOp broadcasting for packed dim #2653

Closed
jorgep31415 wants to merge 1 commit

Conversation

@jorgep31415 (Contributor) commented Mar 25, 2024

Summary:
As @copyrightly pointed out, broadcasting was not working properly for the example below. I root caused it to confusion between `sizes()` vs `gpu_sizes()` once again! These concepts are explained in #2520.

We should use the CPU size, not the GPU size, to detect when to broadcast across the packed-dim texel's elements.

# Example

Given inputs torch.ones(2, 3) and torch.ones(2, 1) and GPUMemoryLayout::WIDTH_PACKED, we have CPU widths 3 and 1, respectively. These are aligned up to GPU widths 4 and 4, and hence we were failing to broadcast along the packed-dim texel's elements.
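
For concreteness, here is a minimal Python sketch of the detection change. It is illustrative only: `align_up`, the variable names, and the exact comparison are assumptions for this sketch, not the actual Vulkan BinaryOp code.
```python
# Illustrative sketch, not the ExecuTorch Vulkan implementation.
# WIDTH_PACKED stores 4 width-elements per texel, so the GPU width is the
# CPU width aligned up to a multiple of 4.

def align_up(x: int, multiple: int = 4) -> int:
    """Round x up to the next multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple

cpu_w_self, cpu_w_other = 3, 1   # widths of torch.ones(2, 3) and torch.ones(2, 1)
gpu_w_self, gpu_w_other = align_up(cpu_w_self), align_up(cpu_w_other)  # 4, 4

# Buggy detection: both GPU widths are padded to 4, so no broadcast is seen.
broadcast_buggy = gpu_w_other == 1 and gpu_w_self > 1  # False
# Fixed detection: CPU widths 1 vs 3 correctly trigger the packed-dim broadcast.
broadcast_fixed = cpu_w_other == 1 and cpu_w_self > 1  # True
print(broadcast_buggy, broadcast_fixed)                # False True
```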

## torch.ones(2, 3)
```
(2, 3) = (H, W) = sizes
[[1 1 1]
 [1 1 1]]
-> (W, H) = (3, 2) → (4, 2) = gpu_sizes
-> extents = (1, 2)
[1 1 1 0] [1 1 1 0]
```

## torch.ones(2, 1)
```
(2, 1) = (H, W) = sizes
[[1]
 [1]]
-> (W, H) = (1, 2) → (4, 2) = gpu_sizes
-> extents = (1, 2)
[1 0 0 0] [1 0 0 0]
-> (broadcast from this change)
[1 1 1 1] [1 1 1 1]
```

## torch.ones(2, 3) + torch.ones(2, 1)
Ignore the final element of each texel as it's just padding we never read.
```
No broadcast:
[1 1 1 0] [1 1 1 0] + [1 0 0 0] [1 0 0 0] = [2 1 1 0] [2 1 1 0]

Broadcast:
[1 1 1 0] [1 1 1 0] + [1 1 1 1] [1 1 1 1] = [2 2 2 1] [2 2 2 1]
```
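
As a sanity check, the texel arithmetic above can be simulated with a few lines of NumPy. This is only a model of the WIDTH_PACKED layout (the `to_width_packed` helper and the broadcast step are assumptions for illustration), showing that the CPU-size-based broadcast reproduces PyTorch's own result:
```python
import numpy as np
import torch

def to_width_packed(t: torch.Tensor, texel: int = 4) -> np.ndarray:
    """Pad the width (last dim) up to a multiple of `texel` with zeros."""
    h, w = t.shape
    packed = np.zeros((h, ((w + texel - 1) // texel) * texel), dtype=np.float32)
    packed[:, :w] = t.numpy()
    return packed

a, b = torch.ones(2, 3), torch.ones(2, 1)
pa = to_width_packed(a)        # [[1 1 1 0], [1 1 1 0]]
pb = to_width_packed(b)        # [[1 0 0 0], [1 0 0 0]]

# The fix: broadcast across the packed dim when the CPU width is 1.
if b.shape[-1] == 1:
    pb[:] = pb[:, :1]          # [[1 1 1 1], [1 1 1 1]]

packed_sum = pa + pb           # [[2 2 2 1], [2 2 2 1]]
# Only the first 3 columns are real data; the 4th is padding we never read.
assert np.array_equal(packed_sum[:, :3], (a + b).numpy())
```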

# Cleanup

Remove unneeded `check_broadcastable()` since this is caught earlier in the PyTorch compiler pipeline. For example, `torch.ones(2, 3) + torch.ones(2, 2)` triggers this error:

```
TorchRuntimeError: Failed running call_function <built-in function add>(*(FakeTensor(..., size=(2, 3)), FakeTensor(..., size=(2, 2))), **{}):
Attempting to broadcast a dimension of length 2 at -1! Mismatching argument at index 1 had torch.Size([2, 2]); but expected shape should be broadcastable to [2, 3]
```
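
To see the upstream check in action, here is a quick eager-mode repro. It runs outside the ExecuTorch/Vulkan flow, so the error text differs from the TorchRuntimeError above, but the shapes are rejected either way:
```python
import torch

# Non-broadcastable shapes never reach a backend; PyTorch rejects them first.
try:
    torch.ones(2, 3) + torch.ones(2, 2)
except RuntimeError as e:
    # e.g. "The size of tensor a (3) must match the size of tensor b (2)
    # at non-singleton dimension 1"
    print(e)
```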

Differential Revision: D55278527

pytorch-bot bot commented Mar 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2653

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0d2f6f7 with merge base a3bf63b:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 25, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D55278527

jorgep31415 added a commit to jorgep31415/executorch that referenced this pull request Mar 25, 2024
jorgep31415 added a commit to jorgep31415/executorch that referenced this pull request Mar 25, 2024
@SS-JIA self-requested a review March 27, 2024 22:13
@facebook-github-bot (Contributor)

This pull request has been merged in 25c5b67.

Labels: CLA Signed, fb-exported, Merged

3 participants