Fix BinaryOp broadcasting for packed dim #2653

Closed
jorgep31415 wants to merge 1 commit

Conversation

@jorgep31415 (Contributor) commented Mar 25, 2024

Summary:
As @copyrightly pointed out, broadcasting was not working properly for the example below. I root caused it to confusion between `sizes()` vs `gpu_sizes()` once again! These concepts are explained in #2520.

We should use the CPU size, not the GPU size, to detect when to broadcast across the packed-dim texel's elements.

# Example

Given inputs torch.ones(2, 3) and torch.ones(2, 1) and GPUMemoryLayout::WIDTH_PACKED, we have CPU widths 3 and 1, respectively. These are aligned up to GPU widths 4 and 4, and hence we were failing to broadcast along the packed-dim texel's elements.
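
For concreteness, here is a minimal Python sketch of the detection change. It is illustrative only: `align_up`, the variable names, and the exact comparison are assumptions for this sketch, not the actual Vulkan BinaryOp code.
```python
# Illustrative sketch, not the ExecuTorch Vulkan implementation.
# WIDTH_PACKED stores 4 width-elements per texel, so the GPU width is the
# CPU width aligned up to a multiple of 4.

def align_up(x: int, multiple: int = 4) -> int:
    """Round x up to the next multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple

cpu_w_self, cpu_w_other = 3, 1   # widths of torch.ones(2, 3) and torch.ones(2, 1)
gpu_w_self, gpu_w_other = align_up(cpu_w_self), align_up(cpu_w_other)  # 4, 4

# Buggy detection: both GPU widths are padded to 4, so no broadcast is seen.
broadcast_buggy = gpu_w_other == 1 and gpu_w_self > 1  # False
# Fixed detection: CPU widths 1 vs 3 correctly trigger the packed-dim broadcast.
broadcast_fixed = cpu_w_other == 1 and cpu_w_self > 1  # True
print(broadcast_buggy, broadcast_fixed)                # False True
```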

## torch.ones(2, 3)
```
(2, 3) = (H, W) = sizes
[[1 1 1]
 [1 1 1]]
-> (W, H) = (3, 2) → (4, 2) = gpu_sizes
-> extents = (1, 2)
[1 1 1 0] [1 1 1 0]
```

## torch.ones(2, 1)
```
(2, 1) = (H, W) = sizes
[[1]
 [1]]
-> (W, H) = (1, 2) → (4, 2) = gpu_sizes
-> extents = (1, 2)
[1 0 0 0] [1 0 0 0]
-> (broadcast from this change)
[1 1 1 1] [1 1 1 1]
```

## torch.ones(2, 3) + torch.ones(2, 1)
Ignore the final element of each texel as it's just padding we never read.
```
No broadcast:
[1 1 1 0] [1 1 1 0] + [1 0 0 0] [1 0 0 0] = [2 1 1 0] [2 1 1 0]

Broadcast:
[1 1 1 0] [1 1 1 0] + [1 1 1 1] [1 1 1 1] = [2 2 2 1] [2 2 2 1]
```
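
As a sanity check, the texel arithmetic above can be simulated with a few lines of NumPy. This is only a model of the WIDTH_PACKED layout (the `to_width_packed` helper and the broadcast step are assumptions for illustration), showing that the CPU-size-based broadcast reproduces PyTorch's own result:
```python
import numpy as np
import torch

def to_width_packed(t: torch.Tensor, texel: int = 4) -> np.ndarray:
    """Pad the width (last dim) up to a multiple of `texel` with zeros."""
    h, w = t.shape
    packed = np.zeros((h, ((w + texel - 1) // texel) * texel), dtype=np.float32)
    packed[:, :w] = t.numpy()
    return packed

a, b = torch.ones(2, 3), torch.ones(2, 1)
pa = to_width_packed(a)        # [[1 1 1 0], [1 1 1 0]]
pb = to_width_packed(b)        # [[1 0 0 0], [1 0 0 0]]

# The fix: broadcast across the packed dim when the CPU width is 1.
if b.shape[-1] == 1:
    pb[:] = pb[:, :1]          # [[1 1 1 1], [1 1 1 1]]

packed_sum = pa + pb           # [[2 2 2 1], [2 2 2 1]]
# Only the first 3 columns are real data; the 4th is padding we never read.
assert np.array_equal(packed_sum[:, :3], (a + b).numpy())
```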

# Cleanup

Remove unneeded `check_broadcastable()` since this is caught earlier in the PyTorch compiler pipeline. For example, `torch.ones(2, 3) + torch.ones(2, 2)` triggers this error:

```
TorchRuntimeError: Failed running call_function <built-in function add>(*(FakeTensor(..., size=(2, 3)), FakeTensor(..., size=(2, 2))), **{}):
Attempting to broadcast a dimension of length 2 at -1! Mismatching argument at index 1 had torch.Size([2, 2]); but expected shape should be broadcastable to [2, 3]
```
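
To see the upstream check in action, here is a quick eager-mode repro. It runs outside the ExecuTorch/Vulkan flow, so the error text differs from the TorchRuntimeError above, but the shapes are rejected either way:
```python
import torch

# Non-broadcastable shapes never reach a backend; PyTorch rejects them first.
try:
    torch.ones(2, 3) + torch.ones(2, 2)
except RuntimeError as e:
    # e.g. "The size of tensor a (3) must match the size of tensor b (2)
    # at non-singleton dimension 1"
    print(e)
```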

Differential Revision: D55278527

pytorch-bot bot commented Mar 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2653

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0d2f6f7 with merge base a3bf63b:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 25, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D55278527

jorgep31415 added a commit to jorgep31415/executorch that referenced this pull request Mar 25, 2024
jorgep31415 added a commit to jorgep31415/executorch that referenced this pull request Mar 25, 2024
@SS-JIA self-requested a review March 27, 2024 22:13
@facebook-github-bot (Contributor)

This pull request has been merged in 25c5b67.

Labels: CLA Signed, fb-exported, Merged

3 participants