TOSA: Fold clamp after cast if dynamic range allows #89

Closed
wants to merge 1 commit into from

Conversation

mgehre-amd
Collaborator

No description provided.

@mgehre-amd mgehre-amd requested a review from TinaAMD January 10, 2024 21:35
@TinaAMD TinaAMD left a comment

Very narrow, but looks fine to me.

OpFoldResult ClampOp::fold(FoldAdaptor adaptor) {
// TODO: This can generalize to any cast (or other operation)
// where the output values are within a computable range.
if (auto cast = llvm::dyn_cast_or_null<CastOp>(getInput().getDefiningOp())) {

Please add a comment explaining why this fold is valid, e.g. that the clamp range is at least as wide as the value range of the type we just cast from, so the clamp can never change a value.
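A minimal standalone sketch of the reasoning such a comment could capture, assuming the cast converts from an integer type of at most 32 bits to a floating-point type. The names `IntRange`, `integerRange`, and `clampIsRedundantAfterCast` are hypothetical illustrations, not code from this PR or from MLIR:

```cpp
#include <cmath>

// Hypothetical helper type: the closed value range of an integer type.
struct IntRange {
  double lo;
  double hi;
};

// Value range of an integer type with the given bit width (assumed 1..32
// here, so both bounds are exactly representable as double).
IntRange integerRange(int width, bool isSigned) {
  if (isSigned)
    return {-std::ldexp(1.0, width - 1), std::ldexp(1.0, width - 1) - 1.0};
  return {0.0, std::ldexp(1.0, width) - 1.0};
}

// A clamp to [clampMin, clampMax] applied to the result of an int-to-float
// cast cannot change any value if the clamp bounds already enclose every
// value the source integer type can hold. In that case the clamp could be
// folded to its input.
bool clampIsRedundantAfterCast(int srcWidth, bool srcSigned, double clampMin,
                               double clampMax) {
  IntRange r = integerRange(srcWidth, srcSigned);
  return clampMin <= r.lo && r.hi <= clampMax;
}
```

For example, under these assumptions a clamp to [-128, 127] after a cast from a signed 8-bit integer would be redundant, while the same clamp after a cast from a signed 16-bit integer would not.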

// TODO: This can generalize to any cast (or other operation)
// where the output values are within a computable range.
if (auto cast = llvm::dyn_cast_or_null<CastOp>(getInput().getDefiningOp())) {
if (cast.getType().getElementType().isF32() &&

Can we allow all of {fp16_t, bf16_t, fp32_t}?

@mgehre-amd mgehre-amd closed this Feb 28, 2024
mgehre-amd pushed a commit that referenced this pull request Mar 11, 2024
This PR adds support for thread names in lldb on Windows.

```
(lldb) thr list
Process 2960 stopped
  thread #53: tid = 0x03a0, 0x00007ff84582db34 ntdll.dll`NtWaitForMultipleObjects + 20
  thread #29: tid = 0x04ec, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'SPUW.6'
  thread #89: tid = 0x057c, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x1000019] physics[main]'
  thread #3: tid = 0x0648, 0x00007ff843c2cafe combase.dll`InternalDoATClassCreate + 39518
  thread #93: tid = 0x0688, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x100501d] uMovie::StreamingThread'
  thread #1: tid = 0x087c, 0x00007ff842e7a104 win32u.dll`NtUserMsgWaitForMultipleObjectsEx + 20
  thread #96: tid = 0x0890, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x1002020] HLE Video Decoder'
<...>
```
@mgehre-amd mgehre-amd deleted the matthias.fold_cast_clamp branch September 3, 2024 07:22