-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libc] Add Timing Utils for AMDGPU #96828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
add_header_library( | ||
amdgpu_timing | ||
HDRS | ||
timing.h | ||
DEPENDS | ||
libc.src.__support.common | ||
) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
//===------------- AMDGPU implementation of timing utils --------*- C++ -*-===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#ifndef LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU | ||
#define LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU | ||
|
||
#include "src/__support/GPU/utils.h" | ||
#include "src/__support/common.h" | ||
#include "src/__support/macros/attributes.h" | ||
#include "src/__support/macros/config.h" | ||
|
||
#include <stdint.h> | ||
|
||
namespace LIBC_NAMESPACE { | ||
|
||
// Returns the overhead associated with calling the profiling region. This | ||
// allows us to substract the constant-time overhead from the latency to | ||
// obtain a true result. This can vary with system load. | ||
[[gnu::noinline]] static LIBC_INLINE uint64_t overhead() { | ||
gpu::memory_fence(); | ||
uint64_t start = gpu::processor_clock(); | ||
uint32_t result = 0.0; | ||
asm("v_or_b32 %[v_reg], 0, %[v_reg]\n" ::[v_reg] "v"(result) :); | ||
asm("" ::"s"(start)); | ||
uint64_t stop = gpu::processor_clock(); | ||
return stop - start; | ||
} | ||
|
||
// Profile a simple function and obtain its latency in clock cycles on the | ||
// system. This function cannot be inlined or else it will disturb the very | ||
// delicate balance of hard-coded dependencies. | ||
template <typename F, typename T> | ||
[[gnu::noinline]] static LIBC_INLINE uint64_t latency(F f, T t) { | ||
// We need to store the input somewhere to guarantee that the compiler will | ||
// not constant propagate it and remove the profiling region. | ||
volatile uint32_t storage = t; | ||
float arg = storage; | ||
asm("" ::"s"(arg)); | ||
|
||
// The AMDGPU architecture needs to wait on pending results. | ||
gpu::memory_fence(); | ||
// Get the current timestamp from the clock. | ||
uint64_t start = gpu::processor_clock(); | ||
|
||
// This forces the compiler to load the input argument and run the clock cycle | ||
// counter before the profiling region. | ||
asm("" ::"s"(arg), "s"(start)); | ||
|
||
// Run the function under test and return its value. | ||
auto result = f(arg); | ||
|
||
// This inline assembly performs a no-op which forces the result to both be | ||
// used and prevents us from exiting this region before it's complete. | ||
asm("v_or_b32 %[v_reg], 0, %[v_reg]\n" ::[v_reg] "v"(result) :); | ||
|
||
// Obtain the current timestamp after running the calculation and force | ||
// ordering. | ||
uint64_t stop = gpu::processor_clock(); | ||
asm("" ::"s"(stop)); | ||
gpu::memory_fence(); | ||
|
||
// Return the time elapsed. | ||
return stop - start; | ||
} | ||
|
||
} // namespace LIBC_NAMESPACE | ||
|
||
#endif // LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
T
, notfloat
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we need versions with more than one input, right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many do you think we need? I think I remember you mentioned 6 inputs at max - is that still correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a guess, for now at least two, we can add more as needed. Maybe I just don't know enough C++ magic to automate it since the inputs can be potentially different types. (Plus all that C++ magic needs to exist in the
libc
project)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there problems with trying to link when it's generic? I didn't notice before but trying to use
T
causeslld
to throw errors such aserror: couldn't allocate input reg for constraint 'r'
(And for thes
asm input constraint) when I switched fromuint64_t
toT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific constraint that generics need to use, I tried a couple like
v
andSg
butlld
still threw errors - am I missing a step that we need to work with generics since the size isn't known for sure?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never use the r constraint. We should just error on it.
GCC went and added a bunch of constraints on their own. We don't support all of the ones there.
What do you mean by generics? The constraints don't care what the type is (other than maybe for the weird immediate cases), they just need to be scalar or vector. The SIMD analog doesn't actually work, these shouldn't be thought of as SIMD vectors vs. scalars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the short answer is 99% of the time you want
"v"
. There's definitely a lot of weirdness so it's not exactly, VGPRs are more like a big block of resource that can be assigned. I think you can even unassign a VGPR during execution? I should probably find a concise way to explain the magic of SIMT and hardware.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was mixing this up with something else. I think I meant to ask if there are certain conditions that need to be met for these assembly constraints when using templates.
lld
throws an error that it can't allocate the input register for thev
constraint for my templated local. Is it because the compiler doesn't know how much space the variable could take up?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be a legal, handled type which is about it