[libc] Add Timing Utils for AMDGPU #96828

jameshu15869 · 2024-06-26T23:05:43Z

PR for adding AMDGPU timing utils for benchmarking.

I was not able to test this code because I do not have an AMD GPU, but I was able to compile it successfully by passing -DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_ARCHITECTURE=gfx90a -DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_LOADER_EXECUTABLE=echo -DRUNTIMES_amdgcn_amd-amdhsa_LIBC_GPU_TARGET_ARCHITECTURE=gfx90a, which forces the code to build without an AMD GPU on my machine.

@jhuber6

llvmbot · 2024-06-26T23:06:14Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-libc

Author: None (jameshu15869)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96828.diff

3 Files Affected:

(added) libc/benchmarks/gpu/timing/amdgpu/CMakeLists.txt (+7)
(added) libc/benchmarks/gpu/timing/amdgpu/timing.h (+73)
(modified) libc/benchmarks/gpu/timing/timing.h (+1-1)

diff --git a/libc/benchmarks/gpu/timing/amdgpu/CMakeLists.txt b/libc/benchmarks/gpu/timing/amdgpu/CMakeLists.txt
new file mode 100644
index 0000000000000..179429db9a09a
--- /dev/null
+++ b/libc/benchmarks/gpu/timing/amdgpu/CMakeLists.txt
@@ -0,0 +1,7 @@
+add_header_library(
+  amdgpu_timing
+  HDRS
+    timing.h
+  DEPENDS
+    libc.src.__support.common
+)
diff --git a/libc/benchmarks/gpu/timing/amdgpu/timing.h b/libc/benchmarks/gpu/timing/amdgpu/timing.h
new file mode 100644
index 0000000000000..3d13826ffee30
--- /dev/null
+++ b/libc/benchmarks/gpu/timing/amdgpu/timing.h
@@ -0,0 +1,73 @@
+//===------------- AMDGPU implementation of timing utils --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU
+#define LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU
+
+#include "src/__support/GPU/utils.h"
+#include "src/__support/common.h"
+#include "src/__support/macros/attributes.h"
+#include "src/__support/macros/config.h"
+
+#include <stdint.h>
+
+namespace LIBC_NAMESPACE {
+
+// Returns the overhead associated with calling the profiling region. This
+// allows us to subtract the constant-time overhead from the latency to
+// obtain a true result. This can vary with system load.
+[[gnu::noinline]] static LIBC_INLINE uint64_t overhead() {
+  gpu::memory_fence();
+  uint64_t start = gpu::processor_clock();
+  uint32_t result = 0;
+  asm("v_or_b32 %[v_reg], 0, %[v_reg]\n" ::[v_reg] "v"(result) :);
+  asm("" ::"s"(start));
+  uint64_t stop = gpu::processor_clock();
+  return stop - start;
+}
+
+// Profile a simple function and obtain its latency in clock cycles on the
+// system. This function cannot be inlined or else it will disturb the very
+// delicate balance of hard-coded dependencies.
+template <typename F, typename T>
+[[gnu::noinline]] static LIBC_INLINE uint64_t latency(F f, T t) {
+  // We need to store the input somewhere to guarantee that the compiler will
+  // not constant propagate it and remove the profiling region.
+  volatile uint32_t storage = t;
+  float arg = storage;
+  asm("" ::"s"(arg));
+
+  // The AMDGPU architecture needs to wait on pending results.
+  gpu::memory_fence();
+  // Get the current timestamp from the clock.
+  uint64_t start = gpu::processor_clock();
+
+  // This forces the compiler to load the input argument and run the clock cycle
+  // counter before the profiling region.
+  asm("" ::"s"(arg), "s"(start));
+
+  // Run the function under test and return its value.
+  auto result = f(arg);
+
+  // This inline assembly performs a no-op which forces the result to both be
+  // used and prevents us from exiting this region before it's complete.
+  asm("v_or_b32 %[v_reg], 0, %[v_reg]\n" ::[v_reg] "v"(result) :);
+
+  // Obtain the current timestamp after running the calculation and force
+  // ordering.
+  uint64_t stop = gpu::processor_clock();
+  asm("" ::"s"(stop));
+  gpu::memory_fence();
+
+  // Return the time elapsed.
+  return stop - start;
+}
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU
diff --git a/libc/benchmarks/gpu/timing/timing.h b/libc/benchmarks/gpu/timing/timing.h
index 180ea77954ae5..2e098feb4b3a5 100644
--- a/libc/benchmarks/gpu/timing/timing.h
+++ b/libc/benchmarks/gpu/timing/timing.h
@@ -12,7 +12,7 @@
 #include "src/__support/macros/properties/architectures.h"
 
 #if defined(LIBC_TARGET_ARCH_IS_AMDGPU)
-#error "amdgpu not yet supported"
+#include "amdgpu/timing.h"
 #elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
 #include "nvptx/timing.h"
 #else
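
The two helpers added in amdgpu/timing.h are meant to be used together: overhead() times an empty profiling region, and latency() times the same region with the function under test inside it. A minimal host-side sketch of that subtraction is shown below. It is an illustration under stated assumptions, not code from this PR: std::chrono::steady_clock stands in for gpu::processor_clock(), the AMDGPU fences and inline assembly are omitted, and the names host_clock, measure_overhead, measure_latency, and adjusted_latency are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical host-side stand-in for gpu::processor_clock().
inline uint64_t host_clock() {
  auto now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
}

// Time an empty region: roughly the cost of reading the clock twice.
inline uint64_t measure_overhead() {
  uint64_t start = host_clock();
  uint64_t stop = host_clock();
  return stop - start;
}

// Time one call of f(t). Routing the input and the result through volatile
// storage keeps the compiler from folding the call away.
template <typename F, typename T>
uint64_t measure_latency(F f, T t) {
  volatile T storage = t;
  T arg = storage;
  uint64_t start = host_clock();
  volatile auto result = f(arg);
  uint64_t stop = host_clock();
  (void)result;
  return stop - start;
}

// Subtract the constant overhead, clamping at zero because both
// measurements are noisy and an unsigned difference could otherwise wrap.
inline uint64_t adjusted_latency(uint64_t raw, uint64_t overhead) {
  return raw > overhead ? raw - overhead : 0;
}
```

The clamp in adjusted_latency is a design choice for this sketch: since the overhead "can vary with system load", a single overhead sample may exceed a single latency sample.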

jhuber6 · 2024-06-27T00:43:00Z

libc/benchmarks/gpu/timing/amdgpu/timing.h

+  // We need to store the input somewhere to guarantee that the compiler will
+  // not constant propagate it and remove the profiling region.
+  volatile uint32_t storage = t;
+  float arg = storage;


This should be T, not float.

jhuber6 · 2024-06-27

Also, we need versions with more than one input, right?

jameshu15869 · 2024-06-27

How many do you think we need? I think I remember you mentioned 6 inputs at max - is that still correct?

jhuber6 · 2024-06-27

Just a guess: for now at least two, and we can add more as needed. Maybe I just don't know enough C++ magic to automate it, since the inputs can potentially be different types. (Plus, all that C++ magic needs to exist in the libc project.)
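
One way the multi-input case might be automated is a variadic template, which accepts inputs of different types without one overload per arity. The sketch below is a host-side illustration only and not what the PR implements: latency_n and launder_input are hypothetical names, std::chrono::steady_clock stands in for gpu::processor_clock(), it relies on the hosted C++17 standard library (which the libc project itself cannot assume), and the AMDGPU fences and inline assembly are omitted.

```cpp
#include <chrono>
#include <cstdint>
#include <tuple>

// Hypothetical helper: round-trip a value through volatile storage so the
// compiler cannot constant-propagate it into the timed call.
template <typename T> T launder_input(T value) {
  volatile T storage = value;
  return storage;
}

// Hypothetical variadic analogue of latency(F, T): times one call of
// f(ts...) and returns the elapsed nanoseconds on the host clock.
template <typename F, typename... Ts>
uint64_t latency_n(F f, Ts... ts) {
  using clock = std::chrono::steady_clock;
  // Load every input before starting the clock; the types may differ.
  std::tuple<Ts...> args{launder_input(ts)...};
  auto start = clock::now();
  // Storing into a volatile forces the result to be computed.
  volatile auto result = std::apply(f, args);
  auto stop = clock::now();
  (void)result;
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
          .count());
}
```

A call such as latency_n(f, 1.5f, 4) would then time f with a float and an int input in one template.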

jameshu15869 · 2024-06-27

Are there problems with trying to link when it's generic? I didn't notice before, but using T causes lld to throw errors such as "error: couldn't allocate input reg for constraint 'r'" (and the same for the s asm input constraint) when I switched from uint64_t to T.

jameshu15869 · 2024-06-27

Is there a specific constraint that generics need to use? I tried a couple, like v and Sg, but lld still threw errors. Am I missing a step needed to work with generics, since the size isn't known for sure?

arsenm · 2024-06-27

Never use the r constraint. We should just error on it.

GCC went and added a bunch of constraints on their own. We don't support all of the ones there.

What do you mean by generics? The constraints don't care what the type is (other than maybe for the weird immediate cases), they just need to be scalar or vector. The SIMD analogy doesn't actually work; these shouldn't be thought of as SIMD vectors vs. scalars.

jhuber6 · 2024-06-27

I guess the short answer is that 99% of the time you want "v". There's definitely a lot of weirdness, so it's not exact; VGPRs are more like a big block of resource that can be assigned. I think you can even unassign a VGPR during execution? I should probably find a concise way to explain the magic of SIMT and hardware.

jameshu15869 · 2024-06-28

Quoting arsenm: "What do you mean by generics? The constraints don't care what the type is (other than maybe for the weird immediate cases), they just need to be scalar or vector. The SIMD analogy doesn't actually work; these shouldn't be thought of as SIMD vectors vs. scalars."

Ah, I was mixing this up with something else. I think I meant to ask whether certain conditions need to be met for these assembly constraints when using templates. lld throws an error that it can't allocate the input register for the v constraint for my templated local. Is it because the compiler doesn't know how much space the variable could take up?

arsenm · 2024-06-28

It needs to be a legal, handled type, which is about it.
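
The constraint advice in this thread can be condensed into a small helper. The sketch below is an illustration resting on several assumptions, not code from the PR: keep_alive is a hypothetical name, the static_assert limiting T to 32- and 64-bit arithmetic types is only a guess at what "a legal, handled type" covers, and the "v" path is compiled only when the compiler defines __AMDGCN__. On any other target a volatile round trip stands in so the snippet still builds and runs.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: make `value` opaque to the optimizer so the
// computation that produced it cannot be discarded, then hand it back.
template <typename T> inline T keep_alive(T value) {
  // Assumption: only 32- and 64-bit scalars are treated as legal here.
  static_assert(std::is_arithmetic<T>::value &&
                    (sizeof(T) == 4 || sizeof(T) == 8),
                "sketch assumes 32- or 64-bit arithmetic types");
#if defined(__AMDGCN__)
  // "+v" places the value in a vector register (VGPR) as both input and
  // output. The thread advises against the "r" constraint on this target.
  asm volatile("" : "+v"(value));
  return value;
#else
  // Host fallback so the sketch compiles off-GPU.
  volatile T sink = value;
  return sink;
#endif
}
```

Rejecting unsupported types at compile time with a readable message is the point of the static_assert; without it, an unhandled T surfaces only as the "couldn't allocate input reg" error described above.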

add timing utils for amdgpu

0d27d54

llvmbot added backend:AMDGPU libc labels Jun 26, 2024

jhuber6 reviewed Jun 27, 2024

correctly store input arguments into registers

2a6f15d

jhuber6 approved these changes Jul 1, 2024

jhuber6 approved these changes Jul 10, 2024

jhuber6 merged commit eb66e31 into llvm:main Jul 10, 2024
6 checks passed

jameshu15869 deleted the amdgpu-profiling branch July 14, 2024 22:56
