
[Offload] Allow to record kernel launch stack traces #100472


Merged
merged 1 commit into llvm:main from pr/kernel_traces on Jul 31, 2024

Conversation

jdoerfert
Member

Similar to (de)allocation traces, we can record kernel launch stack
traces and display them in case of an error. However, the AMD GPU plugin
signal handler, which is invoked on memory faults, cannot pinpoint the
offending kernel. Instead, we print up to <NUM> traces, where <NUM> is set via
OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>. The recording uses a
fixed-size ring buffer (currently 8 entries).
For trap errors, we print the actual kernel name, and the trace if
recorded.
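
As a user-facing illustration (not part of the patch text): the added sanitizer tests exercise exactly this behavior with small OpenMP offload programs. The sketch below models them; the kernel name and the message lines are placeholders derived from the tests' CHECK patterns, not guaranteed output.

// Minimal sketch modeled on the added offload/test/sanitizer/kernel_trap*.c
// tests; compiler invocation and the exact kernel name are illustrative.
int main(void) {
#pragma omp target
  {
  }
#pragma omp target
  {
    __builtin_trap(); // the trap is reported with the kernel's name
  }
}
// Expected report shape, per the tests' CHECK lines:
//   OFFLOAD ERROR: Kernel '__omp_offloading_{{.*}}_main_l...'
//   OFFLOAD ERROR: execution interrupted by hardware trap instruction
// followed by the recorded launch stack trace. On memory faults, where the
// offending kernel cannot be identified, up to
// OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM> recent launch traces are shown.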

@llvmbot
Member

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-offload

Author: Johannes Doerfert (jdoerfert)

Changes

Similar to (de)allocation traces, we can record kernel launch stack
traces and display them in case of an error. However, the AMD GPU plugin
signal handler, which is invoked on memory faults, cannot pinpoint the
offending kernel. Instead, we print up to <NUM> traces, where <NUM> is set via
OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>. The recording uses a
fixed-size ring buffer (currently 8 entries).
For trap errors, we print the actual kernel name, and the trace if
recorded.


Patch is 48.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100472.diff

19 Files Affected:

  • (modified) offload/include/Shared/EnvironmentVar.h (+5-1)
  • (modified) offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h (+1)
  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+66-11)
  • (added) offload/plugins-nextgen/common/include/ErrorReporting.h (+293)
  • (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+87)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+53)
  • (modified) offload/src/omptarget.cpp (+3-1)
  • (added) offload/test/sanitizer/double_free.c (+68)
  • (added) offload/test/sanitizer/free_host_ptr.c (+25)
  • (added) offload/test/sanitizer/free_wrong_ptr_kind.c (+35)
  • (added) offload/test/sanitizer/free_wrong_ptr_kind.cpp (+38)
  • (added) offload/test/sanitizer/kernel_crash.c (+44)
  • (added) offload/test/sanitizer/kernel_crash_async.c (+38)
  • (added) offload/test/sanitizer/kernel_crash_many.c (+70)
  • (added) offload/test/sanitizer/kernel_crash_single.c (+34)
  • (added) offload/test/sanitizer/kernel_trap.c (+39)
  • (added) offload/test/sanitizer/kernel_trap_async.c (+38)
  • (added) offload/test/sanitizer/kernel_trap_many.c (+33)
  • (modified) openmp/docs/design/Runtimes.rst (+14)
diff --git a/offload/include/Shared/EnvironmentVar.h b/offload/include/Shared/EnvironmentVar.h
index 4cbdad695a0ee..82f434e91a85b 100644
--- a/offload/include/Shared/EnvironmentVar.h
+++ b/offload/include/Shared/EnvironmentVar.h
@@ -28,6 +28,7 @@ struct StringParser {
 /// Class for reading and checking environment variables. Currently working with
 /// integer, floats, std::string and bool types.
 template <typename Ty> class Envar {
+  llvm::StringRef Name;
   Ty Data;
   bool IsPresent;
   bool Initialized;
@@ -53,7 +54,7 @@ template <typename Ty> class Envar {
   /// take the value read from the environment variable, or the default if it
   /// was not set or not correct. This constructor is not fallible.
   Envar(llvm::StringRef Name, Ty Default = Ty())
-      : Data(Default), IsPresent(false), Initialized(true) {
+      : Name(Name), Data(Default), IsPresent(false), Initialized(true) {
 
     if (const char *EnvStr = getenv(Name.data())) {
       // Check whether the envar is defined and valid.
@@ -84,6 +85,9 @@ template <typename Ty> class Envar {
   /// Get the definitive value.
   operator Ty() const { return get(); }
 
+  /// Return the environment variable name.
+  llvm::StringRef getName() const { return Name; }
+
   /// Indicate whether the environment variable was defined and valid.
   bool isPresent() const { return IsPresent; }
 
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
index 64a1d3308aed0..5d9fb5d7dc7cd 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
@@ -31,6 +31,7 @@ typedef enum {
   HSA_STATUS_ERROR = 0x1000,
   HSA_STATUS_ERROR_INVALID_CODE_OBJECT = 0x1010,
   HSA_STATUS_ERROR_NOT_INITIALIZED = 0x100B,
+  HSA_STATUS_ERROR_EXCEPTION = 0x1016,
 } hsa_status_t;
 
 hsa_status_t hsa_status_string(hsa_status_t status, const char **status_string);
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index e6643d3260eb4..371ac71d6defe 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -13,13 +13,16 @@
 #include <atomic>
 #include <cassert>
 #include <cstddef>
+#include <cstdint>
 #include <deque>
+#include <functional>
 #include <mutex>
 #include <string>
 #include <system_error>
 #include <unistd.h>
 #include <unordered_map>
 
+#include "ErrorReporting.h"
 #include "Shared/APITypes.h"
 #include "Shared/Debug.h"
 #include "Shared/Environment.h"
@@ -43,6 +46,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Program.h"
+#include "llvm/Support/Signals.h"
 #include "llvm/Support/raw_ostream.h"
 
 #if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__) ||           \
@@ -685,12 +689,12 @@ struct AMDGPUQueueTy {
   AMDGPUQueueTy() : Queue(nullptr), Mutex(), NumUsers(0) {}
 
   /// Lazily initialize a new queue belonging to a specific agent.
-  Error init(hsa_agent_t Agent, int32_t QueueSize) {
+  Error init(GenericDeviceTy &Device, hsa_agent_t Agent, int32_t QueueSize) {
     if (Queue)
       return Plugin::success();
     hsa_status_t Status =
         hsa_queue_create(Agent, QueueSize, HSA_QUEUE_TYPE_MULTI, callbackError,
-                         nullptr, UINT32_MAX, UINT32_MAX, &Queue);
+                         &Device, UINT32_MAX, UINT32_MAX, &Queue);
     return Plugin::check(Status, "Error in hsa_queue_create: %s");
   }
 
@@ -875,10 +879,8 @@ struct AMDGPUQueueTy {
   }
 
   /// Callack that will be called when an error is detected on the HSA queue.
-  static void callbackError(hsa_status_t Status, hsa_queue_t *Source, void *) {
-    auto Err = Plugin::check(Status, "Received error in queue %p: %s", Source);
-    FATAL_MESSAGE(1, "%s", toString(std::move(Err)).data());
-  }
+  static void callbackError(hsa_status_t Status, hsa_queue_t *Source,
+                            void *Data);
 
   /// The HSA queue.
   hsa_queue_t *Queue;
@@ -1214,6 +1216,9 @@ struct AMDGPUStreamTy {
   /// Deinitialize the stream's signals.
   Error deinit() { return Plugin::success(); }
 
+  /// Return the associated (device) agent.
+  hsa_agent_t getAgent() const { return Agent; }
+
   /// Attach an RPC server to this stream.
   void setRPCServer(RPCServerTy *Server) { RPCServer = Server; }
 
@@ -1484,6 +1489,8 @@ struct AMDGPUStreamTy {
     return true;
   }
 
+  const AMDGPUQueueTy *getQueue() const { return Queue; }
+
   /// Record the state of the stream on an event.
   Error recordEvent(AMDGPUEventTy &Event) const;
 
@@ -1594,7 +1601,7 @@ struct AMDGPUStreamManagerTy final
   using ResourcePoolTy = GenericDeviceResourceManagerTy<ResourceRef>;
 
   AMDGPUStreamManagerTy(GenericDeviceTy &Device, hsa_agent_t HSAAgent)
-      : GenericDeviceResourceManagerTy(Device),
+      : GenericDeviceResourceManagerTy(Device), Device(Device),
         OMPX_QueueTracking("LIBOMPTARGET_AMDGPU_HSA_QUEUE_BUSY_TRACKING", true),
         NextQueue(0), Agent(HSAAgent) {}
 
@@ -1603,7 +1610,7 @@ struct AMDGPUStreamManagerTy final
     QueueSize = HSAQueueSize;
     MaxNumQueues = NumHSAQueues;
     // Initialize one queue eagerly
-    if (auto Err = Queues.front().init(Agent, QueueSize))
+    if (auto Err = Queues.front().init(Device, Agent, QueueSize))
       return Err;
 
     return GenericDeviceResourceManagerTy::init(InitialSize);
@@ -1660,7 +1667,7 @@ struct AMDGPUStreamManagerTy final
     }
 
     // Make sure the queue is initialized, then add user & assign.
-    if (auto Err = Queues[Index].init(Agent, QueueSize))
+    if (auto Err = Queues[Index].init(Device, Agent, QueueSize))
       return Err;
     Queues[Index].addUser();
     Stream->Queue = &Queues[Index];
@@ -1668,6 +1675,9 @@ struct AMDGPUStreamManagerTy final
     return Plugin::success();
   }
 
+  /// The device associated with this stream.
+  GenericDeviceTy &Device;
+
   /// Envar for controlling the tracking of busy HSA queues.
   BoolEnvar OMPX_QueueTracking;
 
@@ -3074,7 +3084,7 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
     Initialized = true;
 
     // Register event handler to detect memory errors on the devices.
-    Status = hsa_amd_register_system_event_handler(eventHandler, nullptr);
+    Status = hsa_amd_register_system_event_handler(eventHandler, this);
     if (auto Err = Plugin::check(
             Status, "Error in hsa_amd_register_system_event_handler: %s"))
       return std::move(Err);
@@ -3209,7 +3219,8 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
 
 private:
   /// Event handler that will be called by ROCr if an event is detected.
-  static hsa_status_t eventHandler(const hsa_amd_event_t *Event, void *) {
+  static hsa_status_t eventHandler(const hsa_amd_event_t *Event,
+                                   void *PluginPtr) {
     if (Event->event_type != HSA_AMD_GPU_MEMORY_FAULT_EVENT)
       return HSA_STATUS_SUCCESS;
 
@@ -3240,6 +3251,26 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
     uint32_t Node = -1;
     hsa_agent_get_info(Event->memory_fault.agent, HSA_AGENT_INFO_NODE, &Node);
 
+    AMDGPUPluginTy &Plugin = *reinterpret_cast<AMDGPUPluginTy *>(PluginPtr);
+    for (uint32_t I = 0, E = Plugin.getNumDevices();
+         Node != uint32_t(-1) && I < E; ++I) {
+      AMDGPUDeviceTy &AMDGPUDevice =
+          reinterpret_cast<AMDGPUDeviceTy &>(Plugin.getDevice(I));
+      auto KernelTraceInfoRecord =
+          AMDGPUDevice.KernelLaunchTraces.getExclusiveAccessor();
+
+      uint32_t DeviceNode = -1;
+      if (auto Err =
+              AMDGPUDevice.getDeviceAttr(HSA_AGENT_INFO_NODE, DeviceNode)) {
+        consumeError(std::move(Err));
+        continue;
+      }
+      if (DeviceNode != Node)
+        continue;
+
+      ErrorReporter::reportKernelTraces(AMDGPUDevice, *KernelTraceInfoRecord);
+    }
+
     // Abort the execution since we do not recover from this error.
     FATAL_MESSAGE(1,
                   "Memory access fault by GPU %" PRIu32 " (agent 0x%" PRIx64
@@ -3480,6 +3511,30 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
   return Alloc;
 }
 
+void AMDGPUQueueTy::callbackError(hsa_status_t Status, hsa_queue_t *Source,
+                                  void *Data) {
+  auto &AMDGPUDevice = *reinterpret_cast<AMDGPUDeviceTy *>(Data);
+
+  if (Status == HSA_STATUS_ERROR_EXCEPTION) {
+    auto KernelTraceInfoRecord =
+        AMDGPUDevice.KernelLaunchTraces.getExclusiveAccessor();
+    std::function<bool(__tgt_async_info &)> AsyncInfoWrapperMatcher =
+        [=](__tgt_async_info &AsyncInfo) {
+          auto *Stream = reinterpret_cast<AMDGPUStreamTy *>(AsyncInfo.Queue);
+          if (!Stream)
+            return false;
+          if (!Stream->getQueue())
+            return false;
+          return Stream->getQueue()->Queue == Source;
+        };
+    ErrorReporter::reportTrapInKernel(AMDGPUDevice, *KernelTraceInfoRecord,
+                                      AsyncInfoWrapperMatcher);
+  }
+
+  auto Err = Plugin::check(Status, "Received error in queue %p: %s", Source);
+  FATAL_MESSAGE(1, "%s", toString(std::move(Err)).data());
+}
+
 } // namespace plugin
 } // namespace target
 } // namespace omp
diff --git a/offload/plugins-nextgen/common/include/ErrorReporting.h b/offload/plugins-nextgen/common/include/ErrorReporting.h
new file mode 100644
index 0000000000000..f94b8e9c35997
--- /dev/null
+++ b/offload/plugins-nextgen/common/include/ErrorReporting.h
@@ -0,0 +1,293 @@
+//===- ErrorReporting.h - Helper to provide nice error messages ----- c++ -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_ERROR_REPORTING_H
+#define OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_ERROR_REPORTING_H
+
+#include "PluginInterface.h"
+#include "Shared/EnvironmentVar.h"
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/ErrorHandling.h"
+
+#include <cstdint>
+#include <cstdio>
+#include <cstdlib>
+#include <functional>
+#include <optional>
+#include <string>
+
+namespace llvm {
+namespace omp {
+namespace target {
+namespace plugin {
+
+class ErrorReporter {
+  /// The banner printed at the beginning of an error report.
+  static constexpr auto ErrorBanner = "OFFLOAD ERROR: ";
+
+  /// Terminal color codes
+  ///
+  /// TODO: determine if the terminal supports colors.
+  ///@{
+  static constexpr auto Green = []() { return "\033[1m\033[32m"; };
+  static constexpr auto Blue = []() { return "\033[1m\033[34m"; };
+  static constexpr auto Red = []() { return "\033[1m\033[31m"; };
+  static constexpr auto Magenta = []() { return "\033[1m\033[35m"; };
+  static constexpr auto Cyan = []() { return "\033[1m\033[36m"; };
+  static constexpr auto Default = []() { return "\033[1m\033[0m"; };
+  ///@}
+
+  /// The size of the getBuffer() buffer.
+  static constexpr unsigned BufferSize = 1024;
+
+  /// Return a buffer of size BufferSize that can be used for formatting.
+  static char *getBuffer() {
+    static char *Buffer = nullptr;
+    if (!Buffer)
+      Buffer = reinterpret_cast<char *>(malloc(BufferSize));
+    return Buffer;
+  }
+
+  /// Return the device id as string, or n/a if not available.
+  static std::string getDeviceIdStr(GenericDeviceTy *Device) {
+    return Device ? std::to_string(Device->getDeviceId()) : "n/a";
+  }
+
+  /// Return a nice name for an TargetAllocTy.
+  static std::string getAllocTyName(TargetAllocTy Kind) {
+    switch (Kind) {
+    case TARGET_ALLOC_DEVICE_NON_BLOCKING:
+    case TARGET_ALLOC_DEFAULT:
+    case TARGET_ALLOC_DEVICE:
+      return "device memory";
+    case TARGET_ALLOC_HOST:
+      return "pinned host memory";
+    case TARGET_ALLOC_SHARED:
+      return "managed memory";
+      break;
+    }
+    llvm_unreachable("Unknown target alloc kind");
+  }
+
+  /// Return a C string after \p Format has been instantiated with \p Args.
+  template <typename... ArgsTy>
+  static const char *getCString(const char *Format, ArgsTy &&...Args) {
+    std::snprintf(getBuffer(), BufferSize, Format,
+                  std::forward<ArgsTy>(Args)...);
+    return getBuffer();
+  }
+
+  /// Print \p Format, instantiated with \p Args to stderr.
+  /// TODO: Allow redirection into a file stream.
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wgcc-compat"
+#pragma clang diagnostic ignored "-Wformat-security"
+  template <typename... ArgsTy>
+  [[gnu::format(__printf__, 1, 2)]] static void print(const char *Format,
+                                                      ArgsTy &&...Args) {
+    fprintf(stderr, Format, std::forward<ArgsTy>(Args)...);
+  }
+
+  /// Report an error.
+  template <typename... ArgsTy>
+  [[gnu::format(__printf__, 1, 2)]] static void reportError(const char *Format,
+                                                            ArgsTy &&...Args) {
+    print(getCString("%s%s%s\n%s", Red(), ErrorBanner, Format, Default()),
+          Args...);
+  }
+#pragma clang diagnostic pop
+
+  /// Pretty print a stack trace.
+  static void reportStackTrace(StringRef StackTrace) {
+    if (StackTrace.empty())
+      return;
+
+    SmallVector<StringRef> Lines, Parts;
+    StackTrace.split(Lines, "\n", /*MaxSplit=*/-1, /*KeepEmpty=*/false);
+    int Start = Lines.empty() || !Lines[0].contains("PrintStackTrace") ? 0 : 1;
+    for (int I = Start, E = Lines.size(); I < E; ++I) {
+      auto Line = Lines[I];
+      Parts.clear();
+      Line = Line.drop_while([](char C) { return std::isspace(C); });
+      Line.split(Parts, " ", /*MaxSplit=*/2);
+      if (Parts.size() != 3 || Parts[0].size() < 2 || Parts[0][0] != '#') {
+        print("%s\n", Line.str().c_str());
+        continue;
+      }
+      unsigned FrameIdx = std::stoi(Parts[0].drop_front(1).str());
+      if (Start)
+        FrameIdx -= 1;
+      print("    %s%s%s%u %s%s%s %s\n", Magenta(),
+            Parts[0].take_front().str().c_str(), Green(), FrameIdx, Blue(),
+            Parts[1].str().c_str(), Default(), Parts[2].str().c_str());
+    }
+
+    printf("\n");
+  }
+
+  /// Report information about an allocation associated with \p ATI.
+  static void reportAllocationInfo(AllocationTraceInfoTy *ATI) {
+    if (!ATI)
+      return;
+
+    if (!ATI->DeallocationTrace.empty()) {
+      print("%s%s\n%s", Cyan(), "Last deallocation:", Default());
+      reportStackTrace(ATI->DeallocationTrace);
+    }
+
+    if (ATI->HostPtr)
+      print("%sLast allocation of size %lu for host pointer %p:\n%s", Cyan(),
+            ATI->Size, ATI->HostPtr, Default());
+    else
+      print("%sLast allocation of size %lu:\n%s", Cyan(), ATI->Size, Default());
+    reportStackTrace(ATI->AllocationTrace);
+    if (!ATI->LastAllocationInfo)
+      return;
+
+    unsigned I = 0;
+    print("%sPrior allocations with the same base pointer:", Cyan());
+    while (ATI->LastAllocationInfo) {
+      print("\n%s", Default());
+      ATI = ATI->LastAllocationInfo;
+      print("%s #%u Prior deallocation of size %lu:\n%s", Cyan(), I, ATI->Size,
+            Default());
+      reportStackTrace(ATI->DeallocationTrace);
+      if (ATI->HostPtr)
+        print("%s #%u Prior allocation for host pointer %p:\n%s", Cyan(), I,
+              ATI->HostPtr, Default());
+      else
+        print("%s #%u Prior allocation:\n%s", Cyan(), I, Default());
+      reportStackTrace(ATI->AllocationTrace);
+      ++I;
+    }
+  }
+
+public:
+  /// Check if the deallocation of \p DevicePtr is valid given \p ATI. Stores \p
+  /// StackTrace to \p ATI->DeallocationTrace if there was no error.
+  static void checkDeallocation(GenericDeviceTy *Device, void *DevicePtr,
+                                TargetAllocTy Kind, AllocationTraceInfoTy *ATI,
+                                std::string &StackTrace) {
+#define DEALLOCATION_ERROR(Format, ...)                                        \
+  reportError(Format, __VA_ARGS__);                                            \
+  reportStackTrace(StackTrace);                                                \
+  reportAllocationInfo(ATI);                                                   \
+  abort();
+
+    if (!ATI) {
+      DEALLOCATION_ERROR("deallocation of non-allocated %s: %p",
+                         getAllocTyName(Kind).c_str(), DevicePtr);
+    }
+
+    if (!ATI->DeallocationTrace.empty()) {
+      DEALLOCATION_ERROR("double-free of %s: %p", getAllocTyName(Kind).c_str(),
+                         DevicePtr);
+    }
+
+    if (ATI->Kind != Kind) {
+      DEALLOCATION_ERROR("deallocation requires %s but allocation was %s: %p",
+                         getAllocTyName(Kind).c_str(),
+                         getAllocTyName(ATI->Kind).c_str(), DevicePtr);
+    }
+
+    ATI->DeallocationTrace = StackTrace;
+
+#undef DEALLOCATION_ERROR
+  }
+
+  /// Report that a kernel encountered a trap instruction.
+  static void reportTrapInKernel(
+      GenericDeviceTy &Device, KernelTraceInfoRecordTy &KTIR,
+      std::function<bool(__tgt_async_info &)> AsyncInfoWrapperMatcher) {
+    assert(AsyncInfoWrapperMatcher && "A matcher is required");
+
+    uint32_t Idx = 0;
+    for (uint32_t I = 0, E = KTIR.size(); I < E; ++I) {
+      auto KTI = KTIR.getKernelTraceInfo(I);
+      if (KTI.Kernel == nullptr)
+        break;
+      // Skip kernels issued in other queues.
+      if (KTI.AsyncInfo && !(AsyncInfoWrapperMatcher(*KTI.AsyncInfo)))
+        continue;
+      Idx = I;
+      break;
+    }
+
+    auto KTI = KTIR.getKernelTraceInfo(Idx);
+    if (KTI.AsyncInfo && (AsyncInfoWrapperMatcher(*KTI.AsyncInfo)))
+      reportError("Kernel '%s'", KTI.Kernel->getName());
+    reportError("execution interrupted by hardware trap instruction");
+    if (KTI.AsyncInfo && (AsyncInfoWrapperMatcher(*KTI.AsyncInfo)))
+      reportStackTrace(KTI.LaunchTrace);
+    abort();
+  }
+
+  /// Report the kernel traces taken from \p KTIR, up to
+  /// OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES many.
+  static void reportKernelTraces(GenericDeviceTy &Device,
+                                 KernelTraceInfoRecordTy &KTIR) {
+    uint32_t NumKTIs = 0;
+    for (uint32_t I = 0, E = KTIR.size(); I < E; ++I) {
+      auto KTI = KTIR.getKernelTraceInfo(I);
+      if (KTI.Kernel == nullptr)
+        break;
+      ++NumKTIs;
+    }
+    if (NumKTIs == 0) {
+      print("%sNo kernel launches known\n%s", Red(), Default());
+      return;
+    }
+
+    uint32_t TracesToShow =
+        std::min(Device.OMPX_TrackNumKernelLaunches.get(), NumKTIs);
+    if (TracesToShow == 0) {
+      if (NumKTIs == 1) {
+        print("%sDisplay only launched kernel:\n%s", Cyan(), Default());
+      } else {
+        print("%sDisplay last %u kernels launched:\n%s", Cyan(), NumKTIs,
+              Default());
+      }
+    } else {
+      if (NumKTIs == 1) {
+        print("%sDisplay kernel launch trace:\n%s", Cyan(), Default());
+      } else {
+        print("%sDisplay %u of the %u last kernel launch traces:\n%s", Cyan(),
+              TracesToShow, NumKTIs, Default());
+      }
+    }
+
+    for (uint32_t Idx = 0, I = 0; I < NumKTIs; ++Idx) {
+      auto KTI = KTIR.getKernelTraceInfo(Idx);
+      if (NumKTIs == 1) {
+        print("%sKernel '%s'\n%s", Magenta(), KTI.Kernel->getName(), Default());
+      } else {
+        print("%sKernel %d: '%s'\n%s", Magenta(), I, KTI.Kernel->getName(),
+              Default());
+      }
+      reportStackTrace(KTI.LaunchTrace);
+      ++I;
+    }
+
+    if (NumKTIs != 1) {
+      print("Use '%s=<num>' to adjust the number of shown traces (up to %zu)\n",
+            Device.OMPX_TrackNumKernelLaunches.getName().data(), KTIR.size());
+    }
+    // TODO: Let users know how to serialize kernels
+  }
+};
+
+} // namespace plugin
+} // namespace target
+} // namespace omp
+} // namespace llvm
+
+#endif // OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_ERROR_REPORTING_H
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index 973add0ba1000..391607aecd8da 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -19,6 +19,...
[truncated]

@llvmbot
Member

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Johannes Doerfert (jdoerfert)



github-actions bot commented Jul 24, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff b7db119eadde630f7ba7e4abc1ca6ee4fa8ccdfe e0816f3d7482e38dce23e37ebc58c957733f8d38 --extensions cpp,c,h -- offload/test/sanitizer/kernel_crash.c offload/test/sanitizer/kernel_crash_async.c offload/test/sanitizer/kernel_crash_many.c offload/test/sanitizer/kernel_crash_single.c offload/test/sanitizer/kernel_trap.c offload/test/sanitizer/kernel_trap_async.c offload/test/sanitizer/kernel_trap_many.c offload/include/Shared/EnvironmentVar.h offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h offload/plugins-nextgen/amdgpu/src/rtl.cpp offload/plugins-nextgen/common/include/ErrorReporting.h offload/plugins-nextgen/common/include/PluginInterface.h offload/plugins-nextgen/common/src/PluginInterface.cpp
View the diff from clang-format here.
diff --git a/offload/test/sanitizer/kernel_crash.c b/offload/test/sanitizer/kernel_crash.c
index 457d953a33..e2b1bad53e 100644
--- a/offload/test/sanitizer/kernel_crash.c
+++ b/offload/test/sanitizer/kernel_crash.c
@@ -22,18 +22,13 @@
 int main(void) {
   int *A = 0;
 #pragma omp target
-  {
-  }
+  {}
 #pragma omp target
-  {
-  }
+  {}
 #pragma omp target
-  {
-    *A = 42;
-  }
+  { *A = 42; }
 #pragma omp target
-  {
-  }
+  {}
 }
 // TRACE: Display 1 of the 3 last kernel launch traces
 // TRACE: Kernel 0: '__omp_offloading_{{.*}}_main_l30'
diff --git a/offload/test/sanitizer/kernel_crash_async.c b/offload/test/sanitizer/kernel_crash_async.c
index 6aebf1b42a..b01c8ae937 100644
--- a/offload/test/sanitizer/kernel_crash_async.c
+++ b/offload/test/sanitizer/kernel_crash_async.c
@@ -22,15 +22,11 @@
 int main(void) {
   int *A = 0;
 #pragma omp target nowait
-  {
-  }
+  {}
 #pragma omp target nowait
-  {
-  }
+  {}
 #pragma omp target nowait
-  {
-    *A = 42;
-  }
+  { *A = 42; }
 #pragma omp taskwait
 }
 
diff --git a/offload/test/sanitizer/kernel_crash_many.c b/offload/test/sanitizer/kernel_crash_many.c
index 9e3f4f1630..1e06a7f135 100644
--- a/offload/test/sanitizer/kernel_crash_many.c
+++ b/offload/test/sanitizer/kernel_crash_many.c
@@ -21,13 +21,10 @@ int main(void) {
   int *A = 0;
   for (int i = 0; i < 10; ++i) {
 #pragma omp target
-    {
-    }
+    {}
   }
 #pragma omp target
-  {
-    *A = 42;
-  }
+  { *A = 42; }
 }
 // CHECK: Display 8 of the 8 last kernel launch traces
 // CHECK: Kernel 0: '__omp_offloading_{{.*}}_main_l27'
diff --git a/offload/test/sanitizer/kernel_crash_single.c b/offload/test/sanitizer/kernel_crash_single.c
index 16a8159f07..0dfe93b19d 100644
--- a/offload/test/sanitizer/kernel_crash_single.c
+++ b/offload/test/sanitizer/kernel_crash_single.c
@@ -22,9 +22,7 @@
 int main(void) {
   int *A = 0;
 #pragma omp target
-  {
-    *A = 42;
-  }
+  { *A = 42; }
 }
 // TRACE: Display kernel launch trace
 // TRACE: Kernel '__omp_offloading_{{.*}}_main_l24'
diff --git a/offload/test/sanitizer/kernel_trap.c b/offload/test/sanitizer/kernel_trap.c
index 13fe6f2fb7..cb7b35e6ee 100644
--- a/offload/test/sanitizer/kernel_trap.c
+++ b/offload/test/sanitizer/kernel_trap.c
@@ -22,18 +22,13 @@
 int main(void) {
 
 #pragma omp target
-  {
-  }
+  {}
 #pragma omp target
-  {
-  }
+  {}
 #pragma omp target
-  {
-    __builtin_trap();
-  }
+  { __builtin_trap(); }
 #pragma omp target
-  {
-  }
+  {}
 }
 // CHECK: OFFLOAD ERROR: Kernel '__omp_offloading_{{.*}}_main_l30'
 // CHECK: OFFLOAD ERROR: execution interrupted by hardware trap instruction
diff --git a/offload/test/sanitizer/kernel_trap_async.c b/offload/test/sanitizer/kernel_trap_async.c
index 65e8880798..c0b0cf0835 100644
--- a/offload/test/sanitizer/kernel_trap_async.c
+++ b/offload/test/sanitizer/kernel_trap_async.c
@@ -22,15 +22,11 @@
 int main(void) {
 
 #pragma omp target nowait
-  {
-  }
+  {}
 #pragma omp target nowait
-  {
-  }
+  {}
 #pragma omp target nowait
-  {
-    __builtin_trap();
-  }
+  { __builtin_trap(); }
 #pragma omp taskwait
 }
 
diff --git a/offload/test/sanitizer/kernel_trap_many.c b/offload/test/sanitizer/kernel_trap_many.c
index 3f1796e891..d9f06ac3f3 100644
--- a/offload/test/sanitizer/kernel_trap_many.c
+++ b/offload/test/sanitizer/kernel_trap_many.c
@@ -21,13 +21,10 @@ int main(void) {
 
   for (int i = 0; i < 10; ++i) {
 #pragma omp target
-    {
-    }
+    {}
   }
 #pragma omp target
-  {
-    __builtin_trap();
-  }
+  { __builtin_trap(); }
 }
 // TRACE: OFFLOAD ERROR: Kernel '__omp_offloading_{{.*}}_main_l27'
 // TRACE: OFFLOAD ERROR: execution interrupted by hardware trap instruction

Contributor

@jhuber6 jhuber6 left a comment


Some nits for now. Would it be possible to split the allocation tracing and the stack trace printing?

@jdoerfert
Member Author

Some nits for now. Would it be possible to split the allocation tracing and the stack trace printing?

It's two commits, once the other one is in, this one will be one commit. I can't split-split it as it depends on the printer functionality. Maybe just look at the topmost commit or review the other PR first?

(I looked into graphite for stacked commits; not a fan)

@jdoerfert jdoerfert force-pushed the pr/kernel_traces branch 3 times, most recently from 8542660 to e018174 on July 25, 2024 19:53
Contributor

shiltian commented Jul 26, 2024

It looks like this PR now has two separate commits that should go to two PRs?

@jdoerfert
Member Author

It looks like this PR now has two separate commits that should go to two PRs?

Two commits, two PRs:

  1. [Offload] Implement double free (and other allocation error) reporting #100261
  2. this one.

@jdoerfert jdoerfert force-pushed the pr/kernel_traces branch 2 times, most recently from 88d23c1 to 6cb0493 on July 30, 2024 21:42
Comment on lines +227 to +229
for (uint32_t I = 0, E = KTIR.size(); I < E; ++I) {
auto KTI = KTIR.getKernelTraceInfo(I);
Contributor


Does llvm::enumerate work here?

Member Author


Likely, but is that better? It has no precedent in Offload right now; maybe worth introducing in one swoop, if it is preferred.

Contributor


Not a huge deal, just tends to be cleaner.

Member Author


I can change it here for the final version, assuming I won't encounter any problems.
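
For reference, a small sketch of the llvm::enumerate idiom being suggested, assuming the trace records were exposed as a plain range; the helper function below is hypothetical and not part of the patch.

#include "llvm/ADT/STLExtras.h"   // llvm::enumerate
#include "llvm/ADT/SmallVector.h"

struct KernelTraceInfoTy; // as in the patch; only used by pointer here

// Hypothetical: walk launch records with an index, without a manual counter.
inline void forEachTrace(const llvm::SmallVectorImpl<KernelTraceInfoTy *> &KTIs) {
  for (const auto &En : llvm::enumerate(KTIs)) {
    size_t Idx = En.index();             // position in the record
    KernelTraceInfoTy *KTI = En.value(); // the launch record itself
    if (!KTI)
      break;
    (void)Idx; // ... report trace #Idx ...
  }
}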

}

private:
std::array<KernelTraceInfoTy, 8> KTIs;
Contributor


Why are we using a static array here? We could use SmallVector<x, 8> and get the same stack layout, or is there a need for no dynamic resizing?

Member Author


Right now, I use it as a fixed size ring buffer. It is unclear to me why one would store more than the last few traces, at least for now.
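
A stand-alone sketch of that design, with illustrative helper names (the patch itself keeps the entries in the std::array<KernelTraceInfoTy, 8> member shown above): the newest entry overwrites the oldest, so memory stays bounded no matter how many kernels a program launches.

#include <array>
#include <cstdint>

struct KernelTraceInfoTy; // launch record (kernel, stack trace, async info)

// Hypothetical fixed-size ring buffer: no dynamic resizing is ever needed,
// which is the point of std::array over SmallVector here.
struct KernelTraceRingBufferTy {
  std::array<KernelTraceInfoTy *, 8> KTIs{}; // fixed capacity
  uint64_t NumRecorded = 0;                  // total launches seen

  void record(KernelTraceInfoTy *KTI) {
    KTIs[NumRecorded++ % KTIs.size()] = KTI; // overwrite the oldest slot
  }

  // I-th most recent record (I == 0 is the newest), or nullptr if unused.
  KernelTraceInfoTy *get(uint32_t I) const {
    if (I >= NumRecorded || I >= KTIs.size())
      return nullptr;
    return KTIs[(NumRecorded - 1 - I) % KTIs.size()];
  }

  static constexpr size_t size() { return 8; }
};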

Contributor

@jhuber6 jhuber6 left a comment


Makes sense overall. Is this only AMDGPU?

Comment on lines 3521 to 3523
if (!Stream)
return false;
if (!Stream->getQueue())
return false;
return Stream->getQueue()->Queue == Source;
Contributor


This could probably be collapsed into one return statement.

Member Author

@jdoerfert jdoerfert Jul 30, 2024


Will do two, one for false, one for the equality. One return would be less legible, IMHO.
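
For clarity, the shape being agreed on, with minimal stand-in types so the snippet is self-contained; the stubs are illustrative and not the plugin's real declarations.

#include <functional>

// Illustrative stubs standing in for the HSA/plugin types used in the patch.
struct hsa_queue_t {};
struct __tgt_async_info { void *Queue = nullptr; };
struct AMDGPUQueueTy { hsa_queue_t *Queue = nullptr; };
struct AMDGPUStreamTy {
  const AMDGPUQueueTy *getQueue() const { return Q; }
  const AMDGPUQueueTy *Q = nullptr;
};

// The matcher as discussed: the two null checks fold into one early return,
// the queue comparison stays on its own line.
std::function<bool(__tgt_async_info &)> makeMatcher(hsa_queue_t *Source) {
  return [=](__tgt_async_info &AsyncInfo) {
    auto *Stream = reinterpret_cast<AMDGPUStreamTy *>(AsyncInfo.Queue);
    if (!Stream || !Stream->getQueue())
      return false;
    return Stream->getQueue()->Queue == Source;
  };
}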

@jdoerfert
Member Author

Makes sense overall. Is this only AMDGPU?

Yes, mostly because I haven't tested it on NVIDIA and need to set up the handler to call into ErrorReporting. I will ask someone to port that soon; it should not be hard.

Similar to (de)allocation traces, we can record kernel launch stack
traces and display them in case of an error. However, the AMD GPU plugin
signal handler, which is invoked on memory faults, cannot pinpoint the
offending kernel. Instead, we print up to `<NUM>` traces, where `<NUM>` is set via
`OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`. The recording uses a
fixed-size ring buffer (currently 8 entries).
For `trap` errors, we print the actual kernel name, and the trace if
recorded.
@jdoerfert jdoerfert merged commit 9a10132 into llvm:main Jul 31, 2024
5 of 6 checks passed
@jdoerfert jdoerfert deleted the pr/kernel_traces branch August 14, 2024 14:38