Skip to content

[SYCL][CUDA] Add support of CUDA XPTI tracing #6373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Sep 15, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions sycl/cmake/modules/FindCUDACupti.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
macro(find_cuda_cupti_library)
# The following if can be removed when FindCUDA -> FindCUDAToolkit
find_library(CUDA_cupti_LIBRARY
NAMES cupti
HINTS ${CUDA_TOOLKIT_ROOT_DIR}
ENV CUDA_PATH
PATH_SUFFIXES nvidia/current lib64 lib/x64 lib
../extras/CUPTI/lib64/
../extras/CUPTI/lib/
)
endmacro()

macro(find_cuda_cupti_include_dir)
find_path(CUDA_CUPTI_INCLUDE_DIR cupti.h PATHS
"${CUDA_TOOLKIT_ROOT_DIR}/extras/CUPTI/include"
"${CUDA_INCLUDE_DIRS}/../extras/CUPTI/include"
"${CUDA_INCLUDE_DIRS}"
NO_DEFAULT_PATH)
endmacro()

20 changes: 20 additions & 0 deletions sycl/doc/design/SYCLInstrumentationUsingXPTI.md
Original file line number Diff line number Diff line change
Expand Up @@ -320,3 +320,23 @@ arguments made by SYCL application.
| :------------------------: | :-------------------- | :------- |
| `function_with_args_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::function_with_args_begin` that marks the beginning of a function</li> <li> **parent**: Event ID created for all functions in the `sycl.pi.debug` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `function_with_args_begin` event with the `function_with_args_end` event. </li> <li> **user_data**: A pointer to `function_with_args_t` object, that includes function ID, name, and arguments. </li></div> | None |
| `function_with_args_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::function_with_args_end` that marks the beginning of a function</li> <li> **parent**: Event ID created for all functions in the `sycl.pi.debug` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `function_with_args_begin` event with the `function_with_args_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `function_with_args_begin` </li> <li> **user_data**: A pointer to `function_with_args_t` object, that includes function ID, name, arguments, and return value. </li></div> | None |

## SYCL Stream `"sycl.experimental.cuda.call"` Notification Signatures

This stream transfers events about CUDA Driver API calls made by SYCL
application.

| Trace Point Type | Parameter Description | Metadata |
| :--------------: | :-------------------- | :------- |
| `function_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::function_begin` that marks the beginning of a function</li> <li> **parent**: Event ID created for all functions in the `sycl.pi` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `function_begin` event with the `function_end` event. </li> <li> **user_data**: Name of the function being called sent in as `const char *` </li></div> | None |
| `function_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::function_end` that marks the beginning of a function</li> <li> **parent**: Event ID created for all functions in the `sycl.pi` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `function_begin` event with the `function_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `function_begin` </li> <li> **user_data**: Name of the function being called sent in as `const char *` </li></div> | None |

## SYCL Stream `"sycl.experimental.cuda.debug"` Notification Signatures

This stream transfers events about CUDA Driver API calls and their function
arguments made by SYCL application.

| Trace Point Type | Parameter Description | Metadata |
| :------------------------: | :-------------------- | :------- |
| `function_with_args_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::function_with_args_begin` that marks the beginning of a function</li> <li> **parent**: Event ID created for all functions in the `sycl.pi.debug` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `function_with_args_begin` event with the `function_with_args_end` event. </li> <li> **user_data**: A pointer to `function_with_args_t` object, that includes function ID, name, and arguments. </li></div> | None |
| `function_with_args_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::function_with_args_end` that marks the beginning of a function</li> <li> **parent**: Event ID created for all functions in the `sycl.pi.debug` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `function_with_args_begin` event with the `function_with_args_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `function_with_args_begin` </li> <li> **user_data**: A pointer to `function_with_args_t` object, that includes function ID, name, arguments, and return value. </li></div> | None |
34 changes: 34 additions & 0 deletions sycl/plugins/cuda/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,14 +23,48 @@ else()
)
endif()

if (SYCL_ENABLE_XPTI_TRACING)
set(XPTI_PROXY_SRC "${CMAKE_SOURCE_DIR}/../xpti/src/xpti_proxy.cpp")
endif()

# The following two if's can be removed when FindCUDA -> FindCUDAToolkit.
# CUDA_CUPTI_INCLUDE_DIR -> CUDAToolkit_CUPTI_INCLUDE_DIR
include(FindCUDACupti)
if(NOT CUDA_CUPTI_INCLUDE_DIR)
find_cuda_cupti_include_dir()
endif()
# CUDA_cupti_LIBRARY -> CUDAToolkit_cupti_LIBRARY
if(NOT CUDA_cupti_LIBRARY)
find_cuda_cupti_library()
endif()

add_sycl_plugin(cuda
SOURCES
"${sycl_inc_dir}/sycl/detail/pi.h"
"${sycl_inc_dir}/sycl/detail/pi.hpp"
"pi_cuda.hpp"
"pi_cuda.cpp"
"tracing.cpp"
${XPTI_PROXY_SRC}
LIBRARIES
cudadrv
${CUDA_cupti_LIBRARY}
)

if (SYCL_ENABLE_XPTI_TRACING)
target_compile_definitions(pi_cuda PRIVATE
XPTI_ENABLE_INSTRUMENTATION
XPTI_STATIC_LIBRARY
)
target_include_directories(pi_cuda PRIVATE "${CMAKE_SOURCE_DIR}/../xpti/include")
target_link_libraries(pi_cuda PRIVATE ${CMAKE_DL_LIBS})
endif()


target_include_directories(pi_cuda
PRIVATE
${sycl_inc_dir}
${CUDA_CUPTI_INCLUDE_DIR}
)

set_target_properties(pi_cuda PROPERTIES LINKER_LANGUAGE CXX)
Expand Down
11 changes: 10 additions & 1 deletion sycl/plugins/cuda/pi_cuda.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,10 @@
#include <mutex>
#include <regex>

// Forward declarations
void enableCUDATracing();
void disableCUDATracing();

namespace {
std::string getCudaVersionString() {
int driver_version = 0;
Expand Down Expand Up @@ -5292,7 +5296,10 @@ pi_result cuda_piextUSMGetMemAllocInfo(pi_context context, const void *ptr,
// This API is called by Sycl RT to notify the end of the plugin lifetime.
// TODO: add a global variable lifetime management code here (see
// pi_level_zero.cpp for reference) Currently this is just a NOOP.
pi_result cuda_piTearDown(void *) { return PI_SUCCESS; }
pi_result cuda_piTearDown(void *) {
disableCUDATracing();
return PI_SUCCESS;
}

const char SupportedVersion[] = _PI_CUDA_PLUGIN_VERSION_STRING;

Expand All @@ -5311,6 +5318,8 @@ pi_result piPluginInit(pi_plugin *PluginInit) {
std::memset(&(PluginInit->PiFunctionTable), 0,
sizeof(PluginInit->PiFunctionTable));

enableCUDATracing();

// Forward calls to CUDA RT.
#define _PI_CL(pi_api, cuda_api) \
(PluginInit->PiFunctionTable).pi_api = (decltype(&::pi_api))(&cuda_api);
Expand Down
109 changes: 109 additions & 0 deletions sycl/plugins/cuda/tracing.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
//===-------------- tracing.cpp - CUDA Host API Tracing --------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifdef XPTI_ENABLE_INSTRUMENTATION
#include <xpti/xpti_data_types.h>
#include <xpti/xpti_trace_framework.h>
#endif

#include <cuda.h>
#include <cupti.h>

#include <exception>
#include <iostream>

constexpr auto CUDA_CALL_STREAM_NAME = "sycl.experimental.cuda.call";
constexpr auto CUDA_DEBUG_STREAM_NAME = "sycl.experimental.cuda.debug";

thread_local uint64_t CallCorrelationID = 0;
thread_local uint64_t DebugCorrelationID = 0;

#ifdef XPTI_ENABLE_INSTRUMENTATION
static xpti_td *GCallEvent = nullptr;
static xpti_td *GDebugEvent = nullptr;
#endif // XPTI_ENABLE_INSTRUMENTATION

constexpr auto GVerStr = "0.1";
constexpr int GMajVer = 0;
constexpr int GMinVer = 1;

#ifdef XPTI_ENABLE_INSTRUMENTATION
static void cuptiCallback(void *userdata, CUpti_CallbackDomain,
CUpti_CallbackId CBID, const void *CBData) {
if (xptiTraceEnabled()) {
const auto *CBInfo = static_cast<const CUpti_CallbackData *>(CBData);

if (CBInfo->callbackSite == CUPTI_API_ENTER) {
CallCorrelationID = xptiGetUniqueId();
DebugCorrelationID = xptiGetUniqueId();
}

const char *FuncName = CBInfo->functionName;
uint32_t FuncID = static_cast<uint32_t>(CBID);
uint16_t TraceTypeArgs = CBInfo->callbackSite == CUPTI_API_ENTER
? xpti::trace_function_with_args_begin
: xpti::trace_function_with_args_end;
uint16_t TraceType = CBInfo->callbackSite == CUPTI_API_ENTER
? xpti::trace_function_begin
: xpti::trace_function_end;

uint8_t CallStreamID = xptiRegisterStream(CUDA_CALL_STREAM_NAME);
uint8_t DebugStreamID = xptiRegisterStream(CUDA_DEBUG_STREAM_NAME);

xptiNotifySubscribers(CallStreamID, TraceType, GCallEvent, nullptr,
CallCorrelationID, FuncName);

xpti::function_with_args_t Payload{
FuncID, FuncName, const_cast<void *>(CBInfo->functionParams),
CBInfo->functionReturnValue, CBInfo->context};
xptiNotifySubscribers(DebugStreamID, TraceTypeArgs, GDebugEvent, nullptr,
DebugCorrelationID, &Payload);
}
}
#endif

void enableCUDATracing() {
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (!xptiTraceEnabled())
return;

xptiRegisterStream(CUDA_CALL_STREAM_NAME);
xptiInitialize(CUDA_CALL_STREAM_NAME, GMajVer, GMinVer, GVerStr);
xptiRegisterStream(CUDA_DEBUG_STREAM_NAME);
xptiInitialize(CUDA_DEBUG_STREAM_NAME, GMajVer, GMinVer, GVerStr);

uint64_t Dummy;
xpti::payload_t CUDAPayload("CUDA Plugin Layer");
GCallEvent =
xptiMakeEvent("CUDA Plugin Layer", &CUDAPayload,
xpti::trace_algorithm_event, xpti_at::active, &Dummy);

xpti::payload_t CUDADebugPayload("CUDA Plugin Debug Layer");
GDebugEvent =
xptiMakeEvent("CUDA Plugin Debug Layer", &CUDADebugPayload,
xpti::trace_algorithm_event, xpti_at::active, &Dummy);

CUpti_SubscriberHandle Subscriber;
cuptiSubscribe(&Subscriber, cuptiCallback, nullptr);
cuptiEnableDomain(1, Subscriber, CUPTI_CB_DOMAIN_DRIVER_API);
cuptiEnableCallback(0, Subscriber, CUPTI_CB_DOMAIN_DRIVER_API,
CUPTI_DRIVER_TRACE_CBID_cuGetErrorString);
cuptiEnableCallback(0, Subscriber, CUPTI_CB_DOMAIN_DRIVER_API,
CUPTI_DRIVER_TRACE_CBID_cuGetErrorName);
#endif
}

void disableCUDATracing() {
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (!xptiTraceEnabled())
return;

xptiFinalize(CUDA_CALL_STREAM_NAME);
xptiFinalize(CUDA_DEBUG_STREAM_NAME);
#endif // XPTI_ENABLE_INSTRUMENTATION
}
6 changes: 6 additions & 0 deletions sycl/tools/sycl-prof/collector.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,12 @@ XPTI_CALLBACK_API void xptiTraceInit(unsigned int /*major_version*/,
apiBeginEndCallback);
xptiRegisterCallback(StreamID, xpti::trace_function_end,
apiBeginEndCallback);
} else if (NameView == "sycl.experimental.cuda.call") {
uint8_t StreamID = xptiRegisterStream(StreamName);
xptiRegisterCallback(StreamID, xpti::trace_function_begin,
apiBeginEndCallback);
xptiRegisterCallback(StreamID, xpti::trace_function_end,
apiBeginEndCallback);
}
}

Expand Down
59 changes: 59 additions & 0 deletions sycl/tools/sycl-trace/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ add_library(sycl_pi_trace_collector SHARED
collector.cpp
pi_trace_collector.cpp
ze_trace_collector.cpp
$<$<BOOL:${SYCL_BUILD_PI_CUDA}>:cuda_trace_collector.cpp>
)

find_package(Python3 REQUIRED)
Expand Down Expand Up @@ -56,6 +57,64 @@ target_include_directories(sycl_pi_trace_collector PRIVATE
)

add_dependencies(sycl_pi_trace_collector pi-pretty-printers ze-pretty-printers)

if(SYCL_BUILD_PI_CUDA)

find_package(CUDA 10.1 REQUIRED)

target_compile_definitions(sycl_pi_trace_collector
PRIVATE
$<$<BOOL:${SYCL_BUILD_PI_CUDA}>:USE_PI_CUDA>
)

# The following two if's can be removed when FindCUDA -> FindCUDAToolkit.
# CUDA_CUPTI_INCLUDE_DIR -> CUDAToolkit_CUPTI_INCLUDE_DIR
include(FindCUDACupti)
if(NOT CUDA_CUPTI_INCLUDE_DIR)
find_cuda_cupti_include_dir()
endif()
# CUDA_cupti_LIBRARY -> CUDAToolkit_cupti_LIBRARY
if(NOT CUDA_cupti_LIBRARY)
find_cuda_cupti_library()
endif()

target_include_directories(sycl_pi_trace_collector
PRIVATE
${CUDA_CUPTI_INCLUDE_DIR}
)

target_link_libraries(sycl_pi_trace_collector
PRIVATE
cudadrv
${CUDA_cupti_LIBRARY}
)

find_path(GEN_CUDA_META_H_DIR generated_cuda_meta.h PATHS
"${CUDA_TOOLKIT_ROOT_DIR}/extras/CUPTI/include"
"${CUDA_INCLUDE_DIRS}/../extras/CUPTI/include"
"${CUDA_INCLUDE_DIRS}"
NO_DEFAULT_PATH)

if( EXISTS "${GEN_CUDA_META_H_DIR}/generated_cuda_meta.h" )
add_custom_target(cuda-pretty-printers
COMMAND ${Python3_EXECUTABLE}
${CMAKE_CURRENT_SOURCE_DIR}/generate_cuda_pretty_printers.py
${GEN_CUDA_META_H_DIR}/generated_cuda_meta.h
DEPENDS pi_cuda
BYPRODUCTS
${CMAKE_CURRENT_BINARY_DIR}/cuda_printers.def
)
else()
message(WARNING "generated_cuda_meta.h not FOUND!")
message(WARNING "CUDA printer definitions cannot be generated.")
file(TOUCH "${CMAKE_CURRENT_BINARY_DIR}/cuda_printers.def")
add_custom_target(cuda-pretty-printers)
endif()

add_dependencies(sycl_pi_trace_collector cuda-pretty-printers)

endif()

add_dependencies(sycl-trace sycl_pi_trace_collector)
add_dependencies(sycl-toolchain sycl-trace)

Expand Down
27 changes: 26 additions & 1 deletion sycl/tools/sycl-trace/collector.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,15 @@
sycl::detail::SpinLock GlobalLock;

bool HasZEPrinter = false;
bool HasCUPrinter = false;
bool HasPIPrinter = false;

void zePrintersInit();
void zePrintersFinish();
#ifdef USE_PI_CUDA
void cuPrintersInit();
void cuPrintersFinish();
#endif
void piPrintersInit();
void piPrintersFinish();

Expand All @@ -28,7 +33,12 @@ XPTI_CALLBACK_API void zeCallback(uint16_t TraceType,
xpti::trace_event_data_t *Parent,
xpti::trace_event_data_t *Event,
uint64_t Instance, const void *UserData);

#ifdef USE_PI_CUDA
XPTI_CALLBACK_API void cuCallback(uint16_t TraceType,
xpti::trace_event_data_t *Parent,
xpti::trace_event_data_t *Event,
uint64_t Instance, const void *UserData);
#endif
XPTI_CALLBACK_API void xptiTraceInit(unsigned int /*major_version*/,
unsigned int /*minor_version*/,
const char * /*version_str*/,
Expand All @@ -50,6 +60,16 @@ XPTI_CALLBACK_API void xptiTraceInit(unsigned int /*major_version*/,
zeCallback);
xptiRegisterCallback(StreamID, xpti::trace_function_with_args_end,
zeCallback);
#ifdef USE_PI_CUDA
} else if (std::string_view(StreamName) == "sycl.experimental.cuda.debug" &&
std::getenv("SYCL_TRACE_CU_ENABLE")) {
cuPrintersInit();
uint16_t StreamID = xptiRegisterStream(StreamName);
xptiRegisterCallback(StreamID, xpti::trace_function_with_args_begin,
cuCallback);
xptiRegisterCallback(StreamID, xpti::trace_function_with_args_end,
cuCallback);
#endif
}
}

Expand All @@ -61,4 +81,9 @@ XPTI_CALLBACK_API void xptiTraceFinish(const char *StreamName) {
"sycl.experimental.level_zero.debug" &&
std::getenv("SYCL_TRACE_ZE_ENABLE"))
zePrintersFinish();
#ifdef USE_PI_CUDA
else if (std::string_view(StreamName) == "sycl.experimental.cuda.debug" &&
std::getenv("SYCL_TRACE_CU_ENABLE"))
cuPrintersFinish();
#endif
}
Loading