[OpenMP][OMPT] Add OMPT callback for device data exchange 'Device-to-Device' #81991
a4ed787 to 5fd8e50
Removed assertions w.r.t. |
Do we have a place to document this decision other than this PR / commit message?
Can confirm that this pull request also works for NVIDIA GPUs, though I wasn't able to test it with multiple accelerators due to build issues on our HPC machines.
$ clang --version
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 5fd8e50feff94dac7e741b07c956622b7c25bc6a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jreuter/Projects/Compilers/llvm-project/_build/_install/bin
$ clang -fopenmp --offload-arch=native reproducer.c
$ ./a.out
Callback Init: device_num=0 type=sm_75 device=0x55d75ee03a40 lookup=0x7fb3518ebb50 doc=(nil)
Allocating memory on device
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000001) src=(nil) src_device_num=1 dest=(nil) dest_device_num=0 bytes=4 code=0x55d75ce76853
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000001) src=(nil) src_device_num=1 dest=0x7fb325a00000 dest_device_num=0 bytes=4 code=0x55d75ce76853
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000002) src=(nil) src_device_num=1 dest=(nil) dest_device_num=0 bytes=4 code=0x55d75ce76864
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000002) src=(nil) src_device_num=1 dest=0x7fb325a00200 dest_device_num=0 bytes=4 code=0x55d75ce76864
Testing host to device
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000003) src=0x55d75f66b200 src_device_num=1 dest=0x7fb325a00000 dest_device_num=0 bytes=4 code=0x55d75ce768ca
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000003) src=0x55d75f66b200 src_device_num=1 dest=0x7fb325a00000 dest_device_num=0 bytes=4 code=0x55d75ce768ca
Testing device to device
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000004) src=0x7fb325a00000 src_device_num=0 dest=0x7fb325a00200 dest_device_num=0 bytes=4 code=0x55d75ce768fc
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000004) src=0x7fb325a00000 src_device_num=0 dest=0x7fb325a00200 dest_device_num=0 bytes=4 code=0x55d75ce768fc
Testing device to host
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000005) src=0x7fb325a00200 src_device_num=0 dest=0x55d75f66b200 dest_device_num=1 bytes=4 code=0x55d75ce76942
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000005) src=0x7fb325a00200 src_device_num=0 dest=0x55d75f66b200 dest_device_num=1 bytes=4 code=0x55d75ce76942
Checking correctness
Freeing memory on device
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000006) src=0x7fb325a00000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769a4
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000006) src=0x7fb325a00000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769a4
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000007) src=0x7fb325a00200 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769b0
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000007) src=0x7fb325a00200 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769b0
Callback Fini: device_num=0
x86_64 still reports two transfers even though
$ clang --version
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 5fd8e50feff94dac7e741b07c956622b7c25bc6a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jreuter/Projects/Compilers/llvm-project/_build/_install/bin
$ clang -fopenmp -fopenmp-targets=x86_64 reproducer.c
$ ./a.out
Callback Init: device_num=0 type=generic-64bit device=0x5644e0820950 lookup=0x7fb3d245bb50 doc=(nil)
Callback Init: device_num=1 type=generic-64bit device=0x5644e0821380 lookup=0x7fb3d245bb50 doc=(nil)
Callback Init: device_num=2 type=generic-64bit device=0x5644e08219a0 lookup=0x7fb3d245bb50 doc=(nil)
Callback Init: device_num=3 type=generic-64bit device=0x5644e08221d0 lookup=0x7fb3d245bb50 doc=(nil)
Allocating memory on device
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000001) src=(nil) src_device_num=4 dest=(nil) dest_device_num=0 bytes=4 code=0x5644dee0c853
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000001) src=(nil) src_device_num=4 dest=0x5644e0820790 dest_device_num=0 bytes=4 code=0x5644dee0c853
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000002) src=(nil) src_device_num=4 dest=(nil) dest_device_num=1 bytes=4 code=0x5644dee0c864
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000002) src=(nil) src_device_num=4 dest=0x5644e07fafb0 dest_device_num=1 bytes=4 code=0x5644dee0c864
Testing host to device
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000003) src=0x5644e081b990 src_device_num=4 dest=0x5644e0820790 dest_device_num=0 bytes=4 code=0x5644dee0c8ca
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000003) src=0x5644e081b990 src_device_num=4 dest=0x5644e0820790 dest_device_num=0 bytes=4 code=0x5644dee0c8ca
Testing device to device
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000004) src=0x5644e0820790 src_device_num=0 dest=0x5644e0820880 dest_device_num=4 bytes=4 code=0x5644dee0c8fc
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000004) src=0x5644e0820790 src_device_num=0 dest=0x5644e0820880 dest_device_num=4 bytes=4 code=0x5644dee0c8fc
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000005) src=0x5644e0820880 src_device_num=4 dest=0x5644e07fafb0 dest_device_num=1 bytes=4 code=0x5644dee0c8fc
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000005) src=0x5644e0820880 src_device_num=4 dest=0x5644e07fafb0 dest_device_num=1 bytes=4 code=0x5644dee0c8fc
Testing device to host
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000006) src=0x5644e07fafb0 src_device_num=1 dest=0x5644e081b990 dest_device_num=4 bytes=4 code=0x5644dee0c942
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000006) src=0x5644e07fafb0 src_device_num=1 dest=0x5644e081b990 dest_device_num=4 bytes=4 code=0x5644dee0c942
Checking correctness
Freeing memory on device
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000007) src=0x5644e0820790 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9a4
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000007) src=0x5644e0820790 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9a4
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000008) src=0x5644e07fafb0 src_device_num=1 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9b0
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000008) src=0x5644e07fafb0 src_device_num=1 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9b0
Callback Fini: device_num=0
Callback Fini: device_num=1
Callback Fini: device_num=2
Callback Fini: device_num=3
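For reading logs like the two above, a small decoder can help. The following is a minimal sketch, assuming the exact `key=value` layout printed by this reproducer; the names `data_op_record`, `parse_data_op` and `optype_name` are made up for illustration and are not part of the PR. The optype names follow the `ompt_target_data_op_t` enum (1 = alloc, 2 = transfer to device, 3 = transfer from device, 4 = delete).

```c
#include <stdio.h>
#include <string.h>

/* One decoded "Callback DataOp EMI" record (hypothetical helper type). */
typedef struct {
  int endpoint;        /* 1 = begin, 2 = end of the operation */
  int optype;          /* ompt_target_data_op_t value */
  int src_device_num;
  int dest_device_num;
  long bytes;
} data_op_record;

/* Find "key" in the line and read the integer right after it.
   Returns 1 on success, 0 if the key is missing or not numeric. */
static int read_int_field(const char *line, const char *key, long *out) {
  const char *p = strstr(line, key);
  if (p == NULL)
    return 0;
  return sscanf(p + strlen(key), "%ld", out) == 1;
}

/* Parse one log line. Returns 1 if all fields were found, else 0. */
int parse_data_op(const char *line, data_op_record *rec) {
  long endpoint, optype, src, dest, bytes;
  if (!read_int_field(line, "endpoint=", &endpoint) ||
      !read_int_field(line, "optype=", &optype) ||
      !read_int_field(line, "src_device_num=", &src) ||
      !read_int_field(line, "dest_device_num=", &dest) ||
      !read_int_field(line, "bytes=", &bytes))
    return 0;
  rec->endpoint = (int)endpoint;
  rec->optype = (int)optype;
  rec->src_device_num = (int)src;
  rec->dest_device_num = (int)dest;
  rec->bytes = bytes;
  return 1;
}

/* Map the numeric optype seen in the logs to its enum name. */
const char *optype_name(int optype) {
  switch (optype) {
  case 1: return "ompt_target_data_alloc";
  case 2: return "ompt_target_data_transfer_to_device";
  case 3: return "ompt_target_data_transfer_from_device";
  case 4: return "ompt_target_data_delete";
  default: return "other";
  }
}
```

Decoded this way, the x86_64 "device to device" step shows up as one optype=3 record (device 0 to device number 4, the host) followed by one optype=2 record (host to device 1), whereas the NVIDIA run shows a single optype=3 record with both device numbers equal to 0.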
No, currently not. (But now that I think of it we should add another easily accessible doc.)
5fd8e50 to 029ed0a
Just reproduced this behavior within the testcase and I'll discuss this shortly but my guess is that the Observing |
Just wanted to bring it up, so that it is known 😄 I also agree that observing |
   } else if (ompt_callback_target_data_op_fn) {
     // HostOpId is set by the runtime
     HostOpId = createOpId();
     // Invoke the tool supplied data op callback
     ompt_callback_target_data_op_fn(
         TargetData.value, HostOpId, ompt_target_data_transfer_from_device,
-        TgtPtrBegin, DeviceId, HstPtrBegin,
-        /*TgtDeviceNum=*/omp_get_initial_device(), Size, Code);
+        DstPtrBegin, DstDeviceId, SrcPtrBegin, SrcDeviceId, Size, Code);
This seems reversed. Shouldn't it be `SrcPtrBegin, SrcDeviceId, DstPtrBegin, DstDeviceId`?
Very good catch! (I think this got reversed twice.)
edit: This also inspired me to add 'HOST' and 'DEVICE' captures to the related non-EMI test of yours.
openmp/libomptarget/src/device.cpp
Outdated
OMPT_IF_BUILT(
    InterfaceRAII TargetDataExchangeRAII(
        RegionInterface.getCallbacks<ompt_target_data_transfer_from_device>(),
        DeviceID, SrcPtr, DstDev.RTLDeviceID, DstPtr, Size,
Use RTLDeviceID instead of DeviceID in the callback since that's what is used in the actual RTL->data_exchange invocation?
Sure, might be less confusing.
No more comments from my side on this one. I'll let @dhruvachak accept once he is happy with it.
int Host = omp_get_initial_device();

printf("Allocating Memory on Device\n");
int *DevPtr = (int *)omp_target_alloc(sizeof(int), Device);
Check that DevPtr is not null.
Done. Will add an `assert()`.
*HstPtr = 42;

printf("Testing: Host to Device\n");
omp_target_memcpy(DevPtr, HstPtr, sizeof(int), 0, 0, Device, Host);
Check the return value of all calls to omp_target_memcpy. Otherwise, if a call fails, the host value could still be 42 and the test would appear to pass even though the transfer failed.
Will do! Let me know if `assert`s work for you (if not, let me know what you'd prefer).
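To illustrate why the return codes matter here, the sketch below replays the host → device → device → host round trip with stand-in copy routines instead of a real OpenMP runtime. Everything in it is hypothetical (`copy_fn`, `working_copy`, `failing_copy`, `round_trip`); only the parameter order and the "0 means success" convention are taken from `omp_target_memcpy`. It is not the code that was added to the test.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in type with the parameter order of omp_target_memcpy:
   (dst, src, length, dst_offset, src_offset, dst_device_num, src_device_num).
   As with the real routine, a return value of 0 means success. */
typedef int (*copy_fn)(void *dst, const void *src, size_t length,
                       size_t dst_offset, size_t src_offset,
                       int dst_device_num, int src_device_num);

/* Stand-in that really copies (host memory only, device numbers ignored). */
int working_copy(void *dst, const void *src, size_t length, size_t dst_offset,
                 size_t src_offset, int dst_device_num, int src_device_num) {
  (void)dst_device_num;
  (void)src_device_num;
  memcpy((char *)dst + dst_offset, (const char *)src + src_offset, length);
  return 0;
}

/* Stand-in that always fails and leaves the destination untouched. */
int failing_copy(void *dst, const void *src, size_t length, size_t dst_offset,
                 size_t src_offset, int dst_device_num, int src_device_num) {
  (void)dst; (void)src; (void)length; (void)dst_offset; (void)src_offset;
  (void)dst_device_num; (void)src_device_num;
  return 1;
}

/* Host -> buffer A -> buffer B -> host.
   Returns 0 if every copy succeeded, otherwise the number (1..3) of the
   first step that reported an error. */
int round_trip(copy_fn copy, int *host) {
  int buf_a = 0, buf_b = 0;           /* stand-ins for two device buffers */
  const int device = 0, host_dev = 1; /* illustrative device numbers */
  if (copy(&buf_a, host, sizeof(int), 0, 0, device, host_dev) != 0)
    return 1;
  if (copy(&buf_b, &buf_a, sizeof(int), 0, 0, device, device) != 0)
    return 2;
  if (copy(host, &buf_b, sizeof(int), 0, 0, host_dev, device) != 0)
    return 3;
  return 0;
}
```

With `failing_copy`, the host value is still 42 after the round trip, so a check on the value alone cannot tell the two cases apart; only the return code can. One design note: if `assert` is used for this, it is safer to store the return value in a variable first, since an `assert(...)` expression is not evaluated at all when `NDEBUG` is defined.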
029ed0a to abeb1ae
…Device'

Since there's no `ompt_target_data_transfer_tofrom_device` (within the ompt_target_data_op_t enum) or anything else that conveys the meaning of inter-device data exchange, we decided to indicate a Device-to-Device transfer by using:
optype == ompt_target_data_transfer_from_device (=3)

Hence, a device-to-device transfer may be identified e.g. by checking for:
(optype == 3) &&
(src_device_num < omp_get_num_devices()) &&
(dest_device_num < omp_get_num_devices())

Fixes: llvm#66478
abeb1ae to 8d7ac0b
LGTM.
…Device' (llvm#81991)

Since there's no `ompt_target_data_transfer_tofrom_device` (within the ompt_target_data_op_t enum) or anything else that conveys the meaning of inter-device data exchange, we decided to indicate a Device-to-Device transfer by using:
optype == ompt_target_data_transfer_from_device (=3)

Hence, a device-to-device transfer may be identified e.g. by checking for:
(optype == 3) &&
(src_device_num < omp_get_num_devices()) &&
(dest_device_num < omp_get_num_devices())

Fixes: llvm#66478
Change-Id: I4c382ee61a05102c7ffc6de9b765e072f6386f11
Since there's no `ompt_target_data_transfer_tofrom_device` (within the ompt_target_data_op_t enum) or anything else that conveys the meaning of inter-device data exchange, we decided to indicate a Device-to-Device transfer by using:
optype == ompt_target_data_transfer_from_device (=3)

Hence, a device-to-device transfer may be identified e.g. by checking for:
(optype == 3) &&
(src_device_num < omp_get_num_devices()) &&
(dest_device_num < omp_get_num_devices())

Fixes: #66478
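
On the tool side, the check described above could be wrapped in a small predicate. This is a minimal sketch, not part of the PR: the name `is_device_to_device` is made up, and the device count is passed in as a parameter (standing in for `omp_get_num_devices()`) so that the function does not depend on an OpenMP runtime. The `>= 0` guards are an extra assumption on top of the description, added because the logs show `dest_device_num=-1` for delete operations.

```c
#include <stdbool.h>

/* Value of ompt_target_data_transfer_from_device, as stated in the
   description above. */
enum { TRANSFER_FROM_DEVICE = 3 };

/* Returns true if a data-op record describes a device-to-device transfer.
   num_devices plays the role of omp_get_num_devices(); the host shows up
   with a device number that is not below that count. */
bool is_device_to_device(int optype, int src_device_num, int dest_device_num,
                         int num_devices) {
  return optype == TRANSFER_FROM_DEVICE &&
         src_device_num >= 0 && src_device_num < num_devices &&
         dest_device_num >= 0 && dest_device_num < num_devices;
}
```

Applied to the logs in this thread: with one GPU, the record with src_device_num=0 and dest_device_num=0 is classified as device-to-device, while the record with dest_device_num=1 (the host) is not. In the x86_64 run with four devices, the optype=3 record going to device number 4 is likewise a plain device-to-host transfer.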