Skip to content

[SYCL][L0] Report root- vs sub- device free memory differently #7789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 16, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 42 additions & 22 deletions sycl/plugins/level_zero/pi_level_zero.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -790,14 +790,24 @@ pi_result _pi_device::initialize(int SubSubDeviceOrdinal,

ZeDeviceMemoryProperties.Compute =
[ZeDevice](
std::vector<ZeStruct<ze_device_memory_properties_t>> &Properties) {
std::pair<std::vector<ZeStruct<ze_device_memory_properties_t>>,
std::vector<ZeStruct<ze_device_memory_ext_properties_t>>>
&Properties) {
uint32_t Count = 0;
ZE_CALL_NOCHECK(zeDeviceGetMemoryProperties,
(ZeDevice, &Count, nullptr));

Properties.resize(Count);
auto &PropertiesVector = Properties.first;
auto &PropertiesExtVector = Properties.second;

PropertiesVector.resize(Count);
PropertiesExtVector.resize(Count);
// Request for extended memory properties be read in
for (uint32_t I = 0; I < Count; ++I)
PropertiesVector[I].pNext = (void *)&PropertiesExtVector[I];

ZE_CALL_NOCHECK(zeDeviceGetMemoryProperties,
(ZeDevice, &Count, Properties.data()));
(ZeDevice, &Count, PropertiesVector.data()));
};

ZeDeviceMemoryAccessProperties.Compute =
Expand Down Expand Up @@ -2814,9 +2824,9 @@ pi_result piDeviceGetInfo(pi_device Device, pi_device_info ParamName,
return ReturnValue(pi_uint64{Device->ZeDeviceProperties->maxMemAllocSize});
case PI_DEVICE_INFO_GLOBAL_MEM_SIZE: {
uint64_t GlobalMemSize = 0;
for (uint32_t I = 0; I < Device->ZeDeviceMemoryProperties->size(); I++) {
GlobalMemSize +=
(*Device->ZeDeviceMemoryProperties.operator->())[I].totalSize;
for (const auto &ZeDeviceMemoryExtProperty :
Device->ZeDeviceMemoryProperties->second) {
GlobalMemSize += ZeDeviceMemoryExtProperty.physicalSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel : I guess we are adding here because ZeDevice from line 810 is a a subdevice, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Device/ZeDevice may be either sub-device or root- device here.
We are adding because that's what we were doing before and I only change where we take the "total" size from.
So, should we be adding or not?

I assume today there is only one memory, so it's probably not important at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

physicalSize reported by Core properties for a root device already is the aggregation of the sub-devices. could you confirm whether you are getting the correct numbers here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with @smaslov-intel : code is good.

}
return ReturnValue(pi_uint64{GlobalMemSize});
}
Expand Down Expand Up @@ -3170,21 +3180,32 @@ pi_result piDeviceGetInfo(pi_device Device, pi_device_info ParamName,
// Only report device memory which zeMemAllocDevice can allocate from.
// Currently this is only the one enumerated with ordinal 0.
uint64_t FreeMemory = 0;
uint32_t MemCount = 1;
zes_mem_handle_t ZesMemHandle;
ZE_CALL(zesDeviceEnumMemoryModules, (ZeDevice, &MemCount, &ZesMemHandle));
uint32_t MemCount = 0;
ZE_CALL(zesDeviceEnumMemoryModules, (ZeDevice, &MemCount, nullptr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ZeDevice is sub-device then isn't zesDeviceEnumMemoryModules supposed to return memory modules/handes only for that subdevice? I am trying to understand why we need this if:
https://github.com/intel/llvm/pull/7789/files#diff-15dd1eb076d2164bd9e87d9057f05f652a716498e8cdf5975e564c65309a0985R3195-R3196

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you are right, so maybe check for subdeviceID is redundant in the loop (but not wrong).
@jandres742 : would you comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. For a sub-device, we have only one memory module currently, but logic below is scalable if that changes in future.

Line 3189 traverses over the memory modules in the ZexMemHandles vector (it would be N for a parent device with N sub-devices, and 1 for a sub-device that is not further partitioned by L0).

Line 3200 is executed several times only for a root device. For a sub-device, it is executed only if the memory module has the same sub-device id, and currently that would be only once.

if (MemCount != 0) {
ZesStruct<zes_mem_properties_t> ZeMemProperties;
ZE_CALL(zesMemoryGetProperties, (ZesMemHandle, &ZeMemProperties));
ZesStruct<zes_mem_state_t> ZeMemState;
ZE_CALL(zesMemoryGetState, (ZesMemHandle, &ZeMemState));
FreeMemory += ZeMemState.free;
std::vector<zes_mem_handle_t> ZesMemHandles(MemCount);
ZE_CALL(zesDeviceEnumMemoryModules,
(ZeDevice, &MemCount, ZesMemHandles.data()));
for (auto &ZesMemHandle : ZesMemHandles) {
ZesStruct<zes_mem_properties_t> ZesMemProperties;
ZE_CALL(zesMemoryGetProperties, (ZesMemHandle, &ZesMemProperties));
// For root-device report memory from all memory modules since that
// is what totally available in the default implicit scaling mode.
// For sub-devices only report memory local to them.
if (!Device->isSubDevice() || Device->ZeDeviceProperties->subdeviceId ==
ZesMemProperties.subdeviceId) {

ZesStruct<zes_mem_state_t> ZesMemState;
ZE_CALL(zesMemoryGetState, (ZesMemHandle, &ZesMemState));
FreeMemory += ZesMemState.free;
}
}
}
return ReturnValue(FreeMemory);
}
case PI_EXT_INTEL_DEVICE_INFO_MEMORY_CLOCK_RATE: {
// If there are not any memory modules then return 0.
if (Device->ZeDeviceMemoryProperties->empty())
if (Device->ZeDeviceMemoryProperties->first.empty())
return ReturnValue(pi_uint32{0});

// If there are multiple memory modules on the device then we have to report
Expand All @@ -3194,13 +3215,13 @@ pi_result piDeviceGetInfo(pi_device Device, pi_device_info ParamName,
return A.maxClockRate < B.maxClockRate;
};
auto MinIt =
std::min_element(Device->ZeDeviceMemoryProperties->begin(),
Device->ZeDeviceMemoryProperties->end(), Comp);
std::min_element(Device->ZeDeviceMemoryProperties->first.begin(),
Device->ZeDeviceMemoryProperties->first.end(), Comp);
return ReturnValue(pi_uint32{MinIt->maxClockRate});
}
case PI_EXT_INTEL_DEVICE_INFO_MEMORY_BUS_WIDTH: {
// If there are not any memory modules then return 0.
if (Device->ZeDeviceMemoryProperties->empty())
if (Device->ZeDeviceMemoryProperties->first.empty())
return ReturnValue(pi_uint32{0});

// If there are multiple memory modules on the device then we have to report
Expand All @@ -3210,8 +3231,8 @@ pi_result piDeviceGetInfo(pi_device Device, pi_device_info ParamName,
return A.maxBusWidth < B.maxBusWidth;
};
auto MinIt =
std::min_element(Device->ZeDeviceMemoryProperties->begin(),
Device->ZeDeviceMemoryProperties->end(), Comp);
std::min_element(Device->ZeDeviceMemoryProperties->first.begin(),
Device->ZeDeviceMemoryProperties->first.end(), Comp);
return ReturnValue(pi_uint32{MinIt->maxBusWidth});
}
case PI_DEVICE_INFO_GPU_EU_COUNT: {
Expand Down Expand Up @@ -8003,8 +8024,7 @@ pi_result USMSharedReadOnlyMemoryAlloc::allocateImpl(void **ResultPtr,
pi_uint32 Alignment) {
pi_usm_mem_properties Props[] = {PI_MEM_ALLOC_FLAGS,
PI_MEM_ALLOC_DEVICE_READ_ONLY, 0};
return USMSharedAllocImpl(ResultPtr, Context, Device, Props, Size,
Alignment);
return USMSharedAllocImpl(ResultPtr, Context, Device, Props, Size, Alignment);
}

pi_result USMDeviceMemoryAlloc::allocateImpl(void **ResultPtr, size_t Size,
Expand Down
4 changes: 3 additions & 1 deletion sycl/plugins/level_zero/pi_level_zero.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@
#include <shared_mutex>
#include <string>
#include <sycl/detail/pi.h>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <vector>
Expand Down Expand Up @@ -384,7 +385,8 @@ struct _pi_device : _pi_object {
ZeCache<ZeStruct<ze_device_compute_properties_t>> ZeDeviceComputeProperties;
ZeCache<ZeStruct<ze_device_image_properties_t>> ZeDeviceImageProperties;
ZeCache<ZeStruct<ze_device_module_properties_t>> ZeDeviceModuleProperties;
ZeCache<std::vector<ZeStruct<ze_device_memory_properties_t>>>
ZeCache<std::pair<std::vector<ZeStruct<ze_device_memory_properties_t>>,
std::vector<ZeStruct<ze_device_memory_ext_properties_t>>>>
ZeDeviceMemoryProperties;
ZeCache<ZeStruct<ze_device_memory_access_properties_t>>
ZeDeviceMemoryAccessProperties;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,10 @@ ze_structure_type_t getZeStructureType<ze_device_memory_properties_t>() {
return ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;
}
template <>
ze_structure_type_t getZeStructureType<ze_device_memory_ext_properties_t>() {
return ZE_STRUCTURE_TYPE_DEVICE_MEMORY_EXT_PROPERTIES;
}
template <>
ze_structure_type_t getZeStructureType<ze_device_memory_access_properties_t>() {
return ZE_STRUCTURE_TYPE_DEVICE_MEMORY_ACCESS_PROPERTIES;
}
Expand Down