Commit 1c84c13

sherry-yuan authored and pcolberg committed
Initialize max number of global memory definition for simulator
The simulator has no global memory interface information until the actual aocx is loaded. (This is a problem only for the simulator. In a hardware run, the runtime can query the BSP for memory interface information.) Before the aocx is loaded, the simulator initializes its global memory interface from a predefined autodiscovery string [1], which defines only 1 global memory.

In the SYCL runtime flow today, the USM device allocation call happens before the aocx is loaded. The aocx is loaded when clCreateProgram is called, which typically happens on the first kernel launch in the SYCL runtime. On a multi global memory system, the USM device allocation fails because [1] defines only 1 global memory in total while the user requests more than 1 device global memory.

Users could work around this issue by first launching a sacrificial kernel that takes a shared allocation as a kernel argument, which sets up the correct global memory interface in the runtime. This change eliminates the need to run a sacrificial kernel.

However, there are a few downsides:
1. The address range/size may not exactly match the one in the aocx. This is not too large a problem because the runtime's first-fit allocation algorithm fills the lowest address range first, unless the user requests more than what is available.
2. It potentially occupies more space than required.
3. It will not error out when the user requests a non-existent device global memory, because ACL_MAX_GLOBAL_MEM is used for num_global_mem_systems.

[1] https://github.com/intel/fpga-runtime-for-opencl/blob/950f21dd079dfd55a473ba4122a4a9dca450e36f/include/acl_shipped_board_cfgs.h#L7
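The effect of the change can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `GlobalMemDef`, `AutodiscoveryDef`, `kMaxGlobalMem`, the address values, and `can_allocate_on` are hypothetical stand-ins, not the runtime's real types or checks; only the field names `num_global_mem_systems` and `global_mem_defs` mirror the commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ACL_MAX_GLOBAL_MEM; the real value is not shown
// in this commit.
constexpr std::size_t kMaxGlobalMem = 4;

// Hypothetical, heavily reduced version of a global memory definition.
struct GlobalMemDef {
  std::size_t begin = 0;
  std::size_t end = 0;
};

// Hypothetical, heavily reduced version of the autodiscovery definition.
struct AutodiscoveryDef {
  unsigned num_global_mem_systems = 0;
  std::array<GlobalMemDef, kMaxGlobalMem> global_mem_defs{};
};

// Models the predefined simulator definition: exactly one global memory.
// The address range is an arbitrary illustrative value.
inline AutodiscoveryDef make_default_simulator_def() {
  AutodiscoveryDef def;
  def.num_global_mem_systems = 1;
  def.global_mem_defs[0] = GlobalMemDef{0x0, 0x10000};
  return def;
}

// Mirrors the idea of the commit: report the maximum number of global
// memories and copy the first definition into every slot.
inline void replicate_first_global_mem(AutodiscoveryDef &def) {
  assert(def.num_global_mem_systems >= 1);
  def.num_global_mem_systems = static_cast<unsigned>(kMaxGlobalMem);
  for (std::size_t i = 1; i < kMaxGlobalMem; ++i) {
    def.global_mem_defs[i] = def.global_mem_defs[0];
  }
}

// Simplified assumption of how an allocation request is validated: the
// requested memory index must be below the number of known memories.
inline bool can_allocate_on(const AutodiscoveryDef &def, unsigned mem_index) {
  return mem_index < def.num_global_mem_systems;
}
```

In this model, a request for memory index 1 is rejected with the default definition and accepted after replication. Downside 3 is visible as well: every index below `kMaxGlobalMem` is accepted, even if the aocx loaded later defines fewer memories.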
1 parent dfdd0d8 commit 1c84c13

1 file changed: +23 -0 lines changed

src/acl_kernel_if.cpp

Lines changed: 23 additions & 0 deletions
@@ -715,6 +715,29 @@ int acl_kernel_if_init(acl_kernel_if *kern, acl_bsp_io bsp_io,
     auto parse_result = acl_load_device_def_from_str(
         std::string(acl_shipped_board_cfgs[1]),
         sysdef->device[0].autodiscovery_def, err_msg);
+    // Fill in definition for all device global memory
+    // Simulator does not have any global memory interface information until the
+    // actual aocx is loaded. (Note this is only a problem for simulator not
+    // hardware run, in hardware run, we can communicate with BSP to query
+    // memory interface information). In the flow today, the USM device
+    // allocation call happens before aocx is loaded. The aocx is loaded when
+    // clCreateProgram is called, which typically happens on first kernel launch
+    // in sycl runtime. In order to prevent the USM device allocation from
+    // failing on multi global memory system, initialize as many global memory
+    // systems as possible for simulation flow. However there are a few downsides:
+    // 1. The address range/size may not be exactly the same as the one that is
+    // in aocx, but this is not too large of a problem because runtime first fit
+    // allocation algorithm will fill the lowest address range first. Unless
+    // user requested more than what is available.
+    // 2. it potentially occupies more space than required
+    // 3. will not error out when user requested a non-existing device global
+    // memory because we are using ACL_MAX_GLOBAL_MEM for num_global_mem_systems
+    sysdef->device[0].autodiscovery_def.num_global_mem_systems =
+        ACL_MAX_GLOBAL_MEM;
+    for (int i = 0; i < ACL_MAX_GLOBAL_MEM; i++) {
+      sysdef->device[0].autodiscovery_def.global_mem_defs[i] =
+          sysdef->device[0].autodiscovery_def.global_mem_defs[0];
+    }
     if (parse_result)
       sysdef->num_devices = 1;
     // Override the device name to the simulator.
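Downside 1 in the comment relies on first-fit allocation filling the lowest addresses first. The sketch below shows that property with a generic first-fit allocator. It is an assumption-laden illustration, not the runtime's allocator: `FirstFitRegion` and its interface are hypothetical, and the sizes are arbitrary.

```cpp
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical first-fit allocator over a single address range [begin, end).
class FirstFitRegion {
public:
  FirstFitRegion(std::size_t begin, std::size_t end)
      : begin_(begin), end_(end) {}

  // Returns the lowest address at which `size` bytes fit, or std::nullopt
  // if no gap is large enough.
  std::optional<std::size_t> allocate(std::size_t size) {
    if (size == 0) {
      return std::nullopt;
    }
    std::size_t candidate = begin_;
    // used_ is ordered by start address, so gaps are visited low to high.
    for (const auto &block : used_) {
      if (block.first - candidate >= size) {
        break; // The gap in front of this block is large enough.
      }
      candidate = block.first + block.second;
    }
    if (end_ - candidate < size) {
      return std::nullopt;
    }
    used_[candidate] = size;
    return candidate;
  }

  // Frees the block that starts at `addr`, if any.
  void release(std::size_t addr) { used_.erase(addr); }

private:
  std::size_t begin_;
  std::size_t end_;
  std::map<std::size_t, std::size_t> used_; // start address -> block size
};
```

With a placeholder region larger than the real one, both allocators hand out identical addresses as long as the requests fit in the real region. They diverge only once the user requests more than the real memory provides, which is the exception the comment calls out.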
