Skip to content

Commit 16ae15a

Browse files
authored
[SYCL][Scheduler] Rework of host accessor and host allocation (#724)
* [SYCL][Scheduler] Rework of host accessor and host allocation Background for use_host_ptr property: The SYCL specification defines use_host_ptr property which can be passed into the buffer or image constructors. The SYCL specification says the following: The use_host_ptr property adds the requirement that the SYCL runtime must not allocate any memory for the SYCL buffer and instead uses the provided host pointer directly. This prevents the SYCL runtime from allocating additional temporary storage on the host. Current implementation violates this rule in some scenarios, such as moving the latest data of the buffer from one OCL device context to another and on update_host type of explicit memory operation. Background for host accessor: Current implementation of host accessor requires that the underlying backend supports async/delayed execution: SYCL RT calls Unmap or Write commands that are initially blocked by an user event. Because of that, for instance, host accessors work incorrectly if the latest memory are located on the host device. Also, user events are usually not very well supported, there is "UseExclusiveQueue" workaround because of that. Solution: The patch eliminates the need to have more then one host allocation by introducing concept of linked alloca commands. One device and one host alloca commands can be linked together so they share* the same host memory. Such commands have "active" state which indicates that it's valid to work with corresponding memory allocation. Only one alloca command from a pair can be active at the same time. Memory transition between them is performed using map/unmap commands. * share from the SYCL point of view, because even if SYCL RT asks underlying backends to reuse host pointer they are free to allocate additional memory on their own. In order to resolve the limitations of current implementation of the host accessor the patch updates addHostAccessor API to emit map or read operations along with an empty command which is in blocked state(cannot be enqueued). An empty command is unblocked inside in the new releaseHostAccessor API which is called from host accessor destructor. As bonuses/side effects the patch: 1. Allows having multiple host accessor with read access mode at the same time in all cases. 2. Adds environment variable SYCL_THROW_ON_BLOCK which if set makes SYCL RT throw an exception on attempt to wait for a blocked command. 3. Makes so the latest state of memory object is still on the host after host accessor destructor in all cases, so when multiple host accessors are constructed in a row no map/read operations are performed starting from the second host accessor. Before the patch memory was moved back to the source(for example, to OpenCL memory allocation) so SYCL RT had to make map/read each time. 4. Makes Scheduler considering host and different instances of the host device to be in the same context when it looks for a suitable allocation. Signed-off-by: Vlad Romanov <[email protected]>
1 parent 9c2035a commit 16ae15a

17 files changed

+492
-346
lines changed

sycl/include/CL/sycl/detail/accessor_impl.hpp

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ namespace cl {
2020
namespace sycl {
2121
namespace detail {
2222

23+
class Command;
24+
2325
// The class describes a requirement to access a SYCL memory object such as
2426
// sycl::buffer and sycl::image. For example, each accessor used in a kernel,
2527
// except one with access target "local", adds such requirement for the command
@@ -70,10 +72,8 @@ class AccessorImplHost {
7072
MElemSize(ElemSize), MOffsetInBytes(OffsetInBytes),
7173
MIsSubBuffer(IsSubBuffer) {}
7274

73-
~AccessorImplHost() {
74-
if (BlockingEvent)
75-
BlockingEvent->setComplete();
76-
}
75+
~AccessorImplHost();
76+
7777
AccessorImplHost(const AccessorImplHost &Other)
7878
: MOffset(Other.MOffset), MAccessRange(Other.MAccessRange),
7979
MMemoryRange(Other.MMemoryRange), MAccessMode(Other.MAccessMode),
@@ -97,7 +97,7 @@ class AccessorImplHost {
9797

9898
void *MData = nullptr;
9999

100-
EventImplPtr BlockingEvent;
100+
Command *MBlockedCmd = nullptr;
101101
};
102102

103103
using AccessorImplPtr = std::shared_ptr<AccessorImplHost>;

sycl/include/CL/sycl/detail/buffer_impl.hpp

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -143,9 +143,17 @@ class buffer_impl final : public SYCLMemObjT<AllocatorT> {
143143
}
144144

145145
void *allocateMem(ContextImplPtr Context, bool InitFromUserData,
146-
RT::PiEvent &OutEventToWait) override {
146+
void *HostPtr, RT::PiEvent &OutEventToWait) override {
147147

148-
void *UserPtr = InitFromUserData ? BaseT::getUserPtr() : nullptr;
148+
assert(!(InitFromUserData && HostPtr) &&
149+
"Cannot init from user data and reuse host ptr provided "
150+
"simultaneously");
151+
152+
void *UserPtr = InitFromUserData ? BaseT::getUserPtr() : HostPtr;
153+
154+
assert(!(nullptr == UserPtr && BaseT::useHostPtr() && Context->is_host()) &&
155+
"Internal error. Allocating memory on the host "
156+
"while having use_host_ptr property");
149157

150158
return MemoryManager::allocateMemBuffer(
151159
std::move(Context), this, UserPtr, BaseT::MHostPtrReadOnly,

sycl/include/CL/sycl/detail/image_impl.hpp

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -280,8 +280,13 @@ class image_impl final : public SYCLMemObjT<AllocatorT> {
280280
size_t get_count() const { return MRange.size(); }
281281

282282
void *allocateMem(ContextImplPtr Context, bool InitFromUserData,
283-
RT::PiEvent &OutEventToWait) override {
284-
void *UserPtr = InitFromUserData ? BaseT::getUserPtr() : nullptr;
283+
void *HostPtr, RT::PiEvent &OutEventToWait) override {
284+
285+
assert(!(InitFromUserData && HostPtr) &&
286+
"Cannot init from user data and reuse host ptr provided "
287+
"simultaneously");
288+
289+
void *UserPtr = InitFromUserData ? BaseT::getUserPtr() : HostPtr;
285290

286291
RT::PiMemImageDesc Desc = getImageDesc(UserPtr != nullptr);
287292
assert(checkImageDesc(Desc, Context, UserPtr) &&

sycl/include/CL/sycl/detail/memory_manager.hpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,8 @@ class MemoryManager {
4242
// The following method allocates memory allocation of memory object.
4343
// Depending on the context it allocates memory on host or on device.
4444
static void *allocate(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
45-
bool InitFromUserData, std::vector<RT::PiEvent> DepEvents,
45+
bool InitFromUserData, void *HostPtr,
46+
std::vector<RT::PiEvent> DepEvents,
4647
RT::PiEvent &OutEvent);
4748

4849
// The following method creates OpenCL sub buffer for specified

sycl/include/CL/sycl/detail/scheduler/commands.hpp

Lines changed: 28 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -155,6 +155,8 @@ class Command {
155155
bool MIsBlockable = false;
156156
// Indicates whether the command is blocked from enqueueing
157157
std::atomic<bool> MCanEnqueue;
158+
159+
const char *MBlockReason = "Unknown";
158160
};
159161

160162
// The command does nothing during enqueue. The task can be used to implement
@@ -193,8 +195,10 @@ class ReleaseCommand : public Command {
193195

194196
class AllocaCommandBase : public Command {
195197
public:
196-
AllocaCommandBase(CommandType Type, QueueImplPtr Queue, Requirement Req)
197-
: Command(Type, Queue), MReleaseCmd(Queue, this),
198+
AllocaCommandBase(CommandType Type, QueueImplPtr Queue, Requirement Req,
199+
AllocaCommandBase *LinkedAllocaCmd)
200+
: Command(Type, Queue), MLinkedAllocaCmd(LinkedAllocaCmd),
201+
MIsLeaderAlloca(nullptr == LinkedAllocaCmd), MReleaseCmd(Queue, this),
198202
MRequirement(std::move(Req)) {
199203
MRequirement.MAccessMode = access::mode::read_write;
200204
}
@@ -207,20 +211,35 @@ class AllocaCommandBase : public Command {
207211

208212
const Requirement *getRequirement() const final { return &MRequirement; }
209213

214+
void *MMemAllocation = nullptr;
215+
216+
// Alloca command linked with current command.
217+
// Device and host alloca commands can be linked, so they may share the same
218+
// memory. Only one allocation from a pair can be accessed at a time. Alloca
219+
// commands associated with such allocation is "active". In order to switch
220+
// "active" status between alloca commands map/unmap operations are used.
221+
AllocaCommandBase *MLinkedAllocaCmd = nullptr;
222+
// Indicates that current alloca is active one.
223+
bool MIsActive = true;
224+
225+
// Indicates that the command owns memory allocation in case of connected
226+
// alloca command
227+
bool MIsLeaderAlloca = true;
228+
210229
protected:
211230
ReleaseCommand MReleaseCmd;
212231
Requirement MRequirement;
213-
void *MMemAllocation = nullptr;
214232
};
215233

216234
// The command enqueues allocation of instance of memory object on Host or
217235
// underlying framework.
218236
class AllocaCommand : public AllocaCommandBase {
219237
public:
220238
AllocaCommand(QueueImplPtr Queue, Requirement Req,
221-
bool InitFromUserData = true)
222-
: AllocaCommandBase(CommandType::ALLOCA, std::move(Queue),
223-
std::move(Req)),
239+
bool InitFromUserData = true,
240+
AllocaCommandBase *LinkedAllocaCmd = nullptr)
241+
: AllocaCommandBase(CommandType::ALLOCA, std::move(Queue), std::move(Req),
242+
LinkedAllocaCmd),
224243
MInitFromUserData(InitFromUserData) {
225244
addDep(DepDesc(nullptr, getRequirement(), this));
226245
}
@@ -240,7 +259,8 @@ class AllocaSubBufCommand : public AllocaCommandBase {
240259
AllocaSubBufCommand(QueueImplPtr Queue, Requirement Req,
241260
AllocaCommandBase *ParentAlloca)
242261
: AllocaCommandBase(CommandType::ALLOCA_SUB_BUF, std::move(Queue),
243-
std::move(Req)),
262+
std::move(Req),
263+
/*LinkedAllocaCmd*/ nullptr),
244264
MParentAlloca(ParentAlloca) {
245265
addDep(DepDesc(MParentAlloca, getRequirement(), MParentAlloca));
246266
}
@@ -251,7 +271,7 @@ class AllocaSubBufCommand : public AllocaCommandBase {
251271
private:
252272
cl_int enqueueImp() final;
253273

254-
AllocaCommandBase *MParentAlloca;
274+
AllocaCommandBase *MParentAlloca = nullptr;
255275
};
256276

257277
class MapMemObject : public Command {
@@ -295,10 +315,6 @@ class MemCpyCommand : public Command {
295315
QueueImplPtr SrcQueue, QueueImplPtr DstQueue,
296316
bool UseExclusiveQueue = false);
297317

298-
void setAccessorToUpdate(Requirement *AccToUpdate) {
299-
MAccToUpdate = AccToUpdate;
300-
}
301-
302318
void printDot(std::ostream &Stream) const final;
303319
const Requirement *getRequirement() const final { return &MDstReq; }
304320

@@ -310,7 +326,6 @@ class MemCpyCommand : public Command {
310326
AllocaCommandBase *MSrcAllocaCmd = nullptr;
311327
Requirement MDstReq;
312328
AllocaCommandBase *MDstAllocaCmd = nullptr;
313-
Requirement *MAccToUpdate = nullptr;
314329
};
315330

316331
// The command enqueues memory copy between two instances of memory object.

sycl/include/CL/sycl/detail/scheduler/scheduler.hpp

Lines changed: 18 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -71,8 +71,17 @@ class Scheduler {
7171
// sycl::image destructors.
7272
void removeMemoryObject(detail::SYCLMemObjI *MemObj);
7373

74+
// Creates nodes in the graph, that update Req with the pointer to the host
75+
// memory which contains the latest data of the memory object. New operations
76+
// with the same memory object that have side effects are blocked until
77+
// releaseHostAccessor is called.
78+
// Returns an event which indicates when these nodes are completed and host
79+
// accessor is ready for using.
7480
EventImplPtr addHostAccessor(Requirement *Req);
7581

82+
// Unblocks operations with the memory object.
83+
void releaseHostAccessor(Requirement *Req);
84+
7685
// Returns an instance of the scheduler object.
7786
static Scheduler &getInstance();
7887

@@ -101,7 +110,7 @@ class Scheduler {
101110
QueueImplPtr HostQueue);
102111

103112
Command *addCopyBack(Requirement *Req);
104-
Command *addHostAccessor(Requirement *Req, EventImplPtr &RetEvent);
113+
Command *addHostAccessor(Requirement *Req);
105114

106115
// [Provisional] Optimizes the whole graph.
107116
void optimize();
@@ -142,18 +151,20 @@ class Scheduler {
142151
std::vector<SYCLMemObjI *> MMemObjs;
143152

144153
private:
145-
// The method inserts memory copy operation from the context where the
146-
// memory current lives to the context bound to Queue.
147-
MemCpyCommand *insertMemCpyCmd(MemObjRecord *Record, Requirement *Req,
148-
const QueueImplPtr &Queue,
149-
bool UseExclusiveQueue = false);
154+
// The method inserts required command to make so the latest state for the
155+
// memory object Record refers to resides in the context which is bound to
156+
// the Queue. Can insert copy/map/unmap operations depending on the source
157+
// and destination.
158+
Command *insertMemoryMove(MemObjRecord *Record, Requirement *Req,
159+
const QueueImplPtr &Queue,
160+
bool UseExclusiveQueue = false);
150161

151162
UpdateHostRequirementCommand *
152163
insertUpdateHostReqCmd(MemObjRecord *Record, Requirement *Req,
153164
const QueueImplPtr &Queue);
154165

155166
std::set<Command *> findDepsForReq(MemObjRecord *Record, Requirement *Req,
156-
QueueImplPtr Context);
167+
const ContextImplPtr &Context);
157168

158169
// Searches for suitable alloca in memory record.
159170
AllocaCommandBase *findAllocaForReq(MemObjRecord *Record, Requirement *Req,

sycl/include/CL/sycl/detail/sycl_mem_obj_i.hpp

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,11 +38,12 @@ class SYCLMemObjI {
3838
// point to event that should be waited before using the memory.
3939
// InitFromUserData indicates that the returned memory should be intialized
4040
// with the data provided by user(if any). Usually it should happen on the
41-
// first allocation of memory for the buffer.
41+
// first allocation of memory for the memory object.
42+
// Non null HostPtr requires allocation to be made with USE_HOST_PTR property.
4243
// Method returns a pointer to host allocation if Context is host one and
4344
// cl_mem obect if not.
4445
virtual void *allocateMem(ContextImplPtr Context, bool InitFromUserData,
45-
RT::PiEvent &InteropEvent) = 0;
46+
void *HostPtr, RT::PiEvent &InteropEvent) = 0;
4647

4748
// Should be used for memory object created without use_host_ptr property.
4849
virtual void *allocateHostMem() = 0;

sycl/source/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,7 @@ endfunction(add_sycl_rt_library)
3434

3535
set(SYCL_SOURCES
3636
"${sycl_inc_dir}/CL/sycl.hpp"
37+
"detail/accessor_impl.cpp"
3738
"detail/builtins_common.cpp"
3839
"detail/builtins_geometric.cpp"
3940
"detail/builtins_integer.cpp"

sycl/source/detail/accessor_impl.cpp

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
//==---------------- accessor_impl.cpp - SYCL standard source file ---------==//
2+
//
3+
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
// See https://llvm.org/LICENSE.txt for license information.
5+
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
//
7+
//===----------------------------------------------------------------------===//
8+
9+
#include <CL/sycl/detail/accessor_impl.hpp>
10+
#include <CL/sycl/detail/scheduler/scheduler.hpp>
11+
12+
namespace cl {
13+
namespace sycl {
14+
namespace detail {
15+
16+
AccessorImplHost::~AccessorImplHost() {
17+
if (MBlockedCmd)
18+
detail::Scheduler::getInstance().releaseHostAccessor(this);
19+
}
20+
}
21+
}
22+
}
23+

sycl/source/detail/event_impl.cpp

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -36,10 +36,6 @@ event_impl::~event_impl() {
3636
}
3737
}
3838

39-
void event_impl::setComplete() {
40-
PI_CALL(RT::piEventSetStatus, m_Event, CL_COMPLETE);
41-
}
42-
4339
void event_impl::waitInternal() const {
4440
if (!m_HostEvent) {
4541
PI_CALL(RT::piEventsWait, 1, &m_Event);

sycl/source/detail/memory_manager.cpp

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -54,15 +54,16 @@ void MemoryManager::releaseMemObj(ContextImplPtr TargetContext,
5454
}
5555

5656
void *MemoryManager::allocate(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
57-
bool InitFromUserData,
57+
bool InitFromUserData, void *HostPtr,
5858
std::vector<RT::PiEvent> DepEvents,
5959
RT::PiEvent &OutEvent) {
6060
// There is no async API for memory allocation. Explicitly wait for all
6161
// dependency events and return empty event.
6262
waitForEvents(DepEvents);
6363
OutEvent = nullptr;
6464

65-
return MemObj->allocateMem(TargetContext, InitFromUserData, OutEvent);
65+
return MemObj->allocateMem(TargetContext, InitFromUserData, HostPtr,
66+
OutEvent);
6667
}
6768

6869
void *MemoryManager::allocateHostMemory(SYCLMemObjI *MemObj, void *UserPtr,
@@ -420,7 +421,7 @@ void *MemoryManager::map(SYCLMemObjI *SYCLMemObj, void *Mem, QueueImplPtr Queue,
420421
sycl::id<3> AccessOffset, unsigned int ElementSize,
421422
std::vector<RT::PiEvent> DepEvents,
422423
RT::PiEvent &OutEvent) {
423-
if (Queue->is_host() || Dim != 1) {
424+
if (Queue->is_host()) {
424425
assert(!"Not supported configuration of map requested");
425426
throw runtime_error("Not supported configuration of map requested");
426427
}
@@ -447,10 +448,15 @@ void *MemoryManager::map(SYCLMemObjI *SYCLMemObj, void *Mem, QueueImplPtr Queue,
447448
AccessOffset[0] *= ElementSize;
448449
AccessRange[0] *= ElementSize;
449450

450-
void *MappedPtr;
451+
// TODO: Handle offset
452+
assert(AccessOffset[0] == 0 && "Handle offset");
453+
454+
void *MappedPtr = nullptr;
455+
const size_t BytesToMap = AccessRange[0] * AccessRange[1] * AccessRange[2];
456+
451457
PI_CALL(RT::piEnqueueMemBufferMap, Queue->getHandleRef(),
452458
pi::cast<RT::PiMem>(Mem), CL_FALSE, Flags, AccessOffset[0],
453-
AccessRange[0], DepEvents.size(),
459+
BytesToMap, DepEvents.size(),
454460
DepEvents.empty() ? nullptr : &DepEvents[0], &OutEvent, &MappedPtr);
455461
return MappedPtr;
456462
}

0 commit comments

Comments
 (0)