Commit 65f522c

[SYCL][Graph] Provide context for future direction section (#285)
- Move unimplemented features to top of section
- Add some text explaining the contents of the section
- Separate fully and partially developed features into subsections for readability
1 parent 748c8e2 commit 65f522c

1 file changed: +67 -58 lines changed

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

@@ -1393,64 +1393,16 @@ submitted in its entirety for execution via
 
 == Future Direction [[future-direction]]
 
-=== Memory Allocation Nodes
+This section contains both features of the specification which have been
+fully developed, but are not yet implemented, as well as features which are
+still in development.
 
-There is no provided interface for users to define a USM allocation/free
-operation belonging to the scope of the graph. It would be error prone and
-non-performant to allocate or free memory as a node executed during graph
-submission. Instead, such a memory allocation API needs to provide a way to
-return a pointer which won't be valid until the allocation is made on graph
-finalization, as allocating at finalization is the only way to benefit from
-the known graph scope for optimal memory allocation, and even optimize to
-eliminate some allocations entirely.
-
-Such a deferred allocation strategy presents challenges however, and as a result
-we recommend instead that prior to graph construction users perform core SYCL
-USM allocations to be used in the graph submission. Before to coming to this
-recommendation we considered the following explicit graph building interfaces
-for adding a memory allocation owned by the graph:
+Fully developed features will be moved to the main specification once they
+have been implemented.
 
-1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
-which will be instantiated on graph finalization with the location of the
-allocated USM memory.
-
-2. Allocation function returning a handle to the allocation. Applications use
-the handle in node command-group functions to access memory when allocated.
-
-3. Allocation function returning a pointer to a virtual allocation, only backed
-with an actual allocation when graph is finalized or submitted.
+=== Features Awaiting Implementation
 
-Design 1) has the drawback of forcing users to keep the user pointer variable
-alive so that the reference is valid, which is unintuitive and is likely to
-result in bugs.
-
-Design 2) introduces a handle object which has the advantages of being a less
-error prone way to provide the pointer to the deferred allocation. However, it
-requires kernel changes and introduces an overhead above the raw pointers that
-are the advantage of USM.
-
-Design 3) needs specific backend support for deferred allocation.
-
-=== Device Specific Graph
-
-A modifiable state `command_graph` contains nodes targeting specific devices,
-rather than being a device agnostic representation only tied to devices on
-finalization. This allows the implementation to process nodes which require
-device information when the command group function is evaluated. For example,
-a SYCL reduction implementation may desire the work-group/sub-group size, which
-is normally gathered by the runtime from the device associated with the queue.
-
-This design also enables the future capability for a user to compose a graph
-with nodes targeting different devices, allowing the benefits of defining an
-execution graph ahead of submission to be extended to multi-device platforms.
-Without this capability a user currently has to submit individual single-device
-graphs and use events for dependencies, which is a usage model this extension is
-aiming to optimize. Automatic load balancing of commands across devices is not a
-problem this extension currently aims to solve, it is the responsibility of the
-user to decide the device each command will be processed for, not the SYCL
-runtime.
-
-=== Storage Lifetimes [[storage-lifetimes]]
+==== Storage Lifetimes [[storage-lifetimes]]
 
 The lifetime of any buffer recorded as part of a submission
 to a command graph will be extended in keeping with the common reference
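The buffer-lifetime rule in this hunk relies on core SYCL's common reference semantics, under which copies of a `sycl::buffer` share one underlying object. The plain C++ model below (no SYCL dependency) assumes that recording a buffer into a graph amounts to the graph holding one such copy; `mock_buffer` and `recording_graph` are hypothetical names used only for illustration, not part of the extension or its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for sycl::buffer: copies share one storage object,
// modelling common reference semantics with std::shared_ptr.
class mock_buffer {
public:
  explicit mock_buffer(std::size_t count)
      : storage_(std::make_shared<std::vector<int>>(count)) {}

  std::vector<int> &data() const { return *storage_; }

  // Non-owning observer, used only to check when the storage is released.
  std::weak_ptr<std::vector<int>> observe() const { return storage_; }

private:
  std::shared_ptr<std::vector<int>> storage_;
};

// Hypothetical graph that keeps a copy of every buffer recorded into it;
// in this model, that copy is what extends the buffer's lifetime.
class recording_graph {
public:
  void record(const mock_buffer &buf) { recorded_.push_back(buf); }
  const mock_buffer &recorded(std::size_t i) const { return recorded_.at(i); }

private:
  std::vector<mock_buffer> recorded_;
};

// Returns true if the storage survives the user's buffer object while the
// graph is alive, and is released once the graph is destroyed.
bool lifetime_extended_by_graph() {
  std::weak_ptr<std::vector<int>> watch;
  {
    recording_graph graph;
    {
      mock_buffer buf(4);
      buf.data()[0] = 42;
      watch = buf.observe();
      graph.record(buf);
    } // the user's buffer object goes out of scope here
    if (watch.expired() || graph.recorded(0).data()[0] != 42) {
      return false;
    }
  } // the graph, holding the last copy of the buffer, is destroyed here
  return watch.expired();
}
```

The model covers only the buffer object itself. Host memory passed to a buffer constructor remains owned by the user, so sharing the buffer does not keep that host allocation alive.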
@@ -1512,7 +1464,7 @@ associated with a buffer that was created using a host data pointer will
 outlive any executable graphs created from a modifiable graph which uses
 that buffer.
 
-=== Host Tasks [[future-host-tasks]]
+==== Host Tasks [[future-host-tasks]]
 
 A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
 dependency rules. It is valid to record a host task as part of a graph, though it
@@ -1533,8 +1485,6 @@ auto node = graph.add([&](sycl::handler& cgh){
 });
 ----
 
-=== Graph Update
-
 ==== Executable Graph Update
 
 A graph in the executable state can have each node's inputs and outputs updated
@@ -1612,6 +1562,65 @@ Exceptions:
 
 |===
 
+=== Features Still in Development
+
+==== Memory Allocation Nodes
+
+There is no provided interface for users to define a USM allocation/free
+operation belonging to the scope of the graph. It would be error-prone and
+non-performant to allocate or free memory as a node executed during graph
+submission. Instead, such a memory allocation API needs to provide a way to
+return a pointer which won't be valid until the allocation is made on graph
+finalization, as allocating at finalization is the only way to benefit from
+the known graph scope for optimal memory allocation, and even optimize to
+eliminate some allocations entirely.
+
+Such a deferred allocation strategy presents challenges, however, and as a
+result we instead recommend that, prior to graph construction, users perform core
+SYCL USM allocations to be used in the graph submission. Before coming to this
+recommendation, we considered the following explicit graph-building interfaces
+for adding a memory allocation owned by the graph:
+
+1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
+which will be instantiated on graph finalization with the location of the
+allocated USM memory.
+
+2. Allocation function returning a handle to the allocation. Applications use
+the handle in node command-group functions to access memory when allocated.
+
+3. Allocation function returning a pointer to a virtual allocation, only backed
+with an actual allocation when the graph is finalized or submitted.
+
+Design 1) has the drawback of forcing users to keep the user pointer variable
+alive so that the reference is valid, which is unintuitive and is likely to
+result in bugs.
+
+Design 2) introduces a handle object, which has the advantage of being a less
+error-prone way to provide the pointer to the deferred allocation. However, it
+requires kernel changes and adds overhead compared to the raw pointers that
+are the advantage of USM.
+
+Design 3) needs specific backend support for deferred allocation.
+
+==== Device Specific Graph
+
+A modifiable state `command_graph` contains nodes targeting specific devices,
+rather than being a device-agnostic representation only tied to devices on
+finalization. This allows the implementation to process nodes which require
+device information when the command group function is evaluated. For example,
+a SYCL reduction implementation may need the work-group/sub-group size, which
+is normally gathered by the runtime from the device associated with the queue.
+
+This design also enables the future capability for a user to compose a graph
+with nodes targeting different devices, allowing the benefits of defining an
+execution graph ahead of submission to be extended to multi-device platforms.
+Without this capability, a user currently has to submit individual single-device
+graphs and use events for dependencies, which is a usage model this extension is
+aiming to optimize. Automatic load balancing of commands across devices is not a
+problem this extension currently aims to solve; it is the responsibility of the
+user, not the SYCL runtime, to decide the device each command will be processed
+for.
+
 == Issues
 
 === Simultaneous Graph Submission
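Design 2 from the Memory Allocation Nodes text can be made concrete with a small plain C++ model (no SYCL dependency). The names `deferred_alloc_graph`, `alloc_handle`, `add_malloc` and `finalize_allocations` are hypothetical and do not belong to the extension, and host `new[]` stands in for a USM allocation made at finalization. The sketch shows the trade-off the text describes: the handle can be captured safely by value, but every node has to resolve it at execution time instead of using a raw pointer directly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle to an allocation that only exists after finalization
// (Design 2). The slot is shared, so copies of the handle stay valid even if
// the variable that first held it goes out of scope (unlike Design 1's void*&).
class alloc_handle {
public:
  void *get() const {
    assert(*slot_ != nullptr && "allocation not made yet: graph not finalized");
    return *slot_;
  }

private:
  friend class deferred_alloc_graph;
  explicit alloc_handle(std::shared_ptr<void *> slot) : slot_(std::move(slot)) {}
  std::shared_ptr<void *> slot_;
};

// Hypothetical graph that records allocation requests and nodes, and only
// backs the requests with memory when it is finalized.
class deferred_alloc_graph {
public:
  alloc_handle add_malloc(std::size_t bytes) {
    auto slot = std::make_shared<void *>(nullptr);
    requests_.push_back({bytes, slot});
    return alloc_handle(slot);
  }

  void add(std::function<void()> node) { nodes_.push_back(std::move(node)); }

  // Every request is known at this point, so a real implementation could pool
  // or elide allocations; this model simply allocates each one.
  void finalize_allocations() {
    for (auto &req : requests_) {
      storage_.push_back(std::make_unique<unsigned char[]>(req.bytes));
      *req.slot = storage_.back().get();
    }
  }

  void run() const {
    for (const auto &node : nodes_) node();
  }

private:
  struct request {
    std::size_t bytes;
    std::shared_ptr<void *> slot;
  };
  std::vector<request> requests_;
  std::vector<std::function<void()>> nodes_;
  std::vector<std::unique_ptr<unsigned char[]>> storage_;
};

constexpr int demo_count = 4;

// Builds a two-node graph over one deferred allocation and returns the sum
// computed by the second node.
int run_deferred_alloc_demo() {
  int result = 0;
  deferred_alloc_graph graph;
  alloc_handle h = graph.add_malloc(demo_count * sizeof(int));

  // Each node resolves the handle when it runs, not when it is recorded.
  graph.add([h] {
    int *p = static_cast<int *>(h.get());
    for (int i = 0; i < demo_count; ++i) p[i] = i * i;
  });
  graph.add([h, &result] {
    const int *p = static_cast<const int *>(h.get());
    for (int i = 0; i < demo_count; ++i) result += p[i];
  });

  graph.finalize_allocations();
  graph.run();
  return result;
}
```

Design 1 is not modelled because its failure mode, a `void*&` bound to a variable that has already gone out of scope, is undefined behaviour. The approach the text currently recommends sidesteps both designs: allocate USM memory (for example with `sycl::malloc_device`) before building the graph and capture the raw pointer in the command groups.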
