@@ -1393,64 +1393,16 @@ submitted in its entirety for execution via

== Future Direction [[future-direction]]

- === Memory Allocation Nodes
+ This section contains both features of the specification which have been
+ fully developed, but are not yet implemented, as well as features which are
+ still in development.

- There is no provided interface for users to define a USM allocation/free
- operation belonging to the scope of the graph. It would be error prone and
- non-performant to allocate or free memory as a node executed during graph
- submission. Instead, such a memory allocation API needs to provide a way to
- return a pointer which won't be valid until the allocation is made on graph
- finalization, as allocating at finalization is the only way to benefit from
- the known graph scope for optimal memory allocation, and even optimize to
- eliminate some allocations entirely.
-
- Such a deferred allocation strategy presents challenges however, and as a result
- we recommend instead that prior to graph construction users perform core SYCL
- USM allocations to be used in the graph submission. Before to coming to this
- recommendation we considered the following explicit graph building interfaces
- for adding a memory allocation owned by the graph:
+ Fully developed features will be moved to the main specification once they
+ have been implemented.

- 1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
- which will be instantiated on graph finalization with the location of the
- allocated USM memory.
-
- 2. Allocation function returning a handle to the allocation. Applications use
- the handle in node command-group functions to access memory when allocated.
-
- 3. Allocation function returning a pointer to a virtual allocation, only backed
- with an actual allocation when graph is finalized or submitted.
+ === Features Awaiting Implementation

- Design 1) has the drawback of forcing users to keep the user pointer variable
- alive so that the reference is valid, which is unintuitive and is likely to
- result in bugs.
-
- Design 2) introduces a handle object which has the advantages of being a less
- error prone way to provide the pointer to the deferred allocation. However, it
- requires kernel changes and introduces an overhead above the raw pointers that
- are the advantage of USM.
-
- Design 3) needs specific backend support for deferred allocation.
-
- === Device Specific Graph
-
- A modifiable state `command_graph` contains nodes targeting specific devices,
- rather than being a device agnostic representation only tied to devices on
- finalization. This allows the implementation to process nodes which require
- device information when the command group function is evaluated. For example,
- a SYCL reduction implementation may desire the work-group/sub-group size, which
- is normally gathered by the runtime from the device associated with the queue.
-
- This design also enables the future capability for a user to compose a graph
- with nodes targeting different devices, allowing the benefits of defining an
- execution graph ahead of submission to be extended to multi-device platforms.
- Without this capability a user currently has to submit individual single-device
- graphs and use events for dependencies, which is a usage model this extension is
- aiming to optimize. Automatic load balancing of commands across devices is not a
- problem this extension currently aims to solve, it is the responsibility of the
- user to decide the device each command will be processed for, not the SYCL
- runtime.
-
- === Storage Lifetimes [[storage-lifetimes]]
+ ==== Storage Lifetimes [[storage-lifetimes]]

The lifetime of any buffer recorded as part of a submission
to a command graph will be extended in keeping with the common reference
@@ -1512,7 +1464,7 @@ associated with a buffer that was created using a host data pointer will
outlive any executable graphs created from a modifiable graph which uses
that buffer.

- === Host Tasks [[future-host-tasks]]
+ ==== Host Tasks [[future-host-tasks]]

A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
dependency rules. It is valid to record a host task as part of a graph, though it
@@ -1533,8 +1485,6 @@ auto node = graph.add([&](sycl::handler& cgh){
});
----

- === Graph Update
-
==== Executable Graph Update

A graph in the executable state can have each node's inputs and outputs updated
@@ -1612,6 +1562,65 @@ Exceptions:

|===

+ === Features Still in Development
+
+ ==== Memory Allocation Nodes
+
+ There is no provided interface for users to define a USM allocation/free
+ operation belonging to the scope of the graph. It would be error-prone and
+ non-performant to allocate or free memory as a node executed during graph
+ submission. Instead, such a memory allocation API needs to provide a way to
+ return a pointer which won't be valid until the allocation is made on graph
+ finalization, as allocating at finalization is the only way to benefit from
+ the known graph scope for optimal memory allocation, and even optimize to
+ eliminate some allocations entirely.
+
+ Such a deferred allocation strategy presents challenges, however, and as a
+ result we recommend instead that users perform core SYCL USM allocations prior
+ to graph construction for use in the graph submission. Before coming to this
+ recommendation we considered the following explicit graph building interfaces
+ for adding a memory allocation owned by the graph:
+
+ 1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
+ which will be instantiated on graph finalization with the location of the
+ allocated USM memory.
+
+ 2. Allocation function returning a handle to the allocation. Applications use
+ the handle in node command-group functions to access memory when allocated.
+
+ 3. Allocation function returning a pointer to a virtual allocation, only backed
+ with an actual allocation when the graph is finalized or submitted.
+
+ Design 1) has the drawback of forcing users to keep the user pointer variable
+ alive so that the reference is valid, which is unintuitive and is likely to
+ result in bugs.
+
+ Design 2) introduces a handle object which has the advantage of being a less
+ error-prone way to provide the pointer to the deferred allocation. However, it
+ requires kernel changes and adds overhead compared to the raw pointers that
+ are the advantage of USM.
+
+ Design 3) needs specific backend support for deferred allocation.
+
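As an illustration only, the handle-based idea of Design 2) can be modelled in plain standard C++, without any SYCL types. Everything in this sketch (`toy_graph`, `alloc_handle`, `add_allocation`, `resolve`) is a hypothetical name invented for the example and is not part of this extension or of SYCL; host memory stands in for USM. It shows only the lifetime rule discussed above: a request is recorded during graph construction, and the handle resolves to usable memory only after finalization.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle type: an index into the graph's allocation table.
struct alloc_handle {
    std::size_t index;
};

// Toy stand-in for a graph that owns deferred allocations (not a SYCL class).
class toy_graph {
public:
    // Records the request only; no memory is allocated at this point.
    alloc_handle add_allocation(std::size_t bytes) {
        requests_.push_back(bytes);
        return alloc_handle{requests_.size() - 1};
    }

    // All allocations are made here, once the full graph scope is known.
    void finalize() {
        storage_.clear();
        for (std::size_t bytes : requests_) {
            storage_.push_back(std::make_unique<unsigned char[]>(bytes));
        }
        finalized_ = true;
    }

    // Returns nullptr until the graph has been finalized.
    void* resolve(alloc_handle h) const {
        if (!finalized_ || h.index >= storage_.size()) {
            return nullptr;
        }
        return storage_[h.index].get();
    }

private:
    std::vector<std::size_t> requests_;
    std::vector<std::unique_ptr<unsigned char[]>> storage_;
    bool finalized_ = false;
};
```

The extra `resolve` step on every access is the kind of indirection that Design 2) trades against raw USM pointers; a real design would also need the handle to be usable inside kernels, which this host-only model does not attempt.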
+ ==== Device Specific Graph
+
+ A `command_graph` in the modifiable state contains nodes targeting specific
+ devices, rather than being a device-agnostic representation only tied to
+ devices on finalization. This allows the implementation to process nodes which
+ require device information when the command group function is evaluated. For
+ example, a SYCL reduction implementation may need the work-group/sub-group
+ size, which is normally gathered by the runtime from the device associated with
+ the queue.
+
+ This design also enables the future capability for a user to compose a graph
+ with nodes targeting different devices, allowing the benefits of defining an
+ execution graph ahead of submission to be extended to multi-device platforms.
+ Without this capability a user currently has to submit individual single-device
+ graphs and use events for dependencies, which is a usage model this extension is
+ aiming to optimize. Automatic load balancing of commands across devices is not a
+ problem this extension currently aims to solve; it is the responsibility of the
+ user, not the SYCL runtime, to decide the device each command will be processed
+ for.
+
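The point about needing device information while the command group function is evaluated can be sketched with a small host-only model in standard C++. The names here (`toy_device`, `toy_device_graph`, `work_group_size`) are assumptions made up for this example and do not correspond to SYCL or extension APIs. The graph is bound to a device at construction, so the function recorded for a node can query that device immediately, before any finalization step.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical device description; a real device exposes far more properties.
struct toy_device {
    std::size_t max_work_group_size;
};

// Toy graph tied to one device from construction (not a SYCL class).
class toy_device_graph {
public:
    explicit toy_device_graph(toy_device dev) : dev_(dev) {}

    // The command-group function runs while the node is being added and may
    // query the device; its result is stored with the node.
    std::size_t add(const std::function<std::size_t(const toy_device&)>& cgf) {
        chosen_sizes_.push_back(cgf(dev_));
        return chosen_sizes_.size() - 1;
    }

    // Work-group size chosen for a node when it was recorded.
    std::size_t work_group_size(std::size_t node) const {
        return chosen_sizes_.at(node);
    }

private:
    toy_device dev_;
    std::vector<std::size_t> chosen_sizes_;
};
```

In a device-agnostic design the same query could not be answered until finalization, which is the limitation the paragraph above describes.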
== Issues

=== Simultaneous Graph Submission