[SYCL][DOC] Queue Shortcuts (#1051)

jbrodman · bader · commit 478b7c004787 · 2020-01-28T13:54:56.000+03:00
Add queue simplification functions.

Signed-off-by: James Brodman &lt;james.brodman@intel.com&gt;
diff --git a/sycl/doc/extensions/QueueShortcuts/QueueShortcuts.adoc b/sycl/doc/extensions/QueueShortcuts/QueueShortcuts.adoc
@@ -0,0 +1,21 @@
+= SYCL Proposals: Queue Shortcuts
+James Brodman <james.brodman@intel.com>
+v0.1
+:source-highlighter: pygments
+:icons: font
+== Introduction
+This document presents an addition proposed for a future version of the SYCL Specification.  The goal of this proposal is to reduce the complexity and verbosity of using SYCL for programmers.
+
+== Queue Simplifications
+Tasks are submitted to queues for execution in SYCL.  This is normally done by invoking the `submit` method of a `queue` and passing a lambda that specifies the operation to perform and its dependences.  A task's dependences have traditionally been specified through the creation of `accessor` objects that tell the SYCL runtime how data in a `buffer` or `image` is used.  However, new proposals for data management in SYCL, such as Unified Shared Memory, provide alternatives to the buffer and accessor model.  The USM proposal specifies dependences between kernels using `event` objects.  The Queue Properties proposal specifies how to create a `queue` that has in-order semantics where each operation is performed after the previous operation has finished.
+
+It makes sense, in a SYCL with those proposals, to provide programmers with shortcuts to eliminate unnecessary extra code.  When using in-order queues, for example, the lambda passed to `submit` does nothing except invoke `parallel_for` or `single_task`.  This proposal adds the kernel specification methods directly to the `queue` class.  These additional methods have two flavors.  The first handles the "empty lambda" case.  The second handles the USM dependence case by adding methods that take an `event` or vector of `events` that specify the dependences that must be satisfied before the kernel executes.  Both flavors could be implemented in a header file by specifying the `submit` lambda for the programmer.
+
+Note: These simplifications do not depend on queue order properties.  They apply both for in-order and out-of-order queues.
+
+.Queue Shortcuts
+[source,cpp]
+----
+include::queue.hpp[]
+----
+
diff --git a/sycl/doc/extensions/QueueShortcuts/queue.hpp b/sycl/doc/extensions/QueueShortcuts/queue.hpp
@@ -0,0 +1,50 @@
+class queue {
+public:
+  ...
+  template <typename KernelName, typename KernelType>
+  event single_task(KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType>
+  event single_task(event DepEvent, KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType>
+  event single_task(const vector_class<event> &DepEvents,
+                    KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(range<Dims> NumWorkItems, KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(range<Dims> NumWorkItems, event DepEvent,
+                     KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(range<Dims> NumWorkItems,
+                     const vector_class<event> &DepEvents,
+                     KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(range<Dims> NumWorkItems, id<Dims> WorkItemOffset,
+                     KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(range<Dims> NumWorkItems, id<Dims> WorkItemOffset,
+                     event DepEvent, KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(range<Dims> NumWorkItems, id<Dims> WorkItemOffset,
+                     const vector_class<event> &DepEvents,
+                     KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(nd_range<Dims> ExecutionRange, KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(nd_range<Dims> ExecutionRange, event DepEvent,
+                     KernelType KernelFunc);
+
+  template <typename KernelName, typename KernelType, int Dims>
+  event parallel_for(nd_range<Dims> ExecutionRange,
+                     const vector_class<event> &DepEvents,
+                     KernelType KernelFunc);
+};