[ESIMD] Update documentation to reflect recent ESIMD API changes. (#3727)

DenisBakhvalov · web-flow · commit 1e0bd1ed685f · 2021-05-13T15:19:47.000-07:00
* [ESIMD] Updated documentation to reflect recent changes.
diff --git a/sycl/doc/extensions/ExplicitSIMD/README.md b/sycl/doc/extensions/ExplicitSIMD/README.md
@@ -36,21 +36,23 @@ third:
 
   auto e = q.submit([&](handler &cgh) {
     cgh.parallel_for<class Test>(Range, [=](nd_item<1> ndi) SYCL_ESIMD_KERNEL {
-      using namespace sycl::INTEL::gpu;
+      using namespace sycl::ext::intel::experimental::esimd;
 
       int i = ndi.get_global_id(0);
-      simd<float, VL> va = block_load<float, VL>(A + i * VL);
-      simd<float, VL> vb = block_load<float, VL>(B + i * VL);
+      simd<float, VL> va;
+      va.copy_from(A + i * VL);
+      simd<float, VL> vb;
+      vb.copy_from(B + i * VL);
       simd<float, VL> vc = va + vb;
-      block_store<float, VL>(C + i * VL, vc);
+      vc.copy_to(C + i * VL);
     });
   });
 ```
 
 In this example the lambda function passed to the `parallel_for` is marked with
 a special attribute - `SYCL_ESIMD_KERNEL`. This tells the compiler that this
 kernel is a ESIMD one and ESIMD APIs can be used inside it. Here the `simd`
-objects and `block_load`/`block_store` intrinsics are used which are avaiable
+objects and `copy_from`/`copy_to` member functions are used, which are available
 only in the ESIMD extension.
 Full runnable code sample can be found on the
 [github repo](https://github.com/intel/llvm-test-suite/blob/intel/SYCL/ESIMD/vadd_usm.cpp).
diff --git a/sycl/doc/extensions/ExplicitSIMD/dpcpp-explicit-simd.md b/sycl/doc/extensions/ExplicitSIMD/dpcpp-explicit-simd.md
@@ -34,6 +34,8 @@ Explicit SIMD APIs can be used only in code to be executed on Intel Gen
 architecture devices and the host device for now. Attempt to run such code on
 other devices will result in error. 
 
+All the ESIMD APIs are defined in the `sycl::ext::intel::experimental::esimd` namespace.
+
 Kernels and `SYCL_EXTERNAL` functions using ESP must be explicitly marked with
 the `[[intel::sycl_explicit_simd]]` attribute. Subgroup size query within such
 functions will always return `1`.
@@ -56,17 +58,18 @@ private:
 
 *Lambda kernel and function*
 ```cpp
+using namespace sycl::ext::intel::experimental::esimd;
 SYCL_EXTERNAL
-void sycl_device_f(sycl::global_ptr<int> ptr, sycl::intel::gpu::simd<float, 8> X) [[intel::sycl_explicit_simd]] {
-  sycl::intel::gpu::flat_block_write(*ptr.get(), X);
+void sycl_device_f(sycl::global_ptr<int> ptr, simd<float, 8> X) [[intel::sycl_explicit_simd]] {
+  flat_block_write(*ptr.get(), X);
 }
 ...
   Q.submit([&](sycl::handler &Cgh) {
     auto Acc1 = Buf1.get_access<sycl::access::mode::read>(Cgh);
     auto Acc2 = Buf2.get_access<sycl::access::mode::read_write>(Cgh);
 
     Cgh.single_task<class KernelID>([=] () [[intel::sycl_explicit_simd]] {
-      sycl::intel::gpu::simd<float, 8> Val = sycl::intel::gpu::flat_block_read(Acc1.get_pointer());
+      simd<float, 8> Val = flat_block_read(Acc1.get_pointer());
       sycl_device_f(Acc2, Val);
     });
   });
@@ -87,7 +90,7 @@ device-side API
 - 2D and 3D accessors
 - Constant accessors
 - `sycl::accessor::get_pointer()`. All memory accesses through an accessor are
-done via explicit APIs; e.g. `sycl::intel::gpu::block_store(acc, offset)`
+done via explicit APIs; e.g. `sycl::ext::intel::experimental::esimd::block_store(acc, offset)`
 - Few others (to be documented)
 
 ## Core Explicit SIMD programming APIs
@@ -98,15 +101,15 @@ efficient mapping to SIMD vector operations on Intel GPU architectures.
 
 ### SIMD vector class
 
-The `sycl::intel::gpu::simd` class is a vector templated on some element type.
+The `simd` class is a vector templated on some element type.
 The element type must be vectorizable type. The set of vectorizable types is the
 set of fundamental SYCL arithmetic types (C++ arithmetic types or `half` type)
 excluding `bool`. The length of the vector is the second template parameter.
 
 ESIMD compiler back-end does the best it can to map each `simd` class object to a consecutive block
 of registers in the general register file (GRF).
 
-Every specialization of ```sycl::intel::gpu::simd``` shall be a complete type. The term
+Every specialization of the `simd` class shall be a complete type. The term
 simd type refers to all supported specialization of the simd class template.
 To access the i-th individual data element in a simd vector, Explicit SIMD supports the
 standard subscript operator ```[]```, which returns by value.
@@ -215,7 +218,7 @@ To model predicated move, Explicit SIMD provides the following merge functions:
 ```
 ### `simd_view` class
 
-The ```sycl::intel::gpu::simd_view``` represents a "window" into existing simd object,
+The `simd_view` class represents a "window" into an existing simd object,
 through which a part of the original object can be read or modified. This is a
 syntactic convenience feature to reduce verbosity when accessing sub-regions of
 simd objects. **RegionTy** describes the window shape and can be 1D or 2D,
@@ -231,8 +234,8 @@ different shapes and dimensions as illustrated below (`auto` resolves to a
 <img src="images/simd_view.svg" title="1D select example" width="800" height="300"/>
 </p>
 
-```sycl::intel::gpu::simd_view``` class supports all the element-wise operations and
-other utility functions defined for ```sycl::intel::gpu::simd``` class. It also
+`simd_view` class supports all the element-wise operations and
+other utility functions defined for `simd` class. It also
 provides region accessors and more generic operations tailored for 2D regions,
 such as row/column operators and 2D select/replicate/format/merge operations.
 
@@ -479,12 +482,12 @@ int main(void) {
   auto e = q.submit([&](handler &cgh) {
     cgh.parallel_for<class Test>(
       Range, [=](nd_item<1> i) [[intel::sycl_explicit_simd]] {
-
+      using namespace sycl::ext::intel::experimental::esimd;
       auto offset = i.get_global_id(0) * VL;
-      sycl::intel::gpu<float, VL> va = sycl::intel::gpu::flat_block_load<float, VL>(A + offset);
-      sycl::intel::gpu<float, VL> vb = sycl::intel::gpu::flat_block_load<float, VL>(B + offset);
-      sycl::intel::gpu<float, VL> vc = va + vb;
-      sycl::intel::gpu::flat_block_store<float, VL>(C + offset, vc);
+      simd<float, VL> va = flat_block_load<float, VL>(A + offset);
+      simd<float, VL> vb = flat_block_load<float, VL>(B + offset);
+      simd<float, VL> vc = va + vb;
+      flat_block_store<float, VL>(C + offset, vc);
     });
   });
   e.wait();
@@ -501,9 +504,9 @@ int main(void) {
 
 - Design interoperability with SPMD context - e.g. invocation of ESIMD functions
   from a standard SYCL code
-- Generate sycl::intel::gpu API documentation from sources
+- Generate `sycl::ext::intel::experimental::esimd` API documentation from sources
 - Section covering 2D use cases
-- A bridge from `std::simd` to `sycl::intel::gpu::simd`
+- A bridge from `std::simd` to `sycl::ext::intel::experimental::esimd::simd`
 - Describe `simd_view` class restrictions
 - Support OpenCL and L0 interop for ESIMD kernels
 - Consider auto-inclusion of sycl_explicit_simd.hpp under -fsycl-explicit-simd option
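
Reviewer note on the README hunk above: the updated sample replaces the free functions `block_load`/`block_store` with the `copy_from`/`copy_to` member functions of `simd`. The load, add, store pattern it documents can be sketched on the host in plain C++. Everything in the `mock_esimd` namespace below is a hypothetical stand-in written for illustration. It is not the real `sycl::ext::intel::experimental::esimd::simd` class, it only models the three operations the sample uses, and `vadd_chunk` is an assumed helper name.

```cpp
// Host-only illustration of the load/add/store pattern from the README sample.
// ASSUMPTION: mock_esimd::simd is a hypothetical stand-in, NOT the real ESIMD class.
#include <array>
#include <cstddef>

namespace mock_esimd { // hypothetical namespace, not part of DPC++

template <typename T, std::size_t N>
class simd {
public:
  // Load N contiguous elements from memory into the vector.
  void copy_from(const T *src) {
    for (std::size_t i = 0; i < N; ++i)
      data_[i] = src[i];
  }

  // Store the N elements of the vector to contiguous memory.
  void copy_to(T *dst) const {
    for (std::size_t i = 0; i < N; ++i)
      dst[i] = data_[i];
  }

  // Element-wise addition, mirroring `va + vb` in the sample.
  friend simd operator+(const simd &a, const simd &b) {
    simd r;
    for (std::size_t i = 0; i < N; ++i)
      r.data_[i] = a.data_[i] + b.data_[i];
    return r;
  }

private:
  std::array<T, N> data_{};
};

} // namespace mock_esimd

// Mirrors the kernel body of the updated sample for one work-item index `i`
// (hypothetical helper; the real sample runs this inside `parallel_for`).
template <std::size_t VL>
void vadd_chunk(const float *A, const float *B, float *C, std::size_t i) {
  using namespace mock_esimd;
  simd<float, VL> va;
  va.copy_from(A + i * VL);
  simd<float, VL> vb;
  vb.copy_from(B + i * VL);
  simd<float, VL> vc = va + vb;
  vc.copy_to(C + i * VL);
}
```

Calling `vadd_chunk<4>(A, B, C, 1)` processes elements 4 through 7 of the arrays and leaves the rest of `C` untouched, which is how each work-item in the sample handles its own `VL`-wide slice.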