Commit a7877f0

Update SIMD documentation (#20391)
* Update SIMD documentation with more info on what options exist, which flags to pass, what preprocessors are available, and reword the docs to be more up-to-date with current times.
* Sphinx formatting
* formatting
* Address review
1 parent 4847f4d commit a7877f0

2 files changed: +88 -19 lines changed

site/make.bat

Lines changed: 10 additions & 2 deletions
@@ -55,8 +55,16 @@ if errorlevel 9009 (
5555
echo.to the full path of the 'sphinx-build' executable. Alternatively you
5656
echo.may add the Sphinx directory to PATH.
5757
echo.
58-
echo.If you don't have Sphinx installed, grab it from
59-
echo.http://sphinx-doc.org/
58+
echo.If you don't have Sphinx installed, run
59+
echo.
60+
echo. pip install -U --user sphinx==2.4.4
61+
echo. pip install -U --user jinja2==3.0.1
62+
echo.
63+
echo.to install it, or grab it manually from http://sphinx-doc.org/
64+
echo.N.b. the second command to downgrade jinja2 is needed due to
65+
echo.a Jinja2 bug. See
66+
echo.https://github.com/sphinx-doc/sphinx/issues/10291
67+
echo.https://github.com/emscripten-core/emscripten/issues/20390
6068
exit /b 1
6169
)
6270

site/source/docs/porting/simd.rst

Lines changed: 78 additions & 17 deletions
@@ -3,27 +3,80 @@
33
.. role:: raw-html(raw)
44
:format: html
55

6-
=======================================
7-
Porting SIMD code targeting WebAssembly
8-
=======================================
6+
===========================
7+
Using SIMD with WebAssembly
8+
===========================
9+
10+
Emscripten supports the `WebAssembly SIMD <https://github.com/webassembly/simd/>`_ feature. There are five different ways to leverage WebAssembly SIMD in your C/C++ programs:
11+
12+
1. Enable LLVM/Clang SIMD autovectorizer to automatically target WebAssembly SIMD, without requiring changes to C/C++ source code.
13+
2. Write SIMD code using the GCC/Clang SIMD Vector Extensions (``__attribute__((vector_size(16)))``)
14+
3. Write SIMD code using the WebAssembly SIMD intrinsics (``#include <wasm_simd128.h>``)
15+
4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 or 128-bit subset of the AVX intrinsics (``#include <*mmintrin.h>``)
16+
5. Compile existing SIMD code that uses the ARM NEON intrinsics (``#include <arm_neon.h>``)
17+
18+
These techniques can be freely combined in a single program.
19+
20+
To enable any of the five types of SIMD above, pass the WebAssembly-specific ``-msimd128`` flag at compile time. This will also turn on LLVM's autovectorization passes. If that is not desirable, additionally pass flags ``-fno-vectorize -fno-slp-vectorize`` to disable the autovectorizer. See `Auto-Vectorization in LLVM <https://llvm.org/docs/Vectorizers.html>`_ for more information.
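As a rough illustration of the autovectorization path, the loop below is ordinary scalar C with no SIMD-specific source changes. The function name ``saxpy`` and the file name in the comments are hypothetical, and whether LLVM actually vectorizes a given loop depends on the optimization level and on the loop itself, so treat this as a sketch rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: computes y[i] = a * x[i] + y[i].
 *
 * Possible build commands (the file name is an assumption):
 *   emcc -O2 -msimd128 saxpy.c                                  (autovectorizer on)
 *   emcc -O2 -msimd128 -fno-vectorize -fno-slp-vectorize saxpy.c (autovectorizer off)
 *
 * "restrict" tells the compiler that x and y do not overlap, which
 * generally makes loops like this easier to vectorize. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
  for (size_t i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
  }
}
```

The same source compiles unchanged without ``-msimd128``; only the generated code differs.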
21+
22+
WebAssembly SIMD is supported by
23+
24+
* Chrome ≥ 91 (May 2021),
25+
26+
* Firefox ≥ 89 (June 2021),
27+
28+
* Safari ≥ 16.4 (March 2023) and
29+
30+
* Node.js ≥ 16.4 (June 2021).
931

10-
Emscripten supports the `WebAssembly SIMD proposal <https://github.com/webassembly/simd/>`_ when using the WebAssembly LLVM backend. To enable SIMD, pass the -msimd128 flag at compile time. This will also turn on LLVM's autovectorization passes, so no source modifications are necessary to benefit from SIMD.
32+
See `WebAssembly Roadmap <https://webassembly.org/roadmap/>`_ for details about other VMs.
1133

12-
At the source level, the GCC/Clang `SIMD Vector Extensions <https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ can be used and will be lowered to WebAssembly SIMD instructions where possible. In addition, there is a portable intrinsics header file that can be used.
34+
An upcoming `Relaxed SIMD proposal <https://github.com/WebAssembly/relaxed-simd/tree/main/proposals/relaxed-simd>`_ will add more SIMD instructions to WebAssembly.
35+
36+
================================
37+
GCC/Clang SIMD Vector Extensions
38+
================================
39+
40+
At the source level, the GCC/Clang `SIMD Vector Extensions <https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ can be used and will be lowered to WebAssembly SIMD instructions where possible.
41+
42+
This lets developers create custom wide vector types via typedefs, use the arithmetic operators (+,-,*,/) on those types, and access individual lanes with the ``vector[i]`` notation. However, the `GCC vector built-in functions <https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html>`_ are not available. Instead, use the WebAssembly SIMD Intrinsics functions below.
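A minimal sketch of the vector extensions follows. The type name ``float4`` and the helper functions are made up for illustration; only the ``vector_size`` attribute, the arithmetic operators and the ``v[i]`` lane access come from the extension itself.

```c
/* Four 32-bit floats packed into one 16-byte vector.
 * The name "float4" is a hypothetical typedef, not a standard type. */
typedef float float4 __attribute__((vector_size(16)));

/* Element-wise a * b + c, written with ordinary arithmetic operators. */
float4 mul_add(float4 a, float4 b, float4 c) {
  return a * b + c;
}

/* Sums the four lanes using the v[i] lane-access notation. */
float sum_lanes(float4 v) {
  return v[0] + v[1] + v[2] + v[3];
}
```

Because this code uses no target-specific intrinsics, it should also build with a native GCC or Clang toolchain; with Emscripten it needs ``-msimd128`` to be lowered to WebAssembly SIMD.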
43+
44+
===========================
45+
WebAssembly SIMD Intrinsics
46+
===========================
47+
48+
LLVM maintains a WebAssembly SIMD Intrinsics header file that is provided with Emscripten, and adds type definitions for the different supported vector types.
1349

1450
.. code-block:: cpp
1551
1652
#include <wasm_simd128.h>
53+
#include <stdio.h>
54+
55+
int main() {
56+
#ifdef __wasm_simd128__
57+
v128_t v1 = wasm_f32x4_make(1.2f, 3.4f, 5.6f, 7.8f);
58+
v128_t v2 = wasm_f32x4_make(2.1f, 4.3f, 6.5f, 8.7f);
59+
v128_t v3 = wasm_f32x4_add(v1, v2);
60+
// Prints "v3: [3.3, 7.7, 12.1, 16.5]"
61+
printf("v3: [%.1f, %.1f, %.1f, %.1f]\n",
62+
wasm_f32x4_extract_lane(v3, 0),
63+
wasm_f32x4_extract_lane(v3, 1),
64+
wasm_f32x4_extract_lane(v3, 2),
65+
wasm_f32x4_extract_lane(v3, 3));
66+
#endif
67+
}
1768
18-
Separate documentation for the intrinsics header is a work in progress, but its usage is straightforward and its source can be found at `wasm_simd128.h <https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/wasm_simd128.h>`_. These intrinsics are under active development in parallel with the SIMD proposal and should not be considered any more stable than the proposal itself. Note that most engines will also require an extra flag to enable SIMD. For example, Node requires `--experimental-wasm-simd`.
69+
The Wasm SIMD header can be browsed online at `wasm_simd128.h <https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/wasm_simd128.h>`_.
1970

20-
WebAssembly SIMD is not supported when using the Fastcomp backend.
71+
Pass flag ``-msimd128`` at compile time to enable targeting WebAssembly SIMD Intrinsics. C/C++ code can use the built-in preprocessor define ``#ifdef __wasm_simd128__`` to detect when building with WebAssembly SIMD enabled.
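One way to use the ``__wasm_simd128__`` define is to keep a scalar fallback next to the SIMD path, so the same file builds with or without ``-msimd128``. The sketch below assumes a hypothetical helper named ``add_arrays``; the intrinsics it calls (``wasm_v128_load``, ``wasm_v128_store``, ``wasm_f32x4_add``) are declared in ``wasm_simd128.h``.

```c
#include <stddef.h>

#ifdef __wasm_simd128__
#include <wasm_simd128.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n) {
  size_t i = 0;
#ifdef __wasm_simd128__
  /* SIMD path: four floats per iteration. */
  for (; i + 4 <= n; i += 4) {
    v128_t va = wasm_v128_load(a + i);
    v128_t vb = wasm_v128_load(b + i);
    wasm_v128_store(out + i, wasm_f32x4_add(va, vb));
  }
#endif
  /* Scalar path: handles the remaining elements, or all of them
   * when the file is built without -msimd128. */
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

Starting the scalar loop at ``i`` rather than zero lets it double as the tail handler for lengths that are not a multiple of four.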
72+
73+
Pass ``-mrelaxed-simd`` to target WebAssembly Relaxed SIMD Intrinsics. C/C++ code can use the built-in preprocessor define ``#ifdef __wasm_relaxed_simd__`` to detect when this target is active.
2174

2275
======================================
2376
Limitations and behavioral differences
2477
======================================
2578

26-
When porting native SIMD code, it should be noted that because of portability concerns, the WebAssembly SIMD specification does not expose the full native instruction sets. In particular the following changes exist:
79+
When porting native SIMD code, it should be noted that because of portability concerns, the WebAssembly SIMD specification does not expose access to all of the native x86/ARM SIMD instructions. In particular the following changes exist:
2780

2881
- Emscripten does not support x86 or any other native inline SIMD assembly or building .s assembly files, so all code should be written to use SIMD intrinsic functions or compiler vector extensions.
2982

@@ -39,14 +92,14 @@ SIMD-related bug reports are tracked in the `Emscripten bug tracker with the lab
3992
Optimization considerations
4093
===========================
4194

42-
When porting SIMD code to use WebAssembly SIMD, implementors should be aware of semantic differences between the host hardware and WebAssembly semantics; as acknowledged in the WebAssembly design documentation, "`this sometimes will lead to poor performance <https://github.com/WebAssembly/design/blob/master/Portability.md#assumptions-for-efficient-execution>`_." The following list outlines some WebAssembly SIMD instructions to look out for when performance tuning:
95+
When developing SIMD code to use WebAssembly SIMD, implementors should be aware of semantic differences between the host hardware and WebAssembly semantics; as acknowledged in the WebAssembly design documentation, "`this sometimes will lead to poor performance <https://github.com/WebAssembly/design/blob/master/Portability.md#assumptions-for-efficient-execution>`_." The following list outlines some WebAssembly SIMD instructions to look out for when performance tuning:
4396

4497
.. list-table:: WebAssembly SIMD instructions with performance implications
4598
:widths: 10 10 30
4699
:header-rows: 1
47100

48101
* - WebAssembly SIMD instruction
49-
- Hardware architecture
102+
- Arch
50103
- Considerations
51104

52105
* - [i8x16|i16x8|i32x4|i64x2].[shl|shr_s|shr_u]
@@ -86,15 +139,23 @@ When porting SIMD code to use WebAssembly SIMD, implementors should be aware of
86139
- Included for orthogonality, these instructions have no equivalent x86 instruction and are `emulated with 10 x86 instructions in v8 <https://github.com/v8/v8/blob/b6520eda5eafc3b007a5641b37136dfc9d92f63d/src/compiler/backend/x64/code-generator-x64.cc#L2834-L2858>`_.
87140

88141

89-
=====================================================
90-
Compiling SIMD code targeting x86 SSE instruction set
91-
=====================================================
142+
=======================================================
143+
Compiling SIMD code targeting x86 SSE* instruction sets
144+
=======================================================
145+
146+
Emscripten supports compiling existing codebases that use x86 SSE instructions by passing the ``-msimd128`` flag, and additionally one of the following:
92147

93-
Emscripten supports compiling existing codebases that use x86 SSE by passing the `-msse` directive to the compiler, and including the header `<xmmintrin.h>`.
148+
* **SSE**: pass ``-msse`` and ``#include <xmmintrin.h>``. Use ``#ifdef __SSE__`` to gate code.
149+
* **SSE2**: pass ``-msse2`` and ``#include <emmintrin.h>``. Use ``#ifdef __SSE2__`` to gate code.
150+
* **SSE3**: pass ``-msse3`` and ``#include <pmmintrin.h>``. Use ``#ifdef __SSE3__`` to gate code.
151+
* **SSSE3**: pass ``-mssse3`` and ``#include <tmmintrin.h>``. Use ``#ifdef __SSSE3__`` to gate code.
152+
* **SSE4.1**: pass ``-msse4.1`` and ``#include <smmintrin.h>``. Use ``#ifdef __SSE4_1__`` to gate code.
153+
* **SSE4.2**: pass ``-msse4.2`` and ``#include <nmmintrin.h>``. Use ``#ifdef __SSE4_2__`` to gate code.
154+
* **AVX**: pass ``-mavx`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX__`` to gate code.
94155

95-
Currently only the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and 128-bit AVX instruction sets are supported.
156+
Currently only the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and 128-bit AVX instruction sets are supported. Each of these instruction sets builds on the previous ones, so when targeting SSE3, for example, the SSE1 and SSE2 instruction sets are also available.
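The gating pattern from the list above might look like the following for SSE1. The helper name ``scale4`` and the file name in the comment are hypothetical; the intrinsics used are standard SSE1 functions from ``xmmintrin.h``.

```c
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Hypothetical helper: multiplies four floats in place by "factor".
 *
 * Possible Emscripten build command (the file name is an assumption):
 *   emcc -O2 -msimd128 -msse scale.c */
void scale4(float *v, float factor) {
#ifdef __SSE__
  /* SSE path: unaligned load, multiply by a broadcast factor, store. */
  __m128 x = _mm_loadu_ps(v);
  x = _mm_mul_ps(x, _mm_set1_ps(factor));
  _mm_storeu_ps(v, x);
#else
  /* Fallback when SSE is not enabled for this build. */
  for (int i = 0; i < 4; ++i) {
    v[i] *= factor;
  }
#endif
}
```

Code that needs a later instruction set would follow the same shape with the matching flag, header and define, for example ``-msse4.1``, ``<smmintrin.h>`` and ``__SSE4_1__``.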
96157

97-
The following table highlights the availability and expected performance of different SSE1 intrinsics. Even if you are directly targeting the native Wasm SIMD opcodes via wasm_simd128.h header, this table can be useful for understanding the performance limitations that the Wasm SIMD specification has when running on x86 hardware.
158+
The following tables highlight the availability and expected performance of different SSE* intrinsics. This can be useful for understanding the performance limitations that the Wasm SIMD specification has when running on x86 hardware.
98159

99160
For detailed information on each SSE intrinsic function, visit the excellent `Intel Intrinsics Guide on SSE1 <https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE>`_.
100161

@@ -110,7 +171,7 @@ The following legend is used to highlight the expected performance of various in
110171

111172
Certain intrinsics in the table below are marked "virtual". This means that there does not actually exist a native x86 SSE instruction set opcode to implement them, but native compilers offer the function as a convenience. Different compilers might generate a different instruction sequence for these.
112173

113-
In addition to consulting the tables below, you can turn on diagnostics for slow, emulated functions by defining the macro `WASM_SIMD_COMPAT_SLOW`. This will print out warnings if you attempt to use any of the slow paths (corresponding to ❌ or 💣 in the legend).
174+
In addition to consulting the tables below, you can turn on diagnostics for slow, emulated functions by defining the macro ``#define WASM_SIMD_COMPAT_SLOW``. This will print out warnings if you attempt to use any of the slow paths (corresponding to ❌ or 💣 in the legend).
114175

115176
.. list-table:: x86 SSE intrinsics available via #include <xmmintrin.h> and -msse
116177
:widths: 20 30
