Add 256-bit AVX support (#22430)
Currently only the 128-bit subset of the AVX intrinsics is supported; this
patch adds the 256-bit AVX intrinsics.
Since WebAssembly only supports a fixed 128-bit vector length, each 256-bit
AVX intrinsic is emulated by two 128-bit intrinsics.
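
The emulation idea described above can be illustrated with a minimal sketch. This is not the code from this patch: the type `emu_m256` and the `emu_*` functions are hypothetical names chosen for illustration, and the 128-bit vector is modeled with the GCC/Clang vector extension rather than the Wasm SIMD intrinsics. It only shows the general shape, assuming a 256-bit value is stored as a low and a high 128-bit half and each 256-bit operation is applied to both halves.

```c
#include <string.h>

/* 128-bit vector of four floats, using the GCC/Clang vector extension. */
typedef float v128f __attribute__((vector_size(16)));

/* Hypothetical 256-bit type for illustration: two 128-bit halves. */
typedef struct {
  v128f lo; /* lanes 0..3 */
  v128f hi; /* lanes 4..7 */
} emu_m256;

/* Unaligned load of eight floats: one 128-bit load per half. */
static inline emu_m256 emu_loadu_ps(const float *p) {
  emu_m256 r;
  memcpy(&r.lo, p, sizeof r.lo);
  memcpy(&r.hi, p + 4, sizeof r.hi);
  return r;
}

/* Unaligned store of eight floats: one 128-bit store per half. */
static inline void emu_storeu_ps(float *p, emu_m256 v) {
  memcpy(p, &v.lo, sizeof v.lo);
  memcpy(p + 4, &v.hi, sizeof v.hi);
}

/* One 256-bit add expressed as two independent 128-bit adds. */
static inline emu_m256 emu_add_ps(emu_m256 a, emu_m256 b) {
  emu_m256 r;
  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi;
  return r;
}
```

Lane-wise operations such as the add split cleanly in this way. Operations that move data across the 128-bit boundary need extra work between the two halves, which this sketch does not cover.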
jiepan-intel authored Sep 25, 2024
1 parent 52cc139 commit 77e24ae
Showing 6 changed files with 3,557 additions and 214 deletions.
3 changes: 3 additions & 0 deletions ChangeLog.md
@@ -20,6 +20,9 @@ See docs/process.md for more on how version tagging works.

3.1.68 (in development)
-----------------------
+- Added support for compiling 256-bit wide AVX intrinsics, emulated on top
+of 128-bit Wasm SIMD instruction set. (#22430). Pass `-msimd128 -mavx` to
+enable targeting AVX.
- Pthread-based programs no longer generates `.worker.js` file. This file was
made redundant back in 3.1.58 and now is completely removed. (#22598)
- The freetype port was updated from v2.6 to v2.13.3. (#22585)
6 changes: 3 additions & 3 deletions site/source/docs/porting/simd.rst
@@ -12,7 +12,7 @@ Emscripten supports the `WebAssembly SIMD <https://github.com/webassembly/simd/>
1. Enable LLVM/Clang SIMD autovectorizer to automatically target WebAssembly SIMD, without requiring changes to C/C++ source code.
2. Write SIMD code using the GCC/Clang SIMD Vector Extensions (``__attribute__((vector_size(16)))``)
3. Write SIMD code using the WebAssembly SIMD intrinsics (``#include <wasm_simd128.h>``)
-4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 or 128-bit subset of the AVX intrinsics (``#include <*mmintrin.h>``)
+4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 or AVX intrinsics (``#include <*mmintrin.h>``)
5. Compile existing SIMD code that uses the ARM NEON intrinsics (``#include <arm_neon.h>``)

These techniques can be freely combined in a single program.
@@ -153,7 +153,7 @@ Emscripten supports compiling existing codebases that use x86 SSE instructions b
* **SSE4.2**: pass ``-msse4.2`` and ``#include <nmmintrin.h>``. Use ``#ifdef __SSE4_2__`` to gate code.
* **AVX**: pass ``-mavx`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX__`` to gate code.

-Currently only the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and 128-bit AVX instruction sets are supported. Each of these instruction sets add on top of the previous ones, so e.g. when targeting SSE3, the instruction sets SSE1 and SSE2 are also available.
+Currently only the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AVX instruction sets are supported. Each of these instruction sets add on top of the previous ones, so e.g. when targeting SSE3, the instruction sets SSE1 and SSE2 are also available.

The following tables highlight the availability and expected performance of different SSE* intrinsics. This can be useful for understanding the performance limitations that the Wasm SIMD specification has when running on x86 hardware.

@@ -1136,7 +1136,7 @@ The following table highlights the availability and expected performance of diff
* - _mm_testz_ps
- 💣 emulated with complex SIMD+scalar sequence

-Only the 128-bit wide instructions from AVX instruction set are available. 256-bit wide AVX instructions are not provided.
+Only the 128-bit wide instructions from AVX instruction set are listed. The 256-bit wide AVX instructions are emulated by two 128-bit wide instructions.


======================================================