Skip to content

[Clang] [docs] [MSVC] Add sections on __forceinline and intrinsic behaviour differences between Clang and MSVC #99426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 20, 2024

Conversation

MaxEW707
Copy link
Contributor

We have had quite a few issues created around how Clang treats intrinsics vs how MSVC treats intrinsics.

While I was writing this I also added some sections on behaviour changes that caught me while porting my MSVC codebase to clang-cl.

Hopefully we can point issues around intrinsics to this doc and hopefully it is useful to others who run into similar behaviour differences.
The behaviour differences highlighted here are differences, as far as I am aware, that we do not intend to change or fix for MSVC.

@MaxEW707 MaxEW707 requested review from nico and rnk July 18, 2024 04:01
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jul 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 18, 2024

@llvm/pr-subscribers-clang

Author: Max Winkler (MaxEW707)

Changes

We have had quite a few issues created around how Clang treats intrinsics vs how MSVC treats intrinsics.

While I was writing this I also added some sections on behaviour changes that caught me while porting my MSVC codebase to clang-cl.

Hopefully we can point issues around intrinsics to this doc and hopefully it is useful to others who run into similar behaviour differences.
The behaviour differences highlighted here are differences, as far as I am aware, that we do not intend to change or fix for MSVC.


Full diff: https://github.com/llvm/llvm-project/pull/99426.diff

1 Files Affected:

  • (modified) clang/docs/MSVCCompatibility.rst (+130)
diff --git a/clang/docs/MSVCCompatibility.rst b/clang/docs/MSVCCompatibility.rst
index b2486052abf9a..cb5d89c6fd64f 100644
--- a/clang/docs/MSVCCompatibility.rst
+++ b/clang/docs/MSVCCompatibility.rst
@@ -154,3 +154,133 @@ a hint suggesting how to fix the problem.
 As of this writing, Clang is able to compile a simple ATL hello world
 application.  There are still issues parsing WRL headers for modern Windows 8
 apps, but they should be addressed soon.
+
+__forceinline behavior
+======================
+
+``__forceinline`` behaves like ``[[clang::always_inline]]``.
+Inlining is always attempted regardless of optimization level.
+
+This differs from MSVC where ``__forceinline`` is only respected once inline expansion is enabled
+which allows any function marked implicitly or explicitly ``inline`` or ``__forceinline`` to be expanded.
+Therefore functions marked ``__forceinline`` will be expanded when the optimization level is ``/Od`` unlike
+MSVC where ``__forceinline`` will not be expanded under ``/Od``.
+
+SIMD and instruction set intrinsic behavior
+===========================================
+
+Clang follows the GCC model for intrinsics and not the MSVC model.
+There are currently no plans to support the MSVC model.
+
+MSVC intrinsics always emit the machine instruction the intrinsic models regardless of the compile time options specified.
+For example ``__popcnt`` always emits the x86 popcnt instruction even if the compiler does not have the option enabled to emit popcnt on its own volition.
+
+There are two common cases where code that compiles with MSVC will need reworking to build on clang.
+Assume the examples are only built with `-msse2` so we do not have the intrinsics at compile time.
+
+.. code-block:: c++
+
+  unsigned PopCnt(unsigned v) {
+    if (HavePopCnt)
+      return __popcnt(v);
+    else
+      return GenericPopCnt(v);
+  }
+
+.. code-block:: c++
+
+  __m128 dot4_sse3(__m128 v0, __m128 v1) {
+    __m128 r = _mm_mul_ps(v0, v1);
+    r = _mm_hadd_ps(r, r);
+    r = _mm_hadd_ps(r, r);
+    return r;
+  }
+
+Clang expects that either you have compile time support for the target features, `-msse3` and `-mpopcnt`, you mark the function with the expected target feature or use runtime detection with an indirect call.
+
+.. code-block:: c++
+
+  __attribute__((__target__("sse3"))) __m128 dot4_sse3(__m128 v0, __m128 v1) {
+    __m128 r = _mm_mul_ps(v0, v1);
+    r = _mm_hadd_ps(r, r);
+    r = _mm_hadd_ps(r, r);
+    return r;
+  }
+
+The SSE3 dot product can be easily fixed by either building the translation unit with SSE3 support or using `__target__` to compile that specific function with SSE3 support.
+
+.. code-block:: c++
+
+  unsigned PopCnt(unsigned v) {
+    if (HavePopCnt)
+      return __popcnt(v);
+    else
+      return GenericPopCnt(v);
+  }
+
+The above ``PopCnt`` example must be changed to work with clang. If we mark the function with `__target__("popcnt")` then the compiler is free to emit popcnt at will which we do not want. While this isn't a concern in our small example it is a concern in larger functions with surrounding code around the intrinsics. Similar reasoning for compiling the translation unit with `-mpopcnt`.
+We must split each branch into its own function that can be called indirectly instead of using the intrinsic directly.
+
+.. code-block:: c++
+
+  __attribute__((__target__("popcnt"))) unsigned hwPopCnt(unsigned v) { return __popcnt(v); }
+  unsigned (*PopCnt)(unsigned) = HavePopCnt ? hwPopCnt : GenericPopCnt;
+
+.. code-block:: c++
+
+  __attribute__((__target__("popcnt"))) unsigned hwPopCnt(unsigned v) { return __popcnt(v); }
+  unsigned PopCnt(unsigned v) {
+    if (HavePopCnt)
+      return hwPopCnt(v);
+    else
+      return GenericPopCnt(v);
+  }
+
+In the above example ``hwPopCnt`` will not be inlined into ``PopCnt`` since ``PopCnt`` doesn't have the popcnt target feature.
+With a larger function that does real work the function call overhead is negligible. However in our popcnt example there is the function call
+overhead. There is no analog for this specific MSVC behavior in clang.
+
+For clang we effectively have to create the dispatch function ourselves to each specfic implementation.
+
+SIMD vector types
+=================
+
+Clang's simd vector types are builtin types and not user defined types as in MSVC. This does have some observable behavior changes.
+We will look at the x86 `__m128` type for the examples below but the statements apply to all vector types imcluding ARM's `float32x4_t`.
+
+There are no members that can be accessed on the vector types. Vector types are not structs in clang.
+You cannot use ``__m128.m128_f32[0]`` to access the first element of the `__m128`.
+This also means struct initialization like ``__m128{ { 0.0f, 0.0f, 0.0f, 0.0f } }`` will not compile with clang.
+
+Since vector types are builtin types, clang implements operators on them natively.
+
+.. code-block:: c++
+
+  #ifdef _MSC_VER
+  __m128 operator+(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
+  #endif
+
+The above code will fail to compile since overloaded 'operator+' must have at least one parameter of class or enumeration type.
+You will need to fix such code to have the check ``#if defined(_MSC_VER) && !defined(__clang__)``.
+
+Since `__m128` is not a class type in clang any overloads after a template definition will not be considered.
+
+.. code-block:: c++
+
+  template<class T>
+  void foo(T) {}
+
+  template<class T>
+  void bar(T t) {
+    foo(t);
+  }
+
+  void foo(__m128) {}
+
+  int main() {
+    bar(_mm_setzero_ps());
+  }
+
+With MSVC ``foo(__m128)`` will be selected but with clang ``foo<__m128>()`` will be selected since on clang `__m128` is a builtin type.
+
+In general the takeaway is `__m128` is a builtin type on clang while a class type on MSVC.

Therefore functions marked ``__forceinline`` will be expanded when the optimization level is ``/Od`` unlike
MSVC where ``__forceinline`` will not be expanded under ``/Od``.

SIMD and instruction set intrinsic behavior
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up. This is a major point of incompatibility for users, and it's not something we can easily address in Clang or LLVM. LLVM tracks subtarget features on the function, and in Clang, vector intrinsics are "raised" into vector math, losing track of the specific instructions the user wants to select.

@@ -154,3 +154,133 @@ a hint suggesting how to fix the problem.
As of this writing, Clang is able to compile a simple ATL hello world
application. There are still issues parsing WRL headers for modern Windows 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"modern Windows 8" is obviously stale :) , but not necessarily relevant to this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya its on my todo list to clean this up and to add more details around ABI and/or name mangling support.

@MaxEW707 MaxEW707 merged commit 4a739fb into llvm:main Jul 20, 2024
8 checks passed
@MaxEW707 MaxEW707 deleted the mwinkler/msvc-compat-docs-update branch July 20, 2024 01:04
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 20, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux running on sanitizer-buildbot2 while building clang at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/1861

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[365/371] Generating Fuzzer-x86_64-Test
[366/371] Generating MSAN_INST_GTEST.gtest-all.cc.x86_64.o
[367/371] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o
[368/371] Generating Msan-x86_64-with-call-Test
[369/371] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o
[370/371] Generating Msan-x86_64-Test
[370/371] Running compiler_rt regression tests
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 10048 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: SanitizerCommon-lsan-x86_64-Linux :: Linux/soft_rss_limit_mb_test.cpp (7240 of 10048)
******************** TEST 'SanitizerCommon-lsan-x86_64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/sanitizer-x86_64-linux/build/build_default/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=leak  -m64 -funwind-tables  -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O2 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ /b/sanitizer-x86_64-linux/build/build_default/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -funwind-tables -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O2 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
RUN: at line 5: env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
+ env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:68:24: error: CHECK_MAY_RETURN_1: expected string not found in input
// CHECK_MAY_RETURN_1: allocating 512 times
                       ^
<stdin>:52:44: note: scanning from here
Some of the malloc calls returned non-null: 256
                                           ^
<stdin>:52:45: note: possible intended match here
Some of the malloc calls returned non-null: 256
                                            ^

Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           47:  [256] 
           48:  [320] 
           49:  [384] 
           50:  [448] 
           51: Some of the malloc calls returned null: 256 
           52: Some of the malloc calls returned non-null: 256 
check:68'0                                                X~~~~ error: no match found
check:68'1                                                 ?    possible intended match
Step 14 (test compiler-rt default) failure: test compiler-rt default (failure)
...
[365/371] Generating Fuzzer-x86_64-Test
[366/371] Generating MSAN_INST_GTEST.gtest-all.cc.x86_64.o
[367/371] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o
[368/371] Generating Msan-x86_64-with-call-Test
[369/371] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o
[370/371] Generating Msan-x86_64-Test
[370/371] Running compiler_rt regression tests
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 10048 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: SanitizerCommon-lsan-x86_64-Linux :: Linux/soft_rss_limit_mb_test.cpp (7240 of 10048)
******************** TEST 'SanitizerCommon-lsan-x86_64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/sanitizer-x86_64-linux/build/build_default/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=leak  -m64 -funwind-tables  -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O2 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ /b/sanitizer-x86_64-linux/build/build_default/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -funwind-tables -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test -ldl -O2 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
RUN: at line 5: env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
+ env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /b/sanitizer-x86_64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:68:24: error: CHECK_MAY_RETURN_1: expected string not found in input
// CHECK_MAY_RETURN_1: allocating 512 times
                       ^
<stdin>:52:44: note: scanning from here
Some of the malloc calls returned non-null: 256
                                           ^
<stdin>:52:45: note: possible intended match here
Some of the malloc calls returned non-null: 256
                                            ^

Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           47:  [256] 
           48:  [320] 
           49:  [384] 
           50:  [448] 
           51: Some of the malloc calls returned null: 256 
           52: Some of the malloc calls returned non-null: 256 
check:68'0                                                X~~~~ error: no match found
check:68'1                                                 ?    possible intended match

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…ehaviour differences between Clang and MSVC (#99426)

Summary:
We have had quite a few issues created around how Clang treats
intrinsics vs how MSVC treats intrinsics.

While I was writing this I also added some sections on behaviour changes
that caught me while porting my MSVC codebase to clang-cl.

Hopefully we can point issues around intrinsics to this doc and
hopefully it is useful to others who run into similar behaviour
differences.
The behaviour differences highlighted here are differences, as far as I
am aware, that we do not intend to change or fix for MSVC.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251262
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants