[Clang] [docs] [MSVC] Add sections on __forceinline and intrinsic behaviour differences between Clang and MSVC (#99426)
We have had quite a few issues created around how Clang treats
intrinsics vs how MSVC treats intrinsics.
While I was writing this I also added some sections on behaviour changes
that caught me while porting my MSVC codebase to clang-cl.
Hopefully we can point issues around intrinsics to this doc and
hopefully it is useful to others who run into similar behaviour
differences.
As far as I am aware, the behaviour differences highlighted here are ones
that we do not intend to change or fix to match MSVC.
clang/docs/MSVCCompatibility.rst (130 additions, 0 deletions)
@@ -154,3 +154,133 @@ a hint suggesting how to fix the problem.
As of this writing, Clang is able to compile a simple ATL hello world
application. There are still issues parsing WRL headers for modern Windows 8
apps, but they should be addressed soon.

__forceinline behavior
======================

``__forceinline`` behaves like ``[[clang::always_inline]]``:
inlining is always attempted, regardless of optimization level.

This differs from MSVC, where ``__forceinline`` is only respected once inline
expansion is enabled. Enabling inline expansion allows any function marked
implicitly or explicitly ``inline`` or ``__forceinline`` to be expanded.
Therefore, with Clang, functions marked ``__forceinline`` are expanded even
when the optimization level is ``/Od``, whereas MSVC does not expand
``__forceinline`` functions under ``/Od``.

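As a minimal sketch of the equivalence described above, portable code sometimes hides the spelling behind a macro. The ``FORCE_INLINE`` macro and the ``Square`` and ``SumOfSquares`` functions are hypothetical names used only for illustration; the non-MSVC branch assumes a compiler that accepts the GCC-style ``always_inline`` attribute.

```cpp
// Hypothetical helper macro: pick the force-inline spelling for the compiler.
// clang-cl defines _MSC_VER, so it takes the __forceinline branch and, per the
// section above, treats it like always_inline.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Under Clang this call is expanded at every optimization level, including
// /Od. Under MSVC at /Od it remains an ordinary call.
FORCE_INLINE int Square(int x) { return x * x; }

int SumOfSquares(int a, int b) { return Square(a) + Square(b); }
```

The observable result is the same either way; only code size, debuggability and stepping behaviour in unoptimized builds differ.
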

SIMD and instruction set intrinsic behavior
===========================================

Clang follows the GCC model for intrinsics and not the MSVC model.
There are currently no plans to support the MSVC model.

MSVC intrinsics always emit the machine instruction that the intrinsic models,
regardless of the compile-time options specified.
For example, ``__popcnt`` always emits the x86 ``popcnt`` instruction, even if
the compiler has not been given an option that allows it to emit ``popcnt`` of
its own volition.

There are two common cases where code that compiles with MSVC needs reworking
to build with Clang.
Assume the examples are built only with ``-msse2``, so the intrinsics they use
are not available at compile time.

.. code-block:: c++

  unsigned PopCnt(unsigned v) {
    if (HavePopCnt)
      return __popcnt(v);
    else
      return GenericPopCnt(v);
  }

.. code-block:: c++

  __m128 dot4_sse3(__m128 v0, __m128 v1) {
    __m128 r = _mm_mul_ps(v0, v1);
    r = _mm_hadd_ps(r, r);
    r = _mm_hadd_ps(r, r);
    return r;
  }

Clang expects one of the following: compile-time support for the target
features (``-msse3`` and ``-mpopcnt``), a function marked with the expected
target feature, or runtime detection with an indirect call.

The SSE3 dot product can be easily fixed by either building the translation
unit with SSE3 support or using ``__target__`` to compile that specific
function with SSE3 support.
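
The ``__target__`` option can be sketched as follows, assuming an x86 target and the GCC-style ``__attribute__((__target__("sse3")))`` spelling. The ``Dot4`` wrapper is a hypothetical helper added only to make the sketch easy to call; it is not part of the original example.

```cpp
#include <pmmintrin.h>  // SSE3 intrinsics such as _mm_hadd_ps

// Only this function is compiled with SSE3 enabled. The rest of the
// translation unit keeps the baseline feature set (for example -msse2).
__attribute__((__target__("sse3")))
__m128 dot4_sse3(__m128 v0, __m128 v1) {
  __m128 r = _mm_mul_ps(v0, v1);  // element-wise products
  r = _mm_hadd_ps(r, r);          // pairwise sums
  r = _mm_hadd_ps(r, r);          // full sum in every lane
  return r;
}

// Hypothetical wrapper: loads two 4-float arrays and returns the scalar dot
// product. It uses only baseline SSE intrinsics and reaches the SSE3 code
// through an ordinary call.
float Dot4(const float *a, const float *b) {
  return _mm_cvtss_f32(dot4_sse3(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

The attribute only affects code generation. The caller is still responsible for making sure the CPU supports SSE3 before ``dot4_sse3`` runs.
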

.. code-block:: c++

  unsigned PopCnt(unsigned v) {
    if (HavePopCnt)
      return __popcnt(v);
    else
      return GenericPopCnt(v);
  }

The ``PopCnt`` example above must be changed to work with Clang. If we mark
the function with ``__target__("popcnt")``, the compiler is free to emit
``popcnt`` anywhere in that function, including on paths that run when the CPU
lacks the instruction, which we do not want. While this is not a concern in
our small example, it is a concern in larger functions with more code
surrounding the intrinsics. The same reasoning applies to compiling the whole
translation unit with ``-mpopcnt``.
Instead of using the intrinsic directly, we must split each branch into its
own function that can be called indirectly.
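
One possible shape for that split is sketched below, assuming an x86 target. The names ``HwPopCnt`` and ``PopCntFn`` are hypothetical, and both ``_mm_popcnt_u32`` and the GCC/Clang builtin ``__builtin_cpu_supports`` are substitutions made for this sketch; real code may prefer ``__popcnt`` and its own CPUID-based check.

```cpp
#include <nmmintrin.h>  // declares _mm_popcnt_u32

// Generic branch: built with the baseline feature set, so the compiler cannot
// emit the popcnt instruction here.
static unsigned GenericPopCnt(unsigned v) {
  unsigned count = 0;
  for (; v != 0; v &= v - 1)  // clear the lowest set bit each iteration
    ++count;
  return count;
}

// Hardware branch: the only function compiled with popcnt enabled.
__attribute__((__target__("popcnt")))
static unsigned HwPopCnt(unsigned v) {
  return static_cast<unsigned>(_mm_popcnt_u32(v));
}

using PopCntFn = unsigned (*)(unsigned);

unsigned PopCnt(unsigned v) {
  // Runtime detection happens once; every later call goes through the
  // function pointer, so popcnt is only reachable when the CPU supports it.
  static const PopCntFn impl =
      __builtin_cpu_supports("popcnt") ? HwPopCnt : GenericPopCnt;
  return impl(v);
}
```

Because the intrinsic lives in its own function behind a pointer, the compiler has no opportunity to hoist or reuse ``popcnt`` in code that runs on CPUs without it.
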