Skip to content

Commit cb08558

Browse files
committed
[HIP] Fix regressions due to fp contract change
Recently HIP toolchain made a change to use clang instead of opt/llc to do compilation (https://reviews.llvm.org/D81861). The intention is to make HIP toolchain canonical like other toolchains. However, this change introduced an unintentional change regarding backend fp fuse option, which caused regressions in some HIP applications. Basically before the change, HIP toolchain used clang to generate bitcode, then use opt/llc to optimize bitcode and generate ISA. As such, the amdgpu backend takes the default fp fuse mode which is 'Standard'. This mode respect contract flag of fmul/fadd instructions and do not fuse fmul/fadd instructions without contract flag. However, after the change, HIP toolchain now use clang to generate IR, do optimization, and generate ISA as one process. Now amdgpu backend fp fuse option is determined by -ffp-contract option, which is 'fast' by default. And this -ffp-contract=fast language option is translated to 'Fast' fp fuse option in backend. Suddenly backend starts to fuse fmul/fadd instructions without contract flag. This causes wrong result for some device library functions, e.g. tan(-1e20), which should return 0.8446, now returns -0.933. What is worse is that since backend with 'Fast' fp fuse option does not respect contract flag, there is no way to use #pragma clang fp contract directive to enforce fp contract requirements. This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract and use it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in frontend but let the backend to use 'Standard' fp fuse option. 'fast-honor-pragmas' is useful since 'Fast' fp fuse option in backend does not honor contract flag, it is of little use to HIP applications since all code with #pragma STDC FP_CONTRACT or any IR from a source compiled with -ffp-contract=on is broken. Differential Revision: https://reviews.llvm.org/D90174
1 parent 3823665 commit cb08558

File tree

9 files changed

+325
-27
lines changed

9 files changed

+325
-27
lines changed

clang/docs/LanguageExtensions.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3209,8 +3209,9 @@ statements in C).
32093209
32103210
The pragma can also be used with ``off`` which turns FP contraction off for a
32113211
section of the code. This can be useful when fast contraction is otherwise
3212-
enabled for the translation unit with the ``-ffp-contract=fast`` flag.
3213-
3212+
enabled for the translation unit with the ``-ffp-contract=fast-honor-pragmas`` flag.
3213+
Note that ``-ffp-contract=fast`` will override pragmas to fuse multiply and
3214+
addition across statements regardless of any controlling pragmas.
32143215
32153216
``#pragma clang fp exceptions`` specifies floating point exception behavior. It
32163217
may take one the the values: ``ignore``, ``maytrap`` or ``strict``. Meaning of

clang/docs/UsersManual.rst

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1336,15 +1336,16 @@ are listed below.
13361336
The C standard permits intermediate floating-point results within an
13371337
expression to be computed with more precision than their type would
13381338
normally allow. This permits operation fusing, and Clang takes advantage
1339-
of this by default. This behavior can be controlled with the
1340-
``FP_CONTRACT`` pragma. Please refer to the pragma documentation for a
1341-
description of how the pragma interacts with this option.
1339+
of this by default. This behavior can be controlled with the ``FP_CONTRACT``
1340+
and ``clang fp contract`` pragmas. Please refer to the pragma documentation
1341+
for a description of how the pragmas interact with this option.
13421342

13431343
Valid values are:
13441344

1345-
* ``fast`` (everywhere)
1346-
* ``on`` (according to FP_CONTRACT pragma, default)
1345+
* ``fast`` (fuse across statements disregarding pragmas, default for CUDA)
1346+
* ``on`` (fuse in the same statement unless dictated by pragmas, default for languages other than CUDA/HIP)
13471347
* ``off`` (never fuse)
1348+
* ``fast-honor-pragmas`` (fuse across statements unless dictated by pragmas, default for HIP)
13481349

13491350
.. _opt_fhonor-infinities:
13501351

clang/include/clang/Basic/LangOptions.h

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -187,8 +187,11 @@ class LangOptions : public LangOptionsBase {
187187
// Enable the floating point pragma
188188
FPM_On,
189189

190-
// Aggressively fuse FP ops (E.g. FMA).
191-
FPM_Fast
190+
// Aggressively fuse FP ops (E.g. FMA) disregarding pragmas.
191+
FPM_Fast,
192+
193+
// Aggressively fuse FP ops and honor pragmas.
194+
FPM_FastHonorPragmas
192195
};
193196

194197
/// Alias for RoundingMode::NearestTiesToEven.
@@ -417,7 +420,13 @@ class FPOptions {
417420
}
418421
explicit FPOptions(const LangOptions &LO) {
419422
Value = 0;
420-
setFPContractMode(LO.getDefaultFPContractMode());
423+
// The language fp contract option FPM_FastHonorPragmas has the same effect
424+
// as FPM_Fast in frontend. For simplicity, use FPM_Fast uniformly in
425+
// frontend.
426+
auto LangOptContractMode = LO.getDefaultFPContractMode();
427+
if (LangOptContractMode == LangOptions::FPM_FastHonorPragmas)
428+
LangOptContractMode = LangOptions::FPM_Fast;
429+
setFPContractMode(LangOptContractMode);
421430
setRoundingMode(LO.getFPRoundingMode());
422431
setFPExceptionMode(LO.getFPExceptionMode());
423432
setAllowFPReassociate(LO.AllowFPReassoc);

clang/include/clang/Driver/Options.td

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1276,9 +1276,13 @@ def fno_rounding_math : Flag<["-"], "fno-rounding-math">, Group<f_Group>, Flags<
12761276
def ftrapping_math : Flag<["-"], "ftrapping-math">, Group<f_Group>, Flags<[CC1Option]>;
12771277
def fno_trapping_math : Flag<["-"], "fno-trapping-math">, Group<f_Group>, Flags<[CC1Option]>;
12781278
def ffp_contract : Joined<["-"], "ffp-contract=">, Group<f_Group>,
1279-
Flags<[CC1Option]>, HelpText<"Form fused FP ops (e.g. FMAs): fast (everywhere)"
1280-
" | on (according to FP_CONTRACT pragma) | off (never fuse). Default"
1281-
" is 'fast' for CUDA/HIP and 'on' otherwise.">, Values<"fast,on,off">;
1279+
Flags<[CC1Option]>, HelpText<"Form fused FP ops (e.g. FMAs):"
1280+
" fast (fuses across statements disregarding pragmas)"
1281+
" | on (only fuses in the same statement unless dictated by pragmas)"
1282+
" | off (never fuses)"
1283+
" | fast-honor-pragmas (fuses across statements unless diectated by pragmas)."
1284+
" Default is 'fast' for CUDA, 'fast-honor-pragmas' for HIP, and 'on' otherwise.">,
1285+
Values<"fast,on,off,fast-honor-pragmas">;
12821286

12831287
defm strict_float_cast_overflow : OptOutFFlag<"strict-float-cast-overflow",
12841288
"Assume that overflowing float-to-int casts are undefined (default)",

clang/lib/CodeGen/BackendUtil.cpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -481,6 +481,7 @@ static bool initTargetOptions(DiagnosticsEngine &Diags,
481481
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
482482
break;
483483
case LangOptions::FPM_On:
484+
case LangOptions::FPM_FastHonorPragmas:
484485
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
485486
break;
486487
case LangOptions::FPM_Fast:

clang/lib/Frontend/CompilerInvocation.cpp

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2424,9 +2424,20 @@ void CompilerInvocation::setLangDefaults(LangOptions &Opts, InputKind IK,
24242424

24252425
Opts.HIP = IK.getLanguage() == Language::HIP;
24262426
Opts.CUDA = IK.getLanguage() == Language::CUDA || Opts.HIP;
2427-
if (Opts.CUDA)
2428-
// Set default FP_CONTRACT to FAST.
2427+
if (Opts.HIP) {
2428+
// HIP toolchain does not support 'Fast' FPOpFusion in backends since it
2429+
// fuses multiplication/addition instructions without contract flag from
2430+
// device library functions in LLVM bitcode, which causes accuracy loss in
2431+
// certain math functions, e.g. tan(-1e20) becomes -0.933 instead of 0.8446.
2432+
// For device library functions in bitcode to work, 'Strict' or 'Standard'
2433+
// FPOpFusion options in backends is needed. Therefore 'fast-honor-pragmas'
2434+
// FP contract option is used to allow fuse across statements in frontend
2435+
// whereas respecting contract flag in backend.
2436+
Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
2437+
} else if (Opts.CUDA) {
2438+
// Allow fuse across statements disregarding pragmas.
24292439
Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
2440+
}
24302441

24312442
Opts.RenderScript = IK.getLanguage() == Language::RenderScript;
24322443
if (Opts.RenderScript) {
@@ -3343,6 +3354,8 @@ static void ParseLangArgs(LangOptions &Opts, ArgList &Args, InputKind IK,
33433354
Opts.setDefaultFPContractMode(LangOptions::FPM_On);
33443355
else if (Val == "off")
33453356
Opts.setDefaultFPContractMode(LangOptions::FPM_Off);
3357+
else if (Val == "fast-honor-pragmas")
3358+
Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
33463359
else
33473360
Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
33483361
}

clang/lib/Sema/SemaAttr.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -966,6 +966,8 @@ void Sema::ActOnPragmaFPContract(SourceLocation Loc,
966966
case LangOptions::FPM_Off:
967967
NewFPFeatures.setDisallowFPContract();
968968
break;
969+
case LangOptions::FPM_FastHonorPragmas:
970+
llvm_unreachable("Should not happen");
969971
}
970972
FpPragmaStack.Act(Loc, Sema::PSK_Set, StringRef(), NewFPFeatures);
971973
CurFPFeatures = NewFPFeatures.applyOverrides(getLangOpts());

0 commit comments

Comments
 (0)