Skip to content

[flang] Align -x f95[-cpp-input] language modes with gfortran #127986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

inaki-amatria
Copy link
Member

@inaki-amatria inaki-amatria commented Feb 20, 2025

This PR may solve some of the issues described in: #127617. The reason to revert 81d82ca is detailed in this comment in the associated issue.

EDIT:

The decision to revert 81d82ca was made because it does not provide a reasonable default. A preprocessed file is not necessarily in fixed-form. The root cause of OP's issue is that they did not pass the -ffixed-form flag when compiling the preprocessed code. Flang cannot infer the format solely based on the .i file extension.

Additionally, this commit introduced a regression where Flang incorrectly assumes free-form source files should be treated as fixed-form, even when no flag in the invocation suggests otherwise. This results in errors such as the following:

$ cat foo.f90
program main
end
$ flang -Werror -x f95 foo.f90
error: Could not scan foo.f90
foo.f90:1:1: warning: Character in fixed-form label field must be a digit
  program main
  ^
foo.f90:2:1: warning: Character in fixed-form label field must be a digit
  end
  ^

In this sense, this PR aims to align the behavior of -x f95 and -x f95-cpp-input with gfortran. Specifically:

  • -x f95 should process the file according to its file extension or based on the flags explicitly passed by the user.
  • -x f95-cpp-input should behave like -x f95 but additionally enable the -cpp flag unless explicitly overridden by the user with -nocpp.

@inaki-amatria inaki-amatria self-assigned this Feb 20, 2025
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' flang:driver flang Flang issues not falling into any other category labels Feb 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 20, 2025

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-flang-driver

Author: Iñaki Amatria Barral (inaki-amatria)

Changes

This PR may solve some of the issues described in: #127617.

The reason to revert 81d82ca is detailed in the associated issue.


Full diff: https://github.com/llvm/llvm-project/pull/127986.diff

5 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Flang.cpp (-7)
  • (modified) flang/lib/Frontend/CompilerInvocation.cpp (+6)
  • (added) flang/test/Driver/dash-x-f95.f (+35)
  • (modified) flang/test/Driver/input-from-stdin/input-from-stdin.f90 (+1-1)
  • (removed) flang/test/Driver/pp-fixed-form.f90 (-19)
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..bb308c45eb64b 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -803,13 +803,6 @@ void Flang::ConstructJob(Compilation &C, const JobAction &JA,
 
   addFortranDialectOptions(Args, CmdArgs);
 
-  // 'flang -E' always produces output that is suitable for use as fixed form
-  // Fortran. However it is only valid free form source if the original is also
-  // free form.
-  if (InputType == types::TY_PP_Fortran &&
-      !Args.getLastArg(options::OPT_ffixed_form, options::OPT_ffree_form))
-    CmdArgs.push_back("-ffixed-form");
-
   handleColorDiagnosticsArgs(D, Args, CmdArgs);
 
   // LTO mode is parsed by the Clang driver library.
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index f3d9432c62d3b..1b68dee8b169d 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -859,6 +859,12 @@ static void parsePreprocessorArgs(Fortran::frontend::PreprocessorOptions &opts,
         (currentArg->getOption().matches(clang::driver::options::OPT_cpp))
             ? PPMacrosFlag::Include
             : PPMacrosFlag::Exclude;
+  // Enable -cpp based on -x unless explicitly disabled with -nocpp
+  if (const auto *dashX = args.getLastArg(clang::driver::options::OPT_x);
+      dashX && opts.macrosFlag != PPMacrosFlag::Exclude)
+    opts.macrosFlag = llvm::StringSwitch<PPMacrosFlag>(dashX->getValue())
+                          .Case("f95-cpp-input", PPMacrosFlag::Include)
+                          .Default(opts.macrosFlag);
 
   opts.noReformat = args.hasArg(clang::driver::options::OPT_fno_reformat);
   opts.preprocessIncludeLines =
diff --git a/flang/test/Driver/dash-x-f95.f b/flang/test/Driver/dash-x-f95.f
new file mode 100644
index 0000000000000..ae9a426a9d956
--- /dev/null
+++ b/flang/test/Driver/dash-x-f95.f
@@ -0,0 +1,35 @@
+program main
+  print *, __FILE__, __LINE__
+end
+
+! This test verifies that `flang`'s `-x` options behave like `gfortran`'s.
+! Specifically:
+! - `-x f95` should process the file based on its extension unless overridden.
+! - `-x f95-cpp-input` should behave like `-x f95` but with preprocessing
+!   (`-cpp`) enabled unless overridden.
+
+! ---
+! Ensure the file is treated as fixed-form unless explicitly set otherwise
+! ---
+! RUN: not %flang -Werror -x f95 -cpp -c %s 2>&1 | FileCheck --check-prefix=SCAN-ERROR %s
+! RUN: not %flang -Werror -x f95-cpp-input -c %s 2>&1 | FileCheck --check-prefix=SCAN-ERROR %s
+
+! SCAN-ERROR: error
+
+! RUN: %flang -Werror -x f95 -cpp -ffree-form -c %s 2>&1 | FileCheck --check-prefix=NO-SCAN-ERROR --allow-empty %s
+! RUN: %flang -Werror -x f95-cpp-input -ffree-form -c %s 2>&1 | FileCheck --check-prefix=NO-SCAN-ERROR --allow-empty %s
+
+! NO-SCAN-ERROR-NOT: error
+
+! ---
+! Ensure `-cpp` is not enabled by default unless explicitly requested
+! ---
+! RUN: not %flang -Werror -x f95 -ffree-form -c %s 2>&1 | FileCheck --check-prefix=SEMA-ERROR %s
+! RUN: not %flang -Werror -x f95-cpp-input -nocpp -ffree-form -c %s 2>&1 | FileCheck --check-prefix=SEMA-ERROR %s
+
+! SEMA-ERROR: error
+
+! RUN: %flang -Werror -x f95 -cpp -ffree-form -c %s 2>&1 | FileCheck --check-prefix=NO-SEMA-ERROR --allow-empty %s
+! RUN: %flang -Werror -x f95-cpp-input -ffree-form -c %s 2>&1 | FileCheck --check-prefix=NO-SEMA-ERROR --allow-empty %s
+
+! NO-SEMA-ERROR-NOT: error
diff --git a/flang/test/Driver/input-from-stdin/input-from-stdin.f90 b/flang/test/Driver/input-from-stdin/input-from-stdin.f90
index 285f0751b35d8..1fcc0340a64ba 100644
--- a/flang/test/Driver/input-from-stdin/input-from-stdin.f90
+++ b/flang/test/Driver/input-from-stdin/input-from-stdin.f90
@@ -6,7 +6,7 @@
 ! Input type is implicit
 ! RUN: cat %s | %flang -E -cpp - | FileCheck %s --check-prefix=PP-NOT-DEFINED
 ! RUN: cat %s | %flang -DNEW -E -cpp - | FileCheck %s --check-prefix=PP-DEFINED
-! RUN: cat %s | %flang -DNEW -E - | FileCheck %s --check-prefix=PP-NOT-DEFINED
+! RUN: cat %s | %flang -DNEW -E - | FileCheck %s --check-prefix=PP-DEFINED
 ! RUN: cat %s | %flang -DNEW -E -nocpp - | FileCheck %s --check-prefix=PP-NOT-DEFINED
 
 ! Input type is explicit
diff --git a/flang/test/Driver/pp-fixed-form.f90 b/flang/test/Driver/pp-fixed-form.f90
deleted file mode 100644
index 4695da78763ae..0000000000000
--- a/flang/test/Driver/pp-fixed-form.f90
+++ /dev/null
@@ -1,19 +0,0 @@
-!RUN: %flang -save-temps -### %S/Inputs/free-form-test.f90  2>&1 | FileCheck %s --check-prefix=FREE
-FREE:       "-fc1" {{.*}} "-o" "free-form-test.i" {{.*}} "-x" "f95-cpp-input" "{{.*}}/free-form-test.f90"
-FREE-NEXT:  "-fc1" {{.*}} "-ffixed-form" {{.*}} "-x" "f95" "free-form-test.i"
-
-!RUN: %flang -save-temps -### %S/Inputs/fixed-form-test.f  2>&1 | FileCheck %s --check-prefix=FIXED
-FIXED:      "-fc1" {{.*}} "-o" "fixed-form-test.i" {{.*}} "-x" "f95-cpp-input" "{{.*}}/fixed-form-test.f"
-FIXED-NEXT: "-fc1" {{.*}} "-ffixed-form" {{.*}} "-x" "f95" "fixed-form-test.i"
-
-!RUN: %flang -save-temps -### -ffree-form %S/Inputs/free-form-test.f90  2>&1 | FileCheck %s --check-prefix=FREE-FLAG
-FREE-FLAG:           "-fc1" {{.*}} "-o" "free-form-test.i" {{.*}} "-x" "f95-cpp-input" "{{.*}}/free-form-test.f90"
-FREE-FLAG-NEXT:      "-fc1" {{.*}} "-emit-llvm-bc" "-ffree-form"
-FREE-FLAG-NOT:       "-ffixed-form"
-FREE-FLAG-SAME:      "-x" "f95" "free-form-test.i"
-
-!RUN: %flang -save-temps -### -ffixed-form %S/Inputs/fixed-form-test.f  2>&1 | FileCheck %s --check-prefix=FIXED-FLAG
-FIXED-FLAG:          "-fc1" {{.*}} "-o" "fixed-form-test.i" {{.*}} "-x" "f95-cpp-input" "{{.*}}/fixed-form-test.f"
-FIXED-FLAG-NEXT:     "-fc1" {{.*}} "-emit-llvm-bc" "-ffixed-form"
-FIXED-FLAG-NOT:      "-ffixed-form"
-FIXED-FLAG-SAME:     "-x" "f95" "fixed-form-test.i"

@tarunprabhu
Copy link
Contributor

This looks ok to me, but please wait for others to approve.

@inaki-amatria
Copy link
Member Author

Ping!

@macurtis-amd macurtis-amd requested a review from DavidTruby March 3, 2025 11:10
@macurtis-amd
Copy link
Contributor

I don't feel that I'm qualified to review this. One of the assumptions which led to 81d82cac8c was that preprocessed output would always be in fixed-form (see this comment).

I've added @DavidTruby to the review. Maybe he has an opinion here.

@DavidTruby
Copy link
Member

I think we/I had a misunderstanding of what -x f95-cpp-input means.
The previous patch made the assumption that it meant input that had already been preprocessed. If that were the case, the assumption we're making in the driver (that preprocessed input is in fixed form) would be valid, as flang -E always emits fixed form files. In that sense, Flang can infer the form of the files based on the .i extension; if they were produced by flang -E then they will always be valid fixed form (and may also be valid free form, if the input was free form).

However, I think you're saying that -x f95-cpp-input means files that should be preprocessed? If that's the case then we indeed can't assume that f95-cpp-input is fixed form.

I don't think that this is the only place that we are using f95-cpp-input to mean input that has already been preprocessed though... for example you can see the same in the type switch here:

return llvm::StringSwitch<types::ID>(Ext)

We're using TY_PP_Fortran for the lower case ones as they don't need to be preprocessed, and TY_Fortran for the upper case ones as they do need to be preprocessed. If this is backwards then we have more issues than just what's being discussed here!

I'm not sure exactly what the solution is here. Normally I'd say that we should match gfortran's behaviour, but equally the -x options are in some sense internal to the compiler, and shouldn't generally be being passed by end users I think.
If we want to change this to match gfortran, we will need to go through and fix all of the places where we're treating this the wrong way around, and also change -save-temps to pass -x f77-input to the preprocessed input so that it's always treated as fixed form still.

Comment on lines 863 to 864
if (const auto *dashX = args.getLastArg(clang::driver::options::OPT_x);
dashX && opts.macrosFlag != PPMacrosFlag::Exclude)
Copy link
Contributor

@tarunprabhu tarunprabhu Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this the first time around. Is there precedent in the LLVM project for multiple statements in conditional expressions? Might it be cleaner to just use nested conditionals?

Suggested change
if (const auto *dashX = args.getLastArg(clang::driver::options::OPT_x);
dashX && opts.macrosFlag != PPMacrosFlag::Exclude)
if (opts.macrosFlag != PPMacrosFlag::Exclude)
if (const auto *dashX = args.getLastArg(clang::driver::options::OPT_x)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@inaki-amatria
Copy link
Member Author

I think we/I had a misunderstanding of what -x f95-cpp-input means. The previous patch made the assumption that it meant input that had already been preprocessed. If that were the case, the assumption we're making in the driver (that preprocessed input is in fixed form) would be valid, as flang -E always emits fixed form files. In that sense, Flang can infer the form of the files based on the .i extension; if they were produced by flang -E then they will always be valid fixed form (and may also be valid free form, if the input was free form).

-x f95[-cpp-input] and -x f77[-cpp-input] could mean anything because they are not standard but rather something made up by gfortran. However, if we want to align these options with gfortran, their meaning should be:

  • -x f95 should parse the file according to its file extension (e.g., .f for fixed form and .f90 for free form) and with -cpp disabled unless explicitly passed by the user.
  • -x f95-cpp-input should behave just like -x f95 but additionally enable -cpp, unless explicitly disabled by the user with -nocpp.
  • -x f77 should parse the file as fixed form regardless of the file extension and with -cpp disabled unless explicitly passed by the user.
  • -x f77-cpp-input should behave just like -x f77 but additionally enable -cpp, unless explicitly disabled by the user with -nocpp.

Regarding the assumption that Flang can infer the form of a .i file because flang -E always emits fixed form: I think this is troublesome. Flang cannot reliably determine the source form based on the .i extension alone. For example, if I generate a .i file using -ffixed-line-length-none, relying on the extension alone to infer the file type may lead to compilation failures.

For this reason, I believe assuming Flang can infer the source form from the .i extension alone should be avoided. This is why I reverted the earlier patch; rather than relying on implicit assumptions, the correct approach is to pass the appropriate flags when compiling a .i file.

However, I think you're saying that -x f95-cpp-input means files that should be preprocessed? If that's the case then we indeed can't assume that f95-cpp-input is fixed form.

Exactly!

I don't think that this is the only place that we are using f95-cpp-input to mean input that has already been preprocessed though... for example you can see the same in the type switch here:

return llvm::StringSwitch<types::ID>(Ext)

We're using TY_PP_Fortran for the lower case ones as they don't need to be preprocessed, and TY_Fortran for the upper case ones as they do need to be preprocessed. If this is backwards then we have more issues than just what's being discussed here!

I'm not sure exactly what the solution is here. Normally I'd say that we should match gfortran's behaviour, but equally the -x options are in some sense internal to the compiler, and shouldn't generally be being passed by end users I think. If we want to change this to match gfortran, we will need to go through and fix all of the places where we're treating this the wrong way around, and also change -save-temps to pass -x f77-input to the preprocessed input so that it's always treated as fixed form still.

Good point. I'll revisit the logic and update the PR accordingly to ensure it matches the intended behavior.

Also, thanks for your timely review and for the detailed breakdown of the issue! I appreciate the insights.

@DavidTruby
Copy link
Member

DavidTruby commented Mar 4, 2025

Regarding the assumption that Flang can infer the form of a .i file because flang -E always emits fixed form: I think this is troublesome. Flang cannot reliably determine the source form based on the .i extension alone. For example, if I generate a .i file using -ffixed-line-length-none, relying on the extension alone to infer the file type may lead to compilation failures.

Even if you generate a .i file with -ffixed-line-length-none, the file generated will be valid fixed form fortran with a line length of 72 characters. This will be true whether or not we assume that .i means fixed form. Without the previous patch, we assume it to be free form as that's the default, however that's incorrect behaviour as the file will always be fixed form (and sometimes free form as well) if generated by flang -E.

The only alternative to assuming it is one or the other is to error if neither -ffixed-form or -ffree-form is passed but I'm not sure why that would be necessary. It really will always be fixed form with a line length of 72.

I agree with you and think we should change the flags to match gfortran; I just also think that's unrelated to the assumption we can make that .i files are fixed form. It might be that the way we've implemented it is incorrect, but the assumption itself is sound and should be kept.

@inaki-amatria inaki-amatria force-pushed the fix-dash-x-language-modes branch from a989adb to bb24d1f Compare March 4, 2025 12:58
@inaki-amatria
Copy link
Member Author

Even if you generate a .i file with -ffixed-line-length-none, the file generated will be valid fixed form fortran with a line length of 72 characters. This will be true whether or not we assume that .i means fixed form. Without the previous patch, we assume it to be free form as that's the default, however that's incorrect behaviour as the file will always be fixed form (and sometimes free form as well) if generated by flang -E.

My bad. You are absolutely right. I mistakenly thought that Flang took -ffixed-line-length-none into account when generating preprocessed output.

And don't get me wrong, I see your point. However, I still think relying solely on the .i extension to determine fixed form behavior is problematic. While Flang may always produce output that is valid fixed form Fortran with a 72-character line length, that doesn't mean all .i files should be treated this way. Consider cases where a .i file is generated by a different preprocessor or manually modified. In such cases, assuming fixed form could lead to unexpected failures, like the one I demonstrated in the PR description:

$ cat foo.f90
program main
end
$ flang -Werror -x f95 foo.f90
error: Could not scan foo.f90
foo.f90:1:1: warning: Character in fixed-form label field must be a digit
  program main
  ^
foo.f90:2:1: warning: Character in fixed-form label field must be a digit
  end
  ^

The only alternative to assuming it is one or the other is to error if neither -ffixed-form or -ffree-form is passed but I'm not sure why that would be necessary. It really will always be fixed form with a line length of 72.

I agree.

I agree with you and think we should change the flags to match gfortran; I just also think that's unrelated to the assumption we can make that .i files are fixed form. It might be that the way we've implemented it is incorrect, but the assumption itself is sound and should be kept.

Again, I apologize for the confusion 🙏🏻. I have made the necessary changes and updated the PR accordingly. For now, I have carefully revisited the logic in ToolChains/Flang.cpp and aligned -x f95-cpp-input and -x f95 with gfortran. If you think adding -x f77-cpp-input and -x f77 is worthwhile, I can address that in a separate PR, as those changes may be more substantial.

Thank you for your time!

@DavidTruby
Copy link
Member

Sorry, maybe I'm still getting a bit confused now 😅. I think in

return llvm::StringSwitch<types::ID>(Ext)
we are still getting things the wrong way around, even with this patch? If we want to match gfortran, all the lower case files should have type TY_Fortran and all the upper case ones should have TY_PP_Fortran, I think? Does it not have any actual difference to the observable behaviour to have that the way around?

R.e. .i I think we've explicitly decided in the past that processing files that were preprocessed by a different compiler is not something we need/want to support; Fortran preprocessors are odd, and another compiler's preprocessor may output code that their compiler would accept but that is not standard, regardless of what we do here. Furthermore, the flang preprocessor always runs, there's really no benefit to manually doing a preprocess step yourself even with flang -E. Essentially passing a .i file to flang -fc1 is something that should really only be being done by flang -save-temps, because it doesn't really have a meaning otherwise.

The PR description does appear to show some erroneous behaviour; if flang -Werror -x f95 foo.f90 does assume fixed form then that's incorrect. I thought we were assuming fixed-form based only on whether both .i is the file extension and we were in flang -fc1, neither should be the case with that command. That's definitely a bug if it's the case, but the bug isn't that flang -fc1 wants to assume that .i means fixed form.

Sorry for my overly long comments on this, I'm just trying to share all the context as I understand it 😄. Thanks for your patience!

This change ensures that flang no longer assumes fixed-form when using
`-x f95`. The only case where fixed-form is still assumed is when
`-x f95` is used with `.i` files, unless explicitly overridden by the
user.
This change ensures that specifying `-x f95-cpp-input` automatically
enables `-cpp`, making `flang`'s behavior consistent with `gfortran`.

`flang/test/Driver/input-from-stdin/input-from-stdin.f90` changes
because the driver assumes `-x f95-cpp-input` for `<stdin>`. Therefore,
after this patch, Fortran source that comes from `<stdin>` will now be
preprocessed.
@inaki-amatria inaki-amatria force-pushed the fix-dash-x-language-modes branch from bb24d1f to 6e304d0 Compare March 7, 2025 09:48
@inaki-amatria
Copy link
Member Author

Sorry, maybe I'm still getting a bit confused now 😅. I think in

return llvm::StringSwitch<types::ID>(Ext)

we are still getting things the wrong way around, even with this patch? If we want to match gfortran, all the lower case files should have type TY_Fortran and all the upper case ones should have TY_PP_Fortran, I think? Does it not have any actual difference to the observable behaviour to have that the way around?

R.e. .i I think we've explicitly decided in the past that processing files that were preprocessed by a different compiler is not something we need/want to support; Fortran preprocessors are odd, and another compiler's preprocessor may output code that their compiler would accept but that is not standard, regardless of what we do here. Furthermore, the flang preprocessor always runs, there's really no benefit to manually doing a preprocess step yourself even with flang -E. Essentially passing a .i file to flang -fc1 is something that should really only be being done by flang -save-temps, because it doesn't really have a meaning otherwise.

The PR description does appear to show some erroneous behaviour; if flang -Werror -x f95 foo.f90 does assume fixed form then that's incorrect. I thought we were assuming fixed-form based only on whether both .i is the file extension and we were in flang -fc1, neither should be the case with that command. That's definitely a bug if it's the case, but the bug isn't that flang -fc1 wants to assume that .i means fixed form.

Sorry for my overly long comments on this, I'm just trying to share all the context as I understand it 😄. Thanks for your patience!

Apologies for the delayed response, DavidTruby. We took some time to evaluate how best to approach the problem and ensure clarity. Given the discussion, I will close this PR and open a new one with only the necessary changes and a revised description to keep the focus on the key points.

Thanks for your time and for sharing all this context, it is really helpful! And sorry for any confusion along the way.

@inaki-amatria
Copy link
Member Author

Superseded by #130268.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category flang:driver flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants