[flang] Improve error recovery in tricky situation #95168

klausler · 2024-06-11T20:27:04Z

When the very first statement of the executable part has syntax errors, it's not at all obvious whether the error messages that are reported to the user should be those from its failure to be the last statement of the specification part or its failure to be the first executable statement when both failures are at the same character in the cooked character stream. Fortran makes this problem more exciting by allowing statement function definitions look a lot like several executable statements.

The current error recovery scheme for declaration constructs depends on a look-ahead test to see whether the failed construct is actually the first executable statement. This works fine when the first executable statement is not in error, but should also allow for some error cases that begin with the tokens of an executable statement.

This can obviously still go wrong for declaration constructs that are unparseable and also have ambiguity in their leading tokens with executable statements, but that seems to be a less likely case.

Also improves error recovery for parenthesized items.

When the very first statement of the executable part has syntax errors, it's not at all obvious whether the error messages that are reported to the user should be those from its failure to be the last statement of the specification part or its failure to be the first executable statement when both failures are at the same character in the cooked character stream. Fortran makes this problem more exciting by allowing statement function definitions look a lot like several executable statements. The current error recovery scheme for declaration constructs depends on a look-ahead test to see whether the failed construct is actually the first executable statement. This works fine when the first executable statement is not in error, but should also allow for some error cases that begin with the tokens of an executable statement. This can obviously still go wrong for declaration constructs that are unparseable and also have ambiguity in their leading tokens with executable statements, but that seems to be a less likely case. Also improves error recovery for parenthesized items.

llvmbot · 2024-06-12T18:46:22Z

@llvm/pr-subscribers-flang-parser

Author: Peter Klausler (klausler)

Changes

When the very first statement of the executable part has syntax errors, it's not at all obvious whether the error messages that are reported to the user should be those from its failure to be the last statement of the specification part or its failure to be the first executable statement when both failures are at the same character in the cooked character stream. Fortran makes this problem more exciting by allowing statement function definitions look a lot like several executable statements.

The current error recovery scheme for declaration constructs depends on a look-ahead test to see whether the failed construct is actually the first executable statement. This works fine when the first executable statement is not in error, but should also allow for some error cases that begin with the tokens of an executable statement.

This can obviously still go wrong for declaration constructs that are unparseable and also have ambiguity in their leading tokens with executable statements, but that seems to be a less likely case.

Also improves error recovery for parenthesized items.

Full diff: https://github.com/llvm/llvm-project/pull/95168.diff

5 Files Affected:

(modified) flang/lib/Parser/expr-parsers.cpp (+2-1)
(modified) flang/lib/Parser/program-parsers.cpp (+16-8)
(modified) flang/lib/Parser/token-parsers.h (+27-1)
(added) flang/test/Parser/recovery01.f90 (+10)
(added) flang/test/Parser/recovery02.f90 (+8)

diff --git a/flang/lib/Parser/expr-parsers.cpp b/flang/lib/Parser/expr-parsers.cpp
index a47aae166b575..77a13de7fd02d 100644
--- a/flang/lib/Parser/expr-parsers.cpp
+++ b/flang/lib/Parser/expr-parsers.cpp
@@ -70,7 +70,8 @@ TYPE_PARSER(construct<AcImpliedDoControl>(
 constexpr auto primary{instrumented("primary"_en_US,
     first(construct<Expr>(indirect(Parser<CharLiteralConstantSubstring>{})),
         construct<Expr>(literalConstant),
-        construct<Expr>(construct<Expr::Parentheses>(parenthesized(expr))),
+        construct<Expr>(construct<Expr::Parentheses>("(" >>
+            expr / !","_tok / recovery(")"_tok, SkipPastNested<'(', ')'>{}))),
         construct<Expr>(indirect(functionReference) / !"("_tok / !"%"_tok),
         construct<Expr>(designator / !"("_tok / !"%"_tok),
         construct<Expr>(indirect(Parser<SubstringInquiry>{})), // %LEN or %KIND
diff --git a/flang/lib/Parser/program-parsers.cpp b/flang/lib/Parser/program-parsers.cpp
index 6f25ba4827220..b51b60157f39c 100644
--- a/flang/lib/Parser/program-parsers.cpp
+++ b/flang/lib/Parser/program-parsers.cpp
@@ -86,10 +86,15 @@ TYPE_CONTEXT_PARSER("specification part"_en_US,
 // are in contexts that impose constraints on the kinds of statements that
 // are allowed, and so we have a variant production for declaration-construct
 // that implements those constraints.
-constexpr auto execPartLookAhead{first(actionStmt >> ok, openaccConstruct >> ok,
-    openmpConstruct >> ok, "ASSOCIATE ("_tok, "BLOCK"_tok, "SELECT"_tok,
-    "CHANGE TEAM"_sptok, "CRITICAL"_tok, "DO"_tok, "IF ("_tok, "WHERE ("_tok,
-    "FORALL ("_tok, "!$CUF"_tok)};
+constexpr auto actionStmtLookAhead{first(actionStmt >> ok,
+    // Also accept apparent action statements with errors if they might be
+    // first in the execution part
+    "ALLOCATE ("_tok, "CALL" >> name >> "("_tok, "GO TO"_tok, "OPEN ("_tok,
+    "PRINT"_tok / space / !"("_tok, "READ ("_tok, "WRITE ("_tok)};
+constexpr auto execPartLookAhead{first(actionStmtLookAhead >> ok,
+    openaccConstruct >> ok, openmpConstruct >> ok, "ASSOCIATE ("_tok,
+    "BLOCK"_tok, "SELECT"_tok, "CHANGE TEAM"_sptok, "CRITICAL"_tok, "DO"_tok,
+    "IF ("_tok, "WHERE ("_tok, "FORALL ("_tok, "!$CUF"_tok)};
 constexpr auto declErrorRecovery{
     stmtErrorRecoveryStart >> !execPartLookAhead >> skipStmtErrorRecovery};
 constexpr auto misplacedSpecificationStmt{Parser<UseStmt>{} >>
@@ -446,10 +451,13 @@ TYPE_PARSER(extension<LanguageFeature::CUDA>(
     "<<<" >> construct<CallStmt::Chevrons>(scalarExpr, "," >> scalarExpr,
                  maybe("," >> scalarIntExpr), maybe("," >> scalarIntExpr)) /
         ">>>"))
-TYPE_PARSER(construct<CallStmt>(
-    sourced(construct<CallStmt>("CALL" >> Parser<ProcedureDesignator>{},
-        maybe(Parser<CallStmt::Chevrons>{}),
-        defaulted(parenthesized(optionalList(actualArgSpec)))))))
+constexpr auto actualArgSpecList{optionalList(actualArgSpec)};
+TYPE_CONTEXT_PARSER("CALL statement"_en_US,
+    construct<CallStmt>(
+        sourced(construct<CallStmt>("CALL" >> Parser<ProcedureDesignator>{},
+            maybe(Parser<CallStmt::Chevrons>{}) / space,
+            "(" >> actualArgSpecList / ")" ||
+                lookAhead(endOfStmt) >> defaulted(actualArgSpecList)))))
 
 // R1522 procedure-designator ->
 //         procedure-name | proc-component-ref | data-ref % binding-name
diff --git a/flang/lib/Parser/token-parsers.h b/flang/lib/Parser/token-parsers.h
index 2495017d19649..fe6bc1f69f576 100644
--- a/flang/lib/Parser/token-parsers.h
+++ b/flang/lib/Parser/token-parsers.h
@@ -560,6 +560,8 @@ template <char goal> struct SkipPast {
     while (std::optional<const char *> p{state.GetNextChar()}) {
       if (**p == goal) {
         return {Success{}};
+      } else if (**p == '\n') {
+        break;
       }
     }
     return std::nullopt;
@@ -574,8 +576,32 @@ template <char goal> struct SkipTo {
     while (std::optional<const char *> p{state.PeekAtNextChar()}) {
       if (**p == goal) {
         return {Success{}};
+      } else if (**p == '\n') {
+        break;
+      } else {
+        state.UncheckedAdvance();
+      }
+    }
+    return std::nullopt;
+  }
+};
+
+template <char left, char right> struct SkipPastNested {
+  using resultType = Success;
+  constexpr SkipPastNested() {}
+  constexpr SkipPastNested(const SkipPastNested &) {}
+  static std::optional<Success> Parse(ParseState &state) {
+    int nesting{1};
+    while (std::optional<const char *> p{state.GetNextChar()}) {
+      if (**p == right) {
+        if (!--nesting) {
+          return {Success{}};
+        }
+      } else if (**p == left) {
+        ++nesting;
+      } else if (**p == '\n') {
+        break;
       }
-      state.UncheckedAdvance();
     }
     return std::nullopt;
   }
diff --git a/flang/test/Parser/recovery01.f90 b/flang/test/Parser/recovery01.f90
new file mode 100644
index 0000000000000..674abaccc7c7d
--- /dev/null
+++ b/flang/test/Parser/recovery01.f90
@@ -0,0 +1,10 @@
+! RUN: not %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s
+program main
+    call foo(i, &
+             j, &
+             k, &
+             1$)
+end
+
+!CHECK: error: expected ')'
+!CHECK: in the context: CALL statement
diff --git a/flang/test/Parser/recovery02.f90 b/flang/test/Parser/recovery02.f90
new file mode 100644
index 0000000000000..0d0d15545cf39
--- /dev/null
+++ b/flang/test/Parser/recovery02.f90
@@ -0,0 +1,8 @@
+! RUN: not %flang_fc1 -fsyntax-only %s 2>&1 | FileCheck %s
+continue ! force executable part
+CALL ADD_HASH_BLOCK(d_c,f_c,dimc, &
+  (h2b-1+noab*(h1b-1+noab*(p4b-noab-1+nvab*(p3b-noab-1$)))))
+end
+
+!CHECK: error: expected ')'
+!CHECK: in the context: CALL statement

When the very first statement of the executable part has syntax errors, it's not at all obvious whether the error messages that are reported to the user should be those from its failure to be the last statement of the specification part or its failure to be the first executable statement when both failures are at the same character in the cooked character stream. Fortran makes this problem more exciting by allowing statement function definitions look a lot like several executable statements. The current error recovery scheme for declaration constructs depends on a look-ahead test to see whether the failed construct is actually the first executable statement. This works fine when the first executable statement is not in error, but should also allow for some error cases that begin with the tokens of an executable statement. This can obviously still go wrong for declaration constructs that are unparseable and also have ambiguity in their leading tokens with executable statements, but that seems to be a less likely case. Also improves error recovery for parenthesized items.

klausler requested a review from jeffhammond June 11, 2024 20:27

jeffhammond approved these changes Jun 12, 2024

View reviewed changes

klausler force-pushed the jeff1 branch from 4c95b21 to 5dbb1d1 Compare June 12, 2024 18:45

llvmbot added flang Flang issues not falling into any other category flang:parser labels Jun 12, 2024

klausler merged commit 010c55b into llvm:main Jun 13, 2024
9 checks passed

klausler deleted the jeff1 branch June 13, 2024 17:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[flang] Improve error recovery in tricky situation #95168

[flang] Improve error recovery in tricky situation #95168

Uh oh!

klausler commented Jun 11, 2024

Uh oh!

llvmbot commented Jun 12, 2024

Uh oh!

Uh oh!

Uh oh!

[flang] Improve error recovery in tricky situation #95168

[flang] Improve error recovery in tricky situation #95168

Uh oh!

Conversation

klausler commented Jun 11, 2024

Uh oh!

llvmbot commented Jun 12, 2024

Uh oh!

Uh oh!

Uh oh!