
[ELF] Reject error-prone meta characters in input section description #84130

Member

@MaskRay MaskRay commented Mar 6, 2024

The lexer is overly permissive. When a ) is missing while parsing the
patterns in an input section description, we accept many nonsensical
tokens (e.g. }) as patterns, which leads to confusion, e.g.
*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)
(#81804).

Ideally, the lexer should be stateful to report more errors like GNU ld
and get rid of hacks like ScriptLexer::maybeSplitExpr, but that would
require a large rewrite of the lexer. For now, just reject certain
non-wildcard meta characters to detect common mistakes.

Created using spr 1.3.5-bogner
@llvmbot
Member

llvmbot commented Mar 6, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Fangrui Song (MaskRay)

Changes

Our lexing rule is loose and recognizes certain non-wildcard meta
characters as input file patterns. This can be confusing in certain
cases, e.g.
*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)
(} without a closing )) (#81804).

Ideally, the lexer should be state-aware to report more errors like GNU
ld, but that would require a large rewrite. For now, just report errors
for one of (){} used as an input file pattern.
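
The check described above can be modeled outside of lld with a small, self-contained sketch. This is not the code from the patch: `ScanResult` and `scanPatterns` are hypothetical names, the input is assumed to be an already-lexed token list positioned just after the opening `(`, and the SORT* handling of the real loop is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of scanning one parenthesized pattern list (hypothetical type).
struct ScanResult {
  std::vector<std::string> patterns; // accepted section patterns
  std::string error;                 // empty when no error was reported
};

// Hypothetical stand-in for the pattern loop; not lld's ScriptParser.
ScanResult scanPatterns(const std::vector<std::string> &tokens) {
  ScanResult r;
  for (const std::string &tok : tokens) {
    if (tok == ")" || tok == "EXCLUDE_FILE")
      break; // normal end of this pattern list
    // A lone ( ) { or } here most likely means a ')' is missing earlier.
    if (tok.size() == 1 &&
        std::string("(){}").find(tok[0]) != std::string::npos) {
      r.error = "section pattern is expected";
      break;
    }
    r.patterns.push_back(tok);
  }
  return r;
}
```

With this model, the token list `.a*`, `}`, `)` stops at `}` with the error set, while `.a*`, `.b*`, `)` yields two patterns and no error.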


Full diff: https://github.com/llvm/llvm-project/pull/84130.diff

2 Files Affected:

  • (modified) lld/ELF/ScriptParser.cpp (+12-2)
  • (modified) lld/test/ELF/linkerscript/wildcards.s (+14-7)
diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index f0ede1f43bbdb3..282f95bd04b085 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -717,9 +717,19 @@ SmallVector<SectionPattern, 0> ScriptParser::readInputSectionsList() {
 
     StringMatcher SectionMatcher;
     // Break if the next token is ), EXCLUDE_FILE, or SORT*.
-    while (!errorCount() && peek() != ")" && peek() != "EXCLUDE_FILE" &&
-           peekSortKind() == SortSectionPolicy::Default)
+    while (!errorCount() && peekSortKind() == SortSectionPolicy::Default) {
+      StringRef s = peek();
+      if (s == ")" || s == "EXCLUDE_FILE")
+        break;
+      // Detect common mistakes that certain non-wildcard meta characters used
+      // without a closing ')'.
+      if (s.size() == 1 && strchr("(){}", s[0])) {
+        skip();
+        setError("section pattern is expected");
+        break;
+      }
       SectionMatcher.addPattern(unquote(next()));
+    }
 
     if (!SectionMatcher.empty())
       ret.push_back({std::move(excludeFilePat), std::move(SectionMatcher)});
diff --git a/lld/test/ELF/linkerscript/wildcards.s b/lld/test/ELF/linkerscript/wildcards.s
index 1eea27891dfc2c..24d4102559c95e 100644
--- a/lld/test/ELF/linkerscript/wildcards.s
+++ b/lld/test/ELF/linkerscript/wildcards.s
@@ -91,24 +91,31 @@ SECTIONS {
   .text : { *([.]abc .ab[v-y] ) }
 }
 
-## Test a few non-wildcard meta characters rejected by GNU ld.
+## Test a few non-wildcard characters rejected by GNU ld.
 
 #--- lbrace.lds
-# RUN: ld.lld -T lbrace.lds a.o -o out
+# RUN: not ld.lld -T lbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LBRACE --match-full-lines --strict-whitespace
+#      ERR-LBRACE:{{.*}}: section pattern is expected
+# ERR-LBRACE-NEXT:>>>   .text : { *(.a* { ) }
+# ERR-LBRACE-NEXT:>>>                   ^
 SECTIONS {
   .text : { *(.a* { ) }
 }
 
 #--- lparen.lds
-## ( is recognized as a section name pattern. Note, ( is rejected by GNU ld.
-# RUN: ld.lld -T lparen.lds a.o -o out
-# RUN: llvm-objdump --section-headers out | FileCheck --check-prefix=SEC-NO %s
+# RUN: not ld.lld -T lparen.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LPAREN --match-full-lines --strict-whitespace
+#      ERR-LPAREN:{{.*}}: section pattern is expected
+# ERR-LPAREN-NEXT:>>>   .text : { *(.a* ( ) }
+# ERR-LPAREN-NEXT:>>>                   ^
 SECTIONS {
- .text : { *(.a* ( ) }
+  .text : { *(.a* ( ) }
 }
 
 #--- rbrace.lds
-# RUN: ld.lld -T rbrace.lds a.o -o out
+# RUN: not ld.lld -T rbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-RBRACE --match-full-lines --strict-whitespace
+#      ERR-RBRACE:{{.*}}: section pattern is expected
+# ERR-RBRACE-NEXT:>>>   .text : { *(.a* } ) }
+# ERR-RBRACE-NEXT:>>>                   ^
 SECTIONS {
   .text : { *(.a* } ) }
 }

Collaborator

@smithp35 smithp35 left a comment

A few small suggestions on the test and a comment.

# RUN: ld.lld -T lbrace.lds a.o -o out
# RUN: not ld.lld -T lbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LBRACE --match-full-lines --strict-whitespace
# ERR-LBRACE:{{.*}}: section pattern is expected
# ERR-LBRACE-NEXT:>>> .text : { *(.a* { ) }
Collaborator

Is it worth adding a case where there is no space around the disallowed character, for example .text : { *(.a*{) ? As I understand it, (, ), {, and } are each lexed as a single token, so the spaces shouldn't matter.

Reading the line below, and seeing that the tests all separate the character with spaces, made me double-check that we catch more than just a single isolated character.

if (s.size() == 1 && strchr("(){}", s[0]))
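
To illustrate why the spaces should not matter, here is a toy tokenizer written under the assumption stated in the comment above, namely that each of `(`, `)`, `{`, `}` becomes its own token. `toyTokenize` is a hypothetical helper and is not lld's `ScriptLexer`; it only splits on whitespace and on those four characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy tokenizer (assumption-based sketch, not lld's lexer): splits on
// whitespace and emits each of ( ) { } as a one-character token.
std::vector<std::string> toyTokenize(const std::string &text) {
  std::vector<std::string> out;
  std::string cur;
  // Push the pending word, if any, and start a new one.
  auto flush = [&] {
    if (!cur.empty()) {
      out.push_back(cur);
      cur.clear();
    }
  };
  for (char c : text) {
    if (c == ' ' || c == '\t' || c == '\n') {
      flush();
    } else if (c == '(' || c == ')' || c == '{' || c == '}') {
      flush();
      out.push_back(std::string(1, c));
    } else {
      cur.push_back(c);
    }
  }
  flush();
  return out;
}
```

Under that assumption, `*(.a*{)` and `*(.a* { )` produce the same token list, so a check on the single token `{` would fire in both cases.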

Member Author

Thanks for the suggestion. Updated. Changed s.size() == 1 to !s.empty() to make it clearer that we only guard against the "" special case.
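
A minimal sketch of the guarded check, using `std::string` instead of LLVM's `StringRef`; the helper name `startsWithRejectedChar` is hypothetical. The emptiness test has a practical effect in this form: for an empty `std::string`, `tok[0]` yields `'\0'`, and `strchr` treats the terminating null as part of the searched string, so an unguarded call would report a match.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if `tok` begins with one of ( ) { }.
bool startsWithRejectedChar(const std::string &tok) {
  // Check emptiness first so that "" is never reported as a match.
  return !tok.empty() && std::strchr("(){}", tok[0]) != nullptr;
}
```

For example, `startsWithRejectedChar("{")` is true, while `""` and `.a*` are both rejected by the helper and returned as false.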

StringRef s = peek();
if (s == ")" || s == "EXCLUDE_FILE")
break;
// Detect common mistakes that certain non-wildcard meta characters used
Collaborator

Suggest
// Detect common mistakes when certain non-wildcard meta characters are used without a closing )

Member Author

Updated

Created using spr 1.3.5-bogner
Collaborator

@smithp35 smithp35 left a comment

LGTM, thanks for the updates.

Created using spr 1.3.5-bogner
Contributor

@mysterymath mysterymath left a comment

LGTM, thanks!

MaskRay added 2 commits March 6, 2024 17:16
Created using spr 1.3.5-bogner
Created using spr 1.3.5-bogner
@MaskRay MaskRay merged commit 551e20d into main Mar 7, 2024
@MaskRay MaskRay deleted the users/MaskRay/spr/elf-reject-error-prone-meta-characters-in-input-section-description branch March 7, 2024 01:20