[ELF] Reject error-prone meta characters in input section description #84130

MaskRay · 2024-03-06T07:54:35Z

The lexer is overly permissive. When parsing file patterns in an input
section description that is missing a ), we accept many nonsensical
tokens (e.g. }) as patterns, which leads to confusion, e.g.
*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)
(#81804).

Ideally, the lexer should be stateful so that it reports more errors, as
GNU ld does, and so that hacks like ScriptLexer::maybeSplitExpr can be
removed, but that would require a large rewrite of the lexer. For now,
just reject certain non-wildcard meta characters to detect common mistakes.

Created using spr 1.3.5-bogner

llvmbot · 2024-03-06T07:55:06Z

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Fangrui Song (MaskRay)

Changes

Our lexing rule is loose and recognizes certain non-wildcard meta
characters as input file patterns. This can be confusing in certain
cases, e.g.
*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)
(} without a closing )) (#81804).

Ideally, the lexer should be state-aware to report more errors like GNU
ld, but that would require a large rewrite. For now, just report errors
for one of (){} used as an input file pattern.

Full diff: https://github.com/llvm/llvm-project/pull/84130.diff

2 Files Affected:

(modified) lld/ELF/ScriptParser.cpp (+12-2)
(modified) lld/test/ELF/linkerscript/wildcards.s (+14-7)

diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index f0ede1f43bbdb3..282f95bd04b085 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -717,9 +717,19 @@ SmallVector<SectionPattern, 0> ScriptParser::readInputSectionsList() {
 
     StringMatcher SectionMatcher;
     // Break if the next token is ), EXCLUDE_FILE, or SORT*.
-    while (!errorCount() && peek() != ")" && peek() != "EXCLUDE_FILE" &&
-           peekSortKind() == SortSectionPolicy::Default)
+    while (!errorCount() && peekSortKind() == SortSectionPolicy::Default) {
+      StringRef s = peek();
+      if (s == ")" || s == "EXCLUDE_FILE")
+        break;
+      // Detect common mistakes that certain non-wildcard meta characters used
+      // without a closing ')'.
+      if (s.size() == 1 && strchr("(){}", s[0])) {
+        skip();
+        setError("section pattern is expected");
+        break;
+      }
       SectionMatcher.addPattern(unquote(next()));
+    }
 
     if (!SectionMatcher.empty())
       ret.push_back({std::move(excludeFilePat), std::move(SectionMatcher)});
diff --git a/lld/test/ELF/linkerscript/wildcards.s b/lld/test/ELF/linkerscript/wildcards.s
index 1eea27891dfc2c..24d4102559c95e 100644
--- a/lld/test/ELF/linkerscript/wildcards.s
+++ b/lld/test/ELF/linkerscript/wildcards.s
@@ -91,24 +91,31 @@ SECTIONS {
   .text : { *([.]abc .ab[v-y] ) }
 }
 
-## Test a few non-wildcard meta characters rejected by GNU ld.
+## Test a few non-wildcard characters rejected by GNU ld.
 
 #--- lbrace.lds
-# RUN: ld.lld -T lbrace.lds a.o -o out
+# RUN: not ld.lld -T lbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LBRACE --match-full-lines --strict-whitespace
+#      ERR-LBRACE:{{.*}}: section pattern is expected
+# ERR-LBRACE-NEXT:>>>   .text : { *(.a* { ) }
+# ERR-LBRACE-NEXT:>>>                   ^
 SECTIONS {
   .text : { *(.a* { ) }
 }
 
 #--- lparen.lds
-## ( is recognized as a section name pattern. Note, ( is rejected by GNU ld.
-# RUN: ld.lld -T lparen.lds a.o -o out
-# RUN: llvm-objdump --section-headers out | FileCheck --check-prefix=SEC-NO %s
+# RUN: not ld.lld -T lparen.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LPAREN --match-full-lines --strict-whitespace
+#      ERR-LPAREN:{{.*}}: section pattern is expected
+# ERR-LPAREN-NEXT:>>>   .text : { *(.a* ( ) }
+# ERR-LPAREN-NEXT:>>>                   ^
 SECTIONS {
- .text : { *(.a* ( ) }
+  .text : { *(.a* ( ) }
 }
 
 #--- rbrace.lds
-# RUN: ld.lld -T rbrace.lds a.o -o out
+# RUN: not ld.lld -T rbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-RBRACE --match-full-lines --strict-whitespace
+#      ERR-RBRACE:{{.*}}: section pattern is expected
+# ERR-RBRACE-NEXT:>>>   .text : { *(.a* } ) }
+# ERR-RBRACE-NEXT:>>>                   ^
 SECTIONS {
   .text : { *(.a* } ) }
 }

smithp35

A few small suggestions on the test and a comment.

smithp35 · 2024-03-06T10:14:17Z

lld/test/ELF/linkerscript/wildcards.s

-# RUN: ld.lld -T lbrace.lds a.o -o out
+# RUN: not ld.lld -T lbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LBRACE --match-full-lines --strict-whitespace
+#      ERR-LBRACE:{{.*}}: section pattern is expected
+# ERR-LBRACE-NEXT:>>>   .text : { *(.a* { ) }


Is it worth adding a case where there is no space around the disallowed character, for example .text : { *(.a*{) ? As I understand it, (, ), {, and } are each lexed as a single token, so the spaces shouldn't matter.

Because the tests all have the character separated by spaces, reading this line made me double-check that we catch more than just a single isolated character.
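
The point about spacing can be illustrated with a toy tokenizer. This is a simplified sketch, not lld's ScriptLexer: the function name `tokenize` and its rules are assumptions made for illustration. Whitespace separates tokens, and each of ( ) { } always forms its own one-character token.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Toy tokenizer (illustration only, not lld's ScriptLexer): whitespace
// separates tokens, and each of ( ) { } always forms a one-character token.
std::vector<std::string> tokenize(std::string_view text) {
  constexpr std::string_view punct = "(){}";
  constexpr std::string_view space = " \t\r\n";
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < text.size()) {
    char c = text[i];
    if (space.find(c) != std::string_view::npos) {
      ++i; // Skip whitespace between tokens.
      continue;
    }
    if (punct.find(c) != std::string_view::npos) {
      tokens.emplace_back(1, c); // Punctuation is a token on its own.
      ++i;
      continue;
    }
    // Consume a run of ordinary characters up to the next separator.
    std::size_t start = i;
    while (i < text.size() && space.find(text[i]) == std::string_view::npos &&
           punct.find(text[i]) == std::string_view::npos)
      ++i;
    tokens.emplace_back(text.substr(start, i - start));
  }
  return tokens;
}
```

Under this model, `*(.a*{)` and `*(.a* { )` yield the same token sequence, so a check on the token itself would not depend on the surrounding spaces.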

if (s.size() == 1 && strchr("(){}", s[0]))

Thanks for the suggestion. Updated. Changed s.size() == 1 to !s.empty() to make it clearer that we only guard against the "" special case.
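
The updated guard can be sketched in isolation as follows. This is a minimal standalone model, not the code in ScriptParser.cpp: `isRejectedMetaToken` is a hypothetical helper name, and `std::string_view` stands in for `llvm::StringRef` so that the sketch compiles without LLVM headers.

```cpp
#include <cstring>
#include <string_view>

// Hypothetical helper modeling the updated guard from this thread.
bool isRejectedMetaToken(std::string_view s) {
  // !s.empty() only protects the s[0] access against the "" special case.
  // Each of ( ) { } is assumed to arrive as its own one-character token.
  return !s.empty() && std::strchr("(){}", s[0]) != nullptr;
}
```

In the diff, `)` never reaches this check because the loop breaks on it first; the helper above does not model that ordering.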

smithp35 · 2024-03-06T10:14:28Z

lld/ELF/ScriptParser.cpp

+      StringRef s = peek();
+      if (s == ")" || s == "EXCLUDE_FILE")
+        break;
+      // Detect common mistakes that certain non-wildcard meta characters used


Suggest
// Detect common mistakes when certain non-wildcard meta characters are used without a closing )

smithp35

LGTM, thanks for the updates.

mysterymath

LGTM, thanks!

[𝘀𝗽𝗿] initial version

0803329

MaskRay requested a review from smithp35 March 6, 2024 07:54

llvmbot added lld lld:ELF labels Mar 6, 2024

MaskRay requested a review from mysterymath March 6, 2024 07:54

smithp35 reviewed Mar 6, 2024

improve comment and tests

5aca4c0

smithp35 approved these changes Mar 6, 2024

add missing "add" in a comment.

f31fc0b

mysterymath approved these changes Mar 7, 2024

MaskRay added 2 commits March 6, 2024 17:16

update message

5b4050d

improve a test

ac23c0a

MaskRay merged commit 551e20d into main Mar 7, 2024

MaskRay deleted the users/MaskRay/spr/elf-reject-error-prone-meta-characters-in-input-section-description branch March 7, 2024 01:20

MaskRay mentioned this pull request Mar 7, 2024

LLD linker script parser accepts too much in input section wildcard patterns #81804

Closed

This was referenced Jul 22, 2024

[ELF] A new Lexer for Linker Script #99920

Open

[ELF] Added struct Token and changed next() and peek() to return Token #100180

Open
