Skip to content

[RISCV][Disassemble] Ensure the comment stream of the disassembler is set. #125962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fpetrogalli
Copy link
Member

No description provided.

@fpetrogalli fpetrogalli requested review from anemet and topperc February 5, 2025 23:54
@llvmbot llvmbot added backend:RISC-V mc Machine (object) code labels Feb 5, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 5, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Francesco Petrogalli (fpetrogalli)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/125962.diff

3 Files Affected:

  • (modified) llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h (+5)
  • (modified) llvm/lib/MC/MCDisassembler/MCDisassembler.cpp (+6-2)
  • (modified) llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp (+1)
diff --git a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
index 901bfcf5fa54f4e..cb773d0f9589c46 100644
--- a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
+++ b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
@@ -131,6 +131,11 @@ class MCDisassembler {
   ///                   MCDisassembler::SoftFail if the instruction was
   ///                                            disassemblable but invalid,
   ///                   MCDisassembler::Fail if the instruction was invalid.
+  ///
+  /// Note: to avoid potential issues caused by the field
+  /// `MCDisassembler::CommentStream` being set nullptr (its default
+  /// value), an implementation of this method should make sure to set
+  /// `CommentStream = &CStream;`.
   virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
                                       ArrayRef<uint8_t> Bytes, uint64_t Address,
                                       raw_ostream &CStream) const = 0;
diff --git a/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp b/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
index 6aa4b0e4fcb99da..0dfa17eb1346fdb 100644
--- a/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
+++ b/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
@@ -29,17 +29,21 @@ bool MCDisassembler::tryAddingSymbolicOperand(MCInst &Inst, int64_t Value,
                                               uint64_t Address, bool IsBranch,
                                               uint64_t Offset, uint64_t OpSize,
                                               uint64_t InstSize) const {
-  if (Symbolizer)
+  if (Symbolizer) {
+    assert(CommentStream && "CommentStream is not set.");
     return Symbolizer->tryAddingSymbolicOperand(Inst, *CommentStream, Value,
                                                 Address, IsBranch, Offset,
                                                 OpSize, InstSize);
+  }
   return false;
 }
 
 void MCDisassembler::tryAddingPcLoadReferenceComment(int64_t Value,
                                                      uint64_t Address) const {
-  if (Symbolizer)
+  if (Symbolizer) {
+    assert(CommentStream && "CommentStream is not set.");
     Symbolizer->tryAddingPcLoadReferenceComment(*CommentStream, Value, Address);
+  }
 }
 
 void MCDisassembler::setSymbolizer(std::unique_ptr<MCSymbolizer> Symzer) {
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index 3ec465810b1d114..01648ec0cbe9ea9 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -789,6 +789,7 @@ DecodeStatus RISCVDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                                                ArrayRef<uint8_t> Bytes,
                                                uint64_t Address,
                                                raw_ostream &CS) const {
+  CommentStream = &CS;
   // It's a 16 bit instruction if bit 0 and 1 are not 0b11.
   if ((Bytes[0] & 0b11) != 0b11)
     return getInstruction16(MI, Size, Bytes, Address, CS);

@llvmbot
Copy link
Member

llvmbot commented Feb 5, 2025

@llvm/pr-subscribers-mc

Author: Francesco Petrogalli (fpetrogalli)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/125962.diff

3 Files Affected:

  • (modified) llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h (+5)
  • (modified) llvm/lib/MC/MCDisassembler/MCDisassembler.cpp (+6-2)
  • (modified) llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp (+1)
diff --git a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
index 901bfcf5fa54f4e..cb773d0f9589c46 100644
--- a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
+++ b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
@@ -131,6 +131,11 @@ class MCDisassembler {
   ///                   MCDisassembler::SoftFail if the instruction was
   ///                                            disassemblable but invalid,
   ///                   MCDisassembler::Fail if the instruction was invalid.
+  ///
+  /// Note: to avoid potential issues caused by the field
+  /// `MCDisassembler::CommentStream` being set nullptr (its default
+  /// value), an implementation of this method should make sure to set
+  /// `CommentStream = &CStream;`.
   virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
                                       ArrayRef<uint8_t> Bytes, uint64_t Address,
                                       raw_ostream &CStream) const = 0;
diff --git a/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp b/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
index 6aa4b0e4fcb99da..0dfa17eb1346fdb 100644
--- a/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
+++ b/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
@@ -29,17 +29,21 @@ bool MCDisassembler::tryAddingSymbolicOperand(MCInst &Inst, int64_t Value,
                                               uint64_t Address, bool IsBranch,
                                               uint64_t Offset, uint64_t OpSize,
                                               uint64_t InstSize) const {
-  if (Symbolizer)
+  if (Symbolizer) {
+    assert(CommentStream && "CommentStream is not set.");
     return Symbolizer->tryAddingSymbolicOperand(Inst, *CommentStream, Value,
                                                 Address, IsBranch, Offset,
                                                 OpSize, InstSize);
+  }
   return false;
 }
 
 void MCDisassembler::tryAddingPcLoadReferenceComment(int64_t Value,
                                                      uint64_t Address) const {
-  if (Symbolizer)
+  if (Symbolizer) {
+    assert(CommentStream && "CommentStream is not set.");
     Symbolizer->tryAddingPcLoadReferenceComment(*CommentStream, Value, Address);
+  }
 }
 
 void MCDisassembler::setSymbolizer(std::unique_ptr<MCSymbolizer> Symzer) {
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index 3ec465810b1d114..01648ec0cbe9ea9 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -789,6 +789,7 @@ DecodeStatus RISCVDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                                                ArrayRef<uint8_t> Bytes,
                                                uint64_t Address,
                                                raw_ostream &CS) const {
+  CommentStream = &CS;
   // It's a 16 bit instruction if bit 0 and 1 are not 0b11.
   if ((Bytes[0] & 0b11) != 0b11)
     return getInstruction16(MI, Size, Bytes, Address, CS);

@fpetrogalli fpetrogalli force-pushed the ensure-disassembler-has-comment-stream branch from 4e1f72a to 91a73ec Compare February 6, 2025 20:55
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fpetrogalli fpetrogalli merged commit b40ef2c into llvm:main Feb 6, 2025
6 of 7 checks passed
@tedwoodward
Copy link

@fpetrogalli not all disassemblers set CommentStream. This causes disassemblers to assert, including in lldb. I don't think there should be asserts in MCDisassembler.cpp

@tedwoodward
Copy link

bin/lldb ~/lldb_test/factorial
(lldb) target create "/usr2/tedwood/lldb_test/factorial"
Current executable set to '/usr2/tedwood/lldb_test/factorial' (hexagon).
(lldb) dis -n main
lldb: /local/mnt/workspace/tedwood/src/llvm-top/llvm/lib/MC/MCDisassembler/MCDisassembler.cpp:33: bool llvm::MCDisassembler::tryAddingSymbolicOperand(llvm::MCInst &, int64_t, uint64_t, bool, uint64_t, uint64_t, uint64_t) const: Assertion `CommentStream && "CommentStream is not set."' failed.
LLDB diagnostics will be written to /tmp/diagnostics-4b838b
Please include the directory content when filing a bug report
Abort (core dumped)

@fpetrogalli
Copy link
Member Author

@fpetrogalli not all disassemblers set CommentStream. This causes disassemblers to assert, including in lldb. I don't think there should be asserts in MCDisassembler.cpp

Maybe all disassembler should set CommentStream for their target? Both functions are dereferencing CommentStream, I think it is better to leave the assertion, or alternatively to modify Symbolizer->tryAddingSymbolicOperand to handle the nullptr case.

    assert(CommentStream && "CommentStream is not set.");
    return Symbolizer->tryAddingSymbolicOperand(Inst, *CommentStream, Value,
                                                Address, IsBranch, Offset,
                                                OpSize, InstSize);
  }

@topperc
Copy link
Collaborator

topperc commented Feb 10, 2025

There used to be code to handle a nullptr there, but it was removed in 1ea5ce6

@topperc
Copy link
Collaborator

topperc commented Feb 10, 2025

Maybe we just need to set the comment stream in HexagonDisassembler::getInstruction?

@fpetrogalli
Copy link
Member Author

fpetrogalli commented Feb 10, 2025

Maybe we just need to set the comment stream in HexagonDisassembler::getInstruction?

Or maybe just revert my commit and 1ea5ce6 and get back to the assumption that the comment stream might be null?

CC @MaskRay

@tedwoodward
Copy link

tedwoodward commented Feb 10, 2025

Maybe we just need to set the comment stream in HexagonDisassembler::getInstruction?

I'm talking to our guys internally about that; but there are other cores out there that will need to do the same thing. Plus, I don't like the idea of library code asserting on things. This is fine for something that is OK to die on a fatal error, but the debugger should never die on bad input. Just throw an error, and let the user continue.

@fpetrogalli
Copy link
Member Author

Maybe we just need to set the comment stream in HexagonDisassembler::getInstruction?

[...] This is fine for something that is OK to die on a fatal error, but the debugger should never die on bad input. Just throw an error, and let the user continue.

Is this the case? I don't think there is anything wrong with the input being debugged. I would consider this a bug of the debugger (assuming of course we decide to keep 1ea5ce6).

@tedwoodward
Copy link

Is this the case? I don't think there is anything wrong with the input being debugged. I would consider this a bug of the debugger (assuming of course we decide to keep 1ea5ce6).

The problem is when lldb instantiates the disassembler, if the target MCDisassembler code doesn't set CommentStream, it will be nullptr and the assert will fire, which takes down lldb. lldb doesn't control the CommentStream, but it not being set will cause lldb to crash. Maybe all the MCDisassemblers should set CommentStream, but that wasn't required in the past, so many don't.

Debuggers inherently deal with bad input all the time, and shouldn't die when that happens, but return an error to the user.

@topperc
Copy link
Collaborator

topperc commented Feb 11, 2025

Is this the case? I don't think there is anything wrong with the input being debugged. I would consider this a bug of the debugger (assuming of course we decide to keep 1ea5ce6).

The problem is when lldb instantiates the disassembler, if the target MCDisassembler code doesn't set CommentStream, it will be nullptr and the assert will fire, which takes down lldb. lldb doesn't control the CommentStream, but it not being set will cause lldb to crash. Maybe all the MCDisassemblers should set CommentStream, but that wasn't required in the past, so many don't.

Debuggers inherently deal with bad input all the time, and shouldn't die when that happens, but return an error to the user.

Is Hexagon just getting lucky that the null comment stream was never written to by the Symbolizer which would cause segmentation fault.

quic-areg added a commit to quic-areg/llvm-project that referenced this pull request Feb 11, 2025
quic-areg added a commit that referenced this pull request Feb 11, 2025
@tedwoodward
Copy link

Is Hexagon just getting lucky that the null comment stream was never written to by the Symbolizer which would cause segmentation fault.

I'm not sure. I haven't worked in the guts of the disassembler in many years!

It looks like we pushed up a change to add CommentStream to the Hexagon disassembler; thanks @quic-areg !

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
@fpetrogalli fpetrogalli deleted the ensure-disassembler-has-comment-stream branch February 12, 2025 23:46
flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants