[llvm-mc] Add --hex to disassemble hex bytes #119992
Created using spr 1.3.5-bogner
@llvm/pr-subscribers-mc @llvm/pr-subscribers-backend-x86
Author: Fangrui Song (MaskRay)
Changes
This patch adds --hex to disassemble hex pairs, similar to many other
Full diff: https://github.com/llvm/llvm-project/pull/119992.diff 5 Files Affected:
diff --git a/llvm/docs/CommandGuide/llvm-mc.rst b/llvm/docs/CommandGuide/llvm-mc.rst
index c5d2f9396dce71..ba568da6f9aeb6 100644
--- a/llvm/docs/CommandGuide/llvm-mc.rst
+++ b/llvm/docs/CommandGuide/llvm-mc.rst
@@ -92,6 +92,11 @@ End-user Options
Generate DWARF debugging info for assembly source files.
+.. option:: --hex
+
+ Take hex pairs as input for the disassembler.
+ Whitespace is ignored.
+
.. option:: --large-code-model
Create CFI directives that assume the code might be more than 2 GB.
diff --git a/llvm/test/MC/Disassembler/X86/hex-pairs.txt b/llvm/test/MC/Disassembler/X86/hex-pairs.txt
new file mode 100644
index 00000000000000..7c759a1853b96b
--- /dev/null
+++ b/llvm/test/MC/Disassembler/X86/hex-pairs.txt
@@ -0,0 +1,58 @@
+# RUN: rm -rf %t && split-file %s %t && cd %t
+# RUN: llvm-mc -triple=x86_64 --disassemble --hex a.s | FileCheck %s
+# RUN: llvm-mc -triple=x86_64 --disassemble --hex decode1.s 2>&1 | FileCheck %s --check-prefix=DECODE1 --implicit-check-not=warning:
+# RUN: not llvm-mc -triple=x86_64 --disassemble --hex decode2.s 2>&1 | FileCheck %s --check-prefix=DECODE2 --implicit-check-not=warning:
+# RUN: not llvm-mc -triple=x86_64 --disassemble --hex err1.s 2>&1 | FileCheck %s --check-prefix=ERR1 --implicit-check-not=error:
+# RUN: not llvm-mc -triple=x86_64 --disassemble --hex err2.s 2>&1 | FileCheck %s --check-prefix=ERR2 --implicit-check-not=error:
+
+#--- a.s
+4883ec08 31 # comment
+# comment
+ ed4829 c390
+[c3c3][4829c3]
+[90]
+
+# CHECK: subq $8, %rsp
+# CHECK-NEXT: xorl %ebp, %ebp
+# CHECK-NEXT: subq %rax, %rbx
+# CHECK-NEXT: nop
+# CHECK-NEXT: retq
+# CHECK-NEXT: retq
+# CHECK-NEXT: subq %rax, %rbx
+# CHECK-NEXT: nop
+# CHECK-EMPTY:
+
+#--- decode1.s
+4889
+
+# DECODE1: 1:1: warning: invalid instruction encoding
+
+#--- decode2.s
+[4889][4889] [4889]4889c3
+ [4889]
+
+# DECODE2: 1:2: warning: invalid instruction encoding
+# DECODE2: 1:8: warning: invalid instruction encoding
+# DECODE2: 1:15: warning: invalid instruction encoding
+# DECODE2: 2:3: warning: invalid instruction encoding
+
+#--- err1.s
+0x31ed
+0xcc
+
+# ERR1: 1:1: error: invalid input token
+# ERR1: 2:1: error: invalid input token
+# ERR1: xorl %ebp, %ebp
+# ERR1-NEXT: int3
+# ERR1-EMPTY:
+
+#--- err2.s
+90c
+cc
+c
+
+# ERR2: 1:3: error: expected two hex digits
+# ERR2: 3:1: error: expected two hex digits
+# ERR2: nop
+# ERR2-NEXT: int3
+# ERR2-EMPTY:
diff --git a/llvm/tools/llvm-mc/Disassembler.cpp b/llvm/tools/llvm-mc/Disassembler.cpp
index a588058437ec9a..f96ccd17f1b6c5 100644
--- a/llvm/tools/llvm-mc/Disassembler.cpp
+++ b/llvm/tools/llvm-mc/Disassembler.cpp
@@ -12,6 +12,7 @@
//===----------------------------------------------------------------------===//
#include "Disassembler.h"
+#include "llvm/ADT/StringExtras.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
@@ -94,10 +95,8 @@ static bool SkipToToken(StringRef &Str) {
}
}
-
-static bool ByteArrayFromString(ByteArrayTy &ByteArray,
- StringRef &Str,
- SourceMgr &SM) {
+static bool byteArrayFromString(ByteArrayTy &ByteArray, StringRef &Str,
+ SourceMgr &SM, bool HexPairs) {
while (SkipToToken(Str)) {
// Handled by higher level
if (Str[0] == '[' || Str[0] == ']')
@@ -109,7 +108,24 @@ static bool ByteArrayFromString(ByteArrayTy &ByteArray,
// Convert to a byte and add to the byte vector.
unsigned ByteVal;
- if (Value.getAsInteger(0, ByteVal) || ByteVal > 255) {
+ if (HexPairs) {
+ if (Next < 2) {
+ SM.PrintMessage(SMLoc::getFromPointer(Value.data()),
+ SourceMgr::DK_Error, "expected two hex digits");
+ Str = Str.substr(Next);
+ return true;
+ }
+ Next = 2;
+ unsigned C0 = hexDigitValue(Value[0]);
+ unsigned C1 = hexDigitValue(Value[1]);
+ if (C0 == -1u || C1 == -1u) {
+ SM.PrintMessage(SMLoc::getFromPointer(Value.data()),
+ SourceMgr::DK_Error, "invalid input token");
+ Str = Str.substr(Next);
+ return true;
+ }
+ ByteVal = C0 * 16 + C1;
+ } else if (Value.getAsInteger(0, ByteVal) || ByteVal > 255) {
// If we have an error, print it and skip to the end of line.
SM.PrintMessage(SMLoc::getFromPointer(Value.data()), SourceMgr::DK_Error,
"invalid input token");
@@ -130,9 +146,8 @@ static bool ByteArrayFromString(ByteArrayTy &ByteArray,
int Disassembler::disassemble(const Target &T, const std::string &Triple,
MCSubtargetInfo &STI, MCStreamer &Streamer,
MemoryBuffer &Buffer, SourceMgr &SM,
- MCContext &Ctx,
- const MCTargetOptions &MCOptions) {
-
+ MCContext &Ctx, const MCTargetOptions &MCOptions,
+ bool HexPairs) {
std::unique_ptr<const MCRegisterInfo> MRI(T.createMCRegInfo(Triple));
if (!MRI) {
errs() << "error: no register info for target " << Triple << "\n";
@@ -188,7 +203,7 @@ int Disassembler::disassemble(const Target &T, const std::string &Triple,
}
// It's a real token, get the bytes and emit them
- ErrorOccurred |= ByteArrayFromString(ByteArray, Str, SM);
+ ErrorOccurred |= byteArrayFromString(ByteArray, Str, SM, HexPairs);
if (!ByteArray.first.empty())
ErrorOccurred |=
diff --git a/llvm/tools/llvm-mc/Disassembler.h b/llvm/tools/llvm-mc/Disassembler.h
index d0226abadc630a..68f32066ccd89c 100644
--- a/llvm/tools/llvm-mc/Disassembler.h
+++ b/llvm/tools/llvm-mc/Disassembler.h
@@ -32,7 +32,7 @@ class Disassembler {
static int disassemble(const Target &T, const std::string &Triple,
MCSubtargetInfo &STI, MCStreamer &Streamer,
MemoryBuffer &Buffer, SourceMgr &SM, MCContext &Ctx,
- const MCTargetOptions &MCOptions);
+ const MCTargetOptions &MCOptions, bool HexPairs);
};
} // namespace llvm
diff --git a/llvm/tools/llvm-mc/llvm-mc.cpp b/llvm/tools/llvm-mc/llvm-mc.cpp
index 898d79b9233b9a..04d94d474df466 100644
--- a/llvm/tools/llvm-mc/llvm-mc.cpp
+++ b/llvm/tools/llvm-mc/llvm-mc.cpp
@@ -94,6 +94,10 @@ static cl::opt<bool>
cl::desc("Prefer hex format for immediate values"),
cl::cat(MCCategory));
+static cl::opt<bool>
+ HexPairs("hex", cl::desc("Take hex pairs as input for the disassembler"),
+ cl::cat(MCCategory));
+
static cl::list<std::string>
DefineSymbol("defsym",
cl::desc("Defines a symbol to be an integer constant"),
@@ -592,7 +596,7 @@ int main(int argc, char **argv) {
}
if (disassemble)
Res = Disassembler::disassemble(*TheTarget, TripleName, *STI, *Str, *Buffer,
- SrcMgr, Ctx, MCOptions);
+ SrcMgr, Ctx, MCOptions, HexPairs);
// Keep output if no errors.
if (Res == 0) {
Maybe require hex pairs to be space-separated? Otherwise a sequence of pairs looks like a single hex number and may cause confusion regarding endianness.
Thanks for the quick response. Actually, it's intentional to make spaces optional. The hex pair form is used by tools like
To interpret 0x12345678 as 0x78 0x56 0x34 0x12, we can keep using the existing functionality, or add another option if there is sufficient demand (I personally don't think it is useful; it is easily misused because the length is not clear).
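To make the endianness concern in this exchange concrete, here is a minimal C++ sketch. The helper name littleEndianBytes is hypothetical and not part of llvm-mc; it only shows how a single 32-bit number maps to bytes in little-endian memory order, which differs from reading the same digits left to right as hex pairs.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not from the patch): split a 32-bit value into
// its bytes in little-endian memory order, lowest byte first.
std::array<uint8_t, 4> littleEndianBytes(uint32_t V) {
  std::array<uint8_t, 4> Out = {
      static_cast<uint8_t>(V),        // bits 0..7
      static_cast<uint8_t>(V >> 8),   // bits 8..15
      static_cast<uint8_t>(V >> 16),  // bits 16..23
      static_cast<uint8_t>(V >> 24),  // bits 24..31
  };
  return Out;
}
```

Under this reading, 0x12345678 becomes 0x78 0x56 0x34 0x12, whereas the hex pair form reads the digits 12345678 in written order as 0x12 0x34 0x56 0x78.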
LGTM
llvm/tools/llvm-mc/Disassembler.h
@@ -32,7 +32,7 @@ class Disassembler {
static int disassemble(const Target &T, const std::string &Triple,
MCSubtargetInfo &STI, MCStreamer &Streamer,
MemoryBuffer &Buffer, SourceMgr &SM, MCContext &Ctx,
- const MCTargetOptions &MCOptions);
+ const MCTargetOptions &MCOptions, bool HexPairs);
- const MCTargetOptions &MCOptions, bool HexPairs);
+ const MCTargetOptions &MCOptions, bool HexBytes);
Thanks very much for putting this together! I've been wanting a feature like this for quite a while for quick manual inspection of machine code serialized to hex.
--disassemble/--cdis parses input bytes as decimal, 0bbin, 0ooct, or 0xhex. While the hexadecimal digit form is most commonly used, requiring a 0x prefix for each byte (0x48 0x29 0xc3) is cumbersome.

Tools like xxd -p and rz-asm use a plain hex dump form without the 0x prefix or space separator. This patch adds --hex to disassemble such hex bytes with optional whitespace.
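
As an illustration of the hex pair form described above, the following is a minimal standalone C++ sketch, not the code from this patch. The names hexDigit and parseHexPairs are hypothetical, and the sketch deliberately leaves out what the real tool handles, such as # comments, [ ] grouping, and source-located diagnostics. It only shows the core idea: skip whitespace, then consume exactly two hex digits per byte.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if C is not a hex digit.
static int hexDigit(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Decode a plain hex dump such as "4883ec08 31ed" into bytes.
// Whitespace between pairs is skipped. Returns std::nullopt on a
// non-hex character or on a dangling single digit.
std::optional<std::vector<uint8_t>> parseHexPairs(std::string_view In) {
  std::vector<uint8_t> Bytes;
  std::size_t I = 0;
  while (I < In.size()) {
    if (std::isspace(static_cast<unsigned char>(In[I]))) {
      ++I;
      continue;
    }
    if (I + 1 >= In.size())
      return std::nullopt; // only one digit left: expected two hex digits
    int Hi = hexDigit(In[I]);
    int Lo = hexDigit(In[I + 1]);
    if (Hi < 0 || Lo < 0)
      return std::nullopt; // e.g. the 'x' in a 0x-prefixed token
    Bytes.push_back(static_cast<uint8_t>(Hi * 16 + Lo));
    I += 2;
  }
  return Bytes;
}
```

Unlike the patch, which reports an error and keeps going with the remaining input, this sketch simply rejects the whole string on the first malformed pair; that choice keeps the example short and is an assumption of the sketch only.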