[lldb] Improve identification of Dlang mangled names #93881

@kastiglione
Contributor

kastiglione commented May 30, 2024

Reduce false positive identification of C names as Dlang mangled names. This happens when a C function uses the prefix _D.

The Dlang ABI shows that mangled names have a length immediately following the _D prefix. This change checks for a digit after the _D prefix, when identifying the mangling scheme of a symbol. This doesn't prevent false positives entirely, but does make it less likely.
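
The check described above can be sketched in a few lines. This is only an illustration in Python, not the C++ from the patch, and the function name `looks_like_dlang` is hypothetical. It shows which names are now rejected and one C-legal identifier that still passes, which is the remaining false-positive case.

```python
def looks_like_dlang(name: str) -> bool:
    """Return True if `name` starts with `_D` followed by an ASCII digit."""
    if not name.startswith("_D"):
        return False
    rest = name[2:]
    # Only ASCII digits count, mirroring a plain isDigit-style test.
    return bool(rest) and rest[0] in "0123456789"


# A C function that merely uses the `_D` prefix is no longer matched.
print(looks_like_dlang("_Dfunction"))  # False
# Nothing follows the prefix, so there is no length to read.
print(looks_like_dlang("_D"))          # False
# Still matched, although `_D3foo` is also a valid C identifier.
print(looks_like_dlang("_D3foo"))      # True
```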

@kastiglione kastiglione requested a review from JDevlieghere as a code owner May 30, 2024 21:19
@llvmbot llvmbot added the lldb label May 30, 2024
@llvmbot
Member

llvmbot commented May 30, 2024

@llvm/pr-subscribers-lldb

Author: Dave Lee (kastiglione)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/93881.diff

4 Files Affected:

  • (modified) lldb/source/Core/Mangled.cpp (+9-2)
  • (added) lldb/test/API/lang/c/non-mangled/Makefile (+4)
  • (added) lldb/test/API/lang/c/non-mangled/TestCNonMangled.py (+16)
  • (added) lldb/test/API/lang/c/non-mangled/main.c (+8)
diff --git a/lldb/source/Core/Mangled.cpp b/lldb/source/Core/Mangled.cpp
index 8efc4c639cca5..3142c81d12ed9 100644
--- a/lldb/source/Core/Mangled.cpp
+++ b/lldb/source/Core/Mangled.cpp
@@ -19,6 +19,7 @@
 #include "lldb/Utility/Stream.h"
 #include "lldb/lldb-enumerations.h"
 
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Demangle/Demangle.h"
 #include "llvm/Support/Compiler.h"
@@ -48,8 +49,14 @@ Mangled::ManglingScheme Mangled::GetManglingScheme(llvm::StringRef const name) {
   if (name.starts_with("_R"))
     return Mangled::eManglingSchemeRustV0;
 
-  if (name.starts_with("_D"))
-    return Mangled::eManglingSchemeD;
+  if (name.starts_with("_D")) {
+    // A dlang mangled name begins with `_D`, followed by a numeric length.
+    // See `SymbolName` and `LName` in
+    // https://dlang.org/spec/abi.html#name_mangling
+    llvm::StringRef buf = name.drop_front(2);
+    if (!buf.empty() && llvm::isDigit(buf.front()))
+      return Mangled::eManglingSchemeD;
+  }
 
   if (name.starts_with("_Z"))
     return Mangled::eManglingSchemeItanium;
diff --git a/lldb/test/API/lang/c/non-mangled/Makefile b/lldb/test/API/lang/c/non-mangled/Makefile
new file mode 100644
index 0000000000000..695335e068c0c
--- /dev/null
+++ b/lldb/test/API/lang/c/non-mangled/Makefile
@@ -0,0 +1,4 @@
+C_SOURCES := main.c
+CFLAGS_EXTRAS := -std=c99
+
+include Makefile.rules
diff --git a/lldb/test/API/lang/c/non-mangled/TestCNonMangled.py b/lldb/test/API/lang/c/non-mangled/TestCNonMangled.py
new file mode 100644
index 0000000000000..32bd778fa6eb6
--- /dev/null
+++ b/lldb/test/API/lang/c/non-mangled/TestCNonMangled.py
@@ -0,0 +1,16 @@
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.lldbtest import *
+
+
+class TestCase(TestBase):
+
+    def test_functions_having_dlang_mangling_prefix(self):
+        """
+        Ensure C functions with a '_D' prefix alone are not mistakenly treated
+        as a Dlang mangled name. A proper Dlang mangling will have digits
+        immediately following the '_D' prefix.
+        """
+        self.build()
+        _, _, thread, _ = lldbutil.run_to_name_breakpoint(self, "_Dfunction")
+        symbol = thread.frame[0].symbol
+        self.assertEqual(symbol.GetDisplayName(), "_Dfunction")
diff --git a/lldb/test/API/lang/c/non-mangled/main.c b/lldb/test/API/lang/c/non-mangled/main.c
new file mode 100644
index 0000000000000..ad9d86e5c25a8
--- /dev/null
+++ b/lldb/test/API/lang/c/non-mangled/main.c
@@ -0,0 +1,8 @@
+#include <stdio.h>
+
+void _Dfunction() {}
+
+int main() {
+  _Dfunction();
+  return 0;
+}

github-actions bot commented May 30, 2024

✅ With the latest revision this PR passed the Python code formatter.

Collaborator

@jimingham jimingham left a comment

This raises the question of why we don't do the same checks for _R and _Z.

It would be even better if the various demanglers in llvm had a quick "could this be mine" check so we didn't have to encode this in lldb.

But you don't have to fix everything to be allowed to fix one thing.

@kastiglione
Contributor Author

> This raises the question of why we don't do the same checks for _R and _Z.

I asked myself the same. It turns out Rust doesn't have a number following the _R prefix, so it's not as simple. Given that C++ is much more load-bearing, I didn't want to include changes that affect C++ in this diff. I don't want the Dlang functionality to be reverted if something goes wrong with an equivalent C++ change.

> It would be even better if the various demanglers in llvm had a quick "could this be mine" check so we didn't have to encode this in lldb.

I agree. That can be a next step.

@Michael137
Member

> This raises the question of why we don't do the same checks for _R and _Z.
>
> It would be even better if the various demanglers in llvm had a quick "could this be mine" check so we didn't have to encode this in lldb.
>
> But you don't have to fix everything to be allowed to fix one thing.

For _Z it would be tricky to do such a check because what can follow _Z is quite varied. See libcxxabi/test/test_demangle.pass.cpp for numerous examples.

@bulbazord
Member

Change itself looks fine to me.

> For _Z it would be tricky to do such a check because what can follow _Z is quite varied. See libcxxabi/test/test_demangle.pass.cpp for numerous examples.

Agreed. For example, the mangled name _ZNK4lldb8SBTarget7IsValidEv corresponds to lldb::SBTarget::IsValid() const.
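
To make the example concrete, here is a rough Python sketch that walks only this one shape of Itanium name: a nested name (`N` ... `E`) with optional cv-qualifiers and length-prefixed identifiers. The helper `split_nested_name` is hypothetical and deliberately narrow. It is nowhere near a real demangler, but it shows that the character right after _Z here is `N`, not a digit.

```python
def split_nested_name(mangled):
    """Split a simple `_ZN[qualifiers]<len><id>...E` name into its parts.

    Returns (qualifiers, identifiers). Only plain length-prefixed
    identifiers are handled; templates, substitutions, operators and
    everything else in the Itanium grammar are out of scope.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium name")
    pos = 3
    # Optional cv-qualifiers on a member function: r, V, K.
    start = pos
    while pos < len(mangled) and mangled[pos] in "rVK":
        pos += 1
    qualifiers = mangled[start:pos]
    identifiers = []
    while pos < len(mangled) and mangled[pos] != "E":
        digits_start = pos
        while pos < len(mangled) and mangled[pos] in "0123456789":
            pos += 1
        if pos == digits_start:
            raise ValueError("unsupported construct at offset %d" % pos)
        length = int(mangled[digits_start:pos])
        identifiers.append(mangled[pos:pos + length])
        pos += length
    return qualifiers, identifiers


print(split_nested_name("_ZNK4lldb8SBTarget7IsValidEv"))
# ('K', ['lldb', 'SBTarget', 'IsValid'])
```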

@bulbazord
Member

Also, something to keep in mind: If Dlang ever decides to change their mangled name scheme in the future, this change may become wrong then. I'm not sure what commitments the D language project has for their ABI stability though.

@kastiglione
Contributor Author

Thanks Alex, Michael. I didn't even look closely at C++ mangling because I knew I didn't want to bundle any C++ changes with this smaller scoped patch.

@jimingham
Collaborator

> Also, something to keep in mind: If Dlang ever decides to change their mangled name scheme in the future, this change may become wrong then. I'm not sure what commitments the D language project has for their ABI stability though.

That's why this shouldn't be in lldb in the long run; we should ask the D demangler in llvm (if we don't have a D demangler, then we should just ignore D mangled symbols). We can't guarantee we'll grok anything newer than the D in your lldb, but that seems reasonable. That way we wouldn't have to track this in lldb.
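
As a sketch of what a "could this be mine" arrangement might look like, the Python below pairs each scheme with a cheap predicate and asks them in turn. Everything here is hypothetical: the thread does not say llvm offers such an API, and the names `SCHEMES` and `guess_scheme` are invented for illustration. The predicates only reflect the prefixes discussed in this PR.

```python
def _could_be_dlang(name):
    # Prefix plus a leading length digit, as described in this PR.
    return name.startswith("_D") and name[2:3] != "" and name[2] in "0123456789"


# Hypothetical registry: in the design floated above, each predicate
# would be owned by the corresponding demangler rather than by lldb.
SCHEMES = [
    ("rust-v0", lambda name: name.startswith("_R")),
    ("dlang", _could_be_dlang),
    ("itanium", lambda name: name.startswith("_Z")),
]


def guess_scheme(name):
    """Return the first scheme whose predicate accepts `name`, else None."""
    for scheme, could_be_mine in SCHEMES:
        if could_be_mine(name):
            return scheme
    return None


print(guess_scheme("_Dfunction"))                    # None
print(guess_scheme("_ZNK4lldb8SBTarget7IsValidEv"))  # itanium
```

With this shape, tightening or loosening one scheme's check touches only that scheme's predicate, which is the maintenance benefit being argued for.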

@kastiglione kastiglione merged commit 5a02a9a into llvm:main May 31, 2024
5 checks passed
@kastiglione kastiglione deleted the lldb-Improve-identification-of-Dlang-mangled-names branch May 31, 2024 18:20
@gulfemsavrun
Contributor

We started seeing test failures in LLDBCoreTests:

Script:
--
/b/s/w/ir/x/w/llvm_build/tools/lldb/unittests/Core/./LLDBCoreTests --gtest_filter=MangledTest.EmptyForInvalidDLangName
--
lldb/unittests/Core/MangledTest.cpp:89: Failure
Expected equality of these values:
  ""
  the_demangled.GetCString()
    Which is: "_DDD"


lldb/unittests/Core/MangledTest.cpp:89
Expected equality of these values:
  ""
  the_demangled.GetCString()
    Which is: "_DDD"

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8746397023380830177/overview

@kastiglione
Contributor Author

@gulfemsavrun apologies, fix is here: #94046

kastiglione added a commit that referenced this pull request May 31, 2024
Follow up to #93881. Updates missed tests and handles `_Dmain`.
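
The follow-up's description says it handles `_Dmain`, a name with no digit after the prefix. The actual change in #94046 is not shown in this thread, so the Python below is only a guess at the shape of such a fix, assuming `_Dmain` is accepted as a literal special case. The function name `looks_like_dlang_with_main` is hypothetical.

```python
def looks_like_dlang_with_main(name: str) -> bool:
    """Digit-after-prefix check, plus an assumed `_Dmain` exception."""
    # Assumption: `_Dmain` is special-cased by exact match.
    if name == "_Dmain":
        return True
    if not name.startswith("_D"):
        return False
    rest = name[2:]
    return bool(rest) and rest[0] in "0123456789"


print(looks_like_dlang_with_main("_Dmain"))      # True
print(looks_like_dlang_with_main("_Dfunction"))  # False
print(looks_like_dlang_with_main("_DDD"))        # False
```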
@gulfemsavrun
Contributor

> @gulfemsavrun apologies, fix is here: #94046

Thanks, this fixed the issue.

@DavidSpickett
Collaborator

The test is still failing on Windows on Arm, I will figure it out.

@kastiglione
Contributor Author

@DavidSpickett thanks, is there a link to a log? I'm curious.

@DavidSpickett
Collaborator

Fix is #94196.

There isn't anything to log really, the function just didn't have a symbol on Windows.

kastiglione added a commit to swiftlang/llvm-project that referenced this pull request Jun 3, 2024
Reduce false positive identification of C names as Dlang mangled names. This happens
when a C function uses the prefix `_D`.

The [Dlang ABI](https://dlang.org/spec/abi.html#name_mangling) shows that mangled names
have a length immediately following the `_D` prefix. This change checks for a digit
after the `_D` prefix, when identifying the mangling scheme of a symbol. This doesn't
prevent false positives entirely, but does make it less likely.

(cherry picked from commit 5a02a9a)
kastiglione added a commit to swiftlang/llvm-project that referenced this pull request Jun 3, 2024
Follow up to llvm#93881. Updates missed tests and handles `_Dmain`.

(cherry picked from commit 68fdc1c)
@labath
Collaborator

labath commented Jun 4, 2024

> Fix is #94196.
>
> There isn't anything to log really, the function just didn't have a symbol on Windows.

AIUI, symtabs just aren't a thing on Windows. You either have debug info, or you have the exported symbols (aka .dynsym). There's no in-between state.
