Skip to content

[ObjC]: Make type encoding safe in symbol names #77797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 12, 2024

Conversation

qmfrederik
Copy link
Contributor

Type encodings are part of symbol names in the Objective C ABI. Replace characters which are reseved in symbol names:

  • ELF: avoid including '@' characters in type encodings
  • Windows: avoid including '=' characters in type encodings

@qmfrederik
Copy link
Contributor Author

/cc @davidchisnall

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-clang

Author: Frederik Carlier (qmfrederik)

Changes

Type encodings are part of symbol names in the Objective C ABI. Replace characters which are reseved in symbol names:

  • ELF: avoid including '@' characters in type encodings
  • Windows: avoid including '=' characters in type encodings

Full diff: https://github.com/llvm/llvm-project/pull/77797.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGObjCGNU.cpp (+18-13)
  • (modified) clang/test/CodeGenObjC/encode-test-6.m (+36-8)
diff --git a/clang/lib/CodeGen/CGObjCGNU.cpp b/clang/lib/CodeGen/CGObjCGNU.cpp
index cd1a0b6a130ff0..6e76765cea3db1 100644
--- a/clang/lib/CodeGen/CGObjCGNU.cpp
+++ b/clang/lib/CodeGen/CGObjCGNU.cpp
@@ -1431,12 +1431,25 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                 const std::string &TypeEncoding) override {
     return GetConstantSelector(Sel, TypeEncoding);
   }
+  std::string GetSymbolNameForTypeEncoding(const std::string &TypeEncoding) {
+    std::string MangledTypes = std::string(TypeEncoding);
+    // @ is used as a special character in ELF symbol names (used for symbol
+    // versioning), so mangle the name to not include it.  Replace it with a
+    // character that is not a valid type encoding character (and, being
+    // non-printable, never will be!)
+    if (CGM.getTriple().isOSBinFormatELF())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '@', '\1');
+    // = in dll exported names causes lld to fail when linking on Windows.
+    if (CGM.getTriple().isOSWindows())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '=', '\2');
+    return MangledTypes;
+  }
   llvm::Constant  *GetTypeString(llvm::StringRef TypeEncoding) {
     if (TypeEncoding.empty())
       return NULLPtr;
-    std::string MangledTypes = std::string(TypeEncoding);
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(std::string(TypeEncoding));
     std::string TypesVarName = ".objc_sel_types_" + MangledTypes;
     auto *TypesGlobal = TheModule.getGlobalVariable(TypesVarName);
     if (!TypesGlobal) {
@@ -1453,13 +1466,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
   }
   llvm::Constant *GetConstantSelector(Selector Sel,
                                       const std::string &TypeEncoding) override {
-    // @ is used as a special character in symbol names (used for symbol
-    // versioning), so mangle the name to not include it.  Replace it with a
-    // character that is not a valid type encoding character (and, being
-    // non-printable, never will be!)
-    std::string MangledTypes = TypeEncoding;
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(TypeEncoding);
     auto SelVarName = (StringRef(".objc_selector_") + Sel.getAsString() + "_" +
       MangledTypes).str();
     if (auto *GV = TheModule.getNamedGlobal(SelVarName))
@@ -1671,9 +1678,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                         const ObjCIvarDecl *Ivar) override {
     std::string TypeEncoding;
     CGM.getContext().getObjCEncodingForType(Ivar->getType(), TypeEncoding);
-    // Prevent the @ from being interpreted as a symbol version.
-    std::replace(TypeEncoding.begin(), TypeEncoding.end(),
-      '@', '\1');
+    TypeEncoding = GetSymbolNameForTypeEncoding(TypeEncoding);
     const std::string Name = "__objc_ivar_offset_" + ID->getNameAsString()
       + '.' + Ivar->getNameAsString() + '.' + TypeEncoding;
     return Name;
diff --git a/clang/test/CodeGenObjC/encode-test-6.m b/clang/test/CodeGenObjC/encode-test-6.m
index 261eb7fb3368b2..1eb6808c7d12f8 100644
--- a/clang/test/CodeGenObjC/encode-test-6.m
+++ b/clang/test/CodeGenObjC/encode-test-6.m
@@ -1,5 +1,14 @@
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -emit-llvm -o %t %s
-// RUN: FileCheck < %t %s
+// RUN: FileCheck -check-prefix CHECK-DWARF < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-w64-windows-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MINGW < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MSVC < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-ELF < %t %s
 
 typedef struct {} Z;
 
@@ -13,8 +22,17 @@ -(void)bar:(Z)a {}
 -(void)foo:(Z)a: (char*)b : (Z)c : (double) d {}
 @end
 
-// CHECK: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
-// CHECK: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+// CHECK-DWARF: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
+// CHECK-DWARF: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+
+// CHECK-MINGW: @".objc_sel_types_v16@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-MINGW: @".objc_sel_types_v32@0:8{?\02}16*16{?\02}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
+
+// CHECK-MSVC: @".objc_sel_types_v20@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v20@0:8{?=}16\00"
+// CHECK-MSVC: @".objc_sel_types_v40@0:8{?\02}16*20{?\02}28d32" = linkonce_odr hidden constant [26 x i8] c"v40@0:8{?=}16*20{?=}28d32\00"
+
+// CHECK-ELF: @".objc_sel_types_v16\010:8{?=}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-ELF: @".objc_sel_types_v32\010:8{?=}16*16{?=}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
 
 @interface NSObject @end
 
@@ -31,7 +49,10 @@ @implementation BABugExample
 @synthesize property = _property;
 @end
 
-// CHECK: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-DWARF: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-MINGW: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-MSVC: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-ELF @".objc_sel_types_\0116\010:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
 
 @class SCNCamera;
 typedef SCNCamera C3DCamera;
@@ -48,7 +69,10 @@ @implementation SCNCamera
     C3DCameraStorage _storage;
 }
 @end
-// CHECK: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-DWARF: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-MINGW: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-MSVC: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-ELF: @"__objc_ivar_offset_SCNCamera._storage.{?=\01}"
 
 int i;
 typeof(@encode(typeof(i))) e = @encode(typeof(i));
@@ -56,6 +80,10 @@ @implementation SCNCamera
 {
     return e;
 }
-// CHECK: @e ={{.*}} global [2 x i8] c"i\00", align 1
-// CHECK: define{{.*}} ptr @Test()
-// CHECK: ret ptr @e
+// CHECK-DWARF: @e ={{.*}} global [2 x i8] c"i\00", align 1
+// CHECK-DWARF: define{{.*}} ptr @Test()
+// CHECK-DWARF: ret ptr @e
+
+// CHECK-MSVC: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-MINGW: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-ELF: @e = global [2 x i8] c"i\00", align 1

@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-clang-codegen

Author: Frederik Carlier (qmfrederik)

Changes

Type encodings are part of symbol names in the Objective C ABI. Replace characters which are reseved in symbol names:

  • ELF: avoid including '@' characters in type encodings
  • Windows: avoid including '=' characters in type encodings

Full diff: https://github.com/llvm/llvm-project/pull/77797.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGObjCGNU.cpp (+18-13)
  • (modified) clang/test/CodeGenObjC/encode-test-6.m (+36-8)
diff --git a/clang/lib/CodeGen/CGObjCGNU.cpp b/clang/lib/CodeGen/CGObjCGNU.cpp
index cd1a0b6a130ff0..6e76765cea3db1 100644
--- a/clang/lib/CodeGen/CGObjCGNU.cpp
+++ b/clang/lib/CodeGen/CGObjCGNU.cpp
@@ -1431,12 +1431,25 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                 const std::string &TypeEncoding) override {
     return GetConstantSelector(Sel, TypeEncoding);
   }
+  std::string GetSymbolNameForTypeEncoding(const std::string &TypeEncoding) {
+    std::string MangledTypes = std::string(TypeEncoding);
+    // @ is used as a special character in ELF symbol names (used for symbol
+    // versioning), so mangle the name to not include it.  Replace it with a
+    // character that is not a valid type encoding character (and, being
+    // non-printable, never will be!)
+    if (CGM.getTriple().isOSBinFormatELF())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '@', '\1');
+    // = in dll exported names causes lld to fail when linking on Windows.
+    if (CGM.getTriple().isOSWindows())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '=', '\2');
+    return MangledTypes;
+  }
   llvm::Constant  *GetTypeString(llvm::StringRef TypeEncoding) {
     if (TypeEncoding.empty())
       return NULLPtr;
-    std::string MangledTypes = std::string(TypeEncoding);
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(std::string(TypeEncoding));
     std::string TypesVarName = ".objc_sel_types_" + MangledTypes;
     auto *TypesGlobal = TheModule.getGlobalVariable(TypesVarName);
     if (!TypesGlobal) {
@@ -1453,13 +1466,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
   }
   llvm::Constant *GetConstantSelector(Selector Sel,
                                       const std::string &TypeEncoding) override {
-    // @ is used as a special character in symbol names (used for symbol
-    // versioning), so mangle the name to not include it.  Replace it with a
-    // character that is not a valid type encoding character (and, being
-    // non-printable, never will be!)
-    std::string MangledTypes = TypeEncoding;
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(TypeEncoding);
     auto SelVarName = (StringRef(".objc_selector_") + Sel.getAsString() + "_" +
       MangledTypes).str();
     if (auto *GV = TheModule.getNamedGlobal(SelVarName))
@@ -1671,9 +1678,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                         const ObjCIvarDecl *Ivar) override {
     std::string TypeEncoding;
     CGM.getContext().getObjCEncodingForType(Ivar->getType(), TypeEncoding);
-    // Prevent the @ from being interpreted as a symbol version.
-    std::replace(TypeEncoding.begin(), TypeEncoding.end(),
-      '@', '\1');
+    TypeEncoding = GetSymbolNameForTypeEncoding(TypeEncoding);
     const std::string Name = "__objc_ivar_offset_" + ID->getNameAsString()
       + '.' + Ivar->getNameAsString() + '.' + TypeEncoding;
     return Name;
diff --git a/clang/test/CodeGenObjC/encode-test-6.m b/clang/test/CodeGenObjC/encode-test-6.m
index 261eb7fb3368b2..1eb6808c7d12f8 100644
--- a/clang/test/CodeGenObjC/encode-test-6.m
+++ b/clang/test/CodeGenObjC/encode-test-6.m
@@ -1,5 +1,14 @@
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -emit-llvm -o %t %s
-// RUN: FileCheck < %t %s
+// RUN: FileCheck -check-prefix CHECK-DWARF < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-w64-windows-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MINGW < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MSVC < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-ELF < %t %s
 
 typedef struct {} Z;
 
@@ -13,8 +22,17 @@ -(void)bar:(Z)a {}
 -(void)foo:(Z)a: (char*)b : (Z)c : (double) d {}
 @end
 
-// CHECK: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
-// CHECK: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+// CHECK-DWARF: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
+// CHECK-DWARF: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+
+// CHECK-MINGW: @".objc_sel_types_v16@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-MINGW: @".objc_sel_types_v32@0:8{?\02}16*16{?\02}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
+
+// CHECK-MSVC: @".objc_sel_types_v20@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v20@0:8{?=}16\00"
+// CHECK-MSVC: @".objc_sel_types_v40@0:8{?\02}16*20{?\02}28d32" = linkonce_odr hidden constant [26 x i8] c"v40@0:8{?=}16*20{?=}28d32\00"
+
+// CHECK-ELF: @".objc_sel_types_v16\010:8{?=}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-ELF: @".objc_sel_types_v32\010:8{?=}16*16{?=}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
 
 @interface NSObject @end
 
@@ -31,7 +49,10 @@ @implementation BABugExample
 @synthesize property = _property;
 @end
 
-// CHECK: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-DWARF: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-MINGW: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-MSVC: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-ELF @".objc_sel_types_\0116\010:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
 
 @class SCNCamera;
 typedef SCNCamera C3DCamera;
@@ -48,7 +69,10 @@ @implementation SCNCamera
     C3DCameraStorage _storage;
 }
 @end
-// CHECK: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-DWARF: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-MINGW: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-MSVC: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-ELF: @"__objc_ivar_offset_SCNCamera._storage.{?=\01}"
 
 int i;
 typeof(@encode(typeof(i))) e = @encode(typeof(i));
@@ -56,6 +80,10 @@ @implementation SCNCamera
 {
     return e;
 }
-// CHECK: @e ={{.*}} global [2 x i8] c"i\00", align 1
-// CHECK: define{{.*}} ptr @Test()
-// CHECK: ret ptr @e
+// CHECK-DWARF: @e ={{.*}} global [2 x i8] c"i\00", align 1
+// CHECK-DWARF: define{{.*}} ptr @Test()
+// CHECK-DWARF: ret ptr @e
+
+// CHECK-MSVC: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-MINGW: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-ELF: @e = global [2 x i8] c"i\00", align 1

Copy link

github-actions bot commented Jan 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@qmfrederik qmfrederik force-pushed the mingw-type-encoding-structure branch from 5511454 to e71dd48 Compare January 11, 2024 17:07
Comment on lines 2 to 11
// RUN: FileCheck -check-prefix CHECK-DWARF < %t %s

// RUN: %clang_cc1 -triple x86_64-w64-windows-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
// RUN: FileCheck -check-prefix CHECK-MINGW < %t %s

// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
// RUN: FileCheck -check-prefix CHECK-MSVC < %t %s

// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
// RUN: FileCheck -check-prefix CHECK-ELF < %t %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to use a temporary file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, no need for a temp file here. Fixed that.

Type encodings are part of symbol names in the Objective C ABI.  Replace
characters which are reseved in symbol names:

- ELF: avoid including '@' characters in type encodings
- Windows: avoid including '=' characters in type encodings
@qmfrederik qmfrederik force-pushed the mingw-type-encoding-structure branch from e71dd48 to 34933c7 Compare January 11, 2024 20:02
@davidchisnall davidchisnall merged commit 3168192 into llvm:main Jan 12, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Type encodings are part of symbol names in the Objective C ABI. Replace
characters which are reseved in symbol names:

- ELF: avoid including '@' characters in type encodings
- Windows: avoid including '=' characters in type encodings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants