[clang][bytecode] Handle __builtin_wcslen #119187

Merged (1 commit), Dec 9, 2024

Conversation

tbaederr
Contributor

@tbaederr tbaederr commented Dec 9, 2024

Handle different character widths in interp__builtin_strlen, so the same function can also evaluate __builtin_wcslen and wcslen.

@llvmbot added the clang and clang:frontend labels on Dec 9, 2024
@llvmbot
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-clang

Author: Timm Baeder (tbaederr)

Changes

Handle different character widths in interp__builtin_strlen, so the same function can also evaluate __builtin_wcslen and wcslen.


Full diff: https://github.com/llvm/llvm-project/pull/119187.diff

2 Files Affected:

  • (modified) clang/lib/AST/ByteCode/InterpBuiltin.cpp (+18-3)
  • (modified) clang/test/AST/ByteCode/builtin-functions.cpp (+12)
diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index 24b630d0455e14..df5b4fcb3f0fbb 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -243,7 +243,7 @@ static bool interp__builtin_strlen(InterpState &S, CodePtr OpPC,
   unsigned ID = Func->getBuiltinID();
   const Pointer &StrPtr = getParam<Pointer>(Frame, 0);
 
-  if (ID == Builtin::BIstrlen)
+  if (ID == Builtin::BIstrlen || ID == Builtin::BIwcslen)
     diagnoseNonConstexprBuiltin(S, OpPC, ID);
 
   if (!CheckArray(S, OpPC, StrPtr))
@@ -256,7 +256,7 @@ static bool interp__builtin_strlen(InterpState &S, CodePtr OpPC,
     return false;
 
   assert(StrPtr.getFieldDesc()->isPrimitiveArray());
-
+  unsigned ElemSize = StrPtr.getFieldDesc()->getElemSize();
   size_t Len = 0;
   for (size_t I = StrPtr.getIndex();; ++I, ++Len) {
     const Pointer &ElemPtr = StrPtr.atIndex(I);
@@ -264,7 +264,20 @@ static bool interp__builtin_strlen(InterpState &S, CodePtr OpPC,
     if (!CheckRange(S, OpPC, ElemPtr, AK_Read))
       return false;
 
-    uint8_t Val = ElemPtr.deref<uint8_t>();
+    uint32_t Val;
+    switch (ElemSize) {
+    case 1:
+      Val = ElemPtr.deref<uint8_t>();
+      break;
+    case 2:
+      Val = ElemPtr.deref<uint16_t>();
+      break;
+    case 4:
+      Val = ElemPtr.deref<uint32_t>();
+      break;
+    default:
+      llvm_unreachable("Unsupported char size");
+    }
     if (Val == 0)
       break;
   }
@@ -1859,6 +1872,8 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const Function *F,
     break;
   case Builtin::BI__builtin_strlen:
   case Builtin::BIstrlen:
+  case Builtin::BI__builtin_wcslen:
+  case Builtin::BIwcslen:
     if (!interp__builtin_strlen(S, OpPC, Frame, F, Call))
       return false;
     break;
diff --git a/clang/test/AST/ByteCode/builtin-functions.cpp b/clang/test/AST/ByteCode/builtin-functions.cpp
index e2121a54e15768..4c21496d3972c9 100644
--- a/clang/test/AST/ByteCode/builtin-functions.cpp
+++ b/clang/test/AST/ByteCode/builtin-functions.cpp
@@ -15,6 +15,10 @@
 #error "huh?"
 #endif
 
+extern "C" {
+  typedef decltype(sizeof(int)) size_t;
+  extern size_t wcslen(const wchar_t *p);
+}
 
 namespace strcmp {
   constexpr char kFoobar[6] = {'f','o','o','b','a','r'};
@@ -93,6 +97,14 @@ constexpr const char *a = "foo\0quux";
   constexpr char d[] = { 'f', 'o', 'o' }; // no nul terminator.
   constexpr int bad = __builtin_strlen(d); // both-error {{constant expression}} \
                                            // both-note {{one-past-the-end}}
+
+  constexpr int wn = __builtin_wcslen(L"hello");
+  static_assert(wn == 5);
+  constexpr int wm = wcslen(L"hello"); // both-error {{constant expression}} \
+                                       // both-note {{non-constexpr function 'wcslen' cannot be used in a constant expression}}
+
+  int arr[3]; // both-note {{here}}
+  int wk = arr[wcslen(L"hello")]; // both-warning {{array index 5}}
 }
 
 namespace nan {
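
The width dispatch in the diff above can be tried outside of Clang with a small standalone sketch. This is not the interpreter code: `widenElem` and `boundedLen` are hypothetical names, a plain C++ array stands in for the interpreter's `Pointer`, and running past the array bound plays the role of a failed `CheckRange`. It only illustrates why each element is widened to `uint32_t` before the comparison with zero, and why an array without a nul terminator cannot produce a length.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of Clang): widen one element to 32 bits,
// picking the unsigned type by element size, as the switch in the patch does.
template <typename CharT>
constexpr std::uint32_t widenElem(CharT C) {
  static_assert(sizeof(CharT) == 1 || sizeof(CharT) == 2 || sizeof(CharT) == 4,
                "Unsupported char size");
  switch (sizeof(CharT)) {
  case 1:
    return static_cast<std::uint8_t>(C);
  case 2:
    return static_cast<std::uint16_t>(C);
  default:
    return static_cast<std::uint32_t>(C);
  }
}

// Hypothetical stand-in for the interpreter loop: scan from Start until a
// zero element is found. Reaching the array bound first yields std::nullopt,
// which models the "not a constant expression" outcome.
template <typename CharT, std::size_t N>
constexpr std::optional<std::size_t> boundedLen(const CharT (&Arr)[N],
                                                std::size_t Start = 0) {
  std::size_t Len = 0;
  for (std::size_t I = Start;; ++I, ++Len) {
    if (I >= N)
      return std::nullopt; // read would be one-past-the-end
    if (widenElem(Arr[I]) == 0)
      break;
  }
  return Len;
}

constexpr char D[] = {'f', 'o', 'o'}; // no nul terminator
static_assert(boundedLen("foo\0quux") == 3);
static_assert(boundedLen(L"hello") == 5);
static_assert(boundedLen(u"hello", 2) == 3);
static_assert(!boundedLen(D).has_value());
```

The `static_assert` lines mirror the cases in the test file: a narrow string with an embedded nul, a wide string of length 5, and an unterminated array that fails.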

Contributor

@cor3ntin cor3ntin left a comment

LGTM modulo comment

   size_t Len = 0;
   for (size_t I = StrPtr.getIndex();; ++I, ++Len) {
     const Pointer &ElemPtr = StrPtr.atIndex(I);
 
     if (!CheckRange(S, OpPC, ElemPtr, AK_Read))
       return false;
 
-    uint8_t Val = ElemPtr.deref<uint8_t>();
+    uint32_t Val;
Contributor

Should we assert that, when ID == Builtin::BIwcslen, ElemSize is the size of wchar_t?

Contributor Author

Done
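
The diff shown on this page does not include the assertion that was added in response, so the following is only a standalone sketch of the kind of check the review asks for. `BuiltinID` and `elemSizeMatchesBuiltin` are hypothetical stand-ins, not Clang's API. Covering `__builtin_wcslen` as well as `wcslen` is an assumption here, as is passing the target's `wchar_t` size as a parameter rather than using the host's `sizeof(wchar_t)`.

```cpp
// Hypothetical stand-ins for Clang's builtin IDs; for illustration only.
enum class BuiltinID { Strlen, BuiltinStrlen, Wcslen, BuiltinWcslen };

// True for the wide-character variants (assumed to include both spellings).
constexpr bool isWideLen(BuiltinID ID) {
  return ID == BuiltinID::Wcslen || ID == BuiltinID::BuiltinWcslen;
}

// The condition one might assert before the loop: for the wcslen variants,
// the array's element size must equal the target's wchar_t size. The narrow
// variants are left unconstrained in this sketch.
constexpr bool elemSizeMatchesBuiltin(BuiltinID ID, unsigned ElemSize,
                                      unsigned TargetWCharSize) {
  return !isWideLen(ID) || ElemSize == TargetWCharSize;
}

static_assert(elemSizeMatchesBuiltin(BuiltinID::BuiltinWcslen, 4, 4));
static_assert(elemSizeMatchesBuiltin(BuiltinID::Wcslen, 2, 2));
static_assert(!elemSizeMatchesBuiltin(BuiltinID::Wcslen, 1, 4));
static_assert(elemSizeMatchesBuiltin(BuiltinID::Strlen, 1, 4));
```

In the interpreter such a condition would sit inside an `assert`, so it costs nothing in release builds and only documents the expected invariant.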

Handle different char widths in builtin_strlen.
@tbaederr tbaederr merged commit 00b50c9 into llvm:main Dec 9, 2024
8 checks passed