Skip to content

[libc][wchar] implement wcslen #124150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 23, 2025
Merged

Conversation

nickdesaulniers
Copy link
Member

@nickdesaulniers nickdesaulniers commented Jan 23, 2025

Update string_utils' string_length to work with char* or wchar*, so that it may be reusable when implementing wmemchr, wcspbrk,
wcsrchr, wcsstr.

Link: #121183
Link: #124027

Co-authored-by: Nick Desaulniers [email protected]

Add internal helper, which may be reusable when implementing wmemchr, wcspbrk,
wcsrchr, wcsstr.

Link: llvm#121183
Link: llvm#124027

Co-authored-by: Nick Desaulniers <[email protected]>
@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-libc

Author: Nick Desaulniers (nickdesaulniers)

Changes

Add internal helper, which may be reusable when implementing wmemchr, wcspbrk,
wcsrchr, wcsstr.

Link: #121183
Link: #124027

Co-authored-by: Nick Desaulniers <[email protected]>


Full diff: https://github.com/llvm/llvm-project/pull/124150.diff

12 Files Affected:

  • (modified) libc/config/gpu/amdgpu/entrypoints.txt (+1)
  • (modified) libc/config/gpu/nvptx/entrypoints.txt (+1)
  • (modified) libc/config/linux/aarch64/entrypoints.txt (+1)
  • (modified) libc/config/linux/riscv/entrypoints.txt (+1)
  • (modified) libc/config/linux/x86_64/entrypoints.txt (+2-1)
  • (modified) libc/include/wchar.yaml (+6)
  • (modified) libc/src/wchar/CMakeLists.txt (+17)
  • (added) libc/src/wchar/wcslen.cpp (+23)
  • (added) libc/src/wchar/wcslen.h (+22)
  • (added) libc/src/wchar/wide_string_utils.h (+29)
  • (modified) libc/test/src/wchar/CMakeLists.txt (+12)
  • (added) libc/test/src/wchar/wcslen_test.cpp (+26)
diff --git a/libc/config/gpu/amdgpu/entrypoints.txt b/libc/config/gpu/amdgpu/entrypoints.txt
index 7a1982808dfeb7..756b2cdc7496ec 100644
--- a/libc/config/gpu/amdgpu/entrypoints.txt
+++ b/libc/config/gpu/amdgpu/entrypoints.txt
@@ -261,6 +261,7 @@ set(TARGET_LIBC_ENTRYPOINTS
     libc.src.time.nanosleep
 
     # wchar.h entrypoints
+    libc.src.wchar.wcslen
     libc.src.wchar.wctob
 
     # locale.h entrypoints
diff --git a/libc/config/gpu/nvptx/entrypoints.txt b/libc/config/gpu/nvptx/entrypoints.txt
index 059dc9b20d6dd2..6b25dae158cc94 100644
--- a/libc/config/gpu/nvptx/entrypoints.txt
+++ b/libc/config/gpu/nvptx/entrypoints.txt
@@ -261,6 +261,7 @@ set(TARGET_LIBC_ENTRYPOINTS
     libc.src.time.nanosleep
 
     # wchar.h entrypoints
+    libc.src.wchar.wcslen
     libc.src.wchar.wctob
 
     # locale.h entrypoints
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index f5ba3414117682..8bf47fa952cd96 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -350,6 +350,7 @@ set(TARGET_LIBC_ENTRYPOINTS
     libc.src.unistd.write
 
     # wchar.h entrypoints
+    libc.src.wchar.wcslen
     libc.src.wchar.wctob
 
     # sys/uio.h entrypoints
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index 49a8d61b938027..f9ab28c2598d5f 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -346,6 +346,7 @@ set(TARGET_LIBC_ENTRYPOINTS
     libc.src.unistd.write
 
     # wchar.h entrypoints
+    libc.src.wchar.wcslen
     libc.src.wchar.wctob
 )
 
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 0c1ae9561a7e69..3db9a911c59fe3 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -349,8 +349,9 @@ set(TARGET_LIBC_ENTRYPOINTS
     libc.src.unistd.write
 
     # wchar.h entrypoints
-    libc.src.wchar.wctob
     libc.src.wchar.btowc
+    libc.src.wchar.wcslen
+    libc.src.wchar.wctob
 
     # sys/uio.h entrypoints
     libc.src.sys.uio.writev
diff --git a/libc/include/wchar.yaml b/libc/include/wchar.yaml
index 27a5926b574554..5bbf8064c713cd 100644
--- a/libc/include/wchar.yaml
+++ b/libc/include/wchar.yaml
@@ -9,6 +9,12 @@ types:
 enums: []
 objects: []
 functions:
+  - name: wcslen
+    standards:
+      - stdc
+    return_type: size_t
+    arguments:
+      - type: const wchar_t *
   - name: wctob
     standards:
       - stdc
diff --git a/libc/src/wchar/CMakeLists.txt b/libc/src/wchar/CMakeLists.txt
index d4c98ea527a8f9..930c20ca673983 100644
--- a/libc/src/wchar/CMakeLists.txt
+++ b/libc/src/wchar/CMakeLists.txt
@@ -1,3 +1,20 @@
+add_header_library(
+  wide_string_utils
+  HDRS
+    wide_string_utils.h
+)
+
+add_entrypoint_object(
+  wcslen
+  SRCS
+    wcslen.cpp
+  HDRS
+    wcslen.h
+  DEPENDS
+    .wide_string_utils
+    libc.hdr.types.size_t
+    libc.hdr.types.wchar_t
+)
 
 add_entrypoint_object(
   wctob
diff --git a/libc/src/wchar/wcslen.cpp b/libc/src/wchar/wcslen.cpp
new file mode 100644
index 00000000000000..7d0d8cdf872e8d
--- /dev/null
+++ b/libc/src/wchar/wcslen.cpp
@@ -0,0 +1,23 @@
+//===-- Implementation of wcslen ------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/wchar/wcslen.h"
+
+#include "hdr/types/size_t.h"
+#include "hdr/types/wchar_t.h"
+#include "src/__support/common.h"
+#include "src/__support/macros/config.h"
+#include "src/wchar/wide_string_utils.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+LLVM_LIBC_FUNCTION(size_t, wcslen, (const wchar_t *src)) {
+  return internal::wide_string_length(src);
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/wchar/wcslen.h b/libc/src/wchar/wcslen.h
new file mode 100644
index 00000000000000..7c022533e9b499
--- /dev/null
+++ b/libc/src/wchar/wcslen.h
@@ -0,0 +1,22 @@
+//===-- Implementation header for wcslen ----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_WCHAR_WCSLEN_H
+#define LLVM_LIBC_SRC_WCHAR_WCSLEN_H
+
+#include "hdr/types/size_t.h"
+#include "hdr/types/wchar_t.h"
+#include "src/__support/macros/config.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+size_t wcslen (const wchar_t *src);
+
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_WCHAR_WCSLEN_H
diff --git a/libc/src/wchar/wide_string_utils.h b/libc/src/wchar/wide_string_utils.h
new file mode 100644
index 00000000000000..dba01d28856948
--- /dev/null
+++ b/libc/src/wchar/wide_string_utils.h
@@ -0,0 +1,29 @@
+//===-- Wide String utils -------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_WCHAR_WIDE_STRING_UTILS_H
+#define LLVM_LIBC_SRC_WCHAR_WIDE_STRING_UTILS_H
+
+#include "src/__support/macros/config.h"
+#include "hdr/types/size_t.h"
+#include "hdr/types/wchar_t.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace internal {
+
+LIBC_INLINE size_t wide_string_length(const wchar_t *src) {
+  const wchar_t *cpy = src;
+  while (*cpy)
+    ++cpy;
+  return cpy - src;
+}
+
+} // namespace internal
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif //  LLVM_LIBC_SRC_WCHAR_WIDE_STRING_UTILS_H
diff --git a/libc/test/src/wchar/CMakeLists.txt b/libc/test/src/wchar/CMakeLists.txt
index 3cc404b9c86fc5..d41e328fc9d90c 100644
--- a/libc/test/src/wchar/CMakeLists.txt
+++ b/libc/test/src/wchar/CMakeLists.txt
@@ -1,5 +1,17 @@
 add_custom_target(libc_wchar_unittests)
 
+add_libc_test(
+  wcslen_test
+  SUITE
+    libc_wchar_unittests
+  SRCS
+    wcslen_test.cpp
+  DEPENDS
+    libc.hdr.types.size_t
+    libc.hdr.types.wchar_t
+    libc.src.wchar.wcslen
+)
+
 add_libc_test(
   btowc_test
   SUITE
diff --git a/libc/test/src/wchar/wcslen_test.cpp b/libc/test/src/wchar/wcslen_test.cpp
new file mode 100644
index 00000000000000..fe975cea592f67
--- /dev/null
+++ b/libc/test/src/wchar/wcslen_test.cpp
@@ -0,0 +1,26 @@
+//===-- Unittests for wcslen ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/wchar/wcslen.h"
+#include "hdr/types/wchar_t.h"
+#include "hdr/types/size_t.h"
+#include "test/UnitTest/Test.h"
+
+TEST(LlvmLibcWCSLenTest, EmptyString) {
+  const wchar_t *empty = L"";
+
+  size_t result = LIBC_NAMESPACE::wcslen(empty);
+  ASSERT_EQ(size_t{0}, result);
+}
+
+TEST(LlvmLibcWCSLenTest, AnyString) {
+  const wchar_t *any = L"Hello World!";
+
+  size_t result = LIBC_NAMESPACE::wcslen(any);
+  ASSERT_EQ(size_t{12}, result);
+}

Copy link

github-actions bot commented Jan 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@michaelrj-google michaelrj-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, feel free to land as is if you decide to keep this design.

Comment on lines +94 to +97
size_t length;
for (length = 0; *src; ++src, ++length)
;
return length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to keep string_length_byte_read separate and template that instead of templating the whole string_length function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason why we wouldn't want internal::string_length to work on wide char strings and normal strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, reading back through the comments I see you already tried this. In the case would it make sense to have a generic template for string_length that just calls a templated string_length_byte_read, then a specialization for char with the wide read variant?

// Returns the length of a string, denoted by the first occurrence
// of a null terminator.
LIBC_INLINE size_t string_length(const char *src) {
template <typename T> LIBC_INLINE size_t string_length(const T *src) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I was imagining:

template <typename T> LIBC_INLINE size_t string_length(const T *src) {
  size_t length;
  for (length = 0; *src; ++src, ++length)
    ;
  return length;
}

template <char> LIBC_INLINE size_t string_length(const char *src){
#ifdef LIBC_COPT_STRING_UNSAFE_WIDE_READ
  // Unsigned int is the default size for most processors, and on x86-64 it
  // performs better than larger sizes when the src pointer can't be assumed to
  // be aligned to a word boundary, so it's the size we use for reading the
  // string a block at a time.
  return string_length_wide_read<unsigned int>(src);
#else
  size_t length;
  for (length = 0; *src; ++src, ++length)
    ;
  return length;
#endif
}

The reason is to avoid issues around passing incorrect types to string_length_wide_read, I think if constexpr is evaluated late enough in the process you might get warnings about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication between the two versions (for loop) isn't as DRY as I prefer; I don't think the above approach can be simplified to have the char specialized version of string_length call the non-specialized version.


The reason is to avoid issues around passing incorrect types to string_length_wide_read

The parameter of string_length_wide_read isn't the template variable; the if constexpr statement as written compiles without warning. https://godbolt.org/z/ajYhWbddd


Now that I am pulling in cpp::type_traits, I could add some SFINAE to string_length so that it can only be called with char* and wchar_t*, if you'd like?

I don't feel strongly about any of the above, but please let me know what you prefer. I'll wait to land this until then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it written out, it's probably overcomplicating things to add the template specialization just to avoid the if constexpr. If it compiles without warnings, it's probably fine.

I don't think we really need to add the SFINAE to limit the template. In practice I expect it'll be used for char, wchar_t, and maybe unsigned char, but it will behave as expected for any other integer type. Adding the check seems like unnecessary complexity for little practical benefit.

Copy link
Member

@RossComputerGuy RossComputerGuy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, once @michaelrj-google's is satisfied then I think this would be good to go.

Copy link
Contributor

@michaelrj-google michaelrj-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to land as-is.

// Returns the length of a string, denoted by the first occurrence
// of a null terminator.
LIBC_INLINE size_t string_length(const char *src) {
template <typename T> LIBC_INLINE size_t string_length(const T *src) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it written out, it's probably overcomplicating things to add the template specialization just to avoid the if constexpr. If it compiles without warnings, it's probably fine.

I don't think we really need to add the SFINAE to limit the template. In practice I expect it'll be used for char, wchar_t, and maybe unsigned char, but it will behave as expected for any other integer type. Adding the check seems like unnecessary complexity for little practical benefit.

@nickdesaulniers nickdesaulniers merged commit 631a6e0 into llvm:main Jan 23, 2025
11 checks passed
@nickdesaulniers nickdesaulniers deleted the wcslen branch January 23, 2025 21:33
@nickdesaulniers
Copy link
Member Author

ah sorry @RossComputerGuy , 631a6e0 winded up with me as the author and github as the committer, because I used the gh command line util to merge. My intent was for you to be the author of record and myself to be the committer of record. My apologies. If you feel strongly; I can revert+reland with this corrected, though it is churn.

basioli-k added a commit to basioli-k/llvm-project that referenced this pull request Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants