-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libc][wchar] implement wcslen #124150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc][wchar] implement wcslen #124150
Conversation
Add internal helper, which may be reusable when implementing wmemchr, wcspbrk, wcsrchr, wcsstr. Link: llvm#121183 Link: llvm#124027 Co-authored-by: Nick Desaulniers <[email protected]>
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-libc Author: Nick Desaulniers (nickdesaulniers) ChangesAdd internal helper, which may be reusable when implementing wmemchr, wcspbrk, Link: #121183 Co-authored-by: Nick Desaulniers <[email protected]> Full diff: https://github.com/llvm/llvm-project/pull/124150.diff 12 Files Affected:
diff --git a/libc/config/gpu/amdgpu/entrypoints.txt b/libc/config/gpu/amdgpu/entrypoints.txt
index 7a1982808dfeb7..756b2cdc7496ec 100644
--- a/libc/config/gpu/amdgpu/entrypoints.txt
+++ b/libc/config/gpu/amdgpu/entrypoints.txt
@@ -261,6 +261,7 @@ set(TARGET_LIBC_ENTRYPOINTS
libc.src.time.nanosleep
# wchar.h entrypoints
+ libc.src.wchar.wcslen
libc.src.wchar.wctob
# locale.h entrypoints
diff --git a/libc/config/gpu/nvptx/entrypoints.txt b/libc/config/gpu/nvptx/entrypoints.txt
index 059dc9b20d6dd2..6b25dae158cc94 100644
--- a/libc/config/gpu/nvptx/entrypoints.txt
+++ b/libc/config/gpu/nvptx/entrypoints.txt
@@ -261,6 +261,7 @@ set(TARGET_LIBC_ENTRYPOINTS
libc.src.time.nanosleep
# wchar.h entrypoints
+ libc.src.wchar.wcslen
libc.src.wchar.wctob
# locale.h entrypoints
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index f5ba3414117682..8bf47fa952cd96 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -350,6 +350,7 @@ set(TARGET_LIBC_ENTRYPOINTS
libc.src.unistd.write
# wchar.h entrypoints
+ libc.src.wchar.wcslen
libc.src.wchar.wctob
# sys/uio.h entrypoints
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index 49a8d61b938027..f9ab28c2598d5f 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -346,6 +346,7 @@ set(TARGET_LIBC_ENTRYPOINTS
libc.src.unistd.write
# wchar.h entrypoints
+ libc.src.wchar.wcslen
libc.src.wchar.wctob
)
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 0c1ae9561a7e69..3db9a911c59fe3 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -349,8 +349,9 @@ set(TARGET_LIBC_ENTRYPOINTS
libc.src.unistd.write
# wchar.h entrypoints
- libc.src.wchar.wctob
libc.src.wchar.btowc
+ libc.src.wchar.wcslen
+ libc.src.wchar.wctob
# sys/uio.h entrypoints
libc.src.sys.uio.writev
diff --git a/libc/include/wchar.yaml b/libc/include/wchar.yaml
index 27a5926b574554..5bbf8064c713cd 100644
--- a/libc/include/wchar.yaml
+++ b/libc/include/wchar.yaml
@@ -9,6 +9,12 @@ types:
enums: []
objects: []
functions:
+ - name: wcslen
+ standards:
+ - stdc
+ return_type: size_t
+ arguments:
+ - type: const wchar_t *
- name: wctob
standards:
- stdc
diff --git a/libc/src/wchar/CMakeLists.txt b/libc/src/wchar/CMakeLists.txt
index d4c98ea527a8f9..930c20ca673983 100644
--- a/libc/src/wchar/CMakeLists.txt
+++ b/libc/src/wchar/CMakeLists.txt
@@ -1,3 +1,20 @@
+add_header_library(
+ wide_string_utils
+ HDRS
+ wide_string_utils.h
+)
+
+add_entrypoint_object(
+ wcslen
+ SRCS
+ wcslen.cpp
+ HDRS
+ wcslen.h
+ DEPENDS
+ .wide_string_utils
+ libc.hdr.types.size_t
+ libc.hdr.types.wchar_t
+)
add_entrypoint_object(
wctob
diff --git a/libc/src/wchar/wcslen.cpp b/libc/src/wchar/wcslen.cpp
new file mode 100644
index 00000000000000..7d0d8cdf872e8d
--- /dev/null
+++ b/libc/src/wchar/wcslen.cpp
@@ -0,0 +1,23 @@
+//===-- Implementation of wcslen ------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/wchar/wcslen.h"
+
+#include "hdr/types/size_t.h"
+#include "hdr/types/wchar_t.h"
+#include "src/__support/common.h"
+#include "src/__support/macros/config.h"
+#include "src/wchar/wide_string_utils.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+LLVM_LIBC_FUNCTION(size_t, wcslen, (const wchar_t *src)) {
+ return internal::wide_string_length(src);
+}
+
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/wchar/wcslen.h b/libc/src/wchar/wcslen.h
new file mode 100644
index 00000000000000..7c022533e9b499
--- /dev/null
+++ b/libc/src/wchar/wcslen.h
@@ -0,0 +1,22 @@
+//===-- Implementation header for wcslen ----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_WCHAR_WCSLEN_H
+#define LLVM_LIBC_SRC_WCHAR_WCSLEN_H
+
+#include "hdr/types/size_t.h"
+#include "hdr/types/wchar_t.h"
+#include "src/__support/macros/config.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+size_t wcslen (const wchar_t *src);
+
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_WCHAR_WCSLEN_H
diff --git a/libc/src/wchar/wide_string_utils.h b/libc/src/wchar/wide_string_utils.h
new file mode 100644
index 00000000000000..dba01d28856948
--- /dev/null
+++ b/libc/src/wchar/wide_string_utils.h
@@ -0,0 +1,29 @@
+//===-- Wide String utils -------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_WCHAR_WIDE_STRING_UTILS_H
+#define LLVM_LIBC_SRC_WCHAR_WIDE_STRING_UTILS_H
+
+#include "src/__support/macros/config.h"
+#include "hdr/types/size_t.h"
+#include "hdr/types/wchar_t.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace internal {
+
+LIBC_INLINE size_t wide_string_length(const wchar_t *src) {
+ const wchar_t *cpy = src;
+ while (*cpy)
+ ++cpy;
+ return cpy - src;
+}
+
+} // namespace internal
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_WCHAR_WIDE_STRING_UTILS_H
diff --git a/libc/test/src/wchar/CMakeLists.txt b/libc/test/src/wchar/CMakeLists.txt
index 3cc404b9c86fc5..d41e328fc9d90c 100644
--- a/libc/test/src/wchar/CMakeLists.txt
+++ b/libc/test/src/wchar/CMakeLists.txt
@@ -1,5 +1,17 @@
add_custom_target(libc_wchar_unittests)
+add_libc_test(
+ wcslen_test
+ SUITE
+ libc_wchar_unittests
+ SRCS
+ wcslen_test.cpp
+ DEPENDS
+ libc.hdr.types.size_t
+ libc.hdr.types.wchar_t
+ libc.src.wchar.wcslen
+)
+
add_libc_test(
btowc_test
SUITE
diff --git a/libc/test/src/wchar/wcslen_test.cpp b/libc/test/src/wchar/wcslen_test.cpp
new file mode 100644
index 00000000000000..fe975cea592f67
--- /dev/null
+++ b/libc/test/src/wchar/wcslen_test.cpp
@@ -0,0 +1,26 @@
+//===-- Unittests for wcslen ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/wchar/wcslen.h"
+#include "hdr/types/wchar_t.h"
+#include "hdr/types/size_t.h"
+#include "test/UnitTest/Test.h"
+
+TEST(LlvmLibcWCSLenTest, EmptyString) {
+ const wchar_t *empty = L"";
+
+ size_t result = LIBC_NAMESPACE::wcslen(empty);
+ ASSERT_EQ(size_t{0}, result);
+}
+
+TEST(LlvmLibcWCSLenTest, AnyString) {
+ const wchar_t *any = L"Hello World!";
+
+ size_t result = LIBC_NAMESPACE::wcslen(any);
+ ASSERT_EQ(size_t{12}, result);
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, feel free to land as is if you decide to keep this design.
size_t length; | ||
for (length = 0; *src; ++src, ++length) | ||
; | ||
return length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to keep string_length_byte_read
separate and template that instead of templating the whole string_length
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason why we wouldn't want internal::string_length
to work on wide char strings and normal strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, reading back through the comments I see you already tried this. In the case would it make sense to have a generic template for string_length
that just calls a templated string_length_byte_read
, then a specialization for char
with the wide read variant?
// Returns the length of a string, denoted by the first occurrence | ||
// of a null terminator. | ||
LIBC_INLINE size_t string_length(const char *src) { | ||
template <typename T> LIBC_INLINE size_t string_length(const T *src) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what I was imagining:
template <typename T> LIBC_INLINE size_t string_length(const T *src) {
size_t length;
for (length = 0; *src; ++src, ++length)
;
return length;
}
template <char> LIBC_INLINE size_t string_length(const char *src){
#ifdef LIBC_COPT_STRING_UNSAFE_WIDE_READ
// Unsigned int is the default size for most processors, and on x86-64 it
// performs better than larger sizes when the src pointer can't be assumed to
// be aligned to a word boundary, so it's the size we use for reading the
// string a block at a time.
return string_length_wide_read<unsigned int>(src);
#else
size_t length;
for (length = 0; *src; ++src, ++length)
;
return length;
#endif
}
The reason is to avoid issues around passing incorrect types to string_length_wide_read
, I think if constexpr
is evaluated late enough in the process you might get warnings about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplication between the two versions (for loop) isn't as DRY as I prefer; I don't think the above approach can be simplified to have the char
specialized version of string_length
call the non-specialized version.
The reason is to avoid issues around passing incorrect types to string_length_wide_read
The parameter of string_length_wide_read
isn't the template variable; the if constexpr
statement as written compiles without warning. https://godbolt.org/z/ajYhWbddd
Now that I am pulling in cpp::type_traits, I could add some SFINAE to string_length
so that it can only be called with char*
and wchar_t*
, if you'd like?
I don't feel strongly about any of the above, but please let me know what you prefer. I'll wait to land this until then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it written out, it's probably overcomplicating things to add the template specialization just to avoid the if constexpr
. If it compiles without warnings, it's probably fine.
I don't think we really need to add the SFINAE to limit the template. In practice I expect it'll be used for char
, wchar_t
, and maybe unsigned char
, but it will behave as expected for any other integer type. Adding the check seems like unnecessary complexity for little practical benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me, once @michaelrj-google's is satisfied then I think this would be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to land as-is.
// Returns the length of a string, denoted by the first occurrence | ||
// of a null terminator. | ||
LIBC_INLINE size_t string_length(const char *src) { | ||
template <typename T> LIBC_INLINE size_t string_length(const T *src) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it written out, it's probably overcomplicating things to add the template specialization just to avoid the if constexpr
. If it compiles without warnings, it's probably fine.
I don't think we really need to add the SFINAE to limit the template. In practice I expect it'll be used for char
, wchar_t
, and maybe unsigned char
, but it will behave as expected for any other integer type. Adding the check seems like unnecessary complexity for little practical benefit.
ah sorry @RossComputerGuy , 631a6e0 winded up with me as the author and github as the committer, because I used the |
Update string_utils' string_length to work with char* or wchar*, so that it may be reusable when implementing wmemchr, wcspbrk,
wcsrchr, wcsstr.
Link: #121183
Link: #124027
Co-authored-by: Nick Desaulniers [email protected]