Add basic char*_t support for libc (partial WG14 N2653) #90360

Merged: 3 commits, Apr 30, 2024

@Febbe (Contributor) commented Apr 27, 2024

This PR implements part of WG14 N2653:

  • Define C23 char8_t
  • Define C11 char16_t
  • Define C11 char32_t

Not yet implemented (remaining parts of N2653):
- The type of UTF-8 character literals changes from unsigned char to char8_t. (Since UTF-8 character literals already have type unsigned char and char8_t is a typedef for unsigned char, this is not a semantic change.)
- New mbrtoc8() and c8rtomb() functions, declared in <uchar.h>, for converting between multibyte characters and UTF-8.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro.
- A new atomic_char8_t typedef name.
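To illustrate what the C11 types provide once <uchar.h> defines them, here is a minimal sketch. It assumes a hosted C11 (or later) toolchain whose <uchar.h> already defines char16_t and char32_t and whose u"" and U"" literals are UTF-16 and UTF-32 encoded (as with GCC and Clang). The helper names utf16_units and utf32_units are hypothetical. It checks the C11 requirement that char16_t and char32_t are the same types as uint_least16_t and uint_least32_t (which is how the new headers define them), and shows that a code point outside the Basic Multilingual Plane needs two char16_t units but only one char32_t unit. char8_t is left out because it needs C23 library support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* C11 requires char16_t/char32_t to be the *same* types as
   uint_least16_t/uint_least32_t; _Generic matches exact types only. */
static_assert(_Generic((char16_t)0, uint_least16_t: 1, default: 0),
              "char16_t must be uint_least16_t");
static_assert(_Generic((char32_t)0, uint_least32_t: 1, default: 0),
              "char32_t must be uint_least32_t");

/* Hypothetical helper: number of UTF-16 code units before the terminator. */
size_t utf16_units(const char16_t *s) {
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}

/* Hypothetical helper: number of UTF-32 code units before the terminator. */
size_t utf32_units(const char32_t *s) {
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

For example, u"\U0001F600" is stored as a surrogate pair (two char16_t units), while U"\U0001F600" is a single char32_t unit.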

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot added the libc label Apr 27, 2024

@llvmbot (Member) commented Apr 27, 2024

@llvm/pr-subscribers-libc

Author: Fabian Keßler (Febbe)

Changes
  • Define C23 char8_t
  • Define C11 char16_t
  • Define C11 char32_t

Full diff: https://github.com/llvm/llvm-project/pull/90360.diff

9 Files Affected:

  • (modified) libc/config/baremetal/api.td (+6-1)
  • (modified) libc/config/linux/x86_64/headers.txt (+1)
  • (modified) libc/include/CMakeLists.txt (+3)
  • (modified) libc/include/llvm-libc-types/CMakeLists.txt (+3)
  • (added) libc/include/llvm-libc-types/char16_t.h (+17)
  • (added) libc/include/llvm-libc-types/char32_t.h (+17)
  • (added) libc/include/llvm-libc-types/char8_t.h (+16)
  • (modified) libc/spec/spec.td (+3)
  • (modified) libc/spec/stdc.td (+3)
diff --git a/libc/config/baremetal/api.td b/libc/config/baremetal/api.td
index 25aa06aacb642e..a6547d843c85ee 100644
--- a/libc/config/baremetal/api.td
+++ b/libc/config/baremetal/api.td
@@ -85,5 +85,10 @@ def TimeAPI : PublicAPI<"time.h"> {
 }
 
 def UCharAPI : PublicAPI<"uchar.h"> {
-  let Types = ["mbstate_t"];
+  let Types = [
+    "mbstate_t",
+    "char8_t",
+    "char16_t",
+    "char32_t",
+  ];
 }
diff --git a/libc/config/linux/x86_64/headers.txt b/libc/config/linux/x86_64/headers.txt
index e51c7931942706..44d640b75e2bf7 100644
--- a/libc/config/linux/x86_64/headers.txt
+++ b/libc/config/linux/x86_64/headers.txt
@@ -29,6 +29,7 @@ set(TARGET_PUBLIC_HEADERS
     libc.include.time
     libc.include.unistd
     libc.include.wchar
+    libc.include.uchar
 
     libc.include.arpa_inet
 
diff --git a/libc/include/CMakeLists.txt b/libc/include/CMakeLists.txt
index aeef46aabfce5c..6dea8e539969d0 100644
--- a/libc/include/CMakeLists.txt
+++ b/libc/include/CMakeLists.txt
@@ -603,6 +603,9 @@ add_gen_header(
   DEPENDS
     .llvm_libc_common_h
     .llvm-libc-types.mbstate_t
+    .llvm-libc-types.char8_t
+    .llvm-libc-types.char16_t
+    .llvm-libc-types.char32_t
 )
 
 add_gen_header(
diff --git a/libc/include/llvm-libc-types/CMakeLists.txt b/libc/include/llvm-libc-types/CMakeLists.txt
index 310374fb62ffe0..c8999f3d25f4cd 100644
--- a/libc/include/llvm-libc-types/CMakeLists.txt
+++ b/libc/include/llvm-libc-types/CMakeLists.txt
@@ -90,6 +90,9 @@ add_header(tcflag_t HDR tcflag_t.h)
 add_header(struct_termios HDR struct_termios.h DEPENDS .cc_t .speed_t .tcflag_t)
 add_header(__getoptargv_t HDR __getoptargv_t.h)
 add_header(wchar_t HDR wchar_t.h)
+add_header(char8_t HDR char8_t.h)
+add_header(char16_t HDR char16_t.h)
+add_header(char32_t HDR char32_t.h)
 add_header(wint_t HDR wint_t.h)
 add_header(sa_family_t HDR sa_family_t.h)
 add_header(socklen_t HDR socklen_t.h)
diff --git a/libc/include/llvm-libc-types/char16_t.h b/libc/include/llvm-libc-types/char16_t.h
new file mode 100644
index 00000000000000..59810d0f6e5d85
--- /dev/null
+++ b/libc/include/llvm-libc-types/char16_t.h
@@ -0,0 +1,17 @@
+//===-- Definition of char16_t type ---------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_TYPES_CHAR16_T_H
+#define LLVM_LIBC_TYPES_CHAR16_T_H
+
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+#include <stdint.h>
+typedef uint_least16_t char16_t;
+#endif
+
+#endif // LLVM_LIBC_TYPES_CHAR16_T_H
\ No newline at end of file
diff --git a/libc/include/llvm-libc-types/char32_t.h b/libc/include/llvm-libc-types/char32_t.h
new file mode 100644
index 00000000000000..5cbd21e78a808a
--- /dev/null
+++ b/libc/include/llvm-libc-types/char32_t.h
@@ -0,0 +1,17 @@
+//===-- Definition of char32_t type ---------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_TYPES_CHAR32_T_H
+#define LLVM_LIBC_TYPES_CHAR32_T_H
+
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+#include <stdint.h>
+typedef uint_least32_t char32_t;
+#endif
+
+#endif // LLVM_LIBC_TYPES_CHAR32_T_H
\ No newline at end of file
diff --git a/libc/include/llvm-libc-types/char8_t.h b/libc/include/llvm-libc-types/char8_t.h
new file mode 100644
index 00000000000000..12972161c7e466
--- /dev/null
+++ b/libc/include/llvm-libc-types/char8_t.h
@@ -0,0 +1,16 @@
+//===-- Definition of char8_t type ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_TYPES_CHAR8_T_H
+#define LLVM_LIBC_TYPES_CHAR8_T_H
+
+#if !defined(__cplusplus) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
+typedef unsigned char char8_t;
+#endif
+
+#endif // LLVM_LIBC_TYPES_CHAR8_T_H
\ No newline at end of file
diff --git a/libc/spec/spec.td b/libc/spec/spec.td
index 87bf4435e16724..ea8fa4cd373cf3 100644
--- a/libc/spec/spec.td
+++ b/libc/spec/spec.td
@@ -65,6 +65,9 @@ def SizeTType : NamedType<"size_t">;
 def SizeTPtr : PtrType<SizeTType>;
 def RestrictedSizeTPtr : RestrictedPtrType<SizeTType>;
 
+def Char8TType : NamedType<"char8_t">;
+def Char16TType : NamedType<"char16_t">;
+def Char32TType : NamedType<"char32_t">;
 def WCharType : NamedType<"wchar_t">;
 def WIntType : NamedType<"wint_t">;
 
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index 01aa7c70b3b9df..88758dec643fd4 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -1396,6 +1396,9 @@ def StdC : StandardSpec<"stdc"> {
       [], // Macros
       [ //Types
         MBStateTType,
+        Char8TType,
+        Char16TType,
+        Char32TType,
       ],
       [], // Enumerations
       []
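The three new type headers gate their typedefs on the language mode: char16_t.h and char32_t.h require __STDC_VERSION__ >= 201112L (C11), while char8_t.h requires __STDC_VERSION__ >= 202311L (C23) and is skipped whenever __cplusplus is defined (C++20 already provides char8_t as a built-in type). The sketch below restates those preprocessor conditions as ordinary functions so they can be checked for a given mode. The function names are hypothetical, and a stdc_version of 0 stands for "__STDC_VERSION__ is not defined".

```c
#include <stdbool.h>

/* Hypothetical model of the #if conditions in the new llvm-libc-types
   headers. stdc_version == 0 models an undefined __STDC_VERSION__. */

/* char8_t.h: !defined(__cplusplus) && __STDC_VERSION__ >= 202311L */
bool defines_char8_t(long stdc_version, bool is_cplusplus) {
    return !is_cplusplus && stdc_version >= 202311L;
}

/* char16_t.h: defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L */
bool defines_char16_t(long stdc_version) {
    return stdc_version >= 201112L;
}

/* char32_t.h: same condition as char16_t.h */
bool defines_char32_t(long stdc_version) {
    return stdc_version >= 201112L;
}
```

The char16_t and char32_t headers do not test __cplusplus; C++ compilers normally leave __STDC_VERSION__ undefined, which corresponds to passing 0 here.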

github-actions bot commented Apr 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Febbe force-pushed the Add-C23-char-_t-support branch from 68236e5 to c73c973 on April 28, 2024 09:29
 - Define C23 char8_t
 - Define C11 char16_t
 - Define C11 char32_t

Preparation for functions like `mbrtoc8` and `c8rtomb`, which were
introduced in C23.
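C23 declares `mbrtoc8` and `c8rtomb` in <uchar.h> with the same shape as the existing C11 pairs `mbrtoc16`/`c16rtomb` and `mbrtoc32`/`c32rtomb`: each call converts one character and carries shift state in an `mbstate_t`. Because `mbrtoc8` is not available yet, the sketch below uses the C11 `mbrtoc32`/`c32rtomb` pair as a stand-in to show the calling pattern. `decode_to_c32` and `encode_from_c32` are hypothetical helpers, and the example assumes the program runs in the default "C" locale with plain ASCII input.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decodes the NUL-terminated multibyte string src
   (current locale) into at most cap char32_t values.
   Returns the number stored, or (size_t)-1 on invalid/incomplete input. */
size_t decode_to_c32(const char *src, char32_t *out, size_t cap) {
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* initial conversion state */
    size_t left = strlen(src);          /* the terminating NUL is not decoded */
    size_t count = 0;
    while (left > 0 && count < cap) {
        size_t r = mbrtoc32(&out[count], src, left, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* encoding error or truncated sequence */
        if (r == 0)
            break;                      /* null character: defensive only */
        if (r != (size_t)-3) {          /* -3: value stored from earlier input */
            src += r;
            left -= r;
        }
        ++count;
    }
    return count;
}

/* Hypothetical helper: encodes n char32_t values into dst (capacity cap,
   NUL-terminated). Returns the byte count, or (size_t)-1 on failure. */
size_t encode_from_c32(const char32_t *in, size_t n, char *dst, size_t cap) {
    mbstate_t st;
    memset(&st, 0, sizeof st);
    char buf[MB_LEN_MAX];               /* c32rtomb writes at most MB_CUR_MAX bytes */
    size_t used = 0;
    if (cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; ++i) {
        size_t r = c32rtomb(buf, in[i], &st);
        if (r == (size_t)-1 || used + r >= cap)
            return (size_t)-1;          /* unencodable, or no room left for NUL */
        memcpy(dst + used, buf, r);
        used += r;
    }
    dst[used] = '\0';
    return used;
}
```

A C23 `mbrtoc8`/`c8rtomb` loop would follow the same structure, with `char8_t` in place of `char32_t`.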
Febbe force-pushed the Add-C23-char-_t-support branch from c73c973 to 1a3366e on April 28, 2024 09:45
Febbe changed the title from "Add basic char*_t support for libc" to "Add basic char*_t support for libc (partial WG14 N2653)" on Apr 28, 2024
@Febbe (Contributor, Author) commented Apr 28, 2024

@tahonermann I'd like to request your review.

Febbe force-pushed the Add-C23-char-_t-support branch from 288221b to 40e9779 on April 29, 2024 21:18
@Febbe (Contributor, Author) commented Apr 29, 2024

Thank you for the feedback. I applied all of your suggested changes and also added uchar (and wchar) to the list of included headers on linux-aarch64, linux-arm, and linux-riscv.

@michaelrj-google (Contributor) left a comment

Overall looks good, with some minor style nits.

 - Fixed headers
 - Included `stdint-macros.h` instead of `stdint.h`
 - Updated dependencies of `char16_t` and `char32_t`
 - Added uchar support for linux-riscv
 - Added uchar & wchar support for linux-arm & linux-aarch64
 - Added UCharAPI type to linux/api.td
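Each of the three type headers needs its own include-guard macro. The following sketch, using hypothetical macro names, shows what happens when two header bodies share one guard: the second body is silently skipped, so any typedef it contains never appears. The "headers" are inlined into a single file so the effect can be observed at run time.

```c
#include <stdbool.h>

/* Body of a first header, guarded by SHARED_GUARD_H. */
#ifndef SHARED_GUARD_H
#define SHARED_GUARD_H
#define FIRST_HEADER_SEEN 1
#endif

/* Body of a second header that mistakenly reuses SHARED_GUARD_H:
   the guard is already defined, so this block is skipped. */
#ifndef SHARED_GUARD_H
#define SHARED_GUARD_H
#define SECOND_HEADER_SEEN 1
#endif

/* A third header with its own distinct guard is processed normally. */
#ifndef DISTINCT_GUARD_H
#define DISTINCT_GUARD_H
#define THIRD_HEADER_SEEN 1
#endif

bool first_header_seen(void) {
#ifdef FIRST_HEADER_SEEN
    return true;
#else
    return false;
#endif
}

bool second_header_seen(void) {
#ifdef SECOND_HEADER_SEEN
    return true;
#else
    return false;
#endif
}

bool third_header_seen(void) {
#ifdef THIRD_HEADER_SEEN
    return true;
#else
    return false;
#endif
}
```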
Febbe force-pushed the Add-C23-char-_t-support branch from 40e9779 to 97ad4a7 on April 30, 2024 08:33
@Febbe (Contributor, Author) commented Apr 30, 2024

OK, everything should be fixed now. I can squash the commits if desired, but I don't have merge rights, so someone else will have to merge this.

@michaelrj-google (Contributor) commented

GitHub squashes automatically; I can merge this for you.

michaelrj-google merged commit cd7a7a5 into llvm:main Apr 30, 2024

@Febbe Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
