Skip to content

[libc] Make hdrgen support macro_header YAML field. #123265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 14, 2025

Conversation

frobtech
Copy link
Contributor

A macro can specify macro_header instead of macro_value to
indicate that an llvm-libc-macros/ header file is supposed to
define this macro. This is used for dlfcn.h, which previously
bogusly redefined the RTLD_* macros to empty.

A macro can specify macro_header instead of macro_value to
indicate that an llvm-libc-macros/ header file is supposed to
define this macro.  This is used for dlfcn.h, which previously
bogusly redefined the RTLD_* macros to empty.
Copy link

github-actions bot commented Jan 17, 2025

✅ With the latest revision this PR passed the Python code formatter.

@frobtech frobtech marked this pull request as ready for review January 17, 2025 01:39
@llvmbot llvmbot added the libc label Jan 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-libc

Author: Roland McGrath (frobtech)

Changes

A macro can specify macro_header instead of macro_value to
indicate that an llvm-libc-macros/ header file is supposed to
define this macro. This is used for dlfcn.h, which previously
bogusly redefined the RTLD_* macros to empty.


Full diff: https://github.com/llvm/llvm-project/pull/123265.diff

9 Files Affected:

  • (modified) libc/include/dlfcn.yaml (+4-4)
  • (modified) libc/utils/hdrgen/gpu_headers.py (+1)
  • (modified) libc/utils/hdrgen/header.py (+18-6)
  • (modified) libc/utils/hdrgen/macro.py (+7-3)
  • (modified) libc/utils/hdrgen/tests/expected_output/test_header.h (+13-1)
  • (modified) libc/utils/hdrgen/tests/input/test_small.h.def (-1)
  • (modified) libc/utils/hdrgen/tests/input/test_small.yaml (+5)
  • (modified) libc/utils/hdrgen/tests/test_integration.py (+1)
  • (modified) libc/utils/hdrgen/yaml_to_classes.py (+7-1)
diff --git a/libc/include/dlfcn.yaml b/libc/include/dlfcn.yaml
index 9e8803cb5fa785..78bbeff4e60d98 100644
--- a/libc/include/dlfcn.yaml
+++ b/libc/include/dlfcn.yaml
@@ -2,13 +2,13 @@ header: dlfcn.h
 header_template: dlfcn.h.def
 macros:
   - macro_name: RTLD_LAZY
-    macro_value: null
+    macro_header: dlfcn-macros.h
   - macro_name: RTLD_NOW
-    macro_value: null
+    macro_header: dlfcn-macros.h
   - macro_name: RTLD_GLOBAL
-    macro_value: null
+    macro_header: dlfcn-macros.h
   - macro_name: RTLD_LOCAL
-    macro_value: null
+    macro_header: dlfcn-macros.h
 types: []
 enums: []
 objects: []
diff --git a/libc/utils/hdrgen/gpu_headers.py b/libc/utils/hdrgen/gpu_headers.py
index 8c4ff6e08b1126..505adfa8fee8a4 100644
--- a/libc/utils/hdrgen/gpu_headers.py
+++ b/libc/utils/hdrgen/gpu_headers.py
@@ -8,6 +8,7 @@
 
 from header import HeaderFile
 
+
 class GpuHeaderFile(HeaderFile):
     def __str__(self):
         content = []
diff --git a/libc/utils/hdrgen/header.py b/libc/utils/hdrgen/header.py
index 9339acceaf7a97..7e14e768de2bbe 100644
--- a/libc/utils/hdrgen/header.py
+++ b/libc/utils/hdrgen/header.py
@@ -6,6 +6,8 @@
 #
 # ==-------------------------------------------------------------------------==#
 
+from pathlib import PurePath
+
 
 class HeaderFile:
     def __init__(self, name):
@@ -32,8 +34,20 @@ def add_object(self, object):
     def add_function(self, function):
         self.functions.append(function)
 
+    def includes(self):
+        return sorted(
+            {
+                PurePath("llvm-libc-macros") / macro.header
+                for macro in self.macros
+                if macro.header is not None
+            }
+        )
+
     def public_api(self):
-        content = [""]
+        header_dir = PurePath(self.name).parent
+        content = [
+            f'#include "{file.relative_to(header_dir)}"' for file in self.includes()
+        ] + [""]
 
         for macro in self.macros:
             content.append(f"{macro}\n")
@@ -76,11 +90,9 @@ def public_api(self):
             content.append(f"#endif // {current_guard}")
             content.append("")
 
-        for object in self.objects:
-            content.append(str(object))
+        content.extend(str(object) for object in self.objects)
         if self.objects:
-            content.append("\n__END_C_DECLS")
-        else:
-            content.append("__END_C_DECLS")
+            content.append("")
+        content.append("__END_C_DECLS")
 
         return "\n".join(content)
diff --git a/libc/utils/hdrgen/macro.py b/libc/utils/hdrgen/macro.py
index 9a712f2a1c743b..e1eb678f7a832e 100644
--- a/libc/utils/hdrgen/macro.py
+++ b/libc/utils/hdrgen/macro.py
@@ -8,12 +8,16 @@
 
 
 class Macro:
-    def __init__(self, name, value=None):
+    def __init__(self, name, value=None, header=None):
         self.name = name
         self.value = value
+        self.header = header
 
     def __str__(self):
+        if self.header != None:
+            return f"""#ifndef {self.name}
+#error "{self.name} should be defined by llvm-libc-macros/{self.header}"
+#endif"""
         if self.value != None:
             return f"#define {self.name} {self.value}"
-        else:
-            return f"#define {self.name}"
+        return f"#define {self.name}"
diff --git a/libc/utils/hdrgen/tests/expected_output/test_header.h b/libc/utils/hdrgen/tests/expected_output/test_header.h
index a777976134b045..97870df58d0e87 100644
--- a/libc/utils/hdrgen/tests/expected_output/test_header.h
+++ b/libc/utils/hdrgen/tests/expected_output/test_header.h
@@ -11,13 +11,25 @@
 
 #include "__llvm-libc-common.h"
 #include "llvm-libc-macros/float16-macros.h"
-#include "llvm-libc-macros/test_small-macros.h"
 #include "llvm-libc-types/float128.h"
 
+#include "llvm-libc-macros/test_more-macros.h"
+#include "llvm-libc-macros/test_small-macros.h"
+
 #define MACRO_A 1
 
 #define MACRO_B 2
 
+#define MACRO_C
+
+#ifndef MACRO_D
+#error "MACRO_D should be defined by llvm-libc-macros/test_small-macros.h"
+#endif
+
+#ifndef MACRO_E
+#error "MACRO_E should be defined by llvm-libc-macros/test_more-macros.h"
+#endif
+
 #include <llvm-libc-types/type_a.h>
 #include <llvm-libc-types/type_b.h>
 
diff --git a/libc/utils/hdrgen/tests/input/test_small.h.def b/libc/utils/hdrgen/tests/input/test_small.h.def
index 075be957b81368..587b163b68d964 100644
--- a/libc/utils/hdrgen/tests/input/test_small.h.def
+++ b/libc/utils/hdrgen/tests/input/test_small.h.def
@@ -11,7 +11,6 @@
 
 #include "__llvm-libc-common.h"
 #include "llvm-libc-macros/float16-macros.h"
-#include "llvm-libc-macros/test_small-macros.h"
 #include "llvm-libc-types/float128.h"
 
 %%public_api()
diff --git a/libc/utils/hdrgen/tests/input/test_small.yaml b/libc/utils/hdrgen/tests/input/test_small.yaml
index 1d4b2990a30027..d5bb2bbfe4468c 100644
--- a/libc/utils/hdrgen/tests/input/test_small.yaml
+++ b/libc/utils/hdrgen/tests/input/test_small.yaml
@@ -5,6 +5,11 @@ macros:
     macro_value: 1
   - macro_name: MACRO_B
     macro_value: 2
+  - macro_name: MACRO_C
+  - macro_name: MACRO_D
+    macro_header: test_small-macros.h
+  - macro_name: MACRO_E
+    macro_header: test_more-macros.h
 types:
   - type_name: type_a
   - type_name: type_b
diff --git a/libc/utils/hdrgen/tests/test_integration.py b/libc/utils/hdrgen/tests/test_integration.py
index 49cb08cd1b339d..110a218fbde8f3 100644
--- a/libc/utils/hdrgen/tests/test_integration.py
+++ b/libc/utils/hdrgen/tests/test_integration.py
@@ -10,6 +10,7 @@ def setUp(self):
         self.output_dir = TestHeaderGenIntegration.output_dir
         self.source_dir = Path(__file__).parent
         self.main_script = self.source_dir.parent / "main.py"
+        self.maxDiff = 80*100
 
     def run_script(self, yaml_file, output_file, entry_points):
         command = [
diff --git a/libc/utils/hdrgen/yaml_to_classes.py b/libc/utils/hdrgen/yaml_to_classes.py
index d64feafc260b7c..4a3f48633dc1dc 100644
--- a/libc/utils/hdrgen/yaml_to_classes.py
+++ b/libc/utils/hdrgen/yaml_to_classes.py
@@ -38,7 +38,13 @@ def yaml_to_classes(yaml_data, header_class, entry_points=None):
     header.template_file = yaml_data.get("header_template")
 
     for macro_data in yaml_data.get("macros", []):
-        header.add_macro(Macro(macro_data["macro_name"], macro_data["macro_value"]))
+        header.add_macro(
+            Macro(
+                macro_data["macro_name"],
+                macro_data.get("macro_value"),
+                macro_data.get("macro_header"),
+            )
+        )
 
     types = yaml_data.get("types", [])
     sorted_types = sorted(types, key=lambda x: x["type_name"])

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Jan 17, 2025

This is used for dlfcn.h, which previously bogusly redefined the RTLD_* macros to empty.

/nods yeah that's silly.

Looking into what macro_value is...perhaps we can just remove that entirely? It's only used in stdio, threads, dlfcn and I don't see a good reason why.

The current convention is that if one of these .h files defines macros, then we have (stdio example):

libc/include/stdio.h.def #include "include/llvm-libc-macros/stdio-macros.h"

so I don't see the point of macro_value at all; the current uses of macro_value should be replaced with #defines in libc/include/llvm-libc-macros/{header file name}-macros.h.

macro_value also breaks the convention I'm using in docgen to detect if we define a macro or not (for a macro defined in stdio.h, I grep libc/llvm-libc-macros/stdio-macros.h for it).

So that's 2 reasons to remove macro_value from hdrgen.

If we remove macro_value, and clean up places where we're not sticking to convention (which breaks macro definition detection for docgen), then I don't think we need macro_header either.

@nickdesaulniers
Copy link
Member

Ah, so this is more of "add checks to generated headers that if a given macro isn't defined as expected, we will #error out.

Coincidentally, I just reviewed a patch to bionic where I learned that they have dedicated header tests (seemingly only for POSIX IIUC): https://android.googlesource.com/platform/bionic/+/main/tests/headers/posix/

So I guess those are two different means of testing the generated headers. The trade off being that with the approach in this PR, our headers are bigger; we pay that cost of the preprocessor checking, rechecking, rechecking, ... for every compile, regardless of whether or not we just mean to do a sanity check.

Privately, you seemed to indicate that your had more patches that build on top of this PR. Can you describe a little more how you plan to extend this feature?

Otherwise, I think I mildly prefer bionic's approach of just having unit tests. We already have this concept in the tests under libc/test/include/. I don't see why this could just be:

#ifndef RTLD_LAZY
#error "WTF"
#endif
#ifndef ...

which is effectively what the bionic tests are doing. That way, we keep our generated headers smaller (FWIW), and can remove macro_name/macro_value support from hdrgen (#124957). cc @enh-google

@frobtech
Copy link
Contributor Author

Ah, so this is more of "add checks to generated headers that if a given macro isn't defined as expected, we will #error out.

Coincidentally, I just reviewed a patch to bionic where I learned that they have dedicated header tests (seemingly only for POSIX IIUC): https://android.googlesource.com/platform/bionic/+/main/tests/headers/posix/

So I guess those are two different means of testing the generated headers. The trade off being that with the approach in this PR, our headers are bigger; we pay that cost of the preprocessor checking, rechecking, rechecking, ... for every compile, regardless of whether or not we just mean to do a sanity check.

That was really an incidental feature. I've removed it now, because I concur that having the public headers do this self-test logic is not really the best way to do it. We can get the same automation in testing by having a new mode for hdrgen to produce test sources to perform these things. But we can make that a later addition.

Privately, you seemed to indicate that your had more patches that build on top of this PR. Can you describe a little more how you plan to extend this feature?

I do have some additional changes ready, and I'm considering more along these lines. The overall thrust is to get more of the information to have its single source of truth in YAML and thus be machine-usable and machine-checkable; and also to reduce redundant information that has to be hand-maintained in general. To those ends, my next changes are:

  • Deduce some of the types list from the function signatures so that maintainers don't need to keep the types list up to date wrt to the signatures used, only wrt the formal API contract of types a header is specifically meant to define per some standard.
  • Get all #include lines to be generated from YAML in some fashion rather than hand-maintained in .h.def files. One motivation for this is the very easy-to-make errors about #include "llvm-..." vs #include "../llvm-..." and the like, as well as just general typos and unnecessary or omitted includes. The introduction of macro_header is the first step in this direction. I'd like to do more.
  • An include/merge mechanism for YAML files so that we can have a single source of truth (in only one YAML file) for each function signature, even for cases where the (standard or de facto) API contract has certain functions declared in multiple headers (which aren't supposed to include each other); the current motivating example is the malloc suite in stdlib.h and malloc.h. This could also be addressed by generating more separate "internal" headers that each public header would #include for those declarations. But so far it seems easier in various regards (both for us and for users) to just generate the same function declarations in multiple headers.
  • With those features, I think we can get rid of many of the .h.def files entirely and have only the YAML that has to be maintained manually. More features may be required to eliminate more of them, and probably a few oddball cases like assert.h will need to keep custom .h.def files because they don't quite fit any of the usual patterns.

In this vein, I'd like to do more to get more information maintained solely in YAML and machine-usable over time. These aren't on my critical path, which right now is frankly just about producing a sufficient malloc.h in a way that doesn't feel icky to maintain. But I want to drive our long-term direction on header (and other) maintenance in this direction. The documentation is something you've already been interested in folding into this regime. I'd like to do more. For example, getting more thorough about the standards lists not only for the headers but at the individual symbol granularity within headers. As well as annotating generated documentation about portability status of specific APIs, this help us generate standards-conformance test sorts of things (for standard identifier name space constraints and the like). If we choose to one day generate headers using fine-grained feature-test macro based conformance to different standards or standard editions that hide/expose some symbols, then a thoroughly-specified and fine-grained source of truth in YAML could allow us to do that entirely automatically, etc.

Otherwise, I think I mildly prefer bionic's approach of just having unit tests. We already have this concept in the tests under libc/test/include/. I don't see why this could just be:

#ifndef RTLD_LAZY
#error "WTF"
#endif
#ifndef ...

which is effectively what the bionic tests are doing. That way, we keep our generated headers smaller (FWIW), [...]

I agree separate tests are better. Unlike normal unit tests, this kind of test is entirely mechanical and contains no semantic checking. That means that manual maintenance of such tests is especially likely to have cut&paste and omission errors that make it not test as much as intended. So I think it's better to generate these tests from the YAML files as the single source of truth of who is supposed to define what.

[...] and can remove macro_name/macro_value support from hdrgen (#124957). cc @enh-google

I am not in favor of that direction. As I outlined above, I want to push very much in the direction of having the sources of truth be more in YAML, not less--at least for the "what is the API contract?" truths. For macros, so far that's just the name. There are some macros where the value is also standards-required, such as the _POSIX_* constants in <limits.h> under POSIX. Eventually for those I think it will make sense to have the source of truth for the value be part of the "API spec" in YAML, though I'm not worrying about that right now. As I mentioned above for all kinds of symbols, I want the "what standard requires it" bit to go here too. But right now, we have macro_name as the "it is part of the API" indicator, and I don't want to lose that. For many of the standard-specified macros, they specify something about the C type of the expression, and that's something we should probably encode here as well for the benefit of generated conformance tests.

The foo-macros.h pattern is used fairly commonly, but it's not universal. I also don't think it's necessarily the best pattern. If we do eventually go for fine-grained (generated) conditionals on what names will be defined by a header, then for macros whose values are implementation details (and perhaps not the same for all llvm-libc configurations) we may want to go to a per-macro (or per-few-macros) header model as we do for types. I don't know what we'll want to do, but I think the direction here with the macro_header gives us the freedom to explore that in the ways we might want to.

@frobtech frobtech merged commit 6d3bfdd into llvm:main Feb 14, 2025
13 checks passed
@frobtech frobtech deleted the p/libc-hdrgen-macro branch February 14, 2025 00:07
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
A macro can specify macro_header instead of macro_value to
indicate that an llvm-libc-macros/ header file is supposed to
define this macro.  This is used for dlfcn.h, which previously
bogusly redefined the RTLD_* macros to empty.
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
A macro can specify macro_header instead of macro_value to
indicate that an llvm-libc-macros/ header file is supposed to
define this macro.  This is used for dlfcn.h, which previously
bogusly redefined the RTLD_* macros to empty.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants