Skip to content

Revise the modules document for clarity #90237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 8, 2024

Conversation

AaronBallman
Copy link
Collaborator

The intention isn't to add or change the information provided, but to improve clarity through some grammar fixes, improvements to the markdown, and so forth.

The intention isn't to add or change the information provided, but to
improve clarity through some grammar fixes, improvements to the
markdown, and so forth.
@AaronBallman AaronBallman added documentation c++20 clang:modules C++20 modules and Clang Header Modules labels Apr 26, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 26, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Aaron Ballman (AaronBallman)

Changes

The intention isn't to add or change the information provided, but to improve clarity through some grammar fixes, improvements to the markdown, and so forth.


Patch is 84.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90237.diff

1 Files Affected:

  • (modified) clang/docs/StandardCPlusPlusModules.rst (+561-572)
diff --git a/clang/docs/StandardCPlusPlusModules.rst b/clang/docs/StandardCPlusPlusModules.rst
index ee57fb5da64857..e0ceb2582725c2 100644
--- a/clang/docs/StandardCPlusPlusModules.rst
+++ b/clang/docs/StandardCPlusPlusModules.rst
@@ -8,79 +8,60 @@ Standard C++ Modules
 Introduction
 ============
 
-The term ``modules`` has a lot of meanings. For the users of Clang, modules may
-refer to ``Objective-C Modules``, ``Clang C++ Modules`` (or ``Clang Header Modules``,
-etc.) or ``Standard C++ Modules``. The implementation of all these kinds of modules in Clang
-has a lot of shared code, but from the perspective of users, their semantics and
-command line interfaces are very different. This document focuses on
-an introduction of how to use standard C++ modules in Clang.
-
-There is already a detailed document about `Clang modules <Modules.html>`_, it
-should be helpful to read `Clang modules <Modules.html>`_ if you want to know
-more about the general idea of modules. Since standard C++ modules have different semantics
-(and work flows) from `Clang modules`, this page describes the background and use of
-Clang with standard C++ modules.
-
-Modules exist in two forms in the C++ Language Specification. They can refer to
-either "Named Modules" or to "Header Units". This document covers both forms.
+The term ``modules`` has a lot of meanings. For Clang users, modules may refer
+to ``Objective-C Modules``, `Clang Modules <Modules.html>`_ (also called
+``Clang Header Modules``, etc.) or ``C++20 Modules`` (or
+``Standard C++ Modules``). The implementation of all these kinds of modules in
+Clang shares a lot of code, but from the perspective of users, their semantics
+and command line interfaces are very different. This document focuses on an
+introduction of focusing on the use of C++20 modules in Clang. In the remainder
+of this document, the term ``modules`` will refer to Standard C++20 modules and
+the term ``Clang modules`` will refer to the Clang modules extension.
+
+Modules exist in two forms in the C++ Standard. They can refer to either
+"Named Modules" or "Header Units". This document covers both forms.
 
 Standard C++ Named modules
 ==========================
 
-This document was intended to be a manual first and foremost, however, we consider it helpful to
-introduce some language background here for readers who are not familiar with
-the new language feature. This document is not intended to be a language
-tutorial; it will only introduce necessary concepts about the
-structure and building of the project.
+In order to understand compiler behavior, it is helpful to introduce some
+language background here for readers who are not familiar with the C++ feature.
+This document is not a tutorial on C++; it only introduces necessary concepts
+to better understand use of modules for a project.
 
 Background and terminology
 --------------------------
 
-Modules
-~~~~~~~
-
-In this document, the term ``Modules``/``modules`` refers to standard C++ modules
-feature if it is not decorated by ``Clang``.
-
-Clang Modules
-~~~~~~~~~~~~~
-
-In this document, the term ``Clang Modules``/``Clang modules`` refer to Clang
-c++ modules extension. These are also known as ``Clang header modules``,
-``Clang module map modules`` or ``Clang c++ modules``.
-
 Module and module unit
 ~~~~~~~~~~~~~~~~~~~~~~
 
-A module consists of one or more module units. A module unit is a special
-translation unit. Every module unit must have a module declaration. The syntax
-of the module declaration is:
+A module consists of one or more module units. A module unit is a special kind
+of translation unit. Every module unit must have a module declaration. The
+syntax of the module declaration is:
 
 .. code-block:: c++
 
   [export] module module_name[:partition_name];
 
-Terms enclosed in ``[]`` are optional. The syntax of ``module_name`` and ``partition_name``
-in regex form corresponds to ``[a-zA-Z_][a-zA-Z_0-9\.]*``. In particular, a literal dot ``.``
-in the name has no semantic meaning (e.g. implying a hierarchy).
-
-In this document, module units are classified into:
-
-* Primary module interface unit.
-
-* Module implementation unit.
+Terms enclosed in ``[]`` are optional. ``module_name`` and ``partition_name``
+are typical C++ identifiers, except that they may contain a period (``.``).
+Note that a ``.`` in the name has no semantic meaning (e.g. implying a
+hierarchy).
 
-* Module interface partition unit.
+In this document, module units are classified as:
 
-* Internal module partition unit.
+* Primary module interface unit
+* Module implementation unit
+* Module interface partition unit
+* Internal module partition unit
 
 A primary module interface unit is a module unit whose module declaration is
-``export module module_name;``. The ``module_name`` here denotes the name of the
+``export module module_name;`` where ``module_name`` denotes the name of the
 module. A module should have one and only one primary module interface unit.
 
 A module implementation unit is a module unit whose module declaration is
-``module module_name;``. A module could have multiple module implementation
-units with the same declaration.
+``module module_name;``. Multiple module implementation units can be declared
+in the same translation unit.
 
 A module interface partition unit is a module unit whose module declaration is
 ``export module module_name:partition_name;``. The ``partition_name`` should be
@@ -95,22 +76,23 @@ In this document, we use the following umbrella terms:
 * A ``module interface unit`` refers to either a ``primary module interface unit``
   or a ``module interface partition unit``.
 
-* An ``importable module unit`` refers to either a ``module interface unit``
-  or a ``internal module partition unit``.
+* An ``importable module unit`` refers to either a ``module interface unit`` or
+  an ``internal module partition unit``.
 
 * A ``module partition unit`` refers to either a ``module interface partition unit``
-  or a ``internal module partition unit``.
+  or an ``internal module partition unit``.
 
-Built Module Interface file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Binary Module Interface
+~~~~~~~~~~~~~~~~~~~~~~~
 
-A ``Built Module Interface file`` stands for the precompiled result of an importable module unit.
-It is also called the acronym ``BMI`` generally.
+A ``Binary Module Interface`` (or ``BMI``) is the precompiled result of an
+importable module unit.
 
 Global module fragment
 ~~~~~~~~~~~~~~~~~~~~~~
 
-In a module unit, the section from ``module;`` to the module declaration is called the global module fragment.
+The ``global module fragment`` (or ``GMF``) is the code between the ``module;``
+and the module declaration within a module unit.
 
 
 How to build projects using modules
@@ -138,7 +120,7 @@ Let's see a "hello world" example that uses modules.
     return 0;
   }
 
-Then we type:
+From the command line, enter:
 
 .. code-block:: console
 
@@ -148,9 +130,9 @@ Then we type:
   Hello World!
 
 In this example, we make and use a simple module ``Hello`` which contains only a
-primary module interface unit ``Hello.cppm``.
+primary module interface unit named ``Hello.cppm``.
 
-Then let's see a little bit more complex "hello world" example which uses the 4 kinds of module units.
+A more complex "hello world" example which uses the 4 kinds of module units is:
 
 .. code-block:: c++
 
@@ -192,7 +174,7 @@ Then let's see a little bit more complex "hello world" example which uses the 4
     return 0;
   }
 
-Then we are able to compile the example by the following command:
+Back on the command line, compile the example with:
 
 .. code-block:: console
 
@@ -216,51 +198,56 @@ We explain the options in the following sections.
 How to enable standard C++ modules
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Currently, standard C++ modules are enabled automatically
-if the language standard is ``-std=c++20`` or newer.
+Standard C++ modules are enabled automatically if the language standard is
+``-std=c++20`` or newer.
 
 How to produce a BMI
 ~~~~~~~~~~~~~~~~~~~~
 
-We can generate a BMI for an importable module unit by either ``--precompile``
-or ``-fmodule-output`` flags.
+To generate a BMI for an importable module unit, use either the ``--precompile``
+or ``-fmodule-output`` command line option.
 
-The ``--precompile`` option generates the BMI as the output of the compilation and the output path
-can be specified using the ``-o`` option.
+The ``--precompile`` option generates the BMI as the output of the compilation
+and the output path can be specified using the ``-o`` option.
 
-The ``-fmodule-output`` option generates the BMI as a by-product of the compilation.
-If ``-fmodule-output=`` is specified, the BMI will be emitted the specified location. Then if
-``-fmodule-output`` and ``-c`` are specified, the BMI will be emitted in the directory of the
-output file with the name of the input file with the new extension ``.pcm``. Otherwise, the BMI
-will be emitted in the working directory with the name of the input file with the new extension
+The ``-fmodule-output`` option generates the BMI as a by-product of the
+compilation. If ``-fmodule-output=`` is specified, the BMI will be emitted to
+the specified location. If ``-fmodule-output`` and ``-c`` are specified, the
+BMI will be emitted in the directory of the output file with the name of the
+input file with the extension ``.pcm``. Otherwise, the BMI will be emitted in
+the working directory with the name of the input file with the extension
 ``.pcm``.
 
-The style to generate BMIs by ``--precompile`` is called two-phase compilation since it takes
-2 steps to compile a source file to an object file. The style to generate BMIs by ``-fmodule-output``
-is called one-phase compilation respectively. The one-phase compilation model is simpler
-for build systems to implement and the two-phase compilation has the potential to compile faster due
-to higher parallelism. As an example, if there are two module units A and B, and B depends on A, the
-one-phase compilation model would need to compile them serially, whereas the two-phase compilation
-model may be able to compile them simultaneously if the compilation from A.pcm to A.o takes a long
-time.
-
-File name requirement
-~~~~~~~~~~~~~~~~~~~~~
+Generating BMIs with ``--precompile`` is called two-phase compilation because
+it takes two steps to compile a source file to an object file. Generating BMIs
+with ``-fmodule-output`` is called one-phase compilation. The one-phase
+compilation model is simpler for build systems to implement and the two-phase
+compilation has the potential to compile faster due to higher parallelism. As
+an example, if there are two module units ``A`` and ``B``, and ``B`` depends on
+``A``, the one-phase compilation model needs to compile them serially, whereas
+the two-phase compilation model may be able to compile them simultaneously if
+the compilation from ``A.pcm`` to ``A.o`` takes a long time.
+
+File name requirements
+~~~~~~~~~~~~~~~~~~~~~~
 
-The file name of an ``importable module unit`` should end with ``.cppm``
-(or ``.ccm``, ``.cxxm``, ``.c++m``). The file name of a ``module implementation unit``
-should end with ``.cpp`` (or ``.cc``, ``.cxx``, ``.c++``).
+By convention, ``importable module unit`` files should use ``.cppm`` (or
+``.ccm``, ``.cxxm``, or ``.c++m``) as a file extension.
+``Module implementation unit`` files should use ``.cpp`` (or ``.cc``, ``.cxx``,
+or ``.c++``) as a file extension.
 
-The file name of BMIs should end with ``.pcm``.
-The file name of the BMI of a ``primary module interface unit`` should be ``module_name.pcm``.
-The file name of BMIs of ``module partition unit`` should be ``module_name-partition_name.pcm``.
+A BMI should use ``.pcm`` as a file extension. The file name of the BMI for a
+``primary module interface unit`` should be ``module_name.pcm``. The file name
+of a BMI for a ``module partition unit`` should be
+``module_name-partition_name.pcm``.
 
-If the file names use different extensions, Clang may fail to build the module.
-For example, if the filename of an ``importable module unit`` ends with ``.cpp`` instead of ``.cppm``,
-then we can't generate a BMI for the ``importable module unit`` by ``--precompile`` option
-since ``--precompile`` option now would only run preprocessor, which is equal to `-E` now.
-If we want the filename of an ``importable module unit`` ends with other suffixes instead of ``.cppm``,
-we could put ``-x c++-module`` in front of the file. For example,
+Clang may fail to build the module if different extensions are used. For
+example, if the filename of an ``importable module unit`` ends with ``.cpp``
+instead of ``.cppm``, then Clang cannot generate a BMI for the
+``importable module unit`` with the ``--precompile`` option because the
+``--precompile`` option would only run the preprocessor (``-E``). If using a
+different extension than the conventional one for an ``importable module unit``
+you can specify ``-x c++-module`` before the file. For example,
 
 .. code-block:: c++
 
@@ -279,8 +266,9 @@ we could put ``-x c++-module`` in front of the file. For example,
     return 0;
   }
 
-Now the filename of the ``module interface`` ends with ``.cpp`` instead of ``.cppm``,
-we can't compile them by the original command lines. But we are still able to do it by:
+In this example, the extension used by the ``module interface`` is ``.cpp``
+instead of ``.cppm``, so is cannot be compiled with the command lines from the
+previous example, but it can be compiled with:
 
 .. code-block:: console
 
@@ -289,12 +277,12 @@ we can't compile them by the original command lines. But we are still able to do
   $ ./Hello.out
   Hello World!
 
-Module name requirement
-~~~~~~~~~~~~~~~~~~~~~~~
+Module name requirements
+~~~~~~~~~~~~~~~~~~~~~~~~
 
-[module.unit]p1 says:
+..
 
-.. code-block:: text
+  [module.unit]p1:
 
   All module-names either beginning with an identifier consisting of std followed by zero
   or more digits or containing a reserved identifier ([lex.name]) are reserved and shall not
@@ -302,7 +290,7 @@ Module name requirement
   module-name is a reserved identifier, the module name is reserved for use by C++ implementations;
   otherwise it is reserved for future standardization.
 
-So all of the following name is not valid by default:
+So none of the following names are valid by default:
 
 .. code-block:: text
 
@@ -312,75 +300,76 @@ So all of the following name is not valid by default:
     __test
     // and so on ...
 
-If you still want to use the reserved module names for any reason, use
-``-Wno-reserved-module-identifier`` to suppress the warning.
+Using a reserved module name is strongly discouraged, but
+``-Wno-reserved-module-identifier`` can be used to suppress the warning.
 
-How to specify the dependent BMIs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Specifying dependent BMIs
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
-There are 3 methods to specify the dependent BMIs:
+There are 3 ways to specify a dependent BMI:
 
-* (1) ``-fprebuilt-module-path=<path/to/directory>``.
-* (2) ``-fmodule-file=<path/to/BMI>`` (Deprecated).
-* (3) ``-fmodule-file=<module-name>=<path/to/BMI>``.
+1. ``-fprebuilt-module-path=<path/to/directory>``.
+2. ``-fmodule-file=<path/to/BMI>`` (Deprecated).
+3. ``-fmodule-file=<module-name>=<path/to/BMI>``.
 
-The option ``-fprebuilt-module-path`` tells the compiler the path where to search for dependent BMIs.
-It may be used multiple times just like ``-I`` for specifying paths for header files. The look up rule here is:
+The ``-fprebuilt-module-path`` option specifies the path to search for
+dependent BMIs. Multiple paths may be specified, similar to using ``-I`` to
+specify a search path for header files. When importing a module ``M``, the
+compiler looks for ``M.pcm`` in the directories specified by
+``-fprebuilt-module-path``. Similarly,  When importing a partition module unit
+``M:P``, the compiler looks for ``M-P.pcm`` in the directories specified by
+``-fprebuilt-module-path``.
 
-* (1) When we import module M. The compiler would look up M.pcm in the directories specified
-  by ``-fprebuilt-module-path``.
-* (2) When we import partition module unit M:P. The compiler would look up M-P.pcm in the
-  directories specified by ``-fprebuilt-module-path``.
-
-The option ``-fmodule-file=<path/to/BMI>`` tells the compiler to load the specified BMI directly.
-The option ``-fmodule-file=<module-name>=<path/to/BMI>`` tells the compiler to load the specified BMI
-for the module specified by ``<module-name>`` when necessary. The main difference is that
+The ``-fmodule-file=<path/to/BMI>`` option causes the compiler to load the
+specified BMI directly. The ``-fmodule-file=<module-name>=<path/to/BMI>``
+option causes the compiler to load the specified BMI for the module specified
+by ``<module-name>`` when necessary. The main difference is that
 ``-fmodule-file=<path/to/BMI>`` will load the BMI eagerly, whereas
-``-fmodule-file=<module-name>=<path/to/BMI>`` will only load the BMI lazily, which is similar
-with ``-fprebuilt-module-path``. The option ``-fmodule-file=<path/to/BMI>`` for named modules is deprecated
-and is planning to be removed in future versions.
+``-fmodule-file=<module-name>=<path/to/BMI>`` will only load the BMI lazily,
+which is similar to ``-fprebuilt-module-path``. The
+``-fmodule-file=<path/to/BMI>`` option for named modules is deprecated and will
+be removed in a future version of Clang.
 
-In case all ``-fprebuilt-module-path=<path/to/directory>``, ``-fmodule-file=<path/to/BMI>`` and
-``-fmodule-file=<module-name>=<path/to/BMI>`` exist, the ``-fmodule-file=<path/to/BMI>`` option
-takes highest precedence and ``-fmodule-file=<module-name>=<path/to/BMI>`` will take the second
-highest precedence.
+When these options are specified in the same invocation of the compiler, the
+``-fmodule-file=<path/to/BMI>`` option takes precedence over
+``-fmodule-file=<module-name>=<path/to/BMI>``, which takes precedence over
+``-fprebuilt-module-path=<path/to/directory>``.
 
-We need to specify all the dependent (directly and indirectly) BMIs.
-See https://github.com/llvm/llvm-project/issues/62707 for detail.
+Note: you must specify all the (directly or indirectly) dependent BMIs
+explicitly. See https://github.com/llvm/llvm-project/issues/62707 for details.
 
-When we compile a ``module implementation unit``, we must specify the BMI of the corresponding
-``primary module interface unit``.
-Since the language specification says a module implementation unit implicitly imports
-the primary module interface unit.
+When compiling a ``module implementation unit``, the BMI of the corresponding
+``primary module interface unit`` must be specified. This is because a module
+implementation unit implicitly imports the primary module interface unit.
 
   [module.unit]p8
 
   A module-declaration that contains neither an export-keyword nor a module-partition implicitly
   imports the primary module interface unit of the module as if by a module-import-declaration.
 
-All of the 3 options ``-fprebuilt-module-path=<path/to/directory>``, ``-fmodule-file=<path/to/BMI>``
-and ``-fmodule-file=<module-name>=<path/to/BMI>`` may occur multiple times.
-For example, the command line to compile ``M.cppm`` in
-the above example could be rewritten into:
+The ``-fprebuilt-module-path=<path/to/directory>``, ``-fmodule-file=<path/to/BMI>``, 
+and ``-fmodule-file=<module-name>=<path/to/BMI>`` options may be specified
+multiple times. For example, the command line to compile ``M.cppm`` in
+the previous example could be rewritten as:
 
 .. code-block:: console
 
   $ clang++ -std=c++20 M.cppm --precompile -fmodule-file=M:interface_part=M-interface_part.pcm -fmodule-file=M:impl_part=M-impl_part.pcm -o M.pcm
 
 When there are multiple ``-fmodule-file=<module-name>=`` options for the same
-``<module-name>``, the last ``-fmodule-file=<module-name>=`` will override the previous
-``-fmodule-file=<module-name>=`` options.
+``<module-name>``, the last ``-fmodule-file=<module-name>=`` overrides the
+previous ``-fmodule-file=<module-name>=`` option.
 
-``-fprebuilt-module-path`` is more convenient and ``-fmodule-file`` is faster since
-it saves time for file lookup.
+``-fprebuilt-module-path`` is more convenient while ``-fmodule-file`` is faster
+because it saves time for file lookup.
 
 Remember that module units still have an object counterpart to the BMI
 ~~~~~~~~~~~~...
[truncated]

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that is rather shallow due to sheer amount of text, but it still took me more like 90 minutes. Hope you find it useful.

@@ -95,22 +76,23 @@ In this document, we use the following umbrella terms:
* A ``module interface unit`` refers to either a ``primary module interface unit``
or a ``module interface partition unit``.

* An ``importable module unit`` refers to either a ``module interface unit``
or a ``internal module partition unit``.
* An ``importable module unit`` refers to either a ``module interface unit`` or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole piece of documentation relies heavily on code font, and I feel we might be overusing it, like here. Maybe we should follow the example of the Standard, and use italics instead.

To be clear, we call the default BMI as Full BMI and the new introduced BMI as Reduced
BMI.
Users can use the ``-fexperimental-modules-reduced-bmi`` option to produce a
Reduced BMI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should find a better name in a subsequent pr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean Reduced BMI? The name was discussed in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/52.

This got most of the comments from Vlad and Corentin, but it does not
address comments from Erich.

Additionally, it switches the file to UTF-8 from ANSI Latin 1.
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big thanks!

I left some comments about correctness or clearness. And all other change looks good to me.

* Primary module interface unit
* Module implementation unit
* Module partition interface unit
* Module partition implementation unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was suggestion about the naming from Daniel Ruoso that Module partition implementation unit may be confusing with Module implementation unit for people. So he suggest to use Internal module partition unit. Personally I feel the old name is more clear too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've reverted back to the old form.

To be clear, we call the default BMI as Full BMI and the new introduced BMI as Reduced
BMI.
Users can use the ``-fexperimental-modules-reduced-bmi`` option to produce a
Reduced BMI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean Reduced BMI? The name was discussed in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/52.

@AaronBallman
Copy link
Collaborator Author

I think I've addressed all of the feedback (thank you all for the improvements!), except for the suggestion from @Endilll to switch to using italics rather than code font. I think the suggestion is a good one, but it's a time-consuming edit that doesn't change the meaning of the document, so I'd prefer that be done in a follow-up.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks : )

@AaronBallman AaronBallman merged commit 9263318 into llvm:main May 8, 2024
@AaronBallman AaronBallman deleted the aballman-modules-documentation branch May 8, 2024 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++20 clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants