C++ modules

One of the significant changes that is part of the C++20 standard is modules. Modules are significant because they redefine the compilation model that has been in place so far. Let’s see what the pre-C++20 compilation model looks like. When you want to divide your program into modules, you put the function declarations related to a module in a separate header (.hpp) and define those functions in a source file (.cpp). When a program wants to use this module, it includes the header for compilation. The module is compiled into an object file or a library (static or shared), which is then linked into the final program as shown below:
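To make the header/source split concrete, here is a minimal single-file sketch. The file names greeter.hpp and greeter.cpp and the functions greet and welcome_message are hypothetical, chosen only for illustration; in a real project the three sections would live in separate files and be joined by #include and the linker.

```cpp
#include <string>

// ---- greeter.hpp (hypothetical header): declarations only ----
#ifndef GREETER_HPP
#define GREETER_HPP
namespace greeter {
std::string greet(const std::string& name);
}
#endif  // GREETER_HPP

// ---- greeter.cpp (hypothetical source file): definitions ----
// In a real project this file would begin with: #include "greeter.hpp"
namespace greeter {
std::string greet(const std::string& name) {
    return "Hello, " + name + "!";
}
}  // namespace greeter

// ---- client code ----
// A real client would #include "greeter.hpp" and link against the
// object file or library produced from greeter.cpp.
std::string welcome_message() {
    return greeter::greet("modules");
}
```

The client only ever needs the declaration to compile; the definition is found later, at link time.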

There are both advantages and disadvantages with this compilation model. The advantages are:
Parallel Compilation: All source files can be compiled independently of each other and only come together during the linking phase.
The disadvantages are:
Slower build time: During the pre-processing stage, the contents of every included header are textually pasted into the source file, so the same header gets compiled again in every source file that includes it. This does not scale well when a large number of files depend on the module, leading to slower builds.
Lack of encapsulation: Headers are not good at hiding information. Yes, you can use private class members, but there is no equivalent for whole classes or free functions that you don’t want your clients to use: anything declared in a header is visible to whoever includes it.
Cyclic and Order dependencies: The current compilation model permits order and cyclic dependencies between headers. Headers can depend on each other, and they can force the client to include them in a certain order for the program to work correctly.
One-Definition Rule (ODR) violations: When multiple programs include your headers, a range of issues can appear. For instance, suppose a preprocessor directive in your header changes the layout of one of its data structures. If one program is compiled with the macro defined and another without it, the two end up with different definitions of the same structure, which can cause glitches when they talk to each other through it.
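The macro-dependent layout problem described in the last point can be sketched in a single file. Everything here is hypothetical: packet.hpp, the Packet struct and the ENABLE_TRACING macro are made up for illustration, and the two namespaces only stand in for two separately compiled source files (in a real program both structs would have the same name, which is exactly what makes it an ODR violation).

```cpp
#include <cstddef>

namespace tu_a {  // stands in for a.cpp, compiled with ENABLE_TRACING defined
#define ENABLE_TRACING
// ---- begin text of hypothetical packet.hpp ----
struct Packet {
    int id;
#ifdef ENABLE_TRACING
    long long trace_id;  // only present when the macro is defined
#endif
    int payload;
};
// ---- end text of packet.hpp ----
#undef ENABLE_TRACING
}  // namespace tu_a

namespace tu_b {  // stands in for b.cpp, compiled without the macro
// ---- begin text of hypothetical packet.hpp ----
struct Packet {
    int id;
#ifdef ENABLE_TRACING
    long long trace_id;  // only present when the macro is defined
#endif
    int payload;
};
// ---- end text of packet.hpp ----
}  // namespace tu_b

// Where each "translation unit" believes the payload field lives.
constexpr std::size_t payload_offset_a = offsetof(tu_a::Packet, payload);
constexpr std::size_t payload_offset_b = offsetof(tu_b::Packet, payload);

// True only if both sides agree on the size and layout of Packet.
bool layouts_agree() {
    return sizeof(tu_a::Packet) == sizeof(tu_b::Packet) &&
           payload_offset_a == payload_offset_b;
}
```

The same header text produces two different layouts, so layouts_agree() returns false. If the two sides exchanged a Packet, each would read payload from a different offset.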

In the pre-C++20 compilation model, a header is not a translation unit of its own; its text becomes part of every source file that includes it. The fundamental change with the C++20 compilation model is that each imported module is an independent translation unit. The compilation model is as shown below:

Note that in this model, source files do not textually include header information. Instead, the modules are first pre-compiled into built module interfaces (BMIs), which are then used by the translation units that depend on them.

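Just as declarations used to live in header files, module interfaces live in files of their own. The standard does not prescribe a file extension: .cppm is the convention used by Clang, while MSVC uses .ixx. Instead of the pre-processor directive #include, you use the keyword import, which imports an independent translation unit rather than pasting text.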
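As a rough sketch of what this looks like in practice, assume a hypothetical module named geometry with one exported function. The module name, file names and function names are made up for illustration. The listing shows two separate files, so it cannot be compiled as a single file, and it needs a C++20 compiler with modules support; the exact build commands differ between compilers and are left out here.

```cpp
// ---- geometry.cppm (hypothetical module interface unit) ----
export module geometry;  // names the module; comes before any other declaration

// Not exported: importers of geometry cannot name this function.
int square(int x) { return x * x; }

// Exported: part of the module's interface.
export int area_of_square(int side) { return square(side); }

// ---- main.cpp (a separate translation unit) ----
import geometry;  // takes the place of #include "geometry.hpp"

int main() {
    // Calling square(3) here would not compile, because the name is not exported.
    return area_of_square(3) == 9 ? 0 : 1;
}
```

The interface file has to be built before main.cpp can be compiled, since the compiler needs the built module interface to resolve the import.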

This compilation model has its own advantages and disadvantages:
Compilation Process: With the textual model, each source file is expanded to include what’s in the header, which makes the build process slower. In this compilation model, each imported module is treated as an individual translation unit, which means it is compiled only once, during the pre-compilation phase. The disadvantage is that compilation is no longer fully parallel as in the pre-C++20 model, because a module’s interface has to be built before any translation unit that imports it can be compiled.
Encapsulation: Modules offer greater control over what is exposed to clients. For instance, classes that you don’t export are neither visible to nor usable by clients. Macros also do not cross module boundaries: a named module does not export its macros, and macros defined in the importing file do not affect the module.
Circular and order dependencies: One big advantage of modules is that you cannot have circular dependencies between them. A circular dependency is detected at compile time and rejected. The order in which you import modules also does not matter anymore, since each module is a self-contained translation unit whose meaning does not depend on what was imported before it.
One-definition rule: With modules, you cannot forward declare an entity that lives in another module. For example, a module that imports another cannot redefine or redeclare the entities of the module it imports. However, it is still possible for different modules to export entities with the same name and signature. In that case, a program that imports these modules is ill-formed, and the compiler is not required to issue a diagnostic.

In summary, modules are better for compilation times, encapsulation and dependency rules, as they make everything explicit. However, transitioning to modules is not straightforward, given that C++ has so far relied on headers, which come with their own drawbacks. There are more concepts behind modules, such as module partitions, which are explained in more detail in this very good talk.
