Interface files

A warning

Our system of interface files is quite complex. Some of the complexity is justified, or at least was justified at the time. Other aspects are almost certainly accidents. The compiler's old representation of parse trees as raw lists of items, with any structure and constraints being implicit rather than explicit, was quite error-prone; since the structure and constraints were not expressed in types, violations did not result in type errors, and thus could accumulate undetected.

I (zs) don't believe any design document for this system ever existed outside of Fergus's head. This document is my (zs's) attempt to reconstruct that design document. In the rest of this file, I will try to be careful to explicitly distinguish between what I know to be true, and what I only believe to be true, either because I remember it, or because I deduce it from the code. Note also that I may be mistaken.

Automatic generation of interface files

The principle of information hiding dictates that only some of the contents of a module should be visible outside the module. The part that is visible outside the module is usually called the interface, while the part that is not visible outside the module is usually called the implementation.

When compiling a module A that imports functionality from module B, the compiler usually wants to read a file containing only the interface of module B. In some languages such as Ada, C and C++, programmers themselves write this file. Having to maintain two files for one module can be a bit of a hassle, so in other languages, such as Haskell, programmers only ever edit one file for each module (its source file). Within that file, they indicate which parts are public and which are not, and the compiler uses this information to generate each module's interface file automatically.
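As a rough illustration of the second approach, the following toy Python sketch pulls the public part out of a single source file. It uses Mercury's `:- interface.` and `:- implementation.` section markers, but everything else is an assumption made for the example: the function name `extract_interface` is hypothetical, and it works on raw lines, whereas a real compiler works on parsed items.

```python
def extract_interface(source: str) -> list[str]:
    """Return the non-blank lines that appear inside interface sections.

    Toy model only: it treats each line as one item and does no parsing.
    """
    in_interface = False
    public = []
    for raw in source.splitlines():
        line = raw.strip()
        if line == ":- interface.":
            in_interface = True
        elif line == ":- implementation.":
            in_interface = False
        elif in_interface and line:
            public.append(line)
    return public


# A small made-up module: the type is exported abstractly,
# and its definition stays in the implementation section.
SOURCE = """\
:- module counter.
:- interface.
:- type counter.
:- func init = counter.
:- implementation.
:- type counter ---> counter(int).
init = counter(0).
"""

interface_lines = extract_interface(SOURCE)
```

Only the two declarations between the section markers end up in `interface_lines`; the definition of the type and the clause for `init` stay private.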

Introduction to interface files

In Mercury, the source code of each module is in a single file, whose suffix is .m, and the generation of interface files is solely the job of the compiler. However, unlike most other similar programming languages, the compiler generates three or four different interface files for each source file, and it generates these in two or three steps.

The steps are as follows.

The different kinds of interface files are as follows.

The contents of .int3 files

The contents of the .int3 file of a module are derived solely from the contents of the interface sections of that module. Each item in these sections

If the item is included in the .int3 file, whether changed or unchanged, it stays in the interface section, so .int3 files contain no implementation section.

After we decide what items to include in the .int3 file, its contents are module qualified to the maximum extent possible. However, this cannot guarantee full module qualification, for reasons explained below.

The rules for choosing between the above three outcomes of course depend on the item's type.

XXX While type definition items will never contain references to type constructor names that may require qualification, type representation items for equivalence types may.

XXX If e.g. the constructor t1/0 appears in an abstract instance definition, and the module defines a type constructor t1/0, this arguably should not require us to include all the imports in the .int3 file. The reason has two parts. Either the t1/0 is defined in one or more of the imported modules, or it isn't. If it is not defined in any of them, then causing the readers of this module's .int3 file to read those other modules' .int3 files will just compute the same result, only slower. On the other hand, if some of those modules do define t1/0, then this type constructor is multiply defined, so its appearance in the instance definition is ambiguous. This fact will be reported in an error message when this module is compiled to target code. Reporting it during some other compiler invocation that happens to read this module's .int3 file may be more of a distraction than a help.
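The case analysis in this argument can be made concrete with a toy resolver, written in Python. It is only a sketch: the function `qualify`, the `module.name/arity` string notation, and the representation of each module as a set of defined names are all assumptions made for the example, not the compiler's data structures.

```python
def qualify(name, own_module, own_defs, imported_defs):
    """Try to module qualify `name`.

    own_defs:      set of names defined by own_module itself.
    imported_defs: dict mapping each imported module to the set of
                   names it defines (hypothetical representation).
    """
    definers = [m for m, names in sorted(imported_defs.items())
                if name in names]
    if name in own_defs:
        definers.insert(0, own_module)
    if len(definers) == 1:
        # Exactly one candidate: the name can be fully qualified.
        return ("qualified", definers[0] + "." + name)
    if not definers:
        # Nothing visible defines it: leave the name as it is.
        return ("unresolved", name)
    # More than one candidate: the reference is ambiguous.
    return ("ambiguous", definers)


# Module a defines t1/0 and no import does: looking at the imports
# cannot change the answer.
only_local = qualify("t1/0", "a", {"t1/0"}, {"b": {"t2/0"}})

# Module b also defines t1/0: the constructor is multiply defined.
clash = qualify("t1/0", "a", {"t1/0"}, {"b": {"t1/0"}})
```

In the first call, consulting the imports yields the same result as looking at the module alone, only more slowly. In the second, the outcome is an ambiguity, which is an error in the module itself.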

XXX We should investigate replacing the copied import_module declaration list with a single item that says "these type, inst, etc. names in this .int3 file may be defined in these other modules". This should have two benefits. One, it would allow us to stop the transitive grabbing of .int3 files as soon as we have read the .int3 files that define all the type, inst, etc. names that are still outstanding. Two, it should allow us to start using the transitively grabbed files only for their intended purpose, which should stop "leaks" of declarations/definitions from these transitively-included-only-for-module-qualification modules to the HLDS of the module being compiled. Right now, it is possible, though rare, to delete an import of module b from module a, and discover that this results in an error: a type being undefined, when that type is defined in module c. This happens only because the import of b dragged with it the definitions inside c as well, so that the import of c was required by the language definition but not by the compiler.

The contents of .int0 files

The contents of the .int0 file of a module are derived from the contents of both the interface sections and the implementation sections of that module, after they are fully module qualified. This requires reading in, via grab_unqual_imported_modules_make_int,

Items are never moved between sections: items in the interface section of the module are put into the interface section of the .int0 file, while items in the implementation section of the module are put into the implementation section of the .int0 file.

The contents of .int files

The contents of the .int file of a module are derived from the contents of both the interface sections and the implementation sections of that module, after they are fully module qualified. This requires reading in, via grab_unqual_imported_modules_make_int,

Items are never moved between sections: items in the interface section of the module are put into the interface section of the .int file, while items in the implementation section of the module are put into the implementation section of the .int file.

The contents of .int2 files

The contents of the .int2 file of a module are derived from the contents of the module's .int file. This means that indirectly, it is derived from both the interface sections and the implementation sections of that module, after they are fully module qualified, which requires reading in various interface files of various other modules (see the section on .int files above for details).

Inclusion property

A module's .int, .int2 and .int3 files are supposed to maintain the property that

This is implicit in the fact that the algorithm we use to read in interface files (in grab_modules.m) never reads in a .int2 file if it has read in that module's .int file, and never reads in a .int3 file if it has read in that module's .int or .int2 file. In fact, making this possible is the reason why it reads in .int files first, .int2 files next, and .int3 files last.

(Note that in the presence of intermodule optimization, grab_modules.m can read in a module's .int file (as an int-for-opt file) after it reads in its .int2 or .int3 file. This will typically lead to many entities defined in the .int2 or .int3 file being defined twice. The compiler must (and does) reject such double definitions silently.)
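The read order described above (.int files first, .int2 files next, .int3 files last, with at most one of the three read per module) can be modelled by the following Python sketch. This is not the code in grab_modules.m; the function `plan_reads` and its input format are assumptions made for the example, and it ignores .int0 files and int-for-opt files entirely.

```python
# Most detailed kind of file first, matching the order given in the text.
READ_ORDER = [".int", ".int2", ".int3"]


def plan_reads(wanted):
    """Decide which interface file to read for each module.

    wanted: dict mapping a suffix to the set of modules that are
            needed at that level of detail (hypothetical input).
    Returns a list of (module, suffix) pairs in read order.
    """
    already_read = set()
    plan = []
    for suffix in READ_ORDER:
        for module in sorted(wanted.get(suffix, ())):
            if module in already_read:
                # Already covered by a file read earlier; if the
                # inclusion property holds, nothing is lost.
                continue
            already_read.add(module)
            plan.append((module, suffix))
    return plan


plan = plan_reads({
    ".int": {"b"},
    ".int2": {"b", "c"},
    ".int3": {"b", "c", "d"},
})
```

Here module b is wanted at all three levels but only its .int file is planned, c gets its .int2 file, and d its .int3 file. Skipping the less detailed files is safe only if the more detailed file contains everything they would have supplied, which is why the inclusion property matters.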

I (zs) don't see any inclusion requirements being placed on .int0 files. However, grab_modules.m does have to read in the .int0 files of the current module's ancestors first, because the set of .int files it needs to read includes not just the .int files of the modules imported by the current module, but also the .int files of the modules imported by its ancestors.

Note that the inclusion property does not apply to import_module and use_module declarations. While .int and .int2 files contain only use_module declarations, .int3 files contain only import_module declarations, which can be considered more expressive, and .int3 files may contain import_module declarations for modules for which the corresponding .int and/or .int2 files do not contain use_module declarations. This is because in the absence of errors, .int and .int2 files are always fully module qualified, which is something we cannot insist on for .int3 files. This difference is

Consumers of .int3 files need import_module declarations because they need to find which module defines an entity, such as a type constructor; consumers of .int and .int2 files need only use_module declarations because they need only to look up the definition of that entity.

That definition should be needed only in two cases. First, in the case of type definitions, we may need to follow chains of type equivalences to the end in order to make decisions about type representations involving that type. Second, in the case of inst and mode definitions, we may likewise need to follow chains of equivalences to the end in order to figure out the exact expansions of named insts and modes, which we may need for mode analysis.
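Following a chain of equivalences to its end can be sketched as below, in Python. The sketch makes several simplifying assumptions: type names are plain strings, equivalence types take no arguments, and the table `eqv_defs` and function `expand_equivalence` are hypothetical names for the example. The cycle check is a safety measure for the toy, not a description of what the compiler does.

```python
def expand_equivalence(type_name, eqv_defs):
    """Follow equivalence definitions until a non-equivalence type.

    eqv_defs: dict mapping the name of each equivalence type to the
              name of the type it is defined to be equivalent to.
              Names missing from the dict are not equivalence types.
    """
    seen = [type_name]
    current = type_name
    while current in eqv_defs:
        current = eqv_defs[current]
        if current in seen:
            chain = " -> ".join(seen + [current])
            raise ValueError("circular equivalence: " + chain)
        seen.append(current)
    return current


# a.id is equivalent to b.key, which is equivalent to int, so any
# representation decision about a.id must look through two modules.
EQV = {"a.id": "b.key", "b.key": "int"}
final = expand_equivalence("a.id", EQV)
```

Each step of the loop needs the definition of the current type, which is why the definitions in other modules' interface files must be available and not just the names. The same shape of loop would apply to chains of named insts and modes.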

Timestamp files