1. The Overall Design of TCC

1.1. Specifying the API
1.2. The Main Compilation Path
1.3. Input File Types
1.4. Intermediate and Output Files
1.5. Other Compilation Paths
1.6. Finding out what tcc is doing

1.1. Specifying the API

As we have seen, the API plays a far more concrete role in the TDF compilation strategy than in the traditional scheme. The API therefore needs to be explicitly specified to TCC before any compilation takes place. As can be seen from Fig. 3, the API has three components. Firstly, in the target independent (or production) half of the compilation, there are the target independent headers which describe the API. Secondly, in the target dependent (or installation) half, there is the API implementation for the particular target machine, which is divided between the TDF libraries (derived from the system headers) and the system libraries. Specifying the API to TCC essentially consists of telling it which target independent headers, TDF libraries and system libraries to use. The precise way in which this is done is discussed below (in section 4.3).

1.2. The Main Compilation Path

Once the API has been specified, the actual compilation can begin. The default action of TCC is to perform production and installation consecutively on the same machine; any other action needs to be explicitly specified. So let us describe the entire compilation path from C source to executable shown in Fig. 3.

1. The first stage is production. The C → TDF producer transforms each input C source file into a target independent TDF capsule, using the target independent headers to describe the API in abstract terms. These target independent capsules will contain tokens to represent the uses of objects from the API, but these tokens will be left undefined.
2. The second stage, which is also the first stage of the installation, is TDF linking. Each target independent capsule is combined with the TDF library describing the API implementation to form a target dependent TDF capsule. Recall that the TDF libraries contain the local definitions of the tokens left undefined by the producer, so the resultant target dependent capsule will contain both the uses of these tokens and the corresponding token definitions.
3. The third stage is for the TDF translator to transform each target dependent TDF capsule into an assembly source file for the appropriate target machine. Some TDF translators output a binary object file rather than an assembly source file. In this case the following assembler stage is redundant and the compilation skips to the system linking.
4. The fourth stage is for each assembly source file to be translated into a binary object file by the system assembler.
5. The final stage is for the system linker to combine all the binary object files with the system libraries to form a single, final executable. Recall that the system libraries are the final constituent of the API implementation, so this stage completes the combination of the program with the API implementation started in stage 2.

Let us, for convenience, tabulate these stages, giving the name of each compilation tool (plus the corresponding executable name), a code letter which TCC uses to refer to this stage, and the input and output file types for the stage (also see 7.2).

Stage	Tool	Code	Input	Output
1.	C producer (tdfc)	`c`	C source	Target independent TDF
2.	TDF linker (tld)	`L`	Target independent TDF	Target dependent TDF
3.	TDF translator (trans)	`t`	Target dependent TDF	Assembly source
4.	system assembler (as)	`a`	Assembly source	Binary object
5.	system linker (ld)	`l`	Binary object	Executable

The executable name of the TDF translator varies depending on the target machine, but it normally starts or ends with trans. These stages are documented in more detail in sections 5.1 to 5.5.
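The stage table can also be read as a small data structure. The following is only a toy model, not TCC's own implementation: the names MAIN_PATH and stage_codes are hypothetical, and the translator_emits_objects flag is an assumed way of representing those translators which output binary object files directly.

```python
# Toy model of the main compilation path tabulated above.
# Each entry is (code letter, tool, input file type, output file type).
MAIN_PATH = [
    ("c", "C producer", "C source", "target independent TDF"),
    ("L", "TDF linker", "target independent TDF", "target dependent TDF"),
    ("t", "TDF translator", "target dependent TDF", "assembly source"),
    ("a", "system assembler", "assembly source", "binary object"),
    ("l", "system linker", "binary object", "executable"),
]


def stage_codes(start_type, translator_emits_objects=False):
    """Return the code letters of the stages applied to a file of start_type.

    translator_emits_objects is a hypothetical flag modelling translators
    which output binary object files, making the assembler stage redundant.
    """
    codes = []
    current = start_type
    for code, _tool, input_type, output_type in MAIN_PATH:
        if input_type != current:
            continue  # this stage does not apply to the current file type
        codes.append(code)
        current = output_type
        if code == "t" and translator_emits_objects:
            current = "binary object"  # skip straight to system linking
    return codes


print(stage_codes("C source"))         # ['c', 'L', 't', 'a', 'l']
print(stage_codes("assembly source"))  # ['a', 'l']
```

With translator_emits_objects set, the assembler letter a drops out of the result, mirroring the remark in stage 3.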

The code letters for the various compilation stages can be used in the -Wtool,opt,... command-line option to TCC. This passes the option(s) opt directly to the executable in the compilation stage identified by the letter tool. For example, -Wl,-x will cause the system linker to be invoked with the -x option. Similarly, the -Etool:file option allows the executable invoked at the compilation stage tool to be specified as file. This gives the TCC user very direct access to the compilation tools.

1.3. Input File Types

This compilation path may be joined at any point, and terminated at any point. The latter possibility is discussed below. For the former, TCC determines, for each input file it is given, which of the file types it knows (C source, target independent TDF, etc.) the file belongs to. The file type in turn determines where in the compilation path the file will start. The type of a file is determined by the normal filename suffix convention:

Type	Content
`*.c`	C source
`*.j`	Target independent TDF capsules
`*.t`	Target dependent TDF capsules
`*.s`	Assembly source files
`*.o`	Binary object files
`*`	Binary object files

Files whose type cannot otherwise be determined are assumed to be binary object files (for a complete list see 7.1).

Thus, for example, we speak of .j files as a shorthand for target independent TDF capsules. Each file type recognised by TCC is assigned an identifying letter. For convenience, this corresponds to the suffix identifying the file type (c for C source files, j for target independent TDF capsules etc.).

There is an alternative method of specifying input files, by means of the -Stype,file,... command-line option. This specifies that the file file should be treated as an input file of the type corresponding to the letter type, regardless of its actual suffix. Thus, for example, -Sc,file specifies that file should be regarded as a C source (or .c) file.
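The suffix convention and the -S override can be sketched as a simple lookup. This is an illustrative model only: SUFFIX_TYPES and file_type are hypothetical names, and the forced argument is an assumed stand-in for the effect of -S, not a parser for the option itself.

```python
import os

# Suffix table from this section; anything else is a binary object file.
SUFFIX_TYPES = {
    ".c": "C source",
    ".j": "target independent TDF capsule",
    ".t": "target dependent TDF capsule",
    ".s": "assembly source",
    ".o": "binary object",
}


def file_type(name, forced=None):
    """Classify an input file by suffix.

    forced is a hypothetical mapping from file name to type letter,
    standing in for the effect of the -S option.
    """
    if forced and name in forced:
        return SUFFIX_TYPES["." + forced[name]]
    suffix = os.path.splitext(name)[1]
    return SUFFIX_TYPES.get(suffix, "binary object")


print(file_type("main.c"))                              # C source
print(file_type("libfoo.a"))                            # binary object
print(file_type("main.txt", forced={"main.txt": "c"}))  # C source
```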

1.4. Intermediate and Output Files

During the compilation, TCC makes up names for the output files of each of the compilation phases. By default these names are based on the input file name, with the input file suffix replaced by the output file suffix. Thus if the input file file.c is given, this will be transformed into file.j by the producer, which in turn will be transformed into file.t by the TDF linker, and so on. If the -make_up_names command-line option is given, the intermediate files are instead given names of the form _tccnnnn.x, where nnnn is a number which is incremented for each intermediate file produced, and x is the suffix corresponding to the output file type. The system linker output file name cannot be deduced in the same way, since it is the result of linking a number of .o files. By default, as with CC, this file is called a.out.
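The naming scheme for intermediate files can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the four-digit zero padding and the starting value of 1 for the made-up names are guesses, since the text only says that nnnn is an incrementing number. The directory in which TCC places each file is not modelled.

```python
import itertools
import os


def output_name(input_name, out_suffix, counter=None):
    """Name one output file, either derived from the input or made up."""
    if counter is not None:
        # Assumed format: zero-padded to four digits, counting from 1.
        return f"_tcc{next(counter):04d}.{out_suffix}"
    stem = os.path.splitext(input_name)[0]
    return f"{stem}.{out_suffix}"


def intermediate_names(source, make_up_names=False):
    """List the intermediate files produced on the way from source to .o."""
    counter = itertools.count(1) if make_up_names else None
    names = []
    current = source
    for suffix in ("j", "t", "s", "o"):
        current = output_name(current, suffix, counter)
        names.append(current)
    return names


print(intermediate_names("file.c"))
# ['file.j', 'file.t', 'file.s', 'file.o']
print(intermediate_names("file.c", make_up_names=True))
# ['_tcc0001.j', '_tcc0002.t', '_tcc0003.s', '_tcc0004.o']
```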

For most purposes these intermediate files are not required to be preserved; if we are compiling a single C source file to an executable, then the only output file we are interested in is the executable, not the intermediate files created during the compilation process. For this reason TCC creates a temporary directory in which to put these intermediate files, and removes this directory when the compilation is complete. All intermediate files are put into this temporary directory except:

those which are an end product of the compilation (such as the executable),
those which are explicitly ordered to be preserved by means of command-line options,
binary object files, when more than one such file is produced (this is for compatibility with CC).

TCC can be made to preserve intermediate files of various types by means of the -Ptype... command-line option, which specifies a list of letters corresponding to the file types to be preserved. Thus for example -Pjt specifies that all TDF capsules produced, whether target independent or target dependent, (i.e. all .j and .t files) should be preserved. The special form -Pa specifies that all intermediate files should be preserved. It is also possible to specify that a certain file type should not be preserved by preceding the corresponding letter by - in the -P option. The only really useful application of this is to use -P-o to cancel the CC convention on preserving binary object files mentioned above.
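The way a list of -P letters selects file types might be modelled as below. This is an assumed reading of the option, not TCC's actual parser: preserved_types and ALL_TYPES are hypothetical names, only the main-path intermediate types are covered, options are assumed to apply left to right, and a - is assumed to cancel only the single letter which follows it.

```python
# Intermediate file types of the main compilation path only.
ALL_TYPES = set("jtso")


def preserved_types(p_options, default=frozenset()):
    """Return the set of file type letters preserved by a list of -P options.

    default models types preserved without any option, for example "o" under
    the CC convention when several binary object files are produced.
    """
    keep = set(default)
    for option in p_options:
        negate = False
        for letter in option[2:]:  # skip the leading "-P"
            if letter == "-":
                negate = True      # assumed to affect the next letter only
            elif negate:
                keep.discard(letter)
                negate = False
            elif letter == "a":
                keep |= ALL_TYPES  # special form -Pa: preserve everything
            else:
                keep.add(letter)
    return keep


print(sorted(preserved_types(["-Pjt"])))                # ['j', 't']
print(sorted(preserved_types(["-P-o"], default={"o"}))) # []
print(sorted(preserved_types(["-Pa"])))                 # ['j', 'o', 's', 't']
```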

By default, all preserved files are stored in the current working directory. However, the -work dir command-line option specifies that they should be stored in the directory dir.

The compilation can also be halted at any stage. The -Ftype option to TCC tells it to stop the compilation after creating the files of the type corresponding to the letter type. Because any files of this type which are produced will be an end product of the compilation, they will automatically be preserved. For example, -Fo halts the compilation after the creation of the binary object, or .o, files (i.e. just before the system linking), and preserves all such files produced. A number of other TCC options are equivalent to options of the form -Ftype:

-i is equivalent to -Fj (i.e. just apply the producer),

-S is equivalent to -Fs (CC compatibility),

-c is equivalent to -Fo (CC compatibility).

If more than one -F option (including the equivalent options just listed) is given, then TCC issues a warning. The stage coming first in the compilation path takes priority.
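The priority rule for multiple -F options can be sketched as follows. Again this is only a model: stop_type, PATH_ORDER and ALIASES are hypothetical names, and only the file types of the main compilation path are considered.

```python
# Output file types in the order they are produced on the main path.
PATH_ORDER = ["j", "t", "s", "o"]

# Options documented above as equivalent to a -F option.
ALIASES = {"-i": "-Fj", "-S": "-Fs", "-c": "-Fo"}


def stop_type(options):
    """Return (type letter at which compilation halts, whether to warn).

    The letter is None when no halting option is present. When several are
    given, the one coming first in the compilation path wins and the second
    element of the result is True, modelling the warning TCC issues.
    """
    stops = []
    for option in options:
        option = ALIASES.get(option, option)
        if option.startswith("-F"):
            stops.append(option[2:])
    if not stops:
        return None, False
    earliest = min(stops, key=PATH_ORDER.index)
    return earliest, len(stops) > 1


print(stop_type(["-Fo"]))        # ('o', False)
print(stop_type(["-c", "-S"]))   # ('s', True)
print(stop_type(["-v"]))         # (None, False)
```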

If the compilation has precisely one end product output file, then the name of this file can be specified to be file by means of the -o file command-line option. If a -o file option is given when there is more than one end product, then the first such file produced will be called file, and all such files produced subsequently will cause TCC to issue a warning.

Figure 1. TDF Full Compilation Path

1.5. Other Compilation Paths

So far we have been discussing the main TCC compilation path from C source to executable. This is however only part of the picture. The full complexity (almost) of all the possible compilation paths supported by TCC is shown in Fig. 4. This differs from Fig. 3 in that it only shows the left hand, or program, half of the main compilation diagram. The solid arrows show the default compilation paths; the dashed arrows are only followed if TCC is so instructed by means of command-line options. Let us consider those paths in this diagram which have not so far been mentioned.

1.5.1. Preprocessing

The first paths to be considered involve preprocessed C source files. These form a distinct file type which TCC recognises by means of the .i file suffix. Input .i files are treated in exactly the same way as .c files; that is, they are fed into the producer.

TCC can be made to preprocess the C source files it is given by means of the -P and -E options. If the -P option is given then each .c file is transformed into a corresponding .i file by the TDF C preprocessor, tdfcpp. If the -E option is given then the output of tdfcpp is sent instead to the standard output. In both cases the compilation halts after the preprocessor has been applied. Preprocessing is discussed further in section 5.6.

1.5.2. TDF Archives

The second new file type introduced in Fig. 4 is the TDF archive. This is recognised by TCC by means of the .ta file suffix. Basically a TDF archive is a set of target independent TDF capsules (this is slightly simplified, see section 5.2.3 for more details). Any input TDF archives are automatically split into their constituent target independent capsules. These then join the main compilation path in the normal way.

In order to create a TDF archive, TCC must be given the -prod command-line option. It will combine all the target independent TDF capsules it has into an archive, and the compilation will then halt. By default this archive is called a.ta, but another name may be specified using the -o option.

The routines for splitting and building TDF archives are built into TCC, and are not implemented by a separate compilation tool (in particular, TDF archives are not ar archives). Really TDF archives are a TCC-specific construction; they are not part of TDF proper.

1.5.3. TDF Notation

TDF has the form of an abstract syntax tree which is encoded as a series of bits. In order to examine the contents of a TDF capsule it is necessary to translate it into an equivalent human readable form. Two tools are provided which do this. The TDF pretty printer, disp, translates TDF into text, whereas the TDF notation compiler, TNC, translates both TDF to text and text to TDF. The two textual forms of TDF are incompatible: disp output cannot be used as TNC input. disp is in many ways the more sophisticated decoder (it understands the TDF extensions used to handle diagnostics, for example), but it does not handle the text to TDF translation which TNC does. By default TNC is a text to TDF translator; it needs to be passed the -p flag in order to translate TDF into text. We refer to the textual form of TDF supported by TNC as TDF notation.

By default, TCC uses disp. If it is given the -disp command-line option then all target independent TDF capsules (.j files) are transformed into text using disp. The -disp_t option causes all target dependent TDF capsules (.t files) to be transformed into text. In both cases the output files have a .p suffix, and the compilation halts after they are produced.

In order for TNC to be used, the -Ytnc flag should be passed to TCC. In this case the -disp and -disp_t options cause tnc -p, rather than disp, to be invoked. This flag also causes TCC to recognise files with a .p suffix as TDF notation source files. These are translated by TNC into target independent TDF capsules, which join the main compilation path in the normal way.

Similarly if the -Ypl_tdf flag is passed to TCC then it recognises files with a .tpl suffix as PL_TDF source files. These are translated by the PL_TDF compiler, TPL, into target independent TDF capsules.

disp and TNC are further discussed in section 5.7.

1.5.4. Merging TDF Capsules

The final unexplored path in Fig. 4 is the ability to combine all the target independent TDF capsules into a single capsule. This is specified by means of the -M command-line option to TCC. The combination of these capsules is performed by the TDF linker, TLD. Whereas in the main compilation path TLD is used to combine a single target independent TDF capsule with the TDF libraries to form a target dependent TDF capsule, in this case it is used to combine several target independent capsules into a single target independent capsule. By default the combined capsule is called a.j. The compilation will continue after the combination phase, with the resultant capsule rejoining the main compilation path. This merging operation is further discussed in section 5.2.2.

The only unresolved issue in this case is, if the -M option is given, to what .j files do the -Fj and the -Pj options refer? In fact, TCC takes them to refer to the merged TDF capsule rather than the capsules which are merged to form it. The -Pa option, however, will cause both sets of capsules to be preserved.

To summarise, TCC has an extra three file types, and an extra three compilation tools (not including the TDF archive creating and splitting routines which are built into TCC). These are:

Type	Content
`*.i`	Preprocessed C source
`*.ta`	TDF archives
`*.p`	TDF notation source

and:

Stage	Tool	Code	Input	Output
6.	C preprocessor (tdfcpp)	`c`	C source	Preprocessed C source
7a.	Pretty printer (disp)	`d`	TDF capsule	TDF notation
7b.	Reverse notation (tnc -p)	`d`	TDF capsule	TDF notation
8.	Notation compiler (tnc)	`d`	TDF notation	TDF capsule

(see 7.1 and 7.2 for complete lists).

1.6. Finding out what tcc is doing

With so many different file types and alternative compilation paths, it is often useful to be able to keep track of what TCC is doing. There are several command-line options which do this. The simplest is -v which specifies that TCC should print each command in the compilation process on the standard output before it is executed. The -vb option is similar, but only causes the name of each input file to be printed as it is processed. Finally the -dry option specifies that the commands should be printed (as with -v) but not actually executed. This can be used to experiment with TCC to find out what it would do in various circumstances.

Occasionally an unclear error message may be printed by one of the compilation tools. In this case the -show_errors option to TCC might be useful. It causes TCC to print the command it was executing when the error occurred. By default, if an error occurs during the construction of an output file, the file is removed by TCC. It can however be preserved for examination using the -keep_errors option. This applies not only to normal errors, but also to exceptional errors such as the user interrupting TCC by pressing ^C, or one of the compilation tools crashing. In the latter case, TCC will also remove any core file produced, unless the -keep_errors option is specified.

For purposes of configuration control, the -version flag will cause TCC to print its version number. This will typically be of the form:

tcc: Version: 4.0, Revision: 1.5, Machine: hp

giving the version and revision number, plus the target machine identifier. The -V flag will also cause each compilation tool to print its version number (if appropriate) as it is invoked.