8. Preprocessing checks

  8.1. Preprocessor directives
  8.2. Indented Preprocessing Directives
  8.3. Multiple macro definitions
  8.4. Macro arguments
  8.5. Unmatched quotes
  8.6. Include depth
  8.7. Text after #endif
  8.8. Text after #
  8.9. New line at end of file
  8.10. Conditional Compilation
  8.11. Target dependent conditional inclusion
  8.12. Unused headers

This chapter describes tdfc2's capabilities for checking the preprocessing constructs that arise in C.

8.1. Preprocessor directives

By default, the TenDRA C checker understands the preprocessor directives specified by the ISO C standard, section 6.8, i.e. #include, #define, #undef, #if, #ifdef, #ifndef, #elif, #else, #endif, #line, #error and #pragma, together with the null directive. As has been mentioned, #pragma statements play a significant role in the checker. Recognised #pragma statements are processed, but unknown pragma statements are ignored by default. The check to detect unknown pragma statements is controlled by:

#pragma TenDRA unknown pragma permit

The options for permit are disallow (raise an error if an unknown pragma is encountered), warning (allow unknown pragmas with a warning), or allow (allow unknown pragmas without comment).

In addition, the common non-ISO preprocessor directives #file, #ident, #assert, #unassert and #weak may be permitted using:

#pragma TenDRA directive dir allow

where dir is one of file, ident, assert, unassert or weak. If allow is replaced by warning then the directive is allowed, but a warning is issued. In either case, the modifier (ignore) may be added to indicate that, although the directive is allowed, its effect is ignored. Thus for example:

#pragma TenDRA directive ident (ignore) allow

causes the checker to ignore any #ident directives without raising any errors.

The directive dir can instead be disallowed using:

#pragma TenDRA directive dir disallow

Finally, preprocessing directives that the checker does not recognise at all can be permitted using:

#pragma TenDRA unknown directive allow

Any other unknown preprocessing directives cause the checker to raise an error in the default mode. The pragma may be used to force the checker to ignore such directives without raising any errors.

Disallow and warning variants are also available.
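As a minimal sketch, the settings described in this section might be collected at the top of a source file or project header. The guard macro USE_TENDRA_PRAGMAS is a hypothetical project convention (defined, for example with -DUSE_TENDRA_PRAGMAS, only when running the checker) so that other compilers never see these lines; the pragma spellings are the ones given above.

```c
/* USE_TENDRA_PRAGMAS is a hypothetical project macro, defined only when
   the source is being checked with tdfc2. */
#ifdef USE_TENDRA_PRAGMAS
#pragma TenDRA unknown pragma warning          /* accept unknown pragmas, with a warning */
#pragma TenDRA directive ident (ignore) allow  /* accept #ident but ignore its effect */
#pragma TenDRA directive assert disallow       /* reject #assert */
#pragma TenDRA unknown directive warning       /* report other unknown directives as warnings */
#endif
```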

8.2. Indented Preprocessing Directives

The ISO C standard allows white space to occur before the # in a preprocessing directive, and between the # and the directive name. Many older preprocessors have problems with such directives. The checker's treatment of such directives can be set using:

#pragma TenDRA indented # directive permit

which detects white space before the # and:

#pragma TenDRA indented directive after # permit

which detects white space between the # and the directive name. The options for permit are allow, warning or disallow as usual. The default checking profile allows both forms of indented directives.
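The following sketch shows both forms of indented directive in otherwise ordinary C. The macro and function names (BUFFER_SIZE, BUFFER_KIND, buffer_kind) are hypothetical; any ISO C compiler accepts this code, and the comments note which of the two checks above would report each line if it were set to warning or disallow.

```c
/* Both indented forms below are valid ISO C. */
#ifndef BUFFER_SIZE
    #define BUFFER_SIZE 512      /* "indented # directive": white space before the # */
#endif

#if BUFFER_SIZE > 256
#   define BUFFER_KIND "large"   /* "indented directive after #": white space after the # */
#else
#   define BUFFER_KIND "small"
#endif

/* Returns the (hypothetical) buffer class chosen during preprocessing. */
const char *buffer_kind(void)
{
    return BUFFER_KIND;
}
```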

8.3. Multiple macro definitions

The ISO C standard states that, for two definitions of a function-like macro to be equal, both the spelling of the parameters and the macro definition must be equal. Thus, for example, in:

#define f( x ) ( x )
#define f( y ) ( y )

the two definitions of f are not equal, despite the fact that they are clearly equivalent. Tchk has an alternative definition of macro equality which allows for consistent substitution of parameter names. The type of macro equality used is controlled by:

#pragma TenDRA weak macro equality permit

where permit is allow (use the alternative definition of macro equality), warning (as for allow, but raise a warning), or disallow (use the ISO C definition of macro equality; this is the default setting).

More generally, the pragma:

#pragma TenDRA extra macro definition allow

allows macros to be redefined, both consistently and inconsistently. If the definitions are incompatible, the first definition is overwritten. This pragma has a disallow variant, which resets the check to its default mode.
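A minimal sketch, using a hypothetical SQUARE macro: an identical redefinition is valid under the strict ISO C rule, while changing the parameter name portably requires removing the old definition with #undef first, so that neither the weak macro equality nor the extra macro definition pragma is needed.

```c
/* Identical parameter spelling and replacement list: a valid ISO C
   redefinition under the strict rule. */
#define SQUARE(x) ((x) * (x))
#define SQUARE(x) ((x) * (x))

/* "#define SQUARE(y) ((y) * (y))" at this point would be equivalent but
   not ISO-equal, so tdfc2 accepts it only when weak macro equality (or
   extra macro definition) is permitted.  Portable code removes the old
   definition first. */
#undef SQUARE
#define SQUARE(y) ((y) * (y))

int square(int v)
{
    return SQUARE(v);
}
```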

8.4. Macro arguments

According to the ISO C standard, section 6.8.3, if a macro argument contains a sequence of preprocessing tokens that would otherwise act as a preprocessing directive, the behaviour is undefined. Tchk allows preprocessing directives in macro arguments by default. The check to detect such macro arguments is controlled by:

#pragma TenDRA directive as macro argument permit

where permit is allow, warning or disallow.
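The sketch below shows the kind of macro argument this check targets (in a comment, since its behaviour is undefined) and one portable rewrite that selects the value before the macro is invoked. MAX, LARGE_LIMITS, LIMIT_FLOOR and apply_floor are hypothetical names.

```c
/* Hypothetical macro used in the example. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Undefined behaviour in ISO C: directives inside a macro argument.
       limit = MAX(n,
       #ifdef LARGE_LIMITS
                   1000
       #else
                   10
       #endif
                  );
   Portable form: choose the value first, then invoke the macro once. */
#ifdef LARGE_LIMITS
#define LIMIT_FLOOR 1000
#else
#define LIMIT_FLOOR 10
#endif

/* Returns n, raised to at least LIMIT_FLOOR. */
int apply_floor(int n)
{
    return MAX(n, LIMIT_FLOOR);
}
```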

The ISO C standard, section 6.8.3.2, also states that each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list. By default, if tdfc2 encounters a # in a function-like macro replacement list which is not followed by a parameter of the macro, an error is raised. The checker's behaviour in this situation is controlled by:

#pragma TenDRA no ident after # permit

where the options for permit are allow (do not raise errors), disallow (default mode) and warning (raise warnings instead of errors).
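For the # operator, a minimal sketch with hypothetical names: STR applies # to its parameter as ISO C requires, and the common two-level XSTR form stringizes a macro's expansion rather than its name. A replacement list in which # is followed by something other than a parameter appears only in a comment.

```c
/* Valid: # is followed by the parameter x. */
#define STR(x)  #x
/* Two-level form: the argument is macro-expanded before STR sees it. */
#define XSTR(x) STR(x)

/* Invalid, reported by default: # is followed by "name", which is not
   a parameter of the macro.
       #define LABEL(x) #name
*/

#define VERSION_MAJOR 3

const char *version_name(void)  { return STR(VERSION_MAJOR); }   /* "VERSION_MAJOR" */
const char *version_value(void) { return XSTR(VERSION_MAJOR); }  /* "3" */
```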

8.5. Unmatched quotes

The ISO C standard, section 6.1, states that if a ' or " character matches the category of preprocessing tokens described as "single non-white-space characters that do not lexically match the other preprocessing token categories" (that is, it does not begin a complete character constant or string literal), the behaviour is undefined. For example:

#define a 'b

would result in undefined behaviour. By default, tdfc2 ignores the unmatched ' character. The check to detect such unmatched quotes is controlled by:

#pragma TenDRA unmatched quote permit

The usual allow, warning and disallow options are available.
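A minimal sketch of the fix for the example above, with the macro renamed to the hypothetical CHAR_B: closing the quote turns the replacement list into an ordinary character constant. A second common source of a lone ' is shown in a comment.

```c
/* "#define a 'b" leaves a lone ' that does not begin a complete
   character constant.  Closing the quote fixes it. */
#define CHAR_B 'b'

/* An apostrophe in directive text is another common source of a lone ',
   for example:
       #error Can't build on this target
   Rewording the message, or writing it as a string literal, avoids it. */

char char_b(void)
{
    return CHAR_B;
}
```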

8.6. Include depth

Most preprocessors set a maximum depth for #include directives (which may be limited by the maximum number of files which can be open on the host system). By default, the checker supports a depth equal to this maximum number. However, a smaller maximum depth can be set using:

#pragma TenDRA includes depth n

where n can be any positive integral constant.

8.7. Text after #endif

The ISO C standard, section 6.8, specifies that the #endif and #else preprocessor directives take no arguments and must be followed directly by a new line. In the default checking mode, tdfc2 raises an error when an #endif or #else directive is not directly followed by a new line. This behaviour may be modified using:

#pragma TenDRA text after directive permit

where permit is allow (no errors are raised and any text on the same line as the #endif or #else statement is ignored), warning or disallow.
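Text after #else or #endif is often used to label the end of a conditional. A minimal sketch of the portable alternative, using hypothetical names (FEATURE_LOGGING, LOG_LEVEL, log_level): comments are replaced by white space before directives are processed, so a label placed in a comment satisfies the rule.

```c
/* Not ISO C (reported by default):  #endif FEATURE_LOGGING
   Portable: put the label in a comment. */
#ifdef FEATURE_LOGGING
#define LOG_LEVEL 2
#else /* !FEATURE_LOGGING */
#define LOG_LEVEL 0
#endif /* FEATURE_LOGGING */

int log_level(void)
{
    return LOG_LEVEL;
}
```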

8.8. Text after #

The ISO C standard specifies that a # occurring outside of a macro replacement list must be followed either by a new line or by the name of a preprocessing directive, and this is enforced by the checker in default mode. The check is controlled by:

#pragma TenDRA no directive/nline after ident permit

where permit may be allow, disallow or warning.
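As a short illustration (GREETING and greeting are hypothetical names): a # followed only by a new line, possibly after white space or a comment, is the ISO C null directive and is accepted, whereas a # followed by text that is not a directive name is what this check reports. That form appears only in a comment, since conventional compilers typically reject it.

```c
#                     /* null directive: valid, and has no effect */
#   /* also a null directive: the comment counts as white space */

/* Reported by this check:
       # banner text
*/

#define GREETING "hello"

const char *greeting(void)
{
    return GREETING;
}
```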

8.9. New line at end of file

The ISO C standard, section 5.1.1.2, states that a non-empty source file must end in a new-line character. Files which do not end in a new line are flagged as errors by the checker in default mode. The behaviour can be modified using:

#pragma TenDRA no nline after file end permit

where permit has the usual allow, disallow and warning options.

8.10. Conditional Compilation

Tchk generally treats conditional compilation in the same way as other compilers and checkers. For example, consider:

#if expr
.... /* First branch */
#else
.... /* Second branch */
#endif

The expression expr is evaluated: if it is non-zero, the first branch of the conditional is processed; if it is zero, the second branch is processed instead.

Sometimes, however, tdfc2 may be unable to evaluate the expression statically because of the abstract types and expressions which arise from the minimum integer range assumptions or the abstract standard headers used by the tool (see target-dependent types in section 4.5). For example, consider the following ISO compliant program:

#include <stdio.h>
#include <limits.h>
int main ()
{
#if ( CHAR_MIN == 0 )
	puts ("char is unsigned");
#else
	puts ("char is signed");
#endif
	return ( 0 );
}

The TenDRA representation of the ISO API merely states that CHAR_MIN (the least value which fits into a char) is a target-dependent integral constant. Hence, whether or not it equals zero is also target dependent, so the checker needs to maintain both branches. By contrast, a conventional compiler compiles for a particular target machine on which CHAR_MIN is a specific integral constant, so it can always determine which branch of the conditional to compile.

In order to allow both branches to be maintained in these cases, it has been necessary for tdfc2 to impose certain restrictions on the form of the conditional branches and the positions in which such target-dependent conditionals may occur. These may be summarised as:

  • Target-dependent conditionals may not appear at the outer level. If the checker encounters a target-dependent conditional at the outer level an error is produced. In order to continue checking in the rest of the file an arbitrary assumption must be made about which branch of the conditional to process; tdfc2 assumes that the conditional is true and the first branch is used;

  • The branches of allowable target-dependent conditionals may not contain declarations or definitions.
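A minimal sketch of code that satisfies both restrictions, based on the CHAR_MIN example above (char_signedness is a hypothetical name): the target-dependent conditional sits inside a function body and its branches contain only statements, with the variable declared outside them. A file-scope variant that the restrictions rule out is shown in a comment.

```c
#include <limits.h>

/* Ruled out: a target-dependent conditional at the outer level whose
   branches contain declarations.
       #if ( CHAR_MIN == 0 )
       static const char *char_kind = "unsigned";
       #else
       static const char *char_kind = "signed";
       #endif
*/

/* Allowed: the conditional is inside a function body and each branch
   contains only a statement. */
const char *char_signedness(void)
{
    const char *kind;   /* declared outside the conditional branches */
#if ( CHAR_MIN == 0 )
    kind = "unsigned";
#else
    kind = "signed";
#endif
    return kind;
}
```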

8.11. Target dependent conditional inclusion

One of the effects of trying to compile code in a target-independent manner is that it is not always possible to completely evaluate the condition in a #if directive. Thus the conditional inclusion needs to be preserved until the installer phase. This can only be done if the target-dependent #if is more structured than is normally required for preprocessing directives. There are two cases. In the first, where the #if appears in a statement, it is treated as if it were an if statement with braces around its branches; that is:

#if cond
	true_statements
#else
	false_statements
#endif

maps to:

if ( cond ) {
	true_statements
} else {
	false_statements
}

In the second case, where the #if appears in a list of declarations, an error is normally raised. This can, however, be overridden by the directive:

#pragma TenDRA++ conditional declaration allow

which causes both branches of the #if to be analysed.
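For the first case, the equivalence can be checked with an ordinary compiler. In the sketch below (adjust_pp and adjust_if are hypothetical names), the same CHAR_MIN test is written once as a preprocessing conditional and once as the braced if statement it is treated as; on any given target both functions should return the same result.

```c
#include <limits.h>

/* Preprocessor form: tdfc2 cannot evaluate CHAR_MIN == 0 here, so it
   keeps both branches. */
int adjust_pp(int x)
{
#if ( CHAR_MIN == 0 )
    x += 1;
    x *= 2;
#else
    x -= 1;
#endif
    return x;
}

/* The braced if statement that the conditional above is treated as;
   a conventional compiler simply folds the constant condition. */
int adjust_if(int x)
{
    if ( CHAR_MIN == 0 ) {
        x += 1;
        x *= 2;
    } else {
        x -= 1;
    }
    return x;
}
```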

8.12. Unused headers

Header files which are included but from which nothing is used within the other source files comprising the translation unit might just as well not have been included. Tchk can detect top-level include files which are unnecessary by analysing the tdfc2dump output for the file. This check is enabled by passing the -Wd,-H command line flag to tcc. Errors are written to stderr in a simple ASCII form by default, or to the unified dump file in dump format if the -D command line option is used.
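A minimal sketch of a file that this check is intended to catch (name_length is a hypothetical function): <string.h> supplies strlen and size_t, but nothing from <math.h> is referenced, so that include could be removed. The exact form of the resulting diagnostic is not described here and is not shown.

```c
#include <string.h>   /* used: declares strlen and size_t */
#include <math.h>     /* unused: nothing declared here is referenced */

/* Returns the length of a name; only <string.h> is needed for this. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```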