Variable Analysis

6. Variable Analysis

6.1. Variable lifetime analysis
6.2. Modification between sequence points
6.3. Unused variables
6.4. Values set and not used
6.5. Variable which has not been set is used
6.6. Variable shadowing
6.7. Overriding the variable analysis

The variable analysis checks are controlled by:

#pragma TenDRA variable analysis status

where status is on, warning or off, as usual. The checks are switched off in the default mode.

There are also equivalent command line options to tdfc2 of the form -X:variable=state, where state can be check, warn or dont.

The variable analysis is concerned with the evaluation of expressions and the use of local variables, including function arguments. Occasionally it may not be possible to perform a full static analysis of an expression or variable, and in these cases the messages produced indicate that there may be a problem. If a full analysis is possible, a definite error or warning is produced. The individual checks are listed in sections 6.1 to 6.6, and section 6.7 describes the source annotations which can be used to fine-tune the variable analysis.

6.1. Variable lifetime analysis

The directive:

#pragma TenDRA variable analysis on

enables checks on the uses of automatic variables and function parameters. These checks detect:

If a variable is not used in its scope.
If the value of a variable is used before it has been assigned to.
If a variable is assigned to twice without an intervening use.
If a variable is assigned to twice without an intervening sequence point.

as illustrated by the variables a, b, c and d respectively in:

void f ()
{
	int a ;        // a never used
	int b ;
	int c = b ;    // b not initialised
	c = 0 ;        // c assigned to twice
	int d = 0 ;
	d = ++d ;      // d assigned to twice
}

The second, and more particularly the third, of these checks requires some fairly sophisticated flow analysis, so any hints which can be picked up from exhaustive switch statements etc. are likely to increase the accuracy of the errors detected.

In a non-static member function the various non-static data members are analysed as if they were automatic variables. It is checked that each member is initialised in a constructor. A common source of initialisation problems in a constructor is that the base classes and members are initialised in the canonical order of virtual bases, non-virtual direct bases and members in the order of their declaration, rather than in the order in which their initialisers appear in the constructor definition. Therefore a check that the initialisers appear in the canonical order is also applied.

It is possible to change the state of a variable during the variable analysis using the directives:

#pragma TenDRA set expression
#pragma TenDRA discard expression

The first asserts that the variable given by the expression has been assigned to; the second asserts that the variable is not used. An alternative way of expressing this is by means of keywords:

SET ( expression )
DISCARD ( expression )

introduced using the directives:

#pragma TenDRA keyword identifier for set
#pragma TenDRA keyword identifier for discard variable

respectively. These expressions can appear in expression statements and as the first argument of a comma expression.

The variable flow analysis checks have not yet been completely implemented. They may not detect errors in certain circumstances and for extremely convoluted code may occasionally give incorrect errors.

6.2. Modification between sequence points

The ISO C standard states that if an object is modified more than once, or is modified and accessed other than to determine the new value, between two sequence points, then the behaviour is undefined. Thus the result of:

var = arr[i++] + i++ ;

is undefined, since the value of i is being incremented twice between sequence points. This behaviour is detected by the variable analysis.
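One way to remove the undefined behaviour is to give each modification of i its own statement. The sketch below assumes the intended meaning was a left-to-right reading of the expression (take the element at index i, add the following index, then advance i by two). That reading, and the helper name sum_and_advance, are assumptions made purely for illustration.

```c
/* Hypothetical helper: a well-defined rewrite of
 *     var = arr[i++] + i++ ;
 * under the assumed left-to-right reading. */
int sum_and_advance(const int *arr, int *i)
{
	int var = arr[*i] + (*i + 1);	/* i is only read here */
	*i += 2;			/* single modification, in its own statement */
	return var;
}
```

Because i is now modified exactly once, and in a separate statement from the reads, there is nothing for the sequence point check to report.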

6.3. Unused variables

As part of the variable analysis, a simple test is applied to each local variable at the end of its scope to determine whether it has been used in that scope. For example, in:

int f ( int n )
{
	int r;
	return ( 0 );
}

both the function argument n and the local variable r are unused.

6.4. Values set and not used

This is a more complex test since it is applied to every instance of setting the variable. For example, in:

int f ( int n )
{
	int r = 1;
	r = 5;
	return ( r );
}

the first value assigned to r, 1, is not used before it is overwritten by 5 (this second value is used, however). This test requires some flow analysis. For example, if the program is modified to:

int f ( int n )
{
	int r = 1;
	if ( n == 3 ) {
		r = 5;
	}
	return ( r );
}

the initial value of r is used when n != 3, so no error is detected. However in:

int f ( int n )
{
	int r = 1;
	if ( n == 3 ) {
		r = 5;
	} else {
		r = 6;
	}
	return ( r );
}

the initial value of r is overwritten regardless of the result of the conditional, and hence is unused.

6.5. Variable which has not been set is used

This test also requires some flow analysis. For example, in:

int f ( int n )
{
	int r;
	if ( n == 3 ) {
		r = 5;
	}
	return ( r );
}

the use of the variable r as a return value is reported because there are paths leading to this statement in which r is not set (i.e. when n != 3). However, in:

int f ( int n )
{
	int r;
	if ( n == 3 ) {
		r = 5;
	} else {
		r = 6;
	}
	return ( r );
}

r is always set before it is used, so no error is detected.

6.6. Variable shadowing

It is quite legal in C to have a variable in an inner scope with the same name as a variable in an outer scope. These variables are distinct and, whilst in the inner scope, the declaration in the outer scope is not visible: it is shadowed by the local variable of the same name. Confusion can arise if this was not what the programmer intended. The checker can therefore be configured to detect shadowing in three cases: a local variable shadowing a global variable; a local variable shadowing a local variable with a wider scope; and a local variable shadowing a typedef name. This is done using:

#pragma TenDRA variable hiding analysis status

If status is on, an error is raised when a local variable that shadows another variable is declared. If warning is used, the error is replaced by a warning. The off option restores the default behaviour (shadowing is permitted and no errors are produced).

The directive:

#pragma TenDRA variable hiding analysis on

can be used to enable a check for hiding of other variables and, in member functions, data members, by local variable declarations.
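The three shadowing cases can be sketched in a single function. The names total, count_t and shadow_demo are hypothetical, and the choice of the warning status is only an example. The code is legal C, so other compilers accept it silently.

```c
#ifdef __TenDRA__
#pragma TenDRA variable hiding analysis warning
#endif

int total = 100;		/* global variable */
typedef int count_t;		/* typedef name */

int shadow_demo(int n)
{
	int total = n;		/* local shadows the global total */
	int count_t = 1;	/* local shadows the typedef name count_t */
	{
		int total = 2;	/* inner local shadows the outer local total */
		count_t += total;
	}
	return total + count_t;	/* uses the outer local total */
}
```

With the check enabled, each of the three local declarations would be expected to produce a message. The global total is never touched by shadow_demo, which is exactly the kind of surprise the check is meant to expose.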

6.7. Overriding the variable analysis

Although many of the problems discovered by the variable analysis are genuine mistakes, some may be as the result of deliberate decisions by the program writer. In this case, more information needs to be provided to the checker to convey the programmer's intentions. Four constructs are provided for this purpose: the discard variable, the set variable, the exhaustive switch and the non-returning function.

6.7.1. Discarding variables

Actively discarding a variable counts as a use of that variable in the variable analysis, and so can be used to suppress messages concerning unused variables and values assigned to variables. There are two distinct methods to indicate that the variable x is to be discarded. The first uses a pragma:

#pragma TenDRA discard x;

which the checker treats as if it were a C statement, ending in a semicolon. Having a statement which is noticed by one compiler but ignored by another can lead to problems. For example, in:

if ( n == 3 )
#pragma TenDRA discard x;
	puts ( "n is three" );

tdfc2 believes that x is discarded if n == 3 and the message is always printed, whereas other compilers will ignore the #pragma statement and think that the message is printed if n == 3. An alternative, in many ways neater, solution is to introduce a new keyword for discarding variables. For example, to introduce the keyword DISCARD for this purpose, the pragma:

#pragma TenDRA keyword DISCARD for discard variable

should be used. The variable x can then be discarded by means of the statement:

DISCARD ( x );

A dummy definition for DISCARD to use with normal compilers needs to be given in order to maintain compilability with those compilers. For example, a complete definition of DISCARD might be:

#ifdef __TenDRA__
#pragma TenDRA keyword DISCARD for discard variable
#else
#define DISCARD(x) (( void ) 0 )
#endif
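As a small usage sketch, the definition above can be used to mark a function parameter that is deliberately ignored. The function name always_zero is hypothetical.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword DISCARD for discard variable
#else
#define DISCARD(x) ((void) 0)
#endif

/* n is part of the interface but deliberately ignored. */
int always_zero(int n)
{
	DISCARD(n);	/* counts as a use of n in the variable analysis */
	return 0;
}
```

For compilers other than tdfc2 the DISCARD statement expands to an empty expression statement, so the behaviour of the function is unchanged.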

Discarding a variable changes its assignment state to unset, so that any subsequent uses of the variable, without an intervening assignment to it, lead to a variable used before being set error. This feature can be exploited if the same variable is used for distinct purposes in different parts of its scope, by causing the variable analysis to treat the different uses separately. For example, in:

void f ( void )
{
	int i = 0;
	while ( i++ < 10 ) {
		puts ( "hello" );
	}
	while ( i++ < 10 ) {
		puts ( "goodbye" );
	}
}

which is intended to print both messages ten times, the two uses of i as a loop counter are independent - they could have been implemented with different variables. By discarding i after the first loop, the second loop can be analysed separately. In this way, the error of failing to reset i to 0 can be detected.
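A sketch of the corrected function follows. It discards i after the first loop and then resets it, which is the assignment the checker would otherwise report as missing. To keep the example easy to check, the calls to puts are replaced by a counter; the name count_both_loops is hypothetical.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword DISCARD for discard variable
#else
#define DISCARD(x) ((void) 0)
#endif

/* Returns the total number of loop iterations performed. */
int count_both_loops(void)
{
	int count = 0;
	int i = 0;
	while (i++ < 10) {
		count++;	/* first use of i as a loop counter */
	}
	DISCARD(i);		/* i is now treated as unset */
	i = 0;			/* omitting this reset is the error being detected */
	while (i++ < 10) {
		count++;	/* second, independent use of i */
	}
	return count;
}
```

If the line i = 0 is removed, the second loop never executes, and with the DISCARD in place the use of i in the second loop condition is what the checker would be expected to flag as a use before setting.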

6.7.2. Setting variables

In addition to discarding variables, it is also possible to set them. In deliberately setting a variable, the programmer is telling the checker to assume that some value will always have been assigned to the variable by that point, so that any variable used without being set errors can be suppressed. This construct is particularly useful in programs with complex flow control, to help out the variable analysis. For example, in:

void f ( int n )
{
	int r;
	if ( n != 0 ) r = n;
	if ( n > 2 ) {
		printf ( "%d\n", r );
	}
}

r is only used if n > 2, in which case we also have n != 0, so that r has already been initialised. However, in its flow analysis, the TenDRA C checker treats all the conditionals it meets as if they were independent and does not look for any such complex dependencies (indeed it is possible to think of examples where such analysis would be impossible). Instead, it needs the programmer to clarify the flow of the program by asserting that r will be set if the second condition is true.

Programmers may assert that the variable, r, is set either by means of a pragma:

#pragma TenDRA set r;

or by using, for example:

SET ( r );

where SET is a keyword which has previously been introduced to stand for the variable setting construct using:

#pragma TenDRA keyword SET for set

(cf. DISCARD above).
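By analogy with DISCARD, a complete definition of SET might look like the sketch below. The fallback macro for other compilers is an assumption modelled on the DISCARD definition, not something stated for SET itself. The function is a variant of f that returns a value so that it can be checked; the name print_if_large is hypothetical.

```c
#include <stdio.h>

#ifdef __TenDRA__
#pragma TenDRA keyword SET for set
#else
#define SET(x) ((void) 0)	/* assumed no-op fallback, as for DISCARD */
#endif

/* Prints and returns n when n > 2, otherwise returns 0. */
int print_if_large(int n)
{
	int r;
	if (n != 0) r = n;
	if (n > 2) {
		SET(r);		/* assert: n > 2 implies n != 0, so r is set */
		printf("%d\n", r);
		return r;
	}
	return 0;
}
```

The assertion is placed inside the second conditional because that is the only place where the programmer's reasoning holds. Asserting it before the conditional would claim more than is true.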

6.7.3. Exhaustive switch statements

A special case of a flow control construct which may be used to set the value of a variable is a switch statement. Consider the program:

char *f ( int n )
{
	char *r;
	switch ( n ) {
		case 1 : r = "one"; break;
		case 2 : r = "two"; break;
		case 3 : r = "three"; break;
	}
	return ( r );
}

This leads to an error indicating that r is used but not set, because it is not set if n lies outside the three cases in the switch statement. However, the programmer might know that f is only ever called with these three values, and hence that r is always set before it is used. This information could be expressed by asserting that r is set at the end of the switch construct (see above), but it would be better to express the cause of this setting rather than just its effect. The reason why r is always set is that the switch statement is exhaustive - there are case statements for all the possible values of n.

Programmers may assert that a switch statement is exhaustive by means of a pragma placed immediately after its controlling expression. For example, in the above case it would take the form:

....
switch ( n )
#pragma TenDRA exhaustive
{
	case 1 : r = "one"; break;
	....

Again, there is an option to introduce a keyword, EXHAUSTIVE say, for exhaustive switch statements using:

#pragma TenDRA keyword EXHAUSTIVE for exhaustive

Using this form, the example program becomes:

switch ( n ) EXHAUSTIVE {
	case 1 : r = "one"; break;

In order to maintain compatibility with existing compilers, a dummy definition for EXHAUSTIVE must be introduced for them to use. For example, a complete definition of EXHAUSTIVE might be:

#ifdef __TenDRA__
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#else
#define EXHAUSTIVE
#endif
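Putting the pieces together, the whole example might be written as in the following sketch. The function is renamed number_name here (a hypothetical name), and the result is declared const char * because it points at string literals.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#else
#define EXHAUSTIVE
#endif

/* Callers are assumed to pass only 1, 2 or 3. */
const char *number_name(int n)
{
	const char *r;
	switch (n) EXHAUSTIVE {
		case 1 : r = "one"; break;
		case 2 : r = "two"; break;
		case 3 : r = "three"; break;
	}
	return r;
}
```

The annotation only records the programmer's claim. It does not make a call such as number_name(4) safe, since r would still be returned without having been set.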

6.7.4. Switch statements

A switch statement is said to be exhaustive if its controlling expression is guaranteed to take one of the values of its case labels, or if it has a default label. The TenDRA C and C++ producers allow a switch statement to be asserted to be exhaustive using the syntax:

switch ( cond ) EXHAUSTIVE {
	// switch statement body
}

where EXHAUSTIVE is either the directive:

#pragma TenDRA exhaustive

or a keyword introduced using:

#pragma TenDRA keyword identifier for exhaustive

Knowing whether a switch statement is exhaustive or not means that checks relying on flow analysis (including variable usage checks) can be applied more precisely.

In certain circumstances it is possible to deduce whether a switch statement is exhaustive or not. For example, the directive:

#pragma TenDRA enum switch analysis on

enables a check on switch statements on values of enumeration type. Such statements should be exhaustive, either explicitly by using the EXHAUSTIVE keyword or declaring a default label, or implicitly by having a case label for each enumerator. Conversely, the value of each case label should equal the value of an enumerator. For the purposes of this check, boolean values are treated as if they were declared using an enumeration type of the form:

enum bool { false = 0, true = 1 } ;
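The implicit form can be illustrated with a switch that has one case label per enumerator. This is a sketch only: the enumeration colour and the function colour_rank are hypothetical names, and how far the deduced exhaustiveness feeds into the other flow checks is not confirmed here.

```c
#ifdef __TenDRA__
#pragma TenDRA enum switch analysis on
#endif

enum colour { RED, GREEN, BLUE };

int colour_rank(enum colour c)
{
	int r;
	switch (c) {
		/* one case label per enumerator, no default needed */
		case RED : r = 1; break;
		case GREEN : r = 2; break;
		case BLUE : r = 3; break;
	}
	return r;
}
```

Removing one of the three case labels, or adding a case whose value is not an enumerator of colour, would be expected to trigger the check.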

A common source of errors in switch statements is the fall-through from one case or default statement to the next. A check for this can be enabled using:

#pragma TenDRA fall into case allow

Any case or default label where fall-through from the previous statement is intentional can be marked by preceding it with a keyword, FALL_THRU say, introduced using the directive:

#pragma TenDRA keyword identifier for fall into case
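A sketch of how such a keyword might be used is shown below. The empty fallback definition for other compilers is an assumption made by analogy with EXHAUSTIVE, the function name weight is hypothetical, and the directive which enables the check itself is deliberately left out.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword FALL_THRU for fall into case
#else
#define FALL_THRU	/* assumed empty fallback for other compilers */
#endif

int weight(int n)
{
	int r = 0;
	switch (n) {
		case 1 : r += 1;
		FALL_THRU case 2 : r += 2; break;	/* intentional fall-through */
		default : r = -1; break;
	}
	return r;
}
```

Here a value of 1 accumulates both increments, while a value of 2 enters at the marked label and receives only the second.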

6.7.5. Non-returning functions

Consider a modified version of the program above, in which calls to f with an argument other than 1, 2 or 3 cause an error message to be printed:

extern void error ( const char * );
char *f ( int n )
{
	char *r;
	switch ( n ) {
		case 1 : r = "one"; break;
		case 2 : r = "two"; break;
		case 3 : r = "three"; break;
		default : error( "Illegal value" );
	}
	return ( r );
}

This causes an error because, in the default case, r is not set before it is used. However, depending on the semantics of the function, error, the return statement may never be reached in this case. This is because the fact that a function returns void can mean one of two distinct things:

That the function does not return a value. This is the usual meaning of void.
That the function never returns. For example, the library function exit uses void in this sense.

If error never returns, then the program above is correct; otherwise, an unset value of r may be returned.

Therefore, we need to be able to declare the fact that a function never returns. This is done by introducing a new type to stand for the non-returning meaning of void (some compilers use volatile void for this purpose), by means of the pragma:

#pragma TenDRA type VOID for bottom

to introduce a type VOID (although any identifier may be used) with this meaning. The declaration of error can then be expressed as:

extern VOID error ( const char * );

In order to maintain compatibility with existing compilers a definition of VOID needs to be supplied. For example:

#ifdef __TenDRA__
#pragma TenDRA type VOID for bottom
#else
typedef void VOID;
#endif
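With this definition in place, the earlier example might be completed as in the sketch below. The names fatal_error and number_name are hypothetical (fatal_error stands in for error), and whether tdfc2 accepts a function definition written with the VOID type, as opposed to a declaration only, is an assumption that has not been checked.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef __TenDRA__
#pragma TenDRA type VOID for bottom
#else
typedef void VOID;
#endif

/* Hypothetical reporting routine which never returns. */
extern VOID fatal_error(const char *msg);

const char *number_name(int n)
{
	const char *r;
	switch (n) {
		case 1 : r = "one"; break;
		case 2 : r = "two"; break;
		case 3 : r = "three"; break;
		default : fatal_error("Illegal value");	/* control stops here */
	}
	return r;
}

VOID fatal_error(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	exit(EXIT_FAILURE);	/* never returns to the caller */
}
```

Since the default case cannot reach the return statement, every path which does reach it has assigned a value to r.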

The largest class of non-returning functions occurs in the various standard APIs - for example, exit and abort. The TenDRA descriptions of these APIs contain this information. The information that a function does not return is taken into account in all flow analysis contexts. For example, in:

#include <stdlib.h>

int f ( int n )
{
	exit ( EXIT_FAILURE );
	return ( n );
}

n is unused because the return statement is not reached (a fact that can also be determined by the unreachable code analysis in section 5.2).

6.7.6. Return statements

In C, but not in C++, it is possible to have a return statement without an expression in a function which does not return void. It is possible to enable this behaviour using the directive:

#pragma TenDRA incompatible void return allow

Note that this check includes the implicit return caused by falling off the end of a function. The effect of such a return statement is undefined. The C++ rule that falling off the end of main is equivalent to returning a value of 0 overrides this check.