Configuration for literals

6. Configuration for literals

6.1. Integer literals
6.2. Character literals
6.3. Writeable String literals
6.4. Concatenation of character string literals and wide character string literals
6.5. Escape sequences

6.1. Integer literals

The rules for finding the type of an integer literal can be described using directives of the form:

#pragma TenDRA integer literal literal-spec

where:

literal-spec :
	literal-base literal-suffix? literal-type-list

literal-base :
	octal
	decimal
	hexadecimal

literal-suffix :
	unsigned
	long
	unsigned long
	long long
	unsigned long long

literal-type-list :
	* literal-type-spec
	integer-literal literal-type-spec | literal-type-list
	? literal-type-spec | literal-type-list

literal-type-spec :
	: type-id
	* allow? : identifier
	* * allow? :

Each directive gives a literal base and suffix, describing the form of an integer literal, and a list of possible types for literals of this form. This list gives a mapping from the value of the literal to the type used to represent it. There are three cases for the literal type: it may be a given integral type, it may be calculated using a given literal type token (see C/C++ Producer Implementation), or it may cause an error to be raised. There are also three cases for describing a literal range: it may be given by values less than or equal to a given integer literal, it may be given by values which are guaranteed to fit into a given integral type, or it may match any value. For example:

#pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
#pragma TenDRA integer literal decimal 32767 : int | ** : l_i

describes how to find the type of a decimal literal with no suffix. Values less than or equal to 32767 have type int; larger values have a target-dependent type calculated using the token ~lit_int. Introducing a warning into the directive will cause a warning to be printed if the token is used to calculate the value.

Note that this scheme extends that implemented by the C producer, because of the need for more accurate information in the C++ producer. For example, the specification above does not fully express the ISO rule that the type of a decimal integer is the first of the types int, long and unsigned long which it fits into (it only expresses the first step). However, with the C++ extensions it is possible to write:

#pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
#pragma TenDRA integer literal decimal ? : int | ? : long |\
		? : unsigned long | ** : l_i
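The value-to-type mapping expressed by this directive can be sketched in plain C. This is only an illustration of the selection rule, not the producer's implementation; the names lit_type, TYPE_TOKEN and decimal_literal_type are hypothetical, and TYPE_TOKEN merely stands in for the "** : l_i" fallback.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result kinds; TYPE_TOKEN stands for the "** : l_i" case. */
enum lit_type { TYPE_INT, TYPE_LONG, TYPE_ULONG, TYPE_TOKEN };

/* Pick the first listed type whose range contains the literal's value,
   mirroring "? : int | ? : long | ? : unsigned long | ** : l_i". */
enum lit_type decimal_literal_type(unsigned long long value)
{
    if (value <= (unsigned long long)INT_MAX) return TYPE_INT;
    if (value <= (unsigned long long)LONG_MAX) return TYPE_LONG;
    if (value <= (unsigned long long)ULONG_MAX) return TYPE_ULONG;
    return TYPE_TOKEN; /* left to the target-dependent token */
}

/* Self-check using only facts that hold on every conforming target. */
void check_decimal_literal_type(void)
{
    assert(decimal_literal_type(0) == TYPE_INT);
    assert(decimal_literal_type(32767) == TYPE_INT);
    assert(decimal_literal_type(ULONG_MAX) == TYPE_ULONG);
}
```

Which value first selects long or unsigned long depends on the target's type sizes, which is why the self-check only tests the boundaries that every target shares.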

6.2. Character literals

By default, a simple character literal has type int in C and type char in C++. The type of such literals can be controlled using the directive:

#pragma TenDRA++ set character literal : type-id

The type of a wide character literal is given by the implementation-defined type wchar_t. By default, the definition of this type is taken from the target machine's <stddef.h> C header (note that in ISO C++, wchar_t is actually a keyword, but its underlying representation must be the same as in C). This definition can be overridden in the producer by means of the directive:

#pragma TenDRA set wchar_t : type-id

for an integral type type-id.
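The default types described above can be observed through sizeof. The following is a small sketch, assuming the file is compiled as C with an ordinary compiler and no overriding directive in force; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a simple character literal as seen by the compiler in use. */
size_t simple_char_literal_size(void) { return sizeof 'a'; }

/* Size of a wide character literal. */
size_t wide_char_literal_size(void) { return sizeof L'a'; }

void check_char_literal_sizes(void)
{
    /* Holds when compiled as C; a C++ compiler would give sizeof(char). */
    assert(simple_char_literal_size() == sizeof(int));
    assert(wide_char_literal_size() == sizeof(wchar_t));
}
```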

6.3. Writeable String literals

By default, character string literals have type char [n] in C and older dialects of C++, but type const char [n] in ISO C++. Similarly, wide string literals have type wchar_t [n] or const wchar_t [n]. Whether string literals are const or not can be controlled using the two directives:

#pragma TenDRA++ set string literal : const
#pragma TenDRA++ set string literal : no const

In the case where literals are const, the array-to-pointer conversion is allowed to cast away the const to allow for a degree of backwards compatibility. The status of this deprecated conversion can be controlled using the directive:

#pragma TenDRA writeable string literal allow

(yes, I know that that should be writable). Note that this directive has a slightly different meaning in the C producer.

The ISO C standard, section 6.1.4, states that if the program attempts to modify a string literal of either form, the behaviour is undefined. Assignments to string literals of the form:

"abc" = '3';

always result in errors. Other attempts to modify members of string literals, e.g.

"abc"[1] = '3';

are permitted in the default checking mode. This behaviour can be changed using:

#pragma TenDRA writeable string literal permit

where permit may be allow, warning or disallow.
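Because modifying a string literal is undefined, code that needs a modified string normally works on a copy. The following is a minimal sketch of that approach; patched_copy is a hypothetical helper, not part of the producer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the literal "abc" into caller-provided storage and modify the copy,
   leaving the literal itself untouched. Returns NULL if dst is too small. */
char *patched_copy(char *dst, size_t size)
{
    if (dst == NULL || size < sizeof "abc") return NULL;
    memcpy(dst, "abc", sizeof "abc"); /* includes the terminating null */
    dst[1] = '3';
    return dst;
}

void check_patched_copy(void)
{
    char buf[4];
    assert(patched_copy(buf, sizeof buf) == buf);
    assert(strcmp(buf, "a3c") == 0);
    assert(patched_copy(buf, 2) == NULL);
}
```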

6.4. Concatenation of character string literals and wide character string literals

Adjacent string literal tokens of similar types (either both character string literals or both wide string literals) are concatenated at an early stage in the parser. However, it is unspecified what happens if a character string literal token is adjacent to a wide string literal token. By default this gives an error, but the directive:

#pragma TenDRA unify incompatible string literal allow

can be used to enable the strings to be concatenated to give a wide string literal.

If a ' or " character does not have a matching closing quote on the same line then it is undefined whether an implementation should report an unterminated string or treat the quote as a single unknown character. By default, the C++ producer treats this as an unterminated string, but this behaviour can be controlled using the directive:

#pragma TenDRA unmatched quote allow

The ISO C standard, section 6.1.4, states that if a character string literal is adjacent to a wide character string literal, the behaviour is undefined. By default, this is flagged as an error by the checker. If the pragma:

#pragma TenDRA unify incompatible string literal permit

is used, with permit set to allow or warning, the character string literal is converted to a wide character string literal and the strings are concatenated, although in the warning case a warning is output. The disallow version of the pragma restores the default behaviour.
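The well-defined case, where both adjacent literals are of the same kind, can be illustrated as follows. This sketch deliberately avoids the mixed case, since that is the one governed by the directive above; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Two adjacent character string literals form a single literal. */
const char *joined_narrow(void) { return "abc" "def"; }

/* Likewise for two adjacent wide string literals. */
const wchar_t *joined_wide(void) { return L"abc" L"def"; }

void check_joined(void)
{
    assert(strcmp(joined_narrow(), "abcdef") == 0);
    assert(wcscmp(joined_wide(), L"abcdef") == 0);
}
```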

6.5. Escape sequences

By default, if the character following the \ in an escape sequence is not one of those listed in the ISO C or C++ standards then an error is given. This behaviour, which is left unspecified by the standards, can be controlled by the directive:

#pragma TenDRA unknown escape allow

The result is that the \ in unknown escape sequences is ignored, so that \z is interpreted as z, for example. Individual escape sequences can be enabled or disabled using the directives:

#pragma TenDRA++ escape character-literal as character-literal allow
#pragma TenDRA++ escape character-literal disallow

so that, for example:

#pragma TenDRA++ escape 'e' as '\033' allow
#pragma TenDRA++ escape 'a' disallow

sets \e to be the ASCII escape character and disables the alert character \a.
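The combined effect of these directives can be modelled by a small lookup function. This is only a sketch of the behaviour described above, not the producer's code; resolve_escape, the policy enumeration and the use of -1 to signal an error are all assumptions made for illustration, and only a few of the standard escapes are listed.

```c
#include <assert.h>

/* Hypothetical setting for unknown escapes (allow or disallow). */
enum policy { POLICY_ALLOW, POLICY_DISALLOW };

/* Return the value of the escape sequence \c, or -1 if it is rejected. */
int resolve_escape(char c, enum policy unknown)
{
    switch (c) {
    case 'n':  return '\n';
    case 't':  return '\t';
    case '\\': return '\\';
    case 'e':  return 033; /* models: escape 'e' as '\033' allow */
    case 'a':  return -1;  /* models: escape 'a' disallow */
    default:
        /* Unknown escape: drop the backslash if allowed, else reject. */
        return unknown == POLICY_ALLOW ? (unsigned char)c : -1;
    }
}

void check_resolve_escape(void)
{
    assert(resolve_escape('e', POLICY_DISALLOW) == 033);
    assert(resolve_escape('a', POLICY_ALLOW) == -1);
    assert(resolve_escape('z', POLICY_ALLOW) == 'z');
    assert(resolve_escape('z', POLICY_DISALLOW) == -1);
}
```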

By default, if the value of a character, given for example by a \x escape sequence, does not fit into its type then an error is given. This implementation-dependent behaviour can, however, be controlled by the directive:

#pragma TenDRA character escape overflow allow

the value being converted to its type in the normal way.
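As a rough illustration of such a conversion, the sketch below assumes that the target type is unsigned char and that "the normal way" means the usual integer conversion, which for unsigned types reduces the value modulo UCHAR_MAX + 1. Both assumptions and the function names are made for this example only.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if the escape value is representable in unsigned char. */
int fits_uchar(unsigned long value) { return value <= UCHAR_MAX; }

/* Convert an out-of-range value; the result is value mod (UCHAR_MAX + 1). */
unsigned char narrow_to_uchar(unsigned long value)
{
    return (unsigned char)value;
}

void check_narrowing(void)
{
    assert(fits_uchar(0x41));
    assert(!fits_uchar((unsigned long)UCHAR_MAX + 1));
    assert(narrow_to_uchar((unsigned long)UCHAR_MAX + 6) == 5);
}
```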

The ISO C standard specifies a small set of escape sequences in strings, for example \n as newline. Unknown escape sequences lead to an error in the default mode; however, the severity of the error may be altered using:

#pragma TenDRA unknown escape permit

where permit is allow (silently replaces the unknown escape sequence, \z say, by z), warning or disallow.