12. Token syntax

  12.1. Token specifications
    12.1.1. Expression tokens
    12.1.2. Statement tokens
    12.1.3. Type tokens
    12.1.4. Member tokens
    12.1.5. Procedure tokens
  12.2. Token arguments
  12.3. Defining tokens

The C and C++ producers allow place-holders for various categories of syntactic classes to be expressed using directives of the form:

#pragma TenDRA token token-spec

or simply:

#pragma token token-spec

These place-holders are represented as TDF tokens and hence are called tokens. Each token stands for a particular type, expression or other construct which is to be represented by a named TDF token in the producer output. This mechanism is used, for example, to allow C API specifications to be represented in a target-independent manner. The types, functions and expressions comprising the API can be described using #pragma token directives, and the target-dependent definitions of these tokens, representing the implementation of the API on a particular machine, can be linked in later. This mechanism is described in detail elsewhere.

A summary of the grammar for the #pragma token directives accepted by the C++ producer is given in tdfc2pragma.
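
For illustration (a hedged sketch; the token names and external names below are hypothetical and not taken from any real API), an API description might specify a type and a constant expression as follows, leaving their target-dependent definitions to be linked in later:

#pragma token TYPE example_file_t # example.api.file_t
#pragma token EXP const : int : EXAMPLE_EOF # example.api.EOF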

12.1. Token specifications

A token specification is divided into two components, a token-introduction giving the token sort, and a token-identification giving the internal and external token names:

token-spec :
	token-introduction token-identification

token-introduction :
	exp-token
	statement-token
	type-token
	member-token
	procedure-token

token-identification :
	token-namespace? identifier # external-identifier?

token-namespace :
	TAG

external-identifier :
	-
	preproc-token-list

The TAG qualifier is used to indicate that the internal name lies in the C tag namespace. This only makes sense for structure and union types. The external token name can be given by any sequence of preprocessing tokens. These tokens are not macro expanded. If no external name is given then the internal name is used. The special external name - is used to indicate that the token does not have an associated external name, and hence is local to the current translation unit. Such a local token must be defined. White space in the external name (other than at the start or end) is used to indicate that a TDF unique name should be used. The white space serves as a separator for the unique name components.
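
By way of illustration (all names here are hypothetical), the following specifications show the TAG qualifier (STRUCT is one of the type-token forms described in section 12.1.3 below), a local token with no external name, which must therefore be defined in the translation unit, and an external name whose white space introduces TDF unique name components:

#pragma token STRUCT TAG example_tm # example.api.tm
#pragma token EXP rvalue : int : example_count # -
#pragma token TYPE example_ptr_t # example api ptr_t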

12.1.1. Expression tokens

Expression tokens are specified as follows:

exp-token :
	EXP exp-storage? : type-id :
	NAT
	INTEGER

representing an expression of the given type, a non-negative integer constant and a general integer constant, respectively. Each expression token has an associated storage class:

exp-storage :
	lvalue
	rvalue
	const

indicating whether it is an lvalue, an rvalue or a compile-time constant expression. An absent exp-storage is equivalent to rvalue. All expression tokens lie in the macro namespace; that is, they may potentially be defined as macros.
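
For example (a hedged sketch with hypothetical names), an API might specify a compile-time constant, an assignable lvalue and a non-negative integer constant as:

#pragma token EXP const : int : EXAMPLE_MAX # example.api.MAX
#pragma token EXP lvalue : int : example_errno # example.api.errno
#pragma token NAT EXAMPLE_NBITS # example.api.NBITS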

For backwards compatibility with the C producer, the directive:

#pragma TenDRA++ rvalue token as const allow

causes rvalue tokens to be treated as const tokens.

12.1.2. Statement tokens

Statement tokens are specified as follows:

statement-token :
	STATEMENT

All statement tokens lie in the macro namespace.
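
A minimal sketch (the names are hypothetical): a statement place-holder might be specified, and later defined as a macro expanding to a statement:

#pragma token STATEMENT EXAMPLE_NOTHING # example.api.NOTHING
#define EXAMPLE_NOTHING { /* empty */ }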

12.1.3. Type tokens

Type tokens are specified as follows:

type-token :
	TYPE
	VARIETY
	VARIETY signed
	VARIETY unsigned
	FLOAT
	ARITHMETIC
	SCALAR
	CLASS
	STRUCT
	UNION

representing a generic type, an integral type, a signed integral type, an unsigned integral type, a floating point type, an arithmetic (integral or floating point) type, a scalar (arithmetic or pointer) type, a class type, a structure type and a union type respectively.

Floating-point, arithmetic and scalar token types have not yet been implemented correctly in either the C or C++ producers.
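
For example (a hedged sketch; the names are hypothetical), an API might specify a generic type, an integral type and a structure type as:

#pragma token TYPE example_va_list # example.api.va_list
#pragma token VARIETY example_wchar_t # example.api.wchar_t
#pragma token STRUCT TAG example_div_t # example.api.div_t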

12.1.4. Member tokens

Member tokens are specified as follows:

member-token :
	MEMBER access-specifier? member-type-id : type-id :

where an access-specifier of public is assumed if none is given. The member type is given by:

member-type-id :
	type-id
	type-id % constant-expression

where % is used to denote bitfield members (since : is used as a separator). The second type denotes the structure or union the given member belongs to. Different types can have members with the same internal name, but the external token name must be unique. Note that only non-static data members can be represented in this form.
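
For example (a hedged sketch; the types and names are hypothetical), an ordinary member and a bitfield member might be specified as:

#pragma token MEMBER int : struct example_tm : tm_sec # example.api.tm.tm_sec
#pragma token MEMBER unsigned % 3 : struct example_flags : mode # example.api.flags.mode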

Two declarations for the same MEMBER token (including token definitions) should have the same type; however, the directive:

#pragma TenDRA++ incompatible member declaration allow

allows declarations with different types, provided these types have the same size and alignment requirements.

12.1.5. Procedure tokens

Procedure, or high-level, tokens are specified in one of three ways:

procedure-token :
	general-procedure
	simple-procedure
	function-procedure

All procedure tokens (except ellipsis functions - see below) lie in the macro namespace. The most general form of procedure token specifies two sets of parameters. The bound parameters are those which are used in encoding the actual TDF output, and the program parameters are those which are specified in the program. The program parameters are expressed in terms of the bound parameters. A program parameter can be an expression token parameter, a statement token parameter, a member token parameter, a procedure token parameter or any type. The bound parameters are deduced from the program parameters by a similar process to that used in template argument deduction.

general-procedure :
	PROC { bound-toks? | prog-pars? } token-introduction 

bound-toks :
	bound-token
	bound-token , bound-toks

bound-token :
	token-introduction token-namespace? identifier

prog-pars :
	program-parameter
	program-parameter , prog-pars

program-parameter :
	EXP identifier
	STATEMENT identifier
	TYPE type-id
	MEMBER type-id : identifier
	PROC identifier
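
As a hedged sketch (the names are hypothetical), a general procedure token in which the type of an expression argument is deduced might be specified as:

#pragma token PROC { TYPE t, EXP rvalue : t : e | EXP e } EXP rvalue : t : example_promote # example.api.promote

Here t and e are the bound parameters, but only the expression e is given in the program; t is deduced from the type of the argument supplied for e.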

The simplest form of a general-procedure is one in which the prog-pars correspond precisely to the bound-toks. In this case the syntax:

simple-procedure :
	PROC ( simple-toks? ) token-introduction

simple-toks :
	simple-token
	simple-token , simple-toks

simple-token :
	token-introduction token-namespace? identifier?

may be used. Note that the parameter names are optional.
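
For example (hypothetical names), a simple procedure token taking a type and a pointer to that type, and yielding an lvalue of that type, might be specified as:

#pragma token PROC ( TYPE t, EXP rvalue : t * : ) EXP lvalue : t : example_deref # example.api.deref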

A function token is specified as follows:

function-procedure :
	FUNC type-id :

where the given type is a function type. This has two effects: firstly a function with the given type is declared; secondly, if the function type has the form:

r ( p1, ...., pn )

a procedure token with sort:

PROC ( EXP rvalue : p1 :, ...., EXP rvalue : pn : ) EXP rvalue : r :

is declared. For ellipsis function types only the function, not the token, is declared. Note that the token behaves like a macro definition of the corresponding function. Unless explicitly enclosed in a linkage specification, a function declared using a FUNC token has C linkage. Note that it is possible for two FUNC tokens to have the same internal name because of function overloading; however, external names must be unique.
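
A hedged example (hypothetical names): the specification:

#pragma token FUNC int ( const char *, const char * ) : example_compare # example.api.compare

declares a function example_compare of the given type with C linkage, together with a procedure token of sort PROC ( EXP rvalue : const char * :, EXP rvalue : const char * : ) EXP rvalue : int :.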

The directive:

#pragma TenDRA incompatible interface declaration allow

can be used to allow incompatible redeclarations of functions declared using FUNC tokens. The token declaration takes precedence.

Certain of the more complex forms of PROC tokens, such as tokens with PROC parameters, have not yet been implemented in either the C or C++ producers.

12.2. Token arguments

As mentioned above, the program parameters for a PROC token are those specified in the program itself. These arguments are expressed as a comma-separated list enclosed in brackets, the form of each argument being determined by the corresponding program parameter.

An EXP argument is an assignment expression. This must be an lvalue for lvalue tokens and a constant expression for const tokens. The argument is converted to the token type (for lvalue tokens this is essentially a conversion between the corresponding reference types). A NAT or INTEGER argument is an integer constant expression. In the former case this must be non-negative.

A STATEMENT argument is a statement. This statement should not contain any labels or any goto or return statements.

A type argument is a type identifier. This must name a type of the correct category for the corresponding token. For example, a VARIETY token requires an integral type.

A member argument must describe the offset of a member or nested member of the given structure or union type. The type of the member should agree with that of the MEMBER token. The general form of a member offset can be described in terms of member selectors and array indexes as follows:

member-offset :
	::? id-expression
	member-offset . ::? id-expression
	member-offset [ constant-expression ]

A PROC argument is an identifier. This identifier must name a PROC token of the appropriate sort.
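
To continue the hedged example_deref sketch from section 12.1.5 (all names are hypothetical), a PROC token is applied like a macro invocation, each argument taking the form required by the corresponding program parameter:

extern int *example_ptr ;

int example_use ( void )
{
	return example_deref ( int, example_ptr ) ;
}

Here the TYPE argument is the type identifier int and the EXP argument is the expression example_ptr, which is converted to the parameter type int *.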

12.3. Defining tokens

Given a token specification of a syntactic object and a normal language definition of the same object (including macro definitions if the token lies in the macro namespace), the producers attempt to unify the two by defining the TDF token in terms of the given definition. Whether the token specification occurs before or after the language definition is immaterial. Unification also takes place in situations where, for example, two types are known to be compatible. By default, multiple consistent explicit token definitions are allowed where the language permits them; this is controlled by the directive:

#pragma TenDRA compatible token allow
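
As a sketch of unification (the names and the implementation types chosen are hypothetical), a target-dependent implementation might provide ordinary language definitions, which the producer unifies with the corresponding token specifications:

#pragma token TYPE example_size_t # example.api.size_t
typedef unsigned long example_size_t ;

#pragma token EXP const : int : EXAMPLE_EOF # example.api.EOF
#define EXAMPLE_EOF ( -1 )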

The default unification behaviour may be modified using the directives:

#pragma TenDRA no_def token-list
#pragma TenDRA define token-list
#pragma TenDRA reject token-list

or equivalently:

#pragma no_def token-list
#pragma define token-list
#pragma ignore token-list

which set the state of the tokens given in token-list. A state of no_def means that no unification is attempted and that any attempt to explicitly define the token results in an error. A state of define means that unification takes place and that the token must be defined somewhere in the translation unit. A state of reject means that unification takes place as normal, but any resulting token definition is discarded and not output to the TDF capsule.
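
For example (using the hypothetical tokens above), the directive:

#pragma define example_size_t EXAMPLE_EOF

sets the state of both tokens to define, so that each must be defined somewhere in the current translation unit.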

If a token with the state define is not defined, then the behaviour depends on the sort of the token. A FUNC token is implicitly defined in terms of its underlying function, such as:

#define f( a1, ...., an )	( f ) ( a1, ...., an )

Other undefined tokens cause an error. This behaviour can be modified using the directives:

#pragma TenDRA++ implicit token definition allow
#pragma TenDRA++ no token definition allow

respectively.

The primitive operations, no_def, define and reject, can also be expressed using the context-sensitive directive:

#pragma TenDRA interface token-list

or equivalently:

#pragma interface token-list

By default this is equivalent to no_def, but may be modified by inclusion using one of the directives:

#pragma TenDRA extend header-name
#pragma TenDRA implement header-name

or equivalently:

#pragma extend interface header-name
#pragma implement interface header-name

These are equivalent to:

#include header-name

except that the form [....] is allowed as a header name. This is equivalent to <....> except that it starts the directory search after the point at which the including file was found, rather than at the start of the path (i.e. it is equivalent to the #include_next directive found in some preprocessors). The effect of the extend directive on the state of the interface directive is as follows:

	no_def -> no_def
	define -> reject
	reject -> reject

The effect of the implement directive is as follows:

	no_def -> define
	define -> define
	reject -> reject

That is to say, an implement directive will cause all the tokens in the given header to be defined and their definitions output. Any tokens included in this header by extend may be defined, but their definitions will not be output. This is precisely the behaviour which is required to ensure that each token is defined exactly once in an API library build.
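
A hedged sketch of how this might be used in an API library build (the header names are hypothetical): the translation unit implementing a particular API header includes it with implement, while the header itself pulls in the token specifications it depends on with extend:

#pragma implement interface <example/stdio.h>

/* within example/stdio.h */
#pragma extend interface [example/stddef.h]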

The lists of tokens in the directives above are expressed in the form:

token-list :
	token-id token-list?
	# preproc-token-list

where a token-id represents an internal token name:

token-id :
	token-namespace? identifier
	type-id . identifier

Note that member tokens are specified by means of both the member name and its parent type; in the elaborated type specifier for a structure or union token, TAG may be used in place of class, struct or union. If the token-id names an overloaded function then the directive is applied to all FUNC tokens of that name. It is possible to be more selective using the # form, which allows the external token name to be specified. Such an entry must be the last in a token-list.
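
For example (hypothetical names), a member token may be named via its parent type, and a particular token may be selected by its external name:

#pragma no_def TAG example_tm . tm_sec
#pragma ignore # example.api.compare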

A related directive has the form:

#pragma TenDRA++ undef token token-list

which undefines all the given tokens so that they are no longer visible.

As noted above, a macro is only considered as a token definition if the token lies in the macro namespace. Tokens which are not in the macro namespace, such as types and members, cannot be defined using macros. Occasionally API implementations do define member selectors as macros in terms of other member selectors. Such a token needs to be explicitly defined using a directive of the form:

#pragma TenDRA member definition type-id : identifier member-offset

where member-offset is as above.
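
For example (a hedged sketch; all names are hypothetical), suppose a MEMBER token is implemented by a nested member of the target structure. Since member tokens are not in the macro namespace, the token must be defined explicitly:

#pragma token MEMBER int : struct example_s : example_m # example.api.s.m

struct example_s {
	struct { int inner ; } outer ;
} ;

#pragma TenDRA member definition struct example_s : example_m outer . inner

This defines the token example_m as the offset of the nested member outer.inner within struct example_s.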