3. Specifying Objects

3.1. Internal and External Names
3.2. Namespaces
3.3. +FUNC
3.4. +EXP and +CONST
3.5. +MACRO
3.6. +STATEMENT
3.7. +DEFINE
3.8. +TYPE
3.9. +TYPEDEF
3.10. +FIELD
3.11. +NAT
3.12. +ENUM
3.13. +TOKEN

The main body of any tspec description of an API consists of a list of object specifications. Most of this section describes the tspec constructs for specifying objects of various kinds, but we start with a few remarks on object names.

3.1. Internal and External Names

Every object specified using tspec has two names. The first is the internal name by which the object is identified within the program; the second is the external name by which the TDF construct (actually a token) representing the object is referred to for the purposes of TDF linking. Internal names are normal C identifiers and obey the normal C namespace rules (indeed, one of the roles of tspec is to keep track of these namespaces). The external token name is constructed by tspec from the internal name.

tspec has two strategies for making up these token names. The first, which is the default, is to use the internal name as the external name (there is one exception to this simple rule, namely field selectors; see section 3.10). The second, which is preferred for standard APIs, is to construct a "unique name" from the API name, the header and the internal name. For example, under the first strategy, the external name of the type FILE specified in c/c89:stdio.h would be FILE, whereas under the second it would be c_c89.stdio.FILE. The unique name strategy may be specified by passing the -u command-line option to tspec or by setting the UNIQUE property to 1 (see section 5.4).
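The two strategies can be sketched in C as follows. This is only an illustration: token_name is a hypothetical helper, not part of tspec, and the mangling rules it applies (a slash in the API name becomes an underscore; the directory part and the .h suffix of the header are dropped) are inferred from the examples in this section rather than taken from the tspec sources. It also ignores the special case of field selectors.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of tspec): writes into `out` the external
   token name for the internal name `iname`, specified in header `header`
   of API `api`.  A non-zero `unique` selects the unique-name strategy. */
void token_name(char *out, size_t n, const char *api,
                const char *header, const char *iname, int unique)
{
    assert(out != NULL && n > 0);

    if (!unique) {
        /* Default strategy: the external name is the internal name. */
        snprintf(out, n, "%s", iname);
        return;
    }

    /* Assumed rule: keep only the last path component of the header ... */
    const char *base = strrchr(header, '/');
    base = base ? base + 1 : header;

    /* ... and drop a trailing ".h". */
    size_t len = strlen(base);
    if (len > 2 && strcmp(base + len - 2, ".h") == 0)
        len -= 2;

    snprintf(out, n, "%s.%.*s.%s", api, (int) len, base, iname);

    /* Assumed rule: '/' in the API name becomes '_'. */
    size_t api_len = strlen(api);
    for (size_t i = 0; i < api_len && i < n; i++)
        if (out[i] == '/')
            out[i] = '_';
}
```

Under these assumptions, token_name ( buf, sizeof buf, "c/c89", "stdio.h", "FILE", 1 ) leaves c_c89.stdio.FILE in buf, and passing 0 as the last argument leaves plain FILE.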

Both strategies involve flattening the several C namespaces into the single TDF token namespace, which can lead to clashes. For example, posix:sys/stat.h specifies both a structure, struct stat, and a procedure, stat. In C the two uses of stat lie in different namespaces and so present no difficulty, but they are mapped onto the same name in the TDF token namespace. To work round such difficulties, tspec allows an alternative external name to be given. When the object is specified, the form:

iname | ename

may be used to specify the internal name iname and the external name ename.

For example, in the stat case above we could distinguish between the two uses as follows:

+TYPE struct stat | struct_stat ;
+FUNC int stat ( const char *, struct stat * ) ;

With simple token names the token corresponding to the structure would be called struct_stat, whereas that corresponding to the procedure would still be stat. With unique token names the names would be posix.stat.struct_stat and posix.stat.stat respectively.
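To see why the clash arises only on the TDF side, recall that C keeps structure tags and ordinary identifiers in separate namespaces. The following minimal sketch uses the hypothetical names struct item and item in place of struct stat and stat, so that it does not depend on any system header:

```c
#include <assert.h>

/* `struct item` lives in the tag namespace ... */
struct item { int size; };

/* ... while the function `item` lives in the ordinary identifier
   namespace, so a C compiler sees no conflict between the two. */
int item(const struct item *p)
{
    assert(p != 0);
    return p->size;
}
```

A C compiler accepts both declarations side by side. A flat token namespace has no such separation, which is why one of the two objects needs a distinct external name.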

Very occasionally it may be necessary to precisely specify an external token name. This can be done using the form:

iname | "ename"

which makes the object iname have external name ename regardless of the naming strategy used.

3.2. Namespaces

Basically the legal identifiers in tspec (for both internal and external names) are the same as those in C - strings of upper and lower case letters, decimal digits or underscores, which do not begin with a decimal digit. However there is a second class of local identifiers - those consisting of a tilde followed by any number of letters, digits or underscores - which are intended to indicate objects which are local to the API description and should not be visible to any application using the API. For example, to express the specification that t is a pointer type, we could say that there is a locally named type to which t is a pointer:

+TYPE ~t ;
+TYPEDEF ~t *t ;
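The two classes of identifier described above can be told apart mechanically. The sketch below is an illustration only: classify_identifier and the id_kind enumeration are hypothetical names, the function reads "any number of letters, digits or underscores" literally (so a lone tilde counts as a local identifier), and it does not handle the version-number suffixes described later in this section.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum id_kind { ID_INVALID, ID_NORMAL, ID_LOCAL };

/* Classifies `s` as a normal C-style identifier, a local (tilde)
   identifier, or neither. */
enum id_kind classify_identifier(const char *s)
{
    assert(s != NULL);

    enum id_kind kind = ID_NORMAL;
    if (*s == '~') {
        /* Local identifier: skip the tilde, then apply the same
           character test as for normal identifiers. */
        kind = ID_LOCAL;
        s++;
    } else if (*s == '\0' || isdigit((unsigned char) *s)) {
        /* Normal identifiers are non-empty and do not start with a digit. */
        return ID_INVALID;
    }

    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char) *s) && *s != '_')
            return ID_INVALID;

    return kind;
}
```

For instance, size_t is classified as ID_NORMAL, ~t as ID_LOCAL and 9lives as ID_INVALID.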

Finally it is possible to cheat the tspec namespaces. It may actually be legal to have two objects of the same name in an API - they may lie in different branches of a conditional compilation, or not be allowed to coexist. To allow for this, tspec allows version numbers, consisting of a decimal point plus a number of digits, to be appended to an identifier name when it is first introduced. These version numbers are purely to tell tspec that this version of the object is different from a previous version with a different version number (or indeed without any version number). If more than one version of an object is specified then which version is retrieved by tspec in any look-up operation is undefined.

3.3. +FUNC

The simplest form of object to specify is a procedure. This is done by means of:

+FUNC prototype ;

where prototype is the full C prototype of the procedure being declared. For example, c/c89:string.h contains:

+FUNC char *strcpy ( char *, const char * ) ;
+FUNC int strcmp ( const char *, const char * ) ;
+FUNC size_t strlen ( const char * ) ;

Strictly speaking, +FUNC means that the procedure may be implemented by a macro, but that there is an underlying library function with the same effect. The exception is for procedures which take a variable number of arguments, such as:

+FUNC int fprintf ( FILE *, const char *, ... ) ;

which cannot be implemented by macros. Occasionally it may be necessary to specify that a procedure is only a library function, and cannot be implemented by a macro. In this case the form:

+FUNC (extern) prototype ;

should be used. Thus:

+FUNC (extern) char *strcpy ( char *, const char * ) ;

would mean that strcpy was only a library function and not a macro.
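The difference between the two forms matters to application code that bypasses the macro. The sketch below shows the situation that plain +FUNC permits: a function together with a macro of the same name and the same effect. The names twice, twice_calls and demo are hypothetical and stand in for a real library procedure.

```c
#include <assert.h>

static int twice_calls = 0;          /* counts real function calls */

/* The underlying "library" function. */
int twice(int x)
{
    twice_calls++;
    return 2 * x;
}

/* A macro of the same name, which a header is free to provide when the
   procedure is specified with plain +FUNC. */
#define twice(x) ((x) + (x))

/* Returns the number of real function calls made. */
int demo(void)
{
    int a = twice(3);         /* expands the macro: no call */
    int b = (twice)(3);       /* parenthesised name: calls the function */
    int (*fp)(int) = twice;   /* taking the address needs the function */
    int c = fp(3);

    assert(a == b && b == c); /* macro and function have the same effect */
    return twice_calls;
}
```

Here demo makes two real calls, since only the first use goes through the macro. With +FUNC (extern), an application could rely on every use being a real call.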

Standard APIs increasingly use prototypes to express their procedures. However, it may still be necessary on occasion to specify procedures declared using old style declarations. In most cases these can easily be transcribed into prototype declarations, but things are not always that simple. For example, xpg3:stdlib.h declares malloc by the old style declaration:

void *malloc ( sz )
size_t sz ;

which is in general different from the prototype:

void *malloc ( size_t ) ;

In the first case the argument is passed as the integral promotion of size_t, whereas in the second it is passed as a size_t. In general we only know that size_t is an unsigned integral type, so we cannot assert that it is its own integral promotion. One possible solution is to use the C to TDF producer's weak prototypes (see reference 3). The form:

+FUNC (weak) void *malloc ( size_t ) ;

means that malloc is a library function returning void * which is declared using an old style declaration with a single argument of type size_t. (For an alternative approach see section 3.9.)
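The point about integral promotion can be checked on any particular implementation. The following sketch assumes a C11 compiler (it relies on _Generic and static_assert); the macro IS_OWN_PROMOTION and the two functions are hypothetical names introduced for illustration. Unary plus applies the integral promotions to its operand, so the type of +(T)0 is the promoted type of T.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if a value of type T still has type T after integral promotion,
   0 otherwise. */
#define IS_OWN_PROMOTION(T) _Generic(+(T)0, T: 1, default: 0)

/* unsigned int is never changed by promotion. */
static_assert(IS_OWN_PROMOTION(unsigned int),
              "unsigned int promotes to itself");

/* unsigned char always promotes to int or unsigned int. */
int uchar_is_own_promotion(void)
{
    return IS_OWN_PROMOTION(unsigned char);
}

/* Implementation dependent: this is exactly what an API description
   cannot assume about size_t. */
int size_t_is_own_promotion(void)
{
    return IS_OWN_PROMOTION(size_t);
}
```

On many current implementations size_t_is_own_promotion returns 1, but nothing in the API guarantees it, which is why the old style declaration and the prototype cannot simply be identified.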

3.4. +EXP and +CONST

Expressions correspond to constants, identities and variables. They are specified by:

+EXP type exp1, ..., expn ;

where type is the base type of the expressions expi as in a normal C declaration list. For example, in c/c89:stdio.h:

+EXP FILE *stdin, *stdout, *stderr ;

specifies three expressions of type FILE *.

By default all expressions are rvalues, that is, values which cannot be assigned to. If an lvalue (assignable) expression is required its type should be qualified using the keyword lvalue. This is an extension to the C type syntax which is used in a similar fashion to const. For example, c/c89:errno.h says that errno is an assignable lvalue of type int. This is expressed as follows:

+EXP lvalue int errno ;

On the other hand, posix:errno.h states that errno is an external value of type int. As with procedures the (extern) qualifier may be used to express this as:

+EXP (extern) int errno ;

Note that this automatically means that errno is an lvalue, so the lvalue qualifier is optional in this case.

If all the expressions are guaranteed to be literal constants then one of the equivalent forms:

+EXP (const) type exp1, ..., expn ;
+CONST type exp1, ..., expn ;

should be used. For example, in c/c89:errno.h we have:

+CONST int EDOM, ERANGE ;
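From the application's side, these specifications determine what portable code may do with the objects. The sketch below, with the hypothetical helpers overflow_detected and error_name, assigns to errno (which needs the lvalue property) and uses EDOM and ERANGE as case labels (which needs them to be constants):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 1 if converting `digits` with strtol overflows a long. */
int overflow_detected(const char *digits)
{
    assert(digits != NULL);
    errno = 0;                        /* assignment: errno must be an lvalue */
    (void) strtol(digits, NULL, 10);
    return errno == ERANGE;
}

/* Case labels must be constant expressions, so this compiles only
   because EDOM and ERANGE are constants. */
const char *error_name(int err)
{
    switch (err) {
    case EDOM:   return "EDOM";
    case ERANGE: return "ERANGE";
    default:     return "other";
    }
}
```

Had errno been specified as a plain rvalue +EXP, the assignment in overflow_detected would not be guaranteed to be valid.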

3.5. +MACRO

The +MACRO construct is similar in form to the +FUNC construct, except that it means that only a macro exists, and no underlying library function. For example, in xpg3:ctype.h we have:

+MACRO int _toupper ( int ) ;
+MACRO int _tolower ( int ) ;

since these are explicitly stated to be macros and not functions. Of course the (extern) qualifier cannot be used with +MACRO.

One thing which macros can do which functions cannot is to return assignable values or to assign to their arguments. Thus it is legitimate for +MACRO constructs to have their return type or argument types qualified by lvalue, whereas this is not allowed for +FUNC constructs. For example, in svid3:curses.h, a macro getyx is specified which takes a pointer to a window and two integer variables and assigns the cursor position of the window to those variables. This may be expressed by:

+MACRO void getyx ( WINDOW *win, lvalue int y, lvalue int x ) ;
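A macro of this shape might be implemented along the following lines. This is a sketch only: struct window is a hypothetical stand-in for the curses WINDOW type, and get_yx and read_cursor are illustrative names, not the svid3 definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for the curses WINDOW type. */
struct window { int cur_y, cur_x; };

/* Assigns to its second and third arguments, which a function taking
   int parameters could not do; hence the lvalue qualifiers. */
#define get_yx(win, y, x) \
    ((void) ((y) = (win)->cur_y, (x) = (win)->cur_x))

/* Copies the cursor position of `w` into out[0] (row) and out[1] (column). */
void read_cursor(const struct window *w, int out[2])
{
    int y, x;

    assert(w != 0);
    get_yx(w, y, x);      /* y and x are passed as variables, not values */
    out[0] = y;
    out[1] = x;
}
```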

3.6. +STATEMENT

The +STATEMENT construct is very similar to the +MACRO construct except that, instead of being a C expression, it is a C statement (i.e. something ending in a semicolon). As such it does not have a return type and so takes one of the forms:

+STATEMENT stmt ;
+STATEMENT stmt ( arg1, ..., argn ) ;

depending on whether or not it takes any arguments. (A +MACRO without any arguments is an +EXP, so the no argument form does not exist for +MACRO.) As with +MACRO, the argument types argi can be qualified using lvalue.
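As an illustration, consider a hypothetical statement-like macro swap_int, which exchanges two int variables. Following the form above, it could presumably be specified as +STATEMENT swap_int ( lvalue int, lvalue int ) ; and a C implementation might look like the sketch below. The do ... while (0) wrapper is a common C idiom for making a macro behave as a single statement; it is an assumption of this sketch, not something tspec requires.

```c
#include <assert.h>

/* Hypothetical statement macro: it yields no value and assigns to both
   of its arguments. */
#define swap_int(a, b) \
    do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Orders two integers so that *lo <= *hi. */
void sort2(int *lo, int *hi)
{
    assert(lo != 0 && hi != 0);
    if (*lo > *hi)
        swap_int(*lo, *hi);   /* used as a statement, ending in a semicolon */
}
```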

3.7. +DEFINE

It is possible to insert macro definitions directly into tspec using the +DEFINE construct. This has two forms depending on whether the macro has arguments:

+DEFINE name %% text %% ;
+DEFINE name ( arg1, ..., argn ) %% text %% ;

These translate directly into:

#define name text
#define name( arg1, ..., argn ) text

The macro definition, text, consists of any string of characters delimited by double percents. If text is a simple number or a single identifier then the double percents may be omitted. Thus in c/c89:stddef.h.ts we have:

+DEFINE NULL 0 ;

3.8. +TYPE

New types may be specified using the +TYPE construct. This has the form:

+TYPE type1, ..., typen ;

where each typei has one of the forms:

Type	Description
`name`	A general type (about which we know nothing more)
`(struct) name`	A structure type
`(union) name`	A union type
`struct name`	A structure tag
`union name`	A union tag
`(int) name`	An integral type
`(signed) name`	A signed integral type
`(unsigned) name`	An unsigned integral type
`(float) name`	A floating type
`(arith) name`	An arithmetic (integral or floating) type
`(scalar) name`	A scalar (arithmetic or pointer) type

To make clear the distinction between structure types and structure tags, if we have in C:

typedef struct tag { int x, y ; } type ;

then type is a structure type and tag is a structure tag.

For example, in c/c89 we have:

+TYPE FILE ;
+TYPE struct lconv ;
+TYPE (struct) div_t ;
+TYPE (signed) ptrdiff_t ;
+TYPE (unsigned) size_t ;
+TYPE (arith) time_t ;
+TYPE (int) wchar_t ;

3.9. +TYPEDEF

It is also possible to define new types in terms of existing types. This is done using the +TYPEDEF construct, which is identical in form to the C typedef construct. This construct can be used to define pointer, procedure and array types, but not compound structure and union types. For these see section 3.10 below.

For example, in xpg3:search.h we have:

+TYPE struct entry ;
+TYPEDEF struct entry ENTRY ;

There are a couple of special forms. To understand the first, note that C uses void function returns for two purposes: firstly to indicate that the function does not return a value, and secondly to indicate that the function does not return at all (exit is an example of this second usage). In TDF terms, in the first case the function returns TOP, and in the second it returns BOTTOM. tspec allows types to be introduced which have the second meaning. For example, we could have:

+TYPEDEF ~special ( "bottom" ) ~bottom ;
+FUNC ~bottom exit ( int ) ;

meaning that the local type ~bottom is the BOTTOM form of void. The procedure exit, which never returns, can then be declared to return ~bottom rather than void. Other such special types may be added in future.

The second special form:

+TYPEDEF ~promote ( x ) y ;

means that y is an integral type which is the integral promotion of x. x must have previously been declared as an integral type. This gives an alternative approach to the old style procedure declaration problem described in section 3.3. Recall that:

void *malloc ( sz )
size_t sz ;

means that malloc has one argument which is passed as the integral promotion of size_t. This could be expressed as follows:

+TYPEDEF ~promote ( size_t ) ~size_t ;
+FUNC void *malloc ( ~size_t ) ;

introducing a local type to stand for the integral promotion of size_t.

3.10. +FIELD

Having specified a structure or union type, or a structure or union tag, we may wish to specify certain fields of this structure or union. This is done using the +FIELD construct. This takes the form:

+FIELD type {
ftype field1, ..., fieldn ;
....
} ;

where type is the structure or union type and field1, ..., fieldn are field selectors derived from the base type ftype as in a normal C structure definition. type may have one of the forms:

Type	Description
`(struct) name`	A structure type
`(union) name`	A union type
`struct name`	A structure tag
`union name`	A union tag
`name`	A previously declared structure or union type

Except in the final case (where it is not clear if type is a structure or a union), it is not necessary to have previously introduced type using a +TYPE construct - this declaration is implicit in the +FIELD construct.

For example, in c/c89:time.h we have:

+FIELD struct tm {
int tm_sec ;
int tm_min ;
int tm_hour ;
int tm_mday ;
int tm_mon ;
int tm_year ;
int tm_wday ;
int tm_yday ;
int tm_isdst ;
} ;

meaning that there exists a structure with tag tm with various fields of type int. Any implementation must have these corresponding fields, but they need not be in the given order, nor do they have to comprise the whole structure.
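One practical consequence for application code is that such a structure should be filled in by field name rather than by a positional initialiser, and that any further implementation-specific fields should be cleared first. A minimal sketch, in which make_date is a hypothetical helper:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Builds a struct tm for the given calendar date using only the fields
   the API guarantees, without assuming their order or that they are the
   only fields present. */
struct tm make_date(int year, int month, int day)
{
    struct tm t;

    assert(month >= 1 && month <= 12);
    memset(&t, 0, sizeof t);   /* clears any additional fields too */
    t.tm_year = year - 1900;   /* years since 1900 */
    t.tm_mon = month - 1;      /* months since January */
    t.tm_mday = day;
    t.tm_isdst = -1;           /* daylight saving status unknown */
    return t;
}
```

A positional initialiser such as { 0, 0, 0, 29, 1, 100 } would silently depend on a field order which the specification deliberately leaves open.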

As was mentioned above (in section 3.1), field selectors form a special case when tspec is making up external token names. For example, in the case above, the token name for the tm_sec field is either tm.tm_sec or c_c89.time.tm.tm_sec, depending on whether or not unique token names are used.

It is possible to have several +FIELD constructs referring to the same structure or union. For example, posix:dirent.h declares a structure with tag dirent and one field, d_name, of this structure. xpg3:dirent.h extends this by adding another field, d_ino.

There is a second form of the +FIELD construct which has more in common with the +TYPEDEF construct. The form:

+FIELD type := {
ftype field1, ..., fieldn ;
....
} ;

means that the type type is defined to be exactly the given structure or union type, with precisely the given fields in the given order.

3.11. +NAT

In the example given in section 3.10, posix:dirent.h specifies that the d_name field of struct dirent is a fixed-size array of characters, but that the size of this array is implementation dependent. We therefore have to introduce a value to stand for the size of this array using the +NAT construct. This has the form:

+NAT nat1, ..., natn ;

where nat1, ..., natn are the array sizes to be declared. The example thus becomes:

+NAT ~dirent_d_name_size ;
+FIELD struct dirent {
char d_name [ ~dirent_d_name_size ] ;
} ;

Note the use of a local variable to stand for a value, namely the array size, which is invisible to the user (see section 3.2).

As another example, in c/c89:setjmp.h we know that jmp_buf is an array type. We therefore introduce objects to stand for the type which it is an array of and for the size of the array, and define jmp_buf by a +TYPEDEF command:

+NAT ~jmp_buf_size ;
+TYPE ~jmp_buf_elt ;
+TYPEDEF ~jmp_buf_elt jmp_buf [ ~jmp_buf_size ] ;

Again, local variables have been used for the introduced objects.

3.12. +ENUM

Currently tspec only has limited support for enumeration types. A +ENUM construct is translated directly into a C definition of an enumeration type. The +ENUM construct has the form:

+ENUM etype := {
entry,
....
} ;

where etype is the enumeration type being defined - either a type name or enum etag for some enumeration tag etag - and each entry has one of the forms:

name
name = number

as in a C enumeration type. For example, in xpg3:search.h we have:

+ENUM ACTION := { FIND, ENTER } ;
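Since the construct is translated directly into C, the specification above should correspond to something like the enumeration definition in the sketch below. The exact text emitted by tspec is not shown in this section, so the typedef form is an assumption, and action_name is a hypothetical helper added only to exercise the type.

```c
#include <assert.h>

/* Assumed C counterpart of: +ENUM ACTION := { FIND, ENTER } ; */
typedef enum { FIND, ENTER } ACTION;

/* Maps an ACTION value to its name. */
const char *action_name(ACTION a)
{
    switch (a) {
    case FIND:  return "FIND";
    case ENTER: return "ENTER";
    }
    assert(!"unknown ACTION");
    return "?";
}
```

As in any C enumeration without explicit values, FIND is 0 and ENTER is 1; the name = number form of an entry would override this.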

3.13. +TOKEN

As was mentioned in section 1, the #pragma token syntax is highly complex, and the token descriptions output by tspec form only a small subset of those possible. It is possible to directly access the full #pragma token syntax from tspec using the construct:

+TOKEN name %% text %% ;

where the token name is defined by the sequence of characters text, which is delimited by double percents. This is turned into the token description:

#pragma token text name #

No checks are applied to text. A more sophisticated mechanism for defining complex tokens may be introduced in a later version of tspec.

For example, in c/c89:stdarg.h a token va_arg is defined which takes a variable of type va_list and a type t and returns a value of type t. This is given by:

+TOKEN va_arg %% PROC ( EXP lvalue : va_list : e, TYPE t ) EXP rvalue : t : %% ;
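The translation rule stated above is a plain textual substitution, which the following sketch imitates. The function token_pragma is hypothetical and not part of tspec, and it assumes that blanks adjacent to the double percent delimiters are not significant.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "#pragma token text name #" into `out` and returns the length
   that snprintf reports. */
int token_pragma(char *out, size_t n, const char *name, const char *text)
{
    assert(out != NULL && name != NULL && text != NULL);

    /* Assumption: strip blanks next to the %% delimiters. */
    while (*text == ' ')
        text++;
    size_t len = strlen(text);
    while (len > 0 && text[len - 1] == ' ')
        len--;

    return snprintf(out, n, "#pragma token %.*s %s #", (int) len, text, name);
}
```

Applied to the va_arg example, this yields the line #pragma token PROC ( EXP lvalue : va_list : e, TYPE t ) EXP rvalue : t : va_arg #.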

See reference 3 for more details on the token syntax.