Name
tdfc2dump — tdfc2 C/C++ symbol table dump format
Overview
tdfc2 produces an extra output file, called a dump output file, for each translation unit processed. This file is in the form given by the symbol table output specification in Annex E, and contains information about the objects declared, defined or used within an application. Each object encountered during processing is assigned a unique reference number allowing uses of the object to be traced back to the declaration and definition of the object.
In the default mode only external declaration and definition information is written to each dump file. The file to be used as the symbol table output file, and the information to be included in it, can be specified by passing flags with the -d command-line option to tcpplus; these flags are documented by tdfc2.
The symbol table dump provides a method whereby third party tools can interface with the C and C++ producers. The producer outputs information on the identifiers declared within a source file, their uses etc. into a file which can then be post-processed by a separate tool. Any error messages and warnings can also be included in this file, allowing more sophisticated error presentation tools to be written.
The dump information is currently used for four main purposes: detecting included header files from which nothing is used within the translation unit; production of lint-like error output; API usage analysis and type checking between translation units.
The format of the dump file is described below; a summary of the syntax is given in tdfc2dump.
Lexical elements
A symbol table dump file consists of a sequence of characters giving information on identifiers, errors etc. arising from a translation unit. The fundamental lexical tokens are a number, consisting of a sequence of decimal digits, and a string, consisting of a sequence of characters enclosed in angle brackets. A string can have one of two forms:
    string :
        <characters>
        &number<characters>
In the first form, the characters are terminated by the first > character encountered. In the second form, the number of characters is given by the preceding number. No white space is allowed either before or after the number. To aid parsers, the C++ producer always uses the second form for strings containing more than 100 characters. There are no escape characters in strings; the characters can contain any characters, including newlines and #, except that the first form cannot contain a > character.
Space, tab and newline characters are white space. Comments begin with # and run to the end of the line. Comments are treated as white space. All other characters are treated as distinct lexical tokens.
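The lexical rules above can be sketched as a small tokenizer. This is an illustrative sketch, not the producer's own code: the function name `tokenize` and the token tuples are hypothetical, and it assumes that the second string form is closed by a `>` after the counted characters, which the grammar above does not state explicitly.

```python
DIGITS = "0123456789"

def tokenize(text):
    """Yield ("number", int), ("string", str) and ("char", str) tokens."""
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t\n":                      # white space
            i += 1
        elif c == "#":                        # comment: skip to end of line
            while i < n and text[i] != "\n":
                i += 1
        elif c in DIGITS:                     # number: run of decimal digits
            j = i
            while j < n and text[j] in DIGITS:
                j += 1
            yield ("number", int(text[i:j]))
            i = j
        elif c == "<":                        # first form: ends at first '>'
            end = text.index(">", i + 1)
            yield ("string", text[i + 1:end])
            i = end + 1
        elif c == "&":                        # second form: &number<characters
            j = i + 1
            while j < n and text[j] in DIGITS:
                j += 1
            if j == i + 1 or j >= n or text[j] != "<":
                raise ValueError(f"malformed counted string at offset {i}")
            length = int(text[i + 1:j])
            start = j + 1
            if start + length > n:
                raise ValueError(f"counted string at offset {i} is truncated")
            yield ("string", text[start:start + length])
            i = start + length
            if i < n and text[i] == ">":      # assumption: a closing '>' follows
                i += 1
        else:                                 # any other character is a token
            yield ("char", c)
            i += 1
```

Because strings are consumed before comments are recognised, a `#` or `>` inside a counted string does not confuse the scanner.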
Overall syntax
A symbol table dump file takes the form of a list of commands of various kinds conveying information on the analysed file. This can be represented as follows:
    dump-file :
        command-list?

    command-list :
        command command-list?

    command :
        version-command
        identifier-command
        scope-command
        override-command
        base-command
        api-command
        template-command
        promotion-command
        error-command
        path-command
        file-command
        include-command
        string-command
The various kinds of command are discussed below. The first command in the dump file should be of the form:
    version-command :
        V number number string
where the two numbers give the version of the dump file format (the version described here is 1.1 so both numbers should be 1) and the string gives the language being represented, for example, <C++>.
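A reader would typically check the version-command before parsing anything else. The following is a minimal sketch under stated assumptions: the helper names `read_version` and `is_supported` are hypothetical, the command is assumed to be the very first thing in the text (no leading comment), and only the first string form is handled.

```python
import re

# V number number string, with the string in its <characters> form.
VERSION_RE = re.compile(r"\s*V\s*(\d+)\s+(\d+)\s*<([^>]*)>")

def read_version(text):
    """Return (major, minor, language) from the leading version-command."""
    m = VERSION_RE.match(text)
    if m is None:
        raise ValueError("dump file does not start with a version-command")
    return int(m.group(1)), int(m.group(2)), m.group(3)

def is_supported(text):
    """True if the dump file declares format version 1.1."""
    major, minor, _language = read_version(text)
    return (major, minor) == (1, 1)
```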
File locations
A location within a source file can be specified using three numbers and two strings. These give, respectively, the column number, the line number taking #line directives into account, the line number not taking #line directives into account, the file name taking #line directives into account, and the file name not taking #line directives into account. Any or all of the trailing elements can be replaced by * to indicate that they have not changed relative to the last location given. Note that for the two line numbers, unchanged means that the difference of the line numbers, taking #line directives into account or not, is unchanged. Thus:
    location :
        number number number string string
        number number number string *
        number number number *
        number number *
        number *
        *
Note that there is a concept of the current file location, relative to which other locations are given. The initial value of the current file location is undefined. Unless otherwise stated, all location elements update the current file location.
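The relative encoding can be resolved against the current file location as sketched below. This is one reading of the rules above, not a definitive implementation: the `Location` class and `decode_location` function are hypothetical names, the elements are assumed to be already tokenised (with `None` standing for `*`), and an elided second line number is reconstructed by keeping the difference between the two line numbers constant.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    column: int
    line: int        # taking #line directives into account
    real_line: int   # not taking #line directives into account
    file: str        # taking #line directives into account
    real_file: str   # not taking #line directives into account

def decode_location(fields, current=None):
    """Resolve a tokenised location; None in last position stands for '*'."""
    given = list(fields)
    elided = bool(given) and given[-1] is None
    if elided:
        given.pop()
    well_formed = len(given) <= 4 if elided else len(given) == 5
    if None in given or not well_formed:
        raise ValueError("malformed location")
    if not elided:
        return Location(*given)
    if current is None:
        raise ValueError("'*' used while the current location is undefined")
    column = given[0] if len(given) > 0 else current.column
    line = given[1] if len(given) > 1 else current.line
    if len(given) > 2:
        real_line = given[2]
    else:
        # "Unchanged" means the difference between the two line numbers
        # is the same as in the previous location.
        real_line = line - (current.line - current.real_line)
    file = given[3] if len(given) > 3 else current.file
    return Location(column, line, real_line, file, current.real_file)
```

A caller would store the returned value as the new current location, except where the format says a location does not update it.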
Identifier names
Each identifier is represented in the symbol table dump by a unique number. The number representing an identifier is introduced in the first declaration or use of that identifier and thereafter the number alone is used to denote the identifier:
    identifier :
        number = identifier-name access? scope-identifier
        number
The identifier name is given by:
    identifier-name :
        string
        C type
        D type
        O string
        T type
denoting respectively, a simple identifier name, a constructor for a type, a destructor for a type, an overloaded operator function name, and a conversion function name. The empty string is used for anonymous identifiers.
The optional identifier access is given by:
    access :
        N
        B
        P
denoting public, protected and private respectively. An absent access is equivalent to public. Note that all identifiers, not just class members, can have access specifiers; however the access of a non-member is always public.
The scope (i.e. class, namespace, block etc.) in which an identifier is declared is given by:
    scope-identifier :
        identifier
        *
denoting either a named or an unnamed scope.
Identifier uses
Each declaration or use of an identifier is represented by a command of the form:
    identifier-command :
        D identifier-info type-info
        M identifier-info type-info
        T identifier-info type-info
        Q identifier-info
        U identifier-info
        L identifier-info
        C identifier-info
        W identifier-info type-info
where:
    identifier-info :
        identifier-key location identifier
gives the kind of identifier being declared or used, the location of the declaration or use, and the number associated with the identifier. Each declaration may, depending on the identifier-key, associate various type-info with the identifier, giving its type etc.
The various kinds of identifier-command are described below. Any can be preceded by I to indicate an implicit declaration or use. D denotes a definition. M (make) denotes a declaration. T denotes a tentative definition (C only). Q denotes the end of a definition, for those identifiers such as classes and functions whose definitions may be spread over several lines. U denotes an undefine operation (such as #undef for macro identifiers). C denotes a call to a function identifier; L (load) denotes other identifier uses. Finally W denotes implicit type information such as the C producer gleans from its weak prototype analysis (see C/C++ Checker Reference Manual).
The various identifier-keys and their associated type-info fields are given by the following table:
Key | Type information | Description |
---|---|---|
K | * | keyword |
MO | sort | object macro |
MF | sort | function macro |
MB | sort | built-in macro |
TC | type | class tag |
TS | type | structure tag |
TU | type | union tag |
TE | type | enumeration tag |
TA | type | typedef name |
NN | * | namespace name |
NA | scope-identifier | namespace alias |
VA | type | automatic variable |
VP | type | function parameter |
VE | type | extern variable |
VS | type | static variable |
FE | type identifier? | extern function |
FS | type identifier? | static function |
FB | type identifier? | built-in operator function |
CF | type identifier? | member function |
CS | type identifier? | static member function |
CV | type identifier? | virtual member function |
CM | type | data member |
CD | type | static data member |
E | type | enumerator |
L | * | label |
XO | sort | object token |
XF | sort | procedure token |
XP | sort | token parameter |
XT | sort | template parameter |
The function identifier keys can optionally be followed by C, indicating that the function has C linkage, and I, indicating that the function is inline. By default, functions declared in a C++ dump file have C++ linkage and functions declared in a C dump file have C linkage. The optional identifier which forms part of the type-info of these functions is used to form linked lists of overloaded functions.
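The table and the function-key suffixes can be turned into a small lookup, sketched below. The names `KEYS`, `FUNCTION_KEYS` and `decode_key` are hypothetical, and the sketch assumes the key and its suffix letters have already been joined into one text fragment with no white space between them.

```python
KEYS = {
    "K": "keyword", "MO": "object macro", "MF": "function macro",
    "MB": "built-in macro", "TC": "class tag", "TS": "structure tag",
    "TU": "union tag", "TE": "enumeration tag", "TA": "typedef name",
    "NN": "namespace name", "NA": "namespace alias",
    "VA": "automatic variable", "VP": "function parameter",
    "VE": "extern variable", "VS": "static variable",
    "FE": "extern function", "FS": "static function",
    "FB": "built-in operator function", "CF": "member function",
    "CS": "static member function", "CV": "virtual member function",
    "CM": "data member", "CD": "static data member", "E": "enumerator",
    "L": "label", "XO": "object token", "XF": "procedure token",
    "XP": "token parameter", "XT": "template parameter",
}
# Only these keys may carry the C (C linkage) and I (inline) suffixes.
FUNCTION_KEYS = {"FE", "FS", "FB", "CF", "CS", "CV"}

def decode_key(text):
    """Split an identifier-key into its base key and function-key flags."""
    base = text[:2] if text[:2] in KEYS else text[:1]
    if base not in KEYS:
        raise ValueError(f"unknown identifier-key: {text!r}")
    suffix = text[len(base):]
    if suffix and base not in FUNCTION_KEYS:
        raise ValueError(f"{base} does not take a function-key suffix")
    if any(ch not in "CI" for ch in suffix):
        raise ValueError(f"bad function-key suffix: {suffix!r}")
    return {
        "key": base,
        "description": KEYS[base],
        "explicit_c_linkage": "C" in suffix,  # False means language default
        "inline": "I" in suffix,
    }
```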
Identifier scopes
Each identifier belongs to a scope, called its parent scope, in which it is declared. For example, the parent of a member of a class is the class itself. This information is expressed in an identifier declaration using a scope-identifier. In addition to the obvious scopes such as classes and namespaces, there are other scopes such as blocks in function definitions. It is possible to introduce dummy identifiers to name such scopes. The parent of such a dummy identifier will be the enclosing scope identifier, so these dummy identifiers naturally represent the block structure. The parent of the top-level block in a function definition can be considered to be the function itself.
Information on the start and end of such scopes is given by:
    scope-command :
        SS scope-key location identifier
        SE scope-key location identifier
where:
    scope-key :
        N
        S
        B
        D
        H
        CT
        CF
        CC
gives the kind of scope involved: a namespace, a class, a block, some other declarative scope, a declaration block (see below), a true conditional scope, a false conditional scope or a target dependent conditional scope.
A declaration block is a sequence of declarations enclosed in directives of the form:
    #pragma TenDRA declaration block identifier begin
    ....
    #pragma TenDRA declaration block end
This allows the sequence of declarations to be associated with the given identifier in the symbol dump file. This technique is used in the API description files to aid analysis tools in determining which declarations are part of the API.
Other identifier information
Other information associated with an identifier may be expressed using other dump commands. For example:
    override-command :
        O identifier identifier
is used to express the fact that the two identifiers are virtual member functions, the first of which overrides the second.
The command:
    base-command :
        B identifier-key identifier base-graph

    base-graph :
        base-class
        base-class ( base-list )

    base-class :
        number = V? access? type-name
        number :

    base-list :
        base-graph base-list?
associates a base class graph with a class identifier. Any class which does not have an associated base-command can be assumed to have no base classes. Each node in the graph is a type-name with an associated list of base classes. A V is used to indicate a virtual base class. Each node is numbered; duplicate numbers are used to indicate bases identified via the virtual base class structure. Any base class can then be referred to as:
    base-number :
        number : type-name
indicating the base class with the given number in the given class.
The command:
    api-command :
        X identifier-key identifier string
associates the external token name given by the string with the given tokenised identifier.
The command:
    template-command :
        Z identifier-key identifier token-application specialise-info
is used to introduce an identifier corresponding to an instance of a template, token-application. This instance may correspond to a specialisation of the primary template; this information is represented by:
    specialise-info :
        identifier
        token-application
        *
where * indicates a non-specialised instance.
Types
The built-in types are represented in the symbol table dump as follows:
Type | Encoding | Type | Encoding |
---|---|---|---|
char | c | float | f |
signed char | Sc | double | d |
unsigned char | Uc | long double | r |
signed short | s | void | v |
unsigned short | Us | (bottom) | u |
signed int | i | bool | b |
unsigned int | Ui | ptrdiff_t | y |
signed long | l | size_t | z |
unsigned long | Ul | wchar_t | w |
signed long long | x | – | – |
unsigned long long | Ux | – | – |
Named types (classes, enumeration types etc.) can be represented by the corresponding identifier or token application:
    type-name :
        identifier
        token-application
Composite and qualified types are represented in terms of their subtypes as follows:
Type | Encoding |
---|---|
const type | C type |
volatile type | V type |
pointer type | P type |
reference type | R type |
pointer to member type | M type-name : type |
function type | F type parameter-types |
array type | A nat? : type |
bitfield type | B nat : type |
template type | t parameter-list? : type |
promotion type | p type |
arithmetic type | a type : type |
integer literal type | n lit-base? lit-suffix? |
weak function prototype (C only) | W type parameter-types |
weak parameter type (C only) | q type |
Other types can be represented by their textual representation using the form Q string, or by *, indicating an unknown type.
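To illustrate how the prefix encodings nest, the sketch below decodes a small subset of type encodings into English. It is only a sketch: the names `BUILTIN`, `PREFIX` and `describe` are hypothetical, it covers only the built-in types, the single-subtype prefixes C, V, P, R and p, and the unknown type `*`, and it assumes white space between the letters is insignificant.

```python
BUILTIN = {
    "c": "char", "Sc": "signed char", "Uc": "unsigned char",
    "s": "signed short", "Us": "unsigned short",
    "i": "signed int", "Ui": "unsigned int",
    "l": "signed long", "Ul": "unsigned long",
    "x": "signed long long", "Ux": "unsigned long long",
    "f": "float", "d": "double", "r": "long double", "v": "void",
    "u": "bottom", "b": "bool", "y": "ptrdiff_t", "z": "size_t",
    "w": "wchar_t",
}
PREFIX = {
    "C": "const", "V": "volatile", "P": "pointer to",
    "R": "reference to", "p": "promotion of",
}

def _parse(code, pos):
    """Return (description, next position) for the type starting at pos."""
    for width in (2, 1):                      # try two-letter codes first
        tok = code[pos:pos + width]
        if len(tok) == width and tok in BUILTIN:
            return BUILTIN[tok], pos + width
    head = code[pos:pos + 1]
    if head in PREFIX:                        # prefix applied to a subtype
        inner, end = _parse(code, pos + 1)
        return f"{PREFIX[head]} {inner}", end
    if head == "*":
        return "unknown type", pos + 1
    raise ValueError(f"unsupported type encoding at offset {pos}: {code!r}")

def describe(code):
    """Describe a type encoding from the supported subset in words."""
    code = "".join(code.split())              # assumption: spaces are optional
    text, end = _parse(code, 0)
    if end != len(code):
        raise ValueError(f"trailing characters in type encoding: {code!r}")
    return text
```

For example, `describe("PCc")` reads the encoding left to right as a pointer to a const char.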
The parameter types for a function type are represented as follows:
    parameter-types :
        : exception-spec? func-qualifier? :
        . exception-spec? func-qualifier? :
        . exception-spec? func-qualifier? .
        , type parameter-types
where the :: form indicates that there are no further parameters, the .: form indicates that the parameters are terminated by an ellipsis, and the .. form indicates that no information is available on the further parameters (this can only happen with non-prototyped functions in C). The function qualifiers are given by:
    func-qualifier :
        C func-qualifier?
        V func-qualifier?
representing const and volatile member functions. The function exception specifier is given by:
    exception-spec :
        ( exception-list? )

    exception-list :
        type
        type , exception-list
with an absent exception specifier, as in C++, indicating that any exception may be thrown.
Array and bitfield sizes are represented as follows:
    nat :
        + number
        - number
        identifier
        token-application
        string
where a string is used to hold a textual representation of complex values.
Template types are represented by a list of template parameters, which will have previously been declared using the XT identifier key, followed by the underlying type expressed in terms of these parameters. The parameters are represented as follows:
    parameter-list :
        identifier
        identifier , parameter-list
Integer literal types are represented by the value of the literal followed by a representation of the literal base and suffix. These are given by:
    lit-base :
        O
        X
representing octal and hexadecimal literals respectively (decimal is the default), and:
    lit-suffix :
        U
        l
        Ul
        x
        Ux
representing the U, L, UL, LL and ULL suffixes respectively.
Target dependent integral promotion types are represented using p, so for example the promotion of unsigned short is represented as pUs. Information on the other cases, where the promotion type is known, can be given in a command of the form:
    promotion-command :
        P type : type
Thus the fact that the promotion of short is int would be expressed by the command Ps:i.
Sorts
A sort in the symbol table dump corresponds to the sort of a token declared using the #pragma token syntax. Expression tokens are represented as follows:
    expression-sort :
        ZEL type
        ZER type
        ZEC type
        ZN
corresponding to lvalue, rvalue and const EXP tokens of the given type, and NAT or INTEGER tokens, respectively. Statement tokens are represented by:
    statement-sort :
        ZS
Type tokens are represented as follows:
    type-sort :
        ZTO
        ZTI
        ZTF
        ZTA
        ZTP
        ZTS
        ZTU
corresponding to TYPE, VARIETY, FLOAT, ARITHMETIC, SCALAR, STRUCT or CLASS, and UNION tokens respectively. There are corresponding TAG forms:
    tag-type-sort :
        ZTTS
        ZTTU
Member tokens are represented using:
    member-sort :
        ZM type : type-name
where the first type gives the member type and the second gives the parent structure or union type.
Procedure tokens can be represented using:
    proc-sort :
        ZPG parameter-list? ; parameter-list? : sort
        ZPS parameter-list? : sort
The first form corresponds to the more general form of PROC token, that expressed using { .... | .... }, which has separate lists of bound and program parameters. These token parameters will have previously been declared using the XP identifier key. The second form corresponds to the case where the bound and program parameter lists are equal, that expressed as a PROC token using ( .... ). A more specialised version of this second form is a FUNC token, which is represented as:
    func-sort :
        ZF type
As noted above, template parameters are represented by a sort. Template type parameters are represented by ZTO, while template expression parameters are represented by ZEC (recall that such parameters are always constant expressions). The remaining case, template template parameters, can be represented as:
    template-sort :
        ZTt parameter-list? :
Finally, the number of parameters in a macro definition is represented by a sort of the form:
    macro-sort :
        ZUO
        ZUF number
corresponding to an object-like macro and a function-like macro with the given number of parameters, respectively.
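Because several sort codes share prefixes (for example ZTS, ZTTS and ZTt), a reader has to match the longest code first. The sketch below shows one way to do that; the names `SORT_CODES` and `sort_kind` are hypothetical, the descriptions paraphrase the text above, and the remainder of the sort is returned unparsed.

```python
SORT_CODES = {
    "ZEL": "lvalue EXP token", "ZER": "rvalue EXP token",
    "ZEC": "const EXP token", "ZN": "NAT or INTEGER token",
    "ZS": "statement token",
    "ZTO": "TYPE token", "ZTI": "VARIETY token", "ZTF": "FLOAT token",
    "ZTA": "ARITHMETIC token", "ZTP": "SCALAR token",
    "ZTS": "STRUCT or CLASS token", "ZTU": "UNION token",
    "ZTTS": "STRUCT or CLASS TAG token", "ZTTU": "UNION TAG token",
    "ZM": "member token",
    "ZPG": "PROC token, general { .... | .... } form",
    "ZPS": "PROC token, simple ( .... ) form",
    "ZF": "FUNC token",
    "ZTt": "template template parameter",
    "ZUO": "object-like macro", "ZUF": "function-like macro",
}
# Longest codes first, so that ZTTS is not mistaken for a shorter code.
_BY_LENGTH = sorted(SORT_CODES, key=len, reverse=True)

def sort_kind(text):
    """Return (code, description, remaining text) for a sort encoding."""
    for code in _BY_LENGTH:
        if text.startswith(code):
            return code, SORT_CODES[code], text[len(code):].strip()
    raise ValueError(f"not a sort encoding: {text!r}")
```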
Token applications
Given an identifier representing a PROC token or a template, an application of that token or an instance of that template can be represented using:
    token-application :
        T identifier , token-argument-list :
where the token or template arguments are given by:
    token-argument-list :
        token-argument
        token-argument , token-argument-list
Note that the case where there are no arguments is generally just represented by identifier; this case is specified separately in the rest of the grammar.
A token-argument can represent a value of any of the sorts listed above: expressions, integer constants, statements, types, members, functions and templates. These are given respectively by:
    token-argument :
        E expression
        N nat
        S statement
        T type
        M member
        F identifier
        C identifier
where:
    expression :
        nat

    statement :
        expression

    member :
        identifier
        string
Errors
By default the error messages generated by the checker are written in a simple ASCII form to stderr. If, instead, the errors are written to the dump file using the -sym:e option mentioned above, an alternative lint-like error output may be generated by processing the dump files. The lint-like errors are enabled by passing the -Ycompact flag to tcc.
Each error in the tcpplus make_err error catalogue is represented by a number. These numbers happen to correspond to the position of the error within the catalogue, but in general this need not be the case. The first use of each error introduces the error number by associating it with a string giving the error name. This has the form cpp.error where error gives an error name from the C++ (cpp) error catalogue. Thus:
    error-name :
        number = string
        number
Each error message written to the symbol table dump has the form:
    error-command :
        ES location error-info
        EW location error-info
        EI location error-info
        EF location error-info
        EC error-info
        EA error-argument
denoting constraint errors, warnings, internal errors, fatal errors, continuation errors and error arguments respectively. Note that an error message may consist of several components: the initial error plus a number of continuation errors. Each error message may also have a number of error arguments associated with it. This error information is given by:
    error-info :
        error-name number number
where the first number gives the number of error arguments which should be read, and the second is nonzero to indicate that a continuation error should be read.
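The argument count and continuation flag let a reader regroup the flat command stream into complete messages. The sketch below is an illustration under several assumptions: the function `group_errors` and the tuple layout are hypothetical, commands are assumed to be already parsed (locations omitted), and the EA commands for a component are assumed to come directly after it and before any EC continuation, an ordering the text above does not spell out.

```python
def group_errors(commands):
    """Group parsed error commands into (kind, [(name, args), ...]) messages.

    Each command is a tuple: (kind, name, nargs, more) for ES/EW/EI/EF/EC,
    or ("EA", argument) for an error argument.
    """
    stream = iter(commands)

    def take(expected):
        cmd = next(stream, None)
        if cmd is None or cmd[0] != expected:
            raise ValueError(f"expected {expected} command, got {cmd!r}")
        return cmd

    messages = []
    for cmd in stream:
        kind = cmd[0]
        if kind not in ("ES", "EW", "EI", "EF"):
            continue                          # not the start of a message
        _, name, nargs, more = cmd
        parts = []
        while True:
            # Read exactly the announced number of error arguments.
            args = [take("EA")[1] for _ in range(nargs)]
            parts.append((name, args))
            if not more:                      # zero: no continuation follows
                break
            _, name, nargs, more = take("EC")
        messages.append((kind, parts))
    return messages
```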
Each error argument has one of the forms:
    error-argument :
        B base-number
        C scope-identifier
        E expression
        H identifier-name
        I identifier
        L location
        N nat
        S string
        T type
        V number
        V - number
corresponding to the various syntactic categories described above. Note that a location error argument, while expressed relative to the current file location, does not change this location.
File inclusions
It is possible to include information on header files within the symbol table dump. Firstly a number is associated with each directory on the #include search path:
    path-command :
        FD number = string string?
The first string gives the directory pathname; the second, if present, gives the associated directory name as specified in the -N command-line option.
Now the start and end of each file are marked using:
    file-command :
        FS location directory
        FE location
where directory gives the number of the directory in the search path where the file was found, or * if the file was found by other means. It is worth noting that if, for example, a function definition is the last item in a file, the FE command will appear in the symbol table dump before the QFE command for the end of the function definition. This is because lexical analysis, where the end of file is detected, takes place before parsing, where the end of function is detected.
A #include directive, whether explicit or implicit, can be represented using:
    include-command :
        FIA location string
        FIQ location string
        FIN location string
        FIS location string
        FIE location string
        FIR location
the first three corresponding to header names of the forms <....>, "...." and [....] respectively, the next two corresponding to start-up and end-up files, and the final form being used to resume the original file after the #include directive has been processed.
String literals
It is possible to dump information on string literals to the symbol table dump file using the commands:
    string-command :
        A location string
        AC location string
        AL location string
        ACL location string
representing string literals, character literals, wide string literals and wide character literals respectively. The given string gives the string text.
Basics
    digit : one of
        0 1 2 3 4 5 6 7 8 9

    digit-sequence :
        digit
        digit-sequence

    number :
        digit-sequence

    string :
        <characters>
        &digit-sequence<characters>

    location :
        number number number string string
        number number number string *
        number number number *
        number number *
        number *
        *
Dump commands
    dump-file :
        command-list?

    command-list :
        command
        command command-list

    command :
        B base-definition           base class graph
        error-command
        file-command
        I identifier-command        implicit declarations etc.
        identifier-command
        scope-command
        O identifier identifier     overriding virtual function
        P type : type               promotion type specifier
        string-command
        V number number string      version number
        X api-info                  external token name
        Z template-info             template instance
API information
    api-info :
        identifier-key identifier string
Base definitions
    virtual :
        V

    base-class :
        number = virtual? access? type-name
        number :

    base-list :
        base-graph
        base-graph base-list

    base-graph :
        base-class
        base-class ( base-list )

    base-definition :
        identifier-key number base-graph

    base-number :
        number : type-name
Error commands
    error-command :
        EA error-argument           error argument
        EC error-info               continuation error
        EF location error-info      fatal error
        EI location error-info      internal error
        ES location error-info      serious error
        EW location error-info      warning

    error-info :
        error-name number number

    error-name :
        number = string
        number

    error-argument :
        B base-number
        C scope-identifier
        E exp
        H hashid
        I identifier
        L location
        N nat
        S string
        T type
        V number
        V -number
File commands
    file-command :
        FD number = string string?      inclusion directory
        FE location                     file end
        FIA location string             file include with <>
        FIE location string             include end-up
        FIN location string             file include with []
        FIQ location string             file include with ""
        FIR location                    resume file
        FIS location string             include startup
        FS location directory           file start

    directory :
        number
        *
Identifier commands
    identifier-command :
        C identifier-info               call identifier
        D identifier-info type-info     define identifier
        L identifier-info               use identifier
        M identifier-info type-info     declare identifier
        Q identifier-info               end identifier definition
        T identifier-info type-info     tentatively define identifier
        U identifier-info               undefine identifier
        W identifier-info type-info     weak prototype

    identifier-info :
        identifier-key location identifier

    identifier-key :
        CD                  static data member
        CF function-key     member function
        CM                  data member
        CS function-key     static member function
        CV function-key     virtual member function
        E                   enumerator
        FB function-key     builtin function
        FE function-key     external function
        FS function-key     static function
        K                   keyword
        L                   label
        MB                  built-in macro
        MF                  function-like macro
        MO                  object-like macro
        NA                  namespace alias
        NN                  namespace name
        TA                  type alias
        TC                  class tag
        TE                  enum tag
        TS                  struct tag
        TU                  union tag
        VA                  automatic variable
        VE                  extern variable
        VP                  function parameter
        VS                  static variable
        XF                  procedure token
        XO                  object token
        XP                  token parameter
        XT                  template parameter

    function-key :
        empty
        C function-key      C linkage
        I function-key      inline

    identifier :
        number = hashid access? scope-identifier
        number

    hashid :
        string              simple name
        C type              constructor
        D type              destructor
        O string            operator
        T type              conversion

    access :
        B                   protected
        N                   public
        P                   private
Scope commands
    scope-command :
        SE scope-key location identifier    end scope
        SS scope-key location identifier    start scope

    scope-key :
        B       block scope
        CC      conditional scope
        N       namespace scope
        CF      false conditional scope
        CT      true conditional scope
        D       other declarative scope
        H       header scope
        S       class scope

    scope-identifier :
        identifier
        *
String command
    string-command :
        A location string       string literal
        AC location string      character literal
        ACL location string     wide character literal
        AL location string      wide string literal
Templates
    specialisation-info :
        token-application
        *

    template-info :
        identifier-key identifier token-application specialisation-info
Token sort information
    sort :
        ZEC type-info                                   constant expression
        ZEL type-info                                   lvalue expression
        ZER type-info                                   rvalue expression
        ZF type-info                                    function
        ZM type-info : type-name                        member
        ZN                                              integral constant
        ZPS parameter-list? : sort                      procedure type ()
        ZPG parameter-list? ; parameter-list? : sort    procedure type {}
        ZS                                              statement
        ZTA                                             arithmetic type
        ZTF                                             floating type
        ZTI                                             integral type
        ZTO                                             opaque type
        ZTP                                             scalar type
        ZTS                                             structure type
        ZTt parameter-list? :                           template type
        ZTTS                                            structure tag
        ZTTU                                            union tag
        ZTU                                             union type
        ZUF number                                      function macro
        ZUO                                             object macro

    exp :
        nat

    member :
        identifier
        string

    statement :
        exp

    token-argument :
        C identifier        template argument
        E exp               expression argument
        F identifier        function argument
        M member            member argument
        N nat               integer constant argument
        S statement         statement argument
        T type-info         type argument

    token-argument-list :
        token-argument
        token-argument , token-argument-list

    token-application :
        T identifier , token-argument-list :
Type information
    type-info :
        scope-identifier        for namespace alias
        sort                    for token, macro etc.
        type                    for variable etc.
        type identifier?        for overloaded function

    type :
        qualifier? unqualified-type

    qualifier :
        C       const
        V       volatile
        CV      const volatile

    unqualified-type :
        type-name
        token-application
        c                               char
        s                               short
        i                               int
        l                               long
        x                               long long
        f                               float
        d                               double
        r                               long double
        v                               void
        b                               bool
        w                               wchar_t
        Sc                              signed char
        Uc                              unsigned char
        Us                              unsigned short
        Ui                              unsigned int
        Ul                              unsigned long
        Ux                              unsigned long long
        u                               bottom
        y                               ptrdiff_t
        z                               size_t
        a type : type                   arithmetic type
        n nat                           literal type
        p type                          promoted type
        t parameter-list? : type        template type
        A nat? : type                   array type
        B nat : type                    bitfield type
        F type parameter-types          function type
        M type-name : type              pointer to member type
        P type                          pointer type
        R type                          reference type
        W type parameter-types          weak function type
        Q string                        quoted type
        *                               unknown type

    parameter-types :
        : exception? qualifier? :       no parameters
        . exception? qualifier? :       ellipsis
        . exception? qualifier? .       unknown
        , type parameter-types

    exception :
        ( exception-list? )

    exception-list :
        type
        type , exception-list

    parameter-list :
        identifier
        identifier , parameter-list

    type-name :
        identifier

    nat :
        +number
        -number
        string
        identifier
        token-application
C++ Symbol table dump syntax
The following gives a summary of the syntax for the symbol table dump file (version 1.1):
    dump-file :
        command-list?

    command-list :
        command command-list?

    command :
        version-command
        identifier-command
        scope-command
        override-command
        base-command
        api-command
        template-command
        promotion-command
        error-command
        path-command
        file-command
        include-command
        string-command

    version-command :
        V number number string

    location :
        number number number string string
        number number number string *
        number number number *
        number number *
        number *
        *

    identifier :
        number = identifier-name access? scope-identifier
        number

    identifier-name :
        string
        C type
        D type
        O string
        T type

    access :
        N
        B
        P

    scope-identifier :
        identifier
        *

    identifier-command :
        D identifier-info type-info
        M identifier-info type-info
        T identifier-info type-info
        Q identifier-info
        U identifier-info
        L identifier-info
        C identifier-info
        W identifier-info type-info
        I identifier-command

    identifier-info :
        identifier-key location identifier

    identifier-key :
        K
        MO
        MF
        MB
        TC
        TS
        TU
        TE
        TA
        NN
        NA
        VA
        VP
        VE
        VS
        FE function-key?
        FS function-key?
        FB function-key?
        CF function-key?
        CS function-key?
        CV function-key?
        CM
        CD
        E
        L
        XO
        XF
        XP
        XT

    function-key :
        C function-key?
        I function-key?

    type-info :
        type identifier?
        sort
        scope-identifier
        *

    scope-command :
        SS scope-key location identifier
        SE scope-key location identifier

    scope-key :
        N
        S
        B
        D
        H
        CT
        CF
        CC

    override-command :
        O identifier identifier

    base-command :
        B identifier-key identifier base-graph

    base-graph :
        base-class
        base-class ( base-list )

    base-class :
        number = V? access? type-name
        number :

    base-list :
        base-graph base-list?

    base-number :
        number : type-name

    api-command :
        X identifier-key identifier string

    template-command :
        Z identifier-key identifier token-application specialise-info

    specialise-info :
        identifier
        token-application
        *

    type :
        type-name
        c
        s
        i
        l
        x
        b
        w
        y
        z
        f
        d
        r
        v
        u
        Sc
        Uc
        Us
        Ui
        Ul
        Ux
        C type
        V type
        P type
        R type
        M type-name : type
        F type parameter-types
        A nat? : type
        B nat : type
        t parameter-list? : type
        p type
        a type : type
        n lit-base? lit-suffix?
        W type parameter-types
        q type
        Q string
        *

    type-name :
        identifier
        token-application

    parameter-types :
        : exception-spec? func-qualifier? :
        . exception-spec? func-qualifier? :
        . exception-spec? func-qualifier? .
        , type parameter-types

    func-qualifier :
        C func-qualifier?
        V func-qualifier?

    exception-spec :
        ( exception-list? )

    exception-list :
        type
        type , exception-list

    nat :
        + number
        - number
        identifier
        token-application
        string

    parameter-list :
        identifier
        identifier , parameter-list

    lit-base :
        O
        X

    lit-suffix :
        U
        l
        Ul
        x
        Ux

    promotion-command :
        P type : type

    sort :
        expression-sort
        statement-sort
        type-sort
        tag-type-sort
        member-sort
        proc-sort
        func-sort
        template-sort
        macro-sort

    expression-sort :
        ZEL type
        ZER type
        ZEC type
        ZN

    statement-sort :
        ZS

    type-sort :
        ZTO
        ZTI
        ZTF
        ZTA
        ZTP
        ZTS
        ZTU

    tag-type-sort :
        ZTTS
        ZTTU

    member-sort :
        ZM type : type-name

    proc-sort :
        ZPG parameter-list? ; parameter-list? : sort
        ZPS parameter-list? : sort

    func-sort :
        ZF type

    template-sort :
        ZTt parameter-list? :

    macro-sort :
        ZUO
        ZUF number

    token-application :
        T identifier , token-argument-list :

    token-argument-list :
        token-argument
        token-argument , token-argument-list

    token-argument :
        E expression
        N nat
        S statement
        T type
        M member
        F identifier
        C identifier

    expression :
        nat

    statement :
        expression

    member :
        identifier
        string

    error-name :
        number = string
        number

    error-command :
        ES location error-info
        EW location error-info
        EI location error-info
        EF location error-info
        EC error-info
        EA error-argument

    error-info :
        error-name number number

    error-argument :
        B base-number
        C scope-identifier
        E expression
        H identifier-name
        I identifier
        L location
        N nat
        S string
        T type
        V number
        V - number

    path-command :
        FD number = string string?

    directory :
        number
        *

    file-command :
        FS location directory
        FE location

    include-command :
        FIA location string
        FIQ location string
        FIN location string
        FIS location string
        FIE location string
        FIR location

    string-command :
        A location string
        AC location string
        AL location string
        ACL location string