4. CAPSULEs and UNITs

  4.1. make_capsule and name-spaces
    4.1.1. External linkages
    4.1.2. UNITs
    4.1.3. make_unit
    4.1.4. LINK
  4.2. Definitions and declarations
    4.2.1. Scopes and linking
    4.2.2. Declaration and definition signatures
    4.2.3. STRING

A CAPSULE is typically the result of a single compilation - one could regard it as being the TDF analogue of a Unix .o file. Just as with .o files, a set of CAPSULEs can be linked together to form another. Similarly, a CAPSULE may be translated to make a program for some platform, provided certain conditions are met. One of these conditions is obviously that a translator exists for the platform, but there are others. They basically state that any names that are undefined in the CAPSULE can be supplied by the system in which it is to be run. For example, the translator could produce assembly code with external identifiers which will be supplied by some system library.

4.1. make_capsule and name-spaces

The only constructor for a CAPSULE is make_capsule. Its basic function is to compose together UNITs which contain the declarations and definitions of the program. The signature of make_capsule looks rather daunting and is probably best represented graphically.

[Figure: the four parameters of make_capsule. prop_names lists the kinds of UNITs in groups ("tagdecs", "tagdefs"); capsule_linking gives the number of each kind of name in the CAPSULE name-space ("tag" 5, "token" 6, "al_tag" 7); external_linkage links each kind of name in the CAPSULE name-space to externals (tag, token and al_tag LINKEXTERNs); groups holds UNITs of the same kind grouped together in the order given by prop_names (tagdec UNITs, tagdef UNITs).]
Figure 7. make_capsule

The diagram gives an example of a CAPSULE using the same components as in the following text.

Each CAPSULE has its own name-space, distinct from all other CAPSULEs' name-spaces and also from the name-spaces of its component UNITs (see ). There are several different kinds of names in TDF and each name-space is further subdivided into one for each kind of name. The number of different kinds of names is potentially unlimited but only three are used in core-TDF, namely "tag", "token" and "al_tag". Those names in a "tag" name-space generally correspond to identifiers in normal programs and I shall use these as the paradigm for the properties of them all.

The actual representation of a "tag" name in a given name-space is an integer, described as SORT TDFINT. These integers are drawn from a contiguous set starting from 0 up to some limit given by the constructor which introduces the name-space. For CAPSULE name-spaces, this is given by the capsule_linking parameter of make_capsule:

capsule_linking: SLIST(CAPSULE_LINK)

In the most general case in core-TDF, there would be three entries in the list introducing limits using make_capsule_link for each of the "tag", "token" and "al_tag" name-spaces for the CAPSULE. Thus if:

capsule_linking = (make_capsule_link("tag", 5),
                   make_capsule_link("token", 6),
                   make_capsule_link("al_tag", 7))

there are 5 CAPSULE "tag" names used within the CAPSULE, namely 0, 1, 2, 3 and 4; similarly there are 6 "token" names and 7 "al_tag" names.
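As an illustration only, the limits above can be modelled in a few lines of Python. This is a sketch, not part of TDF: the list-of-pairs representation and the helper names_in are assumptions made for the example.

```python
# Hypothetical model of capsule_linking: one (kind, limit) pair per
# make_capsule_link, in the order in which they appear in the CAPSULE.
capsule_linking = [("tag", 5), ("token", 6), ("al_tag", 7)]

def names_in(linking, kind):
    """Return the integers that are valid names of the given kind."""
    for k, limit in linking:
        if k == kind:
            # Names form the contiguous set 0, 1, ..., limit - 1.
            return list(range(limit))
    raise KeyError(f"no name-space of kind {kind!r} in this CAPSULE")

tag_names = names_in(capsule_linking, "tag")
print(tag_names)
```

Any integer outside the returned list would not be a valid CAPSULE name of that kind.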

4.1.1. External linkages

The context of usage will always determine when and how an integer is to be interpreted as a name in a particular name-space. For example, a TAG in a UNIT is constructed by make_tag applied to a TDFINT which will be interpreted as a name from that UNIT's "tag" name-space. An integer representing a name in the CAPSULE name-space would be found in a LINKEXTERN of the external_linkage parameter of make_capsule.

external_linkage: SLIST(EXTERN_LINK)

Each EXTERN_LINK is itself formed from an SLIST of LINKEXTERNs given by make_extern_link. The order of the EXTERN_LINKs determines which name-space one is dealing with; they are in the same order as given by the capsule_linking parameter. Thus, with the capsule_linking given above, the first EXTERN_LINK would deal with the "tag" name-space. Each of its component LINKEXTERNs, constructed by make_linkextern, would identify a tag number with some name external to the CAPSULE; for example one might be:

make_linkextern(4, string_extern("printf"))

This would mean: identify the CAPSULE's "tag" 4 with a name called "printf", external to the CAPSULE. The name "printf" would be used for linkage external to the CAPSULE; any name required outside the CAPSULE would have to be linked like this.
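The positional correspondence between the name-space limits and external_linkage can be sketched as follows. The Python data layout and the function external_name are hypothetical; only the ordering rule and the "printf" example come from the text.

```python
# Hypothetical model: kinds and limits in CAPSULE order, and one list of
# (capsule_name, external_name) pairs per kind, in the same order.
capsule_linking = [("tag", 5), ("token", 6), ("al_tag", 7)]
external_linkage = [
    [(4, "printf")],  # LINKEXTERNs for the "tag" name-space
    [],               # LINKEXTERNs for the "token" name-space
    [],               # LINKEXTERNs for the "al_tag" name-space
]

def external_name(kind, capsule_name):
    """Return the external name linked to a CAPSULE name, or None."""
    kinds = [k for k, _ in capsule_linking]
    index = kinds.index(kind)  # position selects the EXTERN_LINK
    limit = capsule_linking[index][1]
    if not 0 <= capsule_name < limit:
        raise ValueError("name is outside the CAPSULE name-space")
    for name, ext in external_linkage[index]:
        if name == capsule_name:
            return ext
    return None  # the name is not visible outside the CAPSULE

print(external_name("tag", 4))
```

A CAPSULE name with no LINKEXTERN, such as "tag" 3 here, is simply internal to the CAPSULE.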

4.1.2. UNITs

This name "printf", of course, does not necessarily mean the C procedure in the system library. This depends both on the system context in which the CAPSULE is translated and also the meaning of the CAPSULE "tag" name 4 given by the component UNITs of the CAPSULE in the groups parameter of make_capsule:

groups: SLIST(GROUP)

Each GROUP in the groups SLIST will be formed by sets of UNITs of the same kind. Once again, there are a potentially unlimited number of kinds of UNITs but core-TDF only uses those named "tld", "al_tagdefs", "tagdecs", "tagdefs", "tokdecs" and "tokdefs".[b] These names will appear (in the same order as in groups) in the prop_names parameter of make_capsule, one for each kind of UNIT appearing in the CAPSULE:

prop_names: SLIST(TDFIDENT)

Thus if:

prop_names = ("tagdecs", "tagdefs")

then, the first element of groups would contain only "tagdecs" UNITs and the second would contain only "tagdefs" UNITs. A "tagdecs" UNIT contains things rather like a set of global identifier declarations in C, while a "tagdefs" UNIT is like a set of global definitions of identifiers.
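The pairing of prop_names with groups is purely positional, which a short Python sketch may make concrete. The UNIT placeholders (plain strings) and the helper units_by_kind are assumptions for illustration; real UNITs are built by make_unit.

```python
# Hypothetical stand-ins: each UNIT is represented by a descriptive string.
prop_names = ["tagdecs", "tagdefs"]
groups = [
    ["tagdec_unit_0", "tagdec_unit_1"],  # first GROUP: only "tagdecs" UNITs
    ["tagdef_unit_0"],                   # second GROUP: only "tagdefs" UNITs
]

def units_by_kind(prop_names, groups):
    """Pair each UNIT kind with its GROUP, relying on position alone."""
    if len(prop_names) != len(groups):
        raise ValueError("prop_names and groups must have the same length")
    return dict(zip(prop_names, groups))

kinds = units_by_kind(prop_names, groups)
print(kinds["tagdefs"])
```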

4.1.3. make_unit

Now we come to the construction of UNITs using make_unit, as in the diagram below:

[Figure: the three parameters of make_unit. local_vars gives the number of local names of each kind, in the order given by the CAPSULE's capsule_linking ("tag", "token", "al_tag"); lks holds the links between the UNIT name-space and the CAPSULE name-space (tag, token and al_tag LINKs); properties is a BYTESTREAM, here encoding a TAGDEF_PROPS with no_labels and tds (the TAGDEFs of the UNIT).]
Figure 8. make_unit

First we give the limits of the various name-spaces local to the UNIT in the local_vars parameter:

local_vars: SLIST(TDFINT)

Just in the same way as with external_linkage, the numbers in local_vars correspond (in the same order) to the spaces indicated in capsule_linking in section 4.1. With our example, the first element of local_vars gives the number of "tag" names local to the UNIT, the second gives the number of "token" names local to the UNIT etc. These will include all the names used in the body of the UNIT. Each declaration of a TAG, for example, will use a new number from the "tag" name-space; there is no hiding or reuse of names within a UNIT.

4.1.4. LINK

Connections between the CAPSULE name-spaces and the UNIT name-spaces are made by LINKs in the lks parameter of make_unit:

lks: SLIST(LINKS)

Once again, lks is effectively indexed by the kind of name-space. Each LINKS is an SLIST of LINKs, each of which establishes an identity between a name in the CAPSULE name-space and a name in the UNIT name-space. Thus if the first element of lks contains:

make_link(42, 4)

then, the UNIT "tag" 42 is identical to the CAPSULE "tag" 4.

Note that names from the CAPSULE name-space only arise in two places, LINKs and LINKEXTERNs. Every other use of a name is derived from some UNIT name-space.
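To see how local_vars and lks fit together, here is a minimal Python sketch of a UNIT that maps its own names onto CAPSULE names. The class Unit, the constant KINDS and the local name-space sizes are all assumptions chosen for the example; only make_link(42, 4) is taken from the text.

```python
from dataclasses import dataclass

# Order of name-space kinds, as fixed by the CAPSULE's capsule_linking.
KINDS = ["tag", "token", "al_tag"]

@dataclass
class Unit:
    local_vars: list  # number of local names, one entry per kind
    lks: list         # per kind: list of (unit_name, capsule_name) LINKs

    def capsule_name(self, kind, unit_name):
        """Return the CAPSULE name identified with a UNIT name, or None."""
        i = KINDS.index(kind)
        if not 0 <= unit_name < self.local_vars[i]:
            raise ValueError("name is outside the UNIT name-space")
        return dict(self.lks[i]).get(unit_name)

# A UNIT with 50 local tags (an arbitrary size) and one LINK, make_link(42, 4).
unit = Unit(local_vars=[50, 0, 0], lks=[[(42, 4)], [], []])
print(unit.capsule_name("tag", 42))
```

UNIT names without a LINK, such as "tag" 7 in this sketch, stay private to the UNIT.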

4.2. Definitions and declarations

The encoding in the properties:BYTESTREAM parameter of a UNIT is a PROPS, for which there are five constructors corresponding to the kinds of UNITs in core-TDF, make_al_tagdefs, make_tagdecs, make_tagdefs, make_tokdefs and make_tokdecs. Each of these will declare or define names in the appropriate UNIT name-space which can be used by make_link in the UNIT's lks parameter as well as elsewhere in the properties parameter. The distinction between "declarations" and "definitions" is rather similar to C usage; a declaration provides the "type" of a name, while a definition gives its meaning. For tags, the "type" is the SORT SHAPE (see below). For tokens, the "type" is a SORTNAME constructed from the SORTNAMEs of the parameters and result of the TOKEN using token:

params: LIST(SORTNAME)
result: SORTNAME
        -> SORTNAME

Taking make_tagdefs as a paradigm for PROPS, we have:

no_labels: TDFINT
tds:       SLIST(TAGDEF)
           -> TAGDEF_PROPS

The no_labels parameter introduces the size of yet another name-space local to the PROPS, this time for the LABELs used in the TAGDEFs. Each TAGDEF in tds will define a "tag" name in the UNIT's name-space. The order of these TAGDEFs is immaterial since the initialisations of the tags are values which can be solved at translate time, load time or as unordered dynamic initialisations.

There are three constructors for TAGDEFs, each with slightly different properties. The simplest is make_id_tagdef:

t:         TDFINT
signature: OPTION(STRING)
e:         EXP x
           -> TAGDEF

Here, t is the tag name and the evaluation of e will be the value of SHAPE x of an obtain_tag(t) in an EXP. Note that t is not a variable; the value of obtain_tag(t) will be invariant. The signature parameter gives a STRING (see section 4.2.3) which may be used as a name for the tag, external to TDF, and also as a check introduced by the producer that a tagdef and its corresponding tagdec have the same notion of the language-specific type of the tag.

The two other constructors for TAGDEF, make_var_tagdef and common_tagdef, both define variable tags and have the same signature:

t:          TDFINT
opt_access: OPTION(ACCESS)
signature:  OPTION(STRING)
e:          EXP x
            -> TAGDEF

Once again t is the tag name but now e is the initialisation of the variable t. A use of obtain_tag(t) will give a pointer to the variable (of SHAPE POINTER x), rather than its contents.[c] There can only be one make_var_tagdef of a given tag in a program, but there may be more than one common_tagdef, possibly with different initialisations; however these initialisations must overlap consistently just as in common blocks in FORTRAN.

The ACCESS parameter gives various properties required for the tag being defined and is discussed in section 5.3.2.
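The practical difference between the three TAGDEF constructors is what obtain_tag(t) yields. The following Python sketch records just that rule; the function name and the tuple encoding of SHAPEs are assumptions, and POINTER is written as in the text above rather than in its full TDF form.

```python
def shape_of_obtain_tag(constructor, x):
    """Return the SHAPE of obtain_tag(t) for a tag defined with EXP of SHAPE x."""
    if constructor == "make_id_tagdef":
        # An identity tag: obtain_tag(t) is the invariant value itself.
        return x
    if constructor in ("make_var_tagdef", "common_tagdef"):
        # A variable tag: obtain_tag(t) is a pointer to the variable.
        return ("POINTER", x)
    raise ValueError(f"unknown TAGDEF constructor: {constructor}")

print(shape_of_obtain_tag("make_id_tagdef", "x"))
print(shape_of_obtain_tag("make_var_tagdef", "x"))
```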

The initialisation EXPs of TAGDEFs will be evaluated before the "main" program is started. An initialisation EXP must either be a constant (in the sense of section 9) or reduce (either directly or by token or _cond expansions) to an initial_value:

init: EXP s
      -> EXP s

The translator will arrange that init will be evaluated once only before any procedure application, other than those themselves involved in initial_values, but after any constant initialisations. The order of evaluation of different initial_values is arbitrary.

4.2.1. Scopes and linking

Only names introduced by AL_TAGDEFs, TAGDEFs, TAGDECs, TOKDECs and TOKDEFs can be used in other UNITs (and then, only via the lks parameters of the UNITs involved). You can regard them as being similar to C global declarations. Token definitions include their declarations implicitly; however this is not true of tags. This means that any CAPSULE which uses or defines a tag across UNITs must include a TAGDEC for that tag in its "tagdecs" UNITs. A TAGDEC is constructed using either make_id_tagdec, make_var_tagdec or common_tagdec, all with the same form:

t_intro:   TDFINT
acc:       OPTION(ACCESS)
signature: OPTION(STRING)
x:         SHAPE
           -> TAGDEC

Here the tag name is given by t_intro; the SHAPE x will define the space and alignment required for the tag (this is analogous to the type in a C declaration). The acc field will define certain properties of the tag not implicit in its SHAPE; I shall return to the kinds of properties envisaged in discussing local declarations in section 5.3.

Most of a program will appear in the "tagdefs" UNITs - they will include the definitions of the procedures of the program which in turn will include local definitions of tags for the locals of the procedures.

The standard TDF linker allows one to link CAPSULEs together using the name identifications given in the LINKEXTERNs, perhaps hiding some of them in the final CAPSULE. It does this just by generating a new CAPSULE name-space, grouping together component UNITs of the same kind and replacing their lks parameters with values derived from the new CAPSULE name-space, without changing the UNITs' name-spaces or their properties parameters. The operation of grouping together UNITs is effectively assumed to be associative, commutative and idempotent, e.g. if the same tag is declared in two capsules it is assumed to be the same thing. It also means that there is no implied order of evaluation of UNITs or of their component TAGDEFs.
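The renumbering step described in this paragraph can be sketched in Python. This is a much simplified model under stated assumptions, not the algorithm of the real linker: it handles the "tag" name-space only, represents a CAPSULE as a plain dict, ignores hiding, and drops CAPSULE names that no LINK or LINKEXTERN mentions. The function link_capsules and the dict keys are hypothetical.

```python
def link_capsules(capsules):
    """Link capsules (tags only) into one new CAPSULE name-space.

    Each capsule is a dict with:
      "externs": {capsule_tag: external_name}
      "units":   {unit_kind: [lks, ...]}, lks = [(unit_tag, capsule_tag), ...]
    """
    by_external = {}  # external name -> tag in the new CAPSULE name-space
    groups = {}       # unit kind -> rewritten lks of every UNIT of that kind
    count = 0         # size of the new "tag" name-space so far

    for cap in capsules:
        remap = {}    # old CAPSULE tag -> new CAPSULE tag (this capsule only)
        for old, ext in cap["externs"].items():
            if ext not in by_external:  # same external name, same new tag
                by_external[ext] = count
                count += 1
            remap[old] = by_external[ext]
        for kind, units in cap["units"].items():
            for lks in units:
                new_lks = []
                for unit_tag, old in lks:
                    if old not in remap:  # name internal to this capsule
                        remap[old] = count
                        count += 1
                    # The UNIT's own name is left untouched.
                    new_lks.append((unit_tag, remap[old]))
                groups.setdefault(kind, []).append(new_lks)

    externs = {tag: ext for ext, tag in by_external.items()}
    return {"tag_count": count, "externs": externs, "units": groups}

a = {"externs": {4: "printf"}, "units": {"tagdecs": [[(42, 4)]]}}
b = {"externs": {0: "printf"}, "units": {"tagdefs": [[(7, 0), (8, 1)]]}}
linked = link_capsules([a, b])
print(linked)
```

Only the CAPSULE side of each LINK changes; the UNIT names 42, 7 and 8 survive as they were, which is why the UNIT bodies need no rewriting.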

Different languages have different conventions for deciding how programs are actually run. For example, C requires the presence of a suitably defined "main" procedure; this is usually enforced by requiring the system ld utility to bind the name "main" along with the definitions of any library values required. Otherwise, the C conventions are met by standard TDF linking. Other languages have more stringent requirements. For example, C++ requires dynamic initialisation of globals, using initial_value. As the only runnable code in TDF is in procedures, C++ would probably require an additional linking phase to construct a "main" procedure which calls the initialisation procedures of each CAPSULE involved if the system linker did not provide suitable C++ linking.

4.2.2. Declaration and definition signatures

The signature arguments of TAGDEFs and TAGDECs are designed to allow a measure of cross-UNIT checking when linking independently compiled CAPSULEs. Suppose that we have a tag, t, used in one CAPSULE and defined in another; the first CAPSULE would have to have a TAGDEC for t whose TAGDEF is in the second. The signature STRING of both could be arranged to represent the language-specific type of t as understood at compilation-time. Clearly, when the CAPSULEs are linked the types must be identical and hence their STRING representation must be the same - a translator will reject any attempt to link definitions and declarations of the same object with different signatures.

Similar considerations apply to TOKDEFs and TOKDECs; the "type" of a TOKEN may not have any familiar analogue in most HLLs, but the principle remains the same.
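A signature check of this kind might look like the following Python sketch. The dict representation, the function check_signatures and the example signature strings are all hypothetical; in particular, treating an absent signature (None, standing for an empty OPTION(STRING)) as "no check" is an assumption, not something the text specifies.

```python
def check_signatures(tagdecs, tagdefs):
    """Return the tags whose declaration and definition signatures differ.

    Both arguments map a linked tag name to its signature, or to None
    when the OPTION(STRING) is absent.
    """
    mismatches = []
    for tag, dec_sig in tagdecs.items():
        def_sig = tagdefs.get(tag)
        # Assumption: only compare when both sides supply a signature.
        if dec_sig is not None and def_sig is not None and dec_sig != def_sig:
            mismatches.append(tag)
    return mismatches

decs = {0: "int(const char *, ...)", 1: "double", 2: None}
defs = {0: "int(const char *, ...)", 1: "float", 2: "long"}
bad = check_signatures(decs, defs)
print(bad)
```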

4.2.3. STRING

The SORT STRING is used in various constructs other than declarations and definitions. It is a first-class SORT with string_apply_token and string_cond. A primitive STRING is constructed from a TDFSTRING(k, n), which is an encoding of n integers, each of k bits, using make_string:

arg: TDFSTRING(k, n)
     -> STRING(k, n)

STRINGs may be concatenated using concat_string:

arg1: STRING(k, n)
arg2: STRING(k, m)
      -> STRING(k, n + m)
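The size bookkeeping of make_string and concat_string can be mimicked in Python. This is a sketch of the arithmetic only: the class TdfString and its fields are assumptions, and the requirement that both arguments share the same k is read off the signature above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TdfString:
    k: int        # number of bits in each element
    elems: tuple  # the n integers making up the STRING

    def __post_init__(self):
        # Every element must be representable in k bits.
        if any(not 0 <= e < (1 << self.k) for e in self.elems):
            raise ValueError("element does not fit in k bits")

    @property
    def n(self):
        return len(self.elems)

def make_string(k, values):
    """Build a STRING(k, n) from n integers of k bits each."""
    return TdfString(k, tuple(values))

def concat_string(a, b):
    """STRING(k, n) followed by STRING(k, m) gives STRING(k, n + m)."""
    if a.k != b.k:
        raise ValueError("both STRINGs must have the same element size k")
    return TdfString(a.k, a.elems + b.elems)

s = concat_string(make_string(8, b"print"), make_string(8, b"f"))
print(s.k, s.n, bytes(s.elems))
```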

Being able to compose strings, including token applications etc, means that late-binding is possible in signature checking in definitions and declarations. This late-binding means that the representation of platform-dependent HLL types need only be fully expanded at install-time and hence the types could be expressed in their representational form on the specific platform.

  1. [b]

    The "tld" UNITs give usage information for names, to help the linker, tld, discover which names have definitions and how they are used. The C producer also optionally constructs "diagnostics" UNITs (to give run-time diagnostic information).

  2. [c]

    There is a similar distinction between tags introduced to be locals of a procedure using identify and variable (see section 5.3.1).