8. The bit encoding of TDF

  1. 8.1. The Basic Encoding
  2. 8.2. Fundamental encodings
    1. 8.2.1. TDFINT
    2. 8.2.2. TDFBOOL
    3. 8.2.3. TDFSTRING
    4. 8.2.4. TDFIDENT
  3. 8.3. BITSTREAM
    1. 8.3.1. BYTESTREAM
    2. 8.3.2. BYTE_ALIGN
    3. 8.3.3. Extendable integer encoding
  4. 8.4. The TDF encoding
  5. 8.5. File Formats

This is a description of the encoding used for TDF.

Section 8.1 defines the basic level of encoding, in which integers consisting of a specified number of bits are appended to the sequence of bytes. Section 8.2 defines the second level of encoding, in which fundamental kinds of value are encoded in terms of integers of specified numbers of bits. Section 8.4 defines the third level, in which TDF is encoded using the previously defined concepts.

8.1. The Basic Encoding

TDF consists of a sequence of 8-bit bytes used to encode integers of a varying number of bits, from 1 to 32. These integers will be called basic integers.

TDF is encoded into bytes in increasing byte index, and within the byte the most significant end is filled before the least significant. Let the bits within a byte be numbered from 0 to 7, 0 denoting the least significant bit and 7 the most significant. Suppose that the bytes up to n-1 have been filled and that the next free bit in byte n is bit k. Then bits k+1 to 7 are full and bits 0 to k remain to be used. Now an integer of d bits is to be appended.

If d is less than or equal to k, the d bits will occupy bits k-d+1 to k of byte n, and the next free bit will be at bit k-d. Bit 0 of the integer will be at bit k-d+1 of the byte, and bit d-1 of the integer will be at bit k.

If d is equal to k+1, the d bits will occupy bits 0 to k of byte n and the next free bit will be bit 7 of byte n+1. Bit d-1 of the integer will be at bit k of the byte.

If d is greater than k+1, the most significant k+1 bits of the integer will be in byte n, with bit d-1 at bit k of the byte. The remaining d-k-1 least significant bits are then encoded into the bytes, starting at byte n+1, bit 7, using the same algorithm (i.e. recursively).

8.2. Fundamental encodings

This section describes the encoding of TDFINT, TDFBOOL, TDFSTRING, TDFIDENT, BITSTREAM, BYTESTREAM, BYTE_ALIGN and extendable integers.

8.2.1. TDFINT

TDFINT encodes non-negative integers of unbounded size. The encoding uses octal digits encoded in 4-bit basic integers. The most significant octal digit is encoded first, the least significant last. For all digits except the last the 4-bit integer is the value of the octal digit. For the last digit the 4-bit integer is the value of the octal digit plus 8.

8.2.2. TDFBOOL

TDFBOOL encodes a boolean, true or false. The encoding uses a 1-bit basic integer, with 1 encoding true and 0 encoding false.

8.2.3. TDFSTRING

TDFSTRING encodes a sequence containing n non-negative integers, each of k bits. The encoding consists of, first a TDFINT giving the number of bits, second a TDFINT giving the number of integers, which may be zero. Thirdly it contains n k-bit basic integers, giving the sequence of integers required, the first integer being first in this sequence.

8.2.4. TDFIDENT

TDFIDENT also encodes a sequence containing n non-negative integers. These integers will all consist of the same number of bits, which will be a multiple of 8. It is a property of the encoding of the other constructions that TDFIDENTS will start on either bit 7 or bit 3 of a byte and end on bit 7 or bit 3 of a byte. It thus has some alignment properties which are useful to permit fast copying of sections of TDF.

The encoding consists of, first a TDFINT giving the number of bits, second a TDFINT giving the number of integers, which may be zero. If the next free bit is not bit 7 of some byte, it is moved on to bit 7 of the next byte.

Thirdly it contains n k-bit integers.

If the next free bit is not bit 7 of some byte, it is moved on to bit 7 of the next byte.

8.3. BITSTREAM

It can be useful to be able to skip a TDF construction without reading through it. BITSTREAM provides a means of doing this.

A BITSTREAM encoding of X consists of a TDFINT giving the number of bits of encoding which are occupied by the X. Hence to skip over a BITSTREAM while decoding, one should read the TDFINT and then advance the bit index by that number of bits. To read the contents of a BITSTREAM encoding of X, one should read and ignore a TDFINT and then decode an X. There will be no spare bits at the end of the X, so reading can continue directly.

8.3.1. BYTESTREAM

It can be useful to be able to skip a TDF construction without reading through it. BYTESTREAM provides a means of doing this while remaining byte aligned, so facilitating copying the TDF. A BYTESTREAM will always start when the bit position is 3 or 7.

A BYTESTREAM encoding of X starts with a TDFINT giving a number, n. After this, if the current bit position is not bit 7 of some byte, it is moved to bit 7 of the next byte. The next n bytes are an encoding of X. There may be some spare bits left over at the end of X.

Hence to skip over a BYTESTREAM while decoding one should read a TDFINT, n, move to the next byte alignment (if the bit position is not 7) and advance the bit index over n bytes. To read a BYTESTREAM encoding of X one should read a TDFINT, n, and move to the next byte, b (if the bit position is not 7), and then decode an X. Finally the bit position should be moved to n bytes after b.

8.3.2. BYTE_ALIGN

BYTE_ALIGN leaves the bit position alone if it is 7, and otherwise moves to bit 7 of the next byte.

8.3.3. Extendable integer encoding

A d-bit extendable integer encoding enables an integer greater than zero to be encoded given d, a number of bits.

If the integer is between 1 and 2d - 1 inclusive, a d-bit basic integer is encoded.

If the integer, i, is greater than or equal to 2d, a d-bit basic integer encoding of zero is inserted and then i - 2d + 1 is encoded as a d-bit extendable encodin, and so on, recursively.

8.4. The TDF encoding

The descriptions of SORTS and constructors contain encoding information which is interpreted as follows to define the TDF encoding.

  • A TDF CAPSULE is an encoding of the SORT CAPSULE.

  • For each SORT a number of encoding bits, b, is specified. If this is zero, there will only be one construction for the class, and its encoding will consist of the encodings of its components, in the given order.

  • If the number of encoding bits, b, is not zero the SORT is described as extendable or as not extendable. For each construction there is an encoding number given. If the SORT is extendable, this number is output as an extendable integer. If the SORT is described as not extendable, the number is output as a basic integer. This is followed by the encodings of the components of the construction in the order given in the description of the construct.

  • For the classes which are named SLIST(x) - e.g. SLIST(UNIT) - the encoding consists of a TDFINT, n, followed by n encodings of x.

  • For the classes which are named LIST(x) - e.g. LIST(EXP) - the encoding consists of a 1-bit integer which will be 0, followed by an SLIST(x). The 1-bit integer is to allow for extensions to other representations of LISTs.

  • For the classes which are named OPTION( x) the encoding consists of a 1-bit basic integer. If this is zero, the option is absent and there is no more encoding. If the integer is 1, the option is present and an encoding of x follows.

  • BITSTREAMS occur in only two kinds of place. One is the constructions with the form x_cond, which are the install-time conditionals. For each of these the class encoded in the BITSTREAM is the same as the class which is the result of the x_cond construction. The other kind of place is as the token_args component of a construction with the form x_apply_token. This component always gives the parameters of the TOKEN. It can only be decoded if there is a token definition or a token declaration for the particular token being applied, i.e. for the token_value component of the construction. In this case the SORTS and hence the classes of the actual token arguments are given by the declaration or definition, and encodings of these classes are placed in sequence after the number of bits. If the declaration or definition are not available, the BITSTREAM can only be skipped.

  • BYTESTREAM X occurs in only one place, the encoding of the SORT UNIT. The SORT X is determined by the UNIT identification which is given for each of the relevant SORTS.

  • The tld UNIT is encoded specially. It is always the first UNIT in a Capsule and is used to pass information to the TDF linker. The first entry in a tld UNIT is a TDFINT giving the format of the remainder of the UNIT. Currently, the linker supports formats 0 and 1, but others may be added to give greater functionality while retaining compatibility.

    With format 0, the remainder of UNIT is identical to a now obsolete tld2 UNIT. With format 1, the remainder of the UNIT is as follows: If n is the number of EXTERN_LINK s in the external_linkage argument of make_capsule, the remainder consists of n sequences. These sequences are in the order given by external_linkage. Each element of a sequence consist of one TDFINT per linkable entity in the corresponding el of the make_extern_link in the same order. These integers will describe properties of the correponding external links. (In format 0, there are only two sequences, the first describing the token external links and the second describing the tag external links.)

    BitMeaning
    Bit 01 means “used in this capsule”
    0 means “not used in this capsule”
    Bit 11 means “declared in this capsule”
    0 means “not declared in this capsule”
    Bit 21 means “defined in this capsule”
    0 means “not defined in this capsule”
    Bit 31 means “may be multiply defined”
    0 means “is uniquely defined”

8.5. File Formats

There may be various kinds of files which contain TDF bitstream information. Each will start with a 4-byte “magic-number” identifying the kind of file followed by two TDFINTs giving the major and minor version numbers of the TDF involved.

A CAPSULE file will have a magic-number “TDFC”. The encoding of the CAPSULE will be byte-aligned following the version numbers.

A TDF library file will have a magic-number “TDFL”. These files are constructed by the TDF linker.

A TDF archive file will have a magic-number “TDFA”.

Other file formats introduced should follow a similar pattern.

The TDF linker will refuse to link TDF files with different major version numbers. The resulting minor version number is the maximum of component minor version numbers.