The TDF Linker - Program Overview

1. Introduction
2. Directory Structure
3. Hash Tables

Katherine Flavel, The TenDRA Project (copyeditor)
Steve Folkes, DRA

First published 1995.

Revision History

2010-02-21	kate	Moved out TLD as a standalone tool.
2007-12-02	kate	Converted to DocBook.
2007-12-02	kate	Converted to DocBook.
2007-10-02	kate	Moved out TLD to join ANDFutils.
1998-07-30	DERA	TLD 4.0#7; TenDRA 4.1.2 release.

1. Introduction

This document gives an overview of the linker source code. The first section describes the directory structure. The remaining sections describe how the various parts of the linker work.

2. Directory Structure

The sources to the new version of the TDF linker are spread over several directories. These directories are described below:

.

The top level directory contains the "main.[ce]" files that contain the top level function for the linker.

os-interface

This directory contains the low level interface to the operating system and compiler. All target dependent code should be in this directory. Much of this code is shared with SID, so not all of the functions will be used. Because this is intended to be shared code, it is documented in the header files.

library

This directory contains some generic library routines. It is similar to the "os-interface" directory except that all code in this directory should be target independent. Much of this code is shared with SID, so not all of the functions will be used. Because this is intended to be shared code, it is documented in the header files.

generated

This directory is where the automatically generated files are put. Currently this is just the error messages.

tdf

This is where the majority of the files that do the actual work are.

frontends

This is where the files that implement the frontends to the linker are stored.

This document doesn't describe the behaviour of the content of the "os-interface" and "library" directories, as it is documented in the header files. It also doesn't describe the behaviour of the content of the "generated" directory, as this is described (where necessary) in the files from which it is generated.

3. Hash Tables

This section describes the four types of hash table that are used by the linker. These data structures are fundamental to the linker, and most of the linker's behaviour involves adding things to the hash tables or writing out information stored in them.

The four types of hash table are the unit set table, the shape table, the name table, and the mapping table. Name tables only exist as part of entries in a shape table, and mapping tables only exist as part of entries in a unit set table.

When linking, there is one unit set table and one shape table into which the information read from capsules is stored. There is a second shape table into which the information from the library index is stored.

When building libraries, one unit set table and one shape table are used for reading capsules in the same way as for linking, but one shape table is used for each library that is read (they are only used for checking the validity of the index, and are discarded immediately after the library has been read).

For showing the library content, and extracting capsules from a library, only a single shape table is used to store the library index.

3.1. The Unit Set Table

The first hash table type is the unit set hash table, defined in "tdf/unit-table.[ch]". Each entry in the table (defined in "tdf/unit-entry.[ch]") has the following information:

next: This is a pointer to the next entry in the hash table.
key: This is the unit set name.
order: This is the order number of the unit set. When reading unit set names from a capsule, each unit set must have a higher order number than the preceding unit sets.
head
tail: These are the start and end pointers of the list of units in the unit set that this entry represents.

Each unit in the unit list contains the following information:

next: This is a pointer to the next unit in the list.
map_table: This is the unit mapping table. It contains the unit scope identifier limits (counts) and the unit scope to capsule scope identifier mapping tables for each shape in the capsule that the unit came from.
contents: This is the unit's TDF content.

3.2. The Shape Table

The second main hash table type is the shape table defined in "shape-table.[ch]". Each entry in the table (defined in "shape-entry.[ch]") has the following information:

next: This is a pointer to the next entry in the hash table.
key: This is the shape name.
names: This is a table that contains the external names for the shape.
id_count: This stores the next available capsule scope identifier for the shape.
non_empty: This is set to true during the capsule output routines if the shape is to be output (if it has either unit scope identifiers or capsule scope identifiers).
num_lib_names: This is used to cache the number of identifiers of the shape that are going to be written out to the library index.
head
tail: These store a list of name entries as they are added to the name table "names". As new names are added to the the name table, they are also added to this list. When trying to resolve undefined external names, the names are removed from the front of this list, and definitions are looked up in the libraries. When all such lists are empty, then all possible name resolution has been done.

3.3. Name Tables

The third hash table type is the name table defined in "name-table.[ch]". Each entry in the table (defined in "name-entry.[ch]") has the following information:

next

This is a pointer to the next entry in the hash table.

key

This is the name's name.

type

This is the type of the name. There are five types: "NT_INDIRECT", "NT_INDIRECT_CYCLING", "NT_INDIRECT_DONE", "NT_DIRECT" and "NT_PLACEHOLDER". The first three types are all indirect names (used by the renaming); the last type is also used by the renaming. The direct name type is the real name type.

When renaming information is read, it is entered into each name table of the appropriate shape. Initially, a place holder name is created for the new name, and an indirect name is created for the old name. The indirect name contains a pointer to the place holder name (actually, this may also be an indirect name if it has also been renamed).

Once all of the renamed names have been added to the table, the renamings are checked for cycles, and the indirect entries are set so that they all point to a place holder entry (so that it will not be necessary to follow chains of indirect entries). The other two indirect types are used by this process. After the process has completed, all entries in the table will be of type "NT_INDIRECT_DONE" or "NT_PLACEHOLDER". At this stage, there will be no direct entries.

Direct entries are added when capsules and libraries are read. Place holder entries will automatically turn into direct entries when a direct entry of the same name is added. Attempts to add an entry with the same name as an indirect entry will result in the entry it points to being used instead.

The name table hides all of the renaming: the functions that get entries from the name table follow indirect entries, and ignore place holder entries; the iteration function only iterates across direct entries. In this way, nothing outside of the name table structure needs to worry about renaming.

u.indirect

This contains a pointer to another entry that is to be used in place of this entry. It is only used if the entry is an indirect entry of some form.

u.direct.id

This contains the capsule scope identifier that has been allocated for this name. It is only used if the entry is a direct entry (as is the case for all other "u.direct.X" fields).

u.direct.use

This contains the name's usage information (as read from the linker information unit). In addition, an otherwise unused bit is used to indicate that the name is hidden.

u.direct.definition

This contains a pointer to the capsule object that defined the name. It is used for reporting the original definition for multiply defined names. It is also used to indicate that the name has a definition (it is set to the nil pointer if the name has no definition, or if the definition has been suppressed). It is only used for names not in a library index.

u.direct.lib_definition

This contains a pointer to the library capsule object that defined the name. It is only used for names in a library index, to indicate which capsule should be loaded to define the name. It is set to the nil pointer if the name has no library definition, or if the definition has been suppressed.

u.direct.list_next

This contains a pointer to the next name in the list of names of this shape (the list that the "head" and "tail" fields in the shape entry structure represent).

3.4. Mapping Tables

The last hash table type is the mapping table defined in "map-table.[ch]". Each entry in the table (defined in "map-entry.[ch]") has the following information:

next: This is a pointer to the next entry in the hash table.
key: This is the shape name.
count: This contains the unit scope identifier limit for the shape in the unit that contains the table that the entry is an entry in.
num_links: This contains the number of unit scope to capsule scope identifier mappings for the shape in the unit that contains the table that the entry is an entry in.
links: This contains the unit scope to capsule scope identifier mappings for the shape in the unit that contains the table that the entry is an entry in.