4. Disk reading and writing

  4.1. Disk writing routines
  4.2. Disk reading routines
  4.3. Object printing routines
  4.4. Aliasing
  4.5. Application to calculus

One facility made possible by the homogeneous implementation of the type system described above is persistence, that is, the ability to write objects to disk and read them back. This section also discusses the related object printing routines, which output a human-readable representation of objects of the type system for debugging or other purposes.

If the -d command-line option is passed to calculus, the disk reading and writing routines are output into the files read_def.h and write_def.h respectively. If the -p option is given, the object printing routines are output into print_def.h; if the -a option is also given, additional code intended for use with run-time debuggers is added.

All of these routines use extra constructs which are output in the main output files (name.h and union_ops.h) but are not normally accessible. To make these constructs available to the disk reading and writing routines, the macro name_IO_ROUTINES should be defined.

4.1. Disk writing routines

The disk writing routines output in write_def.h consist of a function:

static void WRITE_type(t);

for each type t mentioned within the algebra description, which writes an object of that type to disk.

Note that such routines are output not only for the basic types, such as unions and structures, but also for any composite types, such as pointers and lists, which are used in their definition. For a basic type t, the type component of the name WRITE_type is the short type name of t. For composite types it is defined recursively as follows, where type denotes the type component for t:

	LIST(t)     ->  list_type
	PTR(t)      ->  ptr_type
	STACK(t)    ->  stack_type
	VEC(t)      ->  vec_type
	VEC_PTR(t)  ->  vptr_type
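The naming rule can be sketched as a small recursive C function. This is purely illustrative: calculus performs the derivation internally, and the type_desc structure, the kind names and the short type name exp used in the example are hypothetical. With these, LIST(PTR(exp)) yields WRITE_list_ptr_exp and VEC_PTR(exp) yields WRITE_vptr_exp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a type: either basic (identified by its
   short type name) or a composite constructor applied to another type. */
enum type_kind { BASIC, LIST_T, PTR_T, STACK_T, VEC_T, VEC_PTR_T };

struct type_desc {
    enum type_kind kind;
    const char *short_name;        /* used only when kind == BASIC */
    const struct type_desc *arg;   /* argument type of a composite */
};

/* Prefix contributed by each composite constructor. */
static const char *kind_prefix(enum type_kind k)
{
    switch (k) {
    case LIST_T:    return "list_";
    case PTR_T:     return "ptr_";
    case STACK_T:   return "stack_";
    case VEC_T:     return "vec_";
    case VEC_PTR_T: return "vptr_";
    default:        return "";
    }
}

/* Append the type component for t to buf, which holds size bytes. */
static void append_type_component(const struct type_desc *t,
                                  char *buf, size_t size)
{
    assert(strlen(buf) < size);
    if (t->kind == BASIC) {
        strncat(buf, t->short_name, size - strlen(buf) - 1);
    } else {
        /* Composite: constructor prefix, then recurse on the argument. */
        strncat(buf, kind_prefix(t->kind), size - strlen(buf) - 1);
        append_type_component(t->arg, buf, size);
    }
}

/* Build the full function name, e.g. "WRITE_list_ptr_exp". */
static void write_function_name(const struct type_desc *t,
                                char *buf, size_t size)
{
    assert(size > 0);
    buf[0] = '\0';
    strncat(buf, "WRITE_", size - 1);
    append_type_component(t, buf, size);
}
```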

For identity types, and for composite types involving identity types, these functions are defined by macros in terms of the identity definitions. The user must provide the WRITE_type functions for the primitive types; these form the foundation on which all the other functions are built.

The user may wish to generate WRITE_type (or other disk reading and writing) functions for types other than those mentioned in the algebra definition. This can be done using a command-line option of the form -Einput, where input is a file containing a list of the extra types required. In the notation used above, the syntax for input is:

extra:
			type-list?

type-list:
			type;
			type-list type;

The WRITE_type functions are defined recursively in an obvious fashion. The user needs to provide the writing routines for the primitives already mentioned, plus support routines (or macros):

void WRITE_BITS(int, unsigned int);
void WRITE_DIM(name_dim);
void WRITE_ALIAS(unsigned int);

for writing a number of bits to disk, writing a vector dimension and writing an object alias.
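A minimal sketch of the user-supplied writing routines might look as follows. It assumes an in-memory bit buffer in place of a real file, most-significant-bit-first bit order, 16-bit vector dimensions and aliases, and a hypothetical name_dim typedef; none of these choices is prescribed by calculus. The hypothetical int_list type and WRITE_list_int illustrate the kind of recursive definition meant above, using an assumed one-bit "more elements" flag before each element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generated dimension type. */
typedef unsigned int name_dim;

/* In-memory bit buffer standing in for the disk file (an assumption). */
static unsigned char out_buf[256];
static unsigned long out_bits = 0;   /* number of bits written so far */

/* Write the low n bits of v, most significant bit first. */
void WRITE_BITS(int n, unsigned int v)
{
    assert(n >= 0 && n <= 32);
    assert(out_bits + (unsigned long)n <= 8 * sizeof out_buf);
    while (n-- > 0) {
        unsigned int bit = (v >> n) & 1u;
        unsigned char mask = (unsigned char)(0x80u >> (out_bits % 8));
        if (bit)
            out_buf[out_bits / 8] |= mask;
        else
            out_buf[out_bits / 8] &= (unsigned char)~mask;
        out_bits++;
    }
}

/* Assumed fixed 16-bit widths; values must fit. */
void WRITE_DIM(name_dim d) { assert(d < 65536u); WRITE_BITS(16, d); }
void WRITE_ALIAS(unsigned int a) { assert(a < 65536u); WRITE_BITS(16, a); }

/* A primitive writer the user must supply: ints as 32 bits. */
static void WRITE_int(int x) { WRITE_BITS(32, (unsigned int)x); }

/* Hypothetical list of int, to show a recursively defined writer. */
typedef struct int_list {
    int head;
    const struct int_list *tail;
} int_list;

/* Each element is preceded by a 1 bit; the list ends with a 0 bit. */
static void WRITE_list_int(const int_list *l)
{
    if (l == NULL) {
        WRITE_BITS(1, 0);
        return;
    }
    WRITE_BITS(1, 1);
    WRITE_int(l->head);
    WRITE_list_int(l->tail);
}
```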

Any of the WRITE_type functions may be overridden by the user by defining a macro WRITE_type with the desired effect. Note that the WRITE_type function for an identity can be overridden independently of the function for the identity definition. This provides a method for introducing types which are representationally the same, but which are treated differently by the disk reading and writing routines.
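The override mechanism can be sketched as below. For illustration only, it assumes that the generated default for an identity is a macro guarded by #ifndef, so that a definition the user supplies earlier takes precedence; the identity line_no (defined as int) and its 16-bit encoding are hypothetical, and this WRITE_BITS merely counts bits so that the effect is observable.

```c
#include <assert.h>

/* Toy WRITE_BITS that only counts bits written. */
static unsigned long bits_written = 0;
void WRITE_BITS(int n, unsigned int v)
{
    assert(n >= 0 && n <= 32);
    (void)v;
    bits_written += (unsigned long)n;
}

/* Primitive writer supplied by the user. */
static void WRITE_int(int x) { WRITE_BITS(32, (unsigned int)x); }

/* Hypothetical identity: line_no is defined as int in the algebra. */
typedef int line_no;

/* User override, seen before the generated code: line numbers are
   written in 16 bits rather than as full ints. */
#define WRITE_line_no(x) WRITE_BITS(16, (unsigned int)(x))

/* Assumed shape of the generated default, used only if no override. */
#ifndef WRITE_line_no
#define WRITE_line_no(x) WRITE_int(x)
#endif
```

Here line_no and int share a representation, but a line_no takes 16 bits on disk while an int takes 32.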

4.2. Disk reading routines

The disk reading routines output in read_def.h are exactly analogous to the disk writing routines. For each type t (except primitives) there is a function or macro:

static t READ_type(void);

which reads an object of that type from disk. The user must provide the READ_type functions for the primitive types, plus support routines:

unsigned int READ_BITS(int);
name_dim READ_DIM(void);
unsigned int READ_ALIAS(void);

for reading a number of bits from disk, reading a vector dimension and reading an object alias. The READ_type functions may be overridden by means of macros as before.
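A minimal sketch of the reading support routines follows, assuming an in-memory byte buffer, most-significant-bit-first bit order, 16-bit dimensions and aliases, and a hypothetical name_dim typedef. The helper open_input is likewise hypothetical. The only real requirement is that these choices agree with whatever the corresponding writing routines do.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int name_dim;   /* hypothetical dimension type */

/* In-memory input standing in for the disk file (an assumption). */
static const unsigned char *in_buf = NULL;
static unsigned long in_len_bits = 0;
static unsigned long in_pos = 0;   /* index of the next bit to read */

static void open_input(const unsigned char *buf, size_t nbytes)
{
    in_buf = buf;
    in_len_bits = 8ul * (unsigned long)nbytes;
    in_pos = 0;
}

/* Read n bits, most significant bit first. */
unsigned int READ_BITS(int n)
{
    unsigned int v = 0;
    assert(n >= 0 && n <= 32);
    assert(in_pos + (unsigned long)n <= in_len_bits);
    while (n-- > 0) {
        unsigned int byte = in_buf[in_pos / 8];
        v = (v << 1) | ((byte >> (7 - in_pos % 8)) & 1u);
        in_pos++;
    }
    return v;
}

name_dim READ_DIM(void) { return READ_BITS(16); }
unsigned int READ_ALIAS(void) { return READ_BITS(16); }

/* Primitive reader supplied by the user: ints as 32 bits. */
static int READ_int(void) { return (int)READ_BITS(32); }
```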

4.3. Object printing routines

The object printing routines output in print_def.h consist of a function or macro:

static void PRINT_type(FILE *, t, char *, int);

for each type t, which prints an object of type t to the given file, using the given object name and indentation value. The user needs to provide basic output routines:

void OUTPUT_type(FILE *, t);

for each primitive type. The PRINT_type functions may be overridden by means of macros as before.
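The relationship between PRINT_type and the user's OUTPUT_type can be sketched for int. The output format used here, one "name = value" line per object indented by indent * print_indent_step spaces, is an assumption for illustration and not necessarily what calculus generates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int print_indent_step = 4;

/* Primitive output routine supplied by the user. */
void OUTPUT_int(FILE *f, int x)
{
    fprintf(f, "%d", x);
}

/* Hypothetical PRINT_int: indentation, then "name = ", then the value. */
static void PRINT_int(FILE *f, int x, char *name, int indent)
{
    assert(indent >= 0);
    fprintf(f, "%*s%s = ", indent * print_indent_step, "", name);
    OUTPUT_int(f, x);
    fputc('\n', f);
}
```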

The printing routines are under the control of three variables defined as follows:

static int print_indent_step = 4;
static int print_ptr_depth = 1;
static int print_list_expand = 0;

These determine, respectively, the indentation step used in the output, the depth to which pointers are dereferenced when printing, and whether lists and stacks are fully expanded into a sequence of elements or just split into a head and a tail.
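The effect of print_list_expand can be sketched for a hypothetical list of int. The output format and the names elem, head, tail and NULL_list are assumptions for illustration, and print_ptr_depth is not modelled. When expansion is on, every element is printed at the same indentation; when it is off, the list is split into a head and a tail, and the tail is printed the same way one level deeper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static int print_indent_step = 4;
static int print_list_expand = 0;

/* Hypothetical list of int. */
typedef struct int_list {
    int head;
    const struct int_list *tail;
} int_list;

static void PRINT_int(FILE *f, int x, char *name, int indent)
{
    fprintf(f, "%*s%s = %d\n", indent * print_indent_step, "", name, x);
}

static void PRINT_list_int(FILE *f, const int_list *l, char *name, int indent)
{
    int pad = indent * print_indent_step;
    if (l == NULL) {
        fprintf(f, "%*s%s = NULL_list\n", pad, "", name);
        return;
    }
    fprintf(f, "%*s%s = LIST\n", pad, "", name);
    if (print_list_expand) {
        /* Expanded: every element at the same indentation. */
        for (; l != NULL; l = l->tail)
            PRINT_int(f, l->head, "elem", indent + 1);
    } else {
        /* Not expanded: split into head and tail, recursively. */
        PRINT_int(f, l->head, "head", indent + 1);
        PRINT_list_int(f, l->tail, "tail", indent + 1);
    }
}
```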

One application of these object printing routines is as an aid to debugging programs written using the calculus tool. Because of the way the type system is implemented, it is not easy to extract information with a run-time debugger without detailed knowledge of the structure of that implementation. As a more convenient alternative, if both the -p and -a command-line options are given, calculus will generate functions:

void DEBUG_type(t);

defined in terms of PRINT_type, for printing an object of the given type to the standard output. Many debuggers have problems passing structure arguments, so for structure, vector and vector pointer types DEBUG_type takes the form:

void DEBUG_type(t *);

These debugging routines are only defined if the standard macro NDEBUG is not defined.
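A debugging routine for a structure type might be sketched as follows. The structure point, the print format and the name passed to PRINT_point are hypothetical; the points taken from the text are that DEBUG_type is defined in terms of PRINT_type, prints to the standard output, takes a pointer for structure types, and exists only when NDEBUG is not defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int print_indent_step = 4;

/* Hypothetical structure type from an algebra. */
typedef struct {
    int x;
    int y;
} point;

static void PRINT_point(FILE *f, point p, char *name, int indent)
{
    assert(indent >= 0);
    fprintf(f, "%*s%s = { x = %d, y = %d }\n",
            indent * print_indent_step, "", name, p.x, p.y);
}

#ifndef NDEBUG
/* Structure arguments are passed by pointer, which debuggers handle
   more reliably; the object is printed to the standard output. */
void DEBUG_point(point *p)
{
    PRINT_point(stdout, *p, "point", 0);
    fflush(stdout);
}
#endif
```

In gdb, for instance, such a function can be invoked with call DEBUG_point(&pt).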

4.4. Aliasing

An important feature of the disk reading and writing routines, namely aliasing, has been mentioned but not yet described. The problem is that many of the objects built up using type systems defined with calculus will be cyclic: they will include references to themselves in their own definitions. Aliasing is a mechanism for breaking such cycles by ensuring that only one copy of an object is ever written to disk, and that only one copy is created when reading from disk. This is done by associating a unique number, its alias, with the object.

Thus, when writing to disk, the alias definition is set up the first time an object is written; whenever the object is encountered again, the alias number is written instead of the object it represents. Similarly, when reading from disk, an alias may be associated with an object when it is read; whenever this alias is encountered subsequently, it refers to this same object.
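The cycle-breaking idea can be sketched on a hypothetical self-referential node type. The token stream, the three tags and the alias table are assumptions for illustration; the generated code works through WRITE_ALIAS, READ_ALIAS and the alias support routines instead. Note that the reader registers a new object under its alias before reading its components, so that a reference back to the object is resolved correctly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical aliased object that may point back to itself. */
typedef struct node {
    int value;
    struct node *next;
    unsigned int alias;     /* 0 = no alias assigned yet */
} node;

/* Toy token stream standing in for the disk file. */
static unsigned int stream[64];
static int stream_len = 0, stream_pos = 0;
static unsigned int crt_alias = 1;   /* next alias number to assign */

static void put(unsigned int v) { assert(stream_len < 64); stream[stream_len++] = v; }
static unsigned int get(void) { assert(stream_pos < stream_len); return stream[stream_pos++]; }

enum { TAG_NULL = 0, TAG_REF = 1, TAG_DEF = 2 };

static void write_node(node *p)
{
    if (p == NULL) { put(TAG_NULL); return; }
    if (p->alias != 0) {             /* seen before: alias number only */
        put(TAG_REF);
        put(p->alias);
        return;
    }
    p->alias = crt_alias++;          /* first visit: define the alias */
    put(TAG_DEF);
    put(p->alias);
    put((unsigned int)p->value);     /* then write the object itself */
    write_node(p->next);
}

static node *alias_table[64];        /* alias number -> object read */

static node *read_node(void)
{
    unsigned int tag = get(), a;
    node *p;
    if (tag == TAG_NULL) return NULL;
    a = get();
    assert(a < 64);
    if (tag == TAG_REF) return alias_table[a];
    p = malloc(sizeof *p);
    assert(p != NULL);
    p->alias = 0;
    alias_table[a] = p;              /* register before reading fields */
    p->value = (int)get();
    p->next = read_node();
    return p;
}
```

Writing a two-node cycle this way terminates after eight tokens. The alias fields of the original nodes stay set afterwards, which is why aliases have to be cleared once writing is finished.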

Aliasing can only be specified on union fields. A union field may be qualified by one or two hash symbols (# or ##) to signify that objects of that type should be aliased.

The double hash case indicates that the user wishes to gain access to the objects during aliasing. In the disk writing case, the object to be written, x say, is split into its components using the appropriate DECONS_union_field construct. Then the user-defined routine, or macro:

ALIAS_union_field(comp, ...., x);

(where comp ranges over all the union components) is called prior to writing the object components to disk.
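The order of operations on the writing side can be sketched for a hypothetical double hash field ident of a union type, with two int components. The hook body, which just records what it sees, and the toy WRITE_int sink are assumptions; the alias bookkeeping itself is omitted for brevity, so this shows only the path taken when x is written in full.

```c
#include <assert.h>

/* Hypothetical "##" union field with two int components. */
typedef struct {
    int id;
    int size;
} type_ident;

/* Toy sink recording the components written. */
static int written[16];
static int n_written = 0;
static void WRITE_int(int x)
{
    assert(n_written < 16);
    written[n_written++] = x;
}

/* User-defined hook (hypothetical body): records what it saw. */
static int alias_calls = 0;
static int alias_last_id = -1;
static void ALIAS_type_ident(int id, int size, type_ident *x)
{
    (void)size;
    (void)x;
    alias_calls++;
    alias_last_id = id;
}

/* Deconstruct x, call the hook, then write the components. */
static void WRITE_type_ident_fields(type_ident *x)
{
    int id = x->id;        /* stands in for DECONS_type_ident */
    int size = x->size;
    ALIAS_type_ident(id, size, x);
    WRITE_int(id);
    WRITE_int(size);
}
```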

Similarly in the disk reading case, the object being read, x, is initialised by calling the user-defined routine:

UNALIAS_union_field(x);

prior to reading the object components from disk. Each object component is then read into a local variable, comp. Finally the user-defined routine:

UNIFY_union_field(comp, ...., x);

(where comp ranges over all the union components) is called to assign these values to x before returning.
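The reading side can be sketched for the same kind of hypothetical double hash field. What UNALIAS does to initialise x is up to the user; here it just sets placeholder values, which is an assumption. The toy src array stands in for the disk file.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int id;
    int size;
} type_ident;

/* Toy source of component values standing in for the disk file. */
static const int *src = NULL;
static int src_pos = 0;
static int READ_int(void) { return src[src_pos++]; }

/* User-defined routines for the hypothetical "##" field. */
static void UNALIAS_type_ident(type_ident *x)
{
    /* Placeholder values until the components have been read. */
    x->id = -1;
    x->size = 0;
}

static void UNIFY_type_ident(int id, int size, type_ident *x)
{
    x->id = id;
    x->size = size;
}

static type_ident *READ_type_ident_fields(void)
{
    int id, size;
    type_ident *x = malloc(sizeof *x);
    assert(x != NULL);
    UNALIAS_type_ident(x);           /* initialise x first */
    id = READ_int();                 /* read each component into a local */
    size = READ_int();
    UNIFY_type_ident(id, size, x);   /* then assign them to x */
    return x;
}
```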

In the single hash case the object is not processed in this way. It is just written straight to disk, or has its components immediately assigned to it when reading from disk.

Note that aliasing is used not only in the disk reading and writing routines but also in the object printing functions. After calling any of these functions the user should call the routine:

void clear_name_alias(void);

to clear all aliases.

Aliases are implemented by adding to each object to be aliased an extra field, which contains the alias number if one has been assigned, and zero otherwise. A list of all these extra fields is maintained. In addition to the routine clear_name_alias mentioned above, the user should provide support functions and variables:

unsigned int crt_name_alias;
void set_name_alias(name *, unsigned int);
name *find_name_alias(unsigned int);

giving, respectively, the next alias number to be assigned, a routine for adding an alias to the list of all aliases, and a routine for looking up an alias in this list. Example implementations of these routines are given in the calculus program itself.
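A minimal sketch of these support routines, kept as a singly linked list, follows; clear_name_alias is included so that the sketch is complete. Here name stands for the algebra name, as elsewhere in this section, and the single-field struct is a hypothetical stand-in for the cell in which the alias number is stored. Restarting the numbering in clear_name_alias is an assumption, not something the text specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cell holding an object's alias number (0 = none). */
typedef struct name {
    unsigned int alias;
} name;

unsigned int crt_name_alias = 1;   /* next alias number to assign */

/* List of every cell that currently carries an alias. */
typedef struct alias_entry {
    name *cell;
    struct alias_entry *next;
} alias_entry;
static alias_entry *alias_list = NULL;

void set_name_alias(name *p, unsigned int a)
{
    alias_entry *e = malloc(sizeof *e);
    assert(e != NULL);
    p->alias = a;
    e->cell = p;
    e->next = alias_list;
    alias_list = e;
}

name *find_name_alias(unsigned int a)
{
    alias_entry *e;
    for (e = alias_list; e != NULL; e = e->next)
        if (e->cell->alias == a)
            return e->cell;
    return NULL;
}

void clear_name_alias(void)
{
    while (alias_list != NULL) {
        alias_entry *e = alias_list;
        alias_list = e->next;
        e->cell->alias = 0;          /* object is no longer aliased */
        free(e);
    }
    crt_name_alias = 1;              /* assumption: numbering restarts */
}
```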

4.5. Application to calculus

As mentioned above, the calculus program is itself an application of calculus. It therefore contains routines for reading and writing a representation of an algebra to and from disk, and for pretty-printing the contents of an algebra. These are accessed using the following command-line options.

If the -w command-line option is given, calculus reads its input file, input, as normal, but writes a disk representation of the input algebra to output, which in this case is an output file rather than an output directory. An output file produced in this way can then be used as an input file to calculus if the -r option is given. Finally, the input algebra may be pretty-printed to an output file (or to the standard output if no output argument is given) by specifying the -o option.