Validation of TenDRA Capability to Implement a Set of Commands for the Linux Operating System

i. Executive Summary
1. Objectives and Description
1.1. Objectives
1.2. General description
2. Description of Project phases
3. Project environment
4. Descriptions and Results of Project phases
5. Statistics
5.1. Statistics for Linux APIs
5.2. Statistics concerning changes in the original source code
6. Conclusion
A. Errors in the TenDRA compiler

François de Ferrière, Open Software Foundation Research Institute
Fred Roy, Open Software Foundation Research Institute
Katherine Flavel, The TenDRA Project (copyeditor)
Jeroen Ruigrok van der Werven, The TenDRA Project (copyeditor)

First published 1996.

Abstract

This report describes work done under contract to the Defence Research Agency (DRA) of the U.K. It is an extension of an earlier contract to assess the capability of the DRA TenDRA technology to express a fully portable operating system implementation.

Revision History

1998-07-30	DERA	TenDRA 4.1.2 release.
1996-07-30	OSFRI	Initial revision.

i. Executive Summary

During a first phase of the project the main goal was to examine the extent to which the TenDRA technology could be used to compile a complete operating system. This experiment was carried out on the UnixWare operating system on an Intel 486 platform. Although Unix sources are known to be compiler dependent, it was found that most of the code could be compiled with no, or minor, modifications. Details of the results can be found in a report.

The goal of this second phase of the project was to study the feasibility of producing Unix commands in architecture neutral (ANDF) format which could be readily ported to different hardware platforms running the same operating system. The target system chosen for this second phase was Linux, which was available on both the Intel and DEC Alpha platforms (although the latter was incomplete at the beginning of the project).

The experiment carried out during this second phase was very successful. A significant amount of complex code was converted to the architecture neutral ANDF format, and the portability of this code was demonstrated. However, due to time constraints, the number of commands ported to both platforms was more limited than had been hoped. The project also provided some interesting lessons about the strengths and limitations of the ANDF/TenDRA technology and about API issues. This is the final report on work undertaken on the Linux operating system on the Intel/i386 and Digital/Alpha platforms.

1.1. Objectives

In previous work, we performed validation, performance and robustness testing of the TenDRA technology to ensure its capability to implement and fully bootstrap a UNIX-like operating system. We also provided an assessment of the capability of TenDRA technology to express a fully portable operating system implementation. This work was very successful, and the results are reported in a summary report.

However, though we originally planned to conduct this first experiment on two different architectures, an Intel/486 and a Sun/Sparc platform, both running UnixWare, it was completed only for the Intel/486 platform. After discussions with DRA, it was then decided to focus the second part of the project on the Unix commands, and to switch from the UnixWare to the Linux operating system. The motivations for a revised plan were:

OSF is developing a Linux server for the Intel/486 and PowerPC platforms, and we would like to deliver the set of associated commands in ANDF format. In the event, native Linux commands for PowerPC became available in the meantime.
Repeating on the Sparc platform the work already done on the i486 platform would bring little added value, while requiring a significant amount of work. The major benefit of the work on a second platform would be to demonstrate that a set of commands along with its API can be defined in ANDF format and then installed on two different platforms.

Thus, the objective of the second part of the project is the production in ANDF format of Linux commands, and their installation on two platforms. This will demonstrate the ability of the TenDRA technology to produce a set of architecture neutral commands and, at completion of the project, will provide a set of freely distributable commands in ANDF format.

The project, which lasted 9 months, started in July 1995 and finished at the end of March 1996.

This report summarizes all the work done under the contract for this second part of the project.

1.2. General description

The commands, one part of a Unix system, are based on some standard APIs, XPG3 for example, plus some extensions, which together form the interface shared with the libraries against which they are built. The commands should not have any assembly code, unlike the other parts of a Unix system.

As for other software OSF already ported to ANDF, the port of the Linux commands is done in three steps:

The NAT-NAT step, which consists in rebuilding the commands with the native compilation chain, to ensure that they can be regenerated from their source files.
The DRA-NAT step, for which the TenDRA technology is used as a replacement of the native compilation chain to build the commands, using the native system header files, as for a classical compilation chain. This part involves dealing with discrepancies between the native and the TenDRA code generators.
The DRA-DRA step, which will consist in using the TenDRA technology as a portability tool. The API shared by the commands and libraries is defined, and used to produce the commands in architecture independent ANDF code. This code will be installed and validated on the selected machines.
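The three modes can be summarized as a choice of compiler command. The sketch below is illustrative only: `cc` stands in for the native compiler, `-Ysystem` is the tcc option this report quotes for compiling against the native system headers, and the environment name passed for DRA-DRA is a placeholder for whatever API definition has been built. `build_cmd` is a hypothetical helper, not part of TenDRA.

```sh
# Print the compiler command for one of the three build modes.
#   $1  mode: NAT-NAT, DRA-NAT or DRA-DRA
#   $2  API environment name (DRA-DRA only; placeholder supplied by caller)
build_cmd() {
    mode=$1
    api_env=${2:-}
    case $mode in
        NAT-NAT) echo "cc" ;;               # native compilation chain (assumed name)
        DRA-NAT) echo "tcc -Ysystem" ;;     # TenDRA with native system headers
        DRA-DRA) echo "tcc -Y$api_env" ;;   # TenDRA with an abstract API definition
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `build_cmd DRA-DRA xpg3` prints `tcc -Yxpg3`, assuming an environment of that name has been installed.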

We initially planned to conduct the experiment on the Intel/i386 and IBM/PowerPC platforms, both running the Linux operating system. However, the Linux system for the IBM/PowerPC platform was still under development at the time we needed it, in December 1995. So we decided to replace it with a Digital/Alpha platform, the only other platform for which a Linux port was sufficiently advanced at that time. Because the Alpha is a 64-bit platform, this switch was more of a challenge, owing to the fundamental change in data sizes, and it provided additional tests of the TenDRA portability attributes.

2. Description of Project phases

2.1. Phase 1: level 1 commands on Intel/i386
2.2. Phase 2: level 1 commands on Digital/Alpha
2.3. Phase 3: level 2 commands on Intel/i386 and Digital/Alpha

In this section we specify the tasks which have been performed under this project.

The set of commands which were ported to ANDF was split into two subsets:

The level 1 subset corresponds to the level achieved with UnixWare at the end of the first project (about 100 commands out of 600). These commands conform to a standard interface (Posix or XPG) with some simple extensions. They do not use extensions which are difficult to port from one system to another. This set includes about 150 Linux commands.
The level 2 subset corresponds to the maximum which can be reasonably achieved for a given system. While we could expect about twice as many commands as for level 1, we only compiled about 75 more commands.

2.1. Phase 1: level 1 commands on Intel/i386

The objective is to produce a “level 1” set of Linux commands in ANDF format for the Intel platform. This requires the production of the associated API, also in ANDF format (token library).

The major tasks for this phase are:

2.1.1. T1. Linux installation.

Install the Linux system.
Install a compilation environment.
Install the Linux source code.

Prerequisite: Linux system for Intel.

Delivery: System running.

2.1.2. T2. TenDRA installation.

Install the TenDRA technology for Linux.

Prerequisite: TenDRA technology & T1.

Delivery: TenDRA installed.

2.1.3. T3. Level 1 commands port.

Define the level 1 set of Linux commands.
Compile the level 1 commands in NAT-NAT mode.
Compile the level 1 commands in DRA-NAT mode.

Prerequisite: Linux source code, T1 & T2.

Delivery: Level 1 commands compiled with TenDRA in native mode.

2.1.4. T4. Level 1 commands API definition.

Define the non-explicit API used by this set of commands. Machine dependent code issues will be addressed specifically.

Prerequisite: Linux source code.

Delivery: Set of ANDF header files for this level 1 API.

2.1.5. T5. ANDFization of Level 1 commands.

Produce the level 1 commands with the TenDRA technology, using the ANDF definition of the API defined in the previous task.

Prerequisite: Linux source code, T2 & T4.

Delivery: Level 1 commands in ANDF format.

2.1.6. T6. Level 1 commands API installation.

Build the token library for the level 1 commands API.

Prerequisite: T2 & T4.

Delivery: Token library for this level 1 API.

2.1.7. T7. Level 1 commands installation and validation.

Install the commands in ANDF format produced in task T5.
Validate the commands using ad hoc tests.
Write a report that describes the results obtained and the problems encountered.

Prerequisite: T2, T5 & T6.

Delivery: Report on level 1 commands on Intel.

2.2. Phase 2: level 1 commands on Digital/Alpha

The objective is to validate that the commands produced during the first phase can be easily ported to the Alpha platform.

The major tasks, during this phase, are:

2.2.1. T8. Linux installation.

Install the Linux system.
Install a compilation environment, including header files and libraries.

Prerequisite: Linux system for Alpha.

Delivery: System running.

2.2.2. T9. TenDRA installation.

Adapt the TenDRA technology for Linux on the Alpha.
Install the TenDRA technology.

Prerequisite: TenDRA technology & T8.

Delivery: TenDRA installed.

2.2.3. T10. Level 1 commands API installation.

Build the token library for the level 1 commands API.

Prerequisite: T4 & T9.

Delivery: Token library for this level 1 API.

2.2.4. T11. Level 1 commands installation and validation.

NAT-NAT and DRA-NAT check.
Install the commands in ANDF format produced in task T5.
Validate the commands using ad hoc tests.
Write a report that describes the results obtained and the problems encountered.

Prerequisite: T5, T9 & T10.

Delivery: Report on level 1 commands on Alpha.

2.3. Phase 3: level 2 commands on Intel/i386 and Digital/Alpha

The objective during this phase is to validate further the ANDF tools by trying to extend the ANDF commands to a broader set that will include some “difficult” cases.

The major tasks are:

2.3.1. T12. Level 2 commands port.

Define the level 2 set of Linux commands, by extension of the level 1 set.
Compile the level 2 commands in NAT-NAT mode on Intel.
Compile the level 2 commands in NAT-NAT mode on Alpha.
Compile the level 2 commands in DRA-NAT mode on Intel.
Compile the level 2 commands in DRA-NAT mode on Alpha.

Prerequisite: Linux source code, T1, T2, T8 & T9.

Delivery: Level 2 commands compiled with TenDRA in native mode.

2.3.2. T13. Level 2 commands API definition.

Extend the level 1 API to include the interfaces used by the level 2 commands.

Prerequisite: Linux source code & T4.

Delivery: Set of ANDF header files for this level 2 API.

2.3.3. T14. Level 2 commands ANDFization.

Produce the level 2 commands with the TenDRA technology, using the ANDF definition of the API defined in the previous task.

Prerequisite: Linux source code, T2, T9 & T13.

Delivery: Level 2 commands in ANDF format.

2.3.4. T15. Level 2 commands API installation.

Build the token library for the level 2 commands API on Intel.
Build the token library for the level 2 commands API on Alpha.

Prerequisite: T2, T9 & T13.

Delivery: Token library for this level 2 API.

2.3.5. T16. Level 2 commands installation and validation.

Install the commands in ANDF format produced in task T14 on Intel.
Install the commands in ANDF format produced in task T14 on Alpha.
Validate the commands using ad hoc tests.
Write an intermediate report and a final report.

Prerequisite: T2, T9, T14 & T15.

Delivery: Intermediate report on level 2 commands, and final report.

3. Project environment

3.1. LINUX operating system
3.2. Hardware platforms and environment
3.3. TenDRA technology

3.1. LINUX operating system

Linux is a free Unix-like operating system, first developed by Linus Torvalds on an Intel platform; “official” releases have existed since October 1991.

It has now been ported to Digital/Alpha, and ports to other machines, including PowerPC and PowerMAC, are under way.

3.1.1. Linux on Intel/i386

There are many distributions of Linux for the Intel platform; we installed the Slackware distribution, based on the Linux 1.1 version.

We downloaded it from the ftp site: sunsite.unc.edu:/pub/Linux/distributions/slackware

The version we installed was Linux 1.1.59, available since October 1994. Since then, newer versions have been released, but we stuck to this version throughout the project since it worked well, and because all the packages it included were easily available in source form (see below).

3.1.2. Source code for Linux commands used by the project

We downloaded the source code of the Linux commands from the ftp site: sunsite.unc.edu:/pub/Linux/distributions/slackware/source.

This means that the versions of the commands available on our development machine for Linux/i386 matched the source code we used as the base for the project, except in a few cases for which the source code had been revised.

3.1.3. Linux on Digital/Alpha

For the Alpha platform, Linux is available from the BLADE distribution, and more recently from the Red Hat distribution.

A 32-bit version of Linux/Alpha was first released in January 1995; then a 64-bit version was available in November 1995, which included most of the capabilities provided by the Linux/Intel system. We downloaded the BLADE_0.3 release, consisting of more than thirty floppy images, from the ftp site: ftp.digital.com:/pub/DEC/Linux-Alpha

Since December 1995, another Linux/Alpha distribution has become available from the RedHat company; it is built from the same components as the BLADE release, but in newer versions.

An interesting feature of the current Linux/Alpha ports is that they provide rather extensive binary compatibility with Digital Unix. This compatibility has been used to cross-build on Digital Unix for Linux/Alpha, and also for a few features which were not available in the Linux/Alpha BLADE release.

3.2. Hardware platforms and environment

The Intel platform was an Intel/i486 PC machine, with most disk space available through NFS.

The Digital/Alpha platform was built specifically for this project, around a Digital AXPpci 33 motherboard. In fact, at the time we set up the machine, the Linux/Alpha ports were only running on a few Alpha-based machines.

A Linux kernel had to be rebuilt for this machine in order to add support for the 3COM Ethernet board we used, and for the NFS-client capability. Most disk space was thus available through an NFS file system, shared with the Intel platform.

3.3. TenDRA technology

We started the project on an Intel platform with a snapshot of the TenDRA technology from April 1995, which included support for the Linux/Intel platform. This snapshot was based on the ANDF 3.1 specification.

We switched to the November 1995 TenDRA snapshot, based on ANDF 4.0, when we set up the second platform, in order to use the tools for the Digital Unix/Alpha platform. Because of the high degree of compatibility between Digital Unix and Linux on Alpha, we could use the TenDRA technology on a Digital Unix/Alpha platform to cross-build executables for the Linux/Alpha platform.

The ANDF 4.0 intermediate file format is not upward compatible with the ANDF 3.1 one, which required that we rebuild the intermediate ANDF files for the Linux commands we had already built.

ANDF 4.0 contains increased capability, though not required by this project, and forms the basis for the X/Open preliminary specification XANDF.

4. Descriptions and Results of Project phases

4.1. Linux installation
4.1.1. Linux/i386 installation (including the source code for commands)
4.1.2. Linux/Alpha installation
4.2. TenDRA installation
4.2.1. TenDRA installation on Intel/i386
4.2.2. TenDRA installation on DEC/Alpha
4.3. Build environment with TenDRA
4.4. Definition of the API for the commands
4.5. ANDFization of the commands
4.6. Installation of the API for the commands
4.7. Installation and validation of the commands
4.7.1. Miscellaneous problems encountered at validation
4.7.2. Recent upgrades of our original source code for Linux commands

In the next paragraphs, we describe the way we accomplished the various tasks of the project and we summarize their results.

4.1. Linux installation

At the beginning of the project, we installed the Linux operating system release 1.1 on an Intel/i386 machine. In December 1995, after a few months of work on the Intel/i386 platform, we installed Linux on the second platform for the project, which is a Dec/Alpha. Linux was first released on this platform at the beginning of 1995.

4.1.1. Linux/i386 installation (including the source code for commands)

A machine with Linux 1.1.59 from the Slackware distribution, including the native compilation chain and libraries from GNU, was set up for the project.

The Linux system is available on several anonymous ftp sites. The one we used was at sunsite.unc.edu, where a distribution of the sources and binaries of the Intel/Linux commands from Slackware was available under the /pub/Linux/distributions/slackware directory. Note that the current Slackware distribution at the time of writing of this report is based on Linux 3.0.

In the Slackware Linux distribution for Intel/ix86, the delivery of the source code for commands is split into a large number of packages. The contents of each source package must be compiled and installed individually. For example, the awk(1) command, actually gawk(1), belongs to the bin package which contains 56 commands, while the bc(1) command belongs to the bc package which contains only this one command. Consequently, we did not download the whole set of sources for the Linux commands, but selected a few packages containing the sources of the commands we intended to build first. We also had a look at the Caldera Linux source distribution, and it appeared to be organized in the same way.

A Slackware Linux package for source code distribution is made of one or more compressed tar files (usually only one), optional patch file(s), and a shell script. The execution of this shell script installs the source files from the tar file(s), applies patches if necessary, optionally performs a self-configuration step, runs the makefile(s) for the compilation, and finally generates a binary package which holds the resulting executables. This procedure has been adapted to fit the NAT-NAT, DRA-NAT and DRA-DRA development steps on two platforms, as described in the section Setting up the build environment.
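The sequence of steps followed by such a package script can be sketched as a dry run. The file-name conventions used here (`*.tar.gz` archives, `*.diff` patches, a `configure` script beside them) and the function name are assumptions made for illustration, not the layout of any particular Slackware package.

```sh
# Print, in order, the steps a source-package build script would perform
# for the files found in directory $1. Nothing is executed: this is a
# dry run that only lists the plan.
plan_package_build() {
    dir=$1
    for f in "$dir"/*.tar.gz; do
        if [ -e "$f" ]; then echo "unpack ${f##*/}"; fi
    done
    for f in "$dir"/*.diff; do
        if [ -e "$f" ]; then echo "patch ${f##*/}"; fi
    done
    # The self-configuration step is optional.
    if [ -e "$dir/configure" ]; then echo "configure"; fi
    echo "make"
    echo "package"
}
```

Replacing the `make` step with a call to a different compilation chain is the point at which a procedure of this kind can be adapted to the NAT-NAT, DRA-NAT and DRA-DRA modes.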

Note that each package has a private version number. Thus packages can be maintained and released independently. Moreover, some packages (e.g. the bin package) are a collection of several “subpackages”, each of which has its own version number.

4.1.2. Linux/Alpha installation

The Linux Operating System port to the Digital Alpha architecture started two years ago. The first user-installable distribution was available in January 95, from the BLADE distribution, and was a 32-bit version. Then came a 64-bit version which was made compatible with Digital Unix with respect to basic C language types:

Type	Size
`int`	32-bit
`long`	64-bit
pointer	64-bit
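These sizes correspond to what is commonly called the LP64 data model, whereas Linux on Intel/i386 uses 32 bits for all three types (ILP32). A minimal sketch that names the model from the three sizes follows; the function name is hypothetical, and only these two models are recognized.

```sh
# Name the C data model given the sizes, in bits, of int, long and pointer.
#   $1 int   $2 long   $3 pointer
data_model() {
    case "$1/$2/$3" in
        32/32/32) echo "ILP32" ;;   # e.g. Linux on Intel/i386
        32/64/64) echo "LP64" ;;    # e.g. 64-bit Linux/Alpha, Digital Unix
        *)        echo "unknown" ;;
    esac
}
```

Source code that silently assumes `long` and `int` have the same size, or that a pointer fits in an `int`, holds under the first model and breaks under the second, which is why the move to Alpha exercises portability more strongly than a second 32-bit platform would.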

While it is still under development, Linux/Alpha is now robust and includes most of the capabilities provided by the Linux/Intel system.

The BLADE distribution was the first available distribution for Linux/Alpha. For the project, we retrieved the November 95 BLADE_0.3 release, based on the Linux 1.3 development kernel, at the following site:

ftp.digital.com:/pub/DEC/Linux-Alpha

This release consists of more than thirty 3.5'' floppy images (not including X-Window). The source code for the commands is not a part of this distribution. Since then, several new versions for the boot firmware, kernel, compiler and libraries have been released, but, as we encountered minor problems only with BLADE_0.3, we did not upgrade our system.

Since December 95, another distribution of Linux for Dec/Alpha is also available from the RedHat company; the current version is:

ftp.redhat.com:/pub/redhat/redhat-2.1/axp

This distribution includes all the source packages for the components it is made from, along with some fixes and additions, in both binary and source forms. It is possible to unpack a RedHat Linux/Alpha 2.1 set while not running the RedHat Linux, but, as a proprietary packaging format is used, one should install the packaging tools (rpm(1)) first.

At the time we set up the machine, Linux/Alpha was operational only on a few variants of Digital Alpha-based systems. So, we selected an entry-level and rather inexpensive board, the Digital AXPpci 33 Alpha PC motherboard, around which we built a machine. Our Linux/Alpha system currently comprises the following:

an 8-slot enclosure with 200W power supply and fan
Digital AXPpci 33 motherboard, Windows NT (ARC) firmware, PS/2 style keyboard interface, 233 MHz Alpha processor
2x16MB, 36-bit, 70ns SIMMs
256 KB, 20ns cache [optional part]
a Number 9 GXE VGA display adapter (ISA)
a dumb VGA display
a PS/2 style keyboard
a 3.5''/1440K floppy disk drive
a SCSI-2 hard disk (a DECpc 2.0GB disk from Digital)
a 3COM Ethernet Link-II (aka 3c503) controller (ISA).

We installed the BLADE_0.3 distribution on our machine, including the C compilation chain and libraries. In order to use the 3COM Ethernet board, we had to rebuild the kernel. We used almost all of the default kernel build parameters, except for the Ethernet adapter, for the settings for the TGA graphics support (switched to “no”) and for the NFS-client feature (selected). Note that a kernel rebuild takes more than half an hour on our system.

A very interesting feature of the current releases of Linux/Alpha is that they provide an almost perfect binary compatibility with Digital Unix. This was of great help to us, as will be described later.

Among the various updates to the Linux/Alpha boot loader, kernel, C compiler, libraries and commands which have been made by the Linux-Alpha development teams, we have used only a few:

upgrade of the sed command: some sed scripts used for modifying the system headers when building the APIs with TenDRA caused the original sed command to abort.
upgrade of a few system headers, extracted from the azstarnet inc-and-libs-0.38.tar.gz file.

These two updates were downloaded from the site

ftp.azstarnet.com:/pub/linux/axp

We encountered a few problems with the BLADE_0.3 release on the AXPpci Alpha board:

The floppy disk driver sometimes entered a time-out, as indicated by a console message.
Some shell scripts failed until a #!/bin/sh line, or equivalent, was inserted. According to a member of the Linux/Alpha development team, this is caused by the kernel command loader, and was fixed in new kernel releases. We worked around this problem by patching a number of shell script files included with various source packages we were building on Linux/Alpha: we realized too late that it would have been preferable to upgrade the kernel.
Linux/Alpha failed to mount an NFS file system served by an HP-UX release 8 machine. Fortunately, this problem disappeared when using a server running Solaris or HP-UX release 9: we had to move our development tree to such a host.
As mentioned above, the sets of source files were shared through NFS between a Linux/Intel, a Digital Unix and a Linux/Alpha platform. Occasionally, Linux/Alpha lost access to a file that had been updated recently by another NFS client: the error message “Stale NFS file handle” was displayed. Unmounting/remounting the NFS file system usually cured such problems.
During kernel rebuilds, the compilation of at least one file failed because of lack of memory: in the makefiles for kernel rebuild, the gcc compiler is called with the -pipe option, which speeds up the build but is not safe when compiling large source files. We wrote a small shell script which redoes a compilation without the -pipe option. (This problem was fixed by a subsequent kernel release: tcpip.c was split into several parts...)
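The retry script mentioned in the last item is not reproduced in this report. A minimal sketch of the idea, assuming the compiler command and its arguments are passed as parameters, could look like this (the function name is hypothetical):

```sh
# Run the command given as arguments; if it fails, run it once more with
# every "-pipe" argument removed.
retry_without_pipe() {
    "$@" && return 0
    # Rotate through the original arguments once, re-appending each one
    # unless it is "-pipe". After $n steps the list is back in its
    # original order, minus the dropped option.
    n=$#
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        [ "$arg" = "-pipe" ] || set -- "$@" "$arg"
        n=$((n - 1))
    done
    "$@"
}
```

A call such as `retry_without_pipe gcc -pipe -O2 -c file.c` would first try the command as given and, on failure, fall back to `gcc -O2 -c file.c`, which makes gcc use temporary files between compilation stages instead of pipes.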

4.2. TenDRA installation

We first installed the TenDRA technology on the Linux/i386 platform, from the April 1995 snapshot. Later, we installed the November 1995 release, the first to include support for the Dec/Alpha machine, in order to start work on the Linux/Alpha platform. However, because this snapshot was not upward compatible with the previous TenDRA release, we also had to install it on the Linux/i386 platform. We did not upgrade to the February 1996 snapshot, though it is compatible with the previous one. We only used it in a few cases when we had a bug in a command and wanted to make sure that it was not due to a problem already fixed in TenDRA.

4.2.1. TenDRA installation on Intel/i386

The TenDRA snapshot from April 1995, based on TDF 3.1, included support for the Linux/i386 platform. So, the installation on our machine was straightforward. We only had to recompile the tcc driver, and to modify some environment and startup files to fine tune the level of checking.

When we started to work on the second platform, we had to install the November 1995 snapshot of the TenDRA technology (see below). This was DRA's first snapshot based on TDF 4.0, and it included significant changes to the installation procedure. We had some difficulty installing this snapshot, due to a few bugs in the new installation procedures, but once installed the technology appeared to work well.

4.2.2. TenDRA installation on DEC/Alpha

The TenDRA snapshot from November 1995 was the first snapshot with support for the Dec/Alpha platform, but was not upward compatible with the one installed on the Intel platform (TDF 4.0 versus TDF 3.1). So, the TenDRA snapshot from November 1995 was installed on both the Intel/i386 and the Dec/Alpha platforms.

DRA provide support for the Digital Unix/Alpha platform, not for the Linux/Alpha one. However, we benefited from the compatibility between Digital Unix and Linux to solve this problem. We made three different installations of the TenDRA technology for Alpha, of which the second was fully operational for Linux/Alpha:

First installation on native Digital Unix/Alpha.
Second installation, still on Digital Unix, but for cross-development for Linux/Alpha (termed “lin_alpha_cross”).
Third installation on Linux/Alpha (termed “lin_alpha”).

The first installation was straightforward and worked very well. The main purpose was to ensure that the TenDRA technology worked correctly on Dec/Alpha.

For the second installation, we created a new target platform termed “lin_alpha_cross”. Most of the “lin_alpha_cross” files are shared with Digital Unix, using symbolic links to directories or files, since only a few files differ between the two targets. The main purpose of these changes was to use Linux/Alpha system header files when compiling, instead of the Digital Unix system header files, and Linux/Alpha libraries and startup files when link-editing. For example, we changed three files inside the <target_platform>/private/env directory named default, system and tcc_diag. For the same reason, we created a specific “lin_alpha_cross” subdirectory in lib/system to hold some replacement system header files when cross-compiling with the -Ysystem option (i.e. in DRA-NAT mode). The target dependent directories and files used when (cross-)building APIs for Linux/Alpha, e.g. located under the src/apis/libs directory, were also made specific.
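The sharing scheme described above, in which a new target is mostly symbolic links to an existing one with a few private replacements, can be sketched as follows. The function name and its argument convention are hypothetical; the actual TenDRA directory layout involves more than a single directory level.

```sh
# Create target directory $2 whose entries are symbolic links to the
# entries of $1, except for the names listed after $2, which are copied
# so that they can be edited independently of the original target.
# $1 should be an absolute path so that the links resolve from anywhere.
make_cross_target() {
    src=$1
    dst=$2
    shift 2
    mkdir -p "$dst"
    for entry in "$src"/*; do
        [ -e "$entry" ] || continue
        name=${entry##*/}
        private=no
        for p in "$@"; do
            if [ "$p" = "$name" ]; then private=yes; fi
        done
        if [ "$private" = yes ]; then
            cp -R "$entry" "$dst/$name"    # private copy, to be modified
        else
            ln -s "$entry" "$dst/$name"    # shared with the original target
        fi
    done
}
```

Applied to an environment directory, the private names would be the three files mentioned above: `default`, `system` and `tcc_diag`.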

Using this installation, we could cross-compile and cross-link on Digital Unix for Linux, without any problem. The binary compatibility between Linux/Alpha and Digital Unix was thus a key factor of success.

The third TenDRA installation for Linux/Alpha was readily derived from the previous one. We benefited again from the binary compatibility with Digital Unix: we ran the TenDRA compilation chain, built for Digital Unix, on top of Linux/Alpha, without the necessity to port it or recompile it. However, to do so, we had to copy and install the shared library tools of Digital Unix on Linux/Alpha because TenDRA uses shared libraries, for which there is currently no support in Linux/Alpha. The Digital Unix shared libraries mechanism works fine under Linux! This trick could have been avoided if we had re-linked the TenDRA tools under Digital Unix using its statically-linked libraries.

In order to use the Linux/Alpha native assembler and link-editor instead of those from Digital Unix, we wrote a front-end shell script to the TenDRA installer (trans). This shell script calls the actual trans tool with an option to output a source assembly file instead of the “binary assembly” files used by the Digital Unix as1 tool. Similarly, we wrote a front-end shell script which emulates the call made by tcc to as1 by a call to the Linux as tool. We give below the changes to the settings in <lin_alpha>/private/env/default for the third installation:

+TRANS "/..../linux/1.3.45/alpha/private/bin/trans.sh"
+AS1   "/..../linux/1.3.45/alpha/private/bin/as1.sh"
+AS    "/usr/bin/as -nocpp" # seems unused
+LD    "/usr/bin/ld -G8 -O1"
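The front-end scripts themselves are not listed in this report. The general shape of such a wrapper might be as follows. The option that makes trans emit assembly source is not named here, so the sketch takes it from a variable (`TRANS_ASM_OPT`, a hypothetical name) rather than guessing it, and `REAL_TRANS` is likewise a placeholder for the path of the actual tool.

```sh
# Front end for the real translator: forward every argument unchanged,
# preceded by the one extra option that selects assembly source output.
# Both variables must be set by the caller; the names are illustrative.
trans_frontend() {
    : "${REAL_TRANS:?set REAL_TRANS to the path of the real trans tool}"
    : "${TRANS_ASM_OPT:?set TRANS_ASM_OPT to the assembly-output option}"
    "$REAL_TRANS" "$TRANS_ASM_OPT" "$@"
}
```

Because tcc is pointed at the wrapper through the environment file, the driver itself does not need to be modified or rebuilt.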

However, despite these modifications, the port of the TenDRA installer to Linux/Alpha could not be completed. In some cases, the TenDRA installer appeared to generate assembly instructions that are not recognized by the Linux/Alpha assembler. For example, the following lines could not be assembled properly by Linux/Alpha:

.extern __ctype_ 8
Error: Rest of line ignored. First ignored character is '8'
stq $fp, 8($sp)
Warning: Illegal operands
bis $17,$17,$fp
Warning: Illegal operands
.frame $fp, 360, $26, 0
Error: bad absolute expression; zero assumed

We now understand it is not surprising that we were unable to use the TenDRA installer for Digital Unix/Alpha on Linux/Alpha. ANDF installer output needs to be tailored for different target operating systems according to the assembler and/or link editor interfaces supported by the target operating system. In our case we attempted to use the assembler interface, and the errors and warnings above are examples where this interface differs between Digital Unix/Alpha and Linux/Alpha. Debugger support and even some details of the procedure calling conventions may also need to be taken into account when tailoring an ANDF installer to a different operating system.

Since we had already set up a cross-development TenDRA environment for Linux/Alpha, hosted by Digital Unix, we continued to use it and discontinued use of the “lin_alpha” installation. We actually installed TenDRA on an NFS file server (used by the Linux/i386, the Digital Unix and the Linux/Alpha platforms).

4.3. Build environment with TenDRA

4.3.1. Definition of the set of Linux commands

The set of commands was not defined once and for all. It was defined on the Intel/i386 platform, during the first part of the project, in two steps, level 1 and level 2, as described in §2. Each step was performed as an incremental process. Each time new commands were selected, a whole cycle of API definition (§4.4), command ANDFization (§4.5), API installation (§4.6) and command installation and validation (§4.7) was performed.

During the first phase of the project, a number of commands were compiled in DRA-DRA mode on the UnixWare platform (see Validation of TenDRA Capability to Implement a UNIX-like Operating System). We started by locating these commands in the packages from the Slackware Linux source distribution for Intel/ix86. Since these commands were among the simplest ones to ANDFize on UnixWare, they were good candidates to start with. Provided that a full binary installation was made on our Linux/i386 platform, a command could be located in a package by searching for its name in the list of packages under the /var/adm/packages directory. This directory contains one text file per installed package, which records the names of the commands it contains (actually the relative installation path from / is provided for each command). Once the name of the package which holds a command had been found, we just had to connect to the ftp site, find the directory of the same name in the Slackware source distribution, and download the files under this directory.
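The package lookup can be sketched as a small shell function. It assumes that each file under the package directory lists one installation path per line, which is how the description above reads; the function name is hypothetical and the command name is treated as a plain word.

```sh
# Print the name of every package list under directory $2 (default
# /var/adm/packages) that records a path whose last component is the
# command name $1.
find_package() {
    cmd=$1
    dir=${2:-/var/adm/packages}
    for list in "$dir"/*; do
        [ -f "$list" ] || continue
        # Match the name at the end of a line, either alone or after a "/".
        if grep -q -E "(^|/)$cmd\$" "$list"; then
            echo "${list##*/}"
        fi
    done
}
```

Anchoring the match on a preceding `/` keeps a search for `awk` from also reporting a package that only contains `gawk`.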

Among the 103 Unixware commands ANDFized during the previous phase of this project, we found 59 commands with similar names in Linux, scattered among 11 packages: bc, bin, bsdgames, diff, find, grep, gzip, sh_utils, txtutils and util. Moreover, these packages also contained some additional commands which appeared to be good candidates for easy ANDFization. However, we excluded a few of them which were compiled but not delivered in the binary package, or which seemed too dependent on the target platform (e.g. the fdformat command which formats floppy disks). We also selected four additional packages, tar, cpio, xlock and xgames, in order to complete the level 1 set of commands.

The definition of the level 2 set of commands was more difficult than for level 1, because we had to reject a number of commands, for various reasons discussed below.

First, we tried to include more X11 commands in the level 2 set, since we had successfully ANDFized a few of them for level 1. However, this proved harder than expected: the sources for these commands had not been packaged by Slackware, but were provided inside a huge collection of sources named XFree86 (from the XFree86 Project, Inc.), itself derived from the X Consortium X11R6 code. An attempt to perform the first step of the XFree86 build on Linux/i386, which consisted of producing Makefiles from Imakefiles, failed. With some rewriting, we managed to produce a Makefile for a simple XFree86 command, xclock, and successfully compiled it. However, we did not spend much time on understanding the XFree86 installation procedures and, since rewriting every Makefile would have taken too much time per command, we set XFree86 aside.

Then we excluded one package, groff, because it was mostly coded in C++. We also found that some native Linux header files, used in several packages, offer BSD compatibility but in a way that could not be straightforwardly adapted to TenDRA. This issue is discussed in §4.4.

We have included in the level 2 set of commands some commands which represent quite large amounts of source code: m4, elvis (a vi clone), joe (another editor), less, perl and elm. The ultimate step of this experiment would have been to build “monsters” such as bash and emacs.

Finally, we evaluated the number of commands distributed with a Linux system, and we found about 700 executable binary files in the /usr/bin, /bin, /usr/X11/bin, /usr/openwin/bin, /usr/games, /sbin and /usr/sbin directories. We examined some of these commands in the Slackware packages, and concluded that there could be further candidates for ANDFization among them. However, time constraints prevented us from including such commands. Also, we did not port to the 2nd platform, nor validate, all the commands operational on the 1st platform. Here again, time is the main reason why we did not complete the port. We estimate that an additional 1.5 engineer-months would have been sufficient to complete the task, apart from some commands which may have been difficult to port.

The level1 and level2 sets together include 236 commands: these commands were installed and validated on Linux/i386 (cf. section 4.7 on this point).

In the list below, the commands in bold (149) have been installed and validated on both Linux/i386 and Linux/Alpha, as opposed to the commands listed in plain characters, which are available on the first platform only.

The few (13) commands listed in italic were ported to both platforms, but their validation failed, or was not completed, on the second platform.

Finally, in the “p/m” statements, p is the best-case number of commands we ported, while m is the maximum number of them with respect to a given Slackware Linux package.

Package	Total	Commands ported
aaa_base	9/9	fromdos, funzip, mtools, todos, unzip, unzipsfx, zip, zipnote, zipsplit.
ash	1/1	ash.
bc	1/1	bc.
bin	48/56	at, bban, bpe, chgrp, chmod, chown, compress, cp, crond, crontab, ctags, dd, df, dircolors, du, ed, elvis, elvprsv, elvrec, file, fiz, fmt, `gawk`, ginstall, indent, ln, ls, man, mkdir, mkfifo, mknod, mv, patch, ref, rm, rmdir, sed, shar, sysvbanner, time, touch, tput, unarj, unshar, uudecode, uuencode, which, zoo.
bsdgames	13/36	bcd, caesar, factor, fish, monop, morse, number, paranoia, ppt, primes, rain, worm, worms.
byacc	1/1	byacc.
cpio	2/2	`cpio, mt-GNU`.
diff	4/4	cmp, diff, diff3, sdiff.
elm	9/9	`answer`, elm, `elmalias`, fastmail, filter, frm, `newalias, newmail`, readmsg.
find	6/6	bigram, code, find, frcode, locate, xargs.
flex	1/1	flex.
getty	2/2	getty, uugetty.
grep	1/1	grep.
gzip	1/1	gzip.
ispell	6/6	buildash, icombine, ijoin, ispell, sq, unsq.
joe	2/2	joe, termidx.
less	2/2	less, lesskey.
m4	2/2	ansi2knr, m4.
perl	4/4	`a2p, perl4.036, sperl4.036, tperl4.036`.
ps	11/12	free, fuser, killall, ps, pstree, psupdate, tload, uptime, vmstat, w.procps, w.bassman.
rcs	8/8	ci, co, ident, merge, rcs, rcsdiff, rcsmerge, rlog.
sh_utils	24/24	basename, date, dirname, echo, env, expr, id, logname, nice, pathchk, printenv, printf, pwd, sleep, stty, su, tee, test, tty, uname, users, who, whoami, yes.
sudo	2/2	sudo.bin, visudo.
tar	3/3	tar, `rmt`, testpad.
tcpip	7/31 (from the net-tools subset)	arp, ifconfig, plipconfig, rarp, route, netstat, slattach.
txtutils	22/22	cat, cksum, comm, csplit, cut, expand, fold, head, join, nl, od, paste, pr, sort, split, sum, tac, tail, tr, unexpand, uniq, wc.
util	35/57	agetty, arch, banner, chfn, chroot, chsh, col, colcrt, colrm, column, ddate, frag, hexdump, hostname, ipcrm, ipcs, last, login, mesg, more, newgrp, passwd, rdev, readprofile, renice, rev, setsid, sln, strings, swapon, ul, vipw, wall, `zdump`, zic.
xgames	8/13	maze, xcolormap, spider, xtetris, xlander, xminesweep, xroach, xvier.
xlock	1/1	xlock.

4.3.2. Setting up the build environment

The environments for the NAT-NAT, DRA-NAT and DRA-DRA builds were set up along lines similar to those used during the Unixware port:

One single reference source tree, then a dedicated work tree per (build, target platform). For the 1st target (Linux/i386), each work tree holds symbolic links to the source tree, while binaries are built inside a work tree as plain files. In addition, a procedure is used to replace a link to the source tree by a link to a patch tree when a source file has to be modified during the port to TenDRA. This is very similar to the environment we had on Unixware. The major difference is that each package has its own set of source/work/patch file trees. This is more modular but requires more manipulations.
For the 2nd target (Linux/Alpha): we usually created only a work tree for the DRA-DRA build. It initially contained source files only, which are symbolic links to their equivalent in the DRA-DRA/i386 work tree. By “source files” we mean here the Makefiles and the ANDF - .j - files having been generated from the original .c files by the TenDRA producer, during the DRA-DRA build for Linux/i386.
A shell script used as a pseudo cc (e.g. pseudo gcc) during the DRA-NAT and DRA-DRA builds. This avoids the necessity to modify most of the original makefiles when building the commands. The pseudo cc used during the build for the 2nd platform substitutes the (usually unique) input_file.c by input_file.j.
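The file-name substitution performed by the pseudo cc for the 2nd platform can be illustrated as follows. The original was a shell script whose text is not reproduced in this report, so the C function below is only a sketch of the logic; the name map_source_to_capsule and its error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy one compiler argument into `out`, replacing a
 * trailing ".c" with ".j" so that the ANDF capsule is used instead of the
 * C source. Options (arguments starting with '-') are copied unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int map_source_to_capsule(const char *arg, char *out, size_t outsz)
{
    size_t len;

    assert(arg != NULL && out != NULL);
    len = strlen(arg);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, arg, len + 1);

    /* Rewrite only plain file names that end in ".c". */
    if (len > 2 && arg[0] != '-' && strcmp(arg + len - 2, ".c") == 0)
        out[len - 1] = 'j';
    return 0;
}
```

With this sketch, input_file.c becomes input_file.j, while an option such as -c or an object file such as lib.o is passed through untouched.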

One specific feature of the sources and build procedures of the Linux commands is that they have often been designed to support a variety of target platforms and UNIX variants at source level. Thus, when building a command for the first time, there is usually a preliminary self-configuration step which examines the system header files, and produces a local header file (or a customized Makefile) which summarizes the target system peculiarities by means of #define (or -D) statements. We ran such self-configuration scripts before creating the NAT-NAT, DRA-NAT and DRA-DRA work trees: this assumes that our second platform for porting (Linux/Alpha) provides APIs similar to those of the 1st one (Linux/i386). Eventually, we had to revise the settings chosen by the self-configuration.

4.3.3. NAT-NAT/i386 and DRA-NAT/i386 build problems

These two builds of the commands were only performed on the Linux/i386 platform, as a sanity check and cleanup of the source code.

We faced only one problem during the NAT-NAT/i386 build of a few commands.

Some header files (e.g. linux/autoconf.h) were not found when attempting to compile some administrative commands. To gain access to such headers, the preliminary step of a kernel rebuild can be done; alternatively, one could manually establish the proper symbolic links for /usr/include/linux and /usr/include/asm: they should point to their equivalents inside the /usr/src/linux/include directory.

We faced a limited number of problems during the DRA-NAT/i386 builds of the commands. We list these problems below:

The link-edit of some commands failed because one symbol was undefined: _alloca. In the native compiler (gcc), _alloca is implemented as a built-in function. In the TenDRA compiler, this can also be the case, provided that the header file alloca.h is explicitly included. So we modified the relevant source files to include this header file.
The source code for some commands appeared to use, through the inclusion of a system header file or under #ifdef i386 conditional instruction, some assembly code. The related commands were thus excluded from our set of commands, except for a few of them for which we found a C variant to the assembly code.
Re-declaration of an array, for which the dimension was computed using sizeof. The following code sums up the problem:
```
extern int lnum[sizeof(short)];
int lnum[sizeof(short)]; /* bis */
```

We sent a Change Request, array_sizeof(262), concerning this problem, which applied to the apr-95 and nov-95 TenDRA releases. It has now been fixed.

Name conflict between a function and its arguments. The following code sums up the problem:
```
char *fields(fields)
char *fields;
{ return fields; }
```

We sent a Change Request, func_var(262), concerning this problem, which applies to the apr-95 and nov-95 TenDRA releases. It has now been fixed.

Use of custom options of the native compiler (gcc), e.g. -fpcc-struct-return

This option was used in the Makefile for the getty package. The gcc man page says that this option provides intercallability with modules (e.g. library modules) compiled with a pcc compiler. We concluded that this was not relevant when compiling for a Linux target platform, since gcc is used to compile the libraries, and we ignored it. Similarly, we ignored, i.e. filtered out in our pseudo-gcc for DRA-NAT/DRA-DRA builds, many other gcc options such as -fomit-frame-pointer, -pipe, -g while we adapted to TenDRA style some others, such as -static (for gcc) to -Wl,-static (for tcc).
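The option handling described above amounts to a small translation table. The pseudo-gcc itself was a shell script that is not shown in this report; the C sketch below only illustrates the filtering rule, using the options named in the text. The function names translate_option and filter_args are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* gcc options that the text says were filtered out for tcc. */
static const char *const dropped_options[] = {
    "-fomit-frame-pointer", "-pipe", "-g"
};

/*
 * Hypothetical helper: returns NULL when the option is dropped, a rewritten
 * string when it is adapted to tcc style, or the argument itself otherwise.
 */
static const char *translate_option(const char *arg)
{
    size_t i;

    assert(arg != NULL);
    for (i = 0; i < sizeof dropped_options / sizeof dropped_options[0]; i++)
        if (strcmp(arg, dropped_options[i]) == 0)
            return NULL;
    if (strcmp(arg, "-static") == 0)
        return "-Wl,-static";
    return arg;
}

/* Filters an argument vector in place and returns the new argument count. */
static int filter_args(int argc, const char **argv)
{
    int in, out = 0;

    for (in = 0; in < argc; in++) {
        const char *translated = translate_option(argv[in]);
        if (translated != NULL)
            argv[out++] = translated;
    }
    return out;
}
```

Applied to the arguments -O -pipe -static -g foo.c, this sketch keeps -O and foo.c, drops -pipe and -g, and rewrites -static to -Wl,-static.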

4.4. Definition of the API for the commands

We started the experiment with an xpg3 API, and decided to put all other symbols we needed in an extension API. However, after a few compilations of Linux commands, it became clear that most of the symbols we were adding to the extension API were in fact part of some other standard APIs, such as svid3.

So, we redefined our base API to be a merge between the xpg3, svid3, gcc and bsd_extn APIs delivered with TenDRA, limiting the extension API to symbols specific to the Linux commands interface. In fact, some of the symbols in the extension API are defined in the standard cose API, but since this API is very partially supported by Linux, and sometimes conflicts with definitions provided in other APIs, it was not worth including it in the base API.

For the level 2 set of commands, we downloaded some packages using a BSD-like interface, and we tried to include the symbol definitions for these commands in our extension API. However, this appeared to be very difficult, since we found that the Linux implementation of some BSD interfaces redefines symbols from the POSIX API, in an incompatible way. This is reflected in the Linux header files by conditional definitions, selected with the _BSD_SOURCE macro for example, or by replacement header files, such as bsd/signal.h instead of signal.h. The incompatible definitions we found were for the jmp_buf type, the setjmp(), getpgrp(), wait(), waitpid(), wait3() and wait4() functions, and finally the signal() function redefined as bsd_signal().

This problem could have been resolved by removing from our base API the conflicting symbols, and creating a conflict_posix and a conflict_bsd extension APIs with these symbols. The compilation of the commands based on a BSD-like API would have used the conflict_bsd API, in addition to the base and regular extension APIs, and would have been link-edited with the libbsd library provided by Linux. Since this would have taken a lot of time, we preferred not to modify our API and we set aside these commands, unless we found a simple work-around: selecting at build-time, or recoding to, a POSIX adherence for them (refer to next section).

Finally, in order to compile some X11 commands, a separate API including the x5_lib, x5_t, x5_mu, x5_aw and x5_mit standard APIs, has been created. Since Linux is based on X11R6, an extension API has also been created, which includes the few symbols we had to define for the X11 commands we built.

We found one inconsistency between the Linux header files and the standard API provided with TenDRA for the <sys/socket.h> header file, defined in the bsd_extn API: we had to change almost every use of caddr_t to struct sockaddr *. We also found a few inconsistencies between the Linux/i386 and Linux/Alpha header files, which have been resolved by some corrections to the Linux native header files, in the API definition, and in the source code for one command (more).

4.5. ANDFization of the commands

We encountered different kinds of problems when compiling with TenDRA the set of commands on the Linux/i386 platform. Among these problems, only one was related to a bug in the TenDRA technology, the others were either related to ANDF constraints, or to more general portability issues.

4.5.1. Dealing with ANDF constraints

We list below problems we encountered while ANDFizing Linux commands, which are related to the use of the TenDRA technology as a replacement to a classic compiler. We start with the only bug found in TenDRA during this process, then we roughly follow the order in which the various issues were encountered.

Redefinition of an API token as a macro

In the code below, alarm is defined as a macro, but it is also a token in our API:
```
#include <unistd.h> /* for alarm() */
extern int debug() ;
#define D_RUN 1
#define alarm(d) alarm(d); debug(D_RUN, "alarm set: %s:%u",\
    __FILE__, __LINE__)
long xx() { return alarm((long)5); }
```
This code is indeed illegal, but tdfc entered an infinite loop. The problem was reported to DRA as loop_tdfc_alarm(276), and has now been fixed.
Added missing startup macros

When the TenDRA compiler (tcc) was used, a number of startup flags, defined with the native compiler (gcc), were missing. The linux, __linux__, unix and __X11_P_HEADERS flags, plus a number of flags defined in the native features.h header file, such as _POSIX_SOURCE, were added in a startup file for tcc.
Added missing function prototypes and fixed type mismatches
We used a tcc option to warn about missing function prototypes, and we fixed them by either including the appropriate header files or adding their prototype for locally defined functions. We added casts on some calls to library functions. Then, every remaining undeclared symbol was added to the extension API. Note that more than half of the changes we made in the source code for Linux commands consisted of adding such prototypes.
Resolved one conflict with an API symbol

The function mkdir(), local to the file mtools, has been renamed to avoid a conflict with the API symbol defined in <sys/stat.h>.
Illegal use of target-dependent condition

In the code below, INT_MAX is a target dependent token, which cannot be used to conditionally define a preprocessor macro:
```
#if (INT_MAX <= 65535)
#define longdiff(a, b) /* (definition 1 for the macro) */
#else
#define longdiff(a, b) /* (definition 2 for the macro) */
#endif
```
We fixed it by replacing the macro definition with a static function:
```
static int longdiff(time_t a, time_t b) {
#if (INT_MAX <= 65535)
    /* ... (definition 1 for the function) */
#else
    /* ... (definition 2 for the function) */
#endif
}
```
This constraint arises from the way ANDF is used to achieve portability between targets which may have different values for INT_MAX. The constraint is that a target-dependent #if is permitted only where a statement is permissible, and both alternatives must be legal statement lists.

This constraint unfortunately prevents target-dependent macro definitions in the style shown above. DRA is currently considering whether the constraint may be eased in a subsequent version of TenDRA to permit certain well-formed cases such as this.
POSIX.1 or SVID interfaces versus BSD interfaces

Three commands, time and crontab from the bin package, and ash (a simple shell), were configured to use some BSD interfaces which had not been included in our API, as discussed in §4.4.

For the time command, we found that the support for POSIX interfaces was provided in the source code, so we used it.

Similarly, prior to building ash, we modified the related configuration file and Makefile, in order to select svid3-like interfaces instead of the default bsd ones.

For the crontab command, we fixed the problem by removing in the source code some (simple) calls to the BSD wait4() function, and by using the XPG3 waitpid() function instead.
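The kind of replacement made in crontab can be sketched as below. This is not the crontab source: the helper names are hypothetical, and the sketch assumes, as the word "simple" above suggests, that the original wait4() calls did not request resource usage (a NULL last argument), in which case waitpid() with the same first three arguments behaves identically.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait for one child and return its exit code,
 * or -1 if the wait failed or the child did not exit normally.
 */
static int wait_for_child_exit(pid_t pid)
{
    int status;

    assert(pid > 0);
    /* BSD form being replaced:  wait4(pid, &status, 0, NULL)  */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Forks a child that exits immediately with `code`, then waits for it. */
static int spawn_and_wait(int code)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: terminate at once with the given code */
    return wait_for_child_exit(pid);
}
```

A call such as spawn_and_wait(7) returns 7 in the parent, using only interfaces that are part of XPG3/POSIX.1.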
For a few commands which use the curses interface, e.g. bpe, we chose the svid3 variant instead of the BSD one (they are both supported by Linux). Makefiles for building these commands with TenDRA have been changed to use the libncurses library instead of the libcurses library at link-edit time. Note that the sources for the elvis editor (from the bin package) embed a small custom version of the curses interface.
For some commands, the initial self-configuration step performed prior to entering the actual build defines the path for another command, because the latter is called by the first one by means of exec() or system(). While Linux provides the <paths.h> header file for this purpose, we found some files which do not include this header file, and others which need to call a command for which there is no path definition in the regular header. An example of such a situation is elm, which calls an editor (e.g. vi). When we detected such situations, we either modified the source code to include and use <paths.h>, or we added a definition inside the alternate <paths.h> specified in our extension API.
Pointer/Integer conversion

The TenDRA compiler can be configured to issue a warning on every pointer/integer conversion. This is done with the pragma instruction:

#pragma TenDRA conversion analysis (int-pointer) warning

However, due to the very large number of occurrences of these warnings, we had to cancel this mode, and decided to postpone their analysis until after the validation step.

For example, we encountered uses of -1 (minus one) to give a special meaning to a pointer value, while only 0 (NULL) is accepted for this purpose. (Note: 64-bit issues are discussed later in this section.)
Underspecified type in svid3 API

We found one command which makes the assumption that the daddr_t type, defined in the svid3 API, is an arithmetic type. The source code casts a daddr_t value into an int, while daddr_t is defined in the API as:
```
+TYPE daddr_t;
```
We fixed this problem by stating that daddr_t is indeed an arithmetic type, which is correct for both Linux/i386 and Linux/Alpha. We initially modified the reference svid3 API, but later we did it more cleanly, moving the daddr_t definition to our extension API prior to changing it to:
```
+TYPE (int) daddr_t;
```
Also, prior to fixing this in the API, we found that casting an integer value to a daddr_t type was not rejected by the TenDRA compiler, while it is obviously illegal. A bug report has been sent to DRA, and this has now been fixed.
Recoding of a source file dealing with target platform byte ordering issues.

A source file used a local BYTEORDER macro, set up during the initial self-configuration step of the build of the command, to support different byte ordering. However, Linux already provides for this purpose a __BYTE_ORDER macro, defined in the <bytesex.h> header file. So, we added the __BYTE_ORDER symbol to our API, and replaced all the occurrences of the BYTEORDER macro by references to the __BYTE_ORDER API macro. We also had to rewrite some code, because with TenDRA some instructions are illegal after a target dependent condition.
The termio and termios interfaces are both provided with Linux, and share the same set of macros to define indexes in the c_cc array from either the termio or termios structures. On Linux/i386, these indexes are the same for the two structures, while on Linux/Alpha they differ. TenDRA provides a way to support different variants of the same object, using version numbers, and this should have solved our problem. However, since we had never used this feature before, we did not spend time to see how we could use it in our API. Instead, we made a temporary fix, consisting of renaming the constants in the termio interface. For two of the commands we ported, more and ispell, which use the termio interface, we changed their sources to reference the new macros.

4.5.2. Undocumented dependencies to the OS / the underlying hardware

Some commands are platform dependent, and are not easily (sometimes: not at all) portable from one platform to another. However, the Linux/i386 and Linux/Alpha OS's are very similar; furthermore, some hardware architectures built around the DEC Alpha chip are not much different from the Intel-based PC: for example, our Linux/Alpha platform includes ISA adapters for graphics and Ethernet. In such a favorable situation, the Linux/i386 ifconfig command (from the tcpip/net-tools subset), which displays hardware information on the network interfaces such as their (ISA) “Base address”, could probably have been easily ported to Linux/Alpha (/AXP pci). Also, the perl command, which includes optional support for undocumented system calls, may or may not be portable between two Linux platforms, depending on the system calls they implement.

On the other hand, changing the format for binary files, e.g. switching from a classic Linux a.out format to the Digital Unix “Extended COFF”, may require changes in some common commands such as strings or file. When using a classic compiler, some (or even all in a favorable case) of these changes may be hidden inside the system header files, e.g. <a.out.h>, but the TenDRA compilation chain, when used in DRA-DRA mode, is often more rigid.

4.5.3. Upgrade to TenDRA 4.0

When we had ANDFized the whole set of commands, we upgraded to TenDRA 4.0 in order to work on the Linux/Alpha platform (see §4.2.2). However, we had to ANDFize again the set of commands with the new TenDRA version, since it was not upward compatible with the previous one. In fact, we only re-ANDFized the commands we tried to install on the Dec/Alpha platform, at the time we needed them. We did not encounter any problem when doing this.

4.5.4. Holes in source code portability (64-bit vs 32-bit issues)

During the installation and validation of these commands on the second platform, we found a number of bugs related to portability problems, which are out of the scope of TenDRA. All these bugs were due to code which assumes a 32-bit platform and breaks on 64-bit platforms. Some of the bugs we found were already fixed in early Linux/Alpha releases, such as the Blade release; others were still there. We fixed the source of the commands, re-ANDFized them, and installed and validated these commands again on the two platforms. We give below the portability issues we encountered:

Wrong int <-> pointer conversion

On Linux/Alpha (and Digital Unix/Alpha), a pointer type is 8 bytes wide, so it cannot fit in an int type, which is only 4 bytes wide. Fortunately, the long type on Linux/Alpha is, as usual, as large as a pointer, and thus can be used as a replacement for an int each time an explicit pointer<->integer conversion is used. This is a common type of portability fix we had to make in the source code for Linux commands, subsequent to encountering a Linux/Alpha-only problem at validation time.
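The size argument behind this fix can be checked mechanically. The sketch below is an illustration only: it uses the C99 type intptr_t, which was not available when this work was done and is an assumption of the example, in place of the long used in the actual fixes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when an integer type of the given size can hold any object pointer. */
static int can_hold_pointer(size_t integer_size)
{
    return integer_size >= sizeof(void *);
}

/*
 * Round-trips a pointer through an integer wide enough to hold it.
 * With a 4-byte int on a platform with 8-byte pointers, the same
 * round trip would lose the upper half of the address.
 */
static void *round_trip(void *p)
{
    intptr_t as_integer = (intptr_t)p;

    assert(can_hold_pointer(sizeof as_integer));
    return (void *)as_integer;
}
```

On a platform such as Linux/Alpha, can_hold_pointer(sizeof(long)) is true while can_hold_pointer(sizeof(int)) is false, which is exactly why int had to be replaced.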
Incorrect assumptions on sizes of int, long and size_t

Although in many cases int and long types are equivalent, we give below three examples of code we found where it makes a difference:
```
/* #1 */ { int i; printf("i value is: %ld\n", i); }
/* #2 */ extern char *malloc(int);
/* #3 */ { long l ; printf("%08lx", l); }
```
All three work perfectly on Linux/i386, while they cause, or could cause, damage under Linux/Alpha.

In the first two cases the called function, printf or malloc, expects an argument which is 8 bytes wide (a long, or a size_t), while only the 4 bytes of an int are actually passed. Note that the standard prototype for malloc, as declared in <stdlib.h>, is void *malloc(size_t), and that size_t is as wide as long on Linux/Alpha. We fixed the error in such printf statements by casting the argument to long, and the error on malloc by replacing its local and wrong declaration with the inclusion of the <stdlib.h> header file.

In the third case, the instruction was used to print a fixed number of digits. However, a long, 8-bytes on an Alpha platform, may hold a value that prints up to 16 digits, thus putting unexpected digits in the output. In the code where we found the problem, the fix was to truncate the value to a 4-byte value.
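The three fixes can be sketched as follows. This is an illustration of the approach described above, not the patched command sources; the function names are hypothetical, and snprintf (C99) is used only so that the results can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   /* fix #2: the standard malloc declaration replaces the local one */
#include <string.h>

/* Fix #1: cast the int argument so that it matches the %ld conversion. */
static int format_int_as_long(char *buf, size_t n, int i)
{
    assert(buf != NULL);
    return snprintf(buf, n, "i value is: %ld\n", (long)i);
}

/* Fix #2 in use: malloc now receives a size_t, whatever its width. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Fix #3: keep only the low 32 bits so that %08lx prints exactly 8 digits. */
static int format_low32_hex(char *buf, size_t n, long l)
{
    unsigned long low = (unsigned long)l & 0xffffffffUL;

    assert(buf != NULL);
    return snprintf(buf, n, "%08lx", low);
}
```

For instance, format_low32_hex applied to -1L yields ffffffff whether long is 4 or 8 bytes wide, whereas the unmodified code would print sixteen f digits on Linux/Alpha.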

4.6. Installation of the API for the commands

4.6.1. Installation of the API on Linux/i386

The API is made of two parts, a base API, which is a merge between xpg3, svid3, gcc and bsd_extn APIs, and an extension API, which completes the interface required for the commands (see §4.4).

The base API was installed on the Linux/i386 platform, without any problem. However, we left some tokens undefined when some parts of the API were not part of the actual Linux API. Then, the extension API was installed, as we extended it to cover more and more commands. These installations required a few patches to the system header files, some of which were already provided with the TenDRA snapshot. These installations went very well, with only a few problems, listed in a following paragraph. Then, when we moved to a new TenDRA snapshot in order to target the Linux/Alpha platform, we re-installed the API, without any problem.

We made a small number of modifications to the API during the port of the commands on the Linux/Alpha platform. However, each modification we made required re-installation of some parts of the API.

4.6.2. Installation of the API on Linux/Alpha

As for the Linux/i386 platform, we had to apply some patches to the system header files in order to install the API. A number of these patches were actually identical to the patches we made on the Linux/i386 header files. So, instead of copying and editing, by hand, these files again, we chose to implement these patches by means of sed scripts, which could be applied to both Linux/i386 and Linux/Alpha header files. Most of these scripts are now common to the two platforms, although a few of them are specific to one. These sed scripts not only facilitate corrections to the system header files, but would also be useful if we need to upgrade from one Linux version to another.

The Linux/Alpha system header files do not differ much from the Linux/i386 ones. However, since the Linux/Alpha port was derived from a more recent Linux/i386 version (Linux 1.3) than the one we used (Linux 1.1), we could not clearly distinguish between the changes which come from standard Linux evolutions and those which have been introduced during the port of Linux to Digital Alpha. One important modification was that some definitions found on Linux/i386 in some <linux/*> or <sys/*> header files have now been moved into <asm/*> header files. We had to take such changes into account when adapting the Linux/i386 modifications to Linux/Alpha.

Finally, we did not port to Linux/Alpha the extension API to the TenDRA x5/* APIs. This would have required installation of X11 on our Linux/Alpha box, which consists of 28 additional floppy images! This part of the API was only required for 9 commands from our set of commands, which we did not install on Linux/Alpha.

4.6.3. API installation problems

We list below the problems we found when building the API on the Linux/i386 or the Linux/Alpha platforms, and the solution we adopted.

A macro, makedev, added to our extension API, was defined in the Linux/i386 sys/sysmacros.h header file. This file contains the lines:
```
#define major(dev) ...
#define minor(dev) ...
#define makedev(major,minor) ...
```
The identifiers major and minor used to name the formal parameters of the makedev macro are also the name of two macros defined in this header file, which we included in our API. The clash on the names is reported as an error by tdfc when building the API. We did not check whether this was a TenDRA bug or not, but we bypassed the problem by using an alternate version of sys/sysmacros.h, in which the formal parameters of the makedev macro have been renamed.
The <bytesex.h> header file contained the following lines:
```
#undef __BYTE_ORDER
#define __BYTE_ORDER 1234
```
The #undef line prevents tcc from finding the definition of the __BYTE_ORDER constant. This is a constraint that applies only when building API token libraries. It is a necessary consequence of using C macro definitions to obtain ANDF token definitions. We bypassed this behaviour by commenting out the #undef line in a replacement header file.
The Linux/Alpha header files do not follow the standard APIs in some cases. For example, the Linux/Alpha <sys/stat.h> header file defines the field st_dev from the stat structure as unsigned int instead of dev_t as defined in XPG/3. Since dev_t is equivalent to unsigned int on Linux/Alpha, we were able to modify the system header file to use the correct type.
The Linux/Alpha header files sometimes use an int type, in places where a long type is used on Linux/i386. In such cases, we decided to patch the Linux/i386 header files to use an int type, since long and int are equivalent on a 32-bit platform.
The reverse situation, where Linux/i386 uses an int and Linux/Alpha uses a long, has also been found. In this case, we preferred to modify the API to accept both types, using the tspec +TYPE (int) ... notation.
On the Linux/i386 platform, we extended the API with some symbols from the <termio.h> header file. The <termio.h> system header file has changed between Linux/i386 and Linux/Alpha, and some definitions were incompatible with the extension API. We eventually found a solution, which involved a fix to the Linux header files.
The Linux system supports two variants of the curses library, one defined by the <curses.h> header file for a BSD API, and the other defined by the <ncurses.h> header file for a svid3 API. We used the latter to build the API, since we do not support the BSD API.
The TenDRA svid3 API defines the constant RLIM_INFINITY, from the sys/resource.h header file, as follows:
```
+CONST int RLIM_INFINITY;
```
However, this constant is used to assign variables of type rlimit_t, which is, on Dec/Alpha, defined as a long, thus 8-bytes wide. So the problem was: while this constant was defined with the value 0x7fffffffffffffffL, it was actually truncated to fit within a (32-bit) int. We fixed this bug by replacing the definition of RLIM_INFINITY by:
```
+CONST rlimit_t RLIM_INFINITY;
```
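The effect of the original int declaration can be reproduced in plain C. In the sketch below, int64_t stands in for the 8-byte long of the Alpha and uint32_t for a 32-bit integer (unsigned, so that the narrowing is well defined in C); these stand-ins and the names EXAMPLE_RLIM_INFINITY, narrowed_to_32_bits and survives_narrowing are assumptions of the example, not part of the API.

```c
#include <assert.h>   /* for the checks mentioned in the note below */
#include <stdint.h>

/* The value quoted above, held in a 64-bit type. */
#define EXAMPLE_RLIM_INFINITY INT64_C(0x7fffffffffffffff)

/* Keeps only the low 32 bits, as happens when the constant is forced into 32 bits. */
static uint32_t narrowed_to_32_bits(int64_t value)
{
    return (uint32_t)value;
}

/* True when the value is unchanged after a trip through 32 bits. */
static int survives_narrowing(int64_t value)
{
    return (int64_t)narrowed_to_32_bits(value) == value;
}
```

Here assert(!survives_narrowing(EXAMPLE_RLIM_INFINITY)) holds: only the low 32 bits (0xffffffff) remain, so a limit compared against the truncated constant would no longer be recognised as "infinite".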

4.7. Installation and validation of the commands

On the Linux/i386 platform, we used the TenDRA compiler to produce the ANDF files, and then translate them into binary executable files, in the same invocation of the compiler. A tcc option was used to preserve the intermediate ANDF files.

On the Linux/Alpha platform, we used the ANDF files produced on the Linux/i386 platform, and translated them into binary executable files. For this platform, we ran the TenDRA compiler on a Dec/Alpha platform as a cross-compiler for the Linux/Alpha platform (see §4.2.2).

In order to validate the commands we built, we used several different methods, depending on the commands.

We found that a very limited number of commands were packaged with rather extensive self-validation tests, which we used to validate those commands.

We tested some other commands interactively (e.g. elvis, bpe, ispell, elm, ...). However, for most commands we had to write small tests. Even a basic test requires several lines of shell script: it took us several weeks to write tests for more than 200 commands and then to run them.

Finally, a small number of commands (10 in all) were not tested: getty/uugetty, sudo.bin/visudo, readprofile, swapon, mt-GNU, rmt, plipconfig and slattach. As none of these commands was actually installed on the 2nd platform, there is no real penalty.

On the Linux/i386 platform, 236 commands were installed, then validated. Conversely, on Linux/Alpha, we installed and validated only a subset of these commands: about 150, as previously mentioned. While we found only a few problems during the validation on Linux/i386, we faced a number of validation failures on Linux/Alpha, thus requiring much more investigation. Some of these problems were due to bugs in the TenDRA technology, and have all been fixed by now. Most of the others have already been discussed in previous sections.

We list below miscellaneous problems, encountered at validation time on either the 1st or the 2nd platforms, which are not really related to TenDRA, nor do they depend on portability issues in the original source code.

4.7.1. Miscellaneous problems encountered at validation

For two commands, sln and stty, we found different behaviour due to an undetected missing symbol in our API. For example, the sln command is written such that, depending on whether the S_ISLINK symbol is defined or not, it generates a symbolic link or a hard link. The missing symbol was defined in our API to fix the problem.
The more command on Linux/Alpha, of which the source code uses the non-POSIX termio interface, waits for 4 input characters before processing them. The reason is that in Linux/Alpha (as in Digital Unix) some of the termio.c_cc[] control characters overlap with some others, depending on the input mode being used (i.e. either the “canonical” mode or the “half-cooked” mode). So such control characters, which are the ones at VMIN and VTIME indexes, must always be updated when switching from one input mode to the other: we changed the source code for the more command accordingly.
We found two bugs in the Linux/Alpha libc library, in the getegid and times functions. The 1st one was already known and fixed in a later release, while the 2nd one was not (we received a fix for it a few days later, from a member of the Linux/Alpha project).
The csplit command, from the txtutils package, defines the memchr() function, also defined in the libc library. The validation of this command failed on Alpha, until we removed this local recoding for memchr() and used the regular library entry point instead.
The environment on Linux/Alpha for using the zic command was not correct, since that command is not available in the Linux/Alpha BLADE_0.3 distribution.
The environment on Linux/Alpha for running the bpe command was not correct, since we had used the (available) ncurses library when link-editing this command: at run-time the related terminfo data base was lacking. We simply had to replicate on Linux/Alpha the Linux/i386 terminfo files to cure this problem.

4.7.2. Recent upgrades of our original source code for Linux commands

We initially found on the net a limited number of patches for the commands in source form; these patches had been made during the port of Linux to Digital Alpha. For example, the following patch to the source code of the col command was found on ftp.azstarnet.com, but it is not an example of interest, since we had already made the same fix pre-emptively during the initial ANDFization of the col command:

util-linux-2.5/text-utils/col.c:

+ #include <malloc.h>

Conversely, neither the BLADE_0.3 nor the BLADE_0.2 distributions include source code for commands; we discovered very recently that such code was available (on ftp.digital.com) for the very first, 32-bit Linux/Alpha distribution. Nevertheless, for three commands we used source code from the RedHat 2.1/beta and 2.1 Linux/Alpha distributions. This allowed us to fix incorrect behavior of the zic command; we also experimented, without success (due to lack of time), with some partial upgrades of the source code for zdump and cpio.

5. Statistics

5.1. Statistics for Linux APIs

All the files required to create an API are located inside a special sub-tree of the TenDRA [4.x] distribution named src/apis/. First, the new API must be specified, in a dedicated language (the TenDRA tspec tool is provided to translate such an API specification into a target-independent intermediate format). An API specification is split into files, each of which usually corresponds to a system header file available on the target platform(s), e.g. stdio.h, sys/types.h, etc.

Secondly, slight changes are often needed within some of the system header files of a given target in order to build (install) the new API on it. Such changes are either hard-coded in a replacement version of the relevant headers, or performed automatically by scripts of text processing commands; the sed tool, along with sed scripts, is used for the common APIs (for UNIX-like systems) included in current TenDRA distributions. We think that the second method is more flexible: for example, the text processing commands to apply to the system header files in order to build our custom APIs for Linux were in most cases the same for the Linux/i386 and Linux/Alpha targets, so we wrote a single set of sed scripts suitable for both platforms.

The following table gives an idea of the amount of work required to specify and to build the APIs used by the ~230 Linux commands which were ported to ANDF. Two types of data are reported: “number of files” and “number of lines”. The files are modules holding lines of either API specification or of sed commands. We excluded from our statistics the comment lines for both, because the ratio of comments versus actual tspec/sed constructs was unusually high.

We recall that our Base API for Linux was derived from existing “standard” APIs, i.e. mainly from XPG3 and SVID3, which themselves use parts of POSIX. Consequently, the number of tspec lines shown below for this API does not reflect the actual number of lines of specification used: most of the specifications we wrote are similar to C language #include statements (tspec +IMPLEMENT or +USE constructs).

In our “Extension API 1”, we quite often rewrote specifications which had an equivalent (either identical or similar) within the reference SVID3 API or SPEC1070 API.

We recall also that we did not start from scratch when making the changes in Linux system headers in order to build our APIs, either for Linux/i386 or Linux/Alpha targets: the TenDRA Snapshot we started from (April 95) provided 11 sed scripts for this purpose, that were sufficient to build most of the XPG3 API for the Linux/i386 target.

The number of files given for “Changes in system headers” is the one for the 1st target, Linux/i386 (1.0): as already mentioned, the build of APIs for the 2nd target, Linux/Alpha (1.3), uses the same set of sed scripts. This means that the number of lines corresponds to the sed commands which apply either to a Linux/Intel system header, to a Linux/Alpha header or to both.

	Base API, specs	Extension API1, specs	Extension API2 (X11), specs	Changes in system headers (build)
files	65 (.h)	66 (.h)	3 (.h)	48 (sed scripts)
lines (excl. comments)	1381	684	18	254

Table 1. APIs for Linux / ~230 commands

5.2. Statistics concerning changes in the original source code

The following table lists the packages we installed and validated on both platforms. The first column gives the (Slackware Linux) name of the package, and the second gives the number of source files we dealt with during the ANDFization. The next column gives the number of files (actual source or Makefile) patched during either the initial ANDFization (keeping the i386 as target) or the port to the 2nd target (Alpha). The last two columns show the number of files specifically (re)patched during the port to Alpha and the number of patched Makefiles (for the overall project).

Packages	Source files	Patches (total)	Patches (Alpha port)	Patches (Makefiles)
bin	455 ^[a]	86	19	6
sh_util	72	14	3	0
txtutils	50	8	2	1
util	71 ^[b]	51	22	3
diff	32	10	3	0
gzip	36	4	1	1
grep	16	7	4	1
find	57	12	2	0
bc	20	2	0	0
tar	37	10	2	1
rcs	28	6	0	0
byacc	20	8	0	0
m4	34	1	0	0
less	44	2	0	0
flex	46	3	0	0
ispell	40	21	15	1
elm	170	27	4	0

Total	1233	272 (22%)	77	14

Table 2. Packages ported to Linux/i386 and Linux/Alpha

^[a]
Contents of the bin package were partially ANDFized: 48 commands out of 56. Also, a subset only of these ANDFized commands was actually ported to Linux/Alpha: 37 (out of 48). So, the “number of source files” shown excludes source code for non-ANDFized commands. Conversely, the “number of patches, Alpha” would be greater if all the ANDFized commands had been ported to the second target.
^[b]
Contents of the Slackware Linux util package were partially ANDFized: 35 commands out of 57. Among these, only 30 were actually ported to Linux/Alpha.

The following table lists the packages we installed and validated on the Linux/i386 platform only. Similar information, except for the number of patches for the Linux/Alpha platform, is given.

Packages	Source files	Patches (i386 only)	Patches (Makefiles)
aaa_base	122	22	2
ash	49	15	1
bsdgames	47 ^[c]	35	11
cpio	44	22	1
getty	18	6	1
joe	87	26	0
perl	86	10	1
ps	35	6	1
sudo	9	6	0
tcpip/net-tools	26 ^[d]	12	0
xgames	105	28	3
xlock	37	4	1

Total	665	175	22

^[c]
The bsdgames package was partially ported to ANDF
^[d]
In the tcpip package, only the Net-tools subset was ported.

From the data shown in the table above, we can say that about 80% of the original source files did not require any change when ported to ANDF. We also find that more than 70% of the files requiring changes were identified during the initial ANDFization and the validation on the 1st platform. In other words, we had missed approximately 30% of the required changes. Note that in our project both platforms were running the same UNIX-like variant (Linux), so this last ratio could be greater in a less favorable case. Finally, since the Intel/i386 and Digital/Alpha platforms differ in the size of pointer types (and of the long type) while their byte ordering is the same, there is probably still incorrect code inside our ANDF files for Linux commands.

6. Conclusion

During this second part of the project, we greatly benefited from the experience acquired during the first part, described in Validation of TenDRA Capability to Implement a UNIX-like Operating System. The build environment we used and the way we integrated the TenDRA technology were directly derived from the work on the Unixware platform. Also, the initial set of Linux commands we could reasonably port was derived from the commands we had already ANDFized for Unixware.

We finally ported and validated about 230 commands on the Linux/i386 platform, but we could port only about 150 of them to the Linux/Alpha platform. The main reason for not porting the 80 remaining commands to Linux/Alpha was that we were short of time. Although we are below the target of 300 commands to be ported to both platforms, we believe that the experiment was a success. A significant amount of sometimes complex code was ANDFized, and the portability of this code was demonstrated. The main reasons that the initial objective was not attained were:

The switch to the second platform was delayed. Initially, we had planned to use the microkernel based version of Linux on PowerPC, but this was not available in time.
Setting up the Alpha/Linux system took a considerable amount of time. This was mainly due to the fact that Linux has not been ported to “off-the-shelf” hardware, and the system had to be assembled from components (mother board, power supply, disks, etc.).
In the Linux/Alpha distribution we installed, a number of commands were missing, and others were still not fully validated. This meant that, porting commands from Intel to Alpha, we found several portability bugs which were not fixed, and sometimes not even known, in the Alpha command sources. Finding fixes or asking for help on the Linux/Alpha mailing list slowed down our work.

In the course of the project, Linux has evolved from Linux 1.1 to 1.3, and Linux commands were distributed on Alpha. However, we stuck to the Linux 1.1 commands, and we only used the Linux/Alpha distribution to install the system and to find fixes when we had problems with some commands.

Actually, many of the portability problems we found are now fixed on Linux/Alpha, and it would be much easier to port these commands now than it was a few months ago. However, the first source delivery of commands for Linux/Alpha was only available in December 95, and using this new set of commands would have required redoing on the Intel platform the work (ANDFization, installation and validation) which was already done. Moreover, this first delivery was still not extensively validated and contained a number of bugs.

This project demonstrated that it is possible to build an API for a rather large set of commands. Moreover, this API is mostly based on standard APIs (xpg3, svid3), with a relatively small number of additions (e.g. a more complete bsd API than the TenDRA bsd_extn). However, installation of the standard APIs (Posix, Xpg3, ...) with TenDRA has shown that Linux does not fully conform to these standards. In fact, the best support is for the bsd API, since a large number of commands come from bsd sources.

We used a 3-step approach in porting to ANDF: first compiling with the native compiler, then using the TenDRA compiler with the native header files, and finally using TenDRA with abstract header files and token libraries. Although we used the second step to find portability errors detected by the TenDRA compiler, this could be done in the third step. However, the third step, which consists of making the source code compatible with the abstract API, is not so easy, and thus we found it easier to resolve portability checks in a separate step.

Most of the modifications we made to the source code for the commands came from missing function declarations and other trivial unportable coding, e.g. wrong assumptions on int and pointer sizes.

The TenDRA technology has proven capable of handling the differences between the two platforms. Although the same operating system runs on both platforms, these hardware platforms are sufficiently different to create difficulties. Standard APIs were successfully installed on both platforms, and the specific API developed for the commands could be designed to support the two different implementations. However, defining and installing an API is not an easy process and requires knowledge, sometimes extensive, of the TenDRA technology. We also detected a very small number of bugs in the TenDRA technology, in particular in the Dec/Alpha installer, which has only recently been distributed by DRA.

A. Errors in the TenDRA compiler

The problems are classified according to their status at the end of March 1996. They were encountered on the April 95 or November 95 releases of TenDRA, and have all been fixed now.

CR95_359.FB::shift_to_sign

Error on an arithmetic shift left.

Fixed in TenDRA November 95.

261 - array_sizeof

Error when an array with sizeof(int) as its dimension is declared twice.

Fixed in tdfc:4.58.

262 - func_var

Name conflict between a function and its arguments.

Fixed in TenDRA February 96.

276 - loop_tdfc_alarm

tdfc loops when attempting to redefine a tokenized function as a macro.

Fixed in tdfc:4.58.

220 - fstp_esi

Invalid assembler instruction fstp %esi.

Fixed in TenDRA November 95.

225 - func_return_struct

Intercallability with a gcc-compiled library function which returns an 8-byte structure. Bypassed with the tcc option -Wt,-G1.

Fixed in TenDRA November 95.

335 - alpha_long_too_big

Use of a literal integer constant larger than 2^32.

Bypassed. Added an L suffix to such a value.

337 - Error in Alpha installer

Error message “internal error: Change_variety out of range”.

Fixed in TenDRA February 96.

338 - Error in SVID3 API (RLIM_INFINITY)

Illegal type for the API definition +CONST int RLIM_INFINITY ;

Fixed in the next major TenDRA release.

343 - Error in Linux installer (/tmp full)

The translation on Linux/Alpha of a file required more than 80MB in /tmp.

Fixed in TenDRA February 96.

344 - SVID3 API (daddr_t)

Casting an integer value to an unspecified type was not rejected.

Fixed in tdfc:4.98.