1. sid's Organisation

  1.1. The main Function
  1.2. Adding a New Output Language: main_language_list
  1.3. Code Organisation and Conventions

When you invoke sid, it performs the following main operations:

  1. Reads the grammar from the .sid file and builds its internal representation.

  2. Reads the output-language-specific .act file and completes the internal representation of the grammar with the action code. (After this step, sid works only on the internal representation.)

  3. Transforms and optimises the grammar. Most notably, it removes left recursion and tries to transform the supplied context-free grammar into an equivalent LL(1) grammar.

  4. Outputs the parser.
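To illustrate what removing left recursion in step 3 means, the textbook transformation for immediate left recursion rewrites a rule A → A α | β (where β does not begin with A) into a right-recursive pair of rules that generate the same language. This is the standard construction written in generic BNF notation, not sid's grammar syntax; the new nonterminal A' is introduced for illustration, and sid's own algorithm works on its internal representation and may differ in detail. The second line applies it to a familiar expression rule.

```latex
\begin{aligned}
A &\to A\,\alpha \mid \beta
  &&\Longrightarrow\quad A \to \beta\,A', \quad A' \to \alpha\,A' \mid \varepsilon \\
E &\to E + T \mid T
  &&\Longrightarrow\quad E \to T\,E', \quad E' \to +\,T\,E' \mid \varepsilon
\end{aligned}
```

The rewritten rules are right recursive, so a top-down LL(1) parser can choose an alternative by looking at a single token instead of recursing forever on A.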

1.1. The main Function

Let's look at some parts of the code in main.c. In main, you can see HANDLE and WITH: these macros, which come from libexds, emulate an exception mechanism in C.

The function main itself does little: it initialises some structures, calls main_init to process the command-line options, and then calls main_1 to do all the interesting work. Ignoring initialisation and error handling, main_1 calls the following functions in order:

sid_parse_grammar()

parses the .sid file and converts it into an internal representation.

grammar_check_complete(&grammar)

verifies that all rules are accessible.

(*(main_language->input_proc))(output_closure, &grammar)

parses the action file and completes the internal representation of the grammar.

grammar_remove_left_recursion(&grammar)

removes left recursion from the grammar.

grammar_compute_first_set(&grammar)

computes the first set of each rule in the grammar.

grammar_factor(&grammar)

TODO

grammar_simplify(&grammar)

TODO

grammar_compute_inlining(&grammar)

TODO

grammar_check_collisions(&grammar)

TODO

grammar_recompute_alt_names(&grammar)

TODO

(*(main_language->output_proc))(output_closure, &grammar)

outputs the parser in the chosen language.

You may wonder what main_language is. We explain it in the next section.

1.2. Adding a New Output Language: main_language_list

The global variable main_language makes it easy to change the output language. It is a pointer to a structure of type LangListT and always points to an element of the table main_language_list, whose entries contain callbacks for the various stages of processing.

Each entry of main_language_list has the following members, in order:

  1. the option name;

  2. a pointer to the initialisation function;

  3. a pointer to the top-level input routine for the action file;

  4. an integer giving the number of input files (2 when outputting C, 1 for test);

  5. a pointer to the language-specific top-level output function;

  6. the number of output files.

Do not remove the last entry of the table: it serves as a guard. To add a new output language, add an entry to main_language_list and implement the corresponding top-level functions.

1.3. Code Organisation and Conventions

If you are reading this guide, you probably want to modify sid. This section describes how sid's code is organised and how to modify it while keeping it readable.

sid defines many types for the internal representation of a grammar. These types are defined in header files, and their names begin with an uppercase letter and end with T, e.g. RuleT. If a type MytypeT is declared in myfile.h, then every function that directly accesses the members of MytypeT has a name beginning with mytype_ and is defined in myfile.c. No other function should touch the members of MytypeT directly: to work with an object of a given type, use the interface declared in the same header as the type.