First published .

Revision History

kate

Combined C99 and C90 code generation. The applicable API is now selected by `#if` on `__STDC_VERSION__`.
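
As a rough illustration of selecting declarations by language version, the sketch below picks a boolean type with `#if` on `__STDC_VERSION__`. The names `example_bool` and `example_is_digit` are hypothetical and are not taken from Lexi's generated output; only the `__STDC_VERSION__` test itself is standard C.

```c
/* Hypothetical sketch: choose a C99 or C90 declaration at compile time. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdbool.h>
typedef bool example_bool;	/* C99 and later: a real boolean type */
#else
typedef int example_bool;	/* C90: fall back to int */
#endif

/* Hypothetical predicate; returns true (nonzero) for the digits '0'..'9'. */
static example_bool example_is_digit(int c)
{
	return c >= '0' && c <= '9';
}
```

The same source then compiles under either standard, with callers seeing whichever type the compiler's language mode supports.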

kate

Removed -a; I can't think of any reason why you might want to generate standard C code eliding assertions, as opposed to compiling with `-DNDEBUG`. So now assertions are always generated.
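
A minimal sketch of the alternative mentioned above: an `assert()` left in the source is compiled out when the file is built with `-DNDEBUG`. The function `example_digit_value` is hypothetical and only stands in for generated code.

```c
#include <assert.h>

/* Hypothetical helper: maps a digit character to its numeric value. */
static int example_digit_value(int c)
{
	/* This check disappears when the file is compiled with -DNDEBUG. */
	assert(c >= '0' && c <= '9');
	return c - '0';
}
```

Compiling with something like `cc -DNDEBUG -c lexer.c` (the file name is an assumption) removes the check without regenerating the lexer.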

kate

The dependency on libexds is now resolved internally for convenience of building.

kévin

Removed ##.

kévin

Implemented group merge.

kévin

Added ARGUMENT name:ctype;

kate

EOF is no longer permitted in groups.

Empty groups are not present in the lookup table.

Groups may now contain character \0.

Group inversion is supported for including subsets: GROUP a = "[^b]";

kate

Removed legacy support for tokens mapping to functions (and in particular, literally-embedded fragments of source code specified as string literals for function arguments). These were written in the form:

TOKEN "abc" -> f(...);

with various arguments permitted inside the parentheses. These have been superseded by SID-like actions defined in the .lct file, which look like:

TOKEN "abc" -> <f>(...);

These provide a language-agnostic way to achieve the same effect, keeping all language-specific portions in the .lct file.

Any existing .lxi files ought to be updated to use these actions instead, with an accompanying .lct file.

kévin

Added the “test” language.

kate

The .lct headers are now output after the generated lexer's header is included. This is a convenience to permit the .lct headers to make use of symbols from the generated header (notably LEXI_EOF in static support functions, for example).

kate

Added a new command line option, -i, for specifying a prefix for user-supplied interface functions separately from the -p prefix. Currently the only such user-supplied interfaces are *getchar() and *unknown_token.

The rationale for introducing this option is to permit these interfaces to be shared from a common source whilst keeping the -p prefixes separate, as would be the case for a .lct file shared between many lexers, each with its own .lxi specification.

kévin

Added .lct actions files. These are an analogue to sid's .act files.

Four predefined types have been added: TERMINAL, INTEGER, STRING and CHARACTER. Scopes (instructions_list) can now retain local names with their types.

kévin

Added [...) range markers for zones. This is mostly useful for identifiers.

kate

Added Graphviz “Dot” output.

kate

lexi_group() now returns a bool for C99.

kate

Added -a; assertions are now disabled by default.

kate

Added C99 as an output language.

kate

Unknown tokens are now provided by lexi_unknown_token, which forms part of the generated API. The character passed is removed, as it serves little purpose.

kate

LEX_EOF is now known as LEXI_EOF.

kate

Moved the token buffer into lexi_state. This introduces state to the buffer-manipulation API, which breaks backwards compatibility.

kate

The state for zones is now maintained as part of a slightly more formal interface. Unfortunately the contents of this struct, though intended to be private, are visible for the convenience of allocation. However, defining an instance of this struct on the user's side of the generated API allows multiple instances of Lexi to run concurrently.

The API is now consistent for both with and without zones.
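
A minimal sketch of why caller-allocated state permits several independent lexers, assuming a hypothetical `struct example_state` and helper functions (none of these names come from Lexi's generated API). For illustration the struct also holds a small fixed-length pushback buffer.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_BUFSZ 4	/* hypothetical fixed buffer length */

/* Hypothetical lexer state; the caller allocates it, so its fields are visible. */
struct example_state {
	const char *input;	/* remaining input */
	int buf[EXAMPLE_BUFSZ];	/* pushed-back characters */
	size_t buf_len;		/* number of characters currently pushed back */
};

void example_init(struct example_state *s, const char *input)
{
	s->input = input;
	s->buf_len = 0;
}

/* Read the next character, preferring pushed-back ones; -1 at end of input. */
int example_readchar(struct example_state *s)
{
	if (s->buf_len > 0) {
		return s->buf[--s->buf_len];
	}
	if (*s->input == '\0') {
		return -1;
	}
	return (unsigned char) *s->input++;
}

/* Push a character back; at most EXAMPLE_BUFSZ may be pending at once. */
void example_push(struct example_state *s, int c)
{
	assert(s->buf_len < EXAMPLE_BUFSZ);
	s->buf[s->buf_len++] = c;
}
```

Because every function takes the state explicitly, two `struct example_state` instances can be advanced in any interleaving without affecting one another.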

kate

Added -l for the output language.

kate

The group-querying functions are now behind a single interface, to which an enumeration of group identifiers is passed. This breaks compatibility for is_*(), which no longer exist. This helps provide a static set of functions should somebody wish to wrap calls to lexi-generated code; the contents of the enumeration scale with groups, rather than the size of the API.

This change brings the lookup table behind the API, making it static. The type of the lookup table is also removed from the public interface.
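
The shape of such an interface might look like the sketch below: one function, an enumeration of group identifiers, and a lookup table kept out of the public interface. All names here are hypothetical, the table is populated for only a handful of characters, and the designated initialisers assume a C99 compiler.

```c
#include <stdint.h>

/* Hypothetical group identifiers; one bit per group. */
enum example_groups {
	example_group_digit = 1 << 0,
	example_group_alpha = 1 << 1,
	example_group_white = 1 << 2
};

/* The table stays private to this file; unlisted entries are zero. */
static const uint32_t example_table[256] = {
	['\t'] = example_group_white,
	['\n'] = example_group_white,
	[' ']  = example_group_white,
	['0']  = example_group_digit,
	['1']  = example_group_digit,
	['a']  = example_group_alpha,
	['b']  = example_group_alpha
};

/* Single query function: nonzero if character c belongs to the given group. */
int example_group(enum example_groups group, int c)
{
	if (c < 0 || c > 255) {
		return 0;	/* out-of-range values, such as an EOF marker, are in no group */
	}
	return (example_table[c] & (uint32_t) group) != 0;
}
```

Adding a group then means adding an enumerator and table bits, while the set of functions a wrapper must bind stays the same.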

kate

Reworked prefix generation. All generated symbols should now be prefixed, which breaks compatibility.

kate

Removed single-file output; the generated header is now mandatory.

kate

Implemented an alternate keywords mechanism. This introduces a new function, lexi_keyword(), which returns an int representing the keyword found. The intention is that this would share a token ID along with Lexi's main interface.

Removed the -k option.

kate

Removed lookup_char() in favour of passing the character directly to the is_*() macros. This is an API change.

kévin

Added the -p prefix option.

kévin

Added COPYRIGHT and the -C option for specifying copyright files on the command line. The original “first comment” behaviour is kept if neither of these is specified.

kévin

Added [^...] for complements of groups.

kévin

Added #n and a literal string for function arguments.

kévin

Added optional support for a generated C header file.

kévin

Added support for groups in zones.

kévin

Added zones.

kate

Merged in support for up to (and including) 31 groups. This is a rework of a patch originally submitted by Rob Andrews, but not included in our tag for Lexi 1.2. Notably it has been adapted to use stdint.h instead of assuming the size of various C types.

kate

Lexi now maintains its own internal fixed-length buffer when reading tokens. This obsoletes the unread_char() function, which is no longer called by Lexi.

This change aims to be backwards-compatible; it should not affect existing programs, though their unread_char() functions may now be redundant and can be removed if not used by their own routines.

kate

Tagged Lexi 1.3.

kate

Wrote the Lexi Users' Guide.

This was mostly reverse-engineered from the source, and by experimenting with Lexi.

kate

Added examples.

kate

Added -h.

kate

Moved out Lexi to a standalone tool.

asmodai

Tagged Lexi 1.2.

This corresponds to Lexi 1.2, which was developed privately by Rob after the 4.1.2 TenDRA release. We are skipping version 1.2 to avoid confusion with his version.

DERA

Lexi 1.1; TenDRA 4.1.2 release.