Jump to content

Draft:Tunnel Grammar Studio

From Wikipedia, the free encyclopedia
Written inC++
Operating systemWindows
Available inEnglish
Licenseproprietary
Websitewww.experasoft.com/en/products/tgs/

The Tunnel Grammar Studio (TGS) is an Integrated Development Environment (IDE) for generating parsing machines. Each parsing machine might have different modules, such as: supplier, scanner, lexer, parser, optimizer, and builder.

TGS is a parsing machine generator and it is used to develop stand alone parsing machines from a lexer and a parser grammars. The accepted grammar syntax is the Augmented Backus–Naur form (ABNF). Graphical representation of the developed grammars is available directly into the IDE. TGS handles parsing deterministically for some types of ambiguous grammars.

Parsing Machine

[edit]

The input of the generated parsing machines is in bits. The parsing machine is organized in different modules.

Token Types

[edit]

In the machine there are 4 token types:

  • Character token -- each token of this type contain a name consisting of a single character. These tokens are written as the tuple (t-character,x) for character x;
  • Sequence token -- these tokens are contain a name from a set of names and a lexeme consisting of at least one character. These tokens are written as the tuple (t-sequence,e) for lexeme e, such that |e|>0;
  • Limit token -- tokens, which are output by a module, when some of its technical limitations is reached, and the module cannot continue to perform. These tokens are written as the tuple (t-limit,Q,e) for a nonempty set of names Q and lexeme e;
  • EOF token -- the last token output by a module, written as (t-eof).

Supplier

[edit]

The module which is responsible for the supplying of the bytes to the parsing machine is the Supplier module. There can be different suppliers which "bring" the bytes from different byte sources: Stream Supplier which reads bytes from the file system, Network Supplier which downloads bytes from any network, and so on.

The generated code from TGS has an interface

Scanner

[edit]

The job of the scanner module is to decode the bits it receives from a supplier module into Unicode code points. The output date is a sequence of character tokens.

Lexer

[edit]

The job of the lexer module is to receive tokens from the previous module in the machine and to group, if possible, some of these tokens into new larger tokens.

Parser

[edit]

The parser receives tokens from the previous module in the machine, checks whether the tokens form a valid sentence in the language defined by the parser grammar, and outputs syntax structure construction commands.

Optimizer

[edit]

Between the parser and the builder modules there can be an optional optimizer that inputs and outputs commands. The optimizer can buffer a number of commands before sending them to the builder and can also change the commands before sending.

Builder

[edit]

The builder module receives the stream of commands from the previous module. The handling of the received commands is made by an object called an architect. There are two types of architects -- an abstract and a concrete.

Concrete Architect

[edit]

The concrete architect creates concrete syntax trees. A concrete tree does contain all of the necessary information for the user to be able to determine:

* which node in the tree was created according to a particular rule or a group in the grammar; 
* all of the nodes created according to an element having repetitions;
* 

Abstract Architect

[edit]

The abstract architect creates an abstract syntax tree, with a chosen level of abstraction.

Visitor Pattern

[edit]

Tunnel Parsing

[edit]

The generated PM source code (including the syntax tree) is object-oriented. The target language for each PM is C++98 with single or multi threaded (currently with Win32 API) run time and x86/64 CPU architecture support. The generated PM are operating online - each instance can process as much as input is available, pause and wait for more input and continue later when more is available.

Debugger

[edit]

Epsilon Grammar Studio has a an integrated debugger that visualises the PM progress step by step for an input string, and automata visualizer to help the developer to create grammars.

Grammar Analysis

[edit]

At compile time, a grammar analysis is performed detect grammar ambiguities and is the grammar deterministic or context free. All collisions between grammar elements are reported visually into the IDE.

References

[edit]
[edit]

Made referenced for the terms

[edit]