This class defines the basic structure of the Gobstones Lexer. It is parametrized to allow different variations of the language to be used (both in different natural languages, like English or Spanish, and in the Domain Specific primitives the language handles).

On creation, an input program and the language definition must be provided. Particular versions of the lexer can be created by subclasses.
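As a hedged sketch of how such a lexer is typically driven (the `Token` shape and the `GTokenSource` interface below are illustrative stand-ins for the documented `hasNextGToken`/`nextGToken` surface, not the actual class):

```typescript
// Illustrative only: a minimal interface mirroring the documented Lexer
// surface, plus a generic helper that drains all proper tokens.
interface Token {
    kind: string;
    text: string;
}

interface GTokenSource {
    hasNextGToken(): boolean;
    nextGToken(): Token; // PRECONDITION: hasNextGToken()
}

// Drain every proper token, respecting the documented precondition
// before each call to nextGToken.
function collectGTokens(lexer: GTokenSource): Token[] {
    const tokens: Token[] = [];
    while (lexer.hasNextGToken()) {
        tokens.push(lexer.nextGToken());
    }
    return tokens;
}
```

A parser built on top of the lexer would follow the same pattern, checking `hasNextGToken` before each read to satisfy the precondition.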

Type Parameters

Hierarchy (view full)

Implements

Constructors

Accessors

  • get langWords(): Words
  • The Words used to access the language elements.

    Returns Words

  • get languageMods(): OptionsTable
  • The language options read during the source processing.

    Returns OptionsTable

  • get pendingAttributes(): OptionsTable
  • Gets the attributes read since they were last retrieved, and resets the pending attributes, so that none remain pending.

    Returns OptionsTable
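The "get and reset" semantics can be sketched as follows (a toy stand-in, not the real implementation; `OptionsTableSketch` and `PendingAttributes` are hypothetical names):

```typescript
// Sketch of the documented behavior: reading the pending attributes
// returns everything accumulated so far and leaves the table empty.
type OptionsTableSketch = Record<string, string[]>;

class PendingAttributes {
    private pending: OptionsTableSketch = {};

    // Accumulate a value under a key, appending to any previous values.
    add(key: string, args: string[]): void {
        this.pending[key] = (this.pending[key] ?? []).concat(args);
    }

    // Return all pending attributes and reset, so none remain pending.
    take(): OptionsTableSketch {
        const result = this.pending;
        this.pending = {};
        return result;
    }
}
```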

  • get warnings(): LexerGWarning[]
  • A getter for the warnings produced during the source processing.

    Returns LexerGWarning[]

Methods

  • Indicates if there are more proper tokens to be read.

    Returns boolean

  • Indicates if there are more tokens to be read.

    Returns boolean

  • It returns the next proper token in the source, reading characters from it and skipping them, leaving the source ready to read the next token. If the source has no more characters, it fails.

    PRECONDITION: this.hasNextGToken()

    Returns Token

    Throws

    GErrors.NoMoreInputErrorIn if there is no GToken next

  • It returns the next token in the source, reading characters from it, and skipping them, leaving the source ready to read the next token. If the source has no more characters, it fails.

    A token includes both proper tokens and filler tokens (usually whitespaces, comments, and pragmas). Filler tokens (fillers) are ignored by the parser, but this function provides all, in order to keep all characters from the input grouped into tokens.

    Filler tokens that have some usefulness are properly processed (pragmas are evaluated, and their effects carried on -- whether modifying the lexer behavior or adding attributes --, comments are added as attributes).

    By using this function, other tools that must not ignore filler tokens can be built. The parser should use nextGToken, which skips whitespaces and comments, and processes pragmas.

    PRECONDITION: this.hasNextToken()

    Returns Token

    Throws

    GErrors.NoMoreInputErrorIn if there is no Token next
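The key invariant here, that proper and filler tokens together cover the entire input, can be illustrated with a toy tokenizer (not the real lexer; it only distinguishes whitespace runs from non-whitespace runs):

```typescript
// Toy illustration: when filler tokens are produced alongside proper
// tokens, concatenating the text of every token reconstructs the
// original input exactly, character by character.
type ToyToken = { kind: "word" | "whitespace"; text: string };

function toyTokenize(input: string): ToyToken[] {
    const tokens: ToyToken[] = [];
    let i = 0;
    while (i < input.length) {
        const isSpace = /\s/.test(input[i]);
        let j = i;
        // Take the maximal run of characters of the same class.
        while (j < input.length && /\s/.test(input[j]) === isSpace) j++;
        tokens.push({
            kind: isSpace ? "whitespace" : "word",
            text: input.slice(i, j)
        });
        i = j;
    }
    return tokens;
}
```

A syntax highlighter or pretty-printer is the typical consumer of such a full token stream, since it must reproduce whitespace and comments.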

  • Replaces the input with the given one, resetting the lexer to start reading the input from the beginning.

    Parameters

    • input: SourceInput

      with the new input to process.

      PRECONDITION: there is at least one input document.

    Returns void

    Throws

    GErrors.NoInputError if there are no documents in the input.

Implementation: Auxiliaries -- Pragma processing

  • Private

    Evaluates a language pragma, passing it as an attribute. If the language pragma is not known, it is ignored, and a warning is generated.

    Parameters

    • span: Span
    • value: string

    Returns void

  • Private

    Evaluates the given pragma, carrying its effect.

    Triggers a warning if the pragma is not one of those defined by the language.

    Parameters

    • span: Span
    • name: string
    • args: string[]

    Returns void

  • Private

    Verify that the option selected for the pragma is the same as the one in the WordsDef, failing with an error if that is not the case.

    PRECONDITION:

    • the selectedOption is the same as the baseOption

    Parameters

    • optionPragma: string
    • selectedOption: string
    • baseOption: string
    • span: Span

    Returns void

    Throws

    GErrors.WrongPragmaOptionError when the selected option is not the same as the base one

Implementation: Auxiliaries -- Processing part of tokens

  • Private

    Reads an escape char from the _source if possible, or returns undefined if it is not. Emits a warning if the escape character is incomplete or invalid.

    PRECONDITION:

    • the _source is not at the end of input or the end of a string
    • the first character in the _source is an escape sigil char according to _langWords

    Parameters

    • start: SourcePosition
    • tokenStrs: string[]

    Returns string

  • Private

    Read chars from the _source until either an escape char or a string delimiter sigil is found, or the end of the string is reached.

    PRECONDITION:

    • there is a token String reading in process

    Returns string

  • Private

    Validates the number according to the rules of the language. Emits a warning if the number is not valid.

    Parameters

    • tokenStr: string
    • start: SourcePosition
    • end: SourcePosition

    Returns void

Implementation: Auxiliaries -- Processing words

  • Private

    Read the end of a document in the source. It just returns the EOD token and advances the source.

    PRECONDITION: the source is not at the end of input, and it is at the end of a document

    Returns Token

  • Private

    Read a maximal group of regular chars at the beginning of the source, advancing it, producing the corresponding token. The token produced may be:

    • a keyword
    • a language primitive
    • a regular identifier, either upper or lower depending on the concrete token read and the concrete definition of the language given for the lexer instance.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a regular char different from a digit char

    Returns Token
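The classification step can be sketched like this (the word sets and category names are made up for illustration; the real lexer takes them from the language definition given at creation):

```typescript
// Sketch: a read word is checked against keywords, then primitives,
// and otherwise becomes an identifier whose kind is decided by the
// case of its first character.
const KEYWORDS = new Set(["program", "procedure", "function"]);
const PRIMITIVES = new Set(["Poner", "Sacar", "Mover"]);

type WordKind = "keyword" | "primitive" | "upperId" | "lowerId";

function classifyWord(word: string): WordKind {
    if (KEYWORDS.has(word)) return "keyword";
    if (PRIMITIVES.has(word)) return "primitive";
    return word[0] === word[0].toUpperCase() ? "upperId" : "lowerId";
}
```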

  • Private

    Read a line comment at the beginning of the source, advancing it, producing the corresponding token. Add the comment to the pending attributes, to offer the parser the possibility to add it to the next token.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a line comment opener sigil

    Returns Token

  • Private

    Read a maximal group of digit chars at the beginning of the source, advancing it, producing the corresponding token. It also triggers a warning if the digits do not form a proper number token according to the rules.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a digit char

    Returns Token

  • Private

    Read a paragraph comment at the beginning of the source, advancing it, producing the corresponding token. Add the comment to the pending attributes, to offer the parser the possibility to add it to the next token.

    It also triggers a warning if the comment reaches the end of file without closing.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a paragraph comment opener sigil

    Returns Token

  • Private

    Read a pragma at the beginning of the source, advancing it, producing the corresponding token.

    Pragmas are evaluated, and their effects carried on -- whether modifying the lexer behavior or adding attributes.

    It also triggers a warning if the pragma is malformed.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a pragma opener sigil

    Returns Token

  • Private

    Read a string at the beginning of the source, advancing it, producing the corresponding token. Strings are sequences of characters surrounded by the string delimiter char, and they may contain any character different from that delimiter. In order to be able to use the string delimiter char inside the string, strings may contain escaped characters. The sigil marking the occurrence of an escaped char, and the exact list of permitted escaped chars (including the string delimiter), are specified by the GBSWordsDef argument given on Lexer creation.

    Emits a warning if the string reaches the end of file without proper closing. Also emits a warning if there is an incomplete escaped char, or an invalid one.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a string delimiter char

    Returns Token
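A toy version of this reading logic, with an arbitrary delimiter and escape sigil standing in for the ones the words definition would supply:

```typescript
// Sketch: consume characters until the delimiter closes the string;
// an escape sigil lets the delimiter itself appear inside the string.
// Reaching end of input without a closing delimiter is the case the
// real lexer reports with a warning.
function readToyString(
    src: string,
    delimiter = '"',
    sigil = "\\"
): { text: string; closed: boolean } {
    // PRECONDITION: src starts with the delimiter.
    let text = "";
    let i = 1; // skip the opening delimiter
    while (i < src.length) {
        if (src[i] === sigil && i + 1 < src.length) {
            text += src[i + 1]; // take the escaped char literally
            i += 2;
        } else if (src[i] === delimiter) {
            return { text, closed: true };
        } else {
            text += src[i++];
        }
    }
    return { text, closed: false }; // unterminated string
}
```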

  • Private

    Read a maximal group of punctuation chars at the beginning of the source, advancing it, producing the corresponding token (taking symbolic keywords into consideration).

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a punctuation char, different from a string delimiter char

    Returns Token

  • Private

    Read a maximal group of whitespaces at the beginning of the source, advancing it, and producing the corresponding token.

    PRECONDITIONS:

    • the source is not at the end of input
    • the source starts with a whitespace char

    Returns Token

  • Private

    Read the next word (whitespace, pragma, comment, symbol, number, or identifier), producing a token.

    Filler tokens that have some usefulness are properly processed (pragmas are evaluated, and their effects carried on -- whether modifying the lexer behavior or adding attributes --, comments are added as attributes).

    It triggers warnings for some tokens (pragmas, comments, and numbers) if they are ill-formed.

    PRECONDITION: the source is not at the end of input

    Returns Token
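The dispatch described above can be sketched as a decision on the first character of the source (categories and character classes here are illustrative; the real lexer takes them from _langWords):

```typescript
// Sketch: the next word's category is decided by the first character
// (and a lookahead for comments); each category is then handled by
// its own reader, as in the private methods documented above.
type WordCategory =
    | "whitespace" | "number" | "string"
    | "comment" | "identifier" | "symbol";

function categorize(firstChar: string, nextChar = ""): WordCategory {
    if (/\s/.test(firstChar)) return "whitespace";
    if (/[0-9]/.test(firstChar)) return "number";
    if (firstChar === '"') return "string";
    if (firstChar === "/" && (nextChar === "/" || nextChar === "*")) {
        return "comment";
    }
    if (/[A-Za-z_]/.test(firstChar)) return "identifier";
    return "symbol";
}
```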

Implementation: Auxiliaries -- Side information

  • Private

    Adds a new attribute to the list of pending attributes, appending the new value to any previous ones.

    Parameters

    • key: string
    • args: string[]

    Returns void

  • Private

    A convenient abbreviation to produce warnings.

    Parameters

    Returns void

  • Private

    Adds a new option to the list of language options, replacing the old value if it exists.

    Parameters

    • key: string
    • args: string[]

    Returns void

Implementation: Properties

_langWords: Words

The Words object managing the definition of words for the Gobstones language to use for recognition of characters, symbols, and identifiers.

_languageMods: OptionsTable

The language options read from the source.

_pendingAttributes: OptionsTable

The attributes read from the source since the last time they were consulted.

_source: SourceReader

The [SourceReader](https://gobstones.github.io/gobstones-core/modules/SourceReader.html) to use as source of tokens.

_warnings: LexerGWarning[]

Warnings generated by the reading process.