Class SourceReader

A SourceReader allows you to read input from some source, either a single document of content or several named or indexed source documents, in such a way that each character read registers its position in the source as an index-line-column tuple. That is, the main problem it solves is calculating the position of each character read, taking into account the characters that indicate the end of a line. It also solves the problem of input divided among several documents, as is usually the case with source code, and it provides a couple of additional features:

  • to use some parts of the input as extra annotations by marking them as non visible, so the input can be read as if the annotations were not there, and
  • to associate parts of the input with identifiers naming "regions", thus making it possible for external tools to identify those parts with ease.
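
The index-line-column calculation described above can be sketched as follows. This is only an illustration, not the library's implementation: the helper name `positionAt`, the `LineColumn` shape, and the 1-based line and column convention are assumptions made for this sketch.

```typescript
// Hypothetical helper, not part of SourceReader: computes the line and column
// of the character at `index`, treating every char in `lineEnders` as a line end.
interface LineColumn {
    index: number;
    line: number;
    column: number;
}

function positionAt(text: string, index: number, lineEnders: string = '\n'): LineColumn {
    let line = 1; // assumption: lines are numbered from 1
    let column = 1; // assumption: columns are numbered from 1
    const limit = Math.min(Math.max(index, 0), text.length);
    for (let i = 0; i < limit; i++) {
        if (lineEnders.indexOf(text.charAt(i)) !== -1) {
            // A line ender moves to the first column of the next line.
            line++;
            column = 1;
        } else {
            column++;
        }
    }
    return { index: limit, line, column };
}
```

With this sketch, the character 'd' in 'ab\ncd' (index 4) is reported at line 2, column 2.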

A SourceReader is created using a SourceInput and then SourcePositions can be read from it. Possible interactions with a SourceReader include:

  • peek a character, with peek,
  • check if a given string occurs at the beginning of the text in the current document, without skipping it, with startsWith,
  • get the current position as a SourcePosition, with getPosition,
  • detect if the end of input was reached, with atEndOfInput,
  • detect if the end of the current document was reached, with atEndOfDocument,
  • skip one or more characters, with skip,
  • read some characters from the current document based on a condition, with takeWhile, and
  • manipulate "regions", with beginRegion and endRegion.

There are also two global elements that can be accessed:

  • the array with the names of the documents composing the source, with documentsNames, and
  • the characters used to indicate the ending of a line in a document, with lineEnders.

When the SourceInput is composed of only one document or of several unnamed ones, the special prefix defaultDocumentNamePrefix is used to identify the documents (with a number suffix to differentiate them). This prefix can also be accessed, as a static property of SourceReader.

When reading from sources with multiple documents of input, skipping moves inside a document until there are no more characters, then an end of document position is reached (a special position just after the last character of that document), and then a new document is started. Regions are reset at the beginning of each document. When the last document has been processed, and the last end of document has been skipped, the end of input is reached (a special position just after all the documents).
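
The traversal described above can be illustrated with a miniature reader. This is a sketch under assumptions, not the real class: `MiniReader` is a hypothetical name, it tracks only a document index and a char index, and it assumes that `atEndOfDocument` answers false once the end of input is reached.

```typescript
// Hypothetical miniature reader: it only tracks document and char indexes,
// to illustrate how skipping walks through several documents.
class MiniReader {
    private readonly docs: string[];
    private docIndex = 0;
    private charIndex = 0;

    constructor(docs: string[]) {
        this.docs = docs;
    }

    atEndOfInput(): boolean {
        return this.docIndex >= this.docs.length;
    }

    // Assumption of this sketch: at the end of input this answers false.
    atEndOfDocument(): boolean {
        if (this.atEndOfInput()) return false;
        return this.charIndex >= (this.docs[this.docIndex] ?? '').length;
    }

    skip(): void {
        if (this.atEndOfInput()) return;
        if (this.atEndOfDocument()) {
            // Skipping the end of document position starts the next document.
            this.docIndex++;
            this.charIndex = 0;
        } else {
            this.charIndex++;
        }
    }
}
```

For the documents 'ab' and 'c', five skips are needed to reach the end of input: two characters, one end of document, one character, and one more end of document.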

A SourceReader also has a special position, UnknownPosition, as a static member of the class, indicating that the position is not known.

Characters from the input are classified as either visible or non visible. Visible characters affect the line and column calculation, and, conversely, non visible characters do not. Characters are marked as visible by skipping over them normally; characters are marked as non visible by silently skipping over them. The visibility of the input affects the information that positions may provide. When skipping characters, the EndOfDocument position must also be skipped, although there is no character at that position and thus it cannot be peeked. This position cannot be skipped as non visible, as every input document is known by the user.
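
A minimal sketch of the visible and non visible distinction follows. It is not the library's code: `VisibilitySketch` and its public fields are hypothetical names, it handles a single document only, and it assumes 1-based lines and columns.

```typescript
// Hypothetical tracker, not the library's code: it shows how silent skips
// leave line, column and the visible copy untouched.
class VisibilitySketch {
    private readonly text: string;
    private readonly lineEnders: string;
    private index = 0;
    line = 1;
    column = 1;
    visible = '';

    constructor(text: string, lineEnders: string = '\n') {
        this.text = text;
        this.lineEnders = lineEnders;
    }

    skip(howMany: number = 1, silently: boolean = false): void {
        for (let n = 0; n < howMany && this.index < this.text.length; n++) {
            const ch = this.text.charAt(this.index);
            this.index++;
            if (silently) continue; // non visible: nothing else changes
            this.visible += ch; // visible chars are copied as they are skipped
            if (this.lineEnders.indexOf(ch) !== -1) {
                this.line++;
                this.column = 1;
            } else {
                this.column++;
            }
        }
    }
}
```

Reading 'a/*x*/b' by skipping 'a' normally, the five characters of '/*x*/' silently, and 'b' normally leaves the visible copy as 'ab' and the column at 3, as if the annotation were not there.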

Regarding regions, a "region" is some part of the input that has an ID (as a string). It is used in handling automatically generated code. A typical use is to identify parts of code generated by some external tool, in such a way as to link that part with the element generating it through region IDs. Regions are supposed to be nested, so a stack is used, but no check is made on their balance: the user is responsible for the correct pushing and popping of regions. When skipping moves from one source document to the next, regions are reset, as regions are not supposed to cross different documents of the input.
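
The region handling just described can be sketched as a plain stack. The class and method names below (`RegionStackSketch`, `begin`, `end`, `resetForNextDocument`, `current`) are hypothetical and only mirror the behavior stated in the text; they are not the reader's API.

```typescript
// Hypothetical stack of region IDs, mirroring the behavior described above.
class RegionStackSketch {
    private regions: string[] = [];

    // Pushes a region ID; nesting is expressed by the stack order.
    begin(regionId: string): void {
        this.regions.push(regionId);
    }

    // Pops the innermost region; popping an empty stack does nothing.
    end(): void {
        this.regions.pop();
    }

    // Called when moving to the next document: regions do not cross documents.
    resetForNextDocument(): void {
        this.regions = [];
    }

    // Returns a copy, so callers cannot alter the stack by accident.
    current(): string[] {
        return this.regions.slice();
    }
}
```

No balance check is performed here either: a missing `end` simply leaves the region on the stack until the document changes.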

Example

This is a very basic example using all the basic operations. A more complex program will use functions to organize the access with a logical structure, and it will also consider different inputs in the source. Just use this example to understand the behavior of the operations -- common usage does NOT follow this structure.

let pos: SourcePosition;
let str: string;
const reader = new SourceReader('program { Poner(Verde) }', '\n');
// ---------------------------------
// Read a basic Gobstones program
if (reader.startsWith('program')) {         // ~~> true
    pos = reader.getPosition();             // ~~> (1,1) as a SourcePosition, with no regions
    // ---------------------------------
    // Skip over the first token
    reader.skip('program');                 // Move 7 chars forward
    // ---------------------------------
    // Skip whitespaces between tokens
    while (reader.startsWith(' '))          // ~~> true 1 time
        { reader.skip(); }                  // Move 1 char forward (' ')
    // ---------------------------------
    // Detect block start
    if (!reader.startsWith('{'))            // ~~> false (function returns true)
        { fail('Block expected'); }
    reader.beginRegion('program-body');     // Push 'program-body' to the region stack
    str = '';
    // ---------------------------------
    // Read block body (includes '{')
    // NOTE: CANNOT use !startsWith('}') instead because
    //       !atEndOfDocument() is REQUIRED to guarantee precondition of peek()
    while (!reader.atEndOfDocument()        // ~~> false (so the negation is true)
        && reader.peek() !== '}') {         // ~~> true 15 times (peek() is not '}')
        str += reader.peek();               // '{', ' ', 'P', ... 'd', 'e', ')', ' '
        reader.skip();                      // Move 15 times ahead
    }
    // ---------------------------------
    // Detect block end
    if (reader.atEndOfDocument())           // ~~> false
        { fail('Unclosed block'); }
    // Add '}' to the body
    str += reader.peek();                   // ~~> '}'
    pos = reader.getPosition();             // ~~> (1,24) as a SourcePosition,
                                            //     with region 'program-body'
    reader.endRegion();                     // Pop 'program-body' from the region stack
    reader.skip();                          // Move 1 char forward ('}')
    // ---------------------------------
    // Skip whitespaces at the end (none in this example)
    while (reader.startsWith(' '))          // ~~> false
        { reader.skip(); }                  // NOT executed
    // ---------------------------------
    // Verify there are no more chars at input
    if (!reader.atEndOfDocument())          // ~~> false (function returns true)
        { fail('Unexpected document chars after program'); }
    reader.skip();                          // Skips end of document,
                                            // reaching next document or end of input
    // ---------------------------------
    // Verify there are no more input documents
    if (!reader.atEndOfInput())             // ~~> false (function returns true)
        { fail('Unexpected additional inputs'); }
}

NOTE: as peek is partial, not working at the end of documents, each of its uses must be done after confirming that atEndOfDocument is false. For that reason it is better to use startsWith to verify if the input starts with some character (or string), when peeking for something specific.
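
One way to respect this precondition is to wrap the check and the peek together. The following is a sketch, not something the library provides: `PeekableReader` is a hypothetical interface containing only the two operations named in the note, and `peekOrUndefined` is a hypothetical helper.

```typescript
// Hypothetical minimal view of a reader: only the two operations the note mentions.
interface PeekableReader {
    atEndOfDocument(): boolean;
    peek(): string;
}

// Hypothetical helper: checks the precondition of peek before calling it,
// and answers undefined instead of failing at the end of a document.
function peekOrUndefined(reader: PeekableReader): string | undefined {
    return reader.atEndOfDocument() ? undefined : reader.peek();
}
```

A loop such as `while (peekOrUndefined(reader) === ' ') { ... }` then stops at the end of a document without ever calling peek there.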

The implementation of SourceReader keeps:

  • an object associating input document names to input document contents, _documents,
  • an object associating input document names to visible input document contents, _visibleDocumentContents,
  • an array of the keys of that object for sequential access, _documentsNames,
  • an index to the current input document in the array of inputs names, _documentIndex,
  • an index to the current character in the current input document, _charIndex,
  • the current line and column in the current input document, _line and _column,
  • a stack of strings representing the regions' IDs, _regions, and
  • the characters used to determine line ends, _lineEnders.

The object of _documents cannot be empty (with no input document), and all the SourceInput forms are converted to Record<string, string> for ease of access. The _charIndex either points to a valid position in an input document, or at the end of an input document, or the end of input was reached (that is, when there are no more input documents to read).
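
The conversion to Record<string, string> could look like the sketch below. Several things here are assumptions, since SourceInput is documented elsewhere: the accepted forms (`string`, `string[]`, or a record), the numbering of unnamed documents starting at 1, and the use of a plain Error where the class documents a NoInputError. The names `InputSketch`, `NAME_PREFIX` and `normalizeInput` are hypothetical.

```typescript
// Assumption: these are the input forms; see SourceInput for the real definition.
type InputSketch = string | string[] | Record<string, string>;

// Same value as the documented defaultDocumentNamePrefix.
const NAME_PREFIX = 'doc';

// Hypothetical normalization into a record of document names to contents.
function normalizeInput(input: InputSketch): Record<string, string> {
    const docs: string[] | Record<string, string> = typeof input === 'string' ? [input] : input;
    let result: Record<string, string>;
    if (Array.isArray(docs)) {
        result = {};
        for (let i = 0; i < docs.length; i++) {
            // Assumption: unnamed documents are numbered starting at 1.
            result[NAME_PREFIX + String(i + 1)] = docs[i] ?? '';
        }
    } else {
        result = docs;
    }
    if (Object.keys(result).length === 0) {
        // The real class reports a NoInputError; a plain Error stands in here.
        throw new Error('There must be at least one input document');
    }
    return result;
}
```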

Line and column numbers are adjusted depending on which characters are considered as ending a line, as given by the property _lineEnders, and which characters are considered visible, as indicated by the user through skip. When changing from one document to the next, line and column numbers are reset.

The visible input consists of those characters of the input that have been skipped normally. As visible and non visible characters can be interleaved with no restrictions, it is better to keep a copy of the visible parts: characters are copied to the visible inputs attribute when skipped normally. Visible inputs always have a copy of those characters that have been processed as visible; unprocessed characters do not appear (yet) on visible inputs.

This class is tightly coupled with SourcePosition's implementations, because instances of that class represent different positions in the source inputs kept by a SourceReader. The operations _documentNameAt, _visibleDocumentContentsAt, _fullDocumentContentsAt, _inputFromToIn and _fullInputFromTo are meant to be used only by SourcePosition, to complete its operations, and so they are grouped as Protected.

The remaining auxiliary operations are meant for internal usage, to provide readability or to avoid code duplication. The auxiliary operation _cloneRegions is needed because each new position produced with getPosition needs to have a snapshot of the region stack, and not a mutable reference.
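
The difference between a snapshot and a mutable reference can be seen in a few lines. This is only an illustration with hypothetical names (`snapshotRegions`, `liveRegions`, `sharedReference`, `positionRegions`); it does not reproduce _cloneRegions itself.

```typescript
// Hypothetical stand-in for the cloning step: a shallow copy is enough,
// because region IDs are immutable strings.
function snapshotRegions(regions: string[]): string[] {
    return regions.slice();
}

const liveRegions: string[] = ['outer'];
const sharedReference = liveRegions; // same array: it follows later changes
const positionRegions = snapshotRegions(liveRegions); // independent copy

// The reader keeps working after the position was produced.
liveRegions.push('inner');

// sharedReference now has 2 elements, while positionRegions still has 1.
```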

Constructors

A new SourceReader is created from the given input. It starts in the first position of the first input document (if that document is empty, it starts in the end of document position of that document). Line enders must be provided, affecting the calculation of line and column for positions. If there are no line enders, all documents in the source input are assumed to have only one line.

PRECONDITION: there is at least one input document.

Parameters

  • input: SourceInput

    The source input. See SourceInput for explanation and examples of how to understand this parameter.

  • lineEnders: string = '\n'

    A string of which characters will be used to determine the end of a line.

Returns SourceReader

Throws SourceReader/Errors.NoInputError if the argument is undefined or has no documents.

Properties

Private
_charIndex: number

The current char index in the current input document.

INVARIANT:

  • if _documentIndex < _documentsNames.length then 0 <= _charIndex < _documents[_documentsNames[_documentIndex]].length
Private
_column: number

The current column number in the current input document.

INVARIANTS:

  • 0 <= _column
  • if _documentIndex < _documentsNames.length then _column < _documents[_documentsNames[_documentIndex]].length
Private
_documentIndex: number

The current input index. The current input is that in _documents[_documentsNames[_documentIndex]] when _documentIndex < _documentsNames.length.

INVARIANT: 0 <= _documentIndex <= _documentsNames.length

Private
_documents: Record<string, string>

The actual input, converted to a Record of document names to document contents.

INVARIANT: it is always an object (not a string).

Private
_line: number

The current line number in the current input document.

INVARIANTS:

  • 0 <= _line
  • if _documentIndex < _documentsNames.length then _line < _documents[_documentsNames[_documentIndex]].length
Private
_regions: string[]

The active regions in the current input document.

Private
_visibleDocumentContents: Record<string, string>

A copy of the visible parts of the input documents. A part is visible if it has been skipped, and that skip was not silent (see skip).

INVARIANTS:

  • it has the same keys as _documents
  • the values of each key are contained in the values of the corresponding key at _documents
Readonly
documentsNames: string[]

The names with which input documents are identified.

Readonly
lineEnders: string

The characters used to indicate the end of a line. These characters affect the calculation of line and column numbers for positions.

Functions: Auxiliaries

Private

Gives a clone of the stack of regions. Auxiliary for getPosition. It is necessary because the regions of a SourcePosition must correspond to those at that position, and must not change with changes in the reader state.

Returns string[]

Private

Answers if there are more chars in the current document.

PRECONDITION: this._hasMoreDocuments()

Returns boolean

Private

Answers if there are more input documents to be read.

Returns boolean

Private

Answers if the given char is recognized as an end of line indicator, according to the configuration of the reader.

Parameters

  • ch: string

Returns boolean

Private

Skips one char at the input.

If the skipping is silent, line and column do not change, usually because the input being read was added automatically to the original input (the default is not silent). If the skip is not silent, the input is visible, and thus it is added to the visible inputs. Skip cannot be silent on the EndOfDocument, so at EndOfDocument the silent flag is ignored.

It is used by API operations to skip one or more characters.

PRECONDITION: !this.atEndOfInput() (not verified)

Parameters

  • silently: boolean

    A boolean indicating if the skip must be silent.

Returns void

Private

Skips the input positioning the reader at the start of the next document.

PRECONDITION: !this.atEndOfInput() && this.atEndOfDocument()

Returns void

Functions: Access

Answers if there are no more characters to read from the current document.

Returns boolean

Answers if there are no more characters to read from the input.

Returns boolean

Answers the current document name.

PRECONDITION: !this.atEndOfInput()

Returns string

Throws SourceReader/Errors.InvalidOperationAtEOIError if the source reader is at EndOfInput in the current position.

Gives the current position as a SourcePosition. See SourceReader documentation for an example.

NOTE: the special positions at the end of each input document, and at the end of the input, can be accessed by getPosition, but they cannot be peeked.

Returns SourcePosition

Gives the current char of the current input document. See SourceReader for an example.

PRECONDITION: !this.atEndOfInput() && !this.atEndOfDocument()

Returns string

Throws SourceReader/Errors.InvalidOperationAtEODError if the source reader is at EndOfDocument in the current position.

Throws SourceReader/Errors.InvalidOperationAtEOIError if the source reader is at EndOfInput in the current position.

Answers if the current input document at the current char starts with the given string. It does not split the given string across different input documents -- that is, only the current input document is checked. See SourceReader documentation for an example.

Parameters

  • str: string

    The string to check against the current input, starting at the current char.

Returns boolean
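
The check against the current document only can be sketched as a comparison on a slice of that document. `startsWithAt` is a hypothetical free function working on a plain string and an index, not the method documented here.

```typescript
// Hypothetical helper: answers if `doc`, read from `charIndex`, starts with `str`.
// Only the given document is inspected, so a match can never continue
// into a following document.
function startsWithAt(doc: string, charIndex: number, str: string): boolean {
    return doc.substring(charIndex, charIndex + str.length) === str;
}
```

For instance, with the documents 'ab' and 'c', checking 'bc' at index 1 of the first document answers false, because the 'c' belongs to the next document.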

Functions: Auxiliaries

Gives the contents of either the full or visible input between two positions, depending on the visible argument. If from is not before to, the result is the empty string.

PRECONDITIONS:

  • both positions correspond to this reader (and so are >= 0 -- not verified)

Parameters

  • inputFrom: number
  • charFrom: number
  • inputTo: number
  • charTo: number
  • visible: boolean

Returns string
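
A possible reading of this operation is sketched below for the full contents only. It is an illustration under assumptions: the documents are given as an array in reading order, contents of different documents are simply concatenated, and an index equal to the number of documents stands for the end of input. The name `inputBetween` is hypothetical.

```typescript
// Hypothetical sketch: contents from (inputFrom, charFrom) up to, but not
// including, (inputTo, charTo). Assumes plain concatenation across documents.
function inputBetween(
    docs: string[],
    inputFrom: number,
    charFrom: number,
    inputTo: number,
    charTo: number
): string {
    const fromIsBefore = inputFrom < inputTo || (inputFrom === inputTo && charFrom < charTo);
    if (!fromIsBefore) return '';
    if (inputFrom === inputTo) {
        return (docs[inputFrom] ?? '').substring(charFrom, charTo);
    }
    // Rest of the first document, every document in between, start of the last one.
    let result = (docs[inputFrom] ?? '').substring(charFrom);
    for (let i = inputFrom + 1; i < inputTo; i++) {
        result += docs[i] ?? '';
    }
    result += (docs[inputTo] ?? '').substring(0, charTo);
    return result;
}
```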

Functions: Modification

Pushes a region in the stack of regions. It does not work at the EndOfInput or the EndOfDocument (it does nothing).

Parameters

  • regionId: string

Returns void

Pops a region from the stack of regions. It does nothing if there are no regions in the stack.

Returns void

Skips the given number of chars in the input, moving forward. It may skip documents, considering the end of document as a 'virtual' char.

If the argument is a string, only its length is used (i.e. its contents are ignored). Negative numbers do not skip (they are equivalent to 0). At the end of each input document, an additional skip is needed to start the next input document. This behavior allows the user to be aware of the ending of documents. Regions are reset at the end of each document (the regions stack is emptied).

If the skipping is silent, line and column do not change, usually because the input being read was added automatically to the original input (the default is not silent). If the skip is not silent, the input is visible, and thus it is added to the visible inputs. The end of each input document cannot be skipped silently, and thus for that particular position, silently is ignored.

See SourceReader for an example of visible skips.

Parameters

  • howMuch: string | number = 1

    An indication of how many characters have to be skipped. It may be given as a number or as a string. In the latter case, the length of the string is used (the contents are ignored). If it is not given, it is assumed to be 1.

  • silently: boolean = false

    A boolean indicating if the skip must be silent. If it is not given, it is assumed false, that is, a visible skip. If the skip is visible, the char is added to the visible input.

Returns void
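
The interpretation of the first argument described above can be sketched as a small function. `charsToSkip` is a hypothetical name; it only computes the amount of positions to skip and does not move any reader.

```typescript
// Hypothetical helper: turns the howMuch argument into a number of positions.
// A string contributes only its length; negative numbers count as 0.
function charsToSkip(howMuch: string | number = 1): number {
    const count = typeof howMuch === 'string' ? howMuch.length : howMuch;
    return Math.max(0, count);
}
```

So `charsToSkip('program')` and `charsToSkip(7)` are equivalent, and `charsToSkip(-3)` skips nothing.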

Skips a variable number of characters in the current document of the input, returning the characters skipped. All contiguous characters from the initial position satisfying the predicate are read. It guarantees that the first character after skipping, if it exists, does not satisfy the predicate. It does not go beyond the end of the current document, if starting inside one.

Parameters

  • contCondition: ((ch: string) => boolean)

    A predicate on strings, indicating the chars to read.

      • (ch): boolean
      • Parameters

        • ch: string

        Returns boolean

  • silently: boolean = false

    A boolean indicating if the reading must be silent. If it is not given, it is assumed false, that is, a visible read. If the read is visible, the char is added to the visible input.

Returns string

The string read from the initial position until the first character that does not satisfy the condition or the end of the current document.
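
The guarantee stated for this operation can be illustrated on a plain string. `takeWhileFrom` is a hypothetical function written for this sketch; it returns both the characters taken and the index where reading stopped, whereas the documented operation returns only the string and advances the reader.

```typescript
// Hypothetical sketch over a single document: takes contiguous chars that
// satisfy the predicate, never going past the end of the document.
function takeWhileFrom(
    doc: string,
    start: number,
    contCondition: (ch: string) => boolean
): { taken: string; next: number } {
    let i = start;
    while (i < doc.length && contCondition(doc.charAt(i))) {
        i++;
    }
    // Either i is the end of the document, or doc.charAt(i) fails the predicate.
    return { taken: doc.substring(start, i), next: i };
}
```

For example, taking letters from 'abc123' at index 0 yields 'abc' and stops at index 3, where '1' does not satisfy the predicate.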

Functions: Querying

Returns the full context of the corresponding source document after the position, up to the end of the given number of lines, or the end of the document, whichever comes first.

The char at the given position is the first one in the result.

Parameters

  • docIndex: number
  • charIndex: number
  • lines: number

Returns string[]

Returns the document context of the requested document before the requested character index, up to the requested character index, divided in lines.

If the requested number of lines is 0, only the line that the character index belongs to is returned.

If the requested number of lines is greater than 0, then that many lines before the one containing the character index are returned as well. If there are fewer lines than the amount requested, then all the lines are returned up to the one containing the character index.

The line containing the character index is always returned, from the start, up to the character index itself (NOT including the element in that index).

INVARIANTS:

  • The index of the document is valid (not checked)
  • The number of lines is not lower than 0 (not checked)

Parameters

  • docIndex: number
  • charIndex: number
  • lines: number

Returns string[]
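
The rules above for the context before a character index can be sketched on a single document string. This is an illustration under assumptions: `contextBefore` is a hypothetical name, a single line ender character is used, and line enders are not included in the returned lines.

```typescript
// Hypothetical sketch: lines of `doc` before `charIndex`, ending with the
// (partial) line that contains `charIndex`, excluding the char at that index.
function contextBefore(
    doc: string,
    charIndex: number,
    lines: number,
    lineEnder: string = '\n'
): string[] {
    // Everything before the index, divided in lines; the last element is the
    // current line from its start up to the index.
    const allLines = doc.substring(0, charIndex).split(lineEnder);
    // Keep the current line plus at most `lines` lines before it.
    const firstKept = Math.max(0, allLines.length - (lines + 1));
    return allLines.slice(firstKept);
}
```

For the document 'one\ntwo\nthree' and index 10 (the 'r' of 'three'), requesting 0 lines gives ['th'], requesting 1 line gives ['two', 'th'], and requesting 5 lines gives all three entries.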

Gives the name of the input document at the given index. It is intended to be used only by SourcePositions.

PRECONDITION: index <= this._documentsNames.length (not verified)

As it is a protected operation, invalid indexes are not expected, and the result in that case is unspecified.

Parameters

  • index: number

Returns string

Gives the contents of the input document at the given index, both visible and non-visible. It is intended to be used only by SourcePositions.

PRECONDITION: index < this._documentsNames.length (not verified)

As it is a protected operation, invalid indexes are not expected, and the result in that case is unspecified.

Parameters

  • index: number

Returns string

Returns the next character in the reader.

Returns string

Gives the contents of the visible input document at the given index. It is intended to be used only by SourcePositions.

PRECONDITION: index < this._documentsNames.length (not verified).

As it is a protected operation, invalid indexes are not expected, and the result in that case is unspecified.

Parameters

  • index: number

Returns string

Properties (Static)

Static Readonly
defaultDocumentNamePrefix: string = 'doc'

The string to use as a name for unnamed input documents. It is intended to be used only by instances.

Static Readonly
UnknownPosition: SourcePosition = ...

A special position indicating that the position is not known.