yex.parse.Parser

`yex.parse.Parser(source, bounded=Bounding.NO, level=RunLevel.EXECUTING, on_eof=OnEof.NONE, no_outer=False)`

Interprets a TeΧ file, and expands its macros.

Takes a source, and iterates over it, returning the tokens with the macros expanded according to the definitions stored in the Document attached to that source.

By default, once it reaches the end of the file, Parser will keep returning None forever, which is what you want if you're planning to do lookahead. If you're going to put this Parser into a for loop, you'll want to set on_eof=OnEof.EXHAUST.
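The difference between the two end-of-file behaviours can be sketched with a toy stand-in. `ToyParser` below is hypothetical and is not part of yex; it only mimics the two `on_eof` settings described above, on the assumption that the "exhaust" setting ends iteration in the usual Python way.

```python
class ToyParser:
    """Hypothetical stand-in for illustration only; not the yex Parser."""

    def __init__(self, items, on_eof="none"):
        self.items = list(items)
        self.on_eof = on_eof

    def next(self):
        if self.items:
            return self.items.pop(0)
        if self.on_eof == "exhaust":
            # End of input: stop any for loop that is driving us.
            raise StopIteration
        # on_eof == "none": keep answering None, which suits lookahead.
        return None

    def __iter__(self):
        return self

    def __next__(self):
        return self.next()


# With "exhaust", iteration terminates at the end of the input.
collected = [item for item in ToyParser("ab", on_eof="exhaust")]

# With "none", reads past the end keep returning None.
lookahead = ToyParser("a", on_eof="none")
past_end = [lookahead.next() for _ in range(3)]
```

A for loop over the "none" variant would never finish, which is why the loop case needs the other setting.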

It's fine to attach another Parser to the same source, and to run it even when this one is active.

Attributes:

Name	Type	Description
`source`	`typing.Union[yex.parse.Tokeniser, typing.TextIO, typing.List, str]`	the source
`doc`	`yex.Document`	the document we're helping create.
`bounded`	`yex.parse.parser.Bounding`	how far to run an Expander before we stop. If this is "balanced" or "single", it requires `on_eof="exhaust"`.
`level`	`yex.parse.parser.RunLevel`	the level to run at; see the documentation for RunLevel for further information. Default is RunLevel.EXECUTING.
`on_eof`	`yex.parse.parser.OnEof`	what to do if we reach the end of the file.
`no_outer`	`bool`	if True, attempting to call a macro which was defined as "outer" will cause an error. Defaults to False.
`location`	`typing.Union[yex.parse.Location, None]`	the current position of this expander, or None if we're not tracking a position.
`delegate`	`typing.Union[yex.parse.Expander, None]`	if this is not `None`, then when `next()` is called, it will return the next value from this Expander. When the Expander is exhausted, the field will be reset to None. The delegate should have `on_eof=OnEof.EXHAUST` unless you're into heavy wizardry and pain.
`running`	`bool`	True if we're still running; False if we've reached the end of the part we're looking at.
`is_expanding`	`bool`	whether this Expander is currently expanding tokens. If the runlevel is below EXPANDING, we are never expanding. If it's EXPANDING or higher, then we are expanding iff we are not forbidden to expand by a conditional. For example, even if level was EXPANDING, we wouldn't be expanding straight after `\iffalse`.
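As a rough illustration of what the "single" and "balanced" settings of `bounded` mean, here is a sketch that works on plain characters rather than tokens. `read_bounded` is a hypothetical helper, not part of yex, and it assumes `{` and `}` play the roles of the beginning-group and ending-group tokens.

```python
def read_bounded(chars, bounded):
    """Hypothetical sketch over plain characters; not the yex implementation.

    Returns (items_read, rest_of_input).
    """
    if bounded not in ("single", "balanced"):
        raise ValueError("this sketch only models 'single' and 'balanced'")
    if not chars:
        return [], ""

    if chars[0] != "{":
        if bounded == "balanced":
            # "balanced" insists on a group.
            raise ValueError("needed a balanced group")
        # "single": one item, then stop.
        return [chars[0]], chars[1:]

    # The opening brace starts a group; read until its partner closes it.
    depth = 1
    items = []
    for index, ch in enumerate(chars[1:], start=1):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return items, chars[index + 1:]
        items.append(ch)
    raise ValueError("group was never closed")
```

For instance, `read_bounded("{ab{c}}d", "balanced")` yields the contents of the outer group and leaves `"d"` unread, while `read_bounded("xyz", "single")` yields only `"x"`.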

Source code in yex/parse/parser.py

def __init__(self,
             source,
             bounded = Bounding.NO,
             level = RunLevel.EXECUTING,
             on_eof = OnEof.NONE,
             no_outer = False,
             ):

    self.bounded = Bounding.normalise(bounded)
    self.on_eof  = OnEof.normalise(on_eof)
    self.level   = RunLevel.normalise(level)
    self.running = True

    if self.bounded in (Bounding.SINGLE, Bounding.BALANCED) and self.on_eof!=OnEof.EXHAUST:
        raise ValueError(
                'if bounded is "single" or "balanced", on_eof must be "exhaust"')

    self.no_outer       = no_outer

    self._bounded_limit = None
    self._delegate      = None

    if not hasattr(source, 'doc'):
        raise TypeError(
                "source must be something which can supply a Document, "
                f"such as a Tokeniser. You gave {source}, "
                f"which is a {type(source)}.\n\n"
                "You might like to look into using doc.open()."
                )
    self.source = source

    # For convenience, we allow direct access to some of
    # Tokeniser's methods.
    for name in [
            'eat_optional_char',
            'optional_string',
            'error_position',
            'exhaust_at_eol',
            ]:
        setattr(self, name, getattr(self.source, name))

    position_logger.source = self.source.source

    logger.debug("%s: ready; called from %s",
            self,
            yex.util.show_caller,
            )

`SPIN_LIMIT = 1000` `class-attribute` `instance-attribute`

Maximum number of times we can allow a parser to return None before we give up on it.

`another(subclass=None, preserve_step_bounding=False, **kwargs)`

Returns a parser like this one, with given changes to its behaviour.

The result will be a parser on the same Tokeniser. If there are no changes requested, or if the changes requested make no difference, the result will be this same Parser; otherwise it will be a new Parser.

Any setting specified in kwargs will be honoured, with the exception of bounded -- see below about that. All other settings will be copied from this Parser.

How bounded works:

- If bounded is specified in kwargs, the new parser will have the specified value.
- Otherwise, if preserve_step_bounding is True, and self.bounded=="step", the new parser will also have bounded="step".
- Otherwise, the new parser will always have bounded="no".
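The three rules can be written out as a small decision function. `resolve_bounded` is a hypothetical helper for illustration, not a yex function, and it uses plain strings where the real code uses `Bounding` values.

```python
def resolve_bounded(current, preserve_step_bounding=False, **kwargs):
    """Hypothetical helper mirroring the three rules; not part of yex."""
    if "bounded" in kwargs:
        # Rule 1: an explicit request always wins.
        return kwargs["bounded"]
    if preserve_step_bounding and current == "step":
        # Rule 2: "step" survives only when we are asked to preserve it.
        return "step"
    # Rule 3: everything else falls back to "no".
    return "no"
```

Note that a parent with `bounded="balanced"` never hands that setting down implicitly: `resolve_bounded("balanced", preserve_step_bounding=True)` gives `"no"`.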

Consider

This might be better suited to a factory method, "from_another", to produce an instance of the class it's called on.

Source code in yex/parse/parser.py

def another(self,
            subclass = None,
            preserve_step_bounding = False,
            **kwargs: Unpack[ParserArgs],
            ) -> Self:
    """
    Returns a parser like this one, with given changes to its behaviour.

    The result will be a parser on the same Tokeniser.
    If there are no changes requested, or if the changes requested
    make no difference, the result will be this same Parser;
    otherwise it will be a new Parser.

    Any setting specified in `kwargs` will be honoured,
    with the exception of `bounded` -- see below about that.
    All other settings will be copied from this Parser.

    How `bounded` works:
        - If `bounded` is specified in kwargs, the new parser
          will have the specified value.
        - Otherwise, if preserve_step_bounding is True, and
          `self.bounded=="step"`, the new parser will also
          have `bounded="step"`.
        - Otherwise, the new parser will always have `bounded="no"`.

    Consider:
        This might be better suited to a factory method, "from_another",
        to produce an instance of the class it's called on.
    """

    our_params = {
            'source': self.source,
            'bounded': self.bounded,
            'level': self.level,
            'on_eof': self.on_eof,
            'no_outer': self.no_outer,
            }
    new_params = our_params | kwargs
    if 'bounded' not in kwargs:
        if self.bounded==Bounding.STEP and preserve_step_bounding:
            pass
        else:
            new_params['bounded'] = Bounding.NO

    if subclass is None:
        subclass = self.__class__

    if not isinstance(new_params['source'], yex.parse.Tokeniser):
        new_params['source'] = yex.parse.Tokeniser(
                doc = self.doc,
                source = yex.parse.Source.from_value(
                    v=new_params['source'],
                    ),
                )

    if our_params==new_params and subclass==self.__class__:
        result = self
    else:
        logger.debug(
                ("%s: spawning a parser with changes: %s; "
                "called from %s"),
                self,
                kwargs,
                yex.util.show_caller,
                )
        if subclass!=self.__class__:
            logger.debug('  -- parent is %s, but child is %s',
            self.__class__, subclass)
        result = subclass(**new_params)

    return result

`eat_optional_spaces(level=RunLevel.DEEP)`

Eats zero or more space tokens.

This is like Tokeniser.eat_optional_spaces(), except that it can also execute controls and active characters, then continue to consider the result.

Returns a list of the Tokens consumed.

Parameters:

Name	Type	Description	Default
`level`	`yex.parse.parser.RunLevel`	the runlevel to run at.	`yex.parse.parser.RunLevel.DEEP`
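In its simplest form, with no controls or active characters involved, the behaviour amounts to stripping leading whitespace and reporting what was removed. The sketch below is a hypothetical function working on a list of bare characters; it is not the yex method and does not model runlevels.

```python
import string


def eat_leading_spaces(pending):
    """Hypothetical sketch on a list of characters; not the yex method."""
    eaten = []
    # Consume whitespace from the front until something else turns up.
    while pending and pending[0] in string.whitespace:
        eaten.append(pending.pop(0))
    return eaten


pending = list("  \tx y")
eaten = eat_leading_spaces(pending)
```

Afterwards `pending` starts at the first non-space item, and the later space between `x` and `y` is untouched.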

Source code in yex/parse/parser.py

def eat_optional_spaces(self,
                        level:RunLevel=RunLevel.DEEP,
                        ) -> List[Token]:
    """
    Eats zero or more space tokens.

    This is like Tokeniser.eat_optional_spaces(), except that it can
    also execute controls and active characters, then continue to
    consider the result.

    Returns a list of the Tokens consumed.

    Args:
        level: the runlevel to run at.
    """
    level = RunLevel.normalise(level)

    if level==RunLevel.DEEP:
        return self.source.eat_optional_spaces()

    result = []
    while True:
        result.extend(self.source.eat_optional_spaces())

        t = self.next(level=RunLevel.QUERYING, on_eof=OnEof.NONE)

        if t is None:
            return result
        elif isinstance(t, Token) and t.ch in string.whitespace:
            result.append(t.ch)
        elif isinstance(t, str) and t in string.whitespace:
            result.append(t)
        else:
            self.push(t)
            return result

`end()`

Marks this Parser as finished.

Source code in yex/parse/parser.py

def end(self) -> None:
    """
    Marks this Parser as finished.
    """
    logger.debug(r'%s: we have reached an \end', self)
    self.source.pushback.clear()
    self.running = False

`get_digit_sequence(accept_ch, accept_decimal_point)`

Reads and returns a series of symbols.

The result is taken from the next zero or more items. They are accepted if:

they are LETTER or OTHER tokens, and their "ch" property is in accept_ch; or
they are single-character strings, and they are in accept_ch.

This exists because if we read in the indexes of arrays using any other method, we risk \catcodeNN= affecting the way the symbol after the value assigned to \catcodeNN is tokenised. See test_tokeniser_whitespace_after_control_words().

Tokens are represented in the result by their ch property. Strings are used directly.

Parameters:

Name	Type	Description	Default
`accept_ch`	`str`	the characters we can accept	required
`accept_decimal_point`	`bool`	if `True`, act as though `'.,'` were included in accept_ch, except that they can only be matched once.	required
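The "matched only once" rule for decimal points is easiest to see on a plain string. `take_digit_sequence` below is a hypothetical sketch, not the yex method: it reads characters rather than tokens, returns the unread remainder instead of pushing it back, and does not model the way a terminating space token is consumed.

```python
def take_digit_sequence(chars, accept_ch, accept_decimal_point):
    """Hypothetical sketch over a plain string; not the yex method.

    Returns (accepted, rest_of_input).
    """
    decimal_points = ".,"
    allowed = accept_ch + decimal_points if accept_decimal_point else accept_ch

    accepted = ""
    for index, ch in enumerate(chars):
        if ch not in allowed:
            # First unacceptable character ends the sequence.
            return accepted, chars[index:]
        accepted += ch
        if ch in decimal_points:
            # A decimal point may be matched only once,
            # so fall back to the original set.
            allowed = accept_ch
    return accepted, ""
```

With digits as `accept_ch`, the input `"3.14.15"` stops at the second point, giving `"3.14"` and leaving `".15"` unread.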

Returns:

Type	Description
`str`	the accepted characters, joined in the order they were read.

Source code in yex/parse/parser.py

def get_digit_sequence(self,
                       accept_ch:str,
                       accept_decimal_point:bool,
                       ) -> str:
    r"""
    Reads and returns a series of symbols.

    The result is taken from the next zero or more items.
    They are accepted if:

    - they are LETTER or OTHER tokens, and their "ch" property is
            in `accept_ch`; or
    - they are single-character strings, and they are in `accept_ch`.

    This exists because if we read in the indexes of arrays using
    any other method, we risk `\catcodeNN=` affecting the way the symbol
    *after* the value which is assigned to `\catcodeNN` is tokenised.
    See `test_tokeniser_whitespace_after_control_words()`.

    Tokens are represented in the result by their `ch` property.
    Strings are used directly.

    Args:
        accept_ch: the characters we can accept
        accept_decimal_point: if `True`, act as though `'.,'` were
            included in accept_ch, except that they can only
            be matched once.

    Returns:
        the accepted characters, joined in the order they were read.
    """

    DECIMAL_POINTS = '.,'
    original_accept_ch = accept_ch

    if accept_decimal_point:
        accept_ch += DECIMAL_POINTS

    logger.debug("%s: get_digit_sequence begins; accepting %s",
            self, accept_ch)

    result = ''
    exp = self.another(on_eof=OnEof.NONE, level=RunLevel.EXPANDING)

    while True:
        item = exp.next()

        if isinstance(item, (Letter, Other)) and item.ch in accept_ch:
            addendum = item.ch
            logger.debug("%s:   -- accepted token, so: %s", self, repr(result))
        elif (isinstance(item, str) and
                len(item)==1 and
                item in accept_ch):
            addendum = item
            logger.debug("%s:   -- accepted char, so: %s", self, repr(result))
        else:
            if isinstance(item, Space):
                logger.debug("%s:   -- ending on %s, so result is: %s",
                        self, repr(item), repr(result))
            else:
                logger.debug((
                    "%s:   -- ending on %s (will push), "
                    "so result is: %s"),
                             self, repr(item), repr(result))
                self.push(item)

            return result

        result += addendum
        if addendum in DECIMAL_POINTS:
            accept_ch = original_accept_ch

`next(**kwargs)`

Returns the next item.

This is just like next() on an iterator, but with more options. (And indeed, our iterators are implemented in terms of this method.)

Args are as for another().

Raises:

Type	Description
`UnexpectedEOFError`	on unexpected end of file, or if `no_outer` finds the appropriate problem.

Source code in yex/parse/parser.py

def next(self,
        **kwargs,
        ) -> Any:
    r"""
    Returns the next item.

    This is just like next() on an iterator, but with more options.
    (And indeed, our iterators are implemented in terms of this method.)

    Args are as for another().

    Raises:
        UnexpectedEOFError: on unexpected end of file, or if
            `no_outer` finds the appropriate problem.
    """

    source = self._source_for_next.another(
            preserve_step_bounding = True,
            **kwargs)

    if source.level==RunLevel.DEEP:
        result = source._next_at_deep()
    elif source.level in [RunLevel.READING, RunLevel.EXPANDING]:
        result = source._next_at_reading_or_expanding()
    elif source.level in [RunLevel.EXECUTING, RunLevel.QUERYING]:
        result = source._next_at_executing_or_querying()
    else:
        assert False, f'unknown runlevel: {source.level}'

    assert (
            source.level<RunLevel.EXPANDING or
            not isinstance(result, yex.keyword.Array)), (
                    "next() was passed an Array; it should have "
                    "already been dereferenced to a Register."
                    )

    logger.debug("%s:     -- found %s",
            self, result)

    if self.bounded==Bounding.STEP:
        pass
    elif self.bounded!=Bounding.NO and self._bounded_limit is None:
        # This must be the first next() since we started.
        # Let's see whether we've been given a single item.

        if isinstance(result, BeginningGroup):
            # we need to read a balanced pair.
            self._bounded_limit = self.source.pushback.group_depth

            logger.debug(
                    "%s:        -- opens bounded expansion, read again",
                    self)
            result = self.next()
        elif self.bounded=='balanced':
            # First result wasn't a BeginningGroup,
            # but it should have been.
            raise yex.exception.NeededBalancedGroupError(
                    problem=result)
        else:
            # First result wasn't a BeginningGroup,
            # so we handle it and then stop.
            logger.debug("%s:  -- the only symbol in a bounded expansion",
                    self)
            self.running = False

    if self._bounded_limit is not None:
        if self.source.pushback.group_depth < self._bounded_limit:
            logger.debug(
                    ('%s: end of bounded expansion: group depth is %s, '
                    'which is below the starting limit, %s'
                        ),
                    self, self.source.pushback.group_depth,
                    self._bounded_limit,
                    )
            self.running = False
            result = None

    if result is None:

        if self._delegate is not None:
            logger.debug(
                    ('%s: delegate %s is all done; '
                    'carrying on with our own stuff'),
                    self, self._delegate,
                    )
            self._delegate = None
            return self.next(**kwargs)

        elif source.bounded==Bounding.STEP:
            return None

        elif source.on_eof==OnEof.RAISE:
            logger.debug("%s: unexpected EOF", self)
            raise yex.exception.UnexpectedEOFError()

        elif source.on_eof==OnEof.EXHAUST:
            raise StopIteration

    return result

`peek()`

Returns the item which is next due to be returned by next(). If this would go past the end of the file, we return None, whatever the setting of on_eof.
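Conceptually, peeking is a read followed by an immediate pushback. The sketch below shows that pattern on a hypothetical `ToyReader` class, which is not part of yex and keeps its input and its pushback in one list for brevity.

```python
class ToyReader:
    """Hypothetical stand-in; not the yex Parser or Tokeniser."""

    def __init__(self, items):
        self.pending = list(items)

    def next(self):
        # Past the end of the input, answer None rather than raising.
        return self.pending.pop(0) if self.pending else None

    def push(self, item):
        # Pushing None is ignored.
        if item is not None:
            self.pending.insert(0, item)

    def peek(self):
        item = self.next()
        self.push(item)
        return item


reader = ToyReader("xy")
first_peek = reader.peek()   # looks at 'x' without consuming it
first_read = reader.next()   # so this read still sees 'x'
reader.next()                # consumes 'y'
end_peek = reader.peek()     # nothing left, so None
```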

Source code in yex/parse/parser.py

def peek(self) -> Any:
    """
    Returns the item which is next due to be returned by `next()`.
    If this would go past the end of the file, we return `None`,
    whatever the setting of `on_eof`.
    """
    result = self.next(
            on_eof = OnEof.NONE,
            )
    self.source.pushback.push(result)
    return result

`push(thing, clean_char_tokens=False, is_result=False)`

Pushes back a token, a character, or anything else.

This is mostly just a wrapper for the push method in Tokeniser. But we do check for "beginning group" and "ending group" tokens, and adjust our fields accordingly.

All Parsers share pushback, and in general it's fine to push things through a parser when you received them from a different Parser. The only exception to this is when you're using balanced expansion: because we have to keep a count of balanced braces, you should remember to push Tokens back through the Parser that gave you them.

If you push bare characters, they will be converted by the source as it thinks appropriate.

Parameters:

Name	Type	Description	Default
`thing`	`yex.parse.tokeniser.Any`	whatever you're pushing back. Pushing None will be ignored. If this is a string, or a list specifically, it will be split into its members and pushed in reverse order. For example, pushing 'cat' is the same as pushing 't', then pushing 'a', then pushing 'c'.	required
`clean_char_tokens`	`bool`	if True, all bare characters will be converted to the Tokens for those characters. (For example, 'T', 'e', 'X' -> ('T' 12) ('e' 12) ('X' 12).) The rules about how this is done are on p213 of the TeΧbook. If False, the characters will remain bare characters and the source will tokenise them as usual when it gets to them.	`False`
`is_result`	`bool`	If you're a control, and your job involves reading some data, then pushing a result, set this to True when you push the result. This will allow \expandafter to work correctly. If you're implemented through a decorator, and your result is pushed via returning it, you don't have to worry: the decorator will set is_result=True when it pushes your return values.	`False`
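The reverse-order rule for strings and lists is what makes the pushed items come back out in reading order. The sketch below models pushback as a plain Python list used as a stack; `push_back` is a hypothetical function, not the yex implementation.

```python
pushback = []  # the top of the stack is the end of the list


def push_back(thing):
    """Hypothetical sketch of the ordering rule; not the yex implementation."""
    if thing is None:
        # Pushing None is ignored.
        return
    if not isinstance(thing, (str, list)):
        thing = [thing]
    # Push the members last-first, so that the first member
    # ends up on top and is the next one to be read.
    for member in reversed(thing):
        pushback.append(member)


push_back("cat")
whole = [pushback.pop() for _ in range(3)]

push_back("t")
push_back("a")
push_back("c")
piecewise = [pushback.pop() for _ in range(3)]
```

Both routes read back as `c`, `a`, `t`, matching the 'cat' example in the table above.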

Raises:

Type	Description
`EOFError`	if this parser is exhausted.
`GoneBeforeTheBeginningError`	if we're bounded, and you push more BEGINNING_GROUP tokens than you've already received.

Source code in yex/parse/parser.py

def push(self,
         thing: Any,
         clean_char_tokens: bool = False,
         is_result:bool = False,
        ):
    r"""
    Pushes back a token, a character, or anything else.

    This is mostly just a wrapper for the `push` method in
    `Tokeniser`. But we do check for "beginning group"
    and "ending group" tokens, and adjust our fields accordingly.

    All Parsers share pushback, and in general it's fine to push
    things through a parser when you received them from a
    different Parser. The only exception to this is when
    you're using balanced expansion: because we have to keep a count of
    balanced braces, you should remember to push Tokens back
    through the Parser that gave you them.

    If you push bare characters, they will be converted by the
    source as it thinks appropriate.

    Args:
        thing: whatever you're pushing back.
            Pushing None will be ignored.
            If this is a string, or a list specifically, it
            will be split into its members and pushed in reverse order.
            For example, pushing 'cat' is the same as pushing 't',
            then pushing 'a', then pushing 'c'.

        clean_char_tokens: if True, all bare characters
            will be converted to the Tokens for those characters.
            (For example, 'T', 'e', 'X' -> ('T' 12) ('e' 12) ('X' 12).)
            The rules about how this is done are on p213 of the TeXbook.
            If False, the characters will remain bare characters
            and the source will tokenise them as usual when it
            gets to them.

        is_result: If you're a control, and your job involves
            reading some data, then pushing a result, set this to True
            when you push the result. This will allow \expandafter
            to work correctly.

            If you're implemented through a decorator, and your result
            is pushed via returning it, you don't have to worry:
            the decorator will set is_result=True when it pushes your
            return values.

    Raises:
        EOFError: if this parser is exhausted.
        GoneBeforeTheBeginningError: if we're bounded, and you push more
            BEGINNING_GROUP tokens than you've already received.
    """


    if not self.running:
        raise EOFError()

    if not isinstance(thing, (str, list)):
        thing = [thing]

    if clean_char_tokens:

        def _clean(c):
            if isinstance(c, str):
                return Token.get(
                        ch=c,
                        location=self.source.location,
                        )
            else:
                return c

        thing = [_clean(c) for c in thing]

    self.source.pushback.push(thing)

    if self._bounded_limit is not None:
        if self.source.pushback.group_depth < self._bounded_limit:
            logger.debug(
                    '%s: group_depth is %d, but bounded_limit is %d',
                    self, self.source.pushback.group_depth,
                    self._bounded_limit)
            raise yex.exception.GoneBeforeTheBeginningError()
