yex.parse.Parser

yex.parse.Parser(source, bounded=Bounding.NO, level=RunLevel.EXECUTING, on_eof=OnEof.NONE, no_outer=False)

Interprets a TeΧ file, and expands its macros.

Takes a source, and iterates over it, returning the tokens with the macros expanded according to the definitions stored in the Document attached to that source.

By default, once the source runs out, Parser will keep returning None forever, which is what you want if you're planning to do lookahead. If you're going to put this Parser into a for loop, you'll want to set on_eof=OnEof.EXHAUST, so that the loop terminates.

It's fine to attach another Parser to the same source, and to run it even when this one is active.
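
The difference between the default end-of-file behaviour and on_eof=OnEof.EXHAUST can be illustrated with a small standalone model. This is only a sketch: ToyParser and its string-valued on_eof are hypothetical stand-ins, not yex classes, and they imitate nothing except what happens when the input runs out.

```python
class ToyParser:
    """Hypothetical stand-in for a parser; not part of yex."""

    def __init__(self, items, on_eof="none"):
        self._items = list(items)
        self.on_eof = on_eof

    def next(self):
        if self._items:
            return self._items.pop(0)
        if self.on_eof == "exhaust":
            # Lets a for loop (or list()) terminate.
            raise StopIteration
        # Default: keep returning None, which suits lookahead.
        return None

    def __iter__(self):
        return self

    def __next__(self):
        return self.next()


# Lookahead style: reading past the end just yields None.
lookahead = ToyParser("ab")
first_three = [lookahead.next() for _ in range(3)]

# Loop style: "exhaust" ends the iteration cleanly.
looped = list(ToyParser("ab", on_eof="exhaust"))
```

Iterating over the default variant would never finish, because it never raises StopIteration; that is the reason for switching to the exhausting behaviour in loops.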

Attributes:

source (typing.Union[yex.parse.Tokeniser, typing.TextIO, typing.List, str]): the source.

doc (yex.Document): the document we're helping create.

bounded (yex.parse.parser.Bounding): how far to run an Expander before we stop. If this is "balanced" or "single", it requires on_eof="exhaust".

level (yex.parse.parser.RunLevel): the level to run at; see the documentation for RunLevel for further information. Default is RunLevel.EXECUTING.

on_eof (yex.parse.parser.OnEof): what to do if we reach the end of the file.

no_outer (bool): if True, attempting to call a macro which was defined as "outer" will cause an error. Defaults to False.

location (typing.Union[yex.parse.Location, None]): the current position of this expander, or None if we're not tracking a position.

delegate (typing.Union[yex.parse.Expander, None]): if this is not None, then when next() is called, it will return the next value from this Expander. When the Expander is exhausted, the field will be reset to None. The delegate should have on_eof=OnEof.EXHAUST unless you're into heavy wizardry and pain.

running (bool): True if we're still running; False if we've reached the end of the part we're looking at.

is_expanding (bool): whether this Expander is currently expanding tokens. If the runlevel is below EXPANDING, we are never expanding. If it's EXPANDING or higher, then we are expanding if and only if we are not forbidden to expand by a conditional. For example, even if level was EXPANDING, we wouldn't be expanding straight after \iffalse.
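
The rule for is_expanding can be written out as a small function. This is a sketch under assumptions: the RunLevel enum below is a local stand-in whose member order and values are guessed for illustration, and forbidden_by_conditional is a hypothetical flag standing for "a conditional such as \iffalse currently forbids expansion".

```python
from enum import IntEnum


class RunLevel(IntEnum):
    # Assumed ordering and values, for illustration only.
    DEEP = 1
    READING = 2
    EXPANDING = 3
    EXECUTING = 4
    QUERYING = 5


def is_expanding(level, forbidden_by_conditional):
    """Below EXPANDING we never expand; at EXPANDING or above we
    expand unless a conditional currently forbids it."""
    if level < RunLevel.EXPANDING:
        return False
    return not forbidden_by_conditional
```

For instance, is_expanding(RunLevel.EXPANDING, True) models the situation straight after \iffalse and gives False.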

Source code in yex/parse/parser.py
def __init__(self,
             source,
             bounded = Bounding.NO,
             level = RunLevel.EXECUTING,
             on_eof = OnEof.NONE,
             no_outer = False,
             ):

    self.bounded = Bounding.normalise(bounded)
    self.on_eof  = OnEof.normalise(on_eof)
    self.level   = RunLevel.normalise(level)
    self.running = True

    if self.bounded in (Bounding.SINGLE, Bounding.BALANCED) and self.on_eof!=OnEof.EXHAUST:
        raise ValueError(
                'if bounded is "single" or "balanced", on_eof must be "exhaust"')

    self.no_outer       = no_outer

    self._bounded_limit = None
    self._delegate      = None

    if not hasattr(source, 'doc'):
        raise TypeError(
                "source must be something which can supply a Document, "
                f"such as a Tokeniser. You gave {source}, "
                f"which is a {type(source)}.\n\n"
                "You might like to look into using doc.open()."
                )
    self.source = source

    # For convenience, we allow direct access to some of
    # Tokeniser's methods.
    for name in [
            'eat_optional_char',
            'optional_string',
            'error_position',
            'exhaust_at_eol',
            ]:
        setattr(self, name, getattr(self.source, name))

    position_logger.source = self.source.source

    logger.debug("%s: ready; called from %s",
            self,
            yex.util.show_caller,
            )
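
The consistency check at the top of the constructor can be tried in isolation. The function below is a hypothetical mirror of that one rule, using plain strings in place of the Bounding and OnEof enums; it is a sketch, not how yex exposes the check.

```python
def check_bounding(bounded, on_eof):
    """Raise ValueError for the combinations the constructor rejects.

    Plain strings stand in for the Bounding and OnEof values.
    """
    if bounded in ("single", "balanced") and on_eof != "exhaust":
        raise ValueError(
            'if bounded is "single" or "balanced", on_eof must be "exhaust"')


# Accepted combinations return silently.
check_bounding("no", "none")
check_bounding("balanced", "exhaust")

# A rejected combination raises.
try:
    check_bounding("single", "none")
    rejected = False
except ValueError:
    rejected = True
```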

SPIN_LIMIT = 1000 class-attribute instance-attribute

Maximum number of times we can allow a parser to return None before we give up on it.

another(subclass=None, preserve_step_bounding=False, **kwargs)

Returns a parser like this one, with given changes to its behaviour.

The result will be a parser on the same Tokeniser. If there are no changes requested, or if the changes requested make no difference, the result will be this same Parser; otherwise it will be a new Parser.

Any setting specified in kwargs will be honoured, with the exception of bounded -- see below about that. All other settings will be copied from this Parser.

How bounded works:

  • If bounded is specified in kwargs, the new parser will have the specified value.
  • Otherwise, if preserve_step_bounding is True, and self.bounded=="step", the new parser will also have bounded="step".
  • Otherwise, the new parser will always have bounded="no".
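
The three rules for bounded can be sketched as a standalone function. resolve_bounded is a hypothetical helper that works on plain strings; it only restates the decision order described above and is not part of yex.

```python
def resolve_bounded(current, preserve_step_bounding=False, **kwargs):
    """Return the bounded value a new parser would receive.

    `current` plays the role of self.bounded on the existing parser.
    """
    if "bounded" in kwargs:
        # An explicit request always wins.
        return kwargs["bounded"]
    if preserve_step_bounding and current == "step":
        # Step bounding is carried over only when asked for.
        return "step"
    # Everything else falls back to unbounded.
    return "no"
```

Note that a parser with bounded="balanced" or "single" never passes its bounding on implicitly: unless bounded is given in kwargs, the result is "no".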

Consider

This might be better suited to a factory method, "from_another", to produce an instance of the class it's called on.

Source code in yex/parse/parser.py
def another(self,
            subclass = None,
            preserve_step_bounding = False,
            **kwargs: Unpack[ParserArgs],
            ) -> Self:
    """
    Returns a parser like this one, with given changes to its behaviour.

    The result will be a parser on the same Tokeniser.
    If there are no changes requested, or if the changes requested
    make no difference, the result will be this same Parser;
    otherwise it will be a new Parser.

    Any setting specified in `kwargs` will be honoured,
    with the exception of `bounded` -- see below about that.
    All other settings will be copied from this Parser.

    How `bounded` works:
        - If `bounded` is specified in kwargs, the new parser
          will have the specified value.
        - Otherwise, if preserve_step_bounding is True, and
          `self.bounded=="step"`, the new parser will also
          have `bounded="step"`.
        - Otherwise, the new parser will always have `bounded="no"`.

    Consider:
        This might be better suited to a factory method, "from_another",
        to produce an instance of the class it's called on.
    """

    our_params = {
            'source': self.source,
            'bounded': self.bounded,
            'level': self.level,
            'on_eof': self.on_eof,
            'no_outer': self.no_outer,
            }
    new_params = our_params | kwargs
    if 'bounded' not in kwargs:
        if self.bounded==Bounding.STEP and preserve_step_bounding:
            pass
        else:
            new_params['bounded'] = Bounding.NO

    if subclass is None:
        subclass = self.__class__

    if not isinstance(new_params['source'], yex.parse.Tokeniser):
        new_params['source'] = yex.parse.Tokeniser(
                doc = self.doc,
                source = yex.parse.Source.from_value(
                    v=new_params['source'],
                    ),
                )

    if our_params==new_params and subclass==self.__class__:
        result = self
    else:
        logger.debug(
                ("%s: spawning a parser with changes: %s; "
                "called from %s"),
                self,
                kwargs,
                yex.util.show_caller,
                )
        if subclass!=self.__class__:
            logger.debug('  -- parent is %s, but child is %s',
            self.__class__, subclass)
        result = subclass(**new_params)

    return result

eat_optional_spaces(level=RunLevel.DEEP)

Eats zero or more space tokens.

This is like Tokeniser.eat_optional_spaces(), except that it can also execute controls and active characters, then continue to consider the result.

Returns a list of the Tokens consumed.

Parameters:

level (yex.parse.parser.RunLevel): the runlevel to run at. Default: yex.parse.parser.RunLevel.DEEP.

Source code in yex/parse/parser.py
def eat_optional_spaces(self,
                        level:RunLevel=RunLevel.DEEP,
                        ) -> List[Token]:
    """
    Eats zero or more space tokens.

    This is like Tokeniser.eat_optional_spaces(), except that it can
    also execute controls and active characters, then continue to
    consider the result.

    Returns a list of the Tokens consumed.

    Args:
        level: the runlevel to run at.
    """
    level = RunLevel.normalise(level)

    if level==RunLevel.DEEP:
        return self.source.eat_optional_spaces()

    result = []
    while True:
        result.extend(self.source.eat_optional_spaces())

        t = self.next(level=RunLevel.QUERYING, on_eof=OnEof.NONE)

        if t is None:
            return result
        elif isinstance(t, Token) and t.ch in string.whitespace:
            result.append(t.ch)
        elif isinstance(t, str) and t in string.whitespace:
            result.append(t)
        else:
            self.push(t)
            return result

end()

Marks this Parser as finished.

Source code in yex/parse/parser.py
def end(self) -> None:
    """
    Marks this Parser as finished.
    """
    logger.debug(r'%s: we have reached an \end', self)
    self.source.pushback.clear()
    self.running = False

get_digit_sequence(accept_ch, accept_decimal_point)

Reads and returns a series of symbols.

The result is taken from the next zero or more items. They are accepted if:

  • they are LETTER or OTHER tokens, and their "ch" property is in accept_ch; or
  • they are single-character strings, and they are in accept_ch.

This exists because if we read in the indexes of arrays using any other method, we risk \catcodeNN= affecting the way the symbol after the value which is assigned to \catcodeNN is tokenised. See test_tokeniser_whitespace_after_control_words().

Tokens are represented in the result by their ch property. Strings are used directly.
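
The acceptance rule, including the "decimal point only once" behaviour, can be sketched on plain characters. toy_digit_sequence is a hypothetical simplification: it reads from a list of single-character strings, ignores token types, and always leaves the first rejected item in place, whereas the real method also handles tokens and treats a terminating space specially.

```python
def toy_digit_sequence(items, accept_ch, accept_decimal_point):
    """Consume accepted characters from the front of `items`.

    `items` is a list of single-character strings and is modified
    in place. Returns the accepted characters as one string.
    """
    decimal_points = ".,"
    allowed = accept_ch
    if accept_decimal_point:
        allowed += decimal_points

    result = ""
    while items and items[0] in allowed:
        ch = items.pop(0)
        result += ch
        if ch in decimal_points:
            # A decimal point may be matched only once, so fall
            # back to the original set of accepted characters.
            allowed = accept_ch
    return result


remaining = list("3.14.5")
number = toy_digit_sequence(remaining, "0123456789", True)
```

Here the second '.' ends the sequence, so it stays in remaining for whatever reads next.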

Parameters:

accept_ch (str): the characters we can accept. Required.

accept_decimal_point (bool): if True, act as though '.,' were included in accept_ch, except that they can only be matched once. Required.

Returns:

str: the accepted characters, joined into a single string.

Source code in yex/parse/parser.py
def get_digit_sequence(self,
                       accept_ch:str,
                       accept_decimal_point:bool,
                       ) -> str:
    r"""
    Reads and returns a series of symbols.

    The result is taken from the next zero or more items.
    They are accepted if:

    - they are LETTER or OTHER tokens, and their "ch" property is
            in `accept_ch`; or
    - they are single-character strings, and they are in `accept_ch`.

    This exists because if we read in the indexes of arrays using
    any other method, we risk `\catcodeNN=` affecting the way the symbol
    *after* the value which is assigned to `\catcode`NN is tokenised.
    See `test_tokeniser_whitespace_after_control_words()`.

    Tokens are represented in the result by their `ch` property.
    Strings are used directly.

    Args:
        accept_ch: the characters we can accept
        accept_decimal_point: if `True`, act as though `'.,'` were
            included in accept_ch, except that they can only
            be matched once.

    Returns:
        the accepted characters, joined into a single string.
    """

    DECIMAL_POINTS = '.,'
    original_accept_ch = accept_ch

    if accept_decimal_point:
        accept_ch += DECIMAL_POINTS

    logger.debug("%s: get_digit_sequence begins; accepting %s",
            self, accept_ch)

    result = ''
    exp = self.another(on_eof=OnEof.NONE, level=RunLevel.EXPANDING)

    while True:
        item = exp.next()

        if isinstance(item, (Letter, Other)) and item.ch in accept_ch:
            addendum = item.ch
            logger.debug("%s:   -- accepted token, so: %s", self, repr(result))
        elif (isinstance(item, str) and
                len(item)==1 and
                item in accept_ch):
            addendum = item
            logger.debug("%s:   -- accepted char, so: %s", self, repr(result))
        else:
            if isinstance(item, Space):
                logger.debug("%s:   -- ending on %s, so result is: %s",
                        self, repr(item), repr(result))
            else:
                logger.debug((
                    "%s:   -- ending on %s (will push), "
                    "so result is: %s"),
                             self, repr(item), repr(result))
                self.push(item)

            return result

        result += addendum
        if addendum in DECIMAL_POINTS:
            accept_ch = original_accept_ch

next(**kwargs)

Returns the next item.

This is just like next() on an iterator, but with more options. (And indeed, our iterators are implemented in terms of this method.)

Args are as for another().

Raises:

UnexpectedEOFError: on unexpected end of file, or if no_outer finds the appropriate problem.

Source code in yex/parse/parser.py
def next(self,
        **kwargs,
        ) -> Any:
    r"""
    Returns the next item.

    This is just like next() on an iterator, but with more options.
    (And indeed, our iterators are implemented in terms of this method.)

    Args are as for another().

    Raises:
        UnexpectedEOFError: on unexpected end of file, or if
            `no_outer` finds the appropriate problem.
    """

    source = self._source_for_next.another(
            preserve_step_bounding = True,
            **kwargs)

    if source.level==RunLevel.DEEP:
        result = source._next_at_deep()
    elif source.level in [RunLevel.READING, RunLevel.EXPANDING]:
        result = source._next_at_reading_or_expanding()
    elif source.level in [RunLevel.EXECUTING, RunLevel.QUERYING]:
        result = source._next_at_executing_or_querying()
    else:
        assert False, f'unknown runlevel: {source.level}'

    assert (
            source.level<RunLevel.EXPANDING or
            not isinstance(result, yex.keyword.Array)), (
                    "next() was passed an Array; it should have "
                    "already been dereferenced to a Register."
                    )

    logger.debug("%s:     -- found %s",
            self, result)

    if self.bounded==Bounding.STEP:
        pass
    elif self.bounded!=Bounding.NO and self._bounded_limit is None:
        # This must be the first next() since we started.
        # Let's see whether we've been given a single item.

        if isinstance(result, BeginningGroup):
            # we need to read a balanced pair.
            self._bounded_limit = self.source.pushback.group_depth

            logger.debug(
                    "%s:        -- opens bounded expansion, read again",
                    self)
            result = self.next()
        elif self.bounded==Bounding.BALANCED:
            # First result wasn't a BeginningGroup,
            # but it should have been.
            raise yex.exception.NeededBalancedGroupError(
                    problem=result)
        else:
            # First result wasn't a BeginningGroup,
            # so we handle it and then stop.
            logger.debug("%s:  -- the only symbol in a bounded expansion",
                    self)
            self.running = False

    if self._bounded_limit is not None:
        if self.source.pushback.group_depth < self._bounded_limit:
            logger.debug(
                    ('%s: end of bounded expansion: group depth is %s, '
                    'which is below the starting limit, %s'
                        ),
                    self, self.source.pushback.group_depth,
                    self._bounded_limit,
                    )
            self.running = False
            result = None

    if result is None:

        if self._delegate is not None:
            logger.debug(
                    ('%s: delegate %s is all done; '
                    'carrying on with our own stuff'),
                    self, self._delegate,
                    )
            self._delegate = None
            return self.next(**kwargs)

        elif source.bounded==Bounding.STEP:
            return None

        elif source.on_eof==OnEof.RAISE:
            logger.debug("%s: unexpected EOF", self)
            raise yex.exception.UnexpectedEOFError()

        elif source.on_eof==OnEof.EXHAUST:
            raise StopIteration

    return result

peek()

Returns the item which is next due to be returned by next(). If this would go past the end of the file, we return None, whatever the setting of on_eof.
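
The idea behind peek(), reading the next item and immediately pushing it back, can be shown with a toy model. ToyPeekable is a hypothetical class and is not how yex stores its pushback; it only demonstrates that peeking does not consume anything and that peeking past the end gives None.

```python
class ToyPeekable:
    """Hypothetical model of next(), push() and peek()."""

    def __init__(self, items):
        self._pending = list(items)

    def next(self):
        # None signals that there is nothing left to read.
        return self._pending.pop(0) if self._pending else None

    def push(self, item):
        # Pushing None is ignored.
        if item is not None:
            self._pending.insert(0, item)

    def peek(self):
        # Read the next item, then put it straight back.
        item = self.next()
        self.push(item)
        return item


p = ToyPeekable("ab")
peeked = p.peek()
taken = [p.next(), p.next()]
at_end = p.peek()
```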

Source code in yex/parse/parser.py
def peek(self) -> Any:
    """
    Returns the item which is next due to be returned by `next()`.
    If this would go past the end of the file, we return `None`,
    whatever the setting of `on_eof`.
    """
    result = self.next(
            on_eof = OnEof.NONE,
            )
    self.source.pushback.push(result)
    return result

push(thing, clean_char_tokens=False, is_result=False)

Pushes back a token, a character, or anything else.

This is mostly just a wrapper for the push method in Tokeniser. But we do check for "beginning group" and "ending group" tokens, and adjust our fields accordingly.

All Parsers share pushback, and in general it's fine to push things through a parser when you received them from a different Parser. The only exception to this is when you're using balanced expansion: because we have to keep a count of balanced braces, you should remember to push Tokens back through the Parser that gave you them.

If you push bare characters, they will be converted by the source as it thinks appropriate.

Parameters:

thing (yex.parse.tokeniser.Any): whatever you're pushing back. Pushing None will be ignored. If this is a string, or a list specifically, it will be split into its members and pushed in reverse order. For example, pushing 'cat' is the same as pushing 't', then pushing 'a', then pushing 'c'. Required.

clean_char_tokens (bool): if True, all bare characters will be converted to the Tokens for those characters. (For example, 'T', 'e', 'X' -> ('T' 12) ('e' 12) ('X' 12).) The rules about how this is done are on p213 of the TeΧbook. If False, the characters will remain bare characters and the source will tokenise them as usual when it gets to them. Default: False.

is_result (bool): If you're a control, and your job involves reading some data, then pushing a result, set this to True when you push the result. This will allow \expandafter to work correctly. If you're implemented through a decorator, and your result is pushed via returning it, you don't have to worry: the decorator will set is_result=True when it pushes your return values. Default: False.

Raises:

EOFError: if this parser is exhausted.

GoneBeforeTheBeginningError: if we're bounded, and you push more BEGINNING_GROUP tokens than you've already received.
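
Why strings and lists are pushed in reverse order is easiest to see with a last-in, first-out stack. ToyPushback is a hypothetical model, not the pushback class yex uses; it assumes only that the most recently pushed item is the next one read.

```python
class ToyPushback:
    """Hypothetical last-in, first-out pushback store."""

    def __init__(self):
        self._stack = []

    def push(self, thing):
        if thing is None:
            return  # pushing None is ignored
        if not isinstance(thing, (str, list)):
            thing = [thing]
        # Push the members in reverse, so that the first member
        # ends up on top and is read back first.
        for member in reversed(thing):
            self._stack.append(member)

    def pop(self):
        return self._stack.pop() if self._stack else None


pushback = ToyPushback()
pushback.push("cat")
read_back = [pushback.pop() for _ in range(3)]
```

Pushing 'cat' therefore reads back as 'c', 'a', 't', the same order in which the characters would have appeared in the input.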

Source code in yex/parse/parser.py
def push(self,
         thing: Any,
         clean_char_tokens: bool = False,
         is_result:bool = False,
        ):
    r"""
    Pushes back a token, a character, or anything else.

    This is mostly just a wrapper for the `push` method in
    `Tokeniser`. But we do check for "beginning group"
    and "ending group" tokens, and adjust our fields accordingly.

    All Parsers share pushback, and in general it's fine to push
    things through a parser when you received them from a
    different Parser. The only exception to this is when
    you're using balanced expansion: because we have to keep a count of
    balanced braces, you should remember to push Tokens back
    through the Parser that gave you them.

    If you push bare characters, they will be converted by the
    source as it thinks appropriate.

    Args:
        thing: whatever you're pushing back.
            Pushing None will be ignored.
            If this is a string, or a list specifically, it
            will be split into its members and pushed in reverse order.
            For example, pushing 'cat' is the same as pushing 't',
            then pushing 'a', then pushing 'c'.

        clean_char_tokens: if True, all bare characters
            will be converted to the Tokens for those characters.
            (For example, 'T', 'e', 'X' -> ('T' 12) ('e' 12) ('X' 12).)
            The rules about how this is done are on p213 of the TeXbook.
            If False, the characters will remain bare characters
            and the source will tokenise them as usual when it
            gets to them.

        is_result: If you're a control, and your job involves
            reading some data, then pushing a result, set this to True
            when you push the result. This will allow \expandafter
            to work correctly.

            If you're implemented through a decorator, and your result
            is pushed via returning it, you don't have to worry:
            the decorator will set is_result=True when it pushes your
            return values.

    Raises:
        EOFError: if this parser is exhausted.
        GoneBeforeTheBeginningError: if we're bounded, and you push more
            BEGINNING_GROUP tokens than you've already received.
    """


    if not self.running:
        raise EOFError()

    if not isinstance(thing, (str, list)):
        thing = [thing]

    if clean_char_tokens:

        def _clean(c):
            if isinstance(c, str):
                return Token.get(
                        ch=c,
                        location=self.source.location,
                        )
            else:
                return c

        thing = [_clean(c) for c in thing]

    self.source.pushback.push(thing)

    if self._bounded_limit is not None:
        if self.source.pushback.group_depth < self._bounded_limit:
            logger.debug(
                    '%s: group_depth is %d, but bounded_limit is %d',
                    self, self.source.pushback.group_depth,
                    self._bounded_limit)
            raise yex.exception.GoneBeforeTheBeginningError()