Ubuntu Manpage: Marpa::R2::Scanless::R - Scanless interface recognizers

Provided by: libmarpa-r2-perl_2.086000~dfsg-10_amd64

Name

       Marpa::R2::Scanless::R - Scanless interface recognizers

Synopsis

           my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
           my $self = bless { grammar => $grammar }, 'My_Actions';
           $self->{recce} = $recce;

           if ( not defined eval { $recce->read($p_input_string); 1 }
               )
           {
               ## Add last expression found, and rethrow
               my $eval_error = $EVAL_ERROR;
               chomp $eval_error;
               die $self->show_last_expression(), "\n", $eval_error, "\n";
           } ## end if ( not defined eval { $event_count = $recce->read...})

           my $value_ref = $recce->value( $self );
           if ( not defined $value_ref ) {
               die $self->show_last_expression(), "\n",
                   "No parse was found, after reading the entire input\n";
           }

           package My_Actions;
           sub do_parens    { shift; return $_[1] }
           sub do_add       { shift; return $_[0] + $_[2] }
           sub do_subtract  { shift; return $_[0] - $_[2] }
           sub do_multiply  { shift; return $_[0] * $_[2] }
           sub do_divide    { shift; return $_[0] / $_[2] }
           sub do_pow       { shift; return $_[0]**$_[2] }
           sub do_first_arg { shift; return shift; }
           sub do_script    { shift; return join q{ }, @_ }

About this document

       This page is the reference document for the recognizer objects of Marpa's SLIF (Scanless interface).

Internal and external scanning

       The Scanless interface is so-called because it does not require the application to supply a scanner
       (lexer).  The SLIF contains its own lexer, one whose use is integrated into its syntax.  In this
       document, use of the SLIF's internal scanner is called internal scanning.

       The SLIF allows applications that find it useful to do their own scanning.  When an application bypasses
       the SLIF's internal scanner and does its own scanning, this document calls it external scanning.  An
       application can use external scanning to supplement internal scanning, or to replace the SLIF's internal
       scanner entirely.

Locations

   Input stream locations
       An input stream location is the offset of a codepoint in the input stream.  When the input stream is
       being treated as a string, input stream location corresponds to Perl's pos() location.  In this document,
       the word "location" refers to location in the input stream unless otherwise specified.

   Negative locations
       Several methods allow locations and lengths to be specified as negative numbers.  A negative location is
       a location counted from the end, so that -1 means the location before the last character of the string,
       -2 the location before the second to last character of the string, etc.  A negative length indicates a
       distance to a location counted from the end.  A length of -1 indicates the distance to the end of the
       string, -2 indicates the distance to the location just before the last character of the string, etc.

       For example, suppose that we are dealing with input stream locations.  The span ("0, -1") is the entire
       input stream.  The span ("-1, -1") is the last character of the input stream.  The span ("-2, -1") is
       the last two characters of the input stream.  The span ("-2, 1") is the second to last character of the
       input stream.
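
       The rules for negative locations and lengths can be sketched in plain Perl.  The helper below,
       resolve_span(), is hypothetical and is not part of Marpa::R2; it only models the arithmetic described
       above, turning a span into an absolute start and end location.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa::R2: resolve a ($start, $length)
# span that may contain negative values into absolute, zero-based
# ($start, $end) input stream locations.
sub resolve_span {
    my ( $input_length, $start, $length ) = @_;

    # A negative location counts back from the end of the input.
    my $abs_start = $start < 0 ? $input_length + $start : $start;

    # A negative length names an end location counted from the end:
    # -1 is the end of the string, -2 is just before the last character.
    my $abs_end =
          $length < 0
        ? $input_length + $length + 1
        : $abs_start + $length;

    return ( $abs_start, $abs_end );
}

my $sample = 'abcdef';
for my $span ( [ 0, -1 ], [ -1, -1 ], [ -2, -1 ], [ -2, 1 ] ) {
    my ( $from, $to ) = resolve_span( length($sample), @{$span} );
    my $text = substr $sample, $from, $to - $from;
    print "($span->[0], $span->[1]) covers '$text'\n";
}
```

       Run against the six-character string above, the four spans from the example cover 'abcdef', 'f', 'ef'
       and 'e' respectively.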

   G1 locations
       In addition to input stream location, the SLIF also tracks G1 location.  G1 location starts at zero, and
       increases by exactly one as each lexeme is read.  G1 location is usually not the same as input stream
       location.  There is also a concept of G1 length, which is simply length calculated in terms of G1
       locations.

       G1 location can be ignored most of the time, but it does become relevant to a small degree when dealing
       with ambiguous terminals, and to a greater degree when tracing the G1 grammar.  (For those more familiar
       with Marpa's internals, the G1 location is the G1 Earley set index.)

   Current location
       The SLIF tracks the current location in the input stream, more usually simply called the current
       location.  Locations are zero-based, so that location 0 is the start of the input stream.  A location is
       said to point to the character after it, if there is such a character.  For example, location 0 points to
       the first character of the input stream, unless the stream is of zero length, in which case there is no
       first character.

       A current location equal to the length of the input stream indicates EOS (end of stream).  In a zero
       length stream, location 0 is EOS.  The EOS location never points to a character.

       In the SLIF, when the current input stream location moves, it does not necessarily advance -- it can skip
       forward, or can be positioned to an earlier location.  The application can skip sections of the input
       stream.  The application is also free to revisit spans of the input stream as often as it wants.

       Here are the guarantees:

       •   Initially, the current location is 0.

       •   The current location will never be negative.

       •   The current location will never be greater than EOS.

   Literals and G1 spans
       Often it is useful to find the literal substring of the input which corresponds to a span of G1
       locations.  If an application reads the input monotonically within the G1 span this presents no
       complications.

       "Monotonically" here means that, for the G1 span "$g1_start, $g1_length", the application reads the G1
       locations in sequence and one-by-one, starting at $g1_start and ending at "$g1_start+$g1_length".  This
       is the usual case.

       Reading the input monotonically is the default, and by far the most common case.  But Marpa applications
       are free to skip forward in the stream, to jump backward, to reread the same input multiple times, etc.
       It is entirely possible that the final input stream location of a G1 span will be before the start of
       the G1 span.

       In precise terms, the substring returned for a G1 span "$g1_start, $g1_length" is determined as follows:
       The string will start at the first input stream location in the span for G1 location "$g1_start+1".  The
       end of the string will be at the last input stream location in the span for G1 location
       "$g1_start+$g1_length".  When an application moves backward in the input, the end of the string, as
       calculated above, may be before the start of the string.  When the end of a string is before its start,
       the substring returned will be the zero-length string.

       Applications which do not read monotonically, but which also want to associate spans of G1 locations with
       the input stream, may need to reassemble the input based on their own ideas.  The "literal()" method can
       assist in this process.
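
       The substring rule can be modelled in plain Perl.  This is a sketch under stated assumptions, not
       Marpa's implementation: the function literal_for_g1_span() and the table of spans are hypothetical,
       with the table standing in for the input stream span that the recognizer associates with each G1
       location.

```perl
use strict;
use warnings;

# Hypothetical model, not Marpa::R2 code: $spans->[$g1] holds the
# [ $start, $length ] input stream span for G1 location $g1.
# Index 0 is the special (0, 0) span of G1 location 0.
sub literal_for_g1_span {
    my ( $input, $spans, $g1_start, $g1_length ) = @_;
    return q{} if $g1_length <= 0;

    # Start: first input location of the span for G1 location $g1_start+1.
    my $first = $spans->[ $g1_start + 1 ][0];

    # End: last input location of the span for $g1_start+$g1_length.
    my ( $last_start, $last_length ) =
        @{ $spans->[ $g1_start + $g1_length ] };
    my $end = $last_start + $last_length;

    # An end before the start yields the zero-length string.
    return q{} if $end < $first;
    return substr $input, $first, $end - $first;
}

my $text = 'a + b';

# Monotonic read: lexemes 'a', '+', 'b' end at G1 locations 1, 2, 3.
my @monotonic = ( [ 0, 0 ], [ 0, 1 ], [ 2, 1 ], [ 4, 1 ] );
print literal_for_g1_span( $text, \@monotonic, 0, 3 ), "\n";
print literal_for_g1_span( $text, \@monotonic, 1, 1 ), "\n";

# Non-monotonic read: the second lexeme came from an earlier position,
# so the computed end falls before the computed start.
my @jumped = ( [ 0, 0 ], [ 4, 1 ], [ 0, 1 ] );
my $jumped_literal = literal_for_g1_span( $text, \@jumped, 0, 2 );
print 'length after jumping backward: ', length($jumped_literal), "\n";
```

       In the monotonic case the literal is the text that was actually read.  In the backward-jumping case the
       model returns the zero-length string, which is why such applications may need to reassemble the input
       themselves.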

How internal scanning works

       The SLIF always starts scanning using the read() method.  Pedantically, this means scanning always begins
       with a phase of internal scanning.  But that first phase may be of zero length, and after that, internal
       scanning does not have to be resumed.

       Internal scanning can be resumed with the resume() method.  Both the read() and resume() methods require
       the application to specify a span in the input stream.  The read() method sets the input stream, and that
       input stream is the one used by all resume() method calls for that recognizer.

       In what follows, the term "internal scanning method" refers to either the read() or the resume() method.
       After an internal scanning method, the current location will indicate how far in the input stream the
       internal scanning method actually read.  If the internal scanning method paused before EOS, the current
       location will be the one at which it paused.  If the internal scanning method pauses at EOS, the current
       location will be EOS.  The return value of the read() and the resume() method is the current location.

   EOS
       The location of EOS depends on the $start and $length arguments to the last internal scanning method, and
       on the length of the input string.

       •   If the $length argument of the last internal scanning method was non-negative, EOS will be at
           "$start+$length".

       •   If the $length argument was negative, EOS will be at "$length + 1 + length $input_string".

       •   The default length for the internal scanning methods is always -1, so that the default EOS is always
           at "length $input_string", the end of the input string.
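
       The three EOS rules reduce to one short calculation.  The function eos_location() below is a
       hypothetical illustration written for this page, not a Marpa::R2 method, and it assumes that $start has
       already been resolved to a non-negative location.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa::R2: where EOS falls for the
# $start and $length given to the last internal scanning method.
sub eos_location {
    my ( $input_string, $start, $length ) = @_;
    $length //= -1;    # the documented default length

    return $length >= 0
        ? $start + $length
        : $length + 1 + length($input_string);
}

my $input_string = 'hello world';
print eos_location( $input_string, 0 ),     "\n";    # default length
print eos_location( $input_string, 6, 3 ),  "\n";    # explicit length
print eos_location( $input_string, 6, -2 ), "\n";    # negative length
```

       Note that with a negative length the start location plays no part in the result: EOS is fixed relative
       to the end of the input string.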

   Pauses in internal scanning
       When a read() or resume() method pauses, one or more of the following occurred.

       •   A named event

           One or more named events may have triggered.  Named events are created by named event statements.
           They can also be created by lexeme pseudo-rules.  Named events may be queried using the events()
           method.

       •   An unnamed lexeme pause event

           A lexeme pause that is not a named event may have triggered.  Lexeme pauses are created by lexeme
           pseudo-rules.  Applications can always name lexeme pause events, using the event adverb, and are
           strongly encouraged to do so.  If all lexeme pauses are named, the check for unnamed events can be
           omitted.  The presence or absence of an unnamed lexeme pause event may be checked for using the
           lexeme_pause() method.

       •   EOS

           EOS may have been reached.  This may be checked for by comparing the current location with the
           expected EOS.

The input stream

       For error messages and other purposes, even external lexemes are required to correspond to a span of the
       input stream.  An external scanner must set up a relationship to the input stream, even if that
       relationship is completely artificial.

       One way to do this is to put an artificial preamble in front of the input stream.  For example, the first
       7 characters of the input stream could be a preamble containing the characters "NO TEXT".  This preamble
       could be immediately followed by what is seen as the text from a more natural point of view.  In this
       case, the initial call to the read() method could take the form "$slr->read($input_string, 7)".  Lexemes
       corresponding to the artificial preamble would be read using a method call similar to
       "$slr->lexeme_read($symbol_name, 0, 7, $value)".

Constructor

           my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

       The new() method is the constructor for SLIF recognizers.  The new() constructor accepts a hash of  named
       arguments.  The "grammar" named argument is required.  All other named arguments are optional.

       The following named arguments are allowed:

   end
       Most  users  will  want  to ignore this argument.  It is an advanced argument, mainly for use in testing.
       The "end" named argument specifies the parse end, as a G1 location.  The default is for the parse to  end
       where  the input did, so that the parse returned is of the entire input.  The "end" named argument is not
       allowed once a parse series has begun.

   grammar
       The "new" method is required to have a "grammar" named argument.  Its  value  must  be  an  SLIF  grammar
       object.

   max_parses
       If  non-zero, causes a fatal error when that number of parse results is exceeded.  "max_parses" is useful
       to limit CPU usage and output length when testing and debugging.  Stable and production applications  may
       prefer to count the number of parses, and take a less Draconian response when the count is exceeded.

       The  value  must  be  an  integer.   If it is zero, there will be no limit on the number of parse results
       returned.  The default is for there to be no limit.

   ranking_method
       The value must be a string: one of "none", "rule", or "high_rule_only".  When the value is "none", Marpa
       returns the parse results in arbitrary order.  This is the default.  The "ranking_method" named argument
       is not allowed once evaluation has begun.

       The "rule" and "high_rule_only" ranking methods allow the user to control the order in which parse
       results are returned by the "value" method, and to exclude some parse results from the parse series.  For
       details, see the document on parse order.

   semantics_package
       Sets the semantics package for the recognizer.  The setting of this argument takes precedence over any
       package implied by the blessing of the per-parse arguments to the SLIF recognizer's value() method.  The
       semantics package is used when resolving action names to fully qualified Perl names.  For more details on
       the SLIF semantics, see the document on SLIF semantics.

   too_many_earley_items
       The "too_many_earley_items" argument is optional, and very few applications will need it.  If  specified,
       it  sets  the  Earley item warning threshold to a value other than its default.  If an Earley set becomes
       larger than the Earley item warning threshold, a recognizer event is generated, and a warning is  printed
       to the trace file handle.

       Marpa  parses from any BNF, and can handle grammars and inputs which produce very large Earley sets.  But
       parsing that involves very large Earley  sets  can  be  slow.   Large  Earley  sets  are  something  most
       applications can, and will wish to, avoid.

       By  default, Marpa calculates an Earley item warning threshold for the G1 recognizer based on the size of
       the G1 grammar, and for each L0 recognizer based on the size of the L0 grammar.  The  default  thresholds
       will  never  be less than 100.  The default is the result of considerable experience and almost all users
       will be happy with it.

       If the Earley item warning threshold is changed from its default, the change applies to both L0 and G1 --
       currently there is no way to set them separately.  If the Earley item warning threshold is set to  0,  no
       recognizer  event  is  generated,  and  warnings  about large Earley sets are turned off.  An Earley item
       threshold warning almost always indicates a serious issue, and turning these warnings off will rarely  be
       what an application wants.

   trace_terminals
       If  non-zero,  traces the lexemes -- those tokens passed from the L0 parser to the G1 parser.  This named
       argument is the best way to follow what the L0 parser is doing, and it is also very helpful  for  tracing
       the G1 parser.

   trace_values
       The  trace_values named argument is a numeric trace level.  If the numeric trace level is 1, Marpa prints
       tracing information as values are computed in the evaluation stack.  A  trace  level  of  0  turns  value
       tracing off, which is the default. Traces are written to the trace file handle.

   trace_file_handle
       The  value is a file handle.  Trace output and warning messages go to the trace file handle.  By default,
       the trace file handle is inherited from the grammar.

Basic mutators

   read()
           $recce->read($p_input_string);

           $recce->read( \$string, 0, 0 );

       Given a pointer to an input stream, read() parses it according to the grammar.  Only  a  single  call  to
       read() is allowed for a scanless recognizer.

       read()  recognizes  optional  second and third arguments.  The second argument is a location in the input
       stream at which internal scanning will start.  The third argument is the length of  the  section  of  the
       input  stream  to  be scanned before pausing.  The default start location is zero.  The default length is
       -1.  Negative locations and lengths have the standard interpretation, as described above.

       Start location and length can both be zero.  This pauses internal scanning immediately and can be used to
       hand complete control of scanning over to an external scanner.

       Completion named events can occur during the read() method.  When a named event occurs, the read() method
       pauses.  Named events can be queried using the Scanless recognizer's events() method.  The read()  method
       also pauses as specified with the Scanless DSL's pause adverb.

       On  failure,  throws  an  exception.   The  call is considered successful if it ended because a parse was
       found, or because internal scanning was paused.  On success, read() returns the  location  in  the  input
       stream at which internal scanning ended.  This value may be zero.

   series_restart()
           $slr->series_restart( { end => $i } );

       The  series_restart()  method  ends the current parse series, and starts another.  It allows, as optional
       arguments, hashes of named arguments for the SLIF recognizer.  These named arguments can be any of  those
       allowed by the set() method.

       series_restart()  resets  all  the  named  arguments to their defaults.  An application that wants a non-
       default named argument to have effect in each of its parse series must respecify it at the  beginning  of
       each  parse  series.  series_restart() is particularly useful for the "end" and "semantics_package" named
       arguments, which cannot be changed once  a  parse  series  is  underway.   To  change  their  values,  an
       application must start a new parse series.

   set()
           $slr->set( { max_parses => 42 } );

       This method allows the named arguments to be changed after an SLIF recognizer is created.  Currently,
       the arguments that may be changed are "end", "max_parses", "semantics_package" and "trace_file_handle".

   value()
           my $value_ref = $recce->value( $self );

       The "value" method call evaluates the next parse tree in the parse series, and returns a reference to the
       parse result for that parse tree.  If there are no more parse trees, the "value" method returns "undef".

       Because Marpa parses ambiguous grammars, every parse is a series of  zero  or  more  parse  trees.   This
       series  of zero or more parse trees is called a parse series.  There are zero parse trees if there was no
       valid parse of the input according to the grammar.

       The value() method allows one, optional argument.  This argument can be a Perl scalar of any kind, but
       the most useful possibilities are references (blessed or unblessed) to hashes or arrays.  If provided, the
       argument  of the value() method explicitly specifies the per-parse argument for the parse tree.  The per-
       parse argument will be the first argument of all Perl semantics closures, and can be used to  share  data
       within  the  tree,  when  that  data  does  not  conveniently  fit  into the bottom-up flow of parse tree
       evaluation.  Symbol tables are one example of the kind of data which parses often require, but  which  it
       is not convenient to accumulate bottom-up.

       If  the  "semantics_package"  named argument of the SLIF recognizer was not specified, Marpa will use the
       package into which the per-parse argument was blessed as the semantics package -- the package in which to
       look for the parse's Perl semantic closures.  In this case, Marpa will regard the per-parse arguments  of
       all  calls  in the same parse series as the source of the semantics package, and it will require that the
       calls be consistent -- each call must have a per-parse argument, and that per-parse argument must be
       blessed into the semantics package.
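
       The role of the per-parse argument can be illustrated without running a parse.  In the sketch below the
       package My_Symbols and its closures are hypothetical, and the closures are called by hand as a stand-in
       for the calls Marpa would make during evaluation; the point is only that the per-parse argument arrives
       first and can carry shared state such as a symbol table.

```perl
use strict;
use warnings;

package My_Symbols;    # hypothetical semantics package

# Each closure receives the per-parse argument first, followed by the
# values of the rule's right-hand side symbols.
sub do_declare {
    my ( $per_parse, $name, $value ) = @_;
    $per_parse->{symbols}{$name} = $value;    # shared across the tree
    return $value;
}

sub do_lookup {
    my ( $per_parse, $name ) = @_;
    return $per_parse->{symbols}{$name};
}

package main;

# In real use this object would be passed as $recce->value($per_parse).
my $per_parse = bless { symbols => {} }, 'My_Symbols';

# Hand-written stand-in for the calls made during tree evaluation.
My_Symbols::do_declare( $per_parse, 'x', 42 );
print My_Symbols::do_lookup( $per_parse, 'x' ), "\n";

# The blessing names the package that would be searched for closures
# when no "semantics_package" named argument was given.
print ref($per_parse), "\n";
```

       Because the symbol table lives in the per-parse argument rather than in the values passed up the tree, a
       lookup can see a declaration made in a different branch.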

Mutators for external scanning

   activate()
               $slr->activate($_, 0) for @events;

       The  activate()  method  allows  the  recognizer to deactivate and reactivate named events.  Named events
       allow the recognizer to stop for external scanning at conveniently defined locations.  Named  events  can
       be defined for the prediction and completion of non-zero-length symbols, and nulled events can be defined
       to trigger when zero-length symbols are recognized.

       The activate() method takes two arguments.  The first is the name of an event, and the second (optional)
       argument is 0 or 1.  If the argument is 0, the event is deactivated.  If the argument is 1, the event is
       reactivated.  An argument of 1 is the default, but since an SLIF recognizer always starts with all
       defined events activated, 0 will probably be more common as the second argument to activate().

       Location 0 events are triggered in the SLIF recognizer's constructor, before the activate() method can be
       called.  This means that currently there is no way to deactivate location zero events.

       The overhead imposed by events can be reduced by using the activate() method.  But making many calls to
       the activate() method purely for efficiency purposes will be counter-productive.  Also, deactivated
       events still impose some overhead, so if an event is never used it should be commented out in the SLIF
       DSL.

   lexeme_alternative()
                   if ( not defined $recce->lexeme_alternative($token_name) ) {
                       die
                           qq{Parser rejected token "$long_name" at position $start_of_lexeme, before "},
                           substr( $string, $start_of_lexeme, 40 ), q{"};
                   }

       The  lexeme_alternative()  method allows an external scanner to read ambiguous tokens.  Most applications
       will prefer the simpler lexeme_read().

       lexeme_alternative() takes one or two arguments.  The first argument, which is required, is the name of a
       symbol to be read at the current location.  The second argument, which is optional, is the value  of  the
       symbol.  The value argument is interpreted as described for lexeme_read().

       Any number of tokens may be read using lexeme_alternative() without advancing the current location.  This
       allows  an  application  to  use ambiguous tokens.  To complete reading at a G1 location, and advance the
       current G1 location to the next G1 location, use the lexeme_complete() method.

       On success, returns a non-negative number.  Returns "undef" if the  token  was  rejected.   Failures  are
       thrown as exceptions.

   lexeme_complete()
                   next TOKEN
                       if $recce->lexeme_complete( $start_of_lexeme,
                               ( length $lexeme ) );

       The lexeme_complete() method allows an external scanner to read ambiguous tokens.  Most applications will
       prefer the simpler lexeme_read().

       The lexeme_complete() method requires two arguments, an input stream start location and a length.  These
       are interpreted as described for the corresponding second and third arguments to lexeme_read().  The
       lexeme_complete() method completes the reading of alternative tokens at the current G1 location, and
       advances the current G1 location by one.  The current location in the input stream is moved to the
       location after the new lexeme, as indicated by the arguments.

       Completion named events can occur during the lexeme_complete() method.  Named events can be queried using
       the Scanless recognizer's events() method.

       Return value: On success, lexeme_complete() returns the new current location.  This will never be
       location zero, because a successful call of lexeme_complete() always advances the location.  On unthrown
       failure, lexeme_complete() returns 0.

   lexeme_read()
           $re->lexeme_read( 'lstring', $start, $length, $value ) // die;

       The lexeme_read() method reads a single, unambiguous, lexeme.  It takes four arguments, only the first of
       which  is  required.   The  first  argument  is the lexeme's symbol name.  The second and third arguments
       specify the span in the input stream to be associated with the lexeme.  The last argument  indicates  its
       value.

       The  second  and  third  arguments are, respectively, the start and length of a span in the input stream.
       The start defaults to the current location.  If the pause span is defined, and the  start  of  the  pause
       lexeme  is  the same as the current location, length defaults to the length of the pause span.  Otherwise
       length defaults to -1.

       Negative values are allowed and are interpreted as described above.  This span will  be  treated  as  the
       section  of  the  input  stream  that  corresponds  to  the  tokens  read  at the current location.  This
       correspondence may be artificial, but a span must always be specified.

       The fourth argument specifies the value of the lexeme.  If the value argument  is  omitted,  the  token's
       value  will  be  a string containing the corresponding substring of the input stream.  Omitting the value
       argument does not have the same effect as passing an explicit Perl "undef".  If the value argument is  an
       explicit Perl "undef", the value of the lexeme will be a Perl "undef".

           $slr->lexeme_read($symbol, $start, $length, $value);

       is the equivalent of

           $slr->lexeme_alternative($symbol, $value);
           $slr->lexeme_complete($start, $length);

       Current location in the input stream is moved to the place where read() paused or, if it never pauses, to
       "$start+$length".  Current G1 location is advanced by one.

       Completion named events can occur during the lexeme_read() method.  Named events can be queried using the
       Scanless recognizer's events() method.

       Return  value:  On  success, lexeme_read() returns the new current location.  This will never be location
       zero, because lexemes cannot be zero length.  If the token was rejected,  returns  a  Perl  "undef".   On
       other unthrown failure, returns 0.
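
       Since lexeme_read() has three distinct kinds of return value, an application may want to tell them
       apart explicitly.  The classifier below is a hypothetical helper, not part of Marpa::R2; it takes a
       value already returned by lexeme_read() and applies the rules just described.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa::R2: classify a value that
# was returned by lexeme_read().
sub lexeme_read_outcome {
    my ($result) = @_;
    return 'rejected' if not defined $result;    # token was rejected
    return 'failed'   if $result == 0;           # other unthrown failure
    return 'ok';                                 # new current location
}

# Sample return values standing in for real lexeme_read() calls.
for my $result ( 17, undef, 0 ) {
    my $shown = defined $result ? $result : 'undef';
    print "$shown: ", lexeme_read_outcome($result), "\n";
}
```

       The distinction matters for the "// die" idiom: the defined-or operator only catches the rejected case,
       so a return value of 0 passes through it silently.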

   resume()
           my $re = Marpa::R2::Scanless::R->new(
               {   grammar           => $parser->{grammar},
                   semantics_package => 'MarpaX::JSON::Actions'
               }
           );
           my $length = length $string;
           for (
               my $pos = $re->read( \$string );
               $pos < $length;
               $pos = $re->resume()
               )
           {
               my ( $start, $span_length ) = $re->pause_span();
               my $value = substr $string, $start + 1, $span_length - 2;
               $value = decode_string($value) if -1 != index $value, '\\';
               $re->lexeme_read( 'lstring', $start, $span_length, $value ) // die;
           } ## end for ( my $pos = $re->read( \$string ); $pos < $length...)
           my $per_parse_arg = bless {}, 'MarpaX::JSON::Actions';
           my $value_ref = $re->value($per_parse_arg);
           return ${$value_ref};

       The  resume()  method  takes two arguments, a start location and a length.  The default start location is
       the current location.  The default length is -1.  Negative arguments are interpreted as described above.

       The resume() method resumes the SLIF's internal scanning, as described above.

       Completion named events can occur during the resume() method.  When a named event  occurs,  the  resume()
       method  pauses.   Named  events  can  be  queried  using  the Scanless recognizer's events() method.  The
       resume() method also pauses as specified with the Scanless DSL's pause adverb.

       On success, resume() moves the current location to where it paused, or to the EOS.  The return value is
       the new current location.  On unthrown failure, resume() returns a Perl "undef".

Accessors

   ambiguity_metric()
           my $ambiguity_metric = $slr->ambiguity_metric();

       Returns 1 if there is an unambiguous parse, and 2 or greater if there is an ambiguous parse.  Returns 0 if
       called before parsing.  Returns 0 or less than zero on other unthrown failure.

   current_g1_location()
           my $current_g1_location = $slr->current_g1_location();

       Returns the current G1 location.

   events()
               EVENT:
               for my $event ( @{ $slr->events() } ) {
                   my ($name) = @{$event};
                   push @actual_events, $name;
               }

       The events() method takes no arguments, and returns a reference to an array of event descriptors.  The
       array is empty if there were no events.

       Each named event descriptor is a reference to an array of one, and potentially more, elements.  The first
       element of every named event descriptor is a string containing  the  name  of  the  event,  and  this  is
       typically the only element.  In certain cases, there could be other elements of a named event descriptor,
       which will be as described for the type of named event.  Named events are described in the SLIF DSL.

       Events occur during the Scanless recognizer's read(), resume(), lexeme_complete(), and lexeme_read()
       methods.  Any subsequent call to an SLIF recognizer mutator may clear the list of triggered events.  The
       assumption is that an application interested in events will call the events() method almost as soon as
       control is returned to it.

       Named events are returned in order by type.  Completion events are  first.   They  are  followed  by  the
       nulled  events.   These are in turn followed by prediction events.  Within each type, the order of events
       is arbitrary.

       Applications may find it convenient to turn specific events off, temporarily or permanently.  Events  may
       be activated or deactivated with the SLIF recognizer's activate() method.
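
       One common way to consume event descriptors is a dispatch table keyed by event name.  The sketch below
       is illustrative only: the event names, the handlers and the hand-built descriptor list are all
       hypothetical, with the list standing in for what events() would return.

```perl
use strict;
use warnings;

# Hand-built stand-in for the array reference returned by events().
# The names are hypothetical; the second descriptor carries an extra
# element only to show that descriptors may have more than one.
my $events = [ ['expression done'], [ 'string start', 'extra' ] ];

# Hypothetical dispatch table keyed by event name.
my @log;
my %on_event = (
    'expression done' => sub { push @log, 'saw a complete expression' },
    'string start'    => sub { push @log, "string begins (@_)" },
);

# Call the handler, if any, for each descriptor; return the event count.
sub dispatch_events {
    my ($descriptors) = @_;
    for my $descriptor ( @{$descriptors} ) {
        my ( $name, @rest ) = @{$descriptor};
        my $handler = $on_event{$name} or next;    # ignore unknown events
        $handler->(@rest);
    }
    return scalar @{$descriptors};
}

my $event_count = dispatch_events($events);
print "$event_count events\n";
print "$_\n" for @log;
```

       An event that has been deactivated simply never appears in the list, so its handler can stay in the
       table without being called.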

   exhausted()
           my $exhausted_status = $slr->exhausted();

       The  exhausted  method returns a Perl true if parsing in a SLIF recognizer is exhausted, and a Perl false
       otherwise. Parsing is exhausted when the recognizer will not accept any further input.

       An attempt to read input into an exhausted parser causes an exception to be thrown.  The exception is all
       that most applications require,  but  this  method  allows  the  recognizer's  exhaustion  status  to  be
       discovered directly.

   g1_location_to_span()
               my ( $span_start, $span_length ) =
                   $slr->g1_location_to_span($g1_location);

       G1  locations  do  not  correspond  to  a  single  input  stream  location,  but  to a span of them.  The
       g1_location_to_span() method returns an array of two elements, representing a span in the  input  stream.
       The first element of the array is the input stream location where the span starts.  The second element of
       the  array  is  the  length  of  the span.  As a special case, the input stream span for G1 location 0 is
       always (0,0).

       Sometimes it is convenient to think of G1 location as corresponding to a single  input  stream  location.
       When  this is the case, what is usually intended is the last input stream location of the span.  The last
       input stream location of the span will always be "$span_start+$span_length".

   input_length()
           my $input_length = $slr->input_length();

       The input_length() method accepts no arguments, and returns the length of the input stream.

   last_completed()
           sub show_last_expression {
               my ($self) = @_;
               my $recce = $self->{recce};
               my ( $g1_start, $g1_length ) = $recce->last_completed('Expression');
               return 'No expression was successfully parsed' if not defined $g1_start;
               my $last_expression = $recce->substring( $g1_start, $g1_length );
               return "Last expression successfully parsed was: $last_expression";
           } ## end sub show_last_expression

           my ( $g1_start, $g1_length ) = $recce->last_completed('Expression');

       Given the name of a symbol, returns the start G1 location and the length in  G1  locations  of  the  most
       recent  match.   If  there  was more than one most recent match, it returns the longest.  If there was no
       match, returns the empty array in array context and a Perl false in scalar context.

   line_column()
           my ( $start, $span_length ) = $re->pause_span();
           my ( $line,  $column )      = $re->line_column($start);

       The line_column() method accepts one, optional, argument: a location in the input stream.   The  location
       defaults to the current location.  line_column() returns the corresponding line and column position, as a
       2-element  array.   The  first  element  of the array is the line position, and the second element is the
       column position.

       Numbering of lines and columns is 1-based, following UNIX editor tradition.  Except at EOF, the line  and
       column  will  be  that of an actual character.  At EOF the line number will be that of the last line, and
       the column number will be that of the last column plus one.  Applications which want to treat EOF as a
       special case can test for it using the pos() method and the input_length() method.

       A  line  is  considered  to  end with any newline sequence as defined in the Unicode Specification 4.0.0,
       Section 5.8.  Specifically, a line ends with one of the following:

       •   a LF (line feed U+000A);

       •   a CR (carriage return, U+000D), when it is not followed by a LF;

       •   a CRLF sequence (U+000D,U+000A);

       •   a NEL (next line, U+0085);

       •   a VT (vertical tab, U+000B);

       •   a FF (form feed, U+000C);

       •   a LS (line separator, U+2028) or

       •   a PS (paragraph separator, U+2029).
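
       To make the list of line endings concrete, here is a sketch in plain Perl.  It is not Marpa's
       implementation.  The helper sketch_line_column() is hypothetical and only illustrates which
       sequences are counted as line ends, with CRLF treated as a single line end.  Its behavior in edge
       cases, such as at EOF after a trailing newline, may differ from that of line_column().

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa::R2: maps a 0-based offset in $text
# to a 1-based (line, column) pair using the line endings listed above.
sub sketch_line_column {
    my ( $text, $offset ) = @_;
    my ( $line, $line_start ) = ( 1, 0 );

    # CRLF comes first in the alternation so it is consumed as one line end.
    # The range \x{0A}-\x{0D} covers LF, VT, FF and CR.
    while ( $text =~ m/\x{0D}\x{0A}|[\x{0A}-\x{0D}\x{85}\x{2028}\x{2029}]/g ) {
        last if pos($text) > $offset;    # the terminator ends beyond $offset
        $line++;
        $line_start = pos $text;
    }
    return ( $line, $offset - $line_start + 1 );
}

my $text = "ab\r\ncd\x{0B}e";
for my $offset ( 0, 4, 7 ) {
    my ( $line, $column ) = sketch_line_column( $text, $offset );
    print "offset $offset: line $line, column $column\n";
}
```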

   literal()
           my $literal_string = $re->literal( $start, $span_length );

       The literal() method accepts two arguments, the start location and length of a span in the input  stream.
       It returns the substring of the input stream corresponding to that span.

   pause_lexeme()
           my $lexeme = $re->pause_lexeme();

       The  pause_lexeme() method accepts no arguments, and returns the name of the lexeme which caused the most
       recent pause.  The pause lexeme is initially undefined and it is reset to undefined at the  beginning  of
       each call to the read() or resume() methods.

       More than one lexeme may cause a pause.  When this is the case, all the causal lexemes will be acceptable
       to  the  G1  grammar,  and all the causal lexemes will have the same lexeme priority.  When more than one
       lexeme causes a pause, the choice of pause lexeme is arbitrary.  Applications must not rely on a
       particular choice, or on that choice being repeated, even when the choice is made in similar or identical
       circumstances.

       Not  every  pause  is  caused  by  a  lexeme.   A pause often occurs because of the length argument of an
       internal scanning method.  When the most recent pause was not caused by a lexeme,  the  pause  lexeme  is
       undefined.  pause_lexeme() returns a Perl "undef" when the pause lexeme is undefined.

   pause_span()
           my ( $start, $length ) = $re->pause_span();

       The  pause_span()  method  accepts  no arguments, and returns the "pause span" as a 2-element array.  The
       "pause span" is the start location and length of the lexeme which caused  the  most  recent  pause.   The
       pause  span  is  initially  undefined  and  it is reset to undefined at the beginning of each call to the
       read() or resume() methods.

       A pause is not always caused by a lexeme -- internal  scanning  may  be  paused  because  of  the  length
       argument  of an internal scanning method.  When the most recent pause was not caused by a lexeme, no span
       can be associated with it, and the pause span is undefined.  pause_span() returns a Perl "undef"  if  the
       pause span is undefined.
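
       The following sketch shows pause_lexeme() and pause_span() used together in a read/resume loop.  The
       toy grammar, the event name 'number read' and the @pauses array are assumptions made for this
       illustration.  The grammar pauses after every Number lexeme, and the loop records the pause lexeme,
       its span and its text after each call to read() or resume().

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar (an assumption for this sketch): pause after each Number.
my $dsl = <<'END_OF_DSL';
:start ::= Expression
Expression ::= Number
    || Expression '+' Expression
Number ~ [\d]+
:lexeme ~ Number pause => after event => 'number read'
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source  => \$dsl } );
my $slr     = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = '1 + 22';

my @pauses;
my $pos = $slr->read( \$input );
while (1) {

    # pause_lexeme() is undefined unless a lexeme caused the latest pause.
    my $lexeme = $slr->pause_lexeme();
    if ( defined $lexeme ) {
        my ( $start, $length ) = $slr->pause_span();
        my $text = $slr->literal( $start, $length );
        push @pauses, [ $lexeme, $start, $length, $text ];
        print "paused after $lexeme '$text' at $start, length $length\n";
    }
    last if $pos >= length $input;
    $pos = $slr->resume();
}
```

       The pause lexeme is checked before the end-of-input test because a pause after the final lexeme
       leaves the position at the end of the input.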

   pos()
           my $pos = $slr->pos();

       The pos() method accepts no arguments, and returns the current input stream location.
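
       One use of pos() is to test for end of input by comparing it with input_length().  The sketch below
       assumes a toy grammar, and the $at_eof variable is introduced only for the illustration.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar (an assumption for this sketch): sums of integers.
my $dsl = <<'END_OF_DSL';
:start ::= Expression
Expression ::= Number
    || Expression '+' Expression
Number ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source  => \$dsl } );
my $slr     = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = '1 + 2';
$slr->read( \$input );

# The recognizer is at EOF when its current input stream location
# equals the length of the input stream.
my $input_length = $slr->input_length();
my $at_eof       = $slr->pos() == $input_length ? 1 : 0;
print "input length: $input_length, at EOF: $at_eof\n";
```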

   progress()
           my $progress_output = $slr->progress();

       Returns a reference to an array that describes the progress of a parse at a location.  With no
       argument, progress() reports progress at the current location.  If a G1 location is given as its
       argument, progress() reports progress at that G1 location.  The G1 location may be negative.  An argument
       of -X is interpreted as location N-X+1, where N is the current G1 location.  In other words, an
       argument of -1 indicates the current G1 location, an argument of -2 indicates the G1 location just before
       the current one, etc.
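
       The treatment of negative locations is plain arithmetic, sketched here in Perl.  The helper
       resolve_g1_location() is hypothetical and is not part of Marpa::R2.  It only reproduces the rule
       that -1 means the current G1 location, -2 the one before it, and so on.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa::R2: turn a possibly negative
# location argument into an absolute G1 location.
sub resolve_g1_location {
    my ( $argument, $current_g1_location ) = @_;
    return $argument if $argument >= 0;

    # $argument is negative here, so adding it counts backward:
    # -1 gives the current location, -2 the one before it.
    return $current_g1_location + $argument + 1;
}

my $current = 42;
for my $argument ( 5, -1, -2 ) {
    printf "argument %2d resolves to G1 location %d\n",
        $argument, resolve_g1_location( $argument, $current );
}
```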

       The progress reports returned by the progress() method identify rules by their G1 rule ID.  G1 rule IDs
       can be converted to a list of the rule's symbols using the rule() method of the SLIF grammar.  Details on
       progress reports can be found in their own document, Marpa::R2::Progress.

   show_progress()
           my $show_progress_output = $slr->show_progress();

       Shows the progress of the G1 parse.  For a description of its output, see Marpa::R2::Progress.

       With  no  arguments, the string contains reports for the current location.  If locations are specified as
       arguments to show_progress(), they need to be G1 locations.

       With a single integer argument N, the string contains reports  for  G1  location  N.   With  two  numeric
       arguments,  N and M, the arguments are interpreted as the start and end points of a range of G1 locations
       and the returned string contains reports for all locations in the range.

       If an argument is negative, -N, it indicates the Nth location counting backward from the furthest
       location of the parse.  For example, if 42 was the furthest G1 location, -1 would be G1 location 42 and
       -2 would be location 41.  The method call "$recce->show_progress(-3, -1)" returns reports for the last
       three G1 locations of the parse.  The method call "$recce->show_progress(0, -1)" returns progress
       reports for the entire parse.

       Locations are G1 locations instead of string offsets, for two reasons.  First, G1 parse state is only
       defined at the start of parsing, and at the end of a non-discarded lexeme.  Therefore many string
       offsets will not have a G1 parse state.  Second, SLIF recognizers using external scanning are allowed to
       rescan the same string repeatedly.  Therefore, a single string offset may have many G1 parse states.

   substring()
           my $last_expression = $recce->substring( $g1_start, $g1_length );

       Given  a  G1  span -- that is, a G1 start location and a length in G1 locations -- the substring() method
       returns a substring of the input stream.  A G1 length of zero will produce the zero-length string.

       The substring of the input stream is determined on the assumption that the application  reads  the  input
       monotonically.  When this is not the case, the substring is determined as described above.

   terminals_expected()
           my @terminals_expected = @{$slr->terminals_expected()};

       Returns  a  reference  to a list of strings, where the strings are the names of the lexemes acceptable at
       the current location.  The presence of a lexeme in this list means that lexeme will be acceptable in  the
       next call of the resume() method.

       This  is  highly useful for Ruby Slippers parsing.  A more fine-tuned approach is to identify the lexemes
       of interest and create "predicted symbol" events for them.
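
       As a sketch of how terminals_expected() might be used, the script below stops in the middle of an
       expression and asks which lexemes could come next.  The toy grammar is an assumption made for the
       illustration.  Lexemes defined by quoted strings, such as '+', are reported under internal names, so
       the example only looks for the named lexeme Number.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar (an assumption for this sketch): sums of integers.
my $dsl = <<'END_OF_DSL';
:start ::= Expression
Expression ::= Number
    || Expression '+' Expression
Number ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source  => \$dsl } );
my $slr     = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

# The input stops after the operator, so an operand is still missing.
my $input = '1 +';
$slr->read( \$input );

my @terminals_expected = @{ $slr->terminals_expected() };
my $number_expected = scalar grep { $_ eq 'Number' } @terminals_expected;
print "expected lexemes: @terminals_expected\n";
print "Number is acceptable next\n" if $number_expected;
```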

Discouraged methods

       Methods in this section continue to be supported, but their use is discouraged in favor of other,  better
       solutions.  New applications should avoid using discouraged methods.

   event()
           my $event = $slr->event($event_ix);

       Use  of  this  method  is discouraged in favor of the more efficient events() method.  The event() method
       requires one argument, an event index.  It returns a descriptor of the named event with that index, or  a
       Perl  "undef" if there is no such event.  For more details on events, see the description of the events()
       method.

   last_completed_range()
       Use of this method is  discouraged  in  favor  of  "last_completed()".   Given  the  name  of  a  symbol,
       last_completed_range()  returns the G1 start and G1 end locations of the most recent match.  If there was
       more than one most recent match, last_completed_range() returns the longest.   If  there  was  no  match,
       last_completed_range() returns the empty array in array context and a Perl false in scalar context.

   range_to_string()
       Use  of  this  method  is discouraged in favor of "substring()".  Given a G1 start and a G1 end location,
       range_to_string()  returns  the  substring  of  the  input  stream  that  is  between   the   two.    The
       range_to_string()  method  assumes  that the application read forward smoothly in the input stream, while
       reading the sequence of G1 locations.  When that is not the case, range_to_string() behaves in  much  the
       same way as described above for "substring()".

Copyright and License

         Copyright 2014 Jeffrey Kegler
         This file is part of Marpa::R2.  Marpa::R2 is free software: you can
         redistribute it and/or modify it under the terms of the GNU Lesser
         General Public License as published by the Free Software Foundation,
         either version 3 of the License, or (at your option) any later version.

         Marpa::R2 is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
         Lesser General Public License for more details.

         You should have received a copy of the GNU Lesser
         General Public License along with Marpa::R2.  If not, see
         http://www.gnu.org/licenses/.

perl v5.40.0                                       2024-12-07                        Marpa::R2::Scanless::R(3pm)