Ubuntu Manpage: PPR::X - Pattern-based Perl Recognizer

NAME

       PPR::X - Pattern-based Perl Recognizer

VERSION

       This document describes PPR::X version 0.001009

SYNOPSIS

           use PPR::X;

           # Define a regex that will match an entire Perl document...
           my $perl_document = qr{

               # What to match...         # Install the (?&PerlEntireDocument) rule...
               (?&PerlEntireDocument)     $PPR::X::GRAMMAR

           }x;

           # Define a regex that will match a single Perl block...
           my $perl_block = qr{

               # What to match...         # Install the (?&PerlBlock) rule...
               (?&PerlBlock)              $PPR::X::GRAMMAR
           }x;

           # Define a regex that will match a simple Perl extension...
           my $perl_coroutine = qr{

               # What to match...
               coro                                           (?&PerlOWS)
               (?<coro_name>  (?&PerlQualifiedIdentifier)  )  (?&PerlOWS)
               (?<coro_code>  (?&PerlBlock)                )

               # Install the necessary subrules...
               $PPR::X::GRAMMAR
           }x;

           # Define a regex that will match an integrated Perl extension...
           my $perl_with_classes = qr{

               # What to match...
               \A
                   (?&PerlOWS)       # Optional whitespace (including comments)
                   (?&PerlDocument)  # A full Perl document
                   (?&PerlOWS)       # More optional whitespace
               \Z

               # Add a 'class' keyword into the syntax that PPR::X understands...
               (?(DEFINE)
                   (?<PerlKeyword>

                           class                              (?&PerlOWS)
                           (?&PerlQualifiedIdentifier)        (?&PerlOWS)
                       (?: is (?&PerlNWS) (?&PerlIdentifier)  (?&PerlOWS) )*+
                           (?&PerlBlock)
                   )

                   (?<kw_balanced_parens>
                       \( (?: [^()]++ | (?&kw_balanced_parens) )*+ \)
                   )
               )

               # Install the necessary standard subrules...
               $PPR::X::GRAMMAR
           }x;

DESCRIPTION

       The PPR::X module provides a single regular expression that defines a set of independent subpatterns
       suitable for matching entire Perl documents, as well as a wide range of individual syntactic components
       of Perl (e.g. statements, expressions, control blocks, variables, etc.).

       The regex does not "parse" Perl (that is, it does not build a syntax tree, like the PPI module does).
       Instead it simply "recognizes" standard Perl constructs, or new syntaxes composed from Perl constructs.

       Its features and capabilities therefore complement those of the PPI module, rather than replacing them.
       See "Comparison with PPI".

INTERFACE

   Importing and using the Perl grammar regex
       The PPR::X module exports no subroutines or variables, and provides no methods. Instead, it defines a
       single package variable, $PPR::X::GRAMMAR, which can be interpolated into regexes to add rules that
       permit Perl constructs to be parsed:

           $source_code =~ m{ (?&PerlEntireDocument)  $PPR::X::GRAMMAR }x;

       Note that all the examples shown so far have interpolated this "grammar variable" at the end of the
       regular expression. This placement is desirable, but not necessary. Both of the following work
       identically:

           $source_code =~ m{ (?&PerlEntireDocument)   $PPR::X::GRAMMAR }x;

           $source_code =~ m{ $PPR::X::GRAMMAR   (?&PerlEntireDocument) }x;

       However, if the grammar is to be extended, then the extensions must be specified before the base grammar
       (i.e. before the interpolation of $PPR::X::GRAMMAR). Placing the grammar variable at the end of a regex
       ensures that will be the case, and has the added advantage of "front-loading" the regex with the most
       important information: what is actually going to be matched.

       Note too that, because the PPR::X grammar internally uses capture groups, placing $PPR::X::GRAMMAR
       anywhere other than the very end of your regex may change the numbering of any explicit capture groups in
       your regex.  For complete safety, regexes that use the PPR::X grammar should probably use named captures,
       instead of numbered captures.
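
       As a minimal sketch of that advice (the input string and the capture names "name" and "body" are
       hypothetical, chosen only for illustration), named captures can be read back through %+ without
       depending on how many capture groups the grammar itself uses:

```perl
use strict;
use warnings;
use PPR::X;

# Hypothetical input: a single named subroutine definition.
my $source = 'sub greet { print "hello\n" }';

my ($name, $body);

# Named captures are looked up by name, so their values do not depend on
# the numbering of any capture groups used inside the grammar itself.
if ($source =~ m{
        \A  sub  (?&PerlNWS)
        (?<name>  (?&PerlIdentifier)  )  (?&PerlOWS)
        (?<body>  (?&PerlBlock)       )

        $PPR::X::GRAMMAR
    }x)
{
    ($name, $body) = @+{qw( name body )};
}

print "name: $name\nbody: $body\n" if defined $name;
```

       The captured values are copied into lexical variables immediately, because %+ is reset by the next
       successful match.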

   Error reporting
       Regex-based parsing is all-or-nothing: either your regex matches (and returns any captures you
       requested), or it fails to match (and returns nothing).

       This can make it difficult to detect why a PPR::X-based match failed; to work out what the "bad source
       code" was that prevented your regex from matching.

       So the module provides a special variable that attempts to detect the source code that prevented any call
       to the "(?&PerlStatement)" subpattern from matching. That variable is: $PPR::X::ERROR

       $PPR::X::ERROR is only set if it is undefined at the point where an error is detected, and will only be
       set to the first such error that is encountered during parsing.

       Note that errors are only detected when matching context-sensitive components (for example in the middle
       of a "(?&PerlStatement)", as part of a "(?&PerlContextualRegex)", or at the end of a
       "(?&PerlEntireDocument)").  Errors, especially errors at the end of otherwise valid code, will often not
       be detected in context-free components (for example, at the end of a "(?&PerlStatementSequence)", as part
       of a "(?&PerlRegex)", or at the end of a "(?&PerlDocument)").

       A common mistake in this area is to attempt to match an entire Perl document using:

           m{ \A (?&PerlDocument) \Z   $PPR::X::GRAMMAR }x

       instead of:

           m{ (?&PerlEntireDocument)   $PPR::X::GRAMMAR }x

       Only the second approach will be able to successfully detect an unclosed curly bracket at the end of the
       document.

       "PPR::X::ERROR" interface

       If it is set, $PPR::X::ERROR will contain an object of type PPR::X::ERROR, with the following methods:

       "$PPR::X::ERROR->origin($line, $file)"
           Returns a clone of the PPR::X::ERROR object that now believes that the source code parsing failure it
           is  reporting  occurred  in  a  code  fragment starting at the specified line and file. If the second
           argument is omitted, the file name is not reported in any diagnostic.

       "$PPR::X::ERROR->source()"
           Returns a string containing the specific source code that could not be parsed as a Perl statement.

       "$PPR::X::ERROR->prefix()"
           Returns a string containing all the source code preceding the code that could not be parsed. That is:
           the valid code that is the preceding context of the unparsable code.

       "$PPR::X::ERROR->line( $opt_offset )"
           Returns an integer which is the line number at which the unparsable code was encountered. If the
           optional "offset" argument is provided, it will be added to the line number returned. Note that the
           offset is ignored if the PPR::X::ERROR object originates from a prior call to
           "$PPR::X::ERROR->origin" (because in that case you will have already specified the correct offset).

       "$PPR::X::ERROR->diagnostic()"
           Returns  a  string  containing  the diagnostic that would be returned by "perl -c" if the source code
           were compiled.

           Warning: The diagnostic is obtained by partially eval'ing the source code. This means  that  run-time
           code  will  not  be executed, but "BEGIN" and "CHECK" blocks will run. Do not call this method if the
           source code that created this error might also have non-trivial compile-time side-effects.

       A typical use might therefore be:

           # Make sure it's undefined, and will only be locally modified...
           local $PPR::X::ERROR;

           # Process the matched block...
           if ($source_code =~ m{ (?<Block> (?&PerlBlock) )  $PPR::X::GRAMMAR }x) {
               process( $+{Block} );
           }

           # Or report the offending code that stopped it being a valid block...
           else {
               die "Invalid Perl block: " . $PPR::X::ERROR->source . "\n",
                   $PPR::X::ERROR->origin($linenum, $filename)->diagnostic . "\n";
           }
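
       A second sketch, under the assumption of a hypothetical input whose final block is never closed, shows
       the same pattern applied to a whole document. It guards against $PPR::X::ERROR being undefined, since
       the variable is only set when an error is actually detected:

```perl
use strict;
use warnings;
use PPR::X;

# Hypothetical input whose final block is never closed.
my $source_code = <<'END_SOURCE';
my $x = 1;
if ($x) {
    print $x;
END_SOURCE

# Make sure the error variable starts out undefined...
local $PPR::X::ERROR;

# Use (?&PerlEntireDocument) so that errors at the end of the text count...
my $is_valid = $source_code =~ m{ (?&PerlEntireDocument)  $PPR::X::GRAMMAR }x;

if (!$is_valid) {
    if (defined $PPR::X::ERROR) {
        # line() and source() are the methods documented above...
        printf "Could not parse, starting at line %d:\n%s\n",
               $PPR::X::ERROR->line, $PPR::X::ERROR->source;
    }
    else {
        print "No match, and no error object was recorded\n";
    }
}
```

       This version avoids diagnostic(), so none of the input is eval'ed while reporting the problem.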

   Decommenting code with PPR::X::decomment()
       The module provides (but does not export) a decomment() subroutine that can remove  any  comments  and/or
       POD from source code.

       It takes a single argument: a string containing the source code.  It returns a single value: a string
       containing the decommented source code.

       For example:

           $decommented_code = PPR::X::decomment( $commented_code );

       The subroutine will fail if the argument wasn't valid Perl code, in which case  it  returns  "undef"  and
       sets $PPR::X::ERROR to indicate where the invalid source code was encountered.

       Note  that, due to separate bugs in the regex engine in Perl 5.14 and 5.20, the decomment() subroutine is
       not available when running under these releases.
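
       A slightly fuller sketch of the call, including the failure check described above, might look like
       this. The sample source text and the wording of the warning are assumptions made for illustration:

```perl
use strict;
use warnings;
use PPR::X;

# Hypothetical commented source code.
my $commented_code = <<'END_CODE';
# Compute a total...
my $total = 1 + 2;   # (a trailing comment)
print "$total\n";
END_CODE

# Make sure any error reported really comes from this call...
local $PPR::X::ERROR;

my $decommented_code = PPR::X::decomment($commented_code);

if (defined $decommented_code) {
    print $decommented_code;
}
else {
    # decomment() returned undef: the argument was not valid Perl code.
    my $bad_code = defined $PPR::X::ERROR ? $PPR::X::ERROR->source : '(unknown)';
    warn "Could not decomment source near: $bad_code\n";
}
```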

   Examples
       Note: In each of the following examples, the subroutine slurp() is used to acquire the source code from a
       file whose name is passed as its argument. The slurp() subroutine is just:

           sub slurp { local (*ARGV, $/); @ARGV = shift; readline; }

       or, for the less twisty-minded:

           sub slurp {
               my ($filename) = @_;
               open my $filehandle, '<', $filename or die $!;
               local $/;
               return readline($filehandle);
           }

       Validating source code

         # "Valid" if source code matches a Perl document under the Perl grammar
         printf(
             "$filename %s a valid Perl file\n",
             slurp($filename) =~ m{ (?&PerlEntireDocument)  $PPR::X::GRAMMAR }x
                 ? "is"
                 : "is not"
         );

       Counting statements

         printf(                                        # Output
             "$filename contains %d statements\n",      # a report of
             scalar                                     # the count of
                 grep {defined}                         # defined matches
                     slurp($filename)                   # from the source code,
                         =~ m{
                               \G (?&PerlOWS)           # skipping whitespace
                                  ((?&PerlStatement))   # and keeping statements,
                               $PPR::X::GRAMMAR         # using the Perl grammar
                             }gcx                       # incrementally
         );

       Stripping comments and POD from source code

         my $source = slurp($filename);                    # Get the source
         $source =~ s{ (?&PerlNWS)  $PPR::X::GRAMMAR }{ }gx;  # Compact whitespace
         print $source;                                    # Print the result

       Stripping comments and POD from source code (in Perl v5.14 or later)

         # Print  the source code,  having compacted whitespace...
           print  slurp($filename)  =~ s{ (?&PerlNWS)  $PPR::X::GRAMMAR }{ }gxr;

       Stripping everything "except" comments and POD from source code

         say                                         # Output
             grep {defined}                          # defined matches
                 slurp($filename)                    # from the source code,
                     =~ m{ \G ((?&PerlOWS))          # keeping whitespace,
                              (?&PerlStatement)?     # skipping statements,
                           $PPR::X::GRAMMAR             # using the Perl grammar
                         }gcx;                       # incrementally

   Available rules
       Interpolating $PPR::X::GRAMMAR in a regex makes all of the following rules available within that regex.

       Note that other rules not listed here may also be added, but these are all considered  strictly  internal
       to  the PPR::X module and are not guaranteed to continue to exist in future releases. All such "internal-
       use-only" rules have names that start with "PPR_X_"...

       "(?&PerlDocument)"

       Matches a valid Perl document,  including  leading  or  trailing  whitespace,  comments,  and  any  final
       "__DATA__" or "__END__" section.

       This  rule  is  context-free, so it can be embedded in a larger regex.  For example, to match an embedded
       chunk of Perl code, delimited by "<<<"...">>>":

           $src =~ m{ <<< (?&PerlDocument) >>>   $PPR::X::GRAMMAR }x;

       "(?&PerlEntireDocument)"

       Matches an entire valid Perl document, including leading or trailing whitespace, comments, and any  final
       "__DATA__" or "__END__" section.

       This  rule  is not context-free. It has an internal "\A" at the beginning and "\Z" at the end, so a regex
       containing "(?&PerlEntireDocument)" will only match if:

       (a) the "(?&PerlEntireDocument)" is the sole top-level element of  the  regex  (or,  at  least  the  sole
           element of a single top-level "|"-branch of the regex),

       and
       (b) the entire string being matched contains only a single valid Perl document.

       In general, if you want to check that a string consists entirely of a single valid sequence of Perl code,
       use:

           $str =~ m{ (?&PerlEntireDocument)  $PPR::X::GRAMMAR }x

       If  you  want  to  check  that  a string contains at least one valid sequence of Perl code at some point,
       possibly embedded in other text, use:

           $str =~ m{ (?&PerlDocument)  $PPR::X::GRAMMAR }x
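
       The two rules can be contrasted in a short sketch. Both input strings, the "<<<"...">>>" delimiters,
       and the capture name "code" are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;
use PPR::X;

# Anchored rule: the whole string must be a single Perl document...
my $whole_document = qr{ (?&PerlEntireDocument)  $PPR::X::GRAMMAR }x;

# Context-free rule: Perl code embedded between <<< and >>> delimiters...
my $embedded_code  = qr{ <<< (?<code> (?&PerlDocument) ) >>>  $PPR::X::GRAMMAR }x;

# Hypothetical inputs...
my $pure_perl = 'my $x = 1; print $x;';
my $mixed     = 'Some notes <<< print 42; >>> more notes';

my $is_whole = $pure_perl =~ $whole_document ? 1 : 0;

my $extracted;
if ($mixed =~ $embedded_code) {
    $extracted = $+{code};
}

print "pure_perl is a complete document\n" if $is_whole;
print "embedded code: [$extracted]\n"      if defined $extracted;
```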

       "(?&PerlStatementSequence)"

       Matches zero-or-more valid Perl statements, separated by optional POD sequences.

       "(?&PerlStatement)"

       Matches a single valid Perl statement, including:  control  structures;  "BEGIN",  "CHECK",  "UNITCHECK",
       "INIT", "END", "DESTROY", or "AUTOLOAD" blocks; variable declarations, "use" statements, etc.

       "(?&PerlExpression)"

       Matches  a  single valid Perl expression involving operators of any precedence, but not any kind of block
       (i.e. not control structures, "BEGIN" blocks, etc.) nor any trailing  statement  modifier  (e.g.   not  a
       postfix "if", "while", or "for").

       "(?&PerlLowPrecedenceNotExpression)"

       Matches  an  expression at the precedence of the "not" operator.  That is, a single valid Perl expression
       that involves operators above the precedence of "and".

       "(?&PerlAssignment)"

       Matches an assignment expression.  That is, a single valid Perl expression involving operators above  the
       precedence of comma ("," or "=>").

       "(?&PerlConditionalExpression)" or "(?&PerlScalarExpression)"

       Matches  a conditional expression that uses the "?"...":" ternary operator.  That is, a single valid Perl
       expression involving operators above the precedence of assignment.

       The alternative name comes from the fact that anything matching this rule is what most people think of as
       a single element of a comma-separated list.

       "(?&PerlBinaryExpression)"

       Matches  an  expression  that  uses  any  high-precedence binary operators.  That is, a single valid Perl
       expression involving operators above the precedence of the ternary operator.

       "(?&PerlPrefixPostfixTerm)"

       Matches a term with optional prefix and/or postfix unary operators and/or a  trailing  sequence  of  "->"
       dereferences.   That  is,  a  single  valid  Perl  expression involving operators above the precedence of
       exponentiation ("**").

       "(?&PerlTerm)"

       Matches a simple high-precedence term within a  Perl  expression.   That  is:  a  subroutine  or  builtin
       function  call;  a  variable  declaration;  a  variable  or typeglob lookup; an anonymous array, hash, or
       subroutine constructor; a quotelike or numeric literal; a regex match; a substitution; a transliteration;
       a "do" or "eval" block; or any other expression in surrounding parentheses.

       "(?&PerlTermPostfixDereference)"

       Matches a sequence of array- or hash-lookup brackets,  or  subroutine  call  parentheses,  or  a  postfix
       dereferencer  (e.g.  "->$*"),  with  explicit  or implicit intervening "->", such as might appear after a
       term.

       "(?&PerlLvalue)"

       Matches any variable or parenthesized list of variables that could be assigned to.

       "(?&PerlPackageDeclaration)"

       Matches the declaration of any package (with or without a defining block).

       "(?&PerlSubroutineDeclaration)"

       Matches the declaration of any named subroutine (with or without a defining block).

       "(?&PerlUseStatement)"

       Matches a "use <module name> ...;" or "use <version number>;" statement.

       "(?&PerlReturnStatement)"

       Matches a "return <expression>;" or "return;" statement.

       "(?&PerlReturnExpression)"

       Matches a "return <expression>" as an expression without trailing end-of-statement markers.

       "(?&PerlControlBlock)"

       Matches an "if", "unless", "while", "until", "for", or "foreach" statement, including its block.

       "(?&PerlDoBlock)"

       Matches a "do"-block expression.

       "(?&PerlEvalBlock)"

       Matches an "eval"-block expression.

       "(?&PerlTryCatchFinallyBlock)"

       Matches a "try" block, followed by an optional "catch" block, followed by an optional "finally" block,
       using the built-in syntax introduced in Perl v5.34 and v5.36.

       Note  that  if  your  code  uses  one  of  the many CPAN modules (such as "Try::Tiny" or "TryCatch") that
       provided try/catch behaviours prior to Perl v5.34, then you  will  most  likely  need  to  override  this
       subrule to match the alternate "try"/"catch" syntax provided by your preferred module.

       For  example,  if  your  code  uses  the  "TryCatch" module, you would need to alter the PPR::X parser by
       explicitly redefining the subrule for "try" blocks, with something like:

           my $MATCH_A_PERL_DOCUMENT = qr{

               \A (?&PerlEntireDocument) \Z

               (?(DEFINE)
                   # Redefine this subrule to match TryCatch syntax...
                   (?<PerlTryCatchFinallyBlock>
                           try                                  (?>(?&PerlOWS))
                           (?>(?&PerlBlock))
                       (?:                                      (?>(?&PerlOWS))
                           catch                                (?>(?&PerlOWS))
                       (?: \( (?>(?&PPR_X_balanced_parens)) \)    (?>(?&PerlOWS))  )?+
                           (?>(?&PerlBlock))
                       )*+
                   )
               )

               $PPR::X::GRAMMAR
           }xms;

       Note that the popular "Try::Tiny" module actually implements "try"/"catch"  as  a  normally  parsed  Perl
       subroutine  call  expression, rather than a statement.  This means that the unmodified PPR::X grammar can
       successfully parse all the module's constructs.

       However, the unmodified PPR::X grammar may misclassify some "Try::Tiny" usages  as  being  built-in  Perl
       v5.36  "try"  blocks followed by an unrelated call to the "catch" subroutine, rather than identifying the
       "try" and "catch" as a single expression containing two subroutine calls.

       If that difference in interpretation  matters  to  you,  you  can  deactivate  the  built-in  Perl  v5.36
       "try"/"catch" syntax entirely, like so:

           my $MATCH_A_PERL_DOCUMENT = qr{
               \A (?&PerlEntireDocument) \Z

               (?(DEFINE)
                   # Turn off built-in try/catch syntax...
                   (?<PerlTryCatchFinallyBlock>   (?!)  )

                   # Decanonize 'try' and 'catch' as reserved words ineligible for sub names...
                   (?<PPR_X_non_reserved_identifier>
                       (?! (?> for(?:each)?+ | while   | if    | unless | until | given | when   | default
                           |   sub | format  | use     | no    | my     | our   | state  | defer | finally
                           # Note: Removed 'try' and 'catch' which appear here in the original subrule
                           |   (?&PPR_X_named_op)
                           |   [msy] | q[wrxq]?+ | tr
                           |   __ (?> END | DATA ) __
                           )
                           \b
                       )
                       (?>(?&PerlQualifiedIdentifier))
                       (?! :: )
                   )

               )

               $PPR::X::GRAMMAR
           }xms;

       For more details and options for modifying PPR::X grammars in this way, see also the documentation of the
       "PPR_X" module.

       "(?&PerlStatementModifier)"

       Matches  an  "if",  "unless",  "while",  "until",  "for", or "foreach" modifier that could appear after a
       statement. Only matches the modifier, not the preceding statement.

       "(?&PerlFormat)"

       Matches a "format" declaration, including its terminating "dot".

       "(?&PerlBlock)"

       Matches a "{"..."}"-delimited block containing zero-or-more statements.

       "(?&PerlCall)"

       Matches a call to a subroutine or built-in function.  Accepts all valid call syntaxes, either via a
       literal name or a reference, with or without a leading "&", with or without arguments, with or without
       parentheses on any argument list.

       "(?&PerlAttributes)"

       Matches a list of colon-preceded attributes,  such  as  might  be  specified  on  the  declaration  of  a
       subroutine or a variable.

       "(?&PerlCommaList)"

       Matches  a  list of zero-or-more comma-separated subexpressions.  That is, a single valid Perl expression
       that involves operators above the precedence of "not".

       "(?&PerlParenthesesList)"

       Matches a list of zero-or-more comma-separated subexpressions inside a set of parentheses.

       "(?&PerlList)"

       Matches either a parenthesized or  unparenthesized  list  of  comma-separated  subexpressions.  That  is,
       matches anything that either of the two preceding rules would match.

       "(?&PerlAnonymousArray)"

       Matches  an  anonymous  array  constructor.  That is: a list of zero-or-more subexpressions inside square
       brackets.

       "(?&PerlAnonymousHash)"

       Matches an anonymous hash constructor.  That is: a  list  of  zero-or-more  subexpressions  inside  curly
       brackets.

       "(?&PerlArrayIndexer)"

       Matches a valid indexer that could be applied to look up elements of an array.  That is: a list of
       one-or-more subexpressions inside square brackets.

       "(?&PerlHashIndexer)"

       Matches a valid indexer that could be applied to look up entries of a hash.  That is: a list of
       one-or-more subexpressions inside curly brackets, or a simple bareword identifier inside curly brackets.

       "(?&PerlDiamondOperator)"

       Matches anything in angle brackets.  That is: any "diamond" readline (e.g. "<$filehandle>") or file-glob
       operation (e.g. "<*.pl>").

       "(?&PerlComma)"

       Matches a short (",") or long ("=>") comma.

       "(?&PerlPrefixUnaryOperator)"

       Matches any high-precedence prefix unary operator.

       "(?&PerlPostfixUnaryOperator)"

       Matches any high-precedence postfix unary operator.

       "(?&PerlInfixBinaryOperator)"

       Matches any infix binary operator whose precedence is between ".." and "**".

       "(?&PerlAssignmentOperator)"

       Matches any assignment operator, including all op"=" variants.

       "(?&PerlLowPrecedenceInfixOperator)"

       Matches "and", "or", or "xor".

       "(?&PerlAnonymousSubroutine)"

       Matches an anonymous subroutine.

       "(?&PerlVariable)"

       Matches any type of access on any scalar, array, or hash variable.

       "(?&PerlVariableScalar)"

       Matches  any  scalar variable, including fully qualified package variables, punctuation variables, scalar
       dereferences, and the $#array syntax.

       "(?&PerlVariableArray)"

       Matches any array variable, including fully qualified package variables, punctuation variables, and array
       dereferences.

       "(?&PerlVariableHash)"

       Matches any hash variable, including fully qualified package variables, punctuation variables,  and  hash
       dereferences.

       "(?&PerlTypeglob)"

       Matches a typeglob.

       "(?&PerlScalarAccess)"

       Matches  any  kind  of variable access beginning with a "$", including fully qualified package variables,
       punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.

       "(?&PerlScalarAccessNoSpace)"

       Matches any kind of variable access beginning with a "$", including fully  qualified  package  variables,
       punctuation  variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.
       But does not allow spaces between the components of the variable access (i.e. imposes the same constraint
       as within an interpolating quotelike).

       "(?&PerlScalarAccessNoSpaceNoArrow)"

       Matches any kind of variable access beginning with a "$", including fully  qualified  package  variables,
       punctuation  variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.
       But does not allow spaces or arrows between the components of the variable access (i.e. imposes the  same
       constraint as within a "<...>"-delimited interpolating quotelike).

       "(?&PerlArrayAccess)"

       Matches  any kind of variable access beginning with a "@", including arrays, array dereferences, and list
       slices of arrays or hashes.

       "(?&PerlArrayAccessNoSpace)"

       Matches any kind of variable access beginning with a "@", including arrays, array dereferences, and  list
       slices  of  arrays  or  hashes.   But does not allow spaces between the components of the variable access
       (i.e. imposes the same constraint as within an interpolating quotelike).

       "(?&PerlArrayAccessNoSpaceNoArrow)"

       Matches any kind of variable access beginning with a "@", including arrays, array dereferences, and  list
       slices  of  arrays or hashes.  But does not allow spaces or arrows between the components of the variable
       access (i.e. imposes the same constraint as within a "<...>"-delimited interpolating quotelike).

       "(?&PerlHashAccess)"

       Matches any kind of variable access beginning with a "%", including hashes, hash  dereferences,  and  kv-
       slices of hashes or arrays.

       "(?&PerlLabel)"

       Matches a colon-terminated label.

       "(?&PerlLiteral)"

       Matches a literal value.  That is: a number, a "qr" or "qw" quotelike, a string, or a bareword.

       "(?&PerlString)"

       Matches  a  string literal.  That is: a single- or double-quoted string, a "q" or "qq" string, a heredoc,
       or a version string.

       "(?&PerlQuotelike)"

       Matches any form of quotelike operator.  That is: a single-  or  double-quoted  string,  a  "q"  or  "qq"
       string, a heredoc, a version string, a "qr", a "qw", a "qx", a "/.../" or "m/.../" regex, a substitution,
       or a transliteration.

       "(?&PerlHeredoc)"

       Matches a heredoc specifier.  That is: just the initial "<<TERMINATOR" component, not the actual
       contents of the heredoc on the subsequent lines.

       This rule only matches a heredoc specifier if that specifier is correctly followed on the  next  line  by
       any heredoc contents and then the correct terminator.

       However,  if  the  heredoc  specifier is correctly matched, subsequent calls to either of the whitespace-
       matching rules ("(?&PerlOWS)" or "(?&PerlNWS)") will also consume the trailing heredoc contents  and  the
       terminator.

       So, for example, to correctly match a heredoc plus its contents you could use something like:

           m/ (?&PerlHeredoc) (?&PerlOWS)  $PPR::X::GRAMMAR /x

       or, if there may be trailing items on the same line as the heredoc specifier:

           m/ (?&PerlHeredoc)
              (?<trailing_items> [^\n]* )
              (?&PerlOWS)

              $PPR::X::GRAMMAR
           /x

       Note that the same limitations apply to other constructs that match heredocs, such as "(?&PerlQuotelike)"
       or "(?&PerlString)".

       "(?&PerlQuotelikeQ)"

       Matches a single-quoted string, either a '...' or a "q/.../" (with any valid delimiters).

       "(?&PerlQuotelikeQQ)"

       Matches a double-quoted string, either a "..."  or a "qq/.../" (with any valid delimiters).

       "(?&PerlQuotelikeQW)"

       Matches a "quotewords" list.  That is a "qw/ list of words /" (with any valid delimiters).

       "(?&PerlQuotelikeQX)"

       Matches a "qx" system call, either a `...` or a "qx/.../" (with any valid delimiters).

       "(?&PerlQuotelikeS)" or "(?&PerlSubstitution)"

       Matches  a  substitution  operation.   That  is:  "s/.../.../"  (with  any valid delimiters and any valid
       trailing modifiers).

       "(?&PerlQuotelikeTR)" or "(?&PerlTransliteration)"

       Matches a transliteration operation.  That is: "tr/.../.../" or "y/.../.../" (with any  valid  delimiters
       and any valid trailing modifiers).

       "(?&PerlContextualQuotelikeM)" or "(?&PerlContextualMatch)"

       Matches a regex-match operation in any context where it would be allowed in valid Perl.  That is: "/.../"
       or "m/.../" (with any valid delimiters and any valid trailing modifiers).

       "(?&PerlQuotelikeM)" or "(?&PerlMatch)"

       Matches  a  regex-match operation.  That is: "/.../" or "m/.../" (with any valid delimiters and any valid
       trailing modifiers) in any context (i.e. even in places where it would not normally be allowed  within  a
       valid piece of Perl code).

       "(?&PerlQuotelikeQR)"

       Matches a "qr" regex constructor (with any valid delimiters and any valid trailing modifiers).

       "(?&PerlContextualRegex)"

       Matches  a  "qr"  regex  constructor  or  a  "/.../"  or  "m/.../"  regex-match operation (with any valid
       delimiters and any valid trailing modifiers) anywhere where either would be allowed in valid Perl.

       In other words: anything capable of matching within valid Perl code.

       "(?&PerlRegex)"

       Matches a "qr" regex constructor or a "/.../" or "m/.../" regex-match operation in any context (i.e. even
       in places where it would not normally be allowed within a valid piece of Perl code).

       In other words: anything capable of matching.

       "(?&PerlBuiltinFunction)"

       Matches the name of any builtin function.

       To match an actual call to a built-in function, use:

           m/
               (?= (?&PerlBuiltinFunction) )
               (?&PerlCall)

               $PPR::X::GRAMMAR
           /x

       "(?&PerlNullaryBuiltinFunction)"

       Matches the name of any builtin function that never takes arguments.

       To match an actual call to a built-in function that never takes arguments, use:

           m/
               (?= (?&PerlNullaryBuiltinFunction) )
               (?&PerlCall)

               $PPR::X::GRAMMAR
           /x

       "(?&PerlVersionNumber)"

       Matches any number or version-string that can be used as a  version  number  within  a  "use",  "no",  or
       "package" statement.

       "(?&PerlVString)"

       Matches a version-string (a.k.a. v-string).

       "(?&PerlNumber)"

       Matches  a  valid  number,  including binary, octal, decimal and hexadecimal integers, and floating-point
       numbers with or without an exponent.

       "(?&PerlIdentifier)"

       Matches a simple, unqualified identifier.

       "(?&PerlQualifiedIdentifier)"

       Matches a qualified or unqualified identifier, which may use either "::" or "'" as  internal  separators,
       but only "::" as initial or terminal separators.

       "(?&PerlOldQualifiedIdentifier)"

       Matches  a  qualified  or  unqualified  identifier, which may use either "::" or "'" as both internal and
       external separators.

       "(?&PerlBareword)"

       Matches a valid bareword.

       Note that this is not the same as a simple identifier, nor the same as a qualified identifier.

       "(?&PerlPod)"

       Matches a single POD section containing any contiguous set of POD directives, up to the first  "=cut"  or
       end-of-file.

       "(?&PerlPodSequence)"

       Matches any sequence of POD sections, separated and/or surrounded by optional whitespace.

       "(?&PerlNWS)"

       Matches one-or-more characters of necessary whitespace, including spaces, tabs, newlines, comments, and
       POD.

       "(?&PerlOWS)"

       Matches zero-or-more characters of optional whitespace, including spaces, tabs, newlines, comments, and
       POD.

       "(?&PerlOWSOrEND)"

       Matches zero-or-more characters of optional whitespace, including spaces, tabs, newlines, comments, POD,
       and any trailing "__END__" or "__DATA__" section.

       "(?&PerlEndOfLine)"

       Matches a single newline ("\n") character.

       This is provided mainly to allow newlines to be "hooked" by redefining "(?<PerlEndOfLine>)" (for example,
       to count lines during a parse).

       "(?&PerlKeyword)"

       Matches a pluggable keyword.

       Note that there are no pluggable keywords in the default PPR::X regex; they must be  added  by  the  end-
       user.  See the following section for details.
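
       As a rough illustration of how the rules listed above can be used individually, the following sketch
       anchors three of them to build simple recognizers. Only $PPR::X::GRAMMAR and the rule names come from
       the module; the variable names and sample strings are illustrative assumptions.

```perl
use strict;
use warnings;
use PPR::X;

# Anchored recognizers for three individual rules (variable names are illustrative)...
my $simple_identifier = qr{ \A (?&PerlIdentifier)          \z  $PPR::X::GRAMMAR }x;
my $qualified_name    = qr{ \A (?&PerlQualifiedIdentifier) \z  $PPR::X::GRAMMAR }x;
my $number            = qr{ \A (?&PerlNumber)              \z  $PPR::X::GRAMMAR }x;

# Report which recognizers accept each sample string...
for my $candidate ('count', 'My::Module', '0x1F') {
    my @matched;
    push @matched, 'identifier'           if $candidate =~ $simple_identifier;
    push @matched, 'qualified identifier' if $candidate =~ $qualified_name;
    push @matched, 'number'               if $candidate =~ $number;
    printf "%-12s %s\n", $candidate, join(', ', @matched);
}
```

       Each recognizer interpolates the full grammar, so it is usually worth building such regexes once and
       reusing them.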

   Extending the Perl syntax with keywords
       In  Perl  5.12  and later, it's possible to add new types of statements to the language using a mechanism
       called "pluggable keywords".

       This mechanism (best accessed via CPAN modules such as "Keyword::Simple" or "Keyword::Declare") acts like
       a limited macro facility. It detects when a statement begins with a  particular,  pre-specified  keyword,
       passes  the  trailing  text  to an associated keyword handler, and replaces the trailing source code with
       whatever the keyword handler produces.

       For example, the Dios module uses this mechanism to add keywords such as "class", "method", and "has"  to
       Perl 5, providing a declarative OO syntax. And the Object::Result module uses pluggable keywords to add a
       "result" statement that simplifies returning an ad hoc object from a subroutine.

       Unfortunately, because such modules effectively extend the standard Perl syntax, by default PPR::X has no
       way of successfully parsing them.

       However,  when  setting  up  a regex using $PPR::X::GRAMMAR it is possible to extend that grammar to deal
       with new keywords...by defining a rule named "(?<PerlKeyword>...)".

       This rule is always tested as the first option within  the  standard  "(?&PerlStatement)"  rule,  so  any
       syntax declared within effectively becomes a new kind of statement. Note that each alternative within the
       rule must begin with a valid "keyword" (that is: a simple identifier of some kind).

       For example, to support the three keywords from Dios:

           $Dios::GRAMMAR = qr{

               # Add a keyword rule to support Dios...
               (?(DEFINE)
                   (?<PerlKeyword>

                           class                              (?&PerlOWS)
                           (?&PerlQualifiedIdentifier)        (?&PerlOWS)
                       (?: is (?&PerlNWS) (?&PerlIdentifier)  (?&PerlOWS) )*+
                           (?&PerlBlock)
                   |
                           method                             (?&PerlOWS)
                           (?&PerlIdentifier)                 (?&PerlOWS)
                       (?: (?&kw_balanced_parens)             (?&PerlOWS) )?+
                       (?: (?&PerlAttributes)                 (?&PerlOWS) )?+
                           (?&PerlBlock)
                   |
                           has                                (?&PerlOWS)
                       (?: (?&PerlQualifiedIdentifier)        (?&PerlOWS) )?+
                           [\@\$%][.!]?(?&PerlIdentifier)     (?&PerlOWS)
                       (?: (?&PerlAttributes)                 (?&PerlOWS) )?+
                       (?: (?: // )?+ =                       (?&PerlOWS)
                           (?&PerlExpression)                 (?&PerlOWS) )?+
                       (?> ; | (?= \} ) | \z )
                   )

                   (?<kw_balanced_parens>
                       \( (?: [^()]++ | (?&kw_balanced_parens) )*+ \)
                   )
               )

               # Add all the standard PPR::X rules...
               $PPR::X::GRAMMAR
           }x;

           # Then parse with it...

           $source_code =~ m{ \A (?&PerlDocument) \Z  $Dios::GRAMMAR }x;

       Or, to support the "result" statement from "Object::Result":

           my $ORK_GRAMMAR = qr{

               # Add a keyword rule to support Object::Result...
               (?(DEFINE)
                   (?<PerlKeyword>
                       result                        (?&PerlOWS)
                       \{                            (?&PerlOWS)
                       (?: (?> (?&PerlIdentifier)
                           |   < [[:upper:]]++ >
                           )                         (?&PerlOWS)
                           (?&PerlParenthesesList)?+ (?&PerlOWS)
                           (?&PerlBlock)             (?&PerlOWS)
                       )*+
                       \}
                   )
               )

               # Add all the standard PPR::X rules...
               $PPR::X::GRAMMAR
           }x;

           # Then parse with it...

           $source_code =~ m{ \A (?&PerlDocument) \Z  $ORK_GRAMMAR }x;

       Note  that,  although  pluggable  keywords  are  only available from Perl 5.12 onwards, PPR::X will still
       accept "(?&PerlKeyword)" extensions under Perl 5.10.

   Extending the Perl syntax in other ways
       Other modules (such as "Devel::Declare" and "Filter::Simple") make it possible to extend Perl  syntax  in
       even  more  flexible ways.  The PPR::X module provides support for syntactic extensions more general than
       pluggable keywords.

       PPR::X allows any of its public rules to be redefined in a particular regex. For  example,  to  create  a
       regex that matches standard Perl syntax, but which allows the keyword "fun" as a synonym for "sub":

           my $FUN_GRAMMAR = qr{

               # Extend the subroutine-matching rules...
               (?(DEFINE)
                   (?<PerlStatement>
                       # Try the standard syntax...
                       (?&PerlStdStatement)
                   |
                       # Try the new syntax...
                       fun                               (?&PerlOWS)
                       (?&PerlOldQualifiedIdentifier)    (?&PerlOWS)
                       (?: \( [^)]*+ \) )?+              (?&PerlOWS)
                       (?: (?&PerlAttributes)            (?&PerlOWS) )?+
                       (?> ; | (?&PerlBlock) )
                   )

                   (?<PerlAnonymousSubroutine>
                       # Try the standard syntax
                       (?&PerlStdAnonymousSubroutine)
                   |
                       # Try the new syntax
                       fun                               (?&PerlOWS)
                       (?: \( [^)]*+ \) )?+              (?&PerlOWS)
                       (?: (?&PerlAttributes)            (?&PerlOWS) )?+
                       (?> ; | (?&PerlBlock) )
                   )
               )

               $PPR::X::GRAMMAR
           }x;

       Note  first  that any redefinitions of the various rules have to be specified before the interpolation of
       the standard rules (so that the new rules take syntactic precedence over the originals).

       The structure of each redefinition is essentially identical.  First try the original rule, which is still
       accessible as "(?&PerlStd...)"  (instead of "(?&Perl...)"). Otherwise, try the new alternative, which may
       be constructed out of other rules.
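
       The precedence mechanism can be seen in isolation with a toy grammar. In this minimal sketch,
       $TOY_GRAMMAR and the rules "Greeting" and "StdGreeting" are hypothetical stand-ins (they are not part of
       PPR::X). The sketch relies on Perl's documented behaviour that "(?&NAME)" recurses to the leftmost group
       of that name when several groups share it.

```perl
use strict;
use warnings;

# A toy "standard grammar" (hypothetical stand-in for $PPR::X::GRAMMAR)...
my $TOY_GRAMMAR = qr{
    (?(DEFINE)
        (?<Greeting>     (?&StdGreeting) )
        (?<StdGreeting>  hello | hi      )
    )
}x;

# Redefine Greeting *before* interpolating the toy grammar,
# so that the new definition is the leftmost one...
my $extended = qr{
    \A (?&Greeting) \z

    (?(DEFINE)
        (?<Greeting>
            (?&StdGreeting)    # Try the original rule first
        |
            howdy              # Otherwise, try the new alternative
        )
    )

    $TOY_GRAMMAR
}x;

# The same top-level pattern without any redefinition...
my $plain = qr{ \A (?&Greeting) \z  $TOY_GRAMMAR }x;

for my $word (qw( hello howdy )) {
    printf "%-6s plain: %-3s extended: %s\n",
        $word,
        ($word =~ $plain    ? 'yes' : 'no'),
        ($word =~ $extended ? 'yes' : 'no');
}
```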

       There is no absolute requirement to try the original rule as part of the new rule, but if you don't  then
       you  are replacing the rule, rather than extending it. For example, to replace the low-precedence boolean
       operators ("and", "or", "xor", and "not") with their Latin equivalents:

           my $GRAMMATICA = qr{

               # Verbum sapienti satis est...
               (?(DEFINE)

                   # Iunctiones...
                   (?<PerlLowPrecedenceInfixOperator>
                       atque | vel | aut
                   )

                   # Contradicetur...
                   (?<PerlLowPrecedenceNotExpression>
                       (?: non  (?&PerlOWS) )*+  (?&PerlCommaList)
                   )
               )

               $PPR::X::GRAMMAR
           }x;

       Or to maintain a line count within the parse:

           my $COUNTED_GRAMMAR = qr{

               (?(DEFINE)

                   (?<PerlEndOfLine>
                       # Try the standard syntax
                       (?&PerlStdEndOfLine)

                       # Then count the line (must localize, to handle backtracking)...
                       (?{ local $linenum = $linenum + 1; })
                   )
               )

               $PPR::X::GRAMMAR
           }x;
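
       Because the counter is localized, Perl restores its previous value when the match ends, so the final
       count has to be copied to a non-localized variable before the regex finishes. The following minimal
       sketch shows that technique without PPR::X, using a hand-written line pattern; $lines_seen and the sample
       text are assumptions made for illustration.

```perl
use strict;
use warnings;

# 'local' only works on package variables, so the counter is declared with 'our'.
our $linenum = 0;

# Hypothetical holder for the final count: localized values are restored
# when the match ends, so the result has to be copied out on success.
our $lines_seen;

my $text = "one\ntwo\nthree\n";

my $matched = $text =~ m{
    \A
    (?:
        [^\n]*+ \n                               # one complete line
        (?{ local $linenum = $linenum + 1; })    # backtracking-safe increment
    )*+
    \z
    (?{ $lines_seen = $linenum; })               # copy the count out on success
}x;

print "counted $lines_seen line(s)\n" if $matched;
```

       The same final copying step could be appended to a top-level pattern that uses a line-counting grammar.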

   Comparison with PPI
       The PPI and PPR::X modules can both identify valid Perl code, but they do so in very different ways,  and
       are optimal for different purposes.

       PPI  scans an entire Perl document and builds a hierarchical representation of the various components. It
       is therefore suitable for recognition, validation, partial extraction,  and  in-place  transformation  of
       Perl code.

       PPR::X  matches  only as much of a Perl document as specified by the regex you create, and does not build
       any hierarchical representation of the various components  it  matches.  It  is  therefore  suitable  for
       recognition  and  validation of Perl code. However, unless great care is taken, PPR::X is not as reliable
       as PPI for extractions or transformations of components smaller than a single statement.

       On the other hand, PPI always has to parse its entire input, and build a complete non-trivial nested data
       structure for it, before it can be used to recognize or validate any component. So it  is  almost  always
       significantly slower and more complicated than PPR::X for those kinds of tasks.

       For  example, to determine whether an input string begins with a valid Perl block, PPI requires something
       like:

           if (my $document = PPI::Document->new(\$input_string) ) {
               my $block = $document->schild(0)->schild(0);
               if ($block->isa('PPI::Structure::Block')) {
                   $block->remove;
                   process_block($block);
                   process_extra($document);
               }
           }

       whereas PPR::X needs just:

           if ($input_string =~ m{ \A (?&PerlOWS) ((?&PerlBlock)) (.*)  $PPR::X::GRAMMAR }xs) {
               process_block($1);
               process_extra($2);
           }

       Moreover, the PPR::X version will be at least twice as  fast  at  recognizing  that  leading  block  (and
       usually  four  to seven times faster)...mainly because it doesn't have to parse the trailing code at all,
       nor build any representation of its hierarchical structure.

       As a simple rule of thumb, when you only need to quickly detect, identify, or confirm valid Perl (or just
       a single valid Perl component), use PPR::X.  When you  need  to  examine,  traverse,  or  manipulate  the
       internal structure or component relationships within an entire Perl document, use PPI.

DIAGNOSTICS

"Warning: This program is running under Perl 5.20..."
Due to an unsolved issue with that particular release of Perl, the single regex in the PPR::X module
takes a ridiculously long time to compile under Perl 5.20 (i.e. minutes, not milliseconds).

The code will work correctly when it eventually does compile, but the start-up delay is so extreme
that the module issues this warning, to reassure users that something is actually happening, and
explain why it's happening so slowly.

The only remedy at present is to use an older or newer version of Perl.

For all the gory details, see: <https://rt.perl.org/Public/Bug/Display.html?id=122283> and
<https://rt.perl.org/Public/Bug/Display.html?id=122890>

"PPR::X::decomment() does not work under Perl 5.14"
There is a separate bug in the Perl 5.14 regex engine that prevents the decomment() subroutine from
correctly detecting the location of comments.

The subroutine throws an exception if you attempt to call it when running under Perl 5.14
specifically.

The module has no other diagnostics, apart from those Perl provides for all regular expressions.

The commonest error is to forget to add $PPR::X::GRAMMAR to a regex, in which case you will get a
standard Perl error message such as:

    Reference to nonexistent named group in regex;
    marked by <-- HERE in m/

    (?&PerlDocument <-- HERE )

    / at example.pl line 42.

Adding $PPR::X::GRAMMAR at the end of the regex solves the problem.
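
When a pattern is assembled at run time, this error can be trapped rather than left to terminate the
program. The following is a minimal sketch, assuming the pattern arrives as a string; $pattern_source
is a hypothetical variable, and the rule definitions are deliberately omitted to trigger the error.

```perl
use strict;
use warnings;

# Build the pattern from a string so that it is compiled at run time,
# where the resulting exception can be trapped with eval.
my $pattern_source = '\A (?&PerlDocument) \z';    # rule definitions deliberately omitted

my $regex = eval { qr{$pattern_source}x };
my $error = $@;

if (!defined $regex && $error =~ /nonexistent named group/) {
    print "The rule definitions are missing from the regex\n";
}
```

A regex written as a literal in the source is compiled when the program itself is compiled, so in that
case the error appears at start-up instead.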

CONFIGURATION AND ENVIRONMENT

       PPR::X requires no configuration files or environment variables.

DEPENDENCIES

       Requires Perl 5.10 or later.

INCOMPATIBILITIES

       None reported.

LIMITATIONS

This module works under all versions of Perl from 5.10 onwards.

However, the latest release of Perl 5.20 seems to have significant difficulties compiling large regular
expressions, and typically requires over a minute to build any regex that incorporates the
$PPR::X::GRAMMAR rule definitions.

The problem does not occur in Perl 5.10 to 5.18, nor in Perl 5.22 or later, though the parser is still
measurably slower in all Perl versions greater than 5.20 (presumably because most regexes are measurably
slower in more modern versions of Perl; such is the price of full re-entrancy and safe lexical scoping).

The decomment() subroutine trips a separate regex engine bug in Perl 5.14 only and will not run under
that version.

There was a lingering bug in regex re-interpolation between Perl 5.18 and 5.28, which means that
interpolating a PPR::X grammar (or any other precompiled regex that uses the "(??{...})" construct) into
another regex sometimes does not work. In these cases, the spurious error message generated is usually:
Sequence (?_...) not recognized. This problem is unlikely ever to be resolved, as those versions of Perl
are no longer being maintained. The only known workaround is to upgrade to Perl 5.30 or later.
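
The version-specific caveats above can be checked programmatically against Perl's numeric version
variable $]. The helper below is a hypothetical sketch (ppr_caveats_for() is not part of PPR::X), and
its version boundaries are one reading of the ranges described in this section.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PPR::X): list the caveats described in this
# section that apply to a given Perl version, in the numeric form used by $].
sub ppr_caveats_for {
    my ($version) = @_;
    $version = $] if !defined $version;    # default to the running Perl

    my @caveats;
    push @caveats, 'decomment() will not run'
        if $version >= 5.014 && $version < 5.016;
    push @caveats, 'interpolating the grammar into another regex may fail'
        if $version >= 5.018 && $version < 5.030;
    push @caveats, 'regexes using the grammar compile very slowly'
        if $version >= 5.020 && $version < 5.022;

    return @caveats;
}

# Report any caveats for the Perl that is running this script...
print "Caveat: $_\n" for ppr_caveats_for();
```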

There are also constructs in Perl 5 which cannot be parsed without actually executing some code...which
the regex does not attempt to do, for obvious reasons.

BUGS

       No bugs have been reported.

       Please report any bugs or feature requests to "bug-ppr@rt.cpan.org", or  through  the  web  interface  at
       <http://rt.cpan.org>.

AUTHOR

       Damian Conway  "<DCONWAY@CPAN.org>"

LICENCE AND COPYRIGHT

       Copyright (c) 2017, Damian Conway "<DCONWAY@CPAN.org>". All rights reserved.

       This  module  is  free  software;  you  can redistribute it and/or modify it under the same terms as Perl
       itself. See perlartistic.

DISCLAIMER OF WARRANTY

       BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE,  TO  THE  EXTENT
       PERMITTED  BY  APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
       PARTIES PROVIDE THE SOFTWARE "AS  IS"  WITHOUT  WARRANTY  OF  ANY  KIND,  EITHER  EXPRESSED  OR  IMPLIED,
       INCLUDING,  BUT  NOT  LIMITED  TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
       PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF  THE  SOFTWARE  IS  WITH  YOU.  SHOULD  THE
       SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

       IN  NO  EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY
       OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE  LIABLE
       TO  YOU  FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF
       THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT  LIMITED  TO  LOSS  OF  DATA  OR  DATA  BEING
       RENDERED  INACCURATE  OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE
       WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF  SUCH
       DAMAGES.

perl v5.40.0                                       2024-10-11                                        PPR::X(3pm)