Provided by: golf_601.4.41-1_amd64 bug

NAME

       match-regex -  (regex)

PURPOSE

       Find, or find and replace patterns in strings using regex (regular expressions).

SYNTAX

           match-regex <pattern> in <target> \
               [ \
                   ( replace-with <replace pattern> \
                       result <result> \
                       [ status <status> ] )  \
               | \
                   ( status <status> \
                       [ case-insensitive [ <case-insensitive> ] ] \
                       [ single-match [ <single-match> ] ] \
                       [ utf [ <utf> ] ] ) \
               ] \
               [ cache ] \
               [ clear-cache <clear cache> )

DESCRIPTION

       match-regex  searches  <target>  string  for  regex  <pattern>.  If  "replace-with"  is  specified,  then
       instance(s) of <pattern> in <target> are replaced with <replace pattern> string, and the result is stored
       in <result> string.

       The number of found or found/replaced patterns can be obtained in number <status> variable  (in  "status"
       clause).

       If  "replace-with"  is  not  specified, then the number of matched <pattern>s within <target> is given in
       <status> number, which in this case must be specified.

       If "case-insensitive" is used without  boolean  variable  <case-insensitive>,  or  if  <case-insensitive>
       evaluates to true, then searching for pattern is case insensitive. By default, it is case sensitive.

       If "single-match" is specified without boolean variable <single-match>, or if <single-match> evaluates to
       true,  then  only  the  very  first  occurrence  of  <pattern>  in  <target> is processed. Otherwise, all
       occurrences are processed.

       If "utf" is used, then the pattern itself and all data strings used for matching  are  treated  as  UTF-8
       strings.

       <result> and <status> variables can be created within the statement.

       If  the  pattern  is bad (i.e. <pattern> is not a correct regular expression), Golf will error out with a
       message.

       PCRE2 AND POSIX REGEX

       By default, PCRE2 regex syntax (Perl-compatible Regular Expressions v2) is used. To  use  extended  regex
       syntax  (Posix  ERE),  specify  "--posix-regex" when building your application with gg. See more below in
       Limitations about differences.

       CACHING AND PERFORMANCE

       If "cache" clause is used, then regex compilation of <pattern> will be  done  only  once  and  saved  for
       future  use. There is a significant performance benefit when match-regex executes repeatedly with "cache"
       (such as in case of web applications or in any kind of loop).  If  <pattern>  changes  and  you  need  to
       recompile  it  once  in  a while, use "clear-cache" clause. <clear cache> is a "bool" variable; the regex
       cache is cleared if it is true, and stays if it is false. For example:

           set-bool cl_c
           if-true q equal 0
               set-bool cl_c = true
           end-if
           match-regex ps in look_in replace-with "Yes it is \e1!" result res cache clear-cache cl_c

       In this case, when "q" is 0, cache will be cleared, and the pattern in variable "ps" will be  recompiled.
       In all other cases, the last computed regex stays the same.

       While  every pattern is different, when using cache, even a relatively small pattern was seen in tests to
       speed up the match-regex by about 500%, or 5x faster. Use cache whenever possible as  it  brings  parsing
       performance close to its theoretical limits.

       SUBEXPRESSIONS AND BACK-REFERENCING

       Subexpressions  are  referenced  via  a  backslash  followed  by a number. Because in strings a backslash
       followed by a number is an octal number, you must use double backslash (\e). For example:

           match-regex "(good).*(day)" \
               in "Hello, good day!" \
               replace-with "\e2 \e1" \
               result res

       will produce string "Hello, day good!" as a result in "res" variable. Each  subexpression  is  within  ()
       parenthesis,  so for instance "day" in the above pattern is the 2nd subexpression, and is back-referenced
       as \e2 in replacement.

       There can be a maximum of 23 subexpressions.

       Note that backreferences to non-existing subexpressions are ignored - for example \e4 when there are only
       3 subexpressions. Golf is "smart" about using two digits and differentiating between  \e1  and  \e10  for
       instance  -  it  takes into account the actual number of subexpressions and their validity, and selects a
       proper subexpression even when two digits are used in a backreference.

       LOOKAHEADS AND LOOKBEHINDS

       match-regex supports  syntax  for  lookaheads  (i.e.  "(?=...)"  and  "(?!...)")  and  lookbehinds  (i.e.
       "(?<=...)"  and  "(?<!...)").  See  PCRE2  pattern matching for more details. For instance, the following
       matches "bar" only if preceded by "foo":

           match-regex "\ew*(?<=foo)bar" in "foobar" status st single-match

       and the following matches "foo" if followed by "bar":

           match-regex "\ew*foo(?=bar)" in "foofoo" status st single-match

       LIMITATIONS

       If you are using older versions of PCRE2 (10.36 or earlier) such as by default in Debian 10 or Ubuntu 18,
       then instead of PCRE2, an Extended Regex Expressions (ERE) from the   built-in  Linux  regex  library  is
       used,  due to older PCRE2 versions having name conflicts with other libraries. In this case, "utf" clause
       will have no effect, and lookaheads/lookbehinds functionality will not work; possibly  a  few  others  as
       well,  however  for the most part the two are compatible. If your system uses an older PCRE2 library, you
       can upgrade to 10.37 or later to use PCRE2.

       In any case, if you need to use Posix ERE instead of PCRE2 (for compatibility, to reduce memory footprint
       or some other reason), you can use "--posix-regex" option of gg; the same  limitations  as  above  apply.
       Note that PCRE2 (the default) is generally faster than ERE.

EXAMPLES

       Use match-regex statement to find out if a string matches a pattern, for example:

           match-regex "SOME (.*) IS (.*)" in "WOW SOME THING IS GOOD HERE" status st

       In  this  case, the first parameter ("SOME (.*) IS (.*)") is a pattern and you're matching it with string
       ("WOW SOME THING IS GOOD HERE"). Since there is a match, status variable (defined on the fly as  integer)
       "st"  will  be  "1"  (meaning  one  match  was found) - in general it will contain the number of patterns
       matched.

       Search for patterns and replace them by using replace-with clause, for example:

           match-regex "SOME (.*) IS ([^ ]+)" in "WOW SOME THING IS GOOD HERE FOR SURE" replace-with "THINGS ARE \e2 YES!" result res status st

       In this case, the result from replacement will be in a new  string  variable  "res"  specified  with  the
       result clause, and it will be

           WOW THINGS ARE GOOD YES! HERE FOR SURE

       The  above  demonstrates  a  typical  use  of subexpressions in regex (meaning "()" statements) and their
       referencing with "\e1", "\e2" etc. in the order in which they appear.  Consult  regex  documentation  for
       more  information.   Status variable specified with status clause ("st" in this example) will contain the
       number of patterns matched and replaced.

       Matching is by default case sensitive. Use "case-insensitive" clause to change it  to  case  insensitive,
       for instance:

           match-regex "SOME (.*) IS (.*)" in "WOW some THING IS GOOD HERE" status st case-insensitive

       In  the  above  case, the pattern would not be found without "case-insensitive" clause because "SOME" and
       "some" would not match. This clause works the same in matching-only as well as replacing strings.

       If you want to match only the first occurrence of a pattern, use "single-match" option:

           match-regex "SOME ([^ ]+) IS ([^ ]+)" in "WOW SOME THING IS GOOD HERE AND SOME STUFF IS GOOD TOO" status st single-match

       In this case there would be two matches by default ("SOME THING IS GOOD" and "SOME STUFF  IS  GOOD")  but
       only  the  first  would  be  found.  This  clause  works  the same for replacing as well - only the first
       occurrence would be replaced.

SEE ALSO

        Regex

       match-regex See all documentation

$DATE                                               $VERSION                                           GOLF(2gg)