Provided by: tcllib_2.0+dfsg-4_all

NAME

       grammar::fa::op - Operations on finite automatons

SYNOPSIS

       package require Tcl 8.5 9

       package require snit

       package require struct::list

       package require struct::set

       package require grammar::fa::op ?0.4.2?

       ::grammar::fa::op::constructor cmd

       ::grammar::fa::op::reverse fa

       ::grammar::fa::op::complete fa ?sink?

       ::grammar::fa::op::remove_eps fa

       ::grammar::fa::op::trim fa ?what?

       ::grammar::fa::op::determinize fa ?mapvar?

       ::grammar::fa::op::minimize fa ?mapvar?

       ::grammar::fa::op::complement fa

       ::grammar::fa::op::kleene fa

       ::grammar::fa::op::optional fa

       ::grammar::fa::op::union fa fb ?mapvar?

       ::grammar::fa::op::intersect fa fb ?mapvar?

       ::grammar::fa::op::difference fa fb ?mapvar?

       ::grammar::fa::op::concatenate fa fb ?mapvar?

       ::grammar::fa::op::fromRegex fa regex ?over?

       ::grammar::fa::op::toRegexp fa

       ::grammar::fa::op::toRegexp2 fa

       ::grammar::fa::op::toTclRegexp regexp symdict

       ::grammar::fa::op::simplifyRegexp regexp

________________________________________________________________________________________________________________

DESCRIPTION

       This package provides a number of complex operations on finite automatons (Short: FA), as provided by the
       package  grammar::fa.  The package does not provide the ability to create and/or manipulate such FAs, nor
       the  ability  to  execute  a  FA  for  a  stream  of  symbols.   Use   the   packages   grammar::fa   and
       grammar::fa::interpreter  for that.  Another package related to this is grammar::fa::compiler which turns
       a FA into an executor class which has the definition of the FA hardwired into it.

       For more information about  what  a  finite  automaton  is  see  section  FINITE  AUTOMATONS  in  package
       grammar::fa.

API

       The  package  exports the API described here.  All commands modify their first argument. I.e. whatever FA
       they compute is stored back into it. Some of the operations will construct an automaton whose states  are
       all  new,  but  related to the states in the source automaton(s). These operations take variable names as
       optional arguments where they will store mappings which describe the relationship(s).  The operations can
       be loosely partitioned into structural and language operations. The latter are defined in  terms  of  the
       language the automaton(s) accept, whereas the former are defined in terms of the structural properties of
       the involved automaton(s). Some operations are both.

   Structure operations

       ::grammar::fa::op::constructor cmd
              This command has to be called by the user of the package before any other operation is performed,
              to  establish  a command which can be used to construct a FA container object. If this is not done
              several operations will fail as they are unable to construct internal and transient containers  to
              hold state and/or partial results.

              Any  container class using this package for complex operations should set its own class command as
              the constructor. See package grammar::fa for an example.
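
              A minimal sketch of the setup described above, assuming the container class grammar::fa is
              used. That class is expected to register itself as the constructor when it is loaded, so the
              explicit call is shown here only for illustration. The FA name "a" is hypothetical.

```tcl
package require grammar::fa        ;# container class providing the FA objects
package require grammar::fa::op

# Register the command used to build internal, transient containers.
# Assumption: grammar::fa already does this on load; repeating it is harmless.
::grammar::fa::op::constructor ::grammar::fa

# fromRegex builds its result in a transient container, so it depends on
# the constructor registered above.
::grammar::fa a
::grammar::fa::op::fromRegex a {. {S x} {* {S y}}}

puts [lsort [a symbols]]           ;# prints: x y
```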

       ::grammar::fa::op::reverse fa
              Reverses the fa. This is done by reversing the direction of all transitions and swapping the  sets
              of start and final states. The language of fa changes as a result: the reversed fa accepts the
              reversal of each word accepted by the original fa.

       ::grammar::fa::op::complete fa ?sink?
              Makes the fa complete, but nothing is done if the fa is already complete. This implies that
              only the first in a series  of  multiple  consecutive  complete  operations  on  fa  will  perform
              anything. The remainder will be null operations.

              The language of fa is unchanged by this operation.

              This is done by adding a single new state, the sink, and transitions from all other states to that
              sink for all symbols they have no transitions for. The sink itself is made complete by adding loop
              transitions for all symbols.

              Note: When a FA has epsilon-transitions, transitions over a symbol for a state S can be indirect,
              i.e. not attached directly to S, but to a state in the epsilon-closure of S. The symbols for  such
              indirect  transitions  count  when  computing  completeness  of  a  state.  In  other words, these
              indirectly reached symbols are not missing.

              The argument sink provides the name for the new state and must not be present in the fa if
              specified. If the name is not specified the command will name the state "sinkn", where n is set so
              that there are no collisions with existing states.

              Note  that  the  sink  state  is  not  useful by definition.  In other words, while the FA becomes
              complete, it is also not useful in the strict sense as it has a state from which  no  final  state
              can be reached.
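
              The effect can be sketched on a small hand-built automaton. The state names "s0", "s1" and
              the sink name "dead" are hypothetical, and the FA is built with the methods of the container
              class grammar::fa.

```tcl
package require grammar::fa
package require grammar::fa::op

# Two states over the symbols x and y, with a single transition s0 -x-> s1.
# Neither state has a transition for every symbol, so the FA is incomplete.
::grammar::fa a
a state add s0
a state add s1
a symbol add x
a symbol add y
a start add s0
a final add s1
a next s0 x --> s1

# Complete the FA, naming the sink explicitly.
::grammar::fa::op::complete a dead

puts [a state exists dead]   ;# true: the sink was added
puts [a is complete]         ;# true: every state now handles x and y
```

              Calling the command a second time on the same FA should then be a null operation, as stated
              above.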

       ::grammar::fa::op::remove_eps fa
              Removes all epsilon-transitions from the fa in such a manner that the language of fa is unchanged.
              However nothing is done if the fa is already epsilon-free. This implies that only the first in a
              series of multiple consecutive remove_eps operations on fa will perform anything. The remainder
              will be null operations.

              Note:  This  operation  may cause states to become unreachable or not useful. These states are not
              removed by this operation.  Use ::grammar::fa::op::trim for that instead.

       ::grammar::fa::op::trim fa ?what?
              Removes unwanted baggage from fa.  The legal  values  for  what  are  listed  below.  The  command
              defaults to !reachable|!useful if no specific argument was given.

              !reachable
                     Removes all states which are not reachable from a start state.

              !useful
                     Removes all states which are unable to reach a final state.

              !reachable&!useful

              !(reachable|useful)
                     Removes  all  states  which  are not reachable from a start state and are unable to reach a
                     final state.

              !reachable|!useful

              !(reachable&useful)
                     Removes all states which are not reachable from a start state or  are  unable  to  reach  a
                     final state.
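
              As an illustration of the what argument, the following sketch removes only the unreachable
              states. The state names are hypothetical; "orphan" can reach the final state but cannot be
              reached from the start state.

```tcl
package require grammar::fa
package require grammar::fa::op

::grammar::fa a
a state add s0
a state add s1
a state add orphan
a symbol add x
a start add s0
a final add s1
a next s0 x --> s1
a next orphan x --> s1       ;# orphan is useful, but not reachable

# Drop everything that cannot be reached from a start state.
::grammar::fa::op::trim a !reachable

puts [lsort [a states]]      ;# prints: s0 s1
```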

       ::grammar::fa::op::determinize fa ?mapvar?
              Makes  the  fa  deterministic without changing the language accepted by the fa. However nothing is
              done if the fa is already deterministic. This implies that only the first in a series of multiple
              consecutive determinize operations on fa will perform anything. The remainder will be null
              operations.

              The command will store a dictionary describing the relationship between  the  new  states  of  the
              resulting  dfa  and  the  states of the input nfa in mapvar, if it has been specified. Keys of the
              dictionary are the handles for the states of the resulting dfa, values are sets of states from the
              input nfa.

              Note: An empty dictionary signals that the command was able to make the fa  deterministic  without
              performing  a  full  subset construction, just by removing states and shuffling transitions around
              (As part of making the FA epsilon-free).

              Note: The algorithm fails to make the FA deterministic in the technical sense if  the  FA  has  no
              start state(s), because determinism requires the FA to have exactly one start state. In that
              situation we make a best effort; and the missing start state will be the only condition preventing
              the generated result from being deterministic.  It should also be noted  that  in  this  case  the
              possibilities  for  trimming  states  from  the  FA are also severely reduced as we cannot declare
              states unreachable.
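
              A short sketch of a call with mapvar, assuming the container class grammar::fa. The variable
              name "map" is hypothetical. The input is generated from a regular expression with two
              alternatives sharing the prefix x, which is a typical source of non-determinism.

```tcl
package require grammar::fa
package require grammar::fa::op

::grammar::fa a
::grammar::fa::op::fromRegex a {| {. {S x} {S y}} {. {S x} {S z}}}

set map {}                               ;# receives the state mapping
::grammar::fa::op::determinize a map

puts [a is deterministic]

# Each key is a state of the result, each value a set of input states.
# The dictionary may be empty, see the note above.
dict for {new olds} $map {
    puts "$new <- $olds"
}
```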

       ::grammar::fa::op::minimize fa ?mapvar?
              Creates a FA which accepts the same language as fa, but has  a  minimal  number  of  states.  Uses
              Brzozowski's method to accomplish this.

              The  command  will  store  a  dictionary describing the relationship between the new states of the
              resulting minimal fa and the states of the input fa in mapvar, if it has been specified.  Keys  of
              the  dictionary  are  the  handles  for the states of the resulting minimal fa, values are sets of
              states from the input fa.

              Note: An empty dictionary signals that the command was able to minimize the fa without  having  to
              compute new states. This should happen if and only if the input FA was already minimal.

              Note:  If  the  algorithm  has  no  start  or  final  states to work with then the result might be
              technically minimal, but have a very unexpected structure.  It should also be noted that  in  this
              case  the  possibilities  for  trimming  states from the FA are also severely reduced as we cannot
              declare states unreachable.
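
              The following sketch minimizes an automaton generated from a deliberately redundant regular
              expression (the same alternative twice). It assumes the container class grammar::fa; the
              variable names are hypothetical.

```tcl
package require grammar::fa
package require grammar::fa::op

::grammar::fa a
::grammar::fa::op::fromRegex a {| {. {S x} {S y}} {. {S x} {S y}}}

set before [llength [a states]]
set map {}
::grammar::fa::op::minimize a map
set after [llength [a states]]

puts "$before states before, $after states after"
puts [a is deterministic]
```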

   Language operations

       All operations in this section require that all input FAs have at least one start and
       at least one final state. Otherwise the language of the FAs will not be  defined,  making  the  operation
       senseless (as it operates on the languages of the FAs in a defined manner).

       ::grammar::fa::op::complement fa
              Complements fa. This is possible if and only if fa is complete and deterministic. The resulting FA
              accepts the complementary language of fa. In other words, all inputs not accepted by the input are
              accepted by the result, and vice versa.

              The result will have all states and transitions of the input, and different final states.
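
              Because of the precondition, a typical call sequence first determinizes and completes the
              automaton. The sketch below assumes the container class grammar::fa and uses the symbol set
              {x y} as an example.

```tcl
package require grammar::fa
package require grammar::fa::op

# An FA accepting just the word "x", defined over the symbols x and y.
::grammar::fa a
::grammar::fa::op::fromRegex a {S x} {x y}

# Establish the preconditions of complement.
::grammar::fa::op::determinize a
::grammar::fa::op::complete a

set oldfinal [a finalstates]
::grammar::fa::op::complement a

puts "final states before: $oldfinal"
puts "final states after:  [a finalstates]"
```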

       ::grammar::fa::op::kleene fa
              Applies  Kleene's  closure  to fa.  The resulting FA accepts all strings S for which we can find a
              natural number n (0 inclusive) and strings A1 ... An in the language of fa  such  that  S  is  the
              concatenation of A1 ... An.  In other words, the language of the result is the infinite union over
              finite length concatenations over the language of fa.

              The result will have all states and transitions of the input, and new start and final states.

       ::grammar::fa::op::optional fa
              Makes the fa optional. In other words it computes the FA which accepts the language of fa and the
              empty word (epsilon) as well.

              The result will have all states and transitions of the input, and new start and final states.

       ::grammar::fa::op::union fa fb ?mapvar?
              Combines the FAs fa and fb such that the resulting FA accepts the union of the  languages  of  the
              two FAs.

              The  result  will  have  all  states and transitions of the two input FAs, and new start and final
              states. All states of fb which exist in fa as well will be renamed, and the mapvar will contain  a
              mapping from the old states of fb to the new ones, if present.

              It  should  be  noted  that  the  result  will  be  non-deterministic,  even  if  the  inputs  are
              deterministic.
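
              A small sketch of a union, assuming the container class grammar::fa. Both inputs are
              generated from regular expressions, so their state names are likely to collide and mapvar
              (here the hypothetical variable "map") is likely to report the renamed states of fb.

```tcl
package require grammar::fa
package require grammar::fa::op

::grammar::fa a
::grammar::fa b
::grammar::fa::op::fromRegex a {S x}
::grammar::fa::op::fromRegex b {S y}

set map {}
::grammar::fa::op::union a b map     ;# the result is stored back into a

puts [lsort [a symbols]]             ;# prints: x y
puts [a is deterministic]            ;# false, as noted above
puts $map                            ;# old state of b -> new state in a
```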

       ::grammar::fa::op::intersect fa fb ?mapvar?
              Combines the FAs fa and fb such that the resulting FA accepts the intersection of the languages of
              the two FAs. In other words, the result will accept a word if and only if the word is accepted  by
              both fa and fb. The result will be useful, but not necessarily deterministic or minimal.

              The  command  will  store  a  dictionary describing the relationship between the new states of the
              resulting fa and the pairs of states of the input FAs in mapvar, if it has been specified. Keys of
              the dictionary are the handles for the states of the resulting fa, values are pairs of states from
              the input FAs. Pairs are represented by lists. The first element in each pair will be a  state  in
              fa, the second element will be drawn from fb.

       ::grammar::fa::op::difference fa fb ?mapvar?
              Combines  the  FAs fa and fb such that the resulting FA accepts the difference of the languages of
              the two FAs. In other words, the result will accept a word if and only if the word is accepted  by
              fa, but not by fb. This can also be expressed as the intersection of fa with the complement of fb.
              The result will be useful, but not necessarily deterministic or minimal.

              The  command  will  store  a  dictionary describing the relationship between the new states of the
              resulting fa and the pairs of states of the input FAs in mapvar, if it has been specified. Keys of
              the dictionary are the handles for the states of the resulting fa, values are pairs of states from
              the input FAs. Pairs are represented by lists. The first element in each pair will be a  state  in
              fa, the second element will be drawn from fb.

       ::grammar::fa::op::concatenate fa fb ?mapvar?
              Combines the FAs fa and fb such that the resulting FA accepts the concatenation of the languages
              of the two FAs. I.e. a word W will be accepted by the result if there are two words A and B
              accepted by fa and fb respectively, and W is the concatenation of A and B.

              The result FA will be non-deterministic.

       ::grammar::fa::op::fromRegex fa regex ?over?
              Generates  a non-deterministic FA which accepts the same language as the regular expression regex.
              If over is specified it is treated as the set of symbols the regular expression and the
              automaton  are  defined  over. The command will compute the set from the "S" constructors in regex
              when over was not specified. This set is important if and only if the complement operator  "!"  is
              used  in  regex  as  the  complementary language of an FA is quite different for different sets of
              symbols.

              The regular expression is represented by a nested list, which forms a syntax tree.  The  following
              structures are legal:

              {S x}  Atomic  regular  expression.  Everything else is constructed from these. Accepts the Symbol
                     "x".

              {. A1 A2 ...}
                     Concatenation operator. Accepts the concatenation of the regular expressions A1, A2, etc.

                     Note that this operator accepts zero or more arguments. With zero arguments the represented
                     language is epsilon, the empty word.

              {| A1 A2 ...}
                     Choice operator, also called "Alternative". Accepts all input accepted by at least  one  of
                     the regular expressions A1, A2, etc. In other words, the union of A1, A2.

                     Note that this operator accepts zero or more arguments. With zero arguments the represented
                     language is the empty language, the language without words.

              {& A1 A2 ...}
                     Intersection operator, logical and. Accepts all input which is accepted by all of
                     the regular expressions A1, A2, etc. In other words, the intersection of A1, A2.

              {? A}  Optionality operator. Accepts the empty word and anything from the regular expression A.

              {* A}  Kleene closure. Accepts the empty word and any finite concatenation of  words  accepted  by
                     the regular expression A.

              {+ A}  Positive  Kleene closure. Accepts any finite concatenation of words accepted by the regular
                     expression A, but not the empty word.

              {! A}  Complement operator. Accepts any word not accepted by the regular expression A.  Note  that
                     the complement depends on the set of symbols the result should run over. See the discussion
                     of the argument over before.
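
              To illustrate the nested list format, the expression usually written as (x|y)*z can be
              encoded as shown in the sketch below. The symbols and the FA name are hypothetical, and the
              container class grammar::fa is assumed.

```tcl
package require grammar::fa
package require grammar::fa::op

# (x|y)*z as a syntax tree:
#   concatenation of
#     the Kleene closure of the choice between x and y
#     and the symbol z
set regex {. {* {| {S x} {S y}}} {S z}}

::grammar::fa a
::grammar::fa::op::fromRegex a $regex

# No "over" was given, so the symbol set is computed from the S nodes.
puts [lsort [a symbols]]     ;# prints: x y z
```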

       ::grammar::fa::op::toRegexp fa
              This command generates and returns a regular expression which accepts the  same  language  as  the
              finite   automaton  fa.  The  regular  expression  is  in  the  format  as  described  above,  for
              ::grammar::fa::op::fromRegex.

       ::grammar::fa::op::toRegexp2 fa
              This command has the same functionality  as  ::grammar::fa::op::toRegexp,  but  uses  a  different
              algorithm to simplify the generated regular expressions.
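
              A minimal sketch of the conversion, assuming the container class grammar::fa and a
              hand-built automaton with hypothetical state names. The exact shape of the returned
              expression depends on the algorithm, so the sketch only prints it.

```tcl
package require grammar::fa
package require grammar::fa::op

# s0 -x-> s1, with s0 as start state and s1 as final state.
::grammar::fa a
a state add s0
a state add s1
a symbol add x
a start add s0
a final add s1
a next s0 x --> s1

# The result is a nested list in the format accepted by fromRegex.
set re [::grammar::fa::op::toRegexp a]
puts $re
```

              ::grammar::fa::op::toRegexp2 takes the same argument and can be called in the same way.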

       ::grammar::fa::op::toTclRegexp regexp symdict
              This  command  generates and returns a regular expression in Tcl syntax for the regular expression
              regexp,   if   that   is   possible.   regexp   is   in   the   same   format   as   expected   by
              ::grammar::fa::op::fromRegex.

              The command will fail and throw an error if regexp contains complementation or intersection
              operations.

              The argument symdict is a dictionary mapping symbol names to pairs  of  syntactic  type  and  Tcl-
              regexp. If a symbol occurring in the regexp is not listed in this dictionary then single-character
              symbols  are  considered to designate themselves whereas multiple-character symbols are considered
              to be a character class name.
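
              The following sketch converts a small expression using an empty symdict, so that the
              single-character symbols "a" and "b" designate themselves as described above. Passing an
              empty dictionary is an assumption based on that description. The exact text of the
              generated expression is not specified here, so the sketch only checks it against a sample
              word.

```tcl
package require grammar::fa::op

# "a" followed by any number of "b", in the nested list format.
set re {. {S a} {* {S b}}}

# Empty symdict: no symbol has an explicit translation.
set tclre [::grammar::fa::op::toTclRegexp $re {}]
puts $tclre

# Anchor the generated expression and try it on a word of the language.
puts [regexp -- "^(?:$tclre)\$" abb]
```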

       ::grammar::fa::op::simplifyRegexp regexp
              This command simplifies a regular expression by applying the following algorithm first to the main
              expression and then recursively to all sub-expressions:

              [1]    Convert the expression into a finite automaton.

              [2]    Minimize the automaton.

              [3]    Convert the automaton back to a regular expression.

              [4]    Choose the shorter of original expression and expression from the previous step.

EXAMPLES

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain bugs and  other  problems.   Please
       report  such  in  the  category grammar_fa of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments  can  be  made  by
       going  to the Edit form of the ticket immediately after its creation, and then using the left-most button
       in the secondary navigation bar.

KEYWORDS

       automaton, finite automaton, grammar, parsing, regular expression, regular  grammar,  regular  languages,
       state, transducer

CATEGORY

       Grammars and finite automata

COPYRIGHT

       Copyright (c) 2004-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                                 0.4                                 grammar::fa::op(3tcl)