Ubuntu Manpage: xsxp - eXtremely Simple Xml Parser

NAME

       xsxp - eXtremely Simple Xml Parser

SYNOPSIS

       package require Tcl 8.5 9

       package require xsxp 1.1

       package require xml

       xsxp::parse xml

       xsxp::fetch pxml path ?part?

       xsxp::fetchall pxml_list path ?part?

       xsxp::only pxml tagname

       xsxp::prettyprint pxml ?chan?

________________________________________________________________________________________________________________

DESCRIPTION

       This  package provides a simple interface to parse XML into a pure-value list.  It also provides accessor
       routines to pull out specific subtags, not unlike DOM access.  This package was written for and  is  used
       by Darren New's Amazon S3 access package.

       This is pretty lame, but I needed something like this for S3, and at the time, TclDOM would not work with
       the new 8.5 Tcl due to version number problems.

       In  addition,  this  is  a  pure-value  implementation. There is no garbage to clean up in the event of a
       thrown error, for example.  This simplifies the code for sufficiently small XML documents, which is  what
       Amazon's S3 guarantees.

       Copyright  2006 Darren New. All Rights Reserved.  NO WARRANTIES OF ANY TYPE ARE PROVIDED.  COPYING OR USE
       INDEMNIFIES THE AUTHOR IN ALL WAYS.  This software is licensed under essentially the same terms  as  Tcl.
       See LICENSE.txt for the terms.

COMMANDS

       The  package  implements  five  rather simple procedures.  One parses, one is for debugging, and the rest
       pull various parts of the parsed document out for processing.

       xsxp::parse xml
              This parses an XML document (using the standard xml tcllib module in a SAX sort of way) and builds
              a data structure which it returns if the parsing succeeded. The return value is referred to herein
              as a "pxml", or "parsed xml". The list consists of two or more elements:

              •      The first element is the name of the tag.

              •      The second element is an  array-get  formatted  list  of  key/value  pairs.  The  keys  are
                     attribute  names and the values are attribute values. This is an empty list if there are no
                     attributes on the tag.

              •      The third through end elements are the children  of  the  node,  if  any.  Each  child  is,
                     recursively, a pxml.

              •      Note that if the element at index 0, i.e. the tag name, is "%PCDATA", then  the  attributes
                     will  be  empty  and the third element will be the text of the element. In addition, if an
                     element's content consists only of PCDATA, it will have only one child, and all the PCDATA
                     will be concatenated. In other words, this parser works poorly for XML with  elements  that
                     contain  both child tags and PCDATA.  Since Amazon S3 does not do this (and for that matter
                     most uses of XML where XML is a poor choice don't do this), this is probably not a  serious
                     limitation.
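
              The shape described above can be illustrated with a hand-written value. This is only a sketch:
              the XML fragment in the comment, the tag names and the variable names are invented for
              illustration, and the literal is built by hand from the description rather than captured from
              xsxp::parse.

```tcl
# Hypothetical input, shown only as a comment:
#   <Owner id="42"><Name>alice</Name><Mail>a@example.org</Mail></Owner>
# A pxml of the documented shape, written by hand (assumed, not parser output):
set doc {Owner {id 42} {Name {} {%PCDATA {} alice}} {Mail {} {%PCDATA {} a@example.org}}}

set tag      [lindex $doc 0]      ;# tag name of the root
set attrs    [lindex $doc 1]      ;# array-get style key/value list
set children [lrange $doc 2 end]  ;# each child is itself a pxml

# The attribute list is key/value formatted, so dict can read it.
set id [dict get $attrs id]

# A text-only element holds a single %PCDATA child whose third element is the text.
set nameNode [lindex $children 0]
set nameText [lindex $nameNode 2 2]

puts "$tag id=$id name=$nameText"
```

              Because a pxml is an ordinary Tcl list, the standard list commands are enough to inspect it.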

       xsxp::fetch pxml path ?part?
              pxml  is  a  parsed  XML, as returned from xsxp::parse.  path is a list of element tag names. Each
              element is the name of a child to look up, optionally followed by a hash ("#")  and  a  string  of
              digits.  An  empty  list or an initial empty element selects pxml. If no hash sign is present, the
              behavior is as if "#0" had been appended to that element. (In addition  to  a  list,  slashes  can
              separate subparts where convenient.)

              An  element  of  path  scans  the children at the indicated level for the n'th instance of a child
              whose tag matches the part of the element before the hash  sign.  If  an  element  is  simply  "#"
              followed by digits, that indexed child is selected, regardless of the tags in the children. Hence,
              an element of "#3" will always select the fourth child of the node under consideration.
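
              The matching rule for a single path element can be sketched in plain Tcl. The helper stepChild
              below is hypothetical and is not part of xsxp; it only mirrors the rule as described here, and
              raising an error when nothing matches is an assumption of this sketch.

```tcl
# Hypothetical helper, not part of xsxp: resolve one path element
# such as "p#4", "b" or "#3" against the children of a pxml.
proc stepChild {pxml step} {
    set children [lrange $pxml 2 end]
    # Split "tag#n"; a missing "#n" behaves like "#0".
    set hash [string first "#" $step]
    if {$hash < 0} {
        set tag $step
        set n 0
    } else {
        set tag [string range $step 0 [expr {$hash - 1}]]
        set n   [string range $step [expr {$hash + 1}] end]
    }
    if {![string is digit -strict $n]} {
        error "bad index in path element \"$step\""
    }
    if {$tag eq ""} {
        # Bare "#n": plain index into the children, whatever their tags.
        set matches $children
    } else {
        # "tag#n": only children carrying that tag are counted.
        set matches {}
        foreach child $children {
            if {[lindex $child 0] eq $tag} {
                lappend matches $child
            }
        }
    }
    if {$n >= [llength $matches]} {
        error "no child matches path element \"$step\""
    }
    return [lindex $matches $n]
}

# Hand-written pxml used only for this illustration.
set body {body {} {h1 {} {%PCDATA {} Title}} {p {} {%PCDATA {} one}} {p {} {%PCDATA {} two}}}
puts [stepChild $body p]      ;# first child tagged p
puts [stepChild $body p#1]    ;# second child tagged p
puts [stepChild $body #0]     ;# first child of any tag, here the h1
```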

              part defaults to "%ALL". It can be one of the following case-sensitive terms:

              %ALL   returns the entire selected element.

              %TAGNAME
                     returns lindex 0 of the selected element.

              %ATTRIBUTES
                     returns lindex 1 of the selected element.

              %CHILDREN
                     returns lrange 2 through end of the selected element, resulting in a list of elements being
                     returned.

              %PCDATA
                     returns  a  concatenation  of  all  the bodies of direct children of this node whose tag is
                     %PCDATA.  It throws an error if no such children are found.  That is, part=%PCDATA returns
                     the textual content found directly in that node, but not the text of its child nodes.

              %PCDATA?
                     is like %PCDATA, but returns an empty string if no PCDATA is found.

       For example, to fetch the first bold text from the fifth paragraph of the body of your HTML file,

              xsxp::fetch $pxml {body p#4 b} %PCDATA
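
              The part selectors amount to simple list operations on the selected element. The helper
              selectPart below is hypothetical and is not the package's code; it is a minimal sketch of the
              selectors as listed above, applied to hand-written pxml values.

```tcl
# Hypothetical helper, not part of xsxp: apply a part selector to a pxml.
proc selectPart {pxml {part %ALL}} {
    switch -exact -- $part {
        %ALL        { return $pxml }
        %TAGNAME    { return [lindex $pxml 0] }
        %ATTRIBUTES { return [lindex $pxml 1] }
        %CHILDREN   { return [lrange $pxml 2 end] }
        %PCDATA - %PCDATA? {
            set text ""
            set found 0
            foreach child [lrange $pxml 2 end] {
                if {[lindex $child 0] eq "%PCDATA"} {
                    append text [lindex $child 2]
                    set found 1
                }
            }
            if {!$found && $part eq "%PCDATA"} {
                error "no %PCDATA children found"
            }
            return $text
        }
        default { error "unknown part \"$part\"" }
    }
}

# Hand-written pxml values used only for this illustration.
set b {b {class x} {%PCDATA {} bold} {i {} {%PCDATA {} nested}}}
puts [selectPart $b %TAGNAME]       ;# b
puts [selectPart $b %ATTRIBUTES]    ;# class x
puts [selectPart $b %PCDATA]        ;# bold (text of the nested i is not included)
puts [selectPart {hr {}} %PCDATA?]  ;# empty string instead of an error
```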

       xsxp::fetchall pxml_list path ?part?
              This  iterates over each pxml in pxml_list (which must be a list of pxmls), selects the indicated
              path from it, and returns a new list built from the selected data.

              For example, pxml_list might be the %CHILDREN of a particular element, and the path and part might
              select from each child a sub-element in which we're interested.

       xsxp::only pxml tagname
              This iterates over the direct children of pxml and selects only those with tagname as  their  tag.
              Returns a list of matching elements.
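
              Filtering children with xsxp::only and then mapping over the result with xsxp::fetchall is a
              natural combination. The sketch below spells out what the two descriptions amount to, using
              plain list commands on a hand-written pxml. The tag names Buckets, Bucket, Owner and Name are
              invented for illustration, and the loops are not the package's implementation.

```tcl
# Hand-written pxml of the documented shape (assumed, not parser output).
set buckets {Buckets {}
    {Bucket {} {Name {} {%PCDATA {} logs}}}
    {Owner  {} {Name {} {%PCDATA {} alice}}}
    {Bucket {} {Name {} {%PCDATA {} media}}}
}

# What the description of xsxp::only amounts to: keep direct children by tag.
set selected {}
foreach child [lrange $buckets 2 end] {
    if {[lindex $child 0] eq "Bucket"} {
        lappend selected $child
    }
}

# What a fetchall over that list with path Name and part %PCDATA amounts to:
# one result per pxml in the input list.
set names {}
foreach bucket $selected {
    set name [lindex $bucket 2]        ;# first child; in this data it is Name
    lappend names [lindex $name 2 2]   ;# text of its %PCDATA child
}
puts $names

# With the package loaded, the documented calls would be (not run in this sketch):
#   xsxp::fetchall [xsxp::only $buckets Bucket] Name %PCDATA
```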

       xsxp::prettyprint pxml ?chan?
              This outputs to chan (default stdout) a pretty-printed version of pxml.

BUGS, IDEAS, FEEDBACK

       This  document,  and  the package it describes, will undoubtedly contain bugs and other problems.  Please
       report such in the category amazon-s3  of  the  Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note  further  that  attachments  are strongly preferred over inlined patches. Attachments can be made by
       going to the Edit form of the ticket immediately after its creation, and then using the left-most  button
       in the secondary navigation bar.

KEYWORDS

       dom, parser, xml

COPYRIGHT

       2006 Darren New. All Rights Reserved.

tcllib                                                 1.1                                            xsxp(3tcl)
