Provided by: txt2pdbdoc_1.4.4-8_amd64

html2pdbtxt(1)                               General Commands Manual                              html2pdbtxt(1)

NAME

       html2pdbtxt - HTML to Doc Text converter for Palm Pilots

SYNOPSIS

       html2pdbtxt [ -bchars ] [ -ttitle ] [ -uURL ] file.html [ file.txt ]
       html2pdbtxt -v

DESCRIPTION

       html2pdbtxt converts HTML to text suitable for conversion to a Doc(4) file via txt2pdbdoc(1).  If no text
       filename is given, the generated text is sent to standard output.

   HTML Tags
       The  following HTML tags (and corresponding ending tags) are recognized: ADDRESS, A NAME, BLOCKQUOTE, BR,
       CENTER, DIV, DL, DT, H1, H2, H3, H4, H5, H6, OL, OPTION, PRE, P, SELECT, SCRIPT, STYLE, TABLE, TITLE, UL.
       In all cases, the most ``reasonable'' thing is done given the constraints of the Doc(4) format, which is
       essentially plain text.  ALT attributes (typically found in IMG tags) have their text extracted and
       placed between brackets [like this].  All other HTML tags are stripped.

   Character Entities
       Both named and numeric (decimal and hexadecimal) HTML character entity references are converted to their
       byte values in the ISO 8859-1 (Latin 1) character set so that they appear properly on the Pilot.  For
       example, ``r&eacute;sum&eacute;'' becomes ``résumé'', with each accented 'e' stored as a single byte.
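
       Because the generated text is ISO 8859-1, accented letters are single bytes and may look garbled on a
       UTF-8 terminal.  The following is a minimal sketch of how to inspect such text, assuming a POSIX shell
       with od(1) and iconv(1) available.  The function latin1_sample is hypothetical and merely stands in for
       real html2pdbtxt output.

```shell
# Hypothetical stand-in for generated text: in ISO 8859-1 the letter
# e-acute is the single byte 0xE9 (octal 351).
latin1_sample() { printf 'r\351sum\351\n'; }

# Dump the raw bytes; each accented e shows up as the one byte e9.
latin1_sample | od -An -tx1

# Convert to UTF-8 purely for viewing on a UTF-8 terminal.
latin1_sample | iconv -f ISO-8859-1 -t UTF-8
```

       The UTF-8 copy is only for viewing; the sketch assumes that the unconverted ISO 8859-1 text is what is
       handed on to txt2pdbdoc(1).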

   Document Title
       Unless specified with the -t option, the HTML file is scanned for  <TITLE>  ...  </TITLE>  tags  and,  if
       found, the title is extracted and put on line 1 of the generated file.

   Bookmarks
       Bookmarks are placed into the generated file wherever <A NAME="..."> tags are found in the HTML file.

OPTIONS

       -bchars   Specify the character sequence that is to serve as the bookmark indicator.  The default is (*).
                 (See the CAVEATS.)

       -ttitle   Specify  the title of the document that is to appear on line 1 of the generated file overriding
                 any title found inside the HTML file between <TITLE> ... </TITLE> tags.

       -uURL     Specify the URL the HTML file supposedly came from and put it on the line after the  title,  if
                 any, in the generated file.

       -v        Print the version number to standard output and exit.

EXAMPLE

       To convert an HTML file to Doc:

            html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
            txt2pdbdoc "$(head -n 1 alice.txt)" alice.txt alice.pdb

CAVEATS

       1.  Some Doc readers have a ``feature'' whereby, while scanning for bookmarks, they recognize the
           bookmark sequence of characters anywhere in the text and not just at the beginning of a line.

       2.  Some Doc readers do not allow the bookmark sequence to contain the > character since  they  interpret
           that as the sequence delimiter, e.g., <->> will be interpreted as the sequence being merely -.

       3.  Ordered lists (via the OL tag) are treated as unordered lists (like the UL tag).  Numbering the
           items would require parsing the HTML rather than performing simple substitutions, which would
           greatly complicate the code.
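
       For caveat 1, it can help to check the generated text for the bookmark sequence occurring in the middle
       of a line before converting it.  The following is a minimal sketch, assuming a POSIX shell and awk(1).
       The function name midline_bookmarks is hypothetical and not part of txt2pdbdoc.

```shell
# Report lines in which the bookmark sequence occurs anywhere other than
# at the very start of the line.
#   $1 = bookmark sequence, matched as a literal string (awk -v expands
#        backslash escapes, so avoid backslashes in the sequence)
#   $2 = generated text file
midline_bookmarks() {
    awk -v seq="$1" '{
        rest = $0
        # Ignore one occurrence at the start of the line, if present.
        if (index(rest, seq) == 1)
            rest = substr(rest, length(seq) + 1)
        # Anything left over is a mid-line occurrence.
        if (index(rest, seq) > 0)
            print NR ": " $0
    }' "$2"
}

# Usage, with the default sequence:
#   midline_bookmarks '(*)' alice.txt
```

       Each reported line is one that such a reader might treat as a bookmark.  If many lines are reported,
       choosing a rarer sequence with -b may be the simpler fix.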

SEE ALSO

       pdbtxt2html(1), txt2pdbdoc(1), doc(4), pdb(4)

       International Organization for Standardization.  ``ISO 8859-1: Information Processing -- 8-bit single-byte coded
       graphic character sets -- Part 1: Latin alphabet No. 1.''  1987.

       World Wide Web Consortium.  ``Character  entity  references  in  HTML  4.0.''   HTML  4.0  Specification,
       http://www.w3.org/

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

html2pdbtxt                                     January 21, 2005                                  html2pdbtxt(1)