Provided by: docknot_8.0.1-1_all

NAME

       App::DocKnot::Spin::Text - Convert some particular text formats into HTML

SYNOPSIS

           use App::DocKnot::Spin::Text;

           my $text = App::DocKnot::Spin::Text->new({style => '/styles/faq.css'});
           $text->spin_text_file('/path/to/input', '/path/to/output.html');

REQUIREMENTS

       Perl 5.24 or later and the modules List::SomeUtils, Path::Tiny, and Sort::Versions, available from CPAN.

DESCRIPTION

       This is another of that odd breed of partially functional beasts, a text to HTML converter.

       This is not truly possible in general; people do too many varied things with their text to intuit
       document structure from it.  This is therefore a converter that will translate documents written the way
       I write.  It may or may not work for you.  The chances that it will work for you are directly
       proportional to how much your writing looks like mine.

       App::DocKnot::Spin::Text understands digest separators (lines of exactly thirty hyphens, from the minimal
       digest standard) and will treat a "Subject" header immediately after them as a section header.  Beyond
       that, headings must either be outdented, underlined on the following line, or in all caps to be
       recognized as section headers.  (Outdenting means that the regular text is indented by a few spaces, but
       headers start in column 0, or at least in a column farther to the left than the regular text.)

       Section headers that begin with numbers (with any number of periods) will be given "<a id>" tags
       containing that number prepended with "S".  As a special case of the parsing, any section with a header
       containing "contents" will have lines beginning with numbers turned into links to the appropriate "<a id>"
       tags in the same document.  You can use this to turn the table of contents of your minimal digest format
       FAQ into a real table of contents with links in the HTML version.
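
       As a rough illustration of the numbering rule above, the following sketch maps a section header to
       the anchor name described there.  The function name anchor_for_header is hypothetical and this is not
       the module's own code; in particular, it assumes that punctuation trailing the number (such as a
       final period) is not part of the anchor.

```perl
use strict;
use warnings;

# Illustrative only: return the anchor id for a section header, or undef
# when the header does not begin with a section number.  Assumes that a
# trailing period after the number is not part of the id.
sub anchor_for_header {
    my ($header) = @_;
    if ($header =~ m{ \A \s* ( \d+ (?: [.] \d+ )* ) }xms) {
        return 'S' . $1;
    }
    return;
}

# Show the anchor that each sample header would receive under this sketch.
for my $header ('1. Introduction', '2.3. Getting the software', 'Overview') {
    my $anchor = anchor_for_header($header);
    my $result = defined($anchor) ? qq{<a id="$anchor">} : '(no anchor)';
    printf "%-28s %s\n", $header, $result;
}
```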

       Text with embedded whitespace beyond a single space (other than a couple of spaces at a sentence
       boundary or after a colon), and any text with literal tabs, will be wrapped in "<pre>" tags.  So will
       any indented text that doesn't look like English paragraphs.  URLs surrounded by "<...>" or "<URL:...>"
       will be turned into links.  Other URLs will not be turned into links, nor is any effort made to turn
       random body text into links because it happens to look like a link.
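
       The URL rule can be sketched as follows.  This is a minimal approximation written for this page, not
       the module's implementation: the function names escape_html and link_urls are hypothetical, and the
       pattern used to recognize a URL is an assumption.  It shows only that bracketed URLs become links
       while bare URLs are left as plain text.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s{&}{&amp;}xmsg;
    $text =~ s{<}{&lt;}xmsg;
    $text =~ s{>}{&gt;}xmsg;
    $text =~ s{"}{&quot;}xmsg;
    return $text;
}

# Turn <URL:...> and <...> into links.  Splitting with a capture group
# yields alternating pieces: plain text at even indexes and the captured
# URL at odd indexes.
sub link_urls {
    my ($text) = @_;
    my @pieces = split(
        m{ < (?: URL: )? ( [a-z][a-z0-9+.-]* :// [^\s>]+ ) > }xmsi,
        $text,
    );
    my $html = q{};
    for my $i (0 .. $#pieces) {
        my $piece = escape_html($pieces[$i]);
        $html .= ($i % 2) ? qq{<a href="$piece">$piece</a>} : $piece;
    }
    return $html;
}

print link_urls('See <URL:https://example.org/> or https://example.com/'), "\n";
```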

       Bullet lists and numbered lists will be turned into the appropriate HTML structures.  Some attempt is
       also made to recognize description lists, but App::DocKnot::Spin::Text was written by someone who writes
       a lot of technical documentation and therefore tends to prefer "<pre>" if unsure whether something is a
       description list or preformatted text.  Description lists are therefore only going to work if the
       description titles aren't indented relative to the surrounding text.

       Regular indented paragraphs or paragraphs quoted with a consistent non-alphanumeric quote character are
       recognized and turned into HTML block quotes.

       It's worthwhile paying attention to the headers at the top of your document so that
       App::DocKnot::Spin::Text can get a few things right.  If you use RCS or CVS, put the RCS "Id" keyword as
       the first line of your document; it will be stripped out of the resulting output and
       App::DocKnot::Spin::Text will use it to determine the document revision.  This should be followed by
       regular message headers and news.answers subheaders if the document is an actual FAQ, and
       App::DocKnot::Spin::Text will use the "From" and "Subject" headers to figure out a title and headings to
       use.  As a special case, an HTML-title header in the subheaders will override any other title that
       App::DocKnot::Spin::Text thinks it should use for the document.

       App::DocKnot::Spin::Text expects your document to have an "<h1>" title, and will add one from the Subject
       header if it doesn't find one.  It will also add subheaders (class="subheading") giving the author
       (from the "From" header) and the last modified time and revision (from the RCS "Id" string) if there are
       no subheadings already.  If there's a subheading that contains RCS identifiers, it will be replaced by a
       nicely formatted heading generated from the RCS "Id" information in the HTML output.

       Text marked as "*bold*" using the standard asterisk notation will be surrounded by "<strong>" tags, if
       the asterisks appear to be marking bold text rather than serving as wildcards or some other function.
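
       One way to picture the distinction between bold markers and wildcards is the sketch below.  It is an
       assumption about what a reasonable heuristic could look like, not the check the module actually
       performs, and the function name mark_bold is hypothetical.  It requires the asterisks to hug
       non-whitespace text and to stand apart from neighboring word characters.

```perl
use strict;
use warnings;

# Illustrative heuristic: wrap *text* in <strong> only when the asterisks
# directly enclose text and are not attached to surrounding words.
sub mark_bold {
    my ($text) = @_;
    $text =~ s{
        (?<! [\w*] )          # not preceded by a word character or asterisk
        [*]                   # opening asterisk
        ( \S (?: .*? \S )? )  # text that starts and ends with non-whitespace
        [*]                   # closing asterisk
        (?! [\w*] )           # not followed by a word character or asterisk
    }{<strong>$1</strong>}xmsg;
    return $text;
}

print mark_bold('This is *very important* text.'), "\n";
print mark_bold('Remove *.txt and *.bak files.'), "\n";
```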

       App::DocKnot::Spin::Text produces output (at least in the absence of any lurking bugs) which complies
       with the XHTML 1.0 Transitional standard.  The input and output character set is assumed to be UTF-8.

CLASS METHODS

       new(ARGS)
           Create a new App::DocKnot::Spin::Text object.  A single converter object can be reused to convert
           multiple files provided that they have the same options.  ARGS should be a hash reference with one or
           more of the following keys, all of which are optional:

           output
               The path to the root of the output tree when converting a tree of files.  This will be used to
               calculate relative path names for generating inter-page links using the provided "sitemap"
               argument.  If "sitemap" is given, this option should also always be given.

           modified
               Add a last modified subheader to the document.  This will always be done if an RCS "Id" string is
               present in the input.  Otherwise, a last modified subheader based on the last modification date
               of the input file will be added if the input is a file and this option is set to a true value.
               The default is false.

           sitemap
               An App::DocKnot::Spin::Sitemap object.  This will be used to create inter-page links.  For
               inter-page links, the "output" argument must also be provided.

           style
               The URL to the style sheet to use.  The appropriate HTML will be added to the "<head>" section of
               the resulting document.

           title
               The HTML page title to use.  This will also be used as the "<h1>" heading if the document doesn't
               contain one, but will not override a heading found in the document (only the HTML "<title>"
               element).
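
           As a minimal sketch of the options above, the following script creates one converter and reuses
           it for every file named on the command line.  The helper output_path_for and its naming scheme
           (replacing a trailing ".txt" with ".html") are assumptions made for this example, as is the
           style sheet URL; only "new" and "spin_text_file" come from this module.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the output file name by dropping a trailing
# .txt (if present) and appending .html.
sub output_path_for {
    my ($input) = @_;
    (my $output = $input) =~ s{ [.]txt \z }{}xms;
    return "$output.html";
}

# Only load the module when there is work to do.  One object is shared by
# all files because they are all converted with the same options.
if (@ARGV) {
    require App::DocKnot::Spin::Text;
    my $text = App::DocKnot::Spin::Text->new({
        modified => 1,
        style    => '/styles/faq.css',
    });
    for my $input (@ARGV) {
        $text->spin_text_file($input, output_path_for($input));
    }
}
```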

INSTANCE METHODS

       spin_text_file([INPUT[, OUTPUT]])
           Convert a single text file to HTML.  INPUT is the path of the input file and OUTPUT is the path of
           the output file.  OUTPUT, or both INPUT and OUTPUT, may be omitted.  Standard output is used in place
           of an omitted OUTPUT, and standard input in place of an omitted INPUT.

           If OUTPUT is omitted, App::DocKnot::Spin::Text will not be able to obtain sitemap information even if
           a sitemap was provided, and therefore will not add inter-page links.
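
           The following is a small end-to-end sketch, assuming the module is installed.  The sample
           document, its headers, and the file names are invented for illustration, and no claim is made
           here about exactly which HTML structures the heuristics will choose for it.  The script writes
           the sample to a scratch directory, converts it, and prints the result.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a small sample document to a scratch directory that is removed
# when the script exits.
my $dir    = tempdir(CLEANUP => 1);
my $input  = "$dir/sample.txt";
my $output = "$dir/sample.html";
open(my $in, '>:encoding(UTF-8)', $input) or die "cannot write $input: $!\n";
print {$in} <<'EOT';
From: Example Author <author@example.org>
Subject: Sample document

INTRODUCTION

    This is a regular paragraph of body text, indented below an
    all-caps heading.
EOT
close($in) or die "cannot close $input: $!\n";

# Convert the sample if the module is available; otherwise say so and
# leave $html undefined.
my $html;
if (eval { require App::DocKnot::Spin::Text; 1 }) {
    my $text = App::DocKnot::Spin::Text->new({ title => 'Sample document' });
    $text->spin_text_file($input, $output);
    open(my $out, '<:encoding(UTF-8)', $output)
      or die "cannot read $output: $!\n";
    $html = do { local $/; <$out> };
    close($out) or die "cannot close $output: $!\n";
    print $html;
} else {
    warn "App::DocKnot::Spin::Text is not installed; skipping conversion\n";
}
```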

NOTES

       I wrote this program because every other text to HTML converter that I've seen made specific assumptions
       about the document format and wanted you to write like it wanted you to write rather than like the way
       you wanted to write.  This program instead wants you to write like I write, which from my perspective is
       an improvement.

       I don't claim that this is the be-all and end-all of text to HTML converters, as I don't believe such a
       beast exists.  I do believe it's pretty close to being the be-all and end-all of text to HTML converters
       for text that I personally have written, since I've written into it a lot of knowledge of the sorts of
       text formatting conventions that I use.  If you happen to use the same ones, you may be delighted with
       this module.  If you don't, you'll probably be very frustrated with it.

       In any case, I took to this project the perspective that whenever there was something this program
       couldn't handle, I wanted to make it smarter rather than change the input.  I've mostly been successful
       at that, so far.

CAVEATS

       This program attempts to intuit structure from an unstructured markup format.  It therefore relies on a
       whole bunch of fussy heuristics, poorly-understood assumptions, and sheer blind luck.  To fully document
       the boundary cases of this program would take more time and patience than I care to invest; see the
       source code if you're curious.  This is not a predictable or easily documentable program.  Instead, it
       attempts to do what I mean without bugging me about it.

       There is therefore, at least currently, no way to control or adjust parameters in this program without
       editing it.  I may someday add that, but I'm leery of it, since the code complexity would start
       increasing exponentially if I tried to let people tweak everything.  I've given up on more than one text
       to HTML converter because it had more options than ls and expected you to try to figure out which ones
       should be used for a document yourself.

       English month names are used for the last modification dates, and the resulting HTML always declares that
       the document is in English.  This could be made configurable if anyone wishes.

AUTHOR

       Russ Allbery <rra@cpan.org>

COPYRIGHT AND LICENSE

       Copyright 1999-2002, 2004-2005, 2008, 2010, 2013, 2021-2024 Russ Allbery <rra@cpan.org>

       Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
       associated documentation files (the "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
       copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the
       following conditions:

       The above copyright notice and this permission notice shall be included in all copies or substantial
       portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
       LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN
       NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
       WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
       SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

SEE ALSO

       docknot(1), App::DocKnot::Spin, App::DocKnot::Spin::Sitemap

       This module is part of the App-DocKnot distribution.  The current version of DocKnot is available from
       CPAN, or directly from its web site at <https://www.eyrie.org/~eagle/software/docknot/>.

perl v5.38.2                                       2024-07-14                      App::DocKnot::Spin::Text(3pm)