Provided by: inn2_2.7.3-1_amd64

NAME

       storage.conf - Configuration file for storage manager

DESCRIPTION

       The file pathetc/storage.conf contains the rules to be used in assigning articles to different storage
       methods.  These rules determine where incoming articles will be stored.

       The storage manager is a unified interface between INN and a variety of different storage methods,
       allowing the news administrator to choose between different storage methods with different trade-offs (or
       even use several at the same time for different newsgroups, or articles of different sizes).  The rest of
       INN need not care what type of storage method was used for a given article; the storage manager will
       figure this out automatically when that article is retrieved via the storage API.  Note that you may also
       want to see the options provided in inn.conf(5) regarding article storage.

       The storage.conf file consists of a series of storage method entries.  Blank lines and lines beginning
       with a number sign ("#") are ignored.  The maximum number of characters in each line is 255.  The order
       of entries in this file is important; see below.

       Each entry specifies a storage method and a set of rules.  Articles which match all of the rules of a
       storage method entry will be stored using that storage method; if an article matches multiple storage
       method entries, the first one will be used.  Each entry is formatted as follows:

           method <methodname> {
               newsgroups: <wildmat>
               class: <storage_class>
               size: <minsize>[,<maxsize>]
               expires: <mintime>[,<maxtime>]
               options: <options>
               exactmatch: <bool>
               filtered: <bool>
               path: <wildmat>
           }

       If spaces or tabs are included in a value, that value must be enclosed in double quotes ("").  If either
       a number sign ("#") or a double quote is meant to be included verbatim in a value, it should be escaped
       with a backslash ("\").
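       A script that generates storage.conf has to apply these quoting rules itself.  The following is a
       minimal sketch using a hypothetical helper named quote_value.  The rules above do not say whether a
       literal backslash must be escaped, so the sketch assumes it can be left as-is.

```python
def quote_value(value: str) -> str:
    """Format a value for storage.conf (hypothetical helper).

    "#" and '"' are escaped with a backslash, and the result is wrapped in
    double quotes when it contains a space or a tab.  Existing backslashes
    are left untouched (assumption: the rules above do not cover them).
    """
    escaped = value.replace("#", "\\#").replace('"', '\\"')
    if " " in value or "\t" in value:
        return f'"{escaped}"'
    return escaped


print(quote_value("misc.*"))       # misc.*
print(quote_value("has a space"))  # "has a space"
```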

       <methodname> is the name of a storage method to use for articles which match the rules of this entry.
       The currently available storage methods are:

           cnfs
           timecaf
           timehash
           tradspool
           trash

       See the "STORAGE METHODS" section below for more details.

       The meanings of the keys in each storage method entry are as follows:

       newsgroups: <wildmat>
           Which newsgroups are stored using this storage method.  <wildmat> is a uwildmat pattern which is
           matched against the newsgroups an article is posted to.  If storeonxref in  inn.conf  is  true,  this
           pattern will be matched against the newsgroup names in the Xref header field body; otherwise, it will
           be  matched  against  the  newsgroup  names  in the Newsgroups header field body (see inn.conf(5) for
           discussion of the differences between these possibilities).  Poison wildmat expressions  (expressions
           starting  with  "@")  are  allowed  and  can  be  used  to  exclude  certain group patterns: articles
           crossposted to poisoned newsgroups will not be stored  using  this  storage  method.   The  <wildmat>
           pattern is matched in order.

           There  is  no  default  newsgroups  pattern; if an entry should match all newsgroups, use an explicit
           "newsgroups: *".
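           The following Python sketch models how a newsgroups: pattern accepts or rejects an article,
           including poison expressions and the exactmatch: key described below.  It is an approximation, not
           INN's code: the names wildmat_poison and groups_match are hypothetical, sub-patterns are split on
           plain commas, and fnmatch only roughly follows the uwildmat syntax documented in
           libinn_uwildmat(3).  As with uwildmat, the last sub-pattern that matches a group decides the result
           for that group.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Result(Enum):
    NOMATCH = 0
    MATCH = 1
    POISON = 2


def wildmat_poison(group: str, pattern: str) -> Result:
    """Approximate uwildmat_poison(): the last matching sub-pattern decides.

    Assumption: sub-patterns are split on plain commas (no escaping) and
    compared with fnmatchcase(), which only approximates uwildmat syntax.
    """
    result = Result.NOMATCH
    for sub in pattern.split(","):
        if sub.startswith("@"):
            if fnmatchcase(group, sub[1:]):
                result = Result.POISON
        elif sub.startswith("!"):
            if fnmatchcase(group, sub[1:]):
                result = Result.NOMATCH
        elif fnmatchcase(group, sub):
            result = Result.MATCH
    return result


def groups_match(groups: list[str], pattern: str, exactmatch: bool = False) -> bool:
    """Does an article posted to `groups` satisfy a newsgroups: rule?"""
    wanted = False
    for group in groups:
        # Xref entries look like "misc.foo:123"; keep only the group name.
        name = group.split(":", 1)[0]
        result = wildmat_poison(name, pattern)
        if result is Result.POISON:
            return False  # one poisoned group rejects the whole article
        if exactmatch and result is not Result.MATCH:
            return False  # exactmatch: every group must match
        if result is Result.MATCH:
            wanted = True
    return wanted


# The misc.foo/misc.bar crosspost from the EXAMPLES section:
print(groups_match(["misc.foo", "misc.bar"], "misc.*,!misc.bar"))  # True
print(groups_match(["misc.foo", "misc.bar"], "misc.*,@misc.bar"))  # False
```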

       class: <storage_class>
           An identifier for this storage method entry.  <storage_class> should be a number between 0  and  255.
           It  should  be  unique  across  all  of  the  entries in this file.  It is mainly used for specifying
           expiration times by storage class as described in expire.ctl(5); "timehash" and "timecaf"  will  also
           set the top-level directory in which articles accepted by this storage class are stored.  Storage
           classes can, for instance, be numbered sequentially in storage.conf.

           The assignment of a particular number to a storage class is arbitrary but permanent (since it is used
           in storage tokens).  As a matter of fact, an article is assigned a storage  class  depending  on  the
           storage  rules  in  effect  at  the  time  of  its  arrival.  This identifier will not change even if
           storage.conf is modified afterwards and the same article would have  been  assigned  another  storage
           class, had it been received after that change.  The article is still perfectly valid and retrievable.
           The  only  difference will be for expiration with groupbaseexpiry set to false in inn.conf: the rules
           in expire.ctl apply to the storage class assigned to articles at their arrival.

       size: <minsize>[,<maxsize>]
           A range of article sizes (in bytes) which should be stored using this storage method.  If <maxsize>
           is "0" or not given, article size is limited only by maxartsize in inn.conf.  The size:
           field  is  optional and may be omitted entirely if you want articles of any size to be stored in this
           storage method (if, of course, these articles fulfill all the  other  requirements  of  this  storage
           method entry).  By default, <minsize> is set to "0".
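           The size: test can be sketched as follows.  This assumes both bounds are inclusive; the function
           name is hypothetical, and the separate maxartsize limit from inn.conf is not modelled.

```python
def size_matches(article_bytes: int, minsize: int = 0, maxsize: int = 0) -> bool:
    """Sketch of the size: rule (inclusive bounds are an assumption).

    A maxsize of 0 means this entry sets no upper bound; only maxartsize in
    inn.conf would then limit the size, which is not modelled here.
    """
    if article_bytes < minsize:
        return False
    return maxsize == 0 or article_bytes <= maxsize
```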

       expires: <mintime>[,<maxtime>]
           A  range  of  article expiration times which should be stored using this storage method.  Be careful;
           this is less useful than it may appear at first.  This is based only on the Expires header  field  of
           the  article,  not  on any local expiration policies or anything in expire.ctl!  If <mintime> is non-
           zero, then this entry will not match any article without  an  Expires  header  field.   This  key  is
           therefore  only really useful for assigning articles with requested longer expire times to a separate
           storage method.  Articles only match if the time until expiration (that is to say, the amount of time
           into the future that the Expires header field of the article requests that it remain around) falls in
           the interval specified by <mintime> and <maxtime>.

           The format of these parameters is "0d0h0m0s" (days, hours, minutes, and seconds into the future).  If
           <maxtime> is "0s" or is not specified, there is no upper bound on  expire  times  falling  into  this
           entry  (note  that  this  key has no effect on when the article will actually be expired, but only on
           whether or not the article will be stored using this storage method).  This field  is  also  optional
           and  may  be  omitted entirely if you do not want to store articles according to their Expires header
           field, if any.

           A <mintime> value greater than "0s" implies that this storage method won't match any article  without
           an Expires header field.
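           A minimal sketch of the expires: test, assuming inclusive bounds.  Since "0s" alone is a valid
           value, the parser below accepts any subset of the d/h/m/s components, but it does not check their
           order.  The function names are hypothetical; seconds_left stands for the time between arrival and
           the date requested by the Expires header field.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_interval(text: str) -> int:
    """Convert a value such as "0d0h0m0s", "7d" or "0s" to seconds."""
    if not re.fullmatch(r"(\d+[dhms])+", text):
        raise ValueError(f"bad interval: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[unit]
               for n, unit in re.findall(r"(\d+)([dhms])", text))


def expires_match(seconds_left: Optional[int],
                  mintime: str = "0s", maxtime: str = "0s") -> bool:
    """seconds_left is None when the article has no Expires header field."""
    low, high = parse_interval(mintime), parse_interval(maxtime)
    if seconds_left is None:
        return low == 0  # a non-zero <mintime> requires an Expires header field
    return seconds_left >= low and (high == 0 or seconds_left <= high)
```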

       options: <options>
           This key is for passing special options to storage methods that require them (currently only "cnfs").
           See the "STORAGE METHODS" section below for a description of its use.

       exactmatch: <bool>
           If this key is set to true, all the newsgroups in the Newsgroups header field body (or Xref if
           storeonxref in inn.conf is true) of incoming articles will be examined, and the article matches this
           entry only if every one of them matches the newsgroups pattern.  (Normally, any non-zero number of
           matching newsgroups is sufficient, provided no newsgroup matches a poison wildmat as described
           above.)  This is a boolean value; "true", "yes" and "on" are usable to enable this key.  The case of
           these values is not significant.  The default is false.

       filtered: <bool>
           If this key is set to true, the article must have been rejected by one of the enabled article filters (Perl
           or Python) for innd.  This also  requires  that  dontrejectfiltered  is  set  to  true  in  inn.conf.
           Filtered  articles are usually stored in a small CNFS buffer, or another storage method with a rather
           tight expiration policy.  This is a boolean value; "true", "yes" and "on" are usable to  enable  this
           key.  The case of these values is not significant.  The default is false.

           If  all  the  storage  classes  have  this key set to false, filtered articles are stored in the same
           storage class as accepted articles.  It is only when at least one storage class has this key  set  to
           true that filtered articles and accepted articles are no longer stored mixed together in any storage
           class.
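           The rule above can be restated as a small predicate.  This is a sketch derived only from the
           description in this section, not from INN's source; filtered_matches and any_entry_filtered are
           hypothetical names, the latter being true when at least one entry sets filtered: true.

```python
def filtered_matches(entry_filtered: bool, article_filtered: bool,
                     any_entry_filtered: bool) -> bool:
    """Can an entry with the given filtered: value store this article?"""
    if not any_entry_filtered:
        # No entry separates filtered articles: they are stored like accepted ones.
        return True
    # Otherwise filtered and accepted articles go to different entries.
    return entry_filtered == article_filtered
```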

       path: <wildmat>
           Which articles, based on their Path header field, are stored using this storage method.  <wildmat> is
           a uwildmat pattern which is matched against the Path header field body of articles, which lists the
           sites the article has passed through as well as the site where it was posted.  Poison wildmat
           expressions (expressions starting with "@") are allowed and can be used to exclude certain path
           patterns.  They replace expressions starting with "!", which are not treated as negations here
           because "!" has a special meaning in Path header fields.  The <wildmat> pattern is matched in order.

           A typical use case might be to store articles from a spammy site in a small CNFS buffer so that they
           do not affect overall retention:

               path: "*!spam-site.example.com!not-for-mail"

           The default is to match all articles.

       If an article matches all of the constraints of an entry, it is stored via that  storage  method  and  is
       associated with that <storage_class>.  This file is scanned in order and the first matching entry is used
       to store the article.

       If an article does not match any entry, either by being posted to a newsgroup which does not match any of
       the  <wildmat>  patterns  or by being outside the size and expires ranges of all entries whose newsgroups
       pattern it does match, the article is not stored and is rejected by innd.  When this happens,  the  error
       message:

           cant store article: no matching entry in storage.conf

       is  logged  to syslog.  If you want to silently drop articles matching certain newsgroup patterns or size
       or expires ranges, assign them to the "trash" storage method  rather  than  having  them  not  match  any
       storage method entry.

STORAGE METHODS

       Currently,  there  are five storage methods available.  Each method has its pros and cons; you can choose
       any mixture of them as is suitable for  your  environment.   Note  that  each  method  has  an  attribute
       EXPENSIVESTAT which indicates whether checking the existence of an article is expensive or not.  This
       attribute is taken into account by expireover(8).

       cnfs
           The "cnfs" storage method stores articles in large cyclic buffers (CNFS stands for Cyclic  News  File
           System).   Articles  are stored in CNFS buffers in arrival order, and when the buffer fills, it wraps
           around to the beginning and stores new articles over the top of the oldest articles  in  the  buffer.
           The  expire  time  of articles stored in CNFS buffers is therefore entirely determined by how long it
           takes the buffer to wrap around, which depends on how quickly data is  being  stored  in  it.   (This
           method is therefore said to have self-expire functionality.  It also means that when an article is
           cancelled, its space is not reclaimed right away; it is only reused once the cycbuff wraps around and
           new articles overwrite that part of the buffer.)  EXPENSIVESTAT is false for this method.
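           Since retention in a CNFS buffer depends only on its size and on how fast data arrives, a first
           estimate is a simple division.  The sketch below assumes a steady write rate and ignores per-article
           overhead inside the buffer; the function name and the figures are made up for illustration.

```python
def cnfs_retention_days(buffer_bytes: int, bytes_per_day: int) -> float:
    """Rough time before a cyclic buffer wraps and overwrites an article.

    Assumes a constant write rate and ignores CNFS's own per-article overhead.
    """
    if bytes_per_day <= 0:
        raise ValueError("bytes_per_day must be positive")
    return buffer_bytes / bytes_per_day


GB = 1024 ** 3
print(f"{cnfs_retention_days(200 * GB, 40 * GB):.1f} days")  # 5.0 days
```

           A flood that doubles the daily volume halves this figure, which is the retention risk mentioned in
           the disadvantages below.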

           CNFS has its own configuration file, cycbuff.conf, which adds some subtleties to the basic
           description  given above.  Storage method entries for the "cnfs" storage method must have an options:
           field specifying the metacycbuff into which articles  matching  that  entry  should  be  stored;  see
           cycbuff.conf(5) for details on metacycbuffs.

           Advantages:  By  far the fastest of all storage methods (except for "trash"), since it eliminates the
           overhead of dealing with a file system and creating new files.  Unlike all other storage methods,  it
           does  not require manual article expiration.  With CNFS, the server will never throttle itself due to
           a full spool disk, and groups are restricted to just the buffer files given so that  they  can  never
           use more than the amount of disk space allocated to them.

           Disadvantages:  Article  retention  times  are  more  difficult  to  control because old articles are
           overwritten automatically.  Attacks on Usenet, such as flooding  or  massive  amounts  of  spam,  can
           result in wanted articles expiring much faster than intended (with no warning).

       timecaf
           This  method  stores multiple articles in one file, whose name is based on the article's arrival time
           and the storage class.  The file name will be:

               <patharticles>/timecaf-nn/bb/aacc.CF

           where "nn" is the  hexadecimal  value  of  <storage_class>,  "bb"  and  "aacc"  are  the  hexadecimal
           components  of  the  arrival  time, and "CF" is a hardcoded extension.  (The arrival time, in seconds
           since the epoch, is converted to hexadecimal and interpreted as "0xaabbccdd", with  "aa",  "bb",  and
           "cc"  used  to  build the path.)  This method does not have self-expire functionality (meaning expire
           has to run periodically to delete old articles, as well as cancelled articles if  immediatecancel  is
           not set to true in inn.conf).  EXPENSIVESTAT is false for this method.
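           The file naming scheme above can be sketched as follows.  The function name is hypothetical, and
           the sketch assumes lowercase hexadecimal digits and an arrival time that fits in 32 bits.  Because
           "dd" is dropped, all arrivals within the same 256-second window map to the same CAF file.

```python
def timecaf_path(patharticles: str, storage_class: int, arrival: int) -> str:
    """Build <patharticles>/timecaf-nn/bb/aacc.CF for an arrival time.

    Assumes lowercase hex digits and an arrival time that fits in 32 bits.
    """
    hexed = f"{arrival:08x}"  # "aabbccdd"
    aa, bb, cc = hexed[0:2], hexed[2:4], hexed[4:6]  # "dd" is not used
    return f"{patharticles}/timecaf-{storage_class:02x}/{bb}/{aa}{cc}.CF"


print(timecaf_path("/var/spool/news/articles", 4, 0x12345678))
# /var/spool/news/articles/timecaf-04/34/1256.CF
```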

           A given CAF file contains all the articles received during a time frame of about 4 minutes (256
           seconds), and is limited to 262,144 articles and about 3.5 GB.  This is enough for normal operations.
           The only caveat is when batches of articles are fed at high speed between two servers; you will then
           want to keep the number of articles sent during each such 256-second time frame below that limit.

           Advantages:  It  is  roughly  four times faster than "timehash" for article writes, since much of the
           file system overhead is bypassed, while still retaining the same fine control over article  retention
           time.

           Disadvantages: Using this method means giving up all but the most careful manual fiddling with the
           article spool; in this respect, it resembles "cnfs".  As one of the newer and least widely used
           storage types, "timecaf" has not been as thoroughly tested as the other methods.  It requires running
           a  nightly expire program to delete old articles by either compacting CAF files if they still contain
           available articles, or removing them.

       timehash
           This method is very similar to "timecaf" except that each article is stored in a separate file.   The
           name of the file for a given article will be:

               <patharticles>/time-nn/bb/cc/yyyy-aadd

           where  "nn" is the hexadecimal value of <storage_class>, "yyyy" is a hexadecimal sequence number, and
           "bb", "cc", and "aadd" are components of the  arrival  time  in  hexadecimal  (the  arrival  time  is
           interpreted   as   documented  above  under  "timecaf").   This  method  does  not  have  self-expire
           functionality.  Cancelled articles are removed immediately.  EXPENSIVESTAT is true for this method.
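           The same hexadecimal split gives the timehash file name.  As with the timecaf sketch, the function
           name is hypothetical and the sketch assumes lowercase hexadecimal digits, a 32-bit arrival time,
           and a sequence number that fits in four hexadecimal digits.

```python
def timehash_path(patharticles: str, storage_class: int, arrival: int,
                  seqnum: int) -> str:
    """Build <patharticles>/time-nn/bb/cc/yyyy-aadd.

    Assumes lowercase hex digits, a 32-bit arrival time and a 16-bit seqnum.
    """
    hexed = f"{arrival:08x}"  # "aabbccdd"
    aa, bb, cc, dd = hexed[0:2], hexed[2:4], hexed[4:6], hexed[6:8]
    return (f"{patharticles}/time-{storage_class:02x}/{bb}/{cc}/"
            f"{seqnum:04x}-{aa}{dd}")


print(timehash_path("/var/spool/news/articles", 5, 0x12345678, 1))
# /var/spool/news/articles/time-05/34/56/0001-1278
```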

           Advantages: Heavy traffic groups do not cause bottlenecks, and fine control over article retention
           time is still possible.

           Disadvantages:  The ability to easily find all articles in a given newsgroup and manually fiddle with
           the article spool is lost, and INN still suffers from speed degradation due to file  system  overhead
           (creating and deleting individual files is a slow operation) and from higher inode usage.  It also
           requires a nightly expire program to delete old articles out of the news spool.

       tradspool
           Traditional spool, or "tradspool", is the traditional news article storage format.  Each  article  is
           stored in an individual text file named:

               <patharticles>/news/group/name/nnnnn

           where "news/group/name" is the name of the newsgroup to which the article was posted with each period
           changed  to  a  slash,  and  "nnnnn"  is  the  sequence number of the article in that newsgroup.  For
           crossposted articles, the article is linked into each newsgroup to which  it  is  crossposted  (using
           either hard or symbolic links).  This is the way versions of INN prior to 2.0 stored all articles, as
           well  as  being the article storage format used by C News and earlier news systems.  This method does
           not have self-expire functionality.  Cancelled articles are removed  immediately.   EXPENSIVESTAT  is
           true for this method.
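           The tradspool naming scheme can be sketched as below.  The helper name is hypothetical, and it
           assumes its input is a list of "group:number" entries such as those found in an Xref header field.
           One path is produced per newsgroup; for a crossposted article, INN links the same file into each of
           these locations.

```python
def tradspool_paths(patharticles: str, xref_entries: list[str]) -> list[str]:
    """Map "group:number" entries to tradspool file names, one per newsgroup."""
    paths = []
    for entry in xref_entries:
        group, number = entry.rsplit(":", 1)
        # Each period in the group name becomes a directory separator.
        paths.append(f"{patharticles}/{group.replace('.', '/')}/{int(number)}")
    return paths


print(tradspool_paths("/var/spool/news/articles", ["misc.foo:12", "misc.bar:7"]))
# ['/var/spool/news/articles/misc/foo/12', '/var/spool/news/articles/misc/bar/7']
```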

           Advantages:  It  is  widely  used  and  well-understood;  it can read article spools written by older
           versions of INN and it is compatible with  all  third-party  INN  add-ons.   This  storage  mechanism
           provides  easy  and  direct  access to the articles stored on the server, makes writing programs that
           fiddle with the news spool very easy, gives fine control over article retention times, and comes with
           the scanspool support utility to perform sanity checks.

           Disadvantages: It needs a faster file system and I/O system than the cnfs and timecaf storage methods
           due to file system overhead.  Groups with heavy traffic  tend  to  create  a  bottleneck  because  of
           inefficiencies  in  storing  large  numbers of article files in a single directory.  It consumes more
           inodes and requires a nightly expire program to delete old articles out of the news spool.

       trash
           This method silently discards all articles stored in it.  Its only real uses are for testing and  for
           silently  discarding  articles  matching  a  particular  storage  method entry (for whatever reason).
           Articles stored in this method take up no disk space and can never be retrieved, so this  method  has
           self-expire functionality of a sort.  EXPENSIVESTAT is false for this method.

EXAMPLES

       The  following  sample  storage.conf  file  would  store  all  articles  posted  to alt.binaries.* in the
       "BINARIES" CNFS metacycbuff, all articles over roughly 50 KB in any other hierarchy in the  "LARGE"  CNFS
       metacycbuff,  all other articles in alt.* in one timehash class, and all other articles in any newsgroups
       in a second timehash class, except for the internal.* hierarchy which  is  stored  in  traditional  spool
       format.

           method tradspool {
               class: 1
               newsgroups: internal.*
           }
           method cnfs {
               class: 2
               newsgroups: alt.binaries.*
               options: BINARIES
           }
           method cnfs {
               class: 3
               newsgroups: *
               size: 50000
               options: LARGE
           }
           method timehash {
               class: 4
               newsgroups: alt.*
           }
           method timehash {
               class: 5
               newsgroups: *
           }

       Notice  that the last storage method entry will catch everything.  This is a good habit to get into; make
       sure that you have at least one catch-all entry just in case something you did not expect  falls  through
       the  cracks.   Notice  also that the special rule for the internal.* hierarchy is first, so it will catch
       even articles crossposted to alt.binaries.* or over 50 KB in size.
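       The first-match rule can be checked against this sample configuration with a short sketch.  It models
       only the newsgroups: and size: keys, uses a single fnmatch pattern per entry (no poison expressions),
       and the Entry class and select_entry function are hypothetical.  A result of None corresponds to the
       "no matching entry in storage.conf" case described earlier.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Entry:
    method: str
    storage_class: int
    newsgroups: str
    minsize: int = 0
    options: str = ""


# The sample storage.conf above, in file order.
ENTRIES = [
    Entry("tradspool", 1, "internal.*"),
    Entry("cnfs", 2, "alt.binaries.*", options="BINARIES"),
    Entry("cnfs", 3, "*", minsize=50000, options="LARGE"),
    Entry("timehash", 4, "alt.*"),
    Entry("timehash", 5, "*"),
]


def select_entry(groups: list[str], size: int) -> Optional[Entry]:
    """Return the first entry whose rules all match, or None (article rejected)."""
    for entry in ENTRIES:
        if size < entry.minsize:
            continue
        if any(fnmatchcase(group, entry.newsgroups) for group in groups):
            return entry
    return None


print(select_entry(["internal.test", "alt.binaries.misc"], 100000).method)  # tradspool
print(select_entry(["comp.lang.c"], 60000).options)  # LARGE
```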

       As for poison wildmat expressions, if you have for instance an article crossposted between  misc.foo  and
       misc.bar, the pattern:

           misc.*,!misc.bar

       will match that article whereas the pattern:

           misc.*,@misc.bar

       will not match that article.  An article posted only to misc.bar will fail to match either pattern.

       Usually, high-volume groups and groups whose articles do not need to be kept around very long (binaries
       groups, *.jobs*, news.lists.filters, etc.) are stored in CNFS buffers.  Use the other methods (or CNFS
       buffers again) for everything else.  However, it is often most convenient to keep special hierarchies
       in "tradspool": local hierarchies, hierarchies that should never expire, and hierarchies whose spool
       you need to go through manually.

HISTORY

       Written by Katsuhiro Kondou <kondou@nec.co.jp> for InterNetNews.  Rewritten into POD by Julien Elie.

SEE ALSO

       cycbuff.conf(5), expire.ctl(5), expireover(8), inn.conf(5), innd(8), libinn_uwildmat(3), scanspool(8).

INN 2.7.3                                          2025-05-19                                    STORAGE.CONF(5)