Provided by: recollcmd_1.43.0-1build1_amd64

NAME

       recollindex - indexing command for the Recoll full text search system

SYNOPSIS

       recollindex -h
       recollindex [ -z|-Z ] [ -k ] [ --nopurge ] [ -P ] [ --diagsfile <diagpath> ]
       recollindex -m [ -w <secs>] [ -D ] [ -O ] [ -x ] [ -C ] [ -n|-k ]
       recollindex -i [ -Z -k -f -P ] [<path [path ...]>]
       recollindex -r [ -Z -K -e -f ] [ -p pattern ] <dirpath>
       recollindex -e [<path [path ...]>]
       recollindex -l|-S|-E
       recollindex -s <lang>
       recollindex --webcache-compact
       recollindex --webcache-burst <destdir>
       recollindex --notindexed [path [path ...]]

DESCRIPTION

       Create or update a Recoll index.

       There are several modes of operation. All modes support an optional -c <cfgdir> option to specify the
       configuration directory. It overrides $RECOLL_CONFDIR, which itself overrides the default ($HOME/.recoll).
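
       The precedence described above can be sketched as a small shell function. This is only an illustration:
       effective_confdir is a hypothetical helper name, not part of Recoll, and it simply mirrors the order
       stated in this page (-c value first, then $RECOLL_CONFDIR, then $HOME/.recoll).

```shell
# Hypothetical helper (not shipped with Recoll): print the configuration
# directory that would be selected, following the precedence described above.
# $1 is the value that would be given to -c (may be empty or omitted).
effective_confdir() {
    if [ -n "${1:-}" ]; then
        # An explicit -c <cfgdir> value wins over everything else.
        printf '%s\n' "$1"
    elif [ -n "${RECOLL_CONFDIR:-}" ]; then
        # Otherwise the environment variable is used, if set.
        printf '%s\n' "$RECOLL_CONFDIR"
    else
        # Otherwise fall back to the default location.
        printf '%s\n' "$HOME/.recoll"
    fi
}
```

       For example, the result could be passed back explicitly, as in recollindex -c "$(effective_confdir)".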

       The normal mode will index the set of files described in  the  configuration.   This  will  incrementally
       update  the  index  with  files that changed since the last run. If option -z is given, the index will be
       erased before starting. If option -Z is given, the index will  not  be  reset,  but  all  files  will  be
       considered as needing reindexing (in place reset).

       recollindex  does  not  process  again  files  which previously failed to index (for example because of a
       missing helper program). If option -k is given, recollindex will try again to process all  failed  files.
       Please  note  that  recollindex  may  also  decide to retry failed files if the auxiliary checking script
       defined by the "checkneedretryindexscript" configuration variable indicates that this should happen.

       The --nopurge option will disable the normal erasure of deleted documents from the  index.  This  can  be
       useful in special cases (when it is known that part of the document set is temporarily not accessible).

       The  -P  option  will force the purge pass. This is useful only if the idxnoautopurge parameter is set in
       the configuration file.

       If the option --diagsfile is  given,  the  path  given  as  parameter  will  be  truncated  and  indexing
       diagnostics will be written to it. Each line in the file will have a diagnostic type (reason for the file
       not  to be indexed), the file path, and a possible additional piece of information, which can be the MIME
       type or the archive internal path depending on the issue. The following diagnostic  types  are  currently
       defined:

              Skipped : the path matches an element of skippedPaths or skippedNames.

              NoContentSuffix : the file name suffix is found in the noContentSuffixes list.

              MissingHelper : a helper program is missing.

              Error : general error (see the log).

              NoHandler : no handler is defined for the MIME type.

              ExcludedMime : the MIME type is part of the excludedmimetypes list.

              NotIncludedMime : the onlymimetypes list is not empty and the MIME type is not in it.
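
       A diagnostics file can be summarized by counting lines per diagnostic type. The sketch below is
       hedged: the exact field separator is not specified here, so it assumes that the diagnostic type is the
       first whitespace-delimited token on each line (an optional trailing colon is stripped).
       summarize_diags is a hypothetical helper name.

```shell
# Hypothetical helper: count diagnostics per type in a --diagsfile output file.
# ASSUMPTION: the diagnostic type is the first whitespace-delimited field.
summarize_diags() {
    awk 'NF > 0 { kind = $1; sub(/:$/, "", kind); count[kind]++ }
         END { for (k in count) print count[k], k }' "$1" |
        sort -k1,1nr -k2,2    # most frequent type first, ties sorted by name
}

# Example use (not run here; the path is only an illustration):
#   recollindex --diagsfile /tmp/recoll-diags.txt
#   summarize_diags /tmp/recoll-diags.txt
```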

       Option  -R allows specifying a temporary file for writing error messages in case indexing fails. The same
       messages go to stderr, and the option is mostly of use to the GUI, for displaying an error popup.

       If option -m is given, recollindex is started for real time monitoring, using the file system monitoring
       package it was configured for (inotify on Linux, gamin on xBSD, fsevents on macOS and
       ReadDirectoryChanges on Windows). The program will normally detach from the controlling terminal and
       become a daemon. If option -D is given, it will stay in the foreground. Option -w <seconds> can be used
       to specify that the program should sleep for the specified time before indexing begins. The default value
       is 60. The daemon normally monitors the X11 session and exits when it is reset. This can be disabled with
       option -x. You can use option -n to skip the initial incremental pass which is normally performed
       before monitoring starts. Once monitoring is started, the daemon monitors the configuration and restarts
       from scratch if a change is made. You can disable this with option -C. Option -O also keeps the process
       in the foreground and, in addition, makes the process exit if its parent process disappears.

       recollindex -i will index individual files into the index. The stem expansion and aspell  databases  will
       not  be  updated.  The  skippedPaths  and skippedNames configuration variables will be used, so that some
       files may be skipped. You can tell recollindex to ignore skippedPaths and skippedNames by setting the  -f
       option.  This  allows  fully  custom  file selection for a given subtree, for which you would add the top
       directory to skippedPaths, and use any custom tool to generate the file list (e.g. a tool from a source
       code  control  system).  When run this way, the indexer normally does not perform the deleted files purge
       pass, because it cannot be sure to have seen all the existing files. You can force a purge pass with -P.
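
       As an illustration of custom file selection, the sketch below turns a list of relative paths (such as
       the output of a source code control tool) into full paths that can be piped to recollindex -i -f.
       absolute_paths is a hypothetical helper, the git command and directory are only examples, and the use of
       full paths is an assumption made for safety rather than a documented requirement.

```shell
# Hypothetical helper: prefix each relative path read from stdin with a
# root directory, one path per line.
absolute_paths() {
    root=${1%/}    # drop one trailing slash, if any, to avoid "//"
    while IFS= read -r rel; do
        printf '%s/%s\n' "$root" "$rel"
    done
}

# Example use (not run here): index only the files tracked by git in a
# subtree whose top directory was added to skippedPaths.
#   git -C /home/me/project ls-files |
#       absolute_paths /home/me/project |
#       recollindex -i -f
```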

       recollindex -e will erase data for individual files from the index. The stem expansion databases will not
       be updated.

       Options -i and -e can be combined. This will first perform the purge, then the indexing.

       With options -i or -e, if no file names are given on the command line, they will be read from stdin, so
       that you could for example run:

       find /path/to/dir -print | recollindex -e -i

       to  force  the  reindexing of a directory tree (which has to exist inside the file system area defined by
       topdirs in recoll.conf). You could mostly accomplish the same thing with

       find /path/to/dir -print | recollindex -Z -i

       The latter will perform a less thorough job of purging stale sub-documents though.

       recollindex -r mostly works like -i, but the parameter is a single directory, which will be recursively
       updated. This mostly does nothing more than find topdir | recollindex -i but it may be more convenient to
       use when started from another program. This retries failed files by default; use option -K to change
       this. One or more -p options can be used to set shell-type selection patterns (e.g. *.pdf).

       recollindex -l will list the names of available language stemmers.

       recollindex  -s will build the stem expansion database for a given language, which may or may not be part
       of the list in the configuration file. If the language  is  not  part  of  the  configuration,  the  stem
       expansion  database  will  be deleted at the end of the next normal indexing run. You can get the list of
       stemmer names from the recollindex -l command. Note that this is mostly for experimental use, the  normal
       way  to  add a stemming language is to set it in the configuration, either by editing "recoll.conf" or by
       using the GUI indexing configuration dialog.

       At the time of this writing, the following languages are recognized (out of Xapian's stem.h):

       •      danish

       •      dutch

       •      english : Martin Porter's 2002 revision of his stemmer

       •      english_lovins : Lovins' stemmer

       •      english_porter : Porter's stemmer as described in his 1980 paper

       •      finnish

       •      french

       •      german

       •      italian

       •      norwegian

       •      portuguese

       •      russian

       •      spanish

       •      swedish

       recollindex -S will rebuild the phonetic/orthographic index. This feature uses the aspell package,  which
       must be installed on the system.

       recollindex -E will check that topdirs and the other relevant paths set in the configuration file exist
       (to help catch typos).

       recollindex --webcache-compact will recover the space wasted by erased  page  instances  inside  the  Web
       cache. It may temporarily need to use twice the disk space used by the Web cache.

       recollindex  --webcache-burst  <destdir>  will  extract  all  entries from the Web cache to files created
       inside <destdir>. Each cache entry is extracted as two files, for the data and metadata.

       recollindex --notindexed [path [path ...]]  will check each path and print out  those  which  are  absent
       from  the  index  (with  an "ABSENT" prefix), or caused an indexing error (with an "ERROR" prefix). If no
       paths are given on the command line, the command will read them, one per line, from stdin.
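
       The output of --notindexed can be filtered to feed the missing files back to the indexer. This is a
       hedged sketch: it assumes that each reported line starts with the prefix followed by whitespace and
       then the path, which is not spelled out above. absent_paths is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the paths reported as absent from the index.
# ASSUMPTION: lines look like "ABSENT <path>"; lines with other prefixes
# (such as ERROR) are dropped.
absent_paths() {
    sed -n 's/^ABSENT[[:space:]]*//p'
}

# Example use (not run here): index the files which are not in the index yet.
#   find /path/to/dir -type f -print |
#       recollindex --notindexed |
#       absent_paths |
#       recollindex -i
```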

       Interrupting the command: as indexing can sometimes take a long time, the command can be  interrupted  by
       sending  an  interrupt  (Ctrl-C,  SIGINT)  or terminate (SIGTERM) signal. Some time may elapse before the
       process exits, because it needs to properly flush and close the index. This can also  be  done  from  the
       recoll  GUI  (menu  entry:  File/Stop_Indexing).  After  such an interruption, the index will be somewhat
       inconsistent because some operations which are normally performed at the end of the  indexing  pass  will
       have been skipped (for example, the stemming and spelling databases will be missing or out of date).
       You just need to restart indexing at a later time to restore consistency. The indexing  will  restart  at
       the  interruption  point  (the  full  file  tree will be traversed, but files that were indexed up to the
       interruption and for which the index is still up to date will not need to be reindexed).
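
       Because the indexer accepts SIGTERM and can resume later, an indexing run can be given a time budget.
       The sketch below relies on the timeout command from GNU coreutils (an assumption about the system, not
       something Recoll provides). index_with_deadline is a hypothetical helper name.

```shell
# Hypothetical helper: run a command (recollindex by default) and send it
# SIGTERM once the time limit is reached.
# ASSUMPTION: GNU coreutils "timeout" is installed.
index_with_deadline() {
    limit=$1
    shift
    # Default to a plain incremental indexing run when no command is given.
    [ "$#" -gt 0 ] || set -- recollindex
    # timeout exits with status 124 when the limit was reached.
    timeout --signal=TERM "$limit" "$@"
}

# Example use (not run here): index for at most one hour, then resume with a
# later run to restore consistency.
#   index_with_deadline 1h
```

       A status of 124 indicates that the run was cut short, so a later run is needed to complete the index.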

SEE ALSO

       recoll(1) recoll.conf(5)

                                                 8 January 2006                                   RECOLLINDEX(1)