Provided by: recollcmd_1.43.0-1build1_amd64

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text search system.

       The  system-wide  configuration  file  is normally located inside /usr/[local]/share/recoll/examples. Any
       parameter set in the common file may be overridden by setting it  in  the  specific  index  configuration
       file, by default: $HOME/.recoll/recoll.conf

       All  recoll commands will accept a -c option or use the $RECOLL_CONFDIR environment variable to specify a
       non-default index configuration directory.

       A short extract of the file might look as follows:

       # Space-separated list of directories to index.
       topdirs =  ~/docs /usr/share/doc

       [~/somedirectory-with-utf8-txt-files]
       defaultcharset = utf-8

       There are three kinds of lines:

       •      Comment or empty.

       •      Parameter assignment.

       •      Section definition.

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'. In the following description, they also have a type,
       which is mostly indicative. The two non-obvious ones are 'fn': file path, and 'dfn': directory path.

       Section lines allow redefining a parameter for a directory subtree. Some of the parameters used for
       indexing are looked up hierarchically from the more to the less specific. Not all parameters can be
       meaningfully redefined; this is specified for each in the next section.

       The tilde character (~) is expanded in file names to the name of the user's home directory.

       Some 'string' values are lists, which is only indicated by their description. In this case white space is
       used for separation, and elements with embedded spaces can be quoted with double-quotes.
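
       As a small sketch of the list and section rules above (the paths are hypothetical and only serve as
       an illustration):

       ```
       # A list value: the second element contains a space, so it is quoted.
       topdirs = ~/docs "/home/me/My Documents"

       # Parameters set after a section line apply to that subtree only.
       [~/docs/legacy]
       defaultcharset = iso-8859-1
       ```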

OPTIONS

       topdirs = string
              Space-separated list of files or directories to recursively index. Symbolic links in the list
              are followed, independently of the value of the followLinks variable. The default value is ~,
              meaning: recursively index $HOME.

       monitordirs = string
              Space-separated  list  of  files or directories to monitor for updates. When running the real-time
              indexer, this allows monitoring only a subset of the whole indexed  area.  The  elements  must  be
              included in the tree defined by the 'topdirs' members.

       skippedNames = string
              File and directory names which should be ignored. White space separated list of wildcard patterns
              (simple ones, not paths: they must contain no "/" characters).

              Have  a  look  at  the default configuration for the initial value, some entries may not suit your
              situation. The easiest way to see it is through the GUI  Index  configuration  "local  parameters"
              panel.

              The  list in the default configuration does not exclude hidden directories (names beginning with a
              dot), which means that it may index quite a few things that you do not want. On  the  other  hand,
              email  user agents like Thunderbird usually store messages in hidden directories, and you probably
              want this indexed. One possible solution is to have ".*" in "skippedNames", and  add  things  like
              "~/.thunderbird" "~/.evolution" to "topdirs".

              Not  even  the  file  names  are  indexed  for  patterns in this list, see the "noContentSuffixes"
              variable for an alternative approach which indexes the  file  names.  Can  be  redefined  for  any
              subtree.

       skippedNames- = string
              List  of  name patterns to remove from the default skippedNames list. Allows modifying the list in
              the local configuration without copying it.

       skippedNames+ = string
              List of name patterns to add to the default skippedNames list. Allows modifying the  list  in  the
              local configuration without copying it.
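
              A possible sketch of the hidden-directory approach suggested for skippedNames, using the
              skippedNames+ form so that the default list does not need to be copied. The two mail directories
              are the ones named above; adjust them to your own setup:

              ```
              # Skip all names beginning with a dot ...
              skippedNames+ = .*
              # ... but still index the mail stores by listing them explicitly.
              topdirs = ~ ~/.thunderbird ~/.evolution
              ```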

       onlyNames = string
              Regular  file  name  filter  patterns.  This is normally empty. If set, only the file names not in
              skippedNames and matching one of the patterns will be considered for indexing.  Can  be  redefined
              per subtree. Does not apply to directories.
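
              For illustration, assuming that the patterns are wildcards of the same kind as in skippedNames
              and that ~/src is a (hypothetical) directory below one of the topdirs entries, restricting a
              subtree to C sources might look like this:

              ```
              [~/src]
              onlyNames = *.c *.h
              ```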

       noContentSuffixes = string
              List  of  name  endings  (not necessarily dot-separated suffixes) for which we don't try MIME type
              identification, and don't uncompress or index content.  Only  the  names  will  be  indexed.  This
              complements  the  now obsoleted recoll_noindex list from the mimemap file, which will go away in a
              future release (the move from mimemap to recoll.conf allows editing the  list  through  the  GUI).
              This  is  different  from  skippedNames  because  these are name ending matches only (not wildcard
              patterns),  and  the  file  name  itself  gets  indexed  normally.  This  can  be  redefined   for
              subdirectories.

       noContentSuffixes- = string
              List of name endings to remove from the default noContentSuffixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes list.
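
              A minimal sketch: the suffixes are hypothetical examples and may or may not already be part of
              the default list on your system.

              ```
              # Index the names, but not the contents, of backup copies.
              noContentSuffixes+ = .bak .orig
              ```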

       skippedPaths = string
              Absolute  paths  we  should not go into. Space-separated list of wildcard expressions for absolute
              filesystem paths (for files or directories). The variable must be defined at the top level of  the
              configuration file, not in a subsection.

              Any  value  in  the  list must be textually consistent with the values in topdirs, no attempts are
              made to resolve symbolic links. In practice, if, as is frequently the case, /home is a link to
              /usr/home, your default topdirs will have a single entry "~" which will be translated to
              "/home/yourlogin". In this case, any skippedPaths entry should start with "/home/yourlogin", *not*
              with "/usr/home/yourlogin".

              The index and configuration directories will automatically be added to the list.

              The  expressions  are  matched  using "fnmatch(3)" with the FNM_PATHNAME flag set by default. This
              means that "/" characters must be matched explicitly. You can set "skippedPathsFnmPathname"  to  0
              to disable the use of FNM_PATHNAME (meaning that "/*/dir3" will match "/dir1/dir2/dir3").

              The  default  value contains the usual mount point for removable media to remind you that it is in
              most cases a bad idea to have  Recoll  work  on  these.  Explicitly  adding  "/media/xxx"  to  the
              "topdirs" variable will override this.

       skippedPathsFnmPathname = bool
              Set to 0 to override use of FNM_PATHNAME for matching skipped paths.
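
              The effect of FNM_PATHNAME on skippedPaths can be sketched as follows (the directory names are
              hypothetical; note that such an assignment overrides the default value from the system-wide
              file):

              ```
              # With FNM_PATHNAME (the default), "*" does not match "/": this skips
              # /home/yourlogin/a/cache but not /home/yourlogin/a/b/cache.
              skippedPaths = /home/yourlogin/*/cache

              # Uncomment to let "*" also match across "/" characters.
              #skippedPathsFnmPathname = 0
              ```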

       nowalkfn = string
              File  name  which  will  cause its parent directory to be skipped. Any directory containing a file
              with this name will be skipped as if it was part of the skippedPaths list. Ex: .recoll-noindex

       daemSkippedPaths = string
              skippedPaths equivalent specific to real time indexing. This enables  having  parts  of  the  tree
              which  are  initially  indexed  but not monitored. If daemSkippedPaths is not set, the daemon uses
              skippedPaths.

       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. Fetched directly by the rclzip.py handler. Skip the patterns
              defined  by  skippedNames  inside  Zip  archives.  Can  be  redefined  for  subdirectories.    See
              https://www.recoll.org/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that should be ignored inside zip archives.
              This  is  used  directly  by  the  zip  handler. If zipUseSkippedNames is not set, zipSkippedNames
              defines the patterns to be skipped inside archives. If zipUseSkippedNames is set,  the  two  lists
              are     concatenated     and     used.    Can    be    redefined    for    subdirectories.     See
              https://www.recoll.org/faqsandhowtos/FilteringOutZipArchiveMembers.html
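
              Assuming the behaviour described above, a combined setup might look like this (the pattern is a
              hypothetical example):

              ```
              # Apply the regular skippedNames patterns inside Zip archives ...
              zipUseSkippedNames = 1
              # ... and additionally skip backup files there.
              zipSkippedNames = *.bak
              ```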

       followLinks = bool
              Follow symbolic links during indexing. The default is to ignore symbolic links to  avoid  multiple
              indexing  of linked files. No effort is made to avoid duplication when this option is set to true.
              This option can be set individually for each of the "topdirs" members by using  sections.  It  can
              not be changed below the "topdirs" level. Links in the "topdirs" list itself are always followed.
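
              A sketch of per-entry configuration (~/shared is a hypothetical topdirs member):

              ```
              topdirs = ~/docs ~/shared

              # Follow symbolic links below ~/shared only.
              [~/shared]
              followLinks = 1
              ```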

       indexedmimetypes = string
              Restrictive  list  of  indexed MIME types. Normally not set (in which case all supported types are
              indexed). If it is set, only the types from the list will have their contents indexed.  The  names
              will be indexed anyway if indexallfilenames is set (default). MIME type names should be taken from
              the  mimemap file (the values may be different from xdg-mime or file -i output in some cases). Can
              be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from indexing. MIME type names should be
              taken from the mimemap file (the values may be different from xdg-mime or file -i output in some
              cases). Can be redefined for subtrees.
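
              As an illustration of the indexedmimetypes and excludedmimetypes lists (the directory is
              hypothetical, and the type names should be checked against your mimemap file):

              ```
              # Never index the contents of MP3 files.
              excludedmimetypes = audio/mpeg

              # Below ~/papers, only index the contents of PDF and HTML documents.
              [~/papers]
              indexedmimetypes = application/pdf text/html
              ```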

       nomd5types = string
              MIME types for which we don't compute a md5 hash. md5 checksums are used  only  for  deduplicating
              results, and can be very expensive to compute on multimedia or other big files. This list lets you
              turn  off  md5 computation for selected types. It is global (no redefinition for subtrees). At the
              moment, it only has an effect for external handlers (exec  and  execm).  The  file  types  can  be
              specified by listing either MIME types (e.g. audio/mpeg) or handler names (e.g. rclaudio.py).

       compressedfilemaxkbs = int
              Size  limit  for  compressed  files.  We  need  to  decompress  these in a temporary directory for
              identification, which can be wasteful in some cases. Limit the waste. Negative means no  limit.  0
              results in no processing of any compressed file. Default 100 MB.

       textfilemaxmbs = int
              Size  limit  for text files. Mostly for skipping monster logs. Default 20 MB. Use a value of -1 to
              disable.

       textfilepagekbs = int
              Page size for text files. If this is set, text/plain files  will  be  divided  into  documents  of
              approximately this size. This will reduce memory usage at index time and help with loading data in
              the  preview window at query time. Particularly useful with very big files, such as application or
              system logs. Also see textfilemaxmbs and compressedfilemaxkbs.
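
              A sketch for a machine with very large logs (the values are arbitrary examples, not
              recommendations):

              ```
              # Skip text files bigger than 50 MB.
              textfilemaxmbs = 50
              # Split the remaining text files into documents of about 1000 KB.
              textfilepagekbs = 1000
              ```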

       textunknownasplain = bool
              Process unknown text/xxx files as text/plain. Allows indexing miscellaneous text files identified
              as text/whatever by "file" or "xdg-mime" without having to explicitly set config entries for them.
              This works fine for indexing (although it will also cause processing of a lot of useless files),
              but the documents indexed this way will be opened by the desktop viewer, even if text/plain has a
              specific editor.

       indexallfilenames = bool
              Index the file names of unprocessed files. Index the names of files the contents of which we don't
              index because of an excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use  a  system mechanism as last resort to guess a MIME type. Depending on platform and version, a
              compile-time configuration will decide if this actually executes a command or uses libmagic.  This
              last-resort identification (if the suffix-based one failed) is generally useful,  but  will  cause
              the indexing of many bogus extension-less "text" files. Also see "systemfilecommand".

       systemfilecommand = string
              Command to use for guessing the MIME type if the internal methods fail. This is ignored on Windows
              or with Recoll 1.38+ if compiled with libmagic enabled (the default). Otherwise, this should be a
              "file -i" workalike. The file path will be added as a last parameter to the command line.
              "xdg-mime" works better than the traditional "file" command, and is now the configured default
              (with a hard-coded fallback to "file").

       processwebqueue = bool
              Decide if we process the Web queue. The queue is a directory where the Recoll Web browser  plugins
              create the copies of visited pages.

       membermaxkbs = int
              Size  limit  for  archive  members.  This  is  passed  to  the MIME handlers in the environment as
              RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide if we store character case and diacritics in the index. If we  do,  searches  sensitive  to
              case  and  diacritics  can be performed, but the index will be bigger, and some marginal weirdness
              may sometimes occur. The default is a stripped index. When using multiple indexes  for  a  search,
              this parameter must be defined identically for all. Changing the value implies an index reset.

       indexStoreDocText = bool
              Decide  if  we  store the documents' text content in the index. Storing the text allows extracting
              snippets from it at query time, instead of building them from index position data.

              Newer Xapian index formats have rendered our use of  positions  list  unacceptably  slow  in  some
              cases.  The  last  Xapian index format with good performance for the old method is Chert, which is
              default for 1.2, still supported but not default in 1.4 and will be dropped in 1.6.

              The stored document text is translated from its original format  to  UTF-8  plain  text,  but  not
              stripped  of  upper-case, diacritics, or punctuation signs. Storing it increases the index size by
              10-20% typically, but also allows for nicer snippets, so it may be worth enabling it even  if  not
              strictly needed for performance if you can afford the space.

              The  variable  only has an effect when creating an index, meaning that the xapiandb directory must
              not exist yet. Its exact effect depends on the Xapian version.

              For Xapian 1.4, if the variable is set to 0, we used to use the Chert format  and  not  store  the
              text.  If  the  variable  was  1,  Glass was used, and the text stored. We don't do this any more:
              storing the text has proved to be the much better option, and dropping this possibility simplifies
              the code.

              So now, the index format for a new index is always the default, but the variable still controls if
              the text is stored or not, and the abstract generation method. With Xapian 1.4 and later, and  the
              variable  set  to 0, abstract generation may be very slow, but this setting may still be useful to
              save space if you do not use abstract generation at all, by using the appropriate setting  in  the
              GUI, and/or avoiding the Python API or recollq options which would trigger it.

       nonumbers = bool
              Decides if terms will be generated for numbers. For example "123", "1.5e6", "192.168.1.4" would not
              be indexed if nonumbers is set ("value123" would still be). Numbers are often quite interesting to
              search for, and this should probably not be set except for special situations, i.e. scientific
              documents with huge amounts of numbers in them, where setting nonumbers will reduce the index
              size. This can only be set for a whole index, not for a subtree.

       notermpositions = bool
              Do  not store term positions. Term positions allow for phrase and proximity searches, but make the
              index much bigger.  In some special circumstances, you may want to dispense with them.

       dehyphenate = bool
              Determines if we index "coworker" also when the input is "co-worker". This is new in version 1.22,
              and on by default. Setting the variable to off allows restoring the previous behaviour.

       indexedpunctuation = string
              String of UTF-8 punctuation characters to be indexed as words. The resulting terms will then be
              searchable: for example, by setting the parameter to "%€" (without the double quotes), you
              would be able to search separately for "100%" or "100€". Note that "100%" or "100 %" would be
              indexed in the same way: the characters are their own word separators.

       backslashasletter = bool
              Process backslash as a normal letter. This may make sense for people wanting to index TeX commands
              as such but is not of much general use.

       underscoreasletter = bool
              Process  underscore  as  normal  letter.  This makes sense in so many cases that one wonders if it
              should not be the default.

       maxtermlength = int
              Maximum term length in Unicode characters. Words longer than this will be discarded.  The  default
              is  40  and  used to be hard-coded, but it can now be adjusted. You may need an index reset if you
              change the value.

       nocjk = bool
              Decides if specific East Asian (Chinese, Korean, Japanese) character/word splitting is turned off.
              This will save a small amount of CPU if you have no CJK documents. If your document base does
              include such text but you are not interested in searching it, setting nocjk may be a significant
              time and space saver.

       cjkngramlen = int
              This  lets  you  adjust  the size of n-grams used for indexing CJK text. The default value of 2 is
              probably appropriate in most cases. A value of 3 would allow  more  precision  and  efficiency  on
              longer words, but the index will be approximately twice as large.

       hangultagger = string
              External tokenizer for Korean Hangul. This allows using a language-specific processor for
              extracting terms from Korean text, instead of the generic n-gram term generator. See
              https://www.recoll.org/pages/recoll-korean.html for instructions.

       chinesetagger = string
              External  tokenizer  for  Chinese.  This  allows  using  the language specific Jieba tokenizer for
              extracting meaningful terms from Chinese text, instead of the generic n-gram term generator.   See
              https://www.recoll.org/pages/recoll-chinese.html for instructions.

       indexstemminglanguages = string
              Languages  for  which  to  create stemming expansion data. Stemmer names can be found by executing
              "recollindex -l", or this can also be set from a list in the GUI. The  values  are  full  language
              names, e.g. english, french...

       defaultcharset = string
              Default  character  set.  This  is  used for files which do not contain a character set definition
              (e.g.: text/plain). Values found inside files, e.g.  a  "charset"  tag  in  HTML  documents,  will
              override  it.  If  this  is  not  set,  the  default  character  set is the one defined by the NLS
              environment ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).  If for  some
              reason  you  want  a  general  default  which does not match your LANG and is not 8859-1, use this
              variable. This can be redefined for any sub-directory.

       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be handled specially when converting text to
              unaccented lowercase. For example, in Swedish, the letter a with diaeresis has full alphabet
              citizenship and should not be turned into an a. Each element in the space-separated list has the
              special character as first element and the translation following. The handling of both the
              lowercase and upper-case versions of a character should be specified, as membership in the list
              will turn off both standard accent and case processing. The value is global and affects both
              indexing and querying. We also convert a few confusing Unicode characters (quotes, hyphen) to
              their ASCII equivalent to avoid "invisible" search failures.

              Examples:

              Swedish:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå ’' ❜' ʼ' ‐-

              German:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl ’' ❜' ʼ' ‐-

              French (you probably want to decompose oe and ae, and nobody would type a German ß):
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl ’' ❜' ʼ' ‐-

              The default follows. These decompositions are not performed by unac, but it is unlikely that
              someone would type the composed forms in a search.
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl ’' ❜' ʼ' ‐-

       maildefcharset = string
              Overrides  the  default  character  set for email messages which don't specify one. This is mainly
              useful for readpst (libpst) dumps, which are utf-8 but do not say so.

       localfields = string
              Set fields on all files (usually of a specific fs area). Syntax is the usual: name = value ; attr1
              = val1 ; [...]. The value is empty, so this needs an initial semicolon. This is useful, e.g., for
              setting the rclaptg field for application selection inside mimeview.
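
              A sketch following the syntax described above; the subtree and the "gnus" value are hypothetical:

              ```
              [~/mail]
              localfields = ; rclaptg = gnus
              ```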

       testmodifusemtime = bool
              Use  mtime  instead  of ctime to test if a file has been modified. The time is used in addition to
              the size, which is always used.  Setting this can reduce re-indexing  on  systems  where  extended
              attributes  are  used  (by  some  other  application),  but not indexed, because changing extended
              attributes only affects ctime.

              Notes:

              - This may prevent detection of change in some marginal file rename cases (the target would need
                to have the same size and mtime).

              - You should probably also set noxattrfields to 1 in this case, unless you still prefer to perform
                xattr indexing, for example if the local file update pattern makes it of value (in general,
                there is a risk that pure extended attribute updates without file modification go undetected).

              Perform a full index reset after changing this.

       noxattrfields = bool
              Disable  extended  attributes  conversion  to  metadata  fields.  This probably needs to be set if
              testmodifusemtime is set.
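
              The combination of testmodifusemtime and noxattrfields suggested above, as a sketch (remember the
              index reset after changing testmodifusemtime):

              ```
              # Detect changes with mtime rather than ctime ...
              testmodifusemtime = 1
              # ... and do not turn extended attributes into metadata fields.
              noxattrfields = 1
              ```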

       metadatacmds = string
              Define commands to gather external metadata, e.g.  tmsu  tags.   There  can  be  several  entries,
              separated  by  semi-colons,  each  defining which field name the data goes into and the command to
              use. Don't forget the initial semi-colon. All the field names  must  be  different.  You  can  use
              aliases  in  the "field" file if necessary.  As a not too pretty hack conceded to convenience, any
              field name beginning with "rclmulti" will be taken as  an  indication  that  the  command  returns
              multiple  field  values  inside a text blob formatted as a recoll configuration file ("fieldname =
              fieldvalue" lines). The rclmultixx name will be ignored, and field names and values will be parsed
              from the data. Example:

              metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f

       cachedir = dfn
              Top directory for Recoll data. Recoll data  directories  are  normally  located  relative  to  the
              configuration  directory (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If "cachedir" is set, the
              directories are stored under the specified value instead (e.g. if cachedir is ~/.cache/recoll, the
              default dbdir would be ~/.cache/recoll/xapiandb).  This affects dbdir, webcachedir,  mboxcachedir,
              aspellDicDir,  which  can  still be individually specified to override cachedir.  Note that if you
              have multiple configurations,  each  must  have  a  different  cachedir,  there  is  no  automatic
              computation of a subpath under cachedir.
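
              For example, using the cachedir value mentioned above (the mboxcachedir override is a
              hypothetical path):

              ```
              cachedir = ~/.cache/recoll
              # dbdir now defaults to ~/.cache/recoll/xapiandb, and likewise for the
              # other data directories, unless one is set explicitly:
              mboxcachedir = ~/.cache/shared-mboxcache
              ```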

       maxfsoccuppc = int
              Maximum file system occupation over which we stop indexing. The value is a percentage,
              corresponding to what the "Capacity" df output column shows. The default value is 0, meaning no
              checking. This parameter is only checked when the indexer starts; it will not change the behaviour
              of a running process.

       dbdir = dfn
              Xapian database directory location. This will be created on first indexing. If the value is not an
              absolute  path,  it  will  be  interpreted  as  relative  to cachedir if set, or the configuration
              directory (-c argument or  $RECOLL_CONFDIR).   If  nothing  is  specified,  the  default  is  then
              ~/.recoll/xapiandb/

       idxstatusfile = fn
              Name  of  the  scratch  file  where the indexer process updates its status. Default: idxstatus.txt
              inside the configuration directory.

       mboxcachedir = dfn
              Directory location for storing mbox message offsets cache  files.  This  is  normally  "mboxcache"
              under  cachedir if set, or else under the configuration directory, but it may be useful to share a
              directory between different configurations.

       mboxcacheminmbs = int
              Minimum mbox file size over which we cache the offsets.  There  is  really  no  sense  in  caching
              offsets for small files. The default is 5 MB.

       mboxmaxmsgmbs = int
              Maximum  mbox  member message size in megabytes. Size over which we assume that the mbox format is
              bad or we misinterpreted it, at which point we just stop processing the file.

       webcachedir = dfn
              Directory where we store the archived web pages after they are processed. This is only used by the
              Web history indexing code. Note that this  is  different  from  webdownloadsdir  which  tells  the
              indexer  where  the  web  pages are stored by the browser, before they are indexed and stored into
              webcachedir.  Default: cachedir/webcache if cachedir is set, else $RECOLL_CONFDIR/webcache

       webcachemaxmbs = int
              Maximum size in MB of the Web archive. This is  only  used  by  the  web  history  indexing  code.
              Default: 40 MB.  Reducing the size will not physically truncate the file.

       webqueuedir = fn
              The  path  to  the  Web  indexing  queue.  This  used  to  be  hard-coded  in  the  old  plugin as
              ~/.recollweb/ToIndex so there would be no need or possibility to change it, but the  WebExtensions
              plugin  now  downloads  the  files  to  the  user  Downloads directory, and a script moves them to
              webqueuedir. The script reads this value from the config so it has become possible to change it.

       webdownloadsdir = fn
              The path to the browser add-on download directory. This tells the indexer where  the  Web  browser
              add-on  stores  the  web  page  data.  The  data  is  then  moved by a script to webqueuedir, then
              processed, and finally stored in webcachedir for future previews.

       webcachekeepinterval = string
              Page recycle interval. By default, only one instance of a URL is kept in the cache. This can be
              changed by setting this to a value determining at what frequency we keep multiple instances
              ("day", "week", "month", "year"). Note that increasing the interval will not erase existing
              entries.
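
              A sketch of a Web history setup (the values are arbitrary examples):

              ```
              processwebqueue = 1
              # Allow the Web archive to grow to 100 MB.
              webcachemaxmbs = 100
              # Keep multiple instances of a URL, at a weekly frequency.
              webcachekeepinterval = week
              ```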

       aspellDicDir = dfn
              Aspell  dictionary  storage  directory  location.  The  aspell  dictionary (aspdict.(lang).rws) is
              normally stored in the directory  specified  by  cachedir  if  set,  or  under  the  configuration
              directory.

       filtersdir = dfn
              Directory  location for executable input handlers. If RECOLL_FILTERSDIR is set in the environment,
              we use it instead. Defaults to $prefix/share/recoll/filters. Can be redefined for subdirectories.

       iconsdir = dfn
              Directory location for icons. The only reason to change this would be if you want  to  change  the
              icons displayed in the result list. Defaults to $prefix/share/recoll/images

       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory to disk index. Setting this allows
              some control over memory usage by the indexer process. A value of 0 means no explicit flushing,
              which lets Xapian perform its own thing, meaning flushing every $XAPIAN_FLUSH_THRESHOLD documents
              created, modified or deleted. As memory usage depends on average document size, not only document
              count, the Xapian approach is not very useful, and you should let Recoll manage the flushes.
              The program compiled value is 0. The configured default value (from this file) is now 50 MB, and
              should be ok in many cases. You can set it as low as 10 to conserve memory, but if you are
              looking for maximum speed, you may want to experiment with values between 20 and 200. In my
              experience, values beyond this are always counterproductive. If you find otherwise, please drop me
              a note.

       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default 1200 (20 min). Set to 0 for no limit.
              This is mainly to avoid infinite loops in PostScript files (loop.ps).

       filtermaxmbytes = int
              Maximum virtual memory space for filter processes (setrlimit(RLIMIT_AS)), in megabytes. Note that
              this includes any mapped libs (there is no reliable Linux way to limit the data space only), so we
              need to be a bit generous here. Anything over 2000 will be ignored on 32-bit machines. The high
              default value is needed because of Java-based handlers (pdftk) which need a lot of VM (most of it
              text), especially pdftk when executed from the Python rclpdf.py handler. You can use a much lower
              value if you don't need Java.
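
              The three resource-related parameters above (idxflushmb, filtermaxseconds, filtermaxmbytes) could
              be combined as follows. The first two values are the defaults quoted in the text; the
              filtermaxmbytes value is an arbitrary example for a setup without Java-based handlers:

              ```
              idxflushmb = 50
              filtermaxseconds = 1200
              filtermaxmbytes = 1000
              ```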

       thrQSizes = string
              Task  queue  depths  for  each stage and threading configuration control. There are three internal
              queues in the indexing pipeline stages (file data extraction,  terms  generation,  index  update).
              This parameter defines the queue depths for each stage (three integer values). In practice, deep
              queues have not been shown to increase performance. The first value is also used to control
              threading autoconfiguration or to disable multithreading. If the first queue depth is set to 0,
              Recoll will set the queue depths and thread counts based on the detected number of CPUs. The
              arbitrarily chosen values are as follows (depth, nthread). 1 CPU: no threading. Less than 4 CPUs:
              (2, 2), (2, 2), (2, 1). Less than 6: (2, 4), (2, 2), (2, 1). Else: (2, 5), (2, 3), (2, 1). If the
              first queue depth is set to -1, multithreading will be disabled entirely. The second and third
              values are ignored in both these cases.

       thrTCounts = string
              Number  of  threads  used for each indexing stage. If the first entry in thrQSizes is not 0 or -1,
              these three values define the number of threads used for each stage (file  data  extraction,  term
              generation,  index  update).   It  makes  no  sense to use a value other than 1 for the last stage
              because updating the Xapian index is necessarily single-threaded (and protected by a mutex).
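
              As an illustration, the two threading parameters could be combined as follows. This is only a
              sketch: the values are arbitrary, chosen to resemble the autoconfiguration for a machine with 4 or 5
              CPUs, and the space-separated notation for the three values is assumed from the general list
              convention of this file.

                  # Queue depth of 2 for each of the three stages
                  thrQSizes = 2 2 2
                  # Threads per stage: file data extraction, term generation, index update
                  thrTCounts = 4 2 1
                  # Alternative: disable multithreading (the second and third values are ignored)
                  # thrQSizes = -1 -1 -1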

       thrTmpDbCnt = int
              Number of temporary indexes used during incremental or full indexing. If not  set  to  zero,  this
              defines  how  many  temporary  indexes we use during indexing.  These temporary indexes are merged
              into the main one at the end of the operation.  Using multiple  indexes  and  a  final  merge  can
              significantly  improve indexing performance when the single-threaded Xapian index updates become a
              bottleneck. How useful this is depends on the type of input and  CPU.  See  the  manual  for  more
              details.

       loglevel = int
              Log file verbosity 1-6. A value of 2 will print only errors and warnings. 3 will print information
              like document updates, 4 is quite verbose and 6 very verbose.

       logfilename = fn
              Log file destination. Use "stderr" (default) to write to the console.

       idxloglevel = int
              Override loglevel for the indexer.

       idxlogfilename = fn
              Override logfilename for the indexer.
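
              A possible logging setup, as a sketch only: the log file path is hypothetical, and the levels are
              picked from the loglevel description.

                  # Errors and warnings only, written to the console
                  loglevel = 2
                  logfilename = stderr
                  # More detail for the indexer, in its own file (hypothetical path)
                  idxloglevel = 4
                  idxlogfilename = ~/.recoll/idxlog.txt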

       helperlogfilename = fn
              Destination file for external helpers' standard error output. The external program error output is
              left alone by default, e.g. going to the terminal when the recoll[index] program is executed  from
              the  command  line. Use /dev/null or a file inside a non-existent directory to completely suppress
              the output.

       daemloglevel = int
              Override loglevel for the indexer in real time mode. The default is to use the  idx...  values  if
              set, else the log... values.

       daemlogfilename = fn
              Override logfilename for the indexer in real time mode. The default is to use the idx... values if
              set, else the log... values.

       pyloglevel = int
              Override loglevel for the Python module.

       pylogfilename = fn
              Override logfilename for the Python module.

       idxnoautopurge = bool
              Do not purge data for deleted or inaccessible files. This can be overridden by recollindex command
              line options and may be useful if some parts of the document set may predictably  be  inaccessible
              at times, so that you would only run the purge after making sure that everything is there.

       orgidxconfdir = dfn
              Original  location  of the configuration directory. This is used exclusively for movable datasets.
              Locating the configuration directory inside the  directory  tree  makes  it  possible  to  provide
              automatic  query  time  path translations once the data set has moved (for example, because it has
              been mounted on another location).

       curidxconfdir = dfn
              Current location of the configuration directory. Complements orgidxconfdir for movable datasets.
              This should be used if the configuration directory has been copied from the dataset to another
              location, either because the dataset is read-only and a read/write copy is desired, or for
              performance reasons. This records the location of the moved configuration directory before the copy,
              to allow path translation computations. For example, if a dataset originally indexed with its
              configuration in "/home/me/mydata/config" has been mounted on "/media/me/mydata", and the GUI is
              running from a copied configuration, orgidxconfdir would be "/home/me/mydata/config", and
              curidxconfdir (as set in the copied configuration) would be "/media/me/mydata/config".

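              Written as configuration lines, the movable dataset example given for curidxconfdir could look as
              follows in the copied configuration. This is a sketch which only restates the paths from that
              example.

                  # Where the configuration directory was when the dataset was indexed
                  orgidxconfdir = /home/me/mydata/config
                  # Where the configuration directory ended up after the dataset was mounted elsewhere
                  curidxconfdir = /media/me/mydata/config
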
       idxrundir = dfn
              Indexing process current directory. The input handlers sometimes  leave  temporary  files  in  the
              current directory, so it makes sense to have recollindex chdir to some temporary directory. If the
              value  is  empty,  the current directory is not changed. If the value is (literal) tmp, we use the
              temporary directory as set by the environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the  value
              is an absolute path to a directory, we go there.
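
              For instance, a minimal sketch asking recollindex to run from the temporary directory defined by the
              environment:

                  # Literal value tmp: use RECOLL_TMPDIR, else TMPDIR, else /tmp
                  idxrundir = tmp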

       checkneedretryindexscript = fn
              Script  used  to  heuristically  check if we need to retry indexing files which previously failed.
              The default script checks the modification dates of /usr/bin and /usr/local/bin. A relative path will
              be looked up in the filters directories, then in the PATH. Use an absolute path to do otherwise.

       recollhelperpath = string
              Additional  places  to search for helper executables. This is used, e.g., on Windows by the Python
              code, and on Mac OS by the bundled recoll.app (because I  could  find  no  reliable  way  to  tell
              launchd to set the PATH). Use ":" as entry separator on Mac and Unix-like systems, and ";" on
              Windows only.
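
              A sketch of both notations, with hypothetical directory names. Only one form applies on a given
              system, so the second one is commented out here.

                  # Windows: entries separated by ";"
                  recollhelperpath = c:/someprog/bin;c:/someotherprog/bin
                  # Mac and Unix-like systems: entries separated by ":"
                  # recollhelperpath = /opt/someprog/bin:/opt/someotherprog/bin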

       idxabsmlen = int
              Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.  The
              text  can  come from an actual "abstract" section in the document or will just be the beginning of
              the document. It is stored in the index so that it  can  be  displayed  inside  the  result  lists
              without  decoding  the  original  file.  The  idxabsmlen  parameter defines the size of the stored
              abstract. The default value is 250 bytes. The search interface gives you  the  choice  to  display
              this  stored text or a synthetic abstract built by extracting text around the search terms. If you
              always prefer the synthetic abstract, you can reduce this value and save a little space.

       idxmetastoredlen = int
              Truncation length of stored metadata fields. This does not affect indexing  (the  whole  field  is
              processed  anyway),  just  the  amount  of  data stored in the index for the purpose of displaying
              fields inside result lists or previews. The default value is 150 bytes which may be too low if you
              have custom fields.
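
              As a sketch with arbitrary values, the following would shorten the stored abstract (for users who
              always prefer the synthetic abstract) and keep longer metadata fields for display:

                  # Smaller than the 250 bytes default
                  idxabsmlen = 100
                  # Larger than the 150 bytes default, e.g. for long custom fields
                  idxmetastoredlen = 500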

       idxtexttruncatelen = int
              Truncation length for all document texts. Only index the  beginning  of  documents.  This  is  not
              recommended  except  if  you are sure that the interesting keywords are at the top and have severe
              disk space issues.

       idxsynonyms = fn
              Name of the index-time synonyms file. This is only used  to  issue  multi-word  single  terms  for
              multi-word  synonyms  so  that  phrase  and proximity searches work for them (ex: applejack "apple
              jack"). The feature will only have an effect for querying if the query-time and index-time synonym
              files are the same.

       idxniceprio = int
              "nice" process priority for the indexing processes. Default: 19 (lowest). Appeared with 1.26.5.
              Prior versions were fixed at 19.

       noaspell = bool
              Disable  aspell  use. The aspell dictionary generation takes time, and some combinations of aspell
              version, language, and local terms, result in aspell crashing, so it sometimes makes sense to just
              disable the thing.

       aspellLanguage = string
              Language definitions to use when creating the aspell dictionary. The value must  match  a  set  of
              aspell language definition files. You can type "aspell dicts" to see a list. The default if this is
              not set is to use the NLS environment to guess the value. The values are the 2-letter language
              codes (e.g. "en", "fr"...).

       aspellAddCreateParam = string
              Additional option and parameter to aspell dictionary creation command. Some  aspell  packages  may
              need  an  additional  option (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian
              bug 772415.

       aspellKeepStderr = bool
              Set this to have a look at aspell dictionary creation errors. There are always many,  so  this  is
              mostly for debugging.

       monauxinterval = int
              Auxiliary  database  update  interval.  The real time indexer only updates the auxiliary databases
              (stemdb, aspell) periodically, because it would be too costly to do it for every document  change.
              The default period is one hour.

       monixinterval = int
              Minimum  interval  (seconds) between processings of the indexing queue. The real time indexer does
              not process each event when it comes in, but lets the queue accumulate, to diminish  overhead  and
              to aggregate multiple events affecting the same file. Default: 30 seconds.

       mondelaypatterns = string
              Timing  parameters  for  the  real  time  indexing. Definitions for files which get a longer delay
              before reindexing is allowed. This is for fast-changing files which should only be reindexed once
              in a while. The value is a list of wildcardPattern:seconds pairs. The patterns are matched with
              fnmatch(pattern, path, 0). You can quote entries containing white space with double quotes (quote
              the whole entry, not the pattern). The default is empty. Example:

                  mondelaypatterns = *.log:20 "*with spaces.*:30"

       monioniceclass = int
              ionice class for the indexing process. Despite the misleading name, and on platforms where this is
              supported, this affects all indexing processes,  not  only  the  real  time/monitoring  ones.  The
              default value is 3 (use lowest "Idle" priority).

       monioniceclassdata = string
              ionice class level parameter if the class supports it. The default is empty, as the default "Idle"
              class has no levels.
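
              A sketch of a non-default setting, assuming the usual Linux ionice conventions, which are not
              described on this page (class 2 is "Best-effort", with levels from 0 to 7, 7 being the lowest
              priority):

                  # Best-effort class instead of the default Idle class
                  monioniceclass = 2
                  # Lowest priority level inside the class
                  monioniceclassdata = 7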

       idxlocalguisettings = bool
              Store some GUI parameters locally to the index. GUI settings are normally stored in a global file,
              valid  for  all  indexes. Setting this parameter will make some settings, such as the result table
              setup, specific to the index.

       autodiacsens = bool
              Auto-trigger diacritics sensitivity (raw index only). If the index is not stripped, decide if we
              automatically  trigger  diacritics  sensitivity if the search term has accented characters (not in
              unac_except_trans). Else you need to use the query  language  and  the  "D"  modifier  to  specify
              diacritics sensitivity. Default is no.

       autocasesens = bool
              Auto-trigger case sensitivity (raw index only). If the index is not stripped (see
              indexStripChars), decide if we automatically trigger character case sensitivity if the search term
              has upper-case characters in any but the first position. Else you need to use the  query  language
              and the "C" modifier to specify character-case sensitivity. Default is yes.
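
              For example, assuming the 0/1 notation for boolean values, both automatic triggers could be enabled
              on a raw index as follows (a sketch, not a recommendation):

                  autodiacsens = 1
                  autocasesens = 1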

       maxTermExpand = int
              Maximum  query  expansion  count for a single term (e.g.: when using wildcards). This only affects
              queries, not indexing. We used to not limit this at all (except for filenames where the limit  was
              too low at 1000), but it is unreasonable with a big index. Default 10000.

       maxXapianClauses = int
              Maximum  number  of  clauses  we  add  to  a  single  Xapian query. This only affects queries, not
              indexing. In some cases, the result of term expansion can be multiplicative, and we want to  avoid
              eating all the memory. Default 50000.

       snippetMaxPosWalk = int
              Maximum number of positions we walk while populating a snippet for the result list. The default of
              1,000,000 may be insufficient for very big documents. The consequence would be snippets with
              missing words, possibly altering their meaning.

       thumbnailercmd = string
              Command to use for generating thumbnails.  If set, this should be a path to a  command  or  script
              followed by its constant arguments. Four arguments will be appended before execution: the document
              URL,  MIME  type, target icon SIZE (e.g. 128), and output file PATH. The command should generate a
              thumbnail from these values. E.g. if the MIME is video,  a  script  could  use:  ffmpegthumbnailer
              -iURL -oPATH -sSIZE.
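
              A sketch of a possible setting. The script name is hypothetical, and the script itself would have to
              be written, for example as a wrapper around ffmpegthumbnailer.

                  # Recoll appends the document URL, MIME type, icon SIZE and output PATH to this command
                  thumbnailercmd = /usr/local/bin/mythumbnailer.sh

              With this value and a target size of 128, the script would receive four arguments, in this order:
              the document URL, the MIME type, 128, and the output file path.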

       stemexpandphrases = bool
              Default  to applying stem expansion to phrase terms. Recoll normally does not apply stem expansion
              to terms inside phrase searches.  Setting this parameter will  change  the  default  behaviour  to
              expanding terms inside phrases. If set, you can use an "l" modifier to disable expansion for a
              specific instance.

       autoSpellRarityThreshold = int
              Inverse of the ratio of term occurrence to total db terms over which we look for spell  neighbours
              for automatic query expansion. When a term is very uncommon, we may (depending on user choice) look
              for spelling variations which would be more common and possibly add them to the query.

       autoSpellSelectionThreshold = int
              Ratio  of  spell  neighbour  frequency  over user input term frequency beyond which we include the
              neighbour in the query. When a term has been  selected  for  spelling  expansion  because  of  its
              rarity, we only include spelling neighbours which are more common by this ratio.

       kioshowsubdocs = bool
              Show embedded document results in KDE dolphin/kio and krunner. Embedded documents may clutter the
              results and are not always easily usable  from  the  kio  or  krunner  environment.  Setting  this
              variable will restrict the results to standalone documents.

       pdfocr = bool
              Attempt  OCR of PDF files with no text content. This can be defined in subdirectories. The default
              is off because OCR is so very slow.
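
              Because the parameter can be set per directory, OCR can be limited to the subtrees which need it. A
              sketch, with a hypothetical directory name and assuming the 0/1 notation for boolean values:

                  [~/scanned-documents]
                  pdfocr = 1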

       pdfoutline = bool
              Extract outlines and bookmarks from PDF documents  (needs  pdftohtml).  This  is  not  enabled  by
              default because it is rarely needed, and the extra command takes a little time.

       pdfattach = bool
              Enable  PDF  attachment  extraction  by executing pdftk (if available). This is normally disabled,
              because it does slow down PDF indexing a bit even if not one attachment is ever found.

       pdfextrameta = string
              Extract text from selected XMP metadata tags. This is a space-separated list of qualified XMP  tag
              names.  Each  element  can  also  include a translation to a Recoll field name, separated by a "|"
              character. If the second element is absent, the tag name is used as the Recoll field name. You
              will  also  need  to add specifications to the "fields" file to direct processing of the extracted
              data.
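
              As an illustration only (the tag names are assumptions and depend on what your PDF files actually
              contain), the following would extract three tags, storing the first one under the Recoll field name
              "location" and the two others under their tag names:

                  pdfextrameta = bibtex:location|location bibtex:booktitle bibtex:pages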

       pdfextrametafix = fn
              Define name of XMP field editing script. This defines the name  of  a  script  to  be  loaded  for
              editing  XMP  field  values.  The script should define a "MetaFixer" class with a metafix() method
              which will be called with the qualified tag name and value of each selected field, for editing  or
              erasing.  A new instance is created for each document, so that the object can keep state for, e.g.
              eliminating duplicate values.

       ocrprogs = string
              OCR modules to try. The top OCR script will try to load the corresponding modules in order and use
              the first which reports being capable of performing OCR on the input file. Modules  for  tesseract
              (tesseract)   and  ABBYY  FineReader  (abbyy)  are  present  in  the  standard  distribution.  For
              compatibility with the previous version, if this is not defined  at  all,  the  default  value  is
              "tesseract".  Use  an  explicit  empty  value  if  needed.  A  value of "abbyy tesseract" will try
              everything.

       ocrcachedir = dfn
              Location for caching OCR data. The default if this is empty or undefined is to  store  the  cached
              OCR data under $RECOLL_CONFDIR/ocrcache.

       tesseractlang = string
              Language  to  assume for tesseract OCR. Important for improving the OCR accuracy. This can also be
              set  through  the  contents  of  a  file  in  the   currently   processed   directory.   See   the
              rclocrtesseract.py script. Example values: eng, fra... See the tesseract documentation.
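
              A minimal sketch combining the OCR parameters, assuming the 0/1 notation for boolean values. The
              cache directory is a hypothetical path.

                  # Attempt OCR on PDF files with no text content
                  pdfocr = 1
                  # Only try the tesseract module
                  ocrprogs = tesseract
                  # Documents are assumed to be in English
                  tesseractlang = eng
                  # Hypothetical location for the cached OCR data
                  ocrcachedir = ~/.cache/recoll-ocr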

       tesseractcmd = fn
              Path for the tesseract command. Do not quote. This is mostly useful on Windows, or for specifying
              a non-default tesseract command. E.g. on Windows:

                  tesseractcmd = C:/ProgramFiles(x86)/Tesseract-OCR/tesseract.exe

       abbyylang = string
              Language  to  assume for abbyy OCR. Important for improving the OCR accuracy. This can also be set
              through the contents of a file in  the  currently  processed  directory.  See  the  rclocrabbyy.py
              script. Typical values: English, French... See the ABBYY documentation.

       abbyyocrcmd = fn
              Path for the abbyy command. The ABBYY directory is usually not in the path, so you should set this.

       speechtotext = string
              Activate speech to text conversion. The only possible value at the moment is "whisper" for using
              the OpenAI whisper program.

       sttmodel = string
              Name of the whisper model.

       sttdevice = string
              Name of the device to be used by whisper.
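
              A sketch of a speech to text setup. The model and device names are assumptions based on common
              whisper usage and are not taken from this page. Check the whisper documentation for valid values.

                  speechtotext = whisper
                  # Assumed model name
                  sttmodel = small
                  # Assumed device name
                  sttdevice = cpu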

       orgmodesubdocs = bool
              Index org-mode level 1 sections as separate sub-documents. This is the default. If set to false,
              org-mode files will be indexed as plain text.

       mhmboxquirks = string
              Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this for the directory where the email
              mbox files are stored.

SEE ALSO

       recollindex(1) recoll(1)

                                                14 November 2012                                  RECOLL.CONF(5)