Provided by: watchdog_5.16-1.1_amd64

NAME

       watchdog.conf - configuration file for the watchdog daemon

DESCRIPTION

       This file carries all configuration options for the Linux watchdog daemon. Each option must be written
       on a line of its own. Comments start with '#'. Blanks are ignored except after the '=' sign. Leaving the
       value after the '=' sign empty disables the feature, where that makes sense.
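
       The Python sketch below illustrates these syntax rules by splitting watchdog.conf-style text into
       (option, value) pairs. The function names are hypothetical and not part of the watchdog package. The
       sketch makes three assumptions: comments occupy whole lines, the last occurrence of a single-valued
       option wins, and the value is trimmed. The daemon's own parser may differ on any of these. Repeatable
       options such as file or ping simply appear several times in the result.

```python
def parse_watchdog_conf(text):
    """Return the (option, value) pairs in file order; repeated options are kept."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: a comment is a whole line starting with '#'.
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"missing '=' in line: {raw!r}")
        name, value = line.split("=", 1)
        # Assumption: the value is trimmed too; the daemon's parser may differ.
        entries.append((name.strip(), value.strip()))
    return entries


def effective_value(entries, option):
    """Last value given for `option`, or None if it is absent or empty (disabled).
    'Last one wins' is an assumption of this sketch."""
    value = None
    for name, val in entries:
        if name == option:
            value = val
    return value or None
```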

OPTIONS

       interval = <interval>
              Set the longest interval between two writes to the watchdog device. The device is triggered
              after each round of checks, regardless of how long the checks took. After finishing all checks,
              watchdog sleeps for a full cycle of <interval> seconds. Default value is 1 second. The kernel
              drivers typically expect a write at least once a minute; otherwise the system will be rebooted.
              Therefore an interval of more than a minute can only be used with the force command-line option
              [--force | -f].

       logtick = <logtick>
              If you enable verbose logging, a message is written to syslog or a log file on every interval.
              While this is nice, a message every interval is rarely needed, and it fills up the disk and costs
              CPU time. logtick sets how many intervals pass between two log messages. With logtick = 60 and
              interval = 10, a message is written only every 10 minutes (600 seconds). This may make the exact
              time of a crash harder to find. However, it greatly reduces disk usage and spares the
              administrator's nerves when looking for a particular syslog entry among the watchdog messages.
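
              The arithmetic is simply interval * logtick seconds between messages. Below is a minimal Python
              sketch with hypothetical function names. It assumes the first cycle is one of those logged.

```python
def log_period_seconds(interval, logtick):
    """Seconds between two verbose log messages."""
    return interval * logtick


def should_log(cycle, logtick):
    """True for the cycles that produce a message (cycles counted from 0).
    Logging the very first cycle is an assumption of this sketch."""
    return cycle % logtick == 0
```

              With interval = 10, the logged cycles 0, 60 and 120 fall at 0, 600 and 1200 seconds.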

       max-load-1 = <load1>
              Set the maximum allowed load average for a 1 minute span. Once this load average is reached, the
              system is rebooted. Default value is 0, which disables the load average check. Be careful not to
              set this parameter too low. To set a value less than the predefined minimum of 2, you have to use
              the -f command-line option.

       max-load-5 = <load5>
              Set the maximum allowed load average for a 5 minute span. Once this load average is reached, the
              system is rebooted. Default value is 3/4*max-load-1. Be careful not to set this parameter too low.
              To set a value less than the predefined minimum of 2, you have to use the -f command-line option.

       max-load-15 = <load15>
              Set the maximum allowed load average for a 15 minute span. Once this load average is reached, the
              system is rebooted. Default value is 1/2*max-load-1. Be careful not to set this parameter too low.
              To set a value less than the predefined minimum of 2, you have to use the -f command-line option.
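
              The following minimal sketch shows how the three limits relate and how they could be checked
              against /proc/loadavg. The function names are hypothetical. Integer rounding of the derived
              defaults is an assumption, and a limit of 0 is treated as disabled.

```python
def load_limits(max_load_1):
    """Derive the 5 and 15 minute defaults (3/4 and 1/2 of max-load-1).
    Integer division is an assumption about how the daemon rounds."""
    return max_load_1, max_load_1 * 3 // 4, max_load_1 // 2


def load_exceeded(loadavg_text, limits):
    """True if any load average in /proc/loadavg-style text reaches its limit."""
    loads = [float(field) for field in loadavg_text.split()[:3]]
    # A limit of 0 disables that particular check.
    return any(limit > 0 and load >= limit for load, limit in zip(loads, limits))
```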

       min-memory = <minpage>
              Set the minimum amount of memory that has to stay free. Note that this is in memory pages (4kB on
              x86). Default value is 0 pages, which means this test is disabled. The page size is taken from the
              system include files. The usable memory is computed as MemFree + Buffers + Cached, since buffers
              and cache typically expand to use most free memory, but the kernel will reclaim them as needed.
              NOTE: If this measure drops below a few tens of MB, the system will swap aggressively and have
              poorer file system performance due to the lack of caching. This is a 'passive' test and works by
              reading /proc/meminfo.
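
              The usable-memory computation can be sketched in Python as follows. The sketch assumes 4 kB pages
              (on a live system, os.sysconf("SC_PAGE_SIZE") gives the real size) and uses hypothetical function
              names. It parses text in the /proc/meminfo format rather than the live file, so it can be tried on
              sample data.

```python
def meminfo_kb(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def usable_pages(text, page_kb=4):
    """MemFree + Buffers + Cached, converted to pages (4 kB assumed)."""
    info = meminfo_kb(text)
    usable_kb = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return usable_kb // page_kb


def min_memory_ok(text, min_pages, page_kb=4):
    """A min-memory of 0 disables the test."""
    return min_pages == 0 or usable_pages(text, page_kb) >= min_pages
```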

       allocatable-memory = <minpage>
              Set the minimum amount of allocatable memory available on the system.  Note that this is in pages.
              Default  value  is 0 pages which means the test is disabled.  As with min-memory, the page size is
              taken from the system include files. This is an 'active'  test  and  it  works  by  attempting  to
              memory-map a block of the configured size.
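
              A rough Python analogue of this 'active' test maps an anonymous block with the mmap module and
              touches every page. It is a hypothetical sketch, not the daemon's code. Note that with memory
              overcommit enabled, a shortage may surface through the kernel's OOM killer rather than as a clean
              failure.

```python
import mmap


def can_allocate(pages):
    """Try to map and touch `pages` anonymous pages; True on success."""
    if pages == 0:
        return True  # a value of 0 disables the test
    size = pages * mmap.PAGESIZE
    try:
        # fileno -1 requests an anonymous mapping; MAP_PRIVATE keeps it process-local.
        block = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    except (OSError, ValueError):
        return False
    try:
        # Touch every page so the kernel must actually back it with memory.
        for offset in range(0, size, mmap.PAGESIZE):
            block[offset] = 1
    finally:
        block.close()
    return True
```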

       max-swap = <maxpage>
              Set the maximum amount of swap use. Note that this is in memory pages (4kB on x86). Default value
              is 0 pages, which means this test is disabled. Often this should be a large portion of the
              available swap, but remember that paging 1GB of swap can take several to tens of seconds. This is
              a 'passive' test and works by reading /proc/meminfo.
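
              Swap use follows from SwapTotal - SwapFree in /proc/meminfo. Below is a minimal sketch, again
              assuming 4 kB pages and using hypothetical function names. It treats usage above max-swap (rather
              than equal to it) as the fault.

```python
def swap_used_pages(text, page_kb=4):
    """SwapTotal - SwapFree from /proc/meminfo-style text, in pages (4 kB assumed)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return (info["SwapTotal"] - info["SwapFree"]) // page_kb


def max_swap_ok(text, max_pages, page_kb=4):
    """A max-swap of 0 disables the test."""
    return max_pages == 0 or swap_used_pages(text, page_kb) <= max_pages
```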

       watchdog-device = <device>
              Set the watchdog device name, typically /dev/watchdog. If it is not set, keep-alive support is
              disabled. Test this by running the daemon from the command line before configuring it to start
              automatically at boot.
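
              The keep-alive mechanism itself is simple: any write to the device resets the hardware timer. The
              Python sketch below (a hypothetical function, not the daemon's code) writes one byte per cycle. It
              can optionally write the character 'V' before closing; drivers that support the Linux "magic
              close" feature treat this as a request to disarm the timer. Do not point it at a real
              /dev/watchdog casually. Opening the device arms the timer, and if the driver lacks magic close or
              was built with nowayout, the machine reboots once the writes stop.

```python
import time


def feed_watchdog(device, cycles, interval, magic_close=True):
    """Write one keep-alive byte per cycle, sleeping `interval` seconds between writes."""
    with open(device, "wb", buffering=0) as dev:
        for i in range(cycles):
            dev.write(b"\0")  # any write counts as a keep-alive
            if i + 1 < cycles:
                time.sleep(interval)
        if magic_close:
            # 'V' asks drivers that support magic close to disarm on close.
            dev.write(b"V")
```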

       watchdog-refresh-use-settimeout = <auto|yes|no>
              Refresh the watchdog timer by setting its timeout instead of using a normal watchdog refresh
              operation. This might help if your watchdog trips by itself when the first timeout interval
              elapses. The default, 'auto', applies this as a fix-up for IT87 watchdog hardware; 'no' disables
              it and 'yes' forces it for other modules.
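
              The difference between the two refresh styles can be sketched with the Linux watchdog ioctls
              WDIOC_KEEPALIVE and WDIOC_SETTIMEOUT. The request numbers are computed with the generic ioctl
              encoding used on x86 and ARM; other architectures encode them differently. The ioctl function is
              injectable so the logic can be tried without hardware, and the helper names are hypothetical.

```python
import array
import fcntl

_IOC_WRITE, _IOC_READ = 1, 2
INT_SIZE = array.array("i").itemsize


def _ioc(direction, type_char, nr, size):
    # Generic Linux ioctl number layout (x86/ARM); not valid on every architecture.
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


WDIOC_KEEPALIVE = _ioc(_IOC_READ, "W", 5, INT_SIZE)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, INT_SIZE)


def refresh(fd, use_settimeout, timeout, ioctl=fcntl.ioctl):
    """Refresh with a normal keep-alive, or by re-setting the timeout.
    Returns the int the driver left in the buffer (the timeout for SETTIMEOUT)."""
    buf = array.array("i", [timeout if use_settimeout else 0])
    request = WDIOC_SETTIMEOUT if use_settimeout else WDIOC_KEEPALIVE
    ioctl(fd, request, buf, True)  # mutate_flag=True lets the kernel write back
    return buf[0]
```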

       watchdog-refresh-ignore-errors = <yes|no>
              Ignore errors reported when writing to the watchdog device. Typically this is used on systems that
              have a broken IPMI driver implementation, to avoid a reboot loop.

       watchdog-timeout = <timeout>
              Set the watchdog device timeout during startup. If not set, a compile-time default is used, which
              should match the kernel's timer margin.

       temperature-sensor = <temp-virtual-file>
              Set the temperature sensor name. This is normally a 'virtual file' under /sys that contains the
              temperature in milli-Celsius. Usually these are generated by the sensors package, but take care,
              as device enumeration may not be stable. Default is to disable temperature checking. Multiple
              sensors can be used by repeating the temperature-sensor entry. Because of the enumeration
              problem, a missing temperature sensor is simply ignored and not treated as a reboot trigger.
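
              Reading such sensors can be sketched as follows. The function name is hypothetical, and the sketch
              only covers the conversion from milli-Celsius and the rule that missing files are skipped.

```python
def read_temperatures(paths):
    """Map each readable sensor file to its temperature in degrees Celsius.
    Missing files are skipped instead of being treated as a fault."""
    temps = {}
    for path in paths:
        try:
            with open(path) as f:
                millidegrees = int(f.read().strip())
        except FileNotFoundError:
            continue
        temps[path] = millidegrees / 1000
    return temps
```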

       max-temperature = <temp>
              Set the maximum allowed temperature in Celsius. Once this temperature is reached, the system is
              stopped. Default value is 90 C. Watchdog will issue warnings once the temperature reaches 90%, 95%
              and 98% of this value.
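
              With the default of 90 C, the warning levels are 81 C, 85.5 C and 88.2 C. Below is a small
              classification sketch with hypothetical names. Whether each threshold is inclusive is an
              assumption.

```python
def temperature_status(temp_c, max_temp=90):
    """Classify a reading against max-temperature and the 90/95/98 % warning levels."""
    if temp_c >= max_temp:
        return "shutdown"  # hypothetical label for "system is stopped"
    for pct in (98, 95, 90):
        # Compare as temp * 100 >= max * pct to avoid fractional thresholds.
        if temp_c * 100 >= max_temp * pct:
            return f"warn-{pct}"
    return "ok"
```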

       temp-power-off = <yes|no>
              Set the watchdog action on overheating. 'yes' (the default) powers the machine off; 'no' halts the
              machine and allows a Ctrl-Alt-Del reboot.

       file = <filename>
              Set  file  name  for  file  mode.   This option can be given as often as you like to check several
              files.

       change = <mtime>
              Set the change interval time for file mode. This option always belongs to the active file name;
              that is, when it finds a 'change =' line, watchdog assumes it belongs to the most recently read
              'file =' line. The two lines don't have to follow each other directly, but you cannot specify a
              'change =' before a 'file ='. The default is to only stat the file and not look for changes.
              Using this feature to monitor changes in /var/log/messages might require some special syslog
              daemon configuration, e.g. rsyslog needs "$ActionWriteAllMarkMessages on" to be set to make sure
              the marks are written no matter what.
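
              Below is a minimal sketch of the file check, with a hypothetical function name. Without change,
              it only needs a successful stat. With change = <mtime>, it also requires the file to have been
              modified within that many seconds. The unit of seconds and the inclusive comparison are
              assumptions.

```python
import os
import time


def file_ok(path, change=None, now=None):
    """Stat `path`; if `change` (seconds, assumed) is given, also require a recent mtime."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return False  # missing or unreadable file is a fault
    if change is None:
        return True
    if now is None:
        now = time.time()
    return now - mtime <= change
```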

       pidfile = <pidfilename>
              Set  pidfile  name  for  daemon test mode.  This option can be given as often as you like to check
              several daemons, assuming they write their post-forking PID to the specified files.
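
              A common way to perform such a check is to read the PID and send it signal 0, which tests whether
              the process exists without affecting it. The sketch below shows this in Python, with a
              hypothetical function name.

```python
import os


def daemon_alive(pidfile):
    """True if `pidfile` names a process that still exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return False  # missing, empty or malformed pidfile
    if pid <= 0:
        return False  # 0 or negative would address process groups, not one daemon
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```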

       ping = <ip-addr>
              Set IPv4 address for ping mode.  This option can  be  used  more  than  once  to  check  different
              connections.

       ping-count = <ping-per-interval>
              Set the number of ping attempts in each 'interval' of time. Default is 3, and the check completes
              at the first successful ping.
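
              The stop-at-first-success behaviour amounts to the loop below. In this hypothetical sketch, probe
              stands in for an actual ICMP echo, which needs raw-socket privileges and is left out.

```python
def ping_check(probe, address, count=3):
    """Call probe(address) up to `count` times; stop at the first success."""
    for _ in range(count):
        if probe(address):
            return True
    return False
```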

       interface = <if-name>
              Set interface name for network mode.  This option can be used more than once  to  check  different
              interfaces. Note it is only possible to check physical interfaces, and not aliased IP interfaces.

       test-binary = <testbin>
              Execute the given binary to do some user defined tests.

       test-timeout = <timeout in seconds>
              User defined tests may only run for <timeout> seconds. Set to 0 for unlimited.
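
              Running a test under such a limit can be sketched with Python's subprocess module. The function
              name is hypothetical. The sketch assumes that a zero exit status means success and that a test
              which overruns its timeout counts as a failure.

```python
import subprocess


def run_test_binary(argv, timeout):
    """Run a user test; exit status 0 means pass. A timeout of 0 means unlimited."""
    try:
        result = subprocess.run(argv, timeout=None if timeout == 0 else timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child; treat the overrun as a failure.
        return False
    return result.returncode == 0
```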

       repair-binary = <repbin>
              Execute the given binary in case of a problem instead of shutting down the system.

       repair-timeout = <timeout in seconds>
              The repair command may only run for <timeout> seconds. Set to 0 for 'unlimited', but note that the
              hardware timer is not refreshed in this case, so the system will hard-reset at some point.

       retry-timeout = <timeout in seconds>
              Allow most error conditions to persist for <timeout> seconds. Set to 0 for immediate action  (like
              softboot behaviour).
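
              The retry logic can be pictured as a timer that starts when a fault is first seen and resets when
              the fault clears. This is a hypothetical sketch of that idea, not the daemon's implementation.
              Note how a timeout of 0 makes the first faulty check trigger action immediately.

```python
class FaultTimer:
    """Track how long a fault has persisted; act only once it outlasts retry_timeout."""

    def __init__(self, retry_timeout):
        self.retry_timeout = retry_timeout
        self.first_seen = None

    def update(self, faulty, now):
        """Return True when action (repair or reboot) should be taken."""
        if not faulty:
            self.first_seen = None  # fault cleared: reset the clock
            return False
        if self.first_seen is None:
            self.first_seen = now
        return now - self.first_seen >= self.retry_timeout
```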

       repair-maximum = <count>
              This allows no more than <count> repair attempts against a given fault that report success (i.e.
              return 0) but fail to clear the fault, before a reboot is initiated anyway. If set to zero, a
              repairable fault can always be blocked by a repair program reporting success (the previous daemon
              behaviour).
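
              The counting rule can be sketched as a small state machine. The class name is hypothetical. The
              sketch assumes that a repair reporting failure (non-zero exit) leads straight to a reboot, and
              that the reboot happens on the <count>-th ineffective attempt.

```python
class RepairPolicy:
    def __init__(self, repair_maximum):
        self.repair_maximum = repair_maximum
        self.attempts = 0  # repairs that reported success but left the fault in place

    def after_repair(self, exit_status, fault_cleared):
        """Return 'ok', 'retry' or 'reboot' after one repair attempt."""
        if exit_status != 0:
            return "reboot"  # assumption: a failing repair ends in a reboot
        if fault_cleared:
            self.attempts = 0
            return "ok"
        self.attempts += 1
        # repair_maximum == 0 keeps the old behaviour: never force a reboot here.
        if self.repair_maximum and self.attempts >= self.repair_maximum:
            return "reboot"
        return "retry"
```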

       softboot-option = <yes|no>
              This acts like the -b / --softboot command-line option and simply sets the retry timeout to zero.

       admin = <mail-address>
              Email address to send admin mail to, that is, who shall be notified that the machine is being
              halted or rebooted. Default is 'root'. If you want to disable notification via email, just set
              admin to an empty string.

       realtime = <yes|no>
              If set to yes watchdog will lock itself into memory so it is never swapped out.

       priority = <schedule priority>
              Set the scheduling priority for realtime mode, passed to sched_setscheduler().

       test-directory = <test directory>
              Set the directory from which user test/repair scripts are run. Default is '/etc/watchdog.d'. See
              the Test Directory section in watchdog(8) for more information.

       log-dir = <log directory>
              Set the log directory to capture the standard output and standard  error  from  repair-binary  and
              test-binary execution. Default is '/var/log/watchdog'.

       sigterm-delay = <time in seconds>
              Set the time at shutdown between first sending SIGTERM to all processes and then sending
              SIGKILL. Default is 5 seconds, which is generally enough, but systems with large databases or
              virtual machines might need longer.
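
              The two-stage shutdown can be sketched for a single child process with Python's subprocess
              module. The daemon applies it to all processes, and the function name here is hypothetical.

```python
import signal
import subprocess


def stop_with_delay(proc, sigterm_delay):
    """SIGTERM first; SIGKILL if the process is still running after sigterm_delay seconds.
    Returns 'terminated' or 'killed' depending on which stage ended it."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=sigterm_delay)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX systems
        proc.wait()
        return "killed"
```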

       verbose = <level>
              This  overrides  the command line --verbose option. Generally the verbose mode is only enabled for
              debugging as it creates a lot of syslog chatter, so use this option with  consideration.  Zero  is
              "normal"  operation  (quiet), while 1 is typically used for debugging. Values of 2 or more usually
              generate far too many messages.

       heartbeat-file = <filename>
              For debugging, this keeps a rolling set of status values on disk.

       heartbeat-stamps = <interval>
              For debugging, this sets the number of entries in the <heartbeat-file>.

       log-killed-pids = <yes|no>
              This acts like enabling 'verbose' logging, but only for a system  reboot,  where  it  enables  the
              logging  of the PID values for all processes that are being killed. The results are written to the
              killall5.log file in the log directory (if at all possible) in this case.  Intended for  debugging
              cases  where  you  would  like  to  know  what  was running at the point the machine triggered the
              watchdog, but don't want syslog filling up with the usual chatter of activity.

FILES

       /etc/watchdog.conf
              The watchdog configuration file

       /etc/watchdog.d
              A directory containing test-or-repair commands. See the Test Directory section in watchdog(8)  for
              more information.

SEE ALSO

       watchdog(8)

4th Berkeley Distribution                         February 2019                                 WATCHDOG.CONF(5)