Provided by: slurm-client_24.11.5-4_amd64 bug

NAME

       cgroup.conf - Slurm configuration file for the cgroup support

DESCRIPTION

       cgroup.conf  is an ASCII file which defines parameters used by Slurm's Linux cgroup related plugins.  The
       file will always be located in the same directory as the slurm.conf.

       Parameter names are case insensitive.  Any text following a "#" in the configuration file is treated as a
       comment through the end of that line.  Changes to the configuration file  take  effect  upon  restart  of
       Slurm  daemons,  daemon  receipt of the SIGHUP signal, or execution of the command "scontrol reconfigure"
       unless otherwise noted.

       For general Slurm cgroups information, see the Cgroups Guide at <https://slurm.schedmd.com/cgroups.html>.

       The following cgroup.conf parameters are defined to control the general behavior of Slurm cgroup plugins.

       CgroupMountpoint=PATH
              Only intended for development and testing. Specifies  the  PATH  under  which  cgroup  controllers
              should be mounted. The default PATH is /sys/fs/cgroup.

       CgroupPlugin=<cgroup/v1|cgroup/v2|autodetect|disabled>
              Specify  the  plugin  to be used when interacting with the cgroup subsystem.  Supported values are
              "disabled", to completely disable any interaction with the cgroups, "cgroup/v1" which supports the
              legacy interface of cgroup v1, "cgroup/v2" for the unified cgroup2 architecture, and  "autodetect"
              which  tries to determine which cgroup version does your system provide. "autodetect" is useful to
              have a single configuration in clusters where  nodes  can  have  different  cgroup  versions.  The
              default value is "autodetect".

       SystemdTimeout=<number>
              On  slow  systems  like  virtual  machines  or  when systemd is busy, it can take a lot of time to
              initialize and prepare the scope for slurmd during startup.  Slurm will wait  a  maximum  of  this
              amount  of  time  (in  milliseconds)  for  the  scope  to be ready before failing. Only applies to
              cgroup/v2.  The default is 1000 ms.

       IgnoreSystemd=<yes|no>
              Only for cgroup/v2 and for development and testing. It will avoid any call  to  dbus  and  contact
              with systemd, and instead will prepare all the cgroup hierarchy manually. This option is dangerous
              in systems with systemd since the cgroup can be modified by systemd and cause issues to jobs.

       IgnoreSystemdOnFailure=<yes|no>
              Only  for cgroup/v2 and for development and testing. It has similar functionality to IgnoreSystemd
              but only in the case that a dbus call does not succeed.

       EnableControllers=<yes|no>
              Only for cgroup/v2 and generally for development, testing, when running on old kernels and/or  old
              systemd  versions  (e.g.  RHEL8,  systemd  < 244) where not all the controllers are enabled in the
              cgroup  tree.  With  this  parameter,  slurmd  gets  the   available   controllers   from   root's
              cgroup.controllers file (CgroupMountPoint, by default /sys/fs/cgroup/cgroup.controllers) and makes
              them available in all levels of the cgroup tree until it reaches Slurm's cgroup leaf.

TASK/CGROUP PLUGIN

       The following cgroup.conf parameters are defined to control the behavior of this particular plugin:

       AllowedRAMSpace=<number>
              Constrain  the  job/step  cgroup  RAM  to this percentage of the allocated memory.  The percentage
              supplied may be expressed as floating point number, e.g. 101.5.  Sets the cgroup soft memory limit
              at  the  allocated  memory  size  and  then  sets  the  job/step  hard   memory   limit   at   the
              (AllowedRAMSpace/100)  *  allocated  memory. If the job/step exceeds the hard limit, then it might
              trigger Out Of Memory (OOM) events (including oom-kill) which will be logged to  kernel  log  ring
              buffer  (dmesg  in  Linux). Setting AllowedRAMSpace above 100 may cause system Out of Memory (OOM)
              events as it allows job/step to allocate more  memory  than  configured  to  the  nodes.  Reducing
              configured  node available memory to avoid system OOM events is suggested. Setting AllowedRAMSpace
              below 100 will result in jobs receiving less memory than allocated and soft memory limit will  set
              to the same value as the hard limit.  Also see ConstrainRAMSpace.  The default value is 100.

       AllowedSwapSpace=<number>
              Constrain  the job cgroup swap space to this percentage of the allocated memory. The default value
              is 0, which means that RAM+Swap will be limited to AllowedRAMSpace. The supplied percentage may be
              expressed as a floating point number, e.g. 50.5. If the limit is exceeded, the job steps  will  be
              killed  and  a  warning  message  will be written to standard error.  Also see ConstrainSwapSpace.
              NOTE: Setting AllowedSwapSpace to 0 does not restrict the Linux kernel from using swap  space.  To
              control how the kernel uses swap space, see MemorySwappiness.

       ConstrainCores=<yes|no>
              If  configured  to  "yes"  then constrain allowed cores to the subset of allocated resources. This
              functionality makes use of the cpuset subsystem.  Due to a bug fixed in version 1.11.5  of  HWLOC,
              the task/affinity plugin may be required in addition to task/cgroup for this to function properly.
              The default value is "no".

       ConstrainDevices=<yes|no>
              If configured to "yes" then constrain the job's allowed devices based on GRES allocated resources.
              It uses the devices subsystem for that.  The default value is "no".

       ConstrainRAMSpace=<yes|no>
              If  configured to "yes" then constrain the job's RAM usage by setting the memory soft limit to the
              allocated memory and the hard limit to the allocated memory * AllowedRAMSpace. The  default  value
              is  "no",  in  which  case  the  job's  RAM  limit  will  be  set  to  its  swap  space  limit  if
              ConstrainSwapSpace is set to "yes". CR_*_Memory must be set in slurm.conf for  this  parameter  to
              take any effect.  Also see AllowedSwapSpace, AllowedRAMSpace and ConstrainSwapSpace.

              NOTE:  When  using  ConstrainRAMSpace,  if  the combined memory used by all processes in a step is
              greater than the limit, then the kernel will trigger an OOM event, killing  one  or  more  of  the
              processes in the step. The step state will be marked as OOM, but the step itself will keep running
              and  other  processes  in the step may continue to run as well.  This differs from the behavior of
              OverMemoryKill, where the whole step will be killed/cancelled.

              NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable decline in per-node job throughout.
              Sites with high-throughput requirements should  carefully  weigh  the  tradeoff  between  per-node
              throughput,  versus potential problems that can arise from unconstrained memory usage on the node.
              See <https://slurm.schedmd.com/high_throughput.html> for further discussion.

       ConstrainSwapSpace=<yes|no>
              If configured to "yes" then constrain the job's swap space usage.  The default value is "no". Note
              that when set to "yes" and ConstrainRAMSpace is set to "no", AllowedRAMSpace is automatically  set
              to  100%  in  order  to limit the RAM+Swap amount to 100% of job's requirement plus the percent of
              allowed swap space. This amount is thus set to both RAM and RAM+Swap limits. This  means  that  in
              that  particular  case,  ConstrainRAMSpace is automatically enabled with the same limit as the one
              used to constrain swap space. CR_*_Memory must be set in slurm.conf for this parameter to take any
              effect.  Also see AllowedSwapSpace.

       MaxRAMPercent=PERCENT
              Set an upper bound in percent of total  RAM  (configured  RealMemory  of  the  node)  on  the  RAM
              constraint  for  a job. This will be the memory constraint applied to jobs that are not explicitly
              allocated memory by Slurm  (i.e.  Slurm's  select  plugin  is  not  configured  to  manage  memory
              allocations). The PERCENT may be an arbitrary floating point number. The default value is 100.

       MaxSwapPercent=PERCENT
              Set  an  upper bound (in percent of total RAM, configured RealMemory of the node) on the amount of
              RAM+Swap that may be used for a job. This will be the swap limit applied to jobs on systems  where
              memory  is  not  being explicitly allocated to job. The PERCENT may be an arbitrary floating point
              number between 0 and 100. The default value is 100.

       MemorySwappiness=<number>
              Only for cgroup/v1.  Configure the kernel's priority for swapping out  anonymous  pages  (such  as
              program  data)  verses  file  cache  pages for the job cgroup. Valid values are between 0 and 100,
              inclusive. A value of 0 prevents the kernel from swapping out program data. A value of  100  gives
              equal  priority  to  swapping  out  file  cache  or anonymous pages. If not set, then the kernel's
              default swappiness value will be used. ConstrainSwapSpace must be set to yes  in  order  for  this
              parameter to be applied.

       MinRAMSpace=<number>
              Set  a  lower  bound (in MB) on the memory limits defined by AllowedRAMSpace and AllowedSwapSpace.
              This prevents accidentally creating a memory cgroup with such  a  low  limit  that  slurmstepd  is
              immediately killed due to lack of RAM. The default limit is 30M.

PROCTRACK/CGROUP PLUGIN

       The following cgroup.conf parameters are defined to control the behavior of this particular plugin:

       SignalChildrenProcesses=<yes|no>
              If  configured  to  "yes",  then  send signals (for cancelling, suspending, resuming, etc.) to all
              children processes in a job/step. Otherwise,  only  send  signals  to  the  parent  process  of  a
              job/step. The default setting is "no".

DISTRIBUTION-SPECIFIC NOTES

       Debian  and  derivatives (e.g. Ubuntu) usually exclude the memory and memsw (swap) cgroups by default. To
       include them, add the following parameters to the kernel command line: cgroup_enable=memory swapaccount=1

       This can usually be placed in /etc/default/grub inside the GRUB_CMDLINE_LINUX variable. A command such as
       update-grub must be run after updating the file.

EXAMPLE

       /etc/slurm/cgroup.conf:
              This example cgroup.conf file shows a configuration that enables the  more  commonly  used  cgroup
              enforcement mechanisms.

              ###
              # Slurm cgroup support configuration file.
              ###
              ConstrainCores=yes
              ConstrainDevices=yes
              ConstrainRAMSpace=yes
              ConstrainSwapSpace=yes

       /etc/slurm/slurm.conf:
              These  are  the entries required in slurm.conf to activate the cgroup enforcement mechanisms. Make
              sure that the node definitions in your slurm.conf closely match  the  configuration  as  shown  by
              "slurmd -C".  Either MemSpecLimit should be set or RealMemory should be defined with less than the
              actual  amount  of  memory  for  a  node  to  ensure  that  all system/non-job processes will have
              sufficient memory at all times. Sites should also configure pam_slurm_adopt to  ensure  users  can
              not escape the cgroups via ssh.

              ###
              # Slurm configuration entries for cgroups
              ###
              ProctrackType=proctrack/cgroup
              TaskPlugin=task/cgroup,task/affinity
              JobAcctGatherType=jobacct_gather/cgroup #optional for gathering metrics
              PrologFlags=Contain                     #X11 flag is also suggested

COPYING

       Copyright  (C)  2010-2012  Lawrence Livermore National Security.  Produced at Lawrence Livermore National
       Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This   file   is   part   of   Slurm,   a   resource    management    program.     For    details,    see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but  WITHOUT  ANY  WARRANTY;  without  even  the
       implied  warranty  of  MERCHANTABILITY  or  FITNESS  FOR A PARTICULAR PURPOSE. See the GNU General Public
       License for more details.

SEE ALSO

       slurm.conf(5)

July 2024                                   Slurm Configuration File                              cgroup.conf(5)