Provided by: slurm-client_24.11.5-4_amd64

NAME

       SPANK - Slurm Plug-in Architecture for Node and job (K)control

DESCRIPTION

       This manual briefly describes the capabilities of the Slurm Plug-in Architecture for Node and job Kontrol
       (SPANK) as well as the SPANK configuration file (by default, plugstack.conf).

       SPANK  provides  a  very generic interface for stackable plug-ins which may be used to dynamically modify
       the job launch code in Slurm. SPANK plugins may be built without access to Slurm source code.  They  need
       only  be compiled against Slurm's spank.h header file, added to the SPANK config file plugstack.conf, and
       they will be loaded at runtime during the next  job  launch.  Thus,  the  SPANK  infrastructure  provides
       administrators  and  other  developers  a  low cost, low effort ability to dynamically modify the runtime
       behavior of Slurm job launch.

       NOTE: All SPANK plugins should be recompiled when upgrading Slurm to a new major release. The  SPANK  API
       is  not  guaranteed  to  be ABI compatible between major releases. Any SPANK plugin linking to any of the
       Slurm libraries should be carefully checked as the Slurm  APIs  and  headers  can  change  between  major
       releases.

SPANK PLUGINS

       SPANK  plugins  are loaded in up to five separate contexts during a Slurm job. Briefly, the five contexts
       are:

       local   In local context, the plugin is loaded by srun (i.e. the "local" part of a parallel job).

       remote  In remote context, the plugin is loaded by slurmstepd (i.e. the "remote" part of a parallel
               job).

       allocator
               In  allocator context, the plugin is loaded in one of the job allocation utilities salloc, sbatch
               or scrontab.

       slurmd  In slurmd context, the plugin is loaded in the slurmd daemon  itself.  NOTE:  Plugins  loaded  in
               slurmd  context  persist for the entire time slurmd is running, so if configuration is changed or
               plugins are updated, slurmd must be restarted for the changes to take effect.

       job_script
               In the job_script context, plugins are loaded in the context of the job prolog or  epilog.  NOTE:
               Plugins  are  loaded in job_script context on each run of the job prolog or epilog, in a separate
               address space from plugins in slurmd context. This means there is no state  shared  between  this
               context   and   other   contexts,   or   even  between  one  call  to  slurm_spank_job_prolog  or
               slurm_spank_job_epilog and subsequent calls.

       In local context, only the init, exit,  init_post_opt,  and  local_user_init  functions  are  called.  In
       allocator  context,  only  the  init, exit, and init_post_opt functions are called.  Similarly, in slurmd
       context, only the init and slurmd_exit callbacks are active, and in  the  job_script  context,  only  the
       job_prolog  and  job_epilog  callbacks are used.  Plugins may query the context in which they are running
       with the spank_context and spank_remote functions defined in spank.h.

       SPANK plugins may be called from multiple points during the Slurm job launch. A  plugin  may  define  the
       following functions:

       slurm_spank_init
         Called just after plugins are loaded. In remote context, this is just after the job step is initialized.
         This function is called before any plugin option processing.

       slurm_spank_job_prolog
         Called at the same time as the job prolog. If this function returns a  non-zero  value  and  the  SPANK
         plugin  that  contains  it  is  required  in  the  plugstack.conf, the node that this is run on will be
         drained.

       slurm_spank_init_post_opt
         Called at the same point as slurm_spank_init, but after all  user  options  to  the  plugin  have  been
         processed.  The  reason  that the init and init_post_opt callbacks are separated is so that plugins can
         process system-wide options specified in  plugstack.conf  in  the  init  callback,  then  process  user
         options,  and  finally  take  some  action in slurm_spank_init_post_opt if necessary.  In the case of a
         heterogeneous job, slurm_spank_init is invoked once per job component.

       slurm_spank_local_user_init
         Called in local (srun) context only, after all options have been processed. This is called after the
         job ID and step IDs are available, which happens in srun after the allocation is made but before tasks
         are launched.

       slurm_spank_user_init
         Called after privileges are temporarily dropped. (remote context only)

       slurm_spank_task_init_privileged
         Called  for  each task just after fork, but before all elevated privileges are dropped. (remote context
         only)

       slurm_spank_task_init
         Called for each task just before execve (2).  If  you  are  restricting  memory  with  cgroups,  memory
         allocated here will be in the job's cgroup. (remote context only)

       slurm_spank_task_post_fork
         Called for each task from the parent process after fork (2) is complete. Because slurmd does not exec
         any tasks until all tasks have completed fork (2), this call is guaranteed to run before the user task
         is executed. (remote context only)

       slurm_spank_task_exit
         Called for each task as its exit status is collected by Slurm.  (remote context only)

       slurm_spank_exit
         Called  once  just  before  slurmstepd  exits  in remote context.  In local context, called before srun
         exits.

       slurm_spank_job_epilog
         Called at the same time as the job epilog. If this function returns a  non-zero  value  and  the  SPANK
         plugin  that  contains  it  is  required  in  the  plugstack.conf, the node that this is run on will be
         drained.

       slurm_spank_slurmd_exit
         Called in slurmd when the daemon is shut down.

       All of these functions have the same prototype, for example:
          int slurm_spank_init (spank_t spank, int ac, char *argv[])

       Where spank is the SPANK handle which must be passed back to Slurm when the plugin calls  functions  like
       spank_get_item  and  spank_getenv.  Configured  arguments  (See  CONFIGURATION  below)  are passed in the
       argument vector argv with argument count ac.
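
       As an illustration of consuming the argument vector, the following minimal sketch looks up a value by
       key. It assumes the site writes plugin arguments in a key=value form, which is a common convention
       rather than something this manual requires. The helper name plugin_arg_value is hypothetical and not
       part of spank.h.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of "key=value" in the plugin's argument
 * vector, or NULL if the key is absent. The key=value form is an assumed
 * convention; SPANK itself only hands over the raw strings.
 */
const char *plugin_arg_value(int ac, char *argv[], const char *key)
{
    size_t klen = strlen(key);

    for (int i = 0; i < ac; i++) {
        /* Match the whole key, followed immediately by '='. */
        if (strncmp(argv[i], key, klen) == 0 && argv[i][klen] == '=')
            return argv[i] + klen + 1;
    }
    return NULL;
}
```

       A plugin's slurm_spank_init could call plugin_arg_value (ac, argv, "dir") and fall back to a built-in
       default when the result is NULL.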

       SPANK plugins can query the current list of supported slurm_spank symbols to  determine  if  the  current
       version  supports a given plugin hook.  This may be useful because the list of plugin symbols may grow in
       the future. The query is  done  using  the  spank_symbol_supported  function,  which  has  the  following
       prototype:
           int spank_symbol_supported (const char *sym);

       The return value is 1 if the symbol is supported, 0 if not.

       SPANK plugins do not have direct access to internally defined Slurm data structures. Instead, information
       about the currently executing job is obtained via the spank_get_item function call.
         spank_err_t spank_get_item (spank_t spank, spank_item_t item, ...);

       The  spank_get_item  call must be passed the current SPANK handle as well as the item requested, which is
       defined by the passed spank_item_t. A variable number of pointer arguments are also passed, depending  on
       which item was requested by the plugin. A list of the valid values for item is kept in the spank.h header
       file. Some examples are:

       S_JOB_UID
         User id for running job. (uid_t *) is third arg of spank_get_item.

       S_JOB_STEPID
         Job step id for running job. (uint32_t *) is third arg of spank_get_item.

       S_TASK_EXIT_STATUS
         Exit  status  for  exited  task.  Only  valid  from  slurm_spank_task_exit.   (int  *)  is third arg of
         spank_get_item.

       S_JOB_ARGV
         Complete job command line. Third and fourth args to spank_get_item are (int *, char ***).

       See spank.h for more details.

       SPANK functions in the local and allocator environment  should  use  the  getenv,  setenv,  and  unsetenv
       functions to view and modify the job's environment.  SPANK functions in the remote environment should use
       the  spank_getenv,  spank_setenv,  and spank_unsetenv functions to view and modify the job's environment.
       spank_getenv searches the job's environment for the environment variable var and copies the current value
       into a buffer buf of length len.  spank_setenv allows a SPANK plugin to set or overwrite  a  variable  in
       the  job's  environment,  and spank_unsetenv unsets an environment variable in the job's environment. The
       prototypes are:
        spank_err_t spank_getenv (spank_t spank, const char *var,
                            char *buf, int len);
        spank_err_t spank_setenv (spank_t spank, const char *var,
                            const char *val, int overwrite);
        spank_err_t spank_unsetenv (spank_t spank, const char *var);

       The spank_getenv, spank_setenv, and spank_unsetenv functions are only necessary in remote context. In
       local context, the standard setenv (3), getenv (3), and unsetenv (3) calls may be used instead.

       Functions  are  also  available  from  within  the SPANK plugins to establish environment variables to be
       exported to the Slurm PrologSlurmctld, Prolog, Epilog and EpilogSlurmctld  programs  (the  so-called  job
       control  environment).   The  name  of environment variables established by these calls will be prepended
       with the string SPANK_ in order to avoid any security  implications  of  arbitrary  environment  variable
       control. (After all, the job control scripts do run as root or the Slurm user.)

       These functions are available from local context only.
         spank_err_t spank_job_control_getenv(spank_t spank, const char *var,
                              char *buf, int len);
         spank_err_t spank_job_control_setenv(spank_t spank, const char *var,
                              const char *val, int overwrite);
         spank_err_t spank_job_control_unsetenv(spank_t spank, const char *var);

       See spank.h for more information.
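
       On the receiving side, a program run as one of the job control scripts sees such a variable under its
       SPANK_-prefixed name. The sketch below only illustrates that prefix rule. The helper name
       job_control_lookup and the variable name used in the note after it are hypothetical and not part of
       SPANK.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the value of job control variable `var` into buf, looking it up in the
 * process environment under its SPANK_-prefixed name. Returns 0 on success,
 * -1 if the variable is unset or its value does not fit in buf.
 */
int job_control_lookup(const char *var, char *buf, size_t len)
{
    char name[256];
    int n = snprintf(name, sizeof(name), "SPANK_%s", var);

    if (n < 0 || (size_t) n >= sizeof(name))
        return -1;                      /* prefixed name would be truncated */

    const char *val = getenv(name);
    if (val == NULL || strlen(val) >= len)
        return -1;

    strcpy(buf, val);                   /* safe: length checked above */
    return 0;
}
```

       If a plugin in local context had called spank_job_control_setenv (spank, "SITE_TAG", "blue", 1), a
       prolog program could retrieve the value with job_control_lookup ("SITE_TAG", buf, sizeof (buf)).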

       Many  of the described SPANK functions available to plugins return errors via the spank_err_t error type.
       On success, the return value will be set to ESPANK_SUCCESS, while on failure, the return  value  will  be
       set to one of many error values defined in spank.h. The SPANK interface provides a simple function
         const char * spank_strerror(spank_err_t err);
       which may be used to translate a spank_err_t value into its string representation.

       The slurm_spank_log function can be used to print messages back to the user at an error level. This is to
       keep  users  from  having to rely on the slurm_error function, which can be confusing because it prepends
       "error:" to every message.

SPANK OPTIONS

       SPANK plugins also have an interface through which they may define and implement extra job options. These
       options are made available to the user through Slurm commands such as srun(1), salloc(1), and  sbatch(1).
       If  the  option is specified by the user, its value is forwarded and registered with the plugin in slurmd
       when the job is run.  In this way, SPANK plugins may dynamically provide new options and functionality to
       Slurm.

       Each option registered by a plugin to Slurm takes the form of a struct spank_option which is declared  in
       spank.h as
          struct spank_option {
             char *         name;
             char *         arginfo;
             char *         usage;
             int            has_arg;
             int            val;
             spank_opt_cb_f cb;
          };

       Where

       name   is the name of the option. Its length is limited to SPANK_OPTION_MAXLEN defined in spank.h.

       arginfo
              is a description of the argument to the option, if the option does take an argument.

       usage  is a short description of the option suitable for --help output.

       has_arg
              0  if  option  takes  no  argument,  1  if  option takes an argument, and 2 if the option takes an
              optional argument. (See getopt_long (3)).

       val    A plugin-local value to return to the option callback function.

       cb     A  callback  function  that  is  invoked  when  the  plugin  option  is  registered  with   Slurm.
              spank_opt_cb_f is typedef'd in spank.h as

                typedef int (*spank_opt_cb_f) (int val, const char *optarg,
                                         int remote);
              Where val is the value of the val field in the spank_option struct, optarg is the supplied
              argument if applicable, and remote is 0 if the function is being called from the "local" host
              (e.g. the host where srun or sbatch/salloc is invoked) or 1 if it is being called from the
              "remote" host (the host where slurmd/slurmstepd run). On the remote host the callback is only
              executed by slurmstepd (remote context), and only if the option was registered for that
              context.

       Plugin  options  may  be registered with Slurm using the spank_option_register function. This function is
       only valid when called from the plugin's slurm_spank_init handler, and registers one option  at  a  time.
       The prototype is
          spank_err_t spank_option_register (spank_t sp,
                    struct spank_option *opt);
       This function will return ESPANK_SUCCESS on successful registration of an option, or ESPANK_BAD_ARG for
       errors including an invalid spank_t handle, or when the function is not called from the slurm_spank_init
       function. All options need to be registered from all contexts in which they will be used. For instance,
       if an option is only used in local (srun) and remote (slurmstepd) contexts, then spank_option_register
       should only be called from within those contexts. For example:
          if (spank_context() != S_CTX_ALLOCATOR)
             spank_option_register (sp, opt);
       If, however, the option is used in all contexts, spank_option_register needs to be called in each of them.

       In addition to spank_option_register, plugins may also export options to Slurm by  defining  a  table  of
       struct  spank_option  with  the symbol name spank_options. This method, however, is not supported for use
       with sbatch and salloc (allocator context), thus the use  of  spank_option_register  is  preferred.  When
       using  the  spank_options  table,  the  final  element  in  the  array  must  be  filled  with  zeros.  A
       SPANK_OPTIONS_TABLE_END macro is provided in spank.h for this purpose.

       When an option is provided by the user  on  the  local  side,  either  by  command  line  options  or  by
       environment  variables,  Slurm will immediately invoke the option's callback with remote=0. This is meant
       for the plugin to do local sanity checking of the option before the value is  sent  to  the  remote  side
       during  job  launch.  If the argument the user specified is invalid, the plugin should issue an error and
       issue a non-zero return code from the callback. The plugin should be able to handle cases where the spank
       option is set multiple  times  through  environment  variables  and  command  line  options.  Environment
       variables are processed before command line options.
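
       A callback along these lines might look like the following minimal sketch. It matches the spank_opt_cb_f
       signature shown above but omits registration so that it compiles without spank.h. The option name
       (renice), the accepted range of 0 to 19, and the use of fprintf in place of Slurm's logging functions
       are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Last accepted value; -1 means the option was never supplied. */
int renice_level = -1;

/*
 * Option callback with the spank_opt_cb_f signature. Accepts a decimal
 * integer in [0, 19]. Returns 0 on success and -1 when the argument is
 * missing or invalid, leaving any previously accepted value untouched.
 */
int renice_opt_cb(int val, const char *optarg, int remote)
{
    char *end = NULL;
    long n;

    (void) val;     /* unused: this sketch handles a single option */
    (void) remote;  /* same validation on the local and remote side */

    if (optarg == NULL || *optarg == '\0') {
        fprintf(stderr, "renice: option requires an argument\n");
        return -1;
    }

    errno = 0;
    n = strtol(optarg, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || n > 19) {
        fprintf(stderr, "renice: invalid value '%s'\n", optarg);
        return -1;
    }

    /* The callback may run more than once; the latest valid value wins. */
    renice_level = (int) n;
    return 0;
}
```

       Because environment variables are processed before command line options, letting the latest valid
       value overwrite earlier ones gives the command line precedence in this sketch.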

       On  the  remote  side, options and their arguments are registered just after SPANK plugins are loaded and
       before the slurm_spank_init handler is called. This allows plugins to modify behavior of all plugin
       functionality based on the value of user-provided options.

       As an alternative to the use of an option callback and global variable, plugins can use the
       spank_option_getopt function to check for supplied options after option processing. This function has
       the prototype:
          spank_err_t spank_option_getopt(spank_t sp,
              struct spank_option *opt, char **optargp);
       This function returns ESPANK_SUCCESS if the option defined in the struct spank_option opt has been used
       by the user. If optargp is non-NULL then it is set to any option argument passed (if the option takes an
       argument). The use of this method is required to process options in job_script context
       (slurm_spank_job_prolog and slurm_spank_job_epilog). This function is valid in the following callbacks:
       slurm_spank_job_prolog, slurm_spank_local_user_init, slurm_spank_user_init,
       slurm_spank_task_init_privileged, slurm_spank_task_init, slurm_spank_task_exit, and
       slurm_spank_job_epilog.

CONFIGURATION

       The  default  SPANK  plug-in  stack  configuration  file  is  plugstack.conf  in  the  same  directory as
       slurm.conf(5), though this may be changed via the Slurm config parameter  PlugStackConfig.  Normally  the
       plugstack.conf  file  should  be  identical  on  all  nodes  of the cluster.  The config file lists SPANK
       plugins, one per line, along with whether the plugin is required or optional, and  any  global  arguments
       that  are to be passed to the plugin for runtime configuration. Comments are preceded with '#' and extend
       to the end of the line. If the configuration file is missing or empty, it will simply be ignored.

       NOTE: The SPANK plugins need to be installed on the machines that execute slurmd (compute nodes) as  well
       as on the machines that execute job allocation utilities such as salloc, sbatch, etc (login nodes).

       The format of each non-comment line in the configuration file is:
         required/optional   plugin   arguments
       For example:
         optional /usr/lib/slurm/test.so
       tells slurmd to load the plugin test.so, passing no arguments. If a SPANK plugin is required, then
       failure of any of the plugin's functions will cause slurmd or the job allocator command to terminate the
       job, while failure of an optional plugin only causes a warning.

       If a fully-qualified path is not specified for a plugin,  then  the  currently  configured  PluginDir  in
       slurm.conf(5) is searched.

       SPANK  plugins  are  stackable, meaning that more than one plugin may be placed into the config file. The
       plugins will simply be called in order, one after the other, and appropriate action taken on failure
       given the state of the plugin's optional flag.

       Additional config files or directories of config files may be included in plugstack.conf with the include
       keyword.  The include keyword must appear on its own line, and takes a glob as its parameter, so multiple
       files may be included from one include line. For example, the following syntax will load all config files
       in the /etc/slurm/plugstack.conf.d directory, in local collation order:
         include /etc/slurm/plugstack.conf.d/*
       which might be considered a more flexible method for building up a spank plugin stack.
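
       To make the line format concrete, the following sketch classifies a single configuration line as
       blank, a plugin entry, or an include directive. It is an illustration only: Slurm's own parser is not
       described in this manual and may treat details such as quoting differently. All names in the block
       (line_kind, stack_line, next_token, parse_stack_line) are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum line_kind { LINE_BLANK, LINE_PLUGIN, LINE_INCLUDE, LINE_INVALID };

struct stack_line {
    enum line_kind kind;
    int  required;      /* 1 for "required", 0 otherwise */
    char path[256];     /* plugin path, or the glob of an include line */
    char args[256];     /* remaining text, i.e. the plugin's arguments */
};

/* Copy the next blank-delimited token from *pp into dst; return its length. */
static size_t next_token(const char **pp, char *dst, size_t dstlen)
{
    const char *p = *pp + strspn(*pp, " \t");
    size_t n = strcspn(p, " \t");

    snprintf(dst, dstlen, "%.*s", (int) n, p);
    *pp = p + n;
    return n;
}

enum line_kind parse_stack_line(const char *line, struct stack_line *out)
{
    char buf[512], keyword[16];
    const char *p = buf;
    size_t len;

    memset(out, 0, sizeof(*out));
    out->kind = LINE_INVALID;

    /* Comments start at '#' and run to the end of the line. */
    snprintf(buf, sizeof(buf), "%s", line);
    len = strcspn(buf, "#\n");
    while (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\t'))
        len--;
    buf[len] = '\0';

    if (next_token(&p, keyword, sizeof(keyword)) == 0)
        out->kind = LINE_BLANK;
    else if (strcmp(keyword, "include") == 0)
        out->kind = LINE_INCLUDE;
    else if (strcmp(keyword, "required") == 0 ||
             strcmp(keyword, "optional") == 0)
        out->kind = LINE_PLUGIN;

    if (out->kind == LINE_BLANK || out->kind == LINE_INVALID)
        return out->kind;

    out->required = (strcmp(keyword, "required") == 0);
    if (next_token(&p, out->path, sizeof(out->path)) == 0) {
        out->kind = LINE_INVALID;   /* keyword with nothing after it */
        return out->kind;
    }

    /* Whatever follows the path is kept verbatim as the argument string. */
    p += strspn(p, " \t");
    snprintf(out->args, sizeof(out->args), "%s", p);
    return out->kind;
}
```

       Given the line "optional /usr/lib/slurm/test.so", the sketch reports a plugin entry with required set
       to 0 and an empty argument string.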

       The SPANK config file is re-read on each job launch, so editing the config file will not  affect  running
       jobs. However, care should be taken so that a partially edited config file is not read by a launching job.

ERRORS

       When a SPANK plugin function returns a non-zero value, the following behavior results:

       ┌────────────┬────────────────────────────────────┬─────────────┬────────────┬───────────────┬────────────┐
       │ Command    │ Function                           │ Context     │ Exitcode   │ Drains Node   │  Fails job │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_init                   │ local       │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_init_post_opt          │ local       │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_local_user_init        │ local       │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_user_init              │ remote      │ 0          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_task_init_privileged   │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_task_post_fork         │ remote      │ 0          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_task_init              │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_task_exit              │ remote      │ 0          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ srun       │ slurm_spank_exit                   │ local       │ 0          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_init                   │ allocator   │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_init_post_opt          │ allocator   │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_user_init              │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_task_init_privileged   │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_task_post_fork         │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_task_init              │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_task_exit              │ remote      │ 0          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ salloc     │ slurm_spank_exit                   │ allocator   │ 0          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_init                   │ allocator   │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_init_post_opt          │ allocator   │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_user_init              │ remote      │ 1          │ yes           │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_task_init_privileged   │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_task_post_fork         │ remote      │ 1          │ yes           │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_task_init              │ remote      │ 1          │ no            │     yes    │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_task_exit              │ remote      │ 0          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ sbatch     │ slurm_spank_exit                   │ allocator   │ 0          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ scrontab   │ slurm_spank_init                   │ allocator   │ 1          │ no            │     no     │
       ├────────────┼────────────────────────────────────┼─────────────┼────────────┼───────────────┼────────────┤
       │ scrontab   │ slurm_spank_exit                   │ allocator   │ 0          │ no            │     no     │
       └────────────┴────────────────────────────────────┴─────────────┴────────────┴───────────────┴────────────┘

       NOTE: The behavior for ProctrackType=proctrack/pgid may result in timeouts for slurm_spank_task_post_fork
       with remote context on failure.

COPYING

       Portions  copyright  (C)  2010-2022  SchedMD  LLC.   Copyright  (C) 2006 The Regents of the University of
       California.  Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).  CODE-OCEC-09-009.  All
       rights reserved.

       This    file    is    part    of    Slurm,   a   resource   management   program.    For   details,   see
       <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under  the  terms  of  the  GNU  General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm  is  distributed  in  the  hope  that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR  PURPOSE.  See  the  GNU  General  Public
       License for more details.

FILES

       /etc/slurm/slurm.conf - Slurm configuration file.
       /etc/slurm/plugstack.conf - SPANK configuration file.
       /usr/include/slurm/spank.h - SPANK header file.

SEE ALSO

       srun(1), slurm.conf(5)

December 2023                                    Slurm Component                                        SPANK(7)