Provided by: gridengine-exec_8.1.9+dfsg-11build3_amd64

NAME

       sge_shepherd - Grid Engine single job-controlling agent

SYNOPSIS

       sge_shepherd

DESCRIPTION

       sge_shepherd provides the parent process functionality for a single Grid Engine job. The parent
       functionality is necessary on UNIX systems to retrieve resource usage information (see getrusage(2))
       after a job has finished. In addition, sge_shepherd forwards signals to the job, such as the signals
       for suspension, enabling, and termination, and the Grid Engine checkpointing signal (see sge_ckpt(1)
       and queue_conf(5) for details).
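       The parent-process role can be illustrated with a short Python sketch: fork a child, execute a
       command in it, and reap it with os.wait4(), which returns the child's resource usage (the struct
       rusage data described in getrusage(2)) together with its wait status. This is a minimal sketch
       assuming a POSIX host and Python 3.9 or later; the function name run_and_collect is hypothetical,
       and the sketch omits the credential handling, limits, and signal forwarding that the real
       shepherd performs.

```python
import os


def run_and_collect(argv):
    """Run argv as a child process and reap it with wait4() to obtain its rusage.

    Hypothetical illustration only: not the sge_shepherd implementation, which
    also handles credentials, limits, environment setup and signal forwarding.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the job command.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed; leave immediately without running parent cleanup code.
            os._exit(127)
    # Parent: block until this specific child exits. wait4() returns the
    # child's resource usage (a resource.struct_rusage) with its wait status.
    _, status, usage = os.wait4(pid, 0)
    # Convert the raw wait status to an exit code (negative if killed by a signal).
    exit_code = os.waitstatus_to_exitcode(status)  # requires Python 3.9+
    return exit_code, usage
```

       For instance, run_and_collect(["sh", "-c", "exit 7"]) returns 7 as the exit code, and the
       returned usage.ru_utime and usage.ru_stime fields hold the child's user and system CPU time in
       seconds.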

       sge_shepherd receives information about the job to be started from sge_execd(8). During the
       execution of the job, it starts up to five child processes, one after another. First, a prolog
       script is run if this feature is enabled by the prolog parameter in the cluster configuration (see
       sge_conf(5)). Next, a parallel environment startup procedure is run if the job is a parallel job
       (see sge_pe(5) for more information). After that, the job itself is run, followed by a parallel
       environment shutdown procedure for parallel jobs, and finally an epilog script if requested by the
       epilog parameter in the cluster configuration. The prolog and epilog scripts, as well as the
       parallel environment startup and shutdown procedures, are provided by the Grid Engine administrator
       and are intended for site-specific actions to be taken before and after execution of the actual
       user job.
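       The ordering of these child processes can be summarized in a small Python sketch. It assumes a
       hypothetical JobSetup record holding only the three inputs named in the description (a prolog
       setting, an epilog setting, and whether the job is parallel); the field names and step labels such
       as "pe_start" are illustrative and are not Grid Engine's own identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobSetup:
    """Hypothetical subset of the job information passed to the shepherd.

    Field names are illustrative, not Grid Engine's actual configuration keys.
    """
    prolog: Optional[str] = None   # prolog script from the cluster configuration, if any
    epilog: Optional[str] = None   # epilog script from the cluster configuration, if any
    parallel: bool = False         # True if the job runs in a parallel environment


def planned_steps(setup: JobSetup) -> List[str]:
    """Return, in order, the child processes the shepherd would start (at most five)."""
    steps = []
    if setup.prolog:
        steps.append("prolog")
    if setup.parallel:
        steps.append("pe_start")   # parallel environment startup procedure
    steps.append("job")            # the user job itself always runs
    if setup.parallel:
        steps.append("pe_stop")    # parallel environment shutdown procedure
    if setup.epilog:
        steps.append("epilog")
    return steps
```

       A serial job with no prolog or epilog yields only ["job"], while a parallel job with both scripts
       configured yields all five steps.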

       After the job has finished and the epilog script is  processed,  sge_shepherd  retrieves  resource  usage
       statistics  about the job, places them in a job-specific subdirectory of the sge_execd(8) spool directory
       for reporting through sge_execd(8), and finishes.

       sge_shepherd also places an exit status file in the spool directory. This exit status can be viewed  with
       qacct -j JobId (see qacct(1)); it is not the exit status of sge_shepherd itself but of one of the methods
       executed  by  sge_shepherd.  This exit status can have several meanings, depending on the method in which
       an error occurred (if any).  The possible methods  are:  prolog,  parallel  start,  job,  parallel  stop,
       epilog, suspend, restart, terminate, clean, migrate, and checkpoint.

       The following exit values are returned:

       0      All methods: Operation was executed successfully.

       99     Job script, prolog and epilog: When FORBID_RESCHEDULE is not set in the configuration (see
              sge_conf(5)), the job is re-queued. Otherwise see "Other".

       100    Job script, prolog and epilog: When FORBID_APPERROR is not set in the configuration (see
              sge_conf(5)), the job is re-queued. Otherwise see "Other".

       Other  Job script: This is the exit status of the job itself. No action is taken on this exit status,
              because its meaning is not known to Grid Engine.
              Prolog, epilog and parallel start: The queue is set to error state and the job is re-queued.
              Parallel stop: The queue is set to error state, but the job is not re-queued. It is assumed that
              the job itself ran successfully and only the cleanup script failed.
              Suspend, restart, terminate, clean, and migrate: Always successful.
              Checkpoint: Success, except for kernel checkpointing, where this status means that the
              checkpoint was not successful and did not happen (but migration will still happen).
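       The exit-value table can be read as a decision function. The following Python sketch encodes it
       under stated assumptions: the method labels ("pe_start", "pe_stop", and so on), the action strings,
       and the parameters forbid_reschedule, forbid_apperror and kernel_checkpoint are all hypothetical
       names; the two forbid_ flags model whether FORBID_RESCHEDULE and FORBID_APPERROR are set in the
       configuration.

```python
# Action labels (illustrative wording, not Grid Engine output).
SUCCESS = "success"
REQUEUE = "re-queue job"
NO_ACTION = "no action"
ERROR_AND_REQUEUE = "set queue to error state, re-queue job"
ERROR_NO_REQUEUE = "set queue to error state, do not re-queue job"
CKPT_FAILED = "checkpoint did not happen, migration still happens"

# Methods for which exit values 99 and 100 have a special meaning.
_RESCHEDULABLE = {"job", "prolog", "epilog"}


def interpret_exit(method, status, forbid_reschedule=False,
                   forbid_apperror=False, kernel_checkpoint=False):
    """Map a shepherd method's exit status to the action in the sge_shepherd(8) table.

    All parameter names are hypothetical; kernel_checkpoint marks the
    kernel-checkpointing case of the "checkpoint" method.
    """
    if status == 0:
        return SUCCESS
    if method in _RESCHEDULABLE:
        if status == 99 and not forbid_reschedule:
            return REQUEUE
        if status == 100 and not forbid_apperror:
            return REQUEUE
    # Every remaining combination falls under "Other".
    if method == "job":
        return NO_ACTION
    if method in {"prolog", "epilog", "pe_start"}:
        return ERROR_AND_REQUEUE
    if method == "pe_stop":
        return ERROR_NO_REQUEUE
    if method in {"suspend", "restart", "terminate", "clean", "migrate"}:
        return SUCCESS
    if method == "checkpoint":
        return CKPT_FAILED if kernel_checkpoint else SUCCESS
    raise ValueError(f"unknown method: {method}")
```

       Note that 99 and 100 are special only for the job script, prolog and epilog: a parallel stop
       procedure exiting with 99 is handled like any other nonzero value.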

       For the meaning of the return codes of the shepherd  itself  (which  are  interpreted  by  qacct(1))  see
       sge_status(5).

RESTRICTIONS

       sge_shepherd should not be invoked manually, but only by sge_execd(8).

ENVIRONMENT VARIABLES

       SGE_ROOT       Specifies the location of the Grid Engine standard configuration files.

       SGE_CELL       If set, specifies the default Grid Engine cell. To address a Grid Engine cell, sge_execd
                      uses, in order of precedence:

                             The name of the cell specified in the environment variable SGE_CELL, if it is set.

                             The name of the default cell, i.e. default.

       SGE_ENABLE_COREDUMP
                      If set, enables core dumps on Linux when the admin_user is not root. Linux normally
                      disables core dumps when the daemon has changed uid or gid. Setting SGE_ENABLE_COREDUMP
                      in sge_execd's environment overrides this, enabling core dumps for debugging where they
                      are otherwise allowed. This is typically not a significant hazard with SGE, since most
                      information is exposed in the spool area anyway. Dumps will appear in the qmaster spool
                      directory, which need not be world-readable.
                      On Solaris, coreadm(1) may be used to enable such dumps.

       SGE_CGROUP_DIR If Linux cgroups handling is enabled, this variable names a directory under the cgroup
                      mount point in which to create job-specific directories. The default is sge.SGE_CELL
                      (that is, "sge." followed by the cell name), so, for instance, the cpuset cgroup for job
                      123 in the default cell might be /sys/fs/cgroup/cpuset/sge.default/123.
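       The cell lookup and the default cgroup directory naming can be sketched in Python. This is an
       illustration under assumptions: the helper names resolve_cell and cgroup_job_dir are hypothetical,
       an empty SGE_CELL or SGE_CGROUP_DIR is treated as unset, and the mount point /sys/fs/cgroup with a
       per-controller subdirectory (cgroup v1 layout) is taken from the example path above rather than
       from any Grid Engine code.

```python
import os
from typing import Mapping, Optional


def resolve_cell(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return SGE_CELL if it is set (and non-empty), otherwise the default cell name."""
    env = os.environ if environ is None else environ
    return env.get("SGE_CELL") or "default"


def cgroup_job_dir(job_id, controller: str = "cpuset",
                   mount_point: str = "/sys/fs/cgroup",
                   environ: Optional[Mapping[str, str]] = None) -> str:
    """Build <mount_point>/<controller>/<SGE_CGROUP_DIR>/<job_id>.

    mount_point and controller are assumptions (cgroup v1 layout); when
    SGE_CGROUP_DIR is unset, the directory defaults to "sge." plus the cell name.
    """
    env = os.environ if environ is None else environ
    subdir = env.get("SGE_CGROUP_DIR") or f"sge.{resolve_cell(env)}"
    return os.path.join(mount_point, controller, subdir, str(job_id))
```

       With an empty environment, cgroup_job_dir(123, environ={}) reproduces the example path
       /sys/fs/cgroup/cpuset/sge.default/123.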

FILES

       <execd_spool>/job_dir/<job_id>     Job-specific directory.

       <sge_root>/<cell>/common/sgepasswd
                                          Password information used on Microsoft Windows hosts. The file
                                          contains a list of user names and their corresponding encrypted
                                          passwords. If it is available, sge_shepherd uses it. To change
                                          its contents, use the sgepasswd command; editing the file
                                          manually is not advised. See sgepasswd(5).

SEE ALSO

       sge_intro(1), sge_conf(5), sge_status(5), remote_startup(5), sgepasswd(5), sge_execd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.

SGE 8.1.3pre                              $Date: 2007-07-19 09:04:33 $                           SGE_SHEPHERD(8)