Ubuntu Manpage: ch-run - Run a command in a Charliecloud container

Provided by: charliecloud-runtime_0.26-1_amd64

NAME

       ch-run - Run a command in a Charliecloud container

SYNOPSIS

          $ ch-run [OPTION...] IMAGE -- CMD [ARG...]

DESCRIPTION

Run command CMD in a fully unprivileged Charliecloud container using the image located at IMAGE, which
can be either a directory or, if ch-run was built with SquashFUSE support, a SquashFS archive.

-b, --bind=SRC[:DST]
Bind-mount SRC at guest DST. The default destination if not specified is to use the same path
as the host; i.e., the default is --bind=SRC:SRC. Can be repeated.

If --write is given and DST does not exist, it will be created as an empty directory. However,
DST must be entirely within the image itself; DST cannot enter a previous bind mount. For
example, --bind /foo:/tmp/foo will fail because /tmp is shared with the host via bind-mount
(unless $TMPDIR is set to something else or --private-tmp is given).

Most images have ten directories, /mnt/0 through /mnt/9, already available as mount points.

Symlinks in DST are followed, and absolute links can have surprising behavior. Bind-mounting
happens after namespace setup but before pivoting into the container image, so absolute links
use the host root. For example, suppose the image has a symlink /foo -> /mnt. Then,
--bind=/bar:/foo will bind-mount onto the host’s /mnt. That mount is visible neither on the host,
because it is made inside the already-created namespaces, nor in the container, because the subsequent
pivot into the image hides it. Currently, this problem is only detected when DST needs to be created:
ch-run will refuse to follow absolute symlinks in this case, to avoid directory creation
surprises.

-c, --cd=DIR
Initial working directory in container.

--ch-ssh
Bind ch-ssh(1) into container at /usr/bin/ch-ssh.

--env-no-expand
Don’t expand variables when using --set-env.

-g, --gid=GID
Run as group GID within container.

-j, --join
Use the same container (namespaces) as peer ch-run invocations.

--join-pid=PID
Join the namespaces of an existing process.

--join-ct=N
Number of ch-run peers (implies --join; default: see below).

--join-tag=TAG
Label for ch-run peer group (implies --join; default: see below).

-m, --mount=DIR
Use DIR for the SquashFS mount point, which must already exist. If not specified, the default
is /var/tmp/$USER.ch/mnt, which will be created if needed.

--no-home
By default, your host home directory (i.e., $HOME) is bind-mounted at guest /home/$USER. This
is accomplished by mounting a new tmpfs at /home, which hides any image content under that
path. If this is specified, neither of these things happens and the image’s /home is exposed
unaltered.

--no-passwd
By default, temporary /etc/passwd and /etc/group files are created according to the UID and GID
maps for the container and bind-mounted into it. If this is specified, no such temporary files
are created and the image’s files are exposed.

-t, --private-tmp
By default, the host’s /tmp (or $TMPDIR if set) is bind-mounted at container /tmp. If this is
specified, a new tmpfs is mounted on the container’s /tmp instead.

--set-env, --set-env=FILE, --set-env=NAME=VALUE
Set environment variable(s). With:

• no argument: as listed in file /ch/environment within the image. It is an error if the
file does not exist or cannot be read. (Note that with SquashFS images, it is not
currently possible to use other files within the image.)

• FILE (i.e., no equals sign in argument): as specified in the file at host path FILE. Again, it is
an error if the file cannot be read.

• NAME=VALUE (i.e., equals sign in argument): set variable NAME to VALUE.

See below for details on how environment variables work in ch-run.

-u, --uid=UID
Run as user UID within container.

--unset-env=GLOB
Unset environment variables whose names match GLOB.

-v, --verbose
Be more verbose (can be repeated).

-w, --write
Mount image read-write (by default, the image is mounted read-only).

-?, --help
Print help and exit.

--usage
Print a short usage message and exit.

-V, --version
Print version and exit.
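The option descriptions above leave a few mechanics implicit. The POSIX sh sketch below illustrates three of them under stated assumptions. First, it shows how a --bind argument defaults DST to SRC, assuming the argument is split at its first colon. Second, it shows how to spot an absolute symlink along DST inside a directory image before using that path as a bind destination. Third, it shows how the GLOB given to --unset-env might be matched against variable names, assuming shell-style wildcards. The function names bind_parse, dst_absolute_symlink, and unset_env_glob and the variables BIND_SRC and BIND_DST are hypothetical and are not part of Charliecloud.

```shell
#!/bin/sh
# Hypothetical helpers illustrating ch-run option behavior; not Charliecloud code.

# Split a --bind argument SRC[:DST] into BIND_SRC and BIND_DST.
# Assumption: the split happens at the first colon; DST defaults to SRC.
bind_parse() {
  case $1 in
    *:*) BIND_SRC=${1%%:*} BIND_DST=${1#*:} ;;
    *)   BIND_SRC=$1 BIND_DST=$1 ;;
  esac
  [ -n "$BIND_SRC" ]  # fail on an empty source, e.g. ":/mnt/0"
}

# Print "PATH -> TARGET" for the first absolute symlink found while walking
# DST inside directory image IMG; exit status 1 if there is none. Such a
# link would be resolved against the host root during bind-mounting.
dst_absolute_symlink() (
  img=$1 rest=${2#/} prefix=
  while [ -n "$rest" ]; do
    comp=${rest%%/*}
    case $rest in */*) rest=${rest#*/} ;; *) rest= ;; esac
    [ -n "$comp" ] || continue  # skip empty components, as in "a//b"
    prefix=$prefix/$comp
    if [ -L "$img$prefix" ]; then
      target=$(readlink "$img$prefix")
      case $target in
        /*) printf '%s -> %s\n' "$prefix" "$target"; exit 0 ;;
      esac
    fi
  done
  exit 1
)

# Unset every environment variable whose name matches shell glob $1.
# Assumption: shell pattern matching stands in for ch-run's GLOB matching.
unset_env_glob() {
  for _ch_name in $(awk 'BEGIN { for (k in ENVIRON) print k }'); do
    case $_ch_name in
      *[!A-Za-z0-9_]*|[0-9]*) continue ;;  # names the shell cannot unset
    esac
    case $_ch_name in
      $1) unset "$_ch_name" ;;  # $1 unquoted so *, ? and [...] act as wildcards
    esac
  done
  unset _ch_name
}
```

For example, with an image containing the /foo -> /mnt symlink from the --bind discussion above, dst_absolute_symlink IMG /foo reports /foo -> /mnt. That is a hint that --bind=/bar:/foo would mount onto the host’s /mnt.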

Note: Because ch-run is fully unprivileged, it is not possible to change UIDs and GIDs within the
container (the relevant system calls fail). In particular, setuid, setgid, and setcap executables do not
work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS, 1) to disable these executables within the
container. This does not reduce functionality but is a “belt and suspenders” precaution to reduce the
attack surface should bugs in these system calls or elsewhere arise.
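The flag set by prctl(PR_SET_NO_NEW_PRIVS, 1) is inherited by child processes. On Linux 4.10 and later, it is reported in the NoNewPrivs field of /proc/self/status. The sketch below reads that field. The function name no_new_privs is hypothetical.

```shell
#!/bin/sh
# Print the calling process's no_new_privs flag (0 or 1) from /proc.
# awk inherits the flag from the shell, so its own status reflects it.
# Fails if the field is absent (e.g., kernels older than 4.10).
no_new_privs() {
  awk '$1 == "NoNewPrivs:" { print $2; found = 1 } END { exit !found }' /proc/self/status
}
```

For example, ch-run /data/foo -- grep NoNewPrivs /proc/self/status should report 1 inside the container, assuming the image contains grep.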

IMAGE FORMAT

ch-run supports two different image formats.

The first is a simple directory that contains a Linux filesystem tree. Such a directory can be produced by:

• Using ch-convert to convert directly from ch-image or another builder to a directory.

• Charliecloud’s tarball workflow: build or pull the image, ch-convert it to a tarball, transfer the
tarball to the target system, then ch-convert the tarball to a directory.

• Manually mounting a SquashFS image, e.g. with squashfuse(1), and unmounting it with fusermount -u
after the run.

• Any other workflow that produces an appropriate directory tree.

The second is a SquashFS image archive mounted internally by ch-run, available if it’s linked with the
optional libsquashfuse_ll. ch-run mounts the image filesystem, services all FUSE requests, and unmounts
it, all within ch-run. See --mount above to set the mount point location.

Prior versions of Charliecloud provided wrappers for the squashfuse and squashfuse_ll SquashFS mount
commands and fusermount -u unmount command. We removed these because we concluded they had minimal
value-add over the standard, unwrapped commands.

WARNING:
Currently, Charliecloud unmounts the SquashFS filesystem when user command CMD’s process exits. It
does not monitor any of its child processes. Therefore, if the user command spawns child processes and
then exits before them (e.g., some daemons), those children will have the image unmounted from
underneath them. In this case, the workaround is to mount/unmount using external tools. We expect to
remove this limitation in a future version.
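The external-tools workaround can be sketched as a pair of POSIX sh helpers that wrap squashfuse(1) and fusermount -u. With these helpers, unmounting happens when you decide that everything using the image has finished, rather than when CMD exits. The function names sqfs_up and sqfs_down and the paths in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for mounting a SquashFS image outside ch-run.

# Mount IMAGE.sqfs at MOUNTPOINT (created if needed) with squashfuse(1).
sqfs_up() {
  if [ "$#" -ne 2 ]; then
    echo 'usage: sqfs_up IMAGE.sqfs MOUNTPOINT' >&2
    return 2
  fi
  mkdir -p "$2" && squashfuse "$1" "$2"
}

# Unmount MOUNTPOINT. Call this only after every process using the image,
# including any children that outlive the user command, has exited.
sqfs_down() {
  if [ "$#" -ne 1 ]; then
    echo 'usage: sqfs_down MOUNTPOINT' >&2
    return 2
  fi
  fusermount -u "$1"
}

# Typical sequence (paths are examples only):
#   sqfs_up /var/tmp/foo.sqfs /var/tmp/foo.mnt
#   ch-run /var/tmp/foo.mnt -- ...   # the mount point is used as a directory image
#   ...wait until any daemons started above have finished...
#   sqfs_down /var/tmp/foo.mnt
```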

HOST FILES AND DIRECTORIES AVAILABLE IN CONTAINER VIA BIND MOUNTS

In addition to any directories specified by the user with --bind, ch-run has standard host files and
directories that are bind-mounted in as well.

The following host files and directories are bind-mounted at the same location in the container. These
give access to the host’s devices and various kernel facilities. (Recall that Charliecloud provides
minimal isolation and containerized processes are mostly normal unprivileged processes.) They cannot be
disabled and are required; i.e., they must exist both on host and within the image.

• /dev

• /proc

• /sys

The following are optional: each is bind-mounted only if the path exists both on the host and within the
image, with no error or warning if it does not.

• /etc/hosts and /etc/resolv.conf. Because Charliecloud containers share the host network namespace,
they need the same hostname resolution configuration.

• /etc/machine-id. Provides a unique ID for the OS installation; matching the host works for most
situations. Needed to support D-Bus, some software licensing situations, and likely other use cases.
See also issue #1050.

• /var/lib/hugetlbfs at guest /var/opt/cray/hugetlbfs, and /var/opt/cray/alps/spool. These support
Cray MPI.

• $PREFIX/bin/ch-ssh at guest /usr/bin/ch-ssh. SSH wrapper that automatically containerizes after
connecting.

The following bind mounts are done by default but can be disabled; see the options above.

• $HOME at /home/$USER (and image /home is hidden). Makes user data and init files available.

• /tmp (or $TMPDIR if set) at guest /tmp. Provides a temporary directory that persists between
container runs and is shared with non-containerized application components.

• temporary files at /etc/passwd and /etc/group. Usernames and group names need to be customized for
each container run.

MULTIPLE PROCESSES IN THE SAME CONTAINER WITH --JOIN

By default, different ch-run invocations use different user and mount namespaces (i.e., different
containers). While this has no impact on sharing most resources between invocations, there are a few
important exceptions. These include:

1. ptrace(2), used by debuggers and related tools. One can attach a debugger to processes in descendant
namespaces, but not sibling namespaces. The practical effect of this is that (without --join), you
can’t run a command with ch-run and then attach to it with a debugger also run with ch-run.

2. Cross-memory attach (CMA) is used by cooperating processes to communicate by simply reading and
writing one another’s memory. This is also not permitted between sibling namespaces. This affects
various MPI implementations that use CMA to pass messages between ranks on the same node, because it’s
faster than traditional shared memory.

--join is designed to address this by placing related ch-run commands (the “peer group”) in the same
container. This is done by one of the peers creating the namespaces with unshare(2) and the others
joining with setns(2).

To do so, we need to know the number of peers and a name for the group. These are specified by additional
arguments that can (hopefully) be left at default values in most cases:

• --join-ct sets the number of peers. The default is the value of the first of the following environment
variables that is defined: OMPI_COMM_WORLD_LOCAL_SIZE, SLURM_STEP_TASKS_PER_NODE, SLURM_CPUS_ON_NODE.

• --join-tag sets the tag that names the peer group. The default is environment variable SLURM_STEP_ID,
if defined; otherwise, the PID of ch-run’s parent. Tags can be re-used for peer groups that start at
different times, i.e., once all peer ch-run instances have replaced themselves with the user command, the
tag can be re-used.
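The default selection for --join-ct and --join-tag can be sketched in POSIX sh as below. The function names are hypothetical, and the shell’s $PPID stands in for the PID of ch-run’s parent. Returning an error when none of the variables is set is an assumption about what happens when no default is available.

```shell
#!/bin/sh
# Default --join-ct: value of the first of these variables that is set.
join_ct_default() (
  for v in OMPI_COMM_WORLD_LOCAL_SIZE SLURM_STEP_TASKS_PER_NODE SLURM_CPUS_ON_NODE; do
    if eval "[ -n \"\${$v+set}\" ]"; then
      eval "printf '%s\n' \"\$$v\""
      exit 0
    fi
  done
  # Assumption: with no default available, --join-ct must be given.
  echo 'join_ct_default: no peer count available; give --join-ct' >&2
  exit 1
)

# Default --join-tag: SLURM_STEP_ID if set, otherwise the parent's PID.
join_tag_default() {
  if [ -n "${SLURM_STEP_ID+set}" ]; then
    printf '%s\n' "$SLURM_STEP_ID"
  else
    printf '%s\n' "$PPID"
  fi
}
```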

Caveats:

• One cannot currently add peers after the group has formed, for example to start a debugger later. (This
is only required for code with bugs and is thus an unusual use case.)

• ch-run instances race. The winner of this race sets up the namespaces, and the other peers use the
winner to find the namespaces to join. Therefore, if the user command of the winner exits, any
remaining peers will not be able to join the namespaces, even if they are still active. There is
currently no general way to specify which ch-run should be the winner.

• If --join-ct is too high, the winning ch-run’s user command exits before all peers join, or ch-run
itself crashes, IPC resources such as semaphores and shared memory segments will be leaked. These
appear as files in /dev/shm/ and can be removed with rm(1).

• Many of the arguments given to the race losers, such as the image path and --bind, will be ignored in
favor of what was given to the winner.
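Before removing anything with rm(1), it can help to list candidate leftovers. The sketch below, with the hypothetical name shm_leftovers, lists regular files directly under /dev/shm (or another directory given as an argument) that belong to the current user. It does not filter by name, because the names Charliecloud gives its IPC objects are not documented here. Review the list before deleting anything.

```shell
#!/bin/sh
# List regular files owned by the current user directly under DIR
# (default /dev/shm). Nothing is deleted; review the output, then rm(1)
# the entries that belong to the failed peer group.
shm_leftovers() {
  find "${1:-/dev/shm}" -maxdepth 1 -type f -user "$(id -u)"
}
```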

ENVIRONMENT VARIABLES

       ch-run leaves environment variables unchanged, i.e., the host environment is passed through unaltered,
       except:

       • limited tweaks to avoid significant guest breakage;

       • user-set variables via --set-env;

       • user-unset variables via --unset-env; and

       • set CH_RUNNING.

       This section describes these features.

       The default tweaks happen first, then --set-env and --unset-env in the order specified on the command
       line, and then CH_RUNNING. The two options can be repeated arbitrarily many times, e.g. to add/remove
       multiple variable sets or add only some variables in a file.

   Default behavior
       By default, ch-run makes the following environment variable changes:

       • $CH_RUNNING: Set to Weird Al Yankovic. While a process can figure out that it’s in an unprivileged
         container and what namespaces are active without this hint, that can be messy, and there is no way to
         tell that it’s a Charliecloud container specifically. This variable makes such a test simple and
         well-defined. (Note: This variable is unaffected by --unset-env.)

       • $HOME: If the path to your home directory is not /home/$USER on the host, then an inherited $HOME will
         be incorrect inside the guest. This confuses some software, such as Spack. Thus, we change $HOME to
         /home/$USER, unless --no-home is specified, in which case it is left unchanged.

       • $PATH: Newer Linux distributions replace some root-level directories, such as /bin, with symlinks to
         their counterparts in /usr.

         Some of these distributions (e.g., Fedora 24) have also dropped /bin from the default $PATH. This is a
         problem when the guest OS does not have a merged /usr (e.g., Debian 8 “Jessie”). Thus, we add /bin to
         $PATH if it’s not already present.

         Further reading:

            • The case for the /usr Merge

            • Fedora

            • Debian

       • $TMPDIR: Unset, because this is almost certainly a host path, and that host path is made available in
         the guest at /tmp unless --private-tmp is given.
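       The default tweaks above can be sketched as a POSIX sh function. Two details are assumptions rather
       than documented behavior: /bin is appended to the end of $PATH (the text says only that it is added),
       and the --no-home argument stands in for the ch-run option. CH_RUNNING is set last to mirror the
       ordering described earlier. In ch-run, --set-env and --unset-env are applied before that step. The
       function name apply_default_tweaks is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of ch-run's default environment tweaks.
apply_default_tweaks() {
  # $HOME: the guest home directory, unless --no-home was given.
  if [ "${1-}" != --no-home ]; then
    HOME=/home/$USER
    export HOME
  fi
  # $PATH: add /bin if missing (appending is an assumption).
  case ":$PATH:" in
    *:/bin:*) ;;
    *) PATH=${PATH:+$PATH:}/bin; export PATH ;;
  esac
  # $TMPDIR: almost certainly a host path, so remove it.
  unset TMPDIR
  # ch-run would apply --set-env and --unset-env here, then:
  CH_RUNNING='Weird Al Yankovic'
  export CH_RUNNING
}
```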

   Setting variables with --set-env
       The purpose of --set-env is to set environment variables within the container. Values given replace any
       already in the environment (i.e., inherited from the host shell) or set by an earlier --set-env. This
       flag takes an optional argument, which has two possible forms; it can also be given with no argument:

       1. If the argument contains an equals sign (=, ASCII 61), that sets an environment variable directly. For
          example, to set FOO to the string value bar:

             $ ch-run --set-env=FOO=bar ...

          Single straight quotes around the value (', ASCII 39) are stripped, though be aware that both single
          and double quotes are also interpreted by the shell. For example, the following sets BAZ to qux; the
          double quotes are removed by the shell and the single quotes are removed by ch-run:

             $ ch-run --set-env="BAZ='qux'" ...

       2. If the argument does not contain an equals sign, it is a host path to a file containing zero or more
          assignments using the same syntax as above (except with no prior shell processing), separated by
          newlines. Empty lines are ignored, and no comments are interpreted. (This syntax is designed to accept
          the output of printenv and to be easily produced by other simple mechanisms.) For example:

             $ cat /tmp/env.txt
             FOO=bar
             BAZ='qux'
             $ ch-run --set-env=/tmp/env.txt ...

          For directory images only (because the file is read before containerizing), guest paths can be given
          by prepending the image path.

       3. If there is no argument, the file /ch/environment within the image is used. This file is commonly
          populated by ENV instructions in the Dockerfile. For example, the following is equivalent to the
          example for form 2:

             $ cat Dockerfile
             [...]
             ENV FOO=bar
             ENV BAZ=qux
             [...]
             $ ch-image build -t foo .
             $ ch-convert foo /var/tmp/foo.sqfs
             $ ch-run --set-env /var/tmp/foo.sqfs -- ...

          (Note that the image path is interpreted as the image, not as the --set-env argument, because an
          optional argument to --set-env must be attached with an equals sign.)

          At present, there is no way to use files other than /ch/environment within SquashFS images.
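       The parsing rules for forms 1 to 3 can be sketched in POSIX sh as follows, with variable expansion
       (described next) left out. The function names set_env_form, parse_assignment, and read_env_file, and
       the result variables SE_NAME and SE_VALUE, are hypothetical. The sketch assumes that quotes are
       stripped only when the value both begins and ends with a single quote, which is consistent with the
       table of examples in this section. It also rejects names the shell cannot export; that is a shell
       limitation, not a documented ch-run rule.

```shell
#!/bin/sh
# Report which --set-env form an argument (or its absence) selects.
set_env_form() {
  if [ "$#" -eq 0 ]; then
    echo image-file         # /ch/environment inside the image
  else
    case $1 in
      *=*) echo assignment ;;
      *)   echo host-file ;;
    esac
  fi
}

# Split NAME=VALUE at the first "=" into SE_NAME and SE_VALUE, then strip
# one pair of single quotes surrounding the whole value.
parse_assignment() {
  case $1 in
    *=*) ;;
    *) echo "parse_assignment: no '=' in: $1" >&2; return 1 ;;
  esac
  SE_NAME=${1%%=*}
  SE_VALUE=${1#*=}
  if [ -z "$SE_NAME" ]; then
    echo "parse_assignment: empty name in: $1" >&2
    return 1
  fi
  case $SE_VALUE in
    "'"*"'") SE_VALUE=${SE_VALUE#"'"}; SE_VALUE=${SE_VALUE%"'"} ;;
  esac
}

# Export each assignment in FILE: one per line, empty lines skipped, and
# no comment syntax recognized.
read_env_file() {
  while IFS= read -r _se_line || [ -n "$_se_line" ]; do
    [ -n "$_se_line" ] || continue
    parse_assignment "$_se_line" || return 1
    case $SE_NAME in
      [0-9]*|*[!A-Za-z0-9_]*)
        echo "read_env_file: shell cannot export: $SE_NAME" >&2
        return 1 ;;
    esac
    export "$SE_NAME=$SE_VALUE"
  done < "$1"
}
```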

       Environment variables are expanded for values that look like search paths, unless --env-no-expand is
       given prior to --set-env. When expansion is enabled, the value is treated as a sequence of zero or more
       possibly-empty items separated by colon (:, ASCII 58). If an item begins with dollar sign ($, ASCII
       36), then the rest of the item is the name of an environment variable. If this variable is set to a
       non-empty value, that value is substituted for the item; otherwise (i.e., the variable is unset or the
       empty string), the item is deleted, including a delimiter colon. The purpose of omitting empty
       expansions is to avoid surprising behavior such as an empty element in $PATH meaning the current
       directory.

       For example, to set HOSTPATH to the search path in the current shell (this is expanded by ch-run, though
       letting the shell do it happens to be equivalent):

          $ ch-run --set-env='HOSTPATH=$PATH' ...

       To prepend /opt/bin to this current search path:

          $ ch-run --set-env='PATH=/opt/bin:$PATH' ...

       To prepend /opt/bin to the search path set by the Dockerfile, as retrieved from guest file
       /ch/environment (here we really cannot let the shell expand $PATH):

          $ ch-run --set-env --set-env='PATH=/opt/bin:$PATH' ...
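       The expansion rule can be sketched in POSIX sh as below, and the rows of the table that follows can
       serve as test cases. The function name expand_search_path is hypothetical. Items that start with $
       but are not valid variable names (for example, a lone $) are treated here as unset; this is an
       assumption, not documented behavior. Quote stripping and the --env-no-expand switch are left out.

```shell
#!/bin/sh
# Expand a colon-separated value: an item "$NAME" becomes NAME's value if
# that is non-empty; otherwise the item and one delimiter colon are deleted.
# Other items, including empty ones, are kept as-is.
expand_search_path() (
  rest=$1 out= emitted=
  while :; do
    item=${rest%%:*}
    case $item in
      \$*)
        name=${item#?}
        case $name in
          ''|[0-9]*|*[!A-Za-z0-9_]*) item= ;;  # not a variable name (assumption)
          *) eval "item=\${$name-}" ;;
        esac
        if [ -z "$item" ]; then
          # Unset or empty: drop the item together with its delimiter.
          case $rest in
            *:*) rest=${rest#*:}; continue ;;
            *) break ;;
          esac
        fi
        ;;
    esac
    if [ -n "$emitted" ]; then out=$out:$item; else out=$item; emitted=1; fi
    case $rest in
      *:*) rest=${rest#*:} ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out"
)
```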

       Examples of valid assignments, assuming that environment variable BAR is set to bar and UNSET is unset
       or set to the empty string:
                           ────────────────────────────────────────────────────────────────
                             Assignment                    Name    Value
                           ────────────────────────────────────────────────────────────────
                             FOO=bar                       FOO     bar
                           ────────────────────────────────────────────────────────────────
                             FOO=bar=baz                   FOO     bar=baz
                           ────────────────────────────────────────────────────────────────
                             FLAGS=-march=foo -mtune=bar   FLAGS   -march=foo -mtune=bar
                           ────────────────────────────────────────────────────────────────
                             FLAGS='-march=foo             FLAGS   -march=foo -mtune=bar
                             -mtune=bar'
                           ────────────────────────────────────────────────────────────────
                             FOO=$BAR                      FOO     bar
                           ────────────────────────────────────────────────────────────────
                             FOO=$BAR:baz                  FOO     bar:baz
                           ────────────────────────────────────────────────────────────────
                             FOO=                          FOO     empty string
                           ────────────────────────────────────────────────────────────────
                             FOO=$UNSET                    FOO     empty string
                           ────────────────────────────────────────────────────────────────
                             FOO=baz:$UNSET:qux            FOO     baz:qux (not baz::qux)
                           ────────────────────────────────────────────────────────────────
                             FOO=:bar:baz::                FOO     :bar:baz::
                           ────────────────────────────────────────────────────────────────
                             FOO=''                        FOO     empty string
                           ────────────────────────────────────────────────────────────────
                             FOO=''''                      FOO     '' (two single quotes)
                           ────────────────────────────────────────────────────────────────

EXAMPLES

       Run the command echo hello inside a Charliecloud container using the unpacked image at /data/foo:

          $ ch-run /data/foo -- echo hello
          hello

       Run an MPI job that can use CMA to communicate:

          $ srun ch-run --join /data/foo -- bar

SYSLOG

By default, ch-run logs its command line to syslog. (This can be disabled by configuring with
--disable-syslog.) This includes: (1) the invoking real UID, (2) the number of command line arguments,
and (3) the arguments, separated by spaces. For example:

Dec 10 18:19:08 mybox ch-run: uid=1000 args=7: ch-run -v /var/tmp/00_tiny -- echo hello "wor l}\$d"

Logging is one of the first things done during program initialization, even before command line parsing.
That is, almost all command lines are logged, even if erroneous, and there is no logging of program
success or failure.

Arguments are serialized with the following procedure. The purpose is to provide a human-readable
reconstruction of the command line while also allowing each argument to be recovered byte-for-byte.

• If an argument contains only printable ASCII bytes that are not whitespace, shell metacharacters,
double quote (", ASCII 34 decimal), or backslash (\, ASCII 92), then log it unchanged.

• Otherwise, (a) enclose the argument in double quotes and (b) backslash-escape double quotes,
backslashes, and characters interpreted by Bash (including POSIX shells) within double quotes.
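The serialization procedure can be sketched in POSIX sh as follows. The function names serialize_arg and log_line are hypothetical. Two details the text does not pin down are filled in by assumption: "shell metacharacters" is taken to mean Bash’s set (| & ; ( ) < > and whitespace), and an empty argument is logged as "" so that it stays visible.

```shell
#!/bin/sh
# Serialize one argument: log it unchanged if every byte is printable,
# non-whitespace ASCII other than a metacharacter, double quote, or
# backslash; otherwise wrap it in double quotes and backslash-escape the
# characters special inside double quotes (" \ $ `).
serialize_arg() (
  LC_ALL=C                          # examine bytes, not multibyte characters
  rest=$1 out= needs_quotes=
  [ -n "$rest" ] || needs_quotes=1  # assumption: log "" for an empty argument
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}           # first byte of what remains
    rest=${rest#?}
    case $c in
      [![:graph:]]|'|'|'&'|';'|'('|')'|'<'|'>'|'"'|'\') needs_quotes=1 ;;
    esac
    case $c in
      '"'|'\'|'$'|'`') out=$out\\$c ;;
      *) out=$out$c ;;
    esac
  done
  if [ -n "$needs_quotes" ]; then
    printf '"%s"' "$out"
  else
    printf '%s' "$1"
  fi
)

# Build a message in the format of the example above (syslog prefix omitted).
log_line() {
  _ll_uid=$1
  shift
  printf 'uid=%s args=%s:' "$_ll_uid" "$#"
  for _ll_arg in "$@"; do
    printf ' %s' "$(serialize_arg "$_ll_arg")"
  done
  printf '\n'
}
```

For example, log_line 1000 ch-run -v /var/tmp/00_tiny -- echo hello 'wor l}$d' reproduces the message part of the syslog line shown above.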

The verbatim command line typed in the shell cannot be recovered, because not enough information is
provided to UNIX programs. For example, if echo is followed by two spaces and 'foo', programs receive just
two arguments, echo and foo; the shell removes the spaces and the single quotes. The zero byte, ASCII NUL,
cannot appear in arguments because it would terminate the string.

EXIT STATUS

       If there is an error during containerization, ch-run exits with a non-zero status. If the user command is
       started successfully, the exit status is that of the user command, with one exception: if the image is an
       internally mounted SquashFS filesystem and the user command is killed by a signal, the exit status is 1
       regardless of the signal value.

REPORTING BUGS

       If Charliecloud was obtained from your Linux distribution, use your distribution’s bug reporting
       procedures.

       Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues

COPYRIGHT

       2014–2021, Triad National Security, LLC

0.26                                          2022-01-30 10:06 UTC                                     CH-RUN(1)