5. Frequently asked questions (FAQ)

5.1. My app needs to write to /var/log, /run, etc.

Because the image is mounted read-only by default, log files, caches, and other runtime data cannot be written anywhere in the image. You have three options:

  1. Configure the application to use a different directory. /tmp is often a good choice, because it’s shared with the host and fast.
  2. Use RUN commands in your Dockerfile to create symlinks that point somewhere writeable, e.g. /tmp, or /mnt/0 with ch-run --bind.
  3. Run the image read-write with ch-run -w. Be careful that multiple containers do not try to write to the same image files.
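
For option 2, the following is a minimal sketch of the kind of command a RUN line might execute. The helper name redirect_dir and the paths /var/log/myapp and /tmp/myapp-log are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: replace directory $1 with a symlink pointing at $2.
# Anything already at $1 is removed first, so use it only at build time.
redirect_dir () {
    dir=$1
    target=$2
    rm -rf "$dir"
    mkdir -p "$(dirname "$dir")"
    ln -s "$target" "$dir"
}

# Example call, as it might appear in a build script run inside the image:
#   redirect_dir /var/log/myapp /tmp/myapp-log
```

In a Dockerfile, the equivalent might be RUN rm -rf /var/log/myapp && ln -s /tmp/myapp-log /var/log/myapp. The symlink target does not need to exist at build time, but something must create it at run time before the application writes there.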

5.2. Tarball build fails with “No command specified”

The full error from ch-docker2tar or ch-build2dir is:

docker: Error response from daemon: No command specified.

You will also see it with various plain Docker commands.

This happens when there is no default command specified in the Dockerfile or any of its ancestors. Some base images specify one (e.g., Debian) and others don’t (e.g., Alpine). Docker requires this even for commands that don’t seem like they should need it, such as docker create (which is what trips up Charliecloud).

The solution is to add a default command to your Dockerfile, such as CMD ["true"].
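
If you maintain several Dockerfiles, a check along the following lines may help. It is only a sketch: the helper name ensure_default_cmd is hypothetical, and it inspects just the given file, not its ancestors, so it will also append a CMD when a base image already supplies one (which overrides the inherited default).

```shell
# Hypothetical helper: append CMD ["true"] to the Dockerfile at $1
# unless that file already contains a CMD instruction.
# Dockerfile instructions are case-insensitive, hence grep -i.
ensure_default_cmd () {
    dockerfile=$1
    if ! grep -Eiq '^[[:space:]]*CMD([[:space:]]|\[)' "$dockerfile"; then
        printf 'CMD ["true"]\n' >> "$dockerfile"
    fi
}
```

Calling ensure_default_cmd ./Dockerfile twice leaves a single CMD line, because the second call finds the one added by the first.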

5.3. --uid 0 lets me read files I can’t otherwise!

Some permission bits can give a surprising result with a container UID of 0. For example:

$ whoami
reidpr
$ echo surprise > ~/cantreadme
$ chmod 000 ~/cantreadme
$ ls -l ~/cantreadme
---------- 1 reidpr reidpr 9 Oct  3 15:03 /home/reidpr/cantreadme
$ cat ~/cantreadme
cat: /home/reidpr/cantreadme: Permission denied
$ ch-run /var/tmp/hello cat ~/cantreadme
cat: /home/reidpr/cantreadme: Permission denied
$ ch-run --uid 0 /var/tmp/hello cat ~/cantreadme
surprise

At first glance, it seems that we’ve found an escalation – we were able to read a file inside a container that we could not read on the host! That seems bad.

However, what is really going on here is more prosaic but complicated:

  1. After unshare(CLONE_NEWUSER), ch-run gains all capabilities inside the namespace. (Outside, capabilities are unchanged.)
  2. This includes CAP_DAC_OVERRIDE, which enables a process to read/write/execute a file or directory mostly regardless of its permission bits. (This is why root isn’t limited by permissions.)
  3. Within the container, exec(2) capability rules are followed. Normally, this basically means that all capabilities are dropped when ch-run replaces itself with the user command. However, if EUID is 0, which it is inside the namespace given --uid 0, then the subprocess keeps all its capabilities. (This makes sense: if root creates a new process, it stays root.)
  4. CAP_DAC_OVERRIDE within a user namespace is honored for a file or directory only if its UID and GID are both mapped. In this case, ch-run maps reidpr to container root and group reidpr to itself.
  5. Thus, files and directories owned by the host EUID and EGID (here reidpr:reidpr) are available for all access with ch-run --uid 0.

This isn’t a problem. The quirk applies only to files owned by the invoking user, because ch-run is unprivileged outside the namespace. That user could simply chmod the file to read it anyway, so access inside and outside the container remains equivalent.
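
To see the capability steps above for yourself, you can inspect the effective capability mask that Linux exposes in the CapEff field of /proc/self/status. The sketch below assumes a Linux /proc and an awk in the environment; the helper name has_dac_override is hypothetical. CAP_DAC_OVERRIDE is capability number 1, so it corresponds to bit mask 0x2.

```shell
# Hypothetical helper: succeed if the hexadecimal capability mask in $1
# has the CAP_DAC_OVERRIDE bit (capability number 1, mask 0x2) set.
has_dac_override () {
    [ $(( 0x$1 & 0x2 )) -ne 0 ]
}

# Report on the current environment. awk reads its own status file here,
# but it inherits its capabilities from the invoking shell.
if [ -r /proc/self/status ]; then
    cap_eff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
    if has_dac_override "$cap_eff"; then
        echo "CAP_DAC_OVERRIDE is effective"
    else
        echo "CAP_DAC_OVERRIDE is not effective"
    fi
fi
```

Per the steps above, one would expect this to report the capability as effective when run under ch-run --uid 0 and as not effective when run by an ordinary user on the host.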

5.4. Why is /bin being added to my $PATH?

Newer Linux distributions replace some root-level directories, such as /bin, with symlinks to their counterparts in /usr.

Some of these distributions (e.g., Fedora 24) have also dropped /bin from the default $PATH. This is a problem when the guest OS does not have a merged /usr (e.g., Debian 8 “Jessie”).

While Charliecloud’s general philosophy is not to manipulate environment variables, in this case, guests can be severely broken if /bin is not in $PATH. Thus, we add it if it’s not there.
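
The "add it if it’s not there" logic can be sketched in shell as follows. This illustrates the idea only and is not ch-run’s actual code; the helper name path_with_bin is hypothetical, and appending (rather than prepending) /bin is an assumption of this sketch.

```shell
# Hypothetical helper: print the path list in $1, with /bin appended
# unless /bin is already one of its colon-separated components.
path_with_bin () {
    case ":$1:" in
        *:/bin:*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$1:/bin" ;;
    esac
}

# Example use:
#   PATH=$(path_with_bin "$PATH")
```

Wrapping the value in colons before matching ensures that only a whole component equal to /bin counts, so /usr/bin or /usr/local/bin alone does not suppress the addition.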

5.5. How does setuid mode work?

As noted above, ch-run has a transition mode that uses setuid-root privileges instead of user namespaces. The goal of this mode is to let sites evaluate Charliecloud even on systems that do not have a Linux kernel that supports user namespaces. We plan to remove this code once user namespaces are more widely available, and we encourage sites to use the unprivileged, non-setuid mode in production.

We have taken care to (1) drop privileges temporarily upon program start and only re-acquire them when needed and (2) drop privileges permanently before executing user code. In order to reliably verify the latter, ch-run in setuid mode will refuse to run if invoked directly by root.

It may be better to use capabilities and setcap rather than setuid. However, this also relies on newer features, which would hamper the goal of broadly available testing. For example, NFSv3 does not support extended attributes, which are required for setcap files.

Dropping privileges safely requires care. We follow the recommendations in “Setuid demystified” as well as the system call ordering and privilege drop verification recommendations of the SEI CERT C Coding Standard.

We do not worry about the Linux-specific fsuid and fsgid, which track euid/egid unless specifically changed, which we don’t do. Kernel bugs have existed that violate this invariant, but none are recent.

5.6. ch-run fails with “can’t re-mount image read-only”

Normally, ch-run re-mounts the image directory read-only within the container. This fails if the image resides on certain filesystems, such as NFS (see issue #9). There are two solutions:

  1. Unpack the image into a different filesystem, such as tmpfs or local disk. Consult your local admins for a recommendation. Note that tmpfs is a lot faster than Lustre.
  2. Use the -w switch to leave the image mounted read-write. Note that this may affect reproducibility (because the application can change the image between runs) and/or stability (if there are multiple application processes and one writes a file in the image that another is reading or writing).
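
Before choosing between these, it can help to confirm which filesystem the image directory lives on. The sketch below assumes GNU coreutils stat (the -f and -c options); the helper name fs_type is hypothetical.

```shell
# Hypothetical helper: print the type of the filesystem containing path $1.
# Relies on GNU stat: -f queries the filesystem, -c %T prints its type name.
fs_type () {
    stat -f -c %T "$1"
}

# Example use, with the image directory from the earlier examples:
#   fs_type /var/tmp/hello
```

If the reported type is nfs, the image is on a filesystem where the read-only re-mount is known to fail, and unpacking it elsewhere or using -w applies.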

5.7. Which specific sudo commands are needed?

For running images, sudo is not needed at all.

For building images, it depends on what you would like to support. For example, do you want to let users build images with Docker? Do you want to let them run the build tests?

We do not maintain specific lists, but you can search the source code and documentation for uses of sudo and $DOCKER and evaluate them on a case-by-case basis. (The latter includes sudo if needed to invoke docker in your environment.) For example:

$ find . \(   -type f -executable \
           -o -name Makefile \
           -o -name '*.bats' \
           -o -name '*.rst' \
           -o -name '*.sh' \) \
         -exec grep -EH '(sudo|\$DOCKER)' {} \;