Published 2024-05-31 on Willow's site

How to keep your Docker bind-mount file permissions clean?

If you have ever used Docker to develop a project, you may have already discovered some mess in your filesystem:

$ ls -lh
drwxr-sr-x 313 root root  12K May 29 19:18 node_modules

A PHP project produces very similar situations, for example in the vendor/ or var/ folders.

We have all tried some kind of manual user ID mapping. If the "www-data" user inside our containers uses the same user ID as our host user, the process should produce inodes that our regular user has access to. Mhh?

Maybe it is also possible, somehow, to configure our projects, or Docker itself, to keep our host filesystem clean. Can we configure Docker with this "user-namespace" thing? Or better, could we run Docker rootless? What about Podman?

There should be some way to do so, right? Quick answer: Have you considered wanting something else?


To elaborate on my thoughts, I'll give some context, demonstrate with examples, and enumerate facts. We'll see what is possible and what is not.

Some context first: my username is "stacy", and my UID and GID are both 1000. I'll test (1) a vanilla rootful Docker, (2) rootful Docker with user-namespace remapping, (3) rootless Docker, and (4) rootless Podman. I'm starting with an empty folder. We'll run the same commands each time and check their consequences on the filesystem.


Docker rootful vanilla:

$ doas docker run --rm -it -v ./data/:/var/src -w /var/src alpine sh
/var/src # whoami
root
/var/src # id
uid=0(root) gid=0(root) groups=0(root),...
/var/src # mkdir foo
/var/src # ls -lhn
drwxr-sr-x    2 0        0           4.0K May 29 19:27 foo
/var/src # exit
$ ls -lhn
drwxr-sr-x 3 0 0 4.0K May 29 21:27 data
$ ls -lhn data/
drwxr-sr-x 2 0 0 4.0K May 29 21:37 foo
$ rmdir data/foo/
rmdir: failed to remove 'data/foo/': Permission denied

The data/ folder was created by the Docker daemon, as root, while setting up the bind-mount. The data/foo/ directory is also owned by root: it was created by the container's root process, which is also root on the host system. My regular user can't remove it, since deleting an entry requires write permission on the root-owned data/ directory. That is the default experience.

In this situation there is no user mapping at all: UID 0 in the container is also UID 0 on the host. So what if we give a user ID explicitly?

To do so, we have to create data/ ourselves; otherwise the daemon creates it as root, and UID 1000 would have no write permission on /var/src.

$ mkdir data
$ doas docker run --rm -it -u 1000:1000 -v ./data/:/var/src -w /var/src alpine sh
/var/src $ mkdir foo
/var/src $ ls -lhn
drwxr-sr-x    2 1000     1000        4.0K May 29 19:54 foo
/var/src $ exit
$ ls -lhn
drwxr-sr-x 3 1000 1000 4.0K May 29 21:54 data
$ ls -lhn data
drwxr-sr-x 2 1000 1000 4.0K May 29 21:54 foo
$ rmdir data/foo/

This is generally what I encounter on the projects I work on. But, as we will demonstrate, it brings a lot of constraints, because the daemon configuration may vary, or because some Docker images decide otherwise.
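A common way to avoid hardcoding 1000:1000 is to derive the value from the current host user. The helper below is a hypothetical sketch (host_user_spec is not part of Docker); it relies only on the standard id command, and its result only lines up with the host in the vanilla rootful case, where container IDs are host IDs.

```shell
#!/bin/sh
# Hypothetical helper: print the "UID:GID" pair of the invoking host user,
# to pass to `docker run -u` instead of a hardcoded 1000:1000.
# Only meaningful when the daemon does not remap IDs (vanilla rootful).
host_user_spec() {
    printf '%s:%s\n' "$(id -u)" "$(id -g)"
}

host_user_spec
```

With the function defined in your shell, the run above could become:

$ mkdir -p data
$ doas docker run --rm -it -u "$(host_user_spec)" -v ./data/:/var/src -w /var/src alpine sh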


Docker rootful with user-namespace remapping over "stacy":

$ cat /etc/sub[u,g]id
stacy:100000:65536
stacy:100000:65536
$ doas docker run --rm -it -v ./data/:/var/src -w /var/src alpine sh
/var/src # whoami
root
/var/src # id
uid=0(root) gid=0(root) groups=0(root),...
/var/src # mkdir foo
/var/src # ls -lhn
drwxr-sr-x    2 0        0           4.0K May 29 19:33 foo
/var/src # exit
$ ls -lhn
drwxr-sr-x 3 100000 100000 4.0K May 29 21:33 data
$ ls -lhn data/
drwxr-sr-x 2 100000 100000 4.0K May 29 21:33 foo
$ rmdir data/foo
rmdir: failed to remove 'data/foo': Permission denied

Here, we are still considered "root" from the container's point of view, but the directories we create are actually owned by UID and GID 100000 on the host. This means our regular user still can't remove them.

But it is worse than that: what happens if you try to give your IDs explicitly now?

$ doas docker run --rm -it -u 1000:1000 -v ./data/:/var/src -w /var/src alpine sh
/var/src $ mkdir foo
mkdir: can't create directory 'foo': Permission denied
/var/src $ exit
$ doas rm -rf data/
$ mkdir data
$ doas docker run --rm -it -u 1000:1000 -v ./data/:/var/src -w /var/src alpine sh
/var/src $ mkdir foo
mkdir: can't create directory 'foo': Permission denied

Hehe, surprised? Now that we are using the user namespace, UID 1000 from the container's point of view does not match your host user ID at all: it becomes 101000 on the host (100000 + 1000), which has no write permission on data/.

In that situation, any attempt to prepare the filesystem with your own IDs is expected to fail. I recommend avoiding bind-mounts completely when Docker is used this way.
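The remapping seen in this section boils down to an offset over the subordinate range. The post doesn't show the daemon configuration; this sketch assumes it remaps into the range of "stacy" (for instance with "userns-remap": "stacy" in /etc/docker/daemon.json), and remap_id is a hypothetical helper that only reproduces the arithmetic.

```shell
#!/bin/sh
# Sketch of the arithmetic behind Docker's userns-remap, assuming the daemon
# remaps into the subordinate range of "stacy".
# Container ID n becomes host ID start + n, for 0 <= n < count.

# remap_id SUBID_LINE CONTAINER_ID: print the host ID, or fail if unmapped.
remap_id() {
    line=$1
    id=$2
    # /etc/subuid and /etc/subgid lines read "name:start:count".
    start=${line#*:}
    start=${start%%:*}
    count=${line##*:}
    if [ "$id" -lt 0 ] || [ "$id" -ge "$count" ]; then
        echo "ID $id is outside the mapped range" >&2
        return 1
    fi
    echo $((start + id))
}

remap_id "stacy:100000:65536" 0     # container root
remap_id "stacy:100000:65536" 1000  # the -u 1000:1000 run
```

The two calls print 100000 and 101000: neither is our host ID 1000, which is why no preparation of data/ as "stacy" helps.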


Docker rootless:

$ docker run --rm -it -v ./data/:/var/src -w /var/src alpine sh
/var/src # whoami
root
/var/src # id
uid=0(root) gid=0(root) groups=0(root),...
/var/src # mkdir foo
/var/src # ls -lhn
drwxr-sr-x    2 0        0           4.0K May 29 19:40 foo
/var/src # exit
$ ls -lhn
drwxr-sr-x 3 1000 1000 4.0K May 29 21:40 data
$ ls -lhn data/
drwxr-sr-x 2 1000 1000 4.0K May 29 21:40 foo
$ rmdir data/foo

Okay, you might think we've won, right? We are still "root" from the container's point of view, but we are actually creating directories as 1000 on the host.

Rootless Docker runs every container in a user namespace, but only container "root" maps to our own host user; every other container ID maps into our subordinate range. So problems come when we use another user:

$ mkdir data
$ docker run --rm -it -u 1000:1000 -v ./data/:/var/src -w /var/src alpine sh
/var/src $ whoami
whoami: unknown uid 1000
/var/src $ id
uid=1000 gid=1000 groups=1000
/var/src $ mkdir foo
mkdir: can't create directory 'foo': Permission denied

Rootless brings the same constraints as rootful Docker with user-namespace remapping: it is now impossible to make container IDs match your host ones, except for the "root" user.
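The kernel exposes a process's mapping in /proc/<pid>/uid_map, one "inside-start outside-start length" triple per line, so the translation can be sketched as a lookup in that table. The two-line map below is an assumption of what rootless mode typically sets up for host UID 1000 with the range 100000:65536, not captured output, and to_host_id is a hypothetical helper.

```shell
#!/bin/sh
# Translate a container ID into a host ID using uid_map-style lines
# ("inside-start outside-start length"), as in /proc/<pid>/uid_map.
# Assumed map: rootless mode for host UID 1000, subordinate range 100000:65536.
rootless_map='0 1000 1
1 100000 65536'

# to_host_id MAP CONTAINER_ID: print the host ID, or fail if unmapped.
to_host_id() {
    printf '%s\n' "$1" | awk -v id="$2" '
        BEGIN { id += 0 }
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }'
}

to_host_id "$rootless_map" 0     # container root: our own user
to_host_id "$rootless_map" 1000  # the -u 1000:1000 run
```

The calls print 1000 and 100999. To check the real map of your own setup, print it from inside a container:

$ docker run --rm alpine cat /proc/self/uid_map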

Unfortunately, some Docker images simply refuse to run as "root". I've encountered this with OpenSearch images. We can also argue that php-fpm may delegate its process pool to some "www-data" user.

Fortunately, Podman gives us some tools to deal with rootless setups:


Podman rootless:

$ mkdir data
$ whoami
stacy
$ podman unshare
$ whoami
root
$ id
uid=0(root) gid=0(root) groups=0(root),...
$ ls -lh
drwxr-sr-x 2 root root 4.0K May 29 22:08 data
$ ls -lhn
drwxr-sr-x 3 0 0 4.0K May 29 22:09 data
$ chown 1000:1000 data
$ ls -lh
drwxr-sr-x 2 stacy stacy 4.0K May 29 22:08 data
$ ls -lhn
drwxr-sr-x 3 1000 1000 4.0K May 29 22:09 data
$ exit
exit
$ ls -lh
drwxr-sr-x 2 100999 100999 4.0K May 29 22:08 data
$ podman run --rm -it -u 1000:1000 -v ./data/:/var/src -w /var/src alpine sh
/var/src $ mkdir foo
/var/src # exit
$ ls -lhn data
drwxr-sr-x 2 100999 100999 4.0K May 29 22:09 foo
$ rmdir data/foo
rmdir: failed to remove 'data/foo': Permission denied

podman unshare works much like unshare(1): it runs a command, or a shell when none is given, inside the user namespace that Podman uses. While we are "unshared", we run chown 1000:1000 data. From the host filesystem's perspective, data/ now belongs to 100999:100999, because inside that namespace UID 0 is our own host user and UID n (from 1 upward) maps to 100000 + n - 1. This looks like a mess, but it is exactly what the container needs to write to this folder.

Put crudely, podman unshare gives the user root privileges over their own user namespace. The user is then free to change inode ownership and permissions to prepare the filesystem for the containers.
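What ls shows from inside podman unshare is the mapping table read in reverse: a host ID is looked up in the "outside" column of uid_map. This sketch assumes host UID 1000 with the subordinate range 100000:65536; to_ns_id is a hypothetical helper, and podman unshare cat /proc/self/uid_map prints the real map on your machine.

```shell
#!/bin/sh
# Reverse lookup: which namespace ID does a host ID appear as?
# This is what `ls -n` reports from inside `podman unshare`.
# Assumed map: rootless Podman for host UID 1000, subordinate range 100000:65536.
podman_map='0 1000 1
1 100000 65536'

# to_ns_id MAP HOST_ID: print the namespace ID, or fail if unmapped.
to_ns_id() {
    printf '%s\n' "$1" | awk -v id="$2" '
        BEGIN { id += 0 }
        id >= $2 && id < $2 + $3 { print $1 + (id - $2); found = 1; exit }
        END { if (!found) exit 1 }'
}

to_ns_id "$podman_map" 1000    # our host user, seen as root inside
to_ns_id "$podman_map" 100999  # the owner produced by `chown 1000:1000`
```

The calls print 0 and 1000. Since podman unshare accepts the command to run, the interactive session above can also be reduced to:

$ podman unshare chown 1000:1000 data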


What did we learn?

It is impossible to reliably map a container user ID to your host one when using bind-mounts: if the daemon is configured with user-namespace remapping, or if it runs rootless, the IDs will not match.
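Since the outcome depends on how the daemon is configured, a project script could at least detect the mode before choosing a -u value. This is a hedged sketch: id_mapping_mode is hypothetical, and it assumes that docker info --format '{{json .SecurityOptions}}' lists "name=rootless" for rootless daemons and "name=userns" for daemons with userns-remap enabled.

```shell
#!/bin/sh
# Classify a Docker daemon's ID-mapping mode from its security options.
# Assumption: rootless daemons report "name=rootless", userns-remap daemons
# report "name=userns"; anything else is treated as "no remapping".
id_mapping_mode() {
    case "$1" in
        *name=rootless*) echo rootless ;;
        *name=userns*)   echo userns-remap ;;
        *)               echo none ;;
    esac
}

id_mapping_mode '["name=seccomp,profile=builtin","name=rootless"]'
id_mapping_mode '["name=seccomp,profile=builtin"]'
```

On a real system the input would come from the daemon itself:

$ id_mapping_mode "$(docker info --format '{{json .SecurityOptions}}')"

Only the none result corresponds to the vanilla rootful case, where container IDs are host IDs.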

It is always easier when the containers run as "root", because more setups work that way. But since some images decide otherwise, user namespaces bring too many constraints on the filesystem.

Podman is the more appropriate solution for the rootless approach. First, because it has supported rootless mode for longer, so we can expect better support. But also because it provides podman unshare, so the user never needs root privileges to manipulate the filesystem either.


And to conclude on the introduction's passive-aggressive hot take, with more detail: Linux is a multi-user operating system. Docker is in no way a black box that can abstract away, or iron out, Linux file ownership and permissions. We have to deal with them, so let's do it in a correct and secure way.
