Date: 2022-11-30

The developers are working on composefs, a way to construct and use read-only images.

Composefs allows file data to be shared between images and provides dm-verity-like validation on read.

The duo aims to use a composefs mount as the lower directory in an overlay mount, with the upper directory being the container work dir.

Red Hat developers Giuseppe Scrivano and Alexander Larsson have introduced a new project, composefs. It is a new way to construct and use read-only images, which are used much like loop-back mounted squashfs images.

Opportunistic sharing

Composefs is a native Linux file system designed to help share filesystem contents and to ensure that the content is not modified. It has two fundamental features: it allows file data to be shared between images, both on disk and in the page cache, and it provides dm-verity-like validation on read.

A composefs image file contains all the directory and file metadata, plus references to the backing files by name. When a file in the image is read, the kernel actually reads the backing file. Since the backing files are content-addressed, the objects directory can be shared by multiple images, and any files that happen to have the same content are shared. The team refers to this as opportunistic sharing.
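The sharing described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not how composefs itself is implemented: a plain SHA-256 of the file content stands in for the fs-verity digest, the "image" is just a Python dictionary from path to object name, and the helper names (store_object, build_image) and file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def store_object(objects_dir: Path, data: bytes) -> str:
    """Store data under a name derived from its content and return that name."""
    # Assumption: plain SHA-256 stands in for the fs-verity digest.
    name = hashlib.sha256(data).hexdigest()
    path = objects_dir / name
    if not path.exists():  # identical content is written only once
        path.write_bytes(data)
    return name


def build_image(objects_dir: Path, files: Dict[str, bytes]) -> Dict[str, str]:
    """Return a toy image: metadata mapping each path to its backing object."""
    return {path: store_object(objects_dir, data) for path, data in files.items()}


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp)
    # Two independent images that happen to contain one identical file.
    image_a = build_image(objects, {"/etc/os-release": b"ID=example\n",
                                    "/usr/bin/tool": b"tool-v1"})
    image_b = build_image(objects, {"/etc/os-release": b"ID=example\n",
                                    "/usr/bin/tool": b"tool-v2"})
    shares_backing_file = image_a["/etc/os-release"] == image_b["/etc/os-release"]
    total_objects = sum(1 for p in objects.iterdir() if p.is_file())

print(shares_backing_file, total_objects)  # prints: True 3
```

Four files across two images end up as three objects, because the identical file is stored once and referenced from both images without either image knowing about the other.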

For validation, the object files have fs-verity enabled and are named by their fs-verity digest. The generated filesystem image can contain the expected digest for each backing file. If the filesystem is mounted with the verity_check option, open will fail when the backing file's digest is incorrect. If the open succeeds, any later on-disk changes to the file will be detected by fs-verity. This uses the existing fs-verity functionality to protect against changes in file contents, while adding on top of it protection against changes in filesystem metadata and structure. Alexander Larsson, Senior Principal Software Engineer at Red Hat, said:

« So, why do we want this? There are two initial user cases. First of all we want to use the opportunistic sharing for podman container layers. The idea is to use a composefs mount as the lower directory in an overlay mount, with the upper directory being the container work dir. This will allow automatic file-level disk and page-cache sharing between any two images, independent of details like the permissions and timestamps of the files and the origin of the images. Secondly we are interested in using the verification aspects of composefs in the ostree project. Ostree already uses a content-addressed object store, but it is currently referenced to by hardlink farms. The object store and the trees that reference it are signed and verified at download time, but there is no runtime verification. If we replace the hardlink farm with a composefs image that points into the existing object store we can use the verification to implement runtime verification. In fact, the tooling to create composefs images is fully reproducible, so all we need is to add the fs-verity digest of the composefs image into the ostree commit metadata. Then the image can be reconstructed from the ostree commit, generating a composefs image with the same fs-verity digest. These are the use cases we’re currently interested in, but there seems to be a wealth of other possible uses. For example, many systems use loopback mounts for images (like lxc or snap), and these could take advantage of the opportunistic sharing. We’ve also talked about using fuse to implement a local cache for the backing files. I.e. you would have a second basedir be a fuse filesystem, and on lookup failure in the first basedir the fuse one triggers a download which is also saved in the first dir for later lookups. There are many interesting possibilities here. »

User space tools

The directory tools/ contains some userspace tools to create the binary blob to pass to the client. They are all experimental and lack documentation.

mkcomposefs: creates a composefs image given a directory pathname. Can also compute digests and create a content store directory.

writer-json: converts a CRFS metadata file to the binary blob.

dump: prints the content of the binary blob.

ostree-convert-commit.py: converts an OSTree commit into a CRFS config file that writer-json can use.

Kernel module

How to build:

make -C $KERNEL_SOURCE modules M=$PWD && make -C $KERNEL_SOURCE modules_install M=$PWD
insmod /lib/modules/$(uname -r)/extra/composefs.ko

Once it is loaded, it can be used as:

mount /path/to/blob -t composefs -o basedir=$BASE_DIR /mnt
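For scripted use, the mount line above can be assembled programmatically. The following is a minimal sketch: the helper name composefs_mount_argv and the example paths are hypothetical, and the only assumption about the command is that extra options are joined with commas after -o, as with other mount invocations. It only builds the argument list and does not mount anything.

```python
import shlex
from typing import Dict, List, Optional


def composefs_mount_argv(blob: str, mountpoint: str, basedir: str,
                         extra_options: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argument list for the mount command; does not execute it."""
    options = [f"basedir={basedir}"]
    # Further options (see "Mount options") are appended as key=value pairs.
    for key, value in (extra_options or {}).items():
        options.append(f"{key}={value}")
    return ["mount", blob, "-t", "composefs", "-o", ",".join(options), mountpoint]


# Hypothetical paths, for illustration only.
argv = composefs_mount_argv("/path/to/blob", "/mnt", "/path/to/objects",
                            {"verity_check": "1"})
print(shlex.join(argv))
```

Passing the resulting list to subprocess.run(argv, check=True) would perform the mount, which requires root privileges and the loaded kernel module.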

Mount options:

basedir: the directory to use as a base when resolving relative content paths.

verity_check=0,1,2: when to verify the backing file's fs-verity digest: 0 = never, 1 = if specified in the image, 2 = always, and require it in the image.

digest: an fs-verity SHA-256 digest that the image file must match. If set, verity_check defaults to 2.
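The interaction between these options can be modelled in a few lines. This is a sketch of the rules as worded above, not the kernel module's code: the function names are hypothetical, digests are treated as opaque strings, and the default for verity_check when no digest option is given is not stated here, so the model returns None in that case.

```python
from typing import Optional


def effective_verity_check(verity_check: Optional[int],
                           digest: Optional[str]) -> Optional[int]:
    """Resolve the verity_check level from the mount options."""
    if verity_check is not None:
        if verity_check not in (0, 1, 2):
            raise ValueError("verity_check must be 0, 1 or 2")
        return verity_check
    # A digest option makes verity_check default to 2; otherwise the
    # default is not stated in the text, so it is left unknown here.
    return 2 if digest is not None else None


def may_open(level: int, expected: Optional[str], actual: Optional[str]) -> bool:
    """Decide whether opening a backing file should succeed.

    expected: digest recorded in the image, or None if the image has none.
    actual:   fs-verity digest of the backing file, or None if it has none.
    """
    if level == 0:
        return True  # never verify
    if expected is None:
        # Level 1 verifies only when the image specifies a digest;
        # level 2 requires the image to specify one.
        return level == 1
    return actual == expected


print(may_open(1, None, None), may_open(2, None, "abc"), may_open(2, "abc", "abc"))
# prints: True False True
```

The model makes the difference between levels 1 and 2 explicit: both reject a mismatching digest, but only level 2 rejects a file for which the image records no digest at all.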