Set up CVMFS alien cache on NFS#1839

Closed
kysrpex wants to merge 10 commits into usegalaxy-eu:master from kysrpex:cvmfs_cache

Conversation

@kysrpex (Contributor) commented Jan 22, 2026

Configure the CVMFS client to use a tiered two-level cache, with the regular CVMFS disk cache as upper level and an alien cache on NFS as the lower level.

To retrieve a chunk, the CVMFS client will first attempt to find it in the disk cache; on a cache miss, it will look for it in the NFS cache. The CVMFS clients are responsible for filling the NFS cache (that is, to mirror a repository completely into the cache one has to walk /cvmfs/singularity.galaxyproject.org/).

> The alien cache directory [...] can be located anywhere including cluster and network file systems. If configured, all data chunks are stored there. CernVM-FS ensures atomic access to the cache directory. It is safe to have the alien directory shared by multiple CernVM-FS processes, and it is safe to unlink files from the alien cache directory anytime. The contents of files, however, must not be touched by third-party programs.

Requires galaxyproject/ansible-cvmfs#85 and re-syncing our fork of galaxyproject.cvmfs.
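Pieced together from the diff hunks reviewed in this thread, the resulting client configuration would look roughly like this. This is a sketch, not the authoritative file: the CVMFS_CACHE_cvmfs_LOWER=nfs line is inferred from the UPPER/LOWER pairing and does not appear in the excerpts here.

```shell
# /etc/cvmfs/default.local (sketch; values as discussed in this PR)
CVMFS_CACHE_PRIMARY=cvmfs

# Tiered cache: regular disk cache on top, alien cache on NFS below.
CVMFS_CACHE_cvmfs_TYPE=tiered
CVMFS_CACHE_cvmfs_UPPER=disk
CVMFS_CACHE_cvmfs_LOWER=nfs          # inferred, not shown in the excerpts
CVMFS_CACHE_cvmfs_LOWER_READONLY=no  # clients write misses into the NFS cache

# Upper level: local posix disk cache with a quota.
CVMFS_CACHE_disk_TYPE=posix
CVMFS_CACHE_disk_QUOTA_LIMIT=50000

# Lower level: unmanaged alien cache on NFS (no quota, not shared).
CVMFS_CACHE_nfs_TYPE=posix
CVMFS_CACHE_nfs_ALIEN=/data/db/cvmfs_cache/
CVMFS_CACHE_nfs_SHARED=no
CVMFS_CACHE_nfs_QUOTA_LIMIT=-1
```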

@kysrpex kysrpex self-assigned this Jan 22, 2026
@kysrpex (Contributor, Author) commented Jan 22, 2026

This PR is only meant to show how it's done (and it was used to test this configuration). Most likely it will be closed in favor of a new PR in galaxyproject/ansible-cvmfs and another new (and shorter) PR in this repository.

@bgruening (Member) commented:

The changes do not look very complex; we could also deploy it, collect some experience, and then push it upstream, if you think that's easier.

@kysrpex (Contributor, Author) commented Jan 22, 2026

Unfortunately I noticed in the meantime that it's not ready for deployment.

While mount -t cvmfs data.galaxyproject.org /mnt works flawlessly, mounting via autofs yields Jan 22 16:22:40 cvmfs-client-1 cvmfs2[21981]: (data.galaxyproject.org) Failed to setup posix cache 'nfs' in /data/db/cvmfs_cache/: Permission denied (9 - cache directory/plugin problem).

I have not yet figured out what triggers the permission error when mounting via autofs.

@bgruening (Member) commented:

does mount -t cvmfs data.galaxyproject.org /data/db/cvmfs_cache/ work?

Switch from the `usegalaxy_eu.cvmfs_cache` role to a version of the `galaxyproject.cvmfs` role supporting the configuration of arbitrary CVMFS parameters.
@kysrpex kysrpex marked this pull request as ready for review January 29, 2026 15:15
CVMFS_CACHE_disk_QUOTA_LIMIT: 50000

CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/db/cvmfs_cache/
@kysrpex (Contributor, Author) commented:
I pre-created an empty cache there using cvmfs2 __MK_ALIEN_CACHE__ /data/db/cvmfs_cache/ $(id -u cvmfs) $(id -g cvmfs), but we might want to use A400, see this comment.

@kysrpex (Contributor, Author) commented:
I removed this cache using rm -rf /data/db/cvmfs_cache/.

CVMFS_CACHE_cvmfs_LOWER_READONLY: no

CVMFS_CACHE_disk_TYPE: posix
CVMFS_CACHE_disk_QUOTA_LIMIT: 50000
@kysrpex (Contributor, Author) commented:
I think we could also afford to use ~100GB rather than 50GB given the new headnodes have quite large disks (although I am not sure the images have to be read at all).

Let's maybe leave it as it is and check how much the cache grows.

A contributor commented:
sn09 should not use much cvmfs right?

A member commented:
This is what I also don't understand. Is the headnode writing to the cache? I think it should not.

@kysrpex (Contributor, Author) commented:
> This is what I also don't understand. Is the headnode writing to the cache? I think it should not.

I am unsure about it; it certainly doesn't look like the headnode is reading whole files from CVMFS at all.

root@sn09:~$ strace -f -t -e trace=file -p 1963558 2>&1 | grep "/cvmfs"
[pid 1966037] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/",  <unfinished ...>
[pid 1966037] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966039] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966039] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966038] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966038] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
...
[pid 1966039] 14:16:05 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966038] 14:17:01 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966038] 14:17:02 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966037] 14:17:15 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966037] 14:17:15 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>

Nevertheless, we still need some cache space for the FS metadata (which is relatively large), but a few GB should suffice. The default is 4GB; let's maybe make it 5GB to give the system slightly more room, as at the moment the headnode is using ~3GiB, although I guess it's not "thrashing".

root@sn09:/home/centos$ du -h /var/lib/cvmfs/ | tail -n1
3.0G	/var/lib/cvmfs/

@gsaudade99 (Contributor) left a comment:
In general this looks good to me. I understand that most of the variables are just there for documentation purposes.

My opinion is to let the clients populate the cache on the go, even if it means slower first starts. We have super fast NICs and IO, so the impact should be barely noticeable...
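If pre-warming were ever wanted instead, walking the repository is enough: reading every file makes the CVMFS client pull each chunk through the tiered cache, which also populates the alien cache on NFS. A hypothetical sketch (`warm_cvmfs` is made up for illustration and is not part of this PR):

```shell
# Hypothetical warm-up helper: read every file under a tree so the CVMFS
# client fetches each chunk, filling the tiered cache (incl. the NFS level).
# Usage: warm_cvmfs /cvmfs/singularity.galaxyproject.org/
warm_cvmfs() {
    # -r: skip xargs on empty input; -P 4: read with some parallelism
    find "$1" -type f -print0 | xargs -0 -r -n 64 -P 4 cat > /dev/null
}
```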


CVMFS_CACHE_PRIMARY: cvmfs

CVMFS_CACHE_cvmfs_TYPE: tiered
CVMFS_CACHE_cvmfs_UPPER: disk
A contributor commented:
at this point sn09 could even use memory we have 300G of unused RAM

@mira-miracoli (Contributor) commented Feb 2, 2026

I think this parameter only sets the name of the "upper" cache manager instance:
https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#tiered-cache
From how I understand the docs, without a plugin there is only posix.
However, they recommend setting CVMFS_MEMCACHE_SIZE=256 (MB). Maybe we can also do that (or increase it even more).

A contributor commented:

They have an example how to use memory here

@kysrpex (Contributor, Author) commented Feb 3, 2026

Given this comment, since at the moment there seem to be no issues every time a container job is launched, let's leave the memory cache for a later stage if the need arises.

If CVMFS_MEMCACHE_SIZE is "independent" of the tiered cache system, then let's increase it. Although I am not sure it will have an effect even if "it works", since they only recommend it for pure NFS deployments.

CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/db/cvmfs_cache/
CVMFS_CACHE_nfs_SHARED: no
CVMFS_CACHE_nfs_QUOTA_LIMIT: -1
A contributor commented:

If I understand correctly, we need to provide a share that is >= all the cvmfs repos that we mount?

@kysrpex (Contributor, Author) commented:

That's only needed if CVMFS_CACHE_nfs_SHARED is set to yes, i.e. if the nfs cache manager instance belonged to the shared cache.

However, an alien cache cannot join the shared cache (see docs),

> Since the alien cache is unmanaged, there is no automatic quota management provided by CernVM-FS; the alien cache directory is ever-growing. The CVMFS_ALIEN_CACHE requires CVMFS_QUOTA_LIMIT=-1 and CVMFS_SHARED_CACHE=no.

probably because CVMFS lacks an adequate synchronization mechanism to manage the quota in a distributed system. That's also why a script to manage the size of this cache would be needed.
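A pruning script of the kind mentioned here could be sketched as follows. This is hypothetical and not part of this PR: `prune_alien_cache` and its arguments are made up for illustration, and it leans on the docs' guarantee that unlinking files from the alien cache directory at any time is safe.

```shell
# Hypothetical pruning sketch: CVMFS does no quota management on an alien
# cache, so unlink the least-recently-accessed chunk files until the
# directory drops below a size limit. (GNU find/du assumed.)
# Usage: prune_alien_cache /data/db/cvmfs_cache 500000   # limit in KiB
prune_alien_cache() {
    cache_dir=$1
    limit_kb=$2
    # List files oldest-atime first: find prints "atime path" per line.
    find "$cache_dir" -type f -printf '%A@ %p\n' | sort -n | cut -d' ' -f2- |
    while IFS= read -r f; do
        # Stop as soon as the cache is within the limit.
        [ "$(du -sk "$cache_dir" | cut -f1)" -le "$limit_kb" ] && break
        rm -f -- "$f"
    done
}
```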

@mira-miracoli (Contributor) left a comment:

I think we can try once we have the share, looks cool 🚀

The setting may either slightly improve performance or have no noticeable effect. But it won't hurt.

> The default settings in CernVM-FS are tailored to the normal, non-NFS use case. For decent performance in the NFS deployment, the amount of memory given to the metadata cache should be increased. By default, this is 16M. It can be increased, for instance, to 256M by setting `CVMFS_MEMCACHE_SIZE` to 256. Furthermore, the maximum number of download retries should be increased to at least 2.

See https://cvmfs.readthedocs.io/en/2.13/cpt-configure.html#tuning.
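As a config fragment, the tuning from the quoted docs would look roughly like this (a sketch; CVMFS_MAX_RETRIES is the parameter the docs' NFS tuning section refers to for download retries):

```shell
# /etc/cvmfs/default.local — NFS-tuning sketch from the quoted docs
CVMFS_MEMCACHE_SIZE=256   # metadata memory cache in MB (default is 16)
CVMFS_MAX_RETRIES=2       # download retries; docs recommend at least 2
```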
It doesn't look like the headnode is reading whole files from CVMFS at all. Nevertheless, some cache space is needed for the FS metadata (which is relatively large), but a few GB should suffice. The default is 4GB; make it 5GB to give the system slightly more room, as at the moment the headnode is using ~3GiB, although it does not seem to be "thrashing".
@kysrpex (Contributor, Author) commented Feb 3, 2026

Let's not merge until the storage situation is clarified (see https://github.com/usegalaxy-eu/issues/issues/157). A PR usegalaxy-eu/vgcn-infrastructure-playbook#58 for vgcn-infrastructure-playbook has been created too.

Use the new cvmfs08 share to store the CVMFS alien cache on NFS.
CVMFS_CACHE_disk_QUOTA_LIMIT: 5000

CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/cvmfs08/cache/
@kysrpex (Contributor, Author) commented:
I noticed this should be rewritten in terms of variables from the mounts repository.

@kysrpex (Contributor, Author) commented Feb 10, 2026

We cannot merge this PR at the moment because the uid of the cvmfs user differs on the compute nodes.

@kysrpex kysrpex closed this Feb 10, 2026