Set up CVMFS alien cache on NFS (#1839)
Conversation
Configure the CVMFS client to use a tiered two-level cache, with the regular CVMFS disk cache as the upper level and an alien cache on NFS as the lower level. To retrieve a chunk, the CVMFS client first attempts to find it in the disk cache; on a cache miss, it looks for it in the NFS cache. The CVMFS clients are responsible for filling the NFS cache.

> The alien cache directory [...] can be located anywhere including cluster and network file systems. If configured, all data chunks are stored there. CernVM-FS ensures atomic access to the cache directory. It is safe to have the alien directory shared by multiple CernVM-FS processes, and it is safe to unlink files from the alien cache directory anytime. The contents of files, however, must not be touched by third-party programs.
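For orientation, a minimal sketch of what the resulting client configuration could look like in `/etc/cvmfs/default.local`. The parameter names are documented CVMFS client settings; the instance names (`cvmfs`, `disk`, `nfs`), paths, and quota values follow the snippets discussed in this PR, but the exact values here are illustrative, not the final deployed configuration:

```shell
# /etc/cvmfs/default.local -- tiered cache sketch (values illustrative)

CVMFS_CACHE_PRIMARY=cvmfs              # use the tiered instance as primary cache

# Tiered instance: local disk cache on top, alien cache on NFS below.
CVMFS_CACHE_cvmfs_TYPE=tiered
CVMFS_CACHE_cvmfs_UPPER=disk
CVMFS_CACHE_cvmfs_LOWER=nfs
CVMFS_CACHE_cvmfs_LOWER_READONLY=no    # clients also fill the NFS cache

# Upper level: regular managed posix disk cache.
CVMFS_CACHE_disk_TYPE=posix
CVMFS_CACHE_disk_QUOTA_LIMIT=5000      # MB

# Lower level: unmanaged alien cache on the NFS share.
CVMFS_CACHE_nfs_TYPE=posix
CVMFS_CACHE_nfs_ALIEN=/data/cvmfs08/cache/
CVMFS_CACHE_nfs_SHARED=no              # an alien cache cannot join the shared cache
CVMFS_CACHE_nfs_QUOTA_LIMIT=-1         # no quota management for alien caches
```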
This PR is only meant to show how it's done (and it was used to test this configuration). Most likely it will be closed in favor of a new PR in galaxyproject/ansible-cvmfs and another new (and shorter) PR in this repository.
The changes do not look very complex; we could also deploy it, collect some experience, and then push it upstream, if you think that's easier.
Unfortunately, I noticed in the meantime that it's not ready for deployment: I could not yet figure out what triggers the permission error when mounting via autofs.
Switch from the `usegalaxy_eu.cvmfs_cache` role to a version of the `galaxyproject.cvmfs` role supporting the configuration of arbitrary CVMFS parameters.
group_vars/cvmfs_clients.yml (outdated):

```yaml
CVMFS_CACHE_disk_QUOTA_LIMIT: 50000
CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/db/cvmfs_cache/
```
I pre-created an empty cache there using `cvmfs2 __MK_ALIEN_CACHE__ /data/db/cvmfs_cache/ $(id -u cvmfs) $(id -g cvmfs)`, but we might want to use A400; see this comment.
I removed this cache using `rm -rf /data/db/cvmfs_cache/`.
group_vars/cvmfs_clients.yml (outdated):

```yaml
CVMFS_CACHE_cvmfs_LOWER_READONLY: no
CVMFS_CACHE_disk_TYPE: posix
CVMFS_CACHE_disk_QUOTA_LIMIT: 50000
```
I think we could also afford to use ~100 GB rather than 50 GB, given that the new headnodes have quite large disks (although I am not sure the images have to be read at all).
Let's maybe leave it as it is and check how much the cache grows.
sn09 should not use much cvmfs, right?
This is what I also don't understand. Is the headnode writing to the cache? I think it should not.
> This is what I also don't understand. Is the headnode writing to the cache? I think it should not.

I am unsure about it; it certainly doesn't look like the headnode is reading whole files at all from CVMFS.
```
root@sn09:~$ strace -f -t -e trace=file -p 1963558 2>&1 | grep "/cvmfs"
[pid 1966037] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", <unfinished ...>
[pid 1966037] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966039] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966039] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966038] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966038] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
...
[pid 1966039] 14:16:05 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966038] 14:17:01 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966038] 14:17:02 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966037] 14:17:15 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966037] 14:17:15 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
```

Nevertheless, we still need some cache space for the FS metadata (which is relatively large), but a few GB should suffice. The default is 4 GB; let's maybe make it 5 GB to give the system slightly more room, as at the moment the headnode is using ~3 GiB, although I guess it's not thrashing.
```
root@sn09:/home/centos$ du -h /var/lib/cvmfs/ | tail -n1
3.0G    /var/lib/cvmfs/
```
gsaudade99
left a comment
In general it looks good to me. I understand that most of the variables are just there for documentation purposes.
My opinion is to let the clients populate the cache on the go, even if it means a slower first start. We have super fast NICs and IO, so the impact should be barely noticeable...
```yaml
CVMFS_CACHE_PRIMARY: cvmfs
CVMFS_CACHE_cvmfs_TYPE: tiered
CVMFS_CACHE_cvmfs_UPPER: disk
```
At this point sn09 could even use memory; we have 300 G of unused RAM.
I think this parameter only sets the name of the "upper" cache manager instance:
https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#tiered-cache
From how I understand the docs, without a plugin there is only posix.
However, they recommend setting `CVMFS_MEMCACHE_SIZE=256` (MB). Maybe we can also do that (or increase it even more).
Given this comment, if at the moment there seem to be no issues every time a container job is launched, let's leave the memory cache for a later stage if the need arises.
If `CVMFS_MEMCACHE_SIZE` is "independent" of the tiered cache system, then let's increase it, although I am not sure it will have an effect even if it "works", since the docs only recommend it for pure NFS deployments.
```yaml
CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/db/cvmfs_cache/
CVMFS_CACHE_nfs_SHARED: no
CVMFS_CACHE_nfs_QUOTA_LIMIT: -1
```
If I understand correctly, we need to provide a share that is >= the size of all CVMFS repos that we mount?
That's only needed if `CVMFS_CACHE_nfs_SHARED` were set to yes, i.e. if the nfs cache manager instance belonged to the shared cache. However, an alien cache cannot join the shared cache (see docs):

> Since the alien cache is unmanaged, there is no automatic quota management provided by CernVM-FS; the alien cache directory is ever-growing. The CVMFS_ALIEN_CACHE requires `CVMFS_QUOTA_LIMIT=-1` and `CVMFS_SHARED_CACHE=no`.

This is probably because CVMFS lacks an adequate synchronization mechanism to manage the quota in a distributed system. That's also why a script to manage the size of this cache would be needed.
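Such a cleanup script could be quite simple, since the CVMFS docs state that it is safe to unlink files from the alien cache directory at any time (missing chunks are simply re-fetched). A minimal sketch, assuming bash and GNU `find`/`du`; the eviction policy (drop least-recently-accessed chunks first) and the example paths and limits are assumptions, not part of this PR:

```shell
#!/usr/bin/env bash
set -euo pipefail

# prune_cache DIR LIMIT_KB
# Removes least-recently-accessed files from DIR until `du -sk DIR`
# reports at most LIMIT_KB kilobytes.
prune_cache() {
  local dir="$1" limit_kb="$2" used_kb
  used_kb=$(du -sk "$dir" | cut -f1)
  # GNU find: print "atime_epoch size_kb path", oldest access first.
  find "$dir" -type f -printf '%A@ %k %p\n' | sort -n |
  while read -r _atime kb path; do
    [ "$used_kb" -le "$limit_kb" ] && break
    rm -f -- "$path"
    used_kb=$((used_kb - kb))
  done
}

# Example invocation (hypothetical path and a ~50 GiB limit):
# prune_cache /data/cvmfs08/cache 52428800
```

This could run periodically from cron on a single host; running it concurrently on several hosts would be safe (deleting a chunk only forces a re-fetch) but wasteful.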
The first attempt 673f620 was incomplete.
mira-miracoli
left a comment
I think we can try once we have the share, looks cool 🚀
The setting may either slightly improve performance or have no noticeable effect, but it won't hurt.

> The default settings in CernVM-FS are tailored to the normal, non-NFS use case. For decent performance in the NFS deployment, the amount of memory given to the metadata cache should be increased. By default, this is 16M. It can be increased, for instance, to 256M by setting `CVMFS_MEMCACHE_SIZE` to 256. Furthermore, the maximum number of download retries should be increased to at least 2.

See https://cvmfs.readthedocs.io/en/2.13/cpt-configure.html#tuning.
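Following the quoted recommendation, the tuning could be sketched in the client configuration roughly as follows. Both keys are documented CVMFS client parameters; placing them in `/etc/cvmfs/default.local` is an assumption about where we would set them:

```shell
# /etc/cvmfs/default.local (excerpt) -- NFS tuning sketch
CVMFS_MEMCACHE_SIZE=256   # metadata memory cache in MB (default is 16)
CVMFS_MAX_RETRIES=2       # retry failed downloads at least twice
```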
It doesn't look like the headnode is reading whole files at all from CVMFS. Nevertheless, some cache space is needed for the FS metadata (which is relatively large), but a few GB should suffice. The default is 4 GB; make it 5 GB to give the system slightly more room, as at the moment the headnode is using ~3 GiB, although it does not seem to be thrashing.
Let's not merge until the storage situation is clarified (see https://github.com/usegalaxy-eu/issues/issues/157). A PR usegalaxy-eu/vgcn-infrastructure-playbook#58 for vgcn-infrastructure-playbook has been created too.
Use the new cvmfs08 share to store the CVMFS alien cache on NFS.
```yaml
CVMFS_CACHE_disk_QUOTA_LIMIT: 5000
CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/cvmfs08/cache/
```
I noticed this should be rewritten in terms of variables from the mounts repository.
We cannot merge this PR at the moment because the uid for the
Configure the CVMFS client to use a tiered two-level cache, with the regular CVMFS disk cache as the upper level and an alien cache on NFS as the lower level.
To retrieve a chunk, the CVMFS client first attempts to find it in the disk cache; on a cache miss, it looks for it in the NFS cache. The CVMFS clients are responsible for filling the NFS cache (that means that to completely dump CVMFS one has to walk /cvmfs/singularity.galaxyproject.org/).

Requires galaxyproject/ansible-cvmfs#85 and re-syncing our fork of galaxyproject.cvmfs.