Set up CVMFS alien cache on NFS (#1839)
Conversation
Configure the CVMFS client to use a tiered two-level cache, with the regular CVMFS disk cache as the upper level and an alien cache on NFS as the lower level. To retrieve a chunk, the CVMFS client first attempts to find it in the disk cache; on a cache miss, it looks for it in the NFS cache. The CVMFS clients are responsible for filling the NFS cache.

> The alien cache directory [...] can be located anywhere including cluster and network file systems. If configured, all data chunks are stored there. CernVM-FS ensures atomic access to the cache directory. It is safe to have the alien directory shared by multiple CernVM-FS processes, and it is safe to unlink files from the alien cache directory anytime. The contents of files, however, must not be touched by third-party programs.
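For orientation, a minimal sketch of what the resulting client configuration could look like in `/etc/cvmfs/default.local`. The parameter names are documented CVMFS client settings; the instance names (`cvmfs`, `disk`, `nfs`), paths, and quota values follow the snippets discussed in this PR, but the exact values here are illustrative, not the final deployed configuration:

```shell
# /etc/cvmfs/default.local -- tiered cache sketch (values illustrative)

CVMFS_CACHE_PRIMARY=cvmfs              # use the tiered instance as primary cache

# Tiered instance: local disk cache on top, alien cache on NFS below.
CVMFS_CACHE_cvmfs_TYPE=tiered
CVMFS_CACHE_cvmfs_UPPER=disk
CVMFS_CACHE_cvmfs_LOWER=nfs
CVMFS_CACHE_cvmfs_LOWER_READONLY=no    # clients also fill the NFS cache

# Upper level: regular managed posix disk cache.
CVMFS_CACHE_disk_TYPE=posix
CVMFS_CACHE_disk_QUOTA_LIMIT=5000      # MB

# Lower level: unmanaged alien cache on the NFS share.
CVMFS_CACHE_nfs_TYPE=posix
CVMFS_CACHE_nfs_ALIEN=/data/cvmfs08/cache/
CVMFS_CACHE_nfs_SHARED=no              # an alien cache cannot join the shared cache
CVMFS_CACHE_nfs_QUOTA_LIMIT=-1         # no quota management for alien caches
```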
This PR is only meant to show how it's done (and it was used to test this configuration). Most likely it will be closed in favor of a new PR in galaxyproject/ansible-cvmfs and another new (and shorter) PR in this repository.
The changes do not look very complex; we could also deploy it, collect some experience, and then push it upstream, if you think that's easier.
Unfortunately, I noticed in the meantime that it's not ready for deployment: I could not yet figure out what triggers the permission error when mounting via autofs.
Switch from the `usegalaxy_eu.cvmfs_cache` role to a version of the `galaxyproject.cvmfs` role supporting the configuration of arbitrary CVMFS parameters.
group_vars/cvmfs_clients.yml (outdated):

```yaml
CVMFS_CACHE_disk_QUOTA_LIMIT: 50000
CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/db/cvmfs_cache/
```
I pre-created an empty cache there using `cvmfs2 __MK_ALIEN_CACHE__ /data/db/cvmfs_cache/ $(id -u cvmfs) $(id -g cvmfs)`, but we might want to use A400; see this comment.
I removed this cache using `rm -rf /data/db/cvmfs_cache/`.
group_vars/cvmfs_clients.yml (outdated):

```yaml
CVMFS_CACHE_cvmfs_LOWER_READONLY: no
CVMFS_CACHE_disk_TYPE: posix
CVMFS_CACHE_disk_QUOTA_LIMIT: 50000
```
I think we could also afford to use ~100 GB rather than 50 GB, given that the new headnodes have quite large disks (although I am not sure the images have to be read at all).
Let's maybe leave it as it is and check how much the cache grows.
sn09 should not use much cvmfs, right?
This is what I also don't understand. Is the headnode writing to the cache? I think it should not.
> This is what I also don't understand. Is the headnode writing to the cache? I think it should not.

I am unsure about it; it certainly doesn't look like the headnode is reading whole files at all from CVMFS.
```
root@sn09:~$ strace -f -t -e trace=file -p 1963558 2>&1 | grep "/cvmfs"
[pid 1966037] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", <unfinished ...>
[pid 1966037] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966039] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966039] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966038] 13:53:42 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966038] 13:53:42 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
...
[pid 1966039] 14:16:05 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966038] 14:17:01 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966038] 14:17:02 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
[pid 1966037] 14:17:15 stat("/cvmfs/singularity.galaxyproject.org/all/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1966037] 14:17:15 openat(AT_FDCWD, "/cvmfs/singularity.galaxyproject.org/all/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY <unfinished ...>
```

Nevertheless, we still need some cache space for the FS metadata (which is relatively large), but a few GB should suffice. The default is 4 GB; let's maybe make it 5 GB to give the system slightly more room, as at the moment the headnode is using ~3 GiB, although I guess it's not thrashing.
```
root@sn09:/home/centos$ du -h /var/lib/cvmfs/ | tail -n1
3.0G    /var/lib/cvmfs/
```
gsaudade99
left a comment
In general it looks good to me. I understand that most of the variables are just there for documentation purposes.
My opinion is to let the clients populate the cache on the go, even if it means a slower first start. We have super fast NICs and IO, so the impact should be barely noticeable...
```yaml
CVMFS_CACHE_PRIMARY: cvmfs
CVMFS_CACHE_cvmfs_TYPE: tiered
CVMFS_CACHE_cvmfs_UPPER: disk
```
At this point sn09 could even use memory; we have 300 G of unused RAM.
I think this parameter only sets the name of the "upper" cache manager instance:
https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#tiered-cache
From how I understand the docs, without a plugin there is only posix.
However, they recommend setting `CVMFS_MEMCACHE_SIZE=256` (MB). Maybe we can also do that (or increase it even more).
Given this comment, if at the moment there seem to be no issues every time a container job is launched, let's leave the memory cache for a later stage if the need arises.
If `CVMFS_MEMCACHE_SIZE` is "independent" of the tiered cache system, then let's increase it, although I am not sure it will have an effect even if it "works", since the docs only recommend it for pure NFS deployments.
```yaml
CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/db/cvmfs_cache/
CVMFS_CACHE_nfs_SHARED: no
CVMFS_CACHE_nfs_QUOTA_LIMIT: -1
```
If I understand correctly, we need to provide a share that is >= the size of all CVMFS repos that we mount?
That's only needed if `CVMFS_CACHE_nfs_SHARED` were set to yes, i.e. if the nfs cache manager instance belonged to the shared cache. However, an alien cache cannot join the shared cache (see docs):

> Since the alien cache is unmanaged, there is no automatic quota management provided by CernVM-FS; the alien cache directory is ever-growing. The CVMFS_ALIEN_CACHE requires `CVMFS_QUOTA_LIMIT=-1` and `CVMFS_SHARED_CACHE=no`.

This is probably because CVMFS lacks an adequate synchronization mechanism to manage the quota in a distributed system. That's also why a script to manage the size of this cache would be needed.
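Such a cleanup script could be quite simple, since the CVMFS docs state that it is safe to unlink files from the alien cache directory at any time (missing chunks are simply re-fetched). A minimal sketch, assuming bash and GNU `find`/`du`; the eviction policy (drop least-recently-accessed chunks first) and the example paths and limits are assumptions, not part of this PR:

```shell
#!/usr/bin/env bash
set -euo pipefail

# prune_cache DIR LIMIT_KB
# Removes least-recently-accessed files from DIR until `du -sk DIR`
# reports at most LIMIT_KB kilobytes.
prune_cache() {
  local dir="$1" limit_kb="$2" used_kb
  used_kb=$(du -sk "$dir" | cut -f1)
  # GNU find: print "atime_epoch size_kb path", oldest access first.
  find "$dir" -type f -printf '%A@ %k %p\n' | sort -n |
  while read -r _atime kb path; do
    [ "$used_kb" -le "$limit_kb" ] && break
    rm -f -- "$path"
    used_kb=$((used_kb - kb))
  done
}

# Example invocation (hypothetical path and a ~50 GiB limit):
# prune_cache /data/cvmfs08/cache 52428800
```

This could run periodically from cron on a single host; running it concurrently on several hosts would be safe (deleting a chunk only forces a re-fetch) but wasteful.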
The first attempt 673f620 was incomplete.
mira-miracoli
left a comment
I think we can try once we have the share, looks cool 🚀
The setting may either slightly improve performance or have no noticeable effect, but it won't hurt.

> The default settings in CernVM-FS are tailored to the normal, non-NFS use case. For decent performance in the NFS deployment, the amount of memory given to the metadata cache should be increased. By default, this is 16M. It can be increased, for instance, to 256M by setting `CVMFS_MEMCACHE_SIZE` to 256. Furthermore, the maximum number of download retries should be increased to at least 2.

See https://cvmfs.readthedocs.io/en/2.13/cpt-configure.html#tuning.
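Following the quoted recommendation, the tuning could be sketched in the client configuration roughly as follows. Both keys are documented CVMFS client parameters; placing them in `/etc/cvmfs/default.local` is an assumption about where we would set them:

```shell
# /etc/cvmfs/default.local (excerpt) -- NFS tuning sketch
CVMFS_MEMCACHE_SIZE=256   # metadata memory cache in MB (default is 16)
CVMFS_MAX_RETRIES=2       # retry failed downloads at least twice
```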
It doesn't look like the headnode is reading whole files at all from CVMFS. Nevertheless, some cache space is needed for the FS metadata (which is relatively large), but a few GB should suffice. The default is 4 GB; make it 5 GB to give the system slightly more room, as at the moment the headnode is using ~3 GiB, although it does not seem to be thrashing.
Let's not merge until the storage situation is clarified (see https://github.com/usegalaxy-eu/issues/issues/157). A PR usegalaxy-eu/vgcn-infrastructure-playbook#58 for vgcn-infrastructure-playbook has been created too.
Use the new cvmfs08 share to store the CVMFS alien cache on NFS.
```yaml
CVMFS_CACHE_disk_QUOTA_LIMIT: 5000
CVMFS_CACHE_nfs_TYPE: posix
CVMFS_CACHE_nfs_ALIEN: /data/cvmfs08/cache/
```
I noticed this should be rewritten in terms of variables from the mounts repository.
We cannot merge this PR at the moment because the uid for the
Configure the CVMFS client to use a tiered two-level cache, with the regular CVMFS disk cache as the upper level and an alien cache on NFS as the lower level.
To retrieve a chunk, the CVMFS client first attempts to find it in the disk cache; on a cache miss, it looks for it in the NFS cache. The CVMFS clients are responsible for filling the NFS cache (that means that to completely dump CVMFS one has to walk /cvmfs/singularity.galaxyproject.org/).

Requires galaxyproject/ansible-cvmfs#85 and re-syncing our fork of galaxyproject.cvmfs.