Summary
When the two commands dnf install -y package-x and dnf clean all run concurrently, dnf install can fail with the signature [Errno 2] No such file or directory: '/var/cache/dnf/*/packages/*.rpm'. This happens when dnf clean all runs after dnf install has downloaded the rpms but before it tries to install them. Specifically, this block has gaps where dnf install does not hold any locks, which gives dnf clean all time to run.
A similar issue was reported via https://bugzilla.redhat.com/show_bug.cgi?id=1714706, but it was closed with the response: "Extending the critical section would lead to worse user experience (DNF would wait for other operations to finish more frequently)." While true, I strongly think this should be reconsidered, because asking every user to wrap their dnf install commands in retries or to build their own concurrency controls around dnf is cumbersome. The solutions proposed below would also minimize the impact on concurrency.
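To make the cost of the status quo concrete, this is roughly the wrapper each caller ends up writing. It is only a sketch: retry_cmd is a hypothetical helper name, not something dnf provides, and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of dnf): run a command up to ATTEMPTS times.
# Usage: retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    status=1
    while [ "$n" -le "$attempts" ]; do
        status=0
        "$@" || status=$?
        # Stop as soon as the command succeeds.
        [ "$status" -eq 0 ] && return 0
        echo "attempt $n/$attempts failed with status $status" >&2
        # Sleep only if another attempt will follow.
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    # Report the exit status of the last failed attempt.
    return "$status"
}

# Example call (run as root):
#   retry_cmd 3 5 dnf install -y package-x
```

A retry only hides the race; each failed attempt may still have to download the removed rpms again.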
Proposed Solution
The critical section here should always hold at least one lock. This could be done in either of two ways:
1) Extend the lifetime of download_lock.pid and rpmdb_lock.pid so that the critical section always holds at least one lock.
2) (Preferred) Add a new lock, gpg_check_lock.pid, which is acquired between download_lock.pid and rpmdb_lock.pid. This still allows concurrency between downloading rpms, checking rpm signatures, and installing rpms, while preventing dnf clean all from running.
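The hand-off in option 2 can be illustrated with a toy model. This is not dnf's locking code: directories created with mkdir stand in for the .pid lock files, the function names are hypothetical, and the model simply assumes that dnf clean all is refused while any of the three locks is held. The point it shows is the ordering: the next lock is acquired before the previous one is released, so there is never a gap.

```shell
#!/bin/sh
# Toy model only: directories stand in for dnf's *.pid lock files.
LOCK_ROOT=$(mktemp -d)

acquire() { mkdir "$LOCK_ROOT/$1"; }   # mkdir is atomic: it fails if the lock is held
release() { rmdir "$LOCK_ROOT/$1"; }

# Stand-in for `dnf clean all`: allowed only while no lock is held (assumption).
clean_state() {
    if [ -z "$(ls -A "$LOCK_ROOT")" ]; then
        echo "clean allowed"
    else
        echo "clean blocked"
    fi
}

install_with_handoff() {
    acquire download_lock
    echo "download: $(clean_state)"
    # Take the next lock BEFORE dropping the previous one.
    acquire gpg_check_lock
    release download_lock
    echo "gpg check: $(clean_state)"
    acquire rpmdb_lock
    release gpg_check_lock
    echo "install: $(clean_state)"
    release rpmdb_lock
    echo "done: $(clean_state)"
}

install_with_handoff
rmdir "$LOCK_ROOT"
```

In this model the first three phases report "clean blocked" and only the final line reports "clean allowed". Whether a real implementation should wait or fail when a lock is contended is not modeled here.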
How to Reproduce
Shell 1: run while true; do dnf clean all; done
Shell 2: run a large transaction (100+ packages; the more packages, the more likely it is to reproduce): dnf install -y nodejs-devel java-21-openjdk-devel rust golang ruby perl gcc g++
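For convenience, the two shells can be folded into one script. The sketch below is an assumption about how one might do that, not something I have run in CI; run_race is a hypothetical helper that loops the first command in the background while the second command runs once.

```shell
#!/bin/sh
# Hypothetical helper: loop CLEAN_CMD in the background while INSTALL_CMD runs once.
# Returns the exit status of INSTALL_CMD.
run_race() {
    clean_cmd=$1
    install_cmd=$2
    # Background loop, equivalent to "Shell 1".
    ( while :; do sh -c "$clean_cmd" >/dev/null 2>&1; done ) &
    loop_pid=$!
    # Foreground transaction, equivalent to "Shell 2".
    status=0
    sh -c "$install_cmd" || status=$?
    # Stop the background loop and reap it.
    kill "$loop_pid" 2>/dev/null || true
    wait "$loop_pid" 2>/dev/null || true
    return "$status"
}

# Example call (run as root):
#   run_race 'dnf clean all' \
#     'dnf install -y nodejs-devel java-21-openjdk-devel rust golang ruby perl gcc g++'
```

A non-zero return together with the [Errno 2] message in the output indicates the race was hit; since it is timing dependent, several runs may be needed.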
Versions Used
DNF version:
[root@1140c318a820 /]# dnf --version
4.22.0
Installed: dnf-0:4.22.0-1.fc40.noarch at Sun Jan 19 05:47:54 2025
Built : Fedora Project at Mon Dec 2 23:49:06 2024
Installed: rpm-0:4.19.1.1-1.fc40.aarch64 at Sun Jan 19 05:47:54 2025
Built : Fedora Project at Wed Feb 7 15:57:52 2024
[root@1140c318a820 /]# cat /etc/fedora-release
Fedora release 40 (Forty)
Full error:
(183/186): ruby-libs-3.3.8-19.fc40.aarch64.rpm 5.4 MB/s | 4.0 MB 00:00
(184/186): systemtap-sdt-devel-5.2-1.fc40.aarch64.rpm 756 kB/s | 74 kB 00:00
(185/186): rust-1.86.0-1.fc40.aarch64.rpm 7.7 MB/s | 25 MB 00:03
(186/186): rust-std-static-1.86.0-1.fc40.aarch64.rpm 5.6 MB/s | 38 MB 00:06
----------------------------------------------------------------------------------------------------
Total 6.3 MB/s | 131 MB 00:20
[Errno 2] No such file or directory: '/var/cache/dnf/fedora-ffb33069eb0638b5/packages/perl-Module-Load-0.36-503.fc40.noarch.rpm'
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
I would be happy to work on a fix for this if we can agree that it is something that should be fixed.