Concurrent DNF processes cause installation failures #2268

@Car-byte

Summary

When the two commands dnf install -y package-x and dnf clean all run concurrently, dnf install can fail with the signature [Errno 2] No such file or directory: '/var/cache/dnf/*/packages/*.rpm'. This happens when dnf clean all runs after dnf install has downloaded the RPMs but before dnf install actually installs them. Specifically, this block has gaps where dnf install does not hold any locks, which leaves time for dnf clean all to run.

A similar issue was reported at https://bugzilla.redhat.com/show_bug.cgi?id=1714706, but it was closed with the response "Extending the critical section would lead to worse user experience (DNF would wait for other operations to finish more frequently)". While that is true, I strongly think this should be reconsidered: asking every user to wrap their dnf install commands in retries, or to build their own concurrency controls around dnf, is cumbersome. Some of the solutions proposed below would also keep the impact on concurrency small.
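For context, this is roughly what the retry workaround looks like for each caller today. It is only a sketch: the helper name run_with_retries, its parameters, and the retry policy are hypothetical and not part of dnf.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=5.0, run=subprocess.run):
    """Run cmd and retry on a non-zero exit code.

    Hypothetical helper, not part of dnf. Returns the result of the
    last attempt, whether or not it succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0 or attempt == attempts:
            return result
        # Give the competing `dnf clean all` a chance to finish.
        time.sleep(delay)


# Example call (requires dnf and root privileges):
# run_with_retries(["dnf", "install", "-y", "package-x"])
```

Every script that calls dnf install would need something like this, and a retry re-downloads whatever dnf clean all removed.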

Proposed Solution

The critical section here should always hold at least one lock. This could be done by either 1) extending the lifetime of download_lock.pid and rpmdb_lock.pid so that the critical section always holds at least one lock, or 2) adding a new lock, gpg_check_lock.pid, which is acquired between download_lock.pid and rpmdb_lock.pid. Option 2 is preferred: it still allows concurrency between downloading RPMs, checking RPM signatures, and installing RPMs, while preventing dnf clean all from running.
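Option 2 can be modeled as hand-over-hand locking: the next lock is taken before the previous one is released, so there is never a moment with no lock held. The sketch below is a toy model, not dnf code. It uses in-process threading.Lock objects as stand-ins for the .pid lock files, the function names are hypothetical, and it assumes that dnf clean all only proceeds when it can take every lock, including the new one.

```python
import threading

# Stand-ins for download_lock.pid, gpg_check_lock.pid and rpmdb_lock.pid.
download_lock = threading.Lock()
gpg_check_lock = threading.Lock()
rpmdb_lock = threading.Lock()
ALL_LOCKS = [download_lock, gpg_check_lock, rpmdb_lock]


def try_clean_all():
    """Model of `dnf clean all`: returns True only if every lock is free.

    Assumption: clean must hold all locks before it deletes the cache.
    """
    taken = []
    for lock in ALL_LOCKS:
        if not lock.acquire(blocking=False):
            break
        taken.append(lock)
    could_run = len(taken) == len(ALL_LOCKS)
    for lock in reversed(taken):
        lock.release()
    return could_run


def install(probe):
    """Model of `dnf install` under option 2.

    probe() is called in each phase and records whether clean could run.
    """
    seen = []
    download_lock.acquire()
    seen.append(("download", probe()))
    gpg_check_lock.acquire()   # take the next lock first...
    download_lock.release()    # ...then let other downloads proceed
    seen.append(("gpg_check", probe()))
    rpmdb_lock.acquire()
    gpg_check_lock.release()
    seen.append(("install", probe()))
    rpmdb_lock.release()
    seen.append(("done", probe()))
    return seen


print(install(try_clean_all))
```

In this model the probe fails in all three phases and succeeds only after the transaction is finished, while each individual lock is released as soon as its phase ends.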

How to Reproduce

Shell 1: run while true; do dnf clean all; done
Shell 2: run a large transaction (100+ packages; the more packages, the more likely the failure is to reproduce), for example dnf install -y nodejs-devel java-21-openjdk-devel rust golang ruby perl gcc g++
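The two shells can also be driven from a single script, which may be handier in a container or CI job. This is a sketch only: the race helper is hypothetical, and the commands are the ones from the steps above. It needs dnf and root privileges to show anything.

```python
import subprocess
import threading

CLEAN_CMD = ["dnf", "clean", "all"]
INSTALL_CMD = ["dnf", "install", "-y", "nodejs-devel", "java-21-openjdk-devel",
               "rust", "golang", "ruby", "perl", "gcc", "g++"]


def race(install_cmd=INSTALL_CMD, clean_cmd=CLEAN_CMD):
    """Run clean_cmd in a loop while install_cmd runs once.

    Returns (exit code, combined stdout and stderr) of install_cmd.
    """
    stop = threading.Event()

    def clean_loop():
        # Same as "Shell 1": keep cleaning until the install has finished.
        while not stop.is_set():
            subprocess.run(clean_cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)

    cleaner = threading.Thread(target=clean_loop, daemon=True)
    cleaner.start()
    try:
        # Same as "Shell 2": one large transaction.
        result = subprocess.run(install_cmd, capture_output=True, text=True)
    finally:
        stop.set()
        cleaner.join()
    return result.returncode, result.stdout + result.stderr


# Example call (requires dnf and root privileges):
# code, output = race()
# print(code, "[Errno 2] No such file or directory" in output)
```

The race is timing dependent, so a single run may succeed; repeating the call on a fresh system or container makes a hit more likely.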

Versions Used

DNF version:

[root@1140c318a820 /]# dnf --version
4.22.0
  Installed: dnf-0:4.22.0-1.fc40.noarch at Sun Jan 19 05:47:54 2025
  Built    : Fedora Project at Mon Dec  2 23:49:06 2024

  Installed: rpm-0:4.19.1.1-1.fc40.aarch64 at Sun Jan 19 05:47:54 2025
  Built    : Fedora Project at Wed Feb  7 15:57:52 2024
[root@1140c318a820 /]# cat /etc/fedora-release
Fedora release 40 (Forty)

Full error:

(183/186): ruby-libs-3.3.8-19.fc40.aarch64.rpm                      5.4 MB/s | 4.0 MB     00:00
(184/186): systemtap-sdt-devel-5.2-1.fc40.aarch64.rpm               756 kB/s |  74 kB     00:00
(185/186): rust-1.86.0-1.fc40.aarch64.rpm                           7.7 MB/s |  25 MB     00:03
(186/186): rust-std-static-1.86.0-1.fc40.aarch64.rpm                5.6 MB/s |  38 MB     00:06
----------------------------------------------------------------------------------------------------
Total                                                               6.3 MB/s | 131 MB     00:20
[Errno 2] No such file or directory: '/var/cache/dnf/fedora-ffb33069eb0638b5/packages/perl-Module-Load-0.36-503.fc40.noarch.rpm'
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.

I would be happy to work on a fix for this if we can agree that it should be fixed.
