MicroVM hangs during SMP boot with ≥16 vCPUs on dual-socket NUMA hosts #5744

@daGoattttt-mehhh

Summary:

When starting a microVM with a higher vCPU count (e.g., 16 vCPUs) on a dual-socket NUMA host, the guest occasionally hangs during SMP initialization.

During the hang, some Firecracker vCPU threads remain blocked in futex_wait_queue, and the guest kernel does not complete bringing up all secondary CPUs.

The issue occurs randomly during VM startup and has been observed when running Firecracker via jailer.

Firecracker Version:
Firecracker v1.13.1

Environment:
Host kernel version: 6.1.23

Guest kernel version: 5.4.116 (the issue also persists with 5.10.245)

Architecture:             x86_64
CPU op-mode(s):         32-bit, 64-bit
Address sizes:          46 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                   32
On-line CPU(s) list:    0-31
Vendor ID:                GenuineIntel
Model name:             Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
CPU family:           6
Model:                85
Thread(s) per core:   2
Core(s) per socket:   8
Socket(s):            2
Stepping:             4
BogoMIPS:             4200.00


Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    512 KiB (16 instances)
  L1i:                    512 KiB (16 instances)
  L2:                     16 MiB (16 instances)
  L3:                     22 MiB (2 instances)
NUMA:                     
  NUMA node(s):           2
  NUMA node0 CPU(s):      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
  NUMA node1 CPU(s):      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Vulnerabilities:          
  Gather data sampling:   Mitigation; Microcode
  Itlb multihit:          KVM: Mitigation: Split huge pages
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected;   BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Mitigation; Clear CPU buffers; SMT vulnerable
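
In the lscpu output above, NUMA node membership alternates by CPU id (even ids on node 0, odd ids on node 1), so any contiguous host CPU range such as 0-15 spans both sockets, and each node holds exactly 16 logical CPUs, the same number as the configured vCPUs. A minimal Python sketch for turning those cpulist strings into a CPU-to-node map, which can help when checking where vCPU threads are running. The helper names `parse_cpu_list` and `cpu_to_node` are hypothetical and not part of Firecracker or any library:

```python
def parse_cpu_list(text):
    """Expand a Linux cpulist string such as "0-3,8,10-11" into a sorted list of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_to_node(node_lists):
    """Map each host CPU id to its NUMA node, given {node_id: cpulist string}."""
    mapping = {}
    for node, text in node_lists.items():
        for cpu in parse_cpu_list(text):
            mapping[cpu] = node
    return mapping


# Values copied from the lscpu output in this report.
NODES = {
    0: "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30",
    1: "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31",
}
CPU_NODE = cpu_to_node(NODES)
```

On other hosts the same strings can be read from /sys/devices/system/node/node<N>/cpulist instead of being copied from lscpu.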

VM Configuration:
"machine-config": {
  "vcpu_count": 16,
  "mem_size_mib": 72817,
  "smt": false
}

Steps to Reproduce:
1. Run Firecracker using jailer on a dual-socket NUMA host.

2. Configure the microVM with 16 vCPUs.

3. Boot a Linux guest kernel.

4. Observe that VM startup occasionally hangs during SMP initialization.

The issue does not occur consistently, but appears randomly when starting the VM.

Guest Kernel Logs:
During the failure the guest kernel stops while bringing up secondary CPUs:

[    1.745873] smp: Bringing up secondary CPUs ...
[    1.746812] x86: Booting SMP configuration:

The boot process does not proceed further.
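
Since the hang is intermittent, it can help to classify captured serial console logs automatically across many boots. The following Python sketch assumes the guest prints the usual kernel completion line of the form "smp: Brought up 1 node, 16 CPUs" once all secondary CPUs are online. The function name `smp_bringup_state` and its return strings are hypothetical:

```python
import re

BRINGING_UP = re.compile(r"smp: Bringing up secondary CPUs")
# Assumed completion line format: "smp: Brought up <n> node(s), <m> CPU(s)".
BROUGHT_UP = re.compile(r"smp: Brought up \d+ nodes?, (\d+) CPUs?")


def smp_bringup_state(console_log):
    """Return 'not-started', 'stalled-or-in-progress', or 'done:<cpu count>'."""
    if not BRINGING_UP.search(console_log):
        return "not-started"
    match = BROUGHT_UP.search(console_log)
    if match is None:
        # Bring-up began but the completion line never appeared.
        return "stalled-or-in-progress"
    return "done:" + match.group(1)
```

A log that still reports `stalled-or-in-progress` after a generous timeout matches the failure shown in this report.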

Firecracker Thread State:

During the hang, Firecracker shows all vCPU threads created:

   PID    SPID TTY          TIME CMD
2358731 2358731 pts/4    00:00:01 firecracker
2358731 2358743 pts/4    00:00:01 fc_vcpu 0
2358731 2358744 pts/4    00:00:00 fc_vcpu 1
2358731 2358745 pts/4    00:00:00 fc_vcpu 2
2358731 2358746 pts/4    00:00:00 fc_vcpu 3
2358731 2358747 pts/4    00:00:00 fc_vcpu 4
2358731 2358748 pts/4    00:00:00 fc_vcpu 5
2358731 2358749 pts/4    00:00:00 fc_vcpu 6
2358731 2358750 pts/4    00:00:00 fc_vcpu 7
2358731 2358751 pts/4    00:00:00 fc_vcpu 8
2358731 2358752 pts/4    00:00:00 fc_vcpu 9
2358731 2358753 pts/4    00:00:00 fc_vcpu 10
2358731 2358754 pts/4    00:00:00 fc_vcpu 11
2358731 2358755 pts/4    00:00:00 fc_vcpu 12
2358731 2358756 pts/4    00:00:00 fc_vcpu 13
2358731 2358757 pts/4    00:00:00 fc_vcpu 14
2358731 2358758 pts/4    00:00:00 fc_vcpu 15
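
To confirm programmatically that every expected vCPU thread exists, a listing like the one above can be parsed. This is a rough Python sketch that assumes the column layout shown here (PID, SPID, TTY, TIME, then a command of the form `fc_vcpu <index>`). The function name `vcpu_thread_ids` is hypothetical:

```python
def vcpu_thread_ids(ps_output):
    """Return {vcpu_index: thread_id} for lines whose command is 'fc_vcpu <index>'."""
    threads = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: PID SPID TTY TIME CMD..., where CMD is "fc_vcpu <index>".
        if len(fields) >= 6 and fields[4] == "fc_vcpu" and fields[5].isdigit():
            threads[int(fields[5])] = int(fields[1])
    return threads


def missing_vcpus(ps_output, vcpu_count):
    """Return the sorted vCPU indices that have no matching thread in the listing."""
    return sorted(set(range(vcpu_count)) - set(vcpu_thread_ids(ps_output)))
```

For the listing in this report, `missing_vcpus(listing, 16)` would return an empty list, since threads for vCPUs 0 through 15 are all present.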

Stuck vCPU Thread Stack:

Inspecting a stuck vCPU thread shows it blocked in a futex wait:

[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
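
To see which threads are in this state without inspecting them one by one, the per-thread kernel stacks can be scanned. The sketch below is one possible approach in Python, assuming /proc/<pid>/task/<tid>/stack is readable (this typically requires root and a host kernel built with stack trace support). The function names are hypothetical. A futex wait on its own only shows that the thread is sleeping on a userspace lock or condition, not why:

```python
from pathlib import Path


def is_futex_wait(stack_text):
    """True if a kernel stack dump contains a futex wait frame."""
    for line in stack_text.splitlines():
        # Lines look like "[<0>] futex_wait_queue+0x60/0x90".
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("[<"):
            symbol = parts[1].split("+", 1)[0]
            if symbol.startswith("futex_wait"):
                return True
    return False


def futex_blocked_threads(pid):
    """Return sorted (tid, thread name) pairs for threads of pid sleeping in a futex wait."""
    blocked = []
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            stack = (task / "stack").read_text()
            name = (task / "comm").read_text().strip()
        except OSError:
            # The thread exited or the stack file is not readable; skip it.
            continue
        if is_futex_wait(stack):
            blocked.append((int(task.name), name))
    return sorted(blocked)
```

Running `futex_blocked_threads(<firecracker pid>)` during a hang would list which `fc_vcpu` threads are parked, which is useful to compare against a successful boot.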

Expected Behavior:

The microVM should boot normally and the guest OS should complete SMP initialization and start executing the init process.

Labels: Status: Awaiting author
