[Bug] Deadlocks when instrumenting tracing #5807

@lisasgoh

Describe the bug

When instrumenting Firecracker for tracing with the default clippy-tracing command, I found at least two sources of deadlock.

  1. Firecracker process hangs while starting up
    a. main_exec() in main.rs → LOGGER.update(config)
    b. Logger::update() in logging.rs → acquires the LOGGER mutex → calls open_file_nonblock() when log-path is configured.
    c. open_file_nonblock() is instrumented → __Instrument::new() → log::trace!()
    d. Logger::log() tries to acquire the LOGGER mutex, which Logger::update() still holds on the same thread → deadlock

  2. When resuming from snapshot:
    a. Main thread calls resume_vm() → send_event() → sends Resume on channel, sets immediate_exit = 1 → sends RT signal to fc_vcpu
    b. fc_vcpu is in paused(), wakes up from recv(), receives Resume, sees immediate_exit = 1 and calls warn!() → Logger::log() → acquires the LOGGER mutex.
    c. The RT signal arrives and interrupts fc_vcpu while it still holds the mutex. The signal handler handle_signal is instrumented, so it also tries to acquire the LOGGER mutex on the same thread → deadlock
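
Both cases have the same shape: code that logs runs on a thread that already holds the logger lock. Here is a minimal sketch of that pattern, using hypothetical stand-ins (`LOGGER`, `log`, `update_logger`, `open_file_nonblock_instrumented`) rather than Firecracker's actual types. It uses `try_lock()` so the nested acquisition reports failure instead of hanging. With a blocking `lock()` the same call would never return, or could panic, per the `std::sync::Mutex` documentation.

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-in for the global logger state (not Firecracker's real type).
static LOGGER: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Stand-in for Logger::log(). Returns false when the lock is unavailable,
/// which is the point where a blocking lock() would hang.
fn log(msg: &str) -> bool {
    match LOGGER.try_lock() {
        Ok(mut buf) => {
            buf.push(msg.to_string());
            true
        }
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => false,
    }
}

/// Stand-in for an instrumented helper: the entry trace goes through log().
fn open_file_nonblock_instrumented() -> bool {
    log("enter open_file_nonblock")
}

/// Stand-in for Logger::update(): holds the logger lock while calling
/// an instrumented helper, so the nested log() cannot acquire it.
fn update_logger() -> bool {
    let _guard = LOGGER.lock().unwrap();
    open_file_nonblock_instrumented()
}
```

In this sketch `update_logger()` returns `false` because the nested `log()` finds the lock taken, while a plain `log("...")` call made afterwards succeeds.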

To Reproduce

As above.

Expected behaviour

No deadlocks. As a workaround, I had to exclude utils/ and vcpu.rs from the tracing instrumentation.
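
For the signal-handler case, one possible alternative to excluding files is to keep the handler free of logging and defer the trace until the lock has been released. The following is only a sketch under that assumption, not a proposed patch. All names (`SIGNAL_SEEN`, `LOG_BUF`, `handle_signal_uninstrumented`, `warn_interrupted_by_signal`, `drain_signal_flag`) are hypothetical, and the "signal" is simulated by a plain function call on the same thread rather than a real RT signal.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical flag set by the handler and hypothetical logger buffer.
static SIGNAL_SEEN: AtomicBool = AtomicBool::new(false);
static LOG_BUF: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Handler that only touches an atomic and never takes the logger lock.
fn handle_signal_uninstrumented() {
    SIGNAL_SEEN.store(true, Ordering::SeqCst);
}

/// Simulates the vCPU thread being interrupted in the middle of a log call:
/// the handler runs while the logger lock is held by this thread.
fn warn_interrupted_by_signal(msg: &str) {
    let mut buf = LOG_BUF.lock().unwrap();
    handle_signal_uninstrumented(); // simulated interruption, no locking inside
    buf.push(msg.to_string());
}

/// Called from normal (non-handler) context once the lock is free;
/// logs the deferred event and clears the flag.
fn drain_signal_flag() -> bool {
    if SIGNAL_SEEN.swap(false, Ordering::SeqCst) {
        LOG_BUF.lock().unwrap().push("signal handled".to_string());
        true
    } else {
        false
    }
}
```

Because the handler never takes the lock, the interrupted log call completes, and the signal is recorded later by `drain_signal_flag()`.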

Environment

  • Firecracker version: 1.15.0
  • Host and guest kernel versions:
  • Rootfs used:
  • Architecture:
  • Any other relevant software versions:

Checks

  • Have you searched the Firecracker Issues database for similar problems?
  • Have you read the existing relevant Firecracker documentation?
  • Are you certain the bug being reported is a Firecracker issue?
