Allow configuring terminationGracePeriodSeconds #2180

@hanpeter

Description

Is your feature request related to a problem? Please describe.

When running FluentD or FluentBit on Kubernetes clusters with frequent pod disruptions (node drains, spot instance terminations, rolling updates), pods may be forcefully killed before they can flush buffered logs. terminationGracePeriodSeconds is a standard Kubernetes pod setting that controls how long the kubelet waits after sending SIGTERM before following up with SIGKILL, but the Logging Operator does not expose this field in either the Logging CRD's FluentD section or the FluentbitAgent CRD. As a result, logs are lost whenever a pod is terminated before it finishes its shutdown procedure.

Describe the solution you'd like

Add support for the terminationGracePeriodSeconds field in both the Logging CRD's FluentD spec and the FluentbitAgent CRD spec. This would allow users to configure adequate grace periods for their logging infrastructure to ensure buffered logs are flushed before pod termination.

Proposed addition to both CRDs:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
spec:
  fluentd:
    terminationGracePeriodSeconds: 120  # new field
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
spec:
  terminationGracePeriodSeconds: 120  # new field

This field would be passed to the underlying StatefulSet (for FluentD) and DaemonSet (for FluentBit) at spec.template.spec.terminationGracePeriodSeconds.
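
For illustration, a sketch of the fragment the rendered workloads would contain (field placement only; metadata, selectors, and containers omitted, and the source-field comments reflect the proposal rather than existing operator behavior):

# Rendered StatefulSet for FluentD (fragment)
apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120  # copied from Logging spec.fluentd
---
# Rendered DaemonSet for FluentBit (fragment)
apiVersion: apps/v1
kind: DaemonSet
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120  # copied from FluentbitAgent spec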

Describe alternatives you've considered

  • Patching the StatefulSet/DaemonSet manually: this is not maintainable, and the operator's reconciliation loop overwrites the patch (see the sketch after this list)
  • Using pod mutation webhooks: this adds unnecessary complexity and operational overhead
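
For reference, the manual patch would look roughly like the following; the workload name logging-fluentd is a placeholder, since the actual name depends on how the operator generates it. The operator's next reconciliation reverts the change:

kubectl patch statefulset logging-fluentd \
  --type merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":120}}}}'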

Additional context

This is particularly critical for:

  • Kubernetes clusters with frequent node rotations
  • High-throughput logging pipelines with large buffers
  • Production environments where log delivery guarantees are important

terminationGracePeriodSeconds is a standard Kubernetes field supported by all workload resources (Deployment, StatefulSet, DaemonSet) in their pod template specs. Exposing it through the operator CRDs would give users proper control over graceful shutdown behavior without requiring workarounds.
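
The field's availability in both workload types can be confirmed against any cluster with kubectl explain:

kubectl explain statefulset.spec.template.spec.terminationGracePeriodSeconds
kubectl explain daemonset.spec.template.spec.terminationGracePeriodSeconds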
