Description
Is your feature request related to a problem? Please describe.
When running FluentD or FluentBit on Kubernetes clusters with frequent pod disruptions (node drains, spot instance terminations, rolling updates), pods may be killed before they have flushed their buffered logs. terminationGracePeriodSeconds is a standard Kubernetes pod setting that controls how long the kubelet waits after sending SIGTERM before it sends SIGKILL (30 seconds by default). However, the Logging Operator does not expose this field in the Logging CRD (FluentD section) or in the FluentbitAgent CRD. As a result, pods can be killed before they finish shutting down, and any logs still in their buffers are lost.
Describe the solution you'd like
Add support for the terminationGracePeriodSeconds field in both the Logging CRD's FluentD spec and the FluentbitAgent CRD spec. This would let users give their logging pods a grace period long enough to flush buffered logs before termination.
Proposed addition to both CRDs:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
spec:
  fluentd:
    terminationGracePeriodSeconds: 120 # new field
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
spec:
  terminationGracePeriodSeconds: 120 # new field

This field would be passed to the underlying StatefulSet (for FluentD) and DaemonSet (for FluentBit) at spec.template.spec.terminationGracePeriodSeconds.
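To illustrate that plumbing, the Go sketch below shows one possible way for the operator to copy the new field into the pod templates it generates. FluentdSpec, FluentbitAgentSpec, PodSpec, applyGracePeriod and int64Ptr are hypothetical, simplified stand-ins, not the operator's actual types or helpers. In real code, the target would be the TerminationGracePeriodSeconds field (an *int64) of the Kubernetes core/v1 PodSpec.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the operator's CRD types and for
// corev1.PodSpec. Only the field relevant to this proposal is modeled.

// FluentdSpec mirrors the proposed `spec.fluentd` section of the Logging CRD.
type FluentdSpec struct {
	// Pointer so that "not set" is distinguishable from an explicit 0.
	// A CRD validation marker such as +kubebuilder:validation:Minimum=0
	// could reject negative values at admission time.
	TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
}

// FluentbitAgentSpec mirrors the proposed FluentbitAgent `spec`.
type FluentbitAgentSpec struct {
	TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
}

// PodSpec stands in for the pod template spec of the StatefulSet/DaemonSet.
type PodSpec struct {
	TerminationGracePeriodSeconds *int64
}

// applyGracePeriod copies the user's value into the pod template spec.
// A nil value clears the field so the API server applies its default (30s).
func applyGracePeriod(pod *PodSpec, v *int64) {
	if v == nil {
		pod.TerminationGracePeriodSeconds = nil
		return
	}
	val := *v // copy so the pod template does not alias the custom resource
	pod.TerminationGracePeriodSeconds = &val
}

func int64Ptr(v int64) *int64 { return &v }

func main() {
	// FluentD: user sets 120s, which ends up in the StatefulSet pod template.
	fluentd := FluentdSpec{TerminationGracePeriodSeconds: int64Ptr(120)}
	var statefulSetPod PodSpec
	applyGracePeriod(&statefulSetPod, fluentd.TerminationGracePeriodSeconds)

	// FluentBit: field omitted, so the DaemonSet keeps the Kubernetes default.
	fluentbit := FluentbitAgentSpec{}
	var daemonSetPod PodSpec
	applyGracePeriod(&daemonSetPod, fluentbit.TerminationGracePeriodSeconds)

	fmt.Println(*statefulSetPod.TerminationGracePeriodSeconds)     // 120
	fmt.Println(daemonSetPod.TerminationGracePeriodSeconds == nil) // true
}
```

Keeping the field a pointer and leaving it nil when unset preserves the Kubernetes default for users who do not configure it. An explicit value, including 0, is passed through unchanged.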
Describe alternatives you've considered
- Patching the StatefulSet/DaemonSet manually: not maintainable, and the operator overwrites the change on its next reconciliation
- Using a mutating admission webhook to inject the field: adds unnecessary complexity and operational overhead
Additional context
This is particularly critical for:
- Kubernetes clusters with frequent node rotations
- High-throughput logging pipelines with large buffers
- Production environments where log delivery guarantees are important
terminationGracePeriodSeconds is a standard PodSpec field, so it is available in the pod template of every workload resource (Deployment, StatefulSet, DaemonSet, and others). Exposing it through the operator CRDs would give users proper control over graceful shutdown without requiring workarounds.