Description
Describe the bug:
The fluentd metric "fluentd_router_records_total" is missing the worker_id label on the /aggregated_metrics endpoint, which leads to problems when it is scraped by Prometheus:
{"time":"2025-09-24T09:53:43.708254288Z","level":"WARN","source":"scrape.go:1884","msg":"Error on ingesting samples with different value but same timestamp","component":"scrape manager","scrape_pool":"serviceMonitor/logging/logging-fluentd-metrics/0","target":{},"num_dropped":243}
Expected behaviour:
I would expect the "fluentd_router_records_total" metric to have a worker_id label, as all other metrics do. Currently it is exposed without one:
fluentd_router_records_total{flow="@d15dc1e90fc1d60534f40b874312ada1",id="flow:foo-bar:logs"} 1.0
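i.e. I would expect something along these lines (the worker_id value here is only illustrative):
fluentd_router_records_total{flow="@d15dc1e90fc1d60534f40b874312ada1",id="flow:foo-bar:logs",worker_id="0"} 1.0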
Steps to reproduce the bug:
Deploy the latest logging-operator chart and enable the fluentd serviceMonitor:
fluentd:
  metrics:
    prometheusRules: false
    serviceMonitor: true
Excluding the metric with a drop rule resolves the error thrown by Prometheus; a sketch of such a rule follows.
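As a rough sketch of that workaround, an endpoint-level metricRelabelings entry in the ServiceMonitor (the exact rule I used may differ) looks like this:

metricRelabelings:
- sourceLabels: [__name__]
  regex: fluentd_router_records_total
  action: drop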
Environment details:
- Kubernetes version (e.g. v1.15.2): v1.31.7+rke2r1
- logging-operator version (e.g. 2.1.1): 6.1.0
- Install method (e.g. helm or static manifests): helm chart
- Resource definition (possibly in YAML format) that caused the issue, without sensitive data:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    banzaicloud.com/last-applied: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  labels:
    app.kubernetes.io/component: fluentd
    app.kubernetes.io/managed-by: logging
    app.kubernetes.io/name: fluentd
    prometheus: kps
  name: logging-fluentd-metrics
  namespace: logging
spec:
  endpoints:
  - interval: 15s
    path: /aggregated_metrics
    port: http-metrics
    scrapeTimeout: 5s
  namespaceSelector:
    matchNames:
    - logging
  sampleLimit: 0
  selector:
    matchLabels:
      app.kubernetes.io/component: fluentd
      app.kubernetes.io/managed-by: logging
      app.kubernetes.io/name: fluentd
/kind bug