fix: Enforce cluster information on monitoring alerts #3929
Signed-off-by: Rael Garcia <[email protected]>
stevekuznetsov left a comment:
/lgtm
/approve
/test e2e-parallel
@sclarkso There was a legit error parsing the
/test e2e-parallel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: raelga, stevekuznetsov.
AROSLSRE-91
https://redhat-external.slack.com/archives/C075PHEFZKQ/p1769620814361439
This pull request updates Prometheus alerting rules and their tests to add cluster awareness to all relevant alerts. The changes ensure that alerts are correctly grouped and fired on a per-cluster basis, improving accuracy and scalability in multi-cluster environments. Several PromQL expressions are updated to use `group by (cluster)` and to join service health checks with cluster membership. The corresponding test files are also updated to reflect these changes, ensuring correct alert firing and label expectations.

Prometheus Alert Rule Updates for Cluster Awareness:

- `group by (cluster)` and/or `unless on(cluster)` to ensure alerts are evaluated and fired per cluster. This affects both the main rules and the generated Bicep templates. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]

Test Updates for Cluster Labeling and Cluster-Aware Logic:

- `cluster` label in input series and expected alert labels, ensuring that alerts are validated for correct cluster-specific behavior. [1] [2] [3] [4] [5] [6] [7] [8] [9]

Improvements to Service Discovery and Alert Coverage:

- `group by (cluster) (up{job="kube-state-metrics", cluster=~".*-svc-\d+"}) unless on(cluster) ...`. Test coverage is expanded for several scenarios (metric missing, metric present, metric goes down). [1] [2]

These changes collectively make alerting more robust and accurate in multi-cluster Kubernetes environments, and the updated tests ensure the new logic is thoroughly validated.
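The set logic behind `group by (cluster) (...) unless on(cluster) ...` can be sketched in plain Python. This is a simplified model for illustration, not Prometheus code and not the rules from this PR. Instant vectors are lists of (labels, value) pairs. `example_required_metric` is a hypothetical stand-in for the right-hand side that the summary elides as `...`, and the cluster names are made up.

```python
import re
from typing import Dict, List, Tuple

# Simplified model of a PromQL instant vector: a list of (label set, value) pairs.
# The metric name lives in the "__name__" label, as it does in Prometheus.
Sample = Tuple[Dict[str, str], float]


def select(series: List[Sample], name: str, job: str, cluster_regex: str) -> List[Sample]:
    """Mimic a selector such as up{job="kube-state-metrics", cluster=~"..."}.

    PromQL regex matchers are fully anchored, hence re.fullmatch.
    """
    pattern = re.compile(cluster_regex)
    return [
        (labels, value)
        for labels, value in series
        if labels.get("__name__") == name
        and labels.get("job") == job
        and pattern.fullmatch(labels.get("cluster", ""))
    ]


def group_by_cluster(samples: List[Sample]) -> List[Sample]:
    """Mimic `group by (cluster) (...)`: one sample per cluster, always valued 1.

    Sample values are ignored, so a target reporting up == 0 still counts.
    """
    clusters = sorted({labels.get("cluster", "") for labels, _ in samples})
    return [({"cluster": cluster}, 1.0) for cluster in clusters]


def unless_on_cluster(lhs: List[Sample], rhs: List[Sample]) -> List[Sample]:
    """Mimic `lhs unless on(cluster) rhs`: keep lhs samples whose cluster has no rhs match."""
    covered = {labels.get("cluster", "") for labels, _ in rhs}
    return [(labels, value) for labels, value in lhs if labels.get("cluster", "") not in covered]


def firing_clusters(series: List[Sample]) -> List[str]:
    """Clusters for which the sketched alert expression returns a series."""
    lhs = group_by_cluster(select(series, "up", "kube-state-metrics", r".*-svc-\d+"))
    # HYPOTHETICAL right-hand side: the real expression elides it as "...".
    rhs = [(labels, value) for labels, value in series
           if labels.get("__name__") == "example_required_metric"]
    return [labels["cluster"] for labels, _ in unless_on_cluster(lhs, rhs)]


def up(cluster: str, value: float = 1.0) -> Sample:
    """Build a hypothetical kube-state-metrics `up` sample for one cluster."""
    return ({"__name__": "up", "job": "kube-state-metrics", "cluster": cluster}, value)


def required(cluster: str) -> Sample:
    """Build a sample of the hypothetical required metric for one cluster."""
    return ({"__name__": "example_required_metric", "cluster": cluster}, 1.0)


if __name__ == "__main__":
    # Metric missing in one service cluster: only that cluster fires.
    print(firing_clusters([up("a-svc-1"), up("b-svc-2"), required("b-svc-2")]))
    # Metric present everywhere: nothing fires.
    print(firing_clusters([up("a-svc-1"), required("a-svc-1")]))
    # Clusters outside the regex (e.g. "a-mgmt-1") are never evaluated.
    print(firing_clusters([up("a-mgmt-1")]))
```

The `on(cluster)` modifier matters in this model: without it, `unless` compares full label sets (ignoring the metric name), and because the grouped left-hand side carries only `cluster`, it would rarely match a right-hand series that has extra labels such as `job` or `instance`, so the alert would fire for every cluster.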