13 changes: 13 additions & 0 deletions grafana-formula/grafana/files/containers/grafana.container
@@ -0,0 +1,13 @@
[Unit]
Description=Grafana Container

[Container]
Label=app=grafana
ContainerName=grafana
Image=registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/grafana:9.5.8
Member: Same as the other images, this one needs to be configurable, and for SUSE Manager it needs to come from registry.suse.com with proper authentication.

Contributor (author): ack

Volume=/etc/grafana:/etc/grafana:ro
Member: Does this mean the Grafana config files must exist on the host OS and are mounted read-only into the container? So to adapt the configuration, users should change the config files on the host OS?

Contributor (author): Yes, the formula takes care of providing the config files. And yes, users can modify these using the formula or manually.

Volume=grafana.volume:/var/lib/grafana
PublishPort=3000:3000

[Install]
WantedBy=multi-user.target default.target
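Quadlet units like the one above are plain INI files, but keys such as `Volume=` may legitimately repeat, which trips up naive INI parsers. A minimal sketch of a reader that keeps repeated keys as lists (a hypothetical validation helper, not part of the formula):

```python
def parse_quadlet(text):
    """Tiny INI reader that keeps repeated keys (e.g. multiple Volume= lines) as lists."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current.setdefault(key, []).append(value)
    return sections

# A trimmed-down version of the grafana.container unit above.
unit = parse_quadlet("""\
[Unit]
Description=Grafana Container

[Container]
ContainerName=grafana
Volume=/etc/grafana:/etc/grafana:ro
Volume=grafana.volume:/var/lib/grafana
PublishPort=3000:3000
""")
```

Note that both `Volume=` lines survive as `unit["Container"]["Volume"]`, which is the property a last-wins parser like `configparser` in strict mode would lose.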
5 changes: 5 additions & 0 deletions grafana-formula/grafana/files/containers/grafana.volume
@@ -0,0 +1,5 @@
[Unit]
Description=Grafana Container Volume

[Volume]
Label=app=grafana
2 changes: 1 addition & 1 deletion grafana-formula/grafana/files/datasources.yml
@@ -12,7 +12,7 @@ datasources:
  - name: {{ name }}
    type: prometheus
    access: proxy
    url: {{ datasource.url|replace('localhost', grains['fqdn']) }}
    basicAuth: {{ basic_auth_enabled }}
    isDefault: {{ loop.first }}
    editable: true
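The `replace` filter change above swaps a literal `localhost` in the configured datasource URL for the minion's FQDN, so the containerized Grafana does not point at itself. Outside of Jinja the same transformation is a plain string substitution; a small sketch (the FQDN below is a made-up placeholder):

```python
def rewrite_datasource_url(url, fqdn):
    """Mirror the Jinja filter: substitute a literal 'localhost' with the minion's FQDN."""
    return url.replace("localhost", fqdn)

print(rewrite_datasource_url("http://localhost:9090", "monitor.example.com"))
# URLs without 'localhost' pass through unchanged.
print(rewrite_datasource_url("http://db.internal:9090", "monitor.example.com"))
```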
58 changes: 57 additions & 1 deletion grafana-formula/grafana/init.sls
@@ -1,5 +1,5 @@
# check for supported os version
{%- set supported_vers = ['42.3', '12.3', '12.4', '12.5', '15.0', '15.1', '15.2', '15.3', '15.4', '15.5'] %}
{%- set supported_vers = ['42.3', '12.3', '12.4', '12.5', '15.0', '15.1', '15.2', '15.3', '15.4', '15.5', '15.6'] %}

# check if supported
{%- if (grains['os_family'] == 'Suse' and grains['osrelease'] in supported_vers) %}
@@ -19,13 +19,31 @@
{%- else %}
{% set product_name = 'SUSE Manager' %}
{%- endif %}

{% set podman_version = salt['pkg.latest_version']('podman') %}
{% if not podman_version %}
{% set podman_version = salt['pkg.version']('podman') %}
{% endif %}
{% set use_podman = salt['pkg.version_cmp'](podman_version, '4.4.0') >= 0 %}
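The gate above prefers the repository's candidate version and falls back to the installed package before comparing against 4.4.0, the first Podman release that ships Quadlet. A rough Python equivalent of that decision, with a naive dotted-version comparison standing in for Salt's `pkg.version_cmp` (which handles many more edge cases):

```python
def version_tuple(v):
    # Naive dotted-version parser; real package versions can carry
    # release suffixes that Salt's pkg.version_cmp understands.
    return tuple(int(p) for p in v.split("."))

def use_podman(latest_version, installed_version):
    """Prefer the repo's latest candidate; fall back to the installed package."""
    version = latest_version or installed_version
    if not version:
        return False
    return version_tuple(version) >= version_tuple("4.4.0")

print(use_podman(None, "4.9.5"))   # installed Podman is new enough
print(use_podman("4.3.1", None))   # candidate predates Quadlet support
```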

{% if use_podman %}
install_podman_for_grafana:
  pkg.installed:
    - name: podman

uninstall_grafana_package:
  pkg.removed:
    - name: grafana
{% endif %}

# setup and enable service
/etc/grafana/grafana.ini:
  file.managed:
    - source: salt://grafana/files/grafana.ini
    - makedirs: True
    - template: jinja

/etc/grafana/provisioning/datasources/datasources.yml:
  file.managed:
    - source: salt://grafana/files/datasources.yml
@@ -136,6 +154,29 @@ grafana-sap-netweaver-dashboards:
  pkg.removed
{%- endif %}

{% if use_podman %}
grafana-container:
  file.managed:
    - names:
      - /etc/containers/systemd/grafana.container:
        - source: salt://grafana/files/containers/grafana.container
      - /etc/containers/systemd/grafana.volume:
        - source: salt://grafana/files/containers/grafana.volume
    - user: root
    - group: root
    - mode: 644
  module.run:
    - name: service.systemctl_reload
  service.running:
    - name: grafana
    - enable: true
    - watch:
      - file: /etc/containers/systemd/grafana.*
      - file: /etc/grafana/provisioning/datasources/datasources.yml
Comment on lines +168 to +175
Member: The service state/execution module calls won't work on SLE Micro.

Contributor (author): What are the limitations here? I read in the documentation that Podman integrates with systemd on SLE Micro.

Member: state.apply is executed inside a transaction (think: transactional-update run salt-call state.apply ...) and there is no D-Bus access inside the transaction. At least not as of today, and enabling it also comes with problems...

We have a card to let SUMA avoid the transactional-update wrapping, but it's not picked up yet.

Just to be clear: systemd and Podman do work together. Controlling that with Salt won't work when targeting transactional systems.

Contributor (author): Thanks. That's a good point.

      - file: /etc/grafana/provisioning/dashboards/*
      - file: /etc/grafana/grafana.ini

{% else %}
grafana-server:
  pkg.installed:
    - names:
@@ -146,11 +187,26 @@ grafana-server:
      - file: /etc/grafana/provisioning/datasources/datasources.yml
      - file: /etc/grafana/provisioning/dashboards/*
      - file: /etc/grafana/grafana.ini
{% endif %}

{%- else %}
# disable service
{% if use_podman %}
grafana-container:
  service.dead:
    - name: grafana
    - enable: false
  file.absent:
    - names:
      - /etc/containers/systemd/grafana.container
      - /etc/containers/systemd/grafana.volume
  module.run:
    - name: service.systemctl_reload
Comment on lines +196 to +204
Member: The service state/execution module calls won't work on SLE Micro.


{% else %}
grafana-server:
  service.dead:
    - enable: False
{% endif %}
{%- endif %}
{%- endif %}
7 changes: 7 additions & 0 deletions prometheus-formula/metadata/form.yml
@@ -126,6 +126,13 @@ prometheus:
        $name: Enable local Alertmanager service
        $help: Install and start local Alertmanager without clustering

      alertmanager_config:
        $name: Alertmanager configuration
        $type: text
        $default: /etc/prometheus/alertmanager.yml
        $help: Please refer to the documentation for available options.
        $visible: this.parent.value.alertmanager_service == true

      use_local_alertmanager:
        $type: boolean
        $name: Use local Alertmanager
126 changes: 126 additions & 0 deletions prometheus-formula/prometheus/files/alertmanager.yml
@@ -0,0 +1,126 @@
# Sample configuration.
# See https://prometheus.io/docs/alerting/configuration/ for documentation.

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: "localhost:25"
  smtp_from: "alertmanager@example.org"

# The root route on which each incoming alert enters.
route:
  # The root route must not have any matchers as it is the entry point for
  # all alerts. It needs to have a receiver configured so alerts that do not
  # match any of the sub-routes are sent to someone.
  receiver: "team-X-mails"

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  #
  # To aggregate by all possible labels use '...' as the sole label name.
  # This effectively disables aggregation entirely, passing through all
  # alerts as-is. This is unlikely to be what you want, unless you have
  # a very low alert volume or your upstream notification system performs
  # its own grouping. Example: group_by: [...]
  group_by: ["alertname", "cluster"]

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start firing
  # shortly after one another are batched together on the first
  # notification.
  group_wait: 30s

  # When the first notification has been sent, wait 'group_interval' to
  # send a batch of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend it.
  repeat_interval: 3h

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.

  # The child route trees.
  routes:
    # This route performs a regular expression match on alert labels to
    # catch alerts that are related to a list of services.
    - match_re:
        service: ^(foo1|foo2|baz)$
      receiver: team-X-mails

      # The service has a sub-route for critical alerts; any alerts
      # that do not match, i.e. severity != critical, fall back to the
      # parent node and are sent to 'team-X-mails'.
      routes:
        - match:
            severity: critical
          receiver: team-X-pager

    - match:
        service: files
      receiver: team-Y-mails

      routes:
        - match:
            severity: critical
          receiver: team-Y-pager

    # This route handles all alerts coming from a database service. If there's
    # no team to handle it, it defaults to the DB team.
    - match:
        service: database
      receiver: team-DB-pager
      # Also group alerts by affected database.
      group_by: [alertname, cluster, database]

      routes:
        - match:
            owner: team-X
          receiver: team-X-pager

        - match:
            owner: team-Y
          receiver: team-Y-pager

# Inhibition rules allow muting a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    # Apply inhibition if the alertname is the same.
    # CAUTION:
    #   If all label names listed in `equal` are missing
    #   from both the source and target alerts,
    #   the inhibition rule will apply!
    equal: ["alertname"]

receivers:
  - name: "team-X-mails"
    email_configs:
      - to: "team-X+alerts@example.org, team-Y+alerts@example.org"

  - name: "team-X-pager"
    email_configs:
      - to: "team-X+alerts-critical@example.org"
    pagerduty_configs:
      - routing_key: <team-X-key>

  - name: "team-Y-mails"
    email_configs:
      - to: "team-Y+alerts@example.org"

  - name: "team-Y-pager"
    pagerduty_configs:
      - routing_key: <team-Y-key>

  - name: "team-DB-pager"
    pagerduty_configs:
      - routing_key: <team-DB-key>
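The routing tree in this sample resolves an alert to the receiver of the deepest matching child route, with the parent's receiver as fallback. A condensed sketch of that resolution (illustrative only, covering just `match` and `match_re`, not Alertmanager's full matcher and `continue` semantics):

```python
import re

def route_matches(route, labels):
    """Exact `match` and regex `match_re` label tests; both must pass."""
    exact_ok = all(labels.get(k) == v for k, v in route.get("match", {}).items())
    regex_ok = all(re.search(p, labels.get(k, "")) for k, p in route.get("match_re", {}).items())
    return exact_ok and regex_ok

def resolve_receiver(route, labels, inherited=None):
    """Descend into the first matching child; otherwise the current
    route's receiver (or the inherited one) applies."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        if route_matches(child, labels):
            return resolve_receiver(child, labels, receiver)
    return receiver

# The sample tree above, condensed to receivers and matchers.
tree = {
    "receiver": "team-X-mails",
    "routes": [
        {"match_re": {"service": "^(foo1|foo2|baz)$"}, "receiver": "team-X-mails",
         "routes": [{"match": {"severity": "critical"}, "receiver": "team-X-pager"}]},
        {"match": {"service": "files"}, "receiver": "team-Y-mails",
         "routes": [{"match": {"severity": "critical"}, "receiver": "team-Y-pager"}]},
        {"match": {"service": "database"}, "receiver": "team-DB-pager",
         "routes": [{"match": {"owner": "team-X"}, "receiver": "team-X-pager"},
                    {"match": {"owner": "team-Y"}, "receiver": "team-Y-pager"}]},
    ],
}
```

For example, an alert labeled `service=database, owner=team-X` resolves to `team-X-pager`, while an alert matching no child route falls back to the root's `team-X-mails`.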

51 changes: 51 additions & 0 deletions prometheus-formula/prometheus/files/blackbox.yml
@@ -0,0 +1,51 @@
modules:
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
        - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      ttl: 5
@@ -0,0 +1,26 @@
{%- set tls_enabled = salt['pillar.get']('prometheus:tls:enabled', False) %}
{% set config = salt['pillar.get']('prometheus:alerting:alertmanager_config') %}
{%- set entrypoint = ['/usr/bin/prometheus-alertmanager'] %}
{%- if config %}
{% do entrypoint.append('--config.file=' ~ config) %}
{% endif -%}
{%- if tls_enabled %}
{% do entrypoint.append('--web.config.file=' ~ web_config_file) %}
{%- endif -%}

[Unit]
Description=Alertmanager Container

[Container]
Label=app=alertmanager
ContainerName=alertmanager
Image=registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/alertmanager:0.26.0
Contributor: Do you want images from the openSUSE registry for SUMA as well?

Contributor (author): That is the point I wanted to discuss. For now I have just hard-coded them here.

Contributor (author): I checked that the monitoring images are not being published to registry.suse.com. Please let me know if we want to have them there for our purposes.

Contributor: @admd @rjmateus probably for you

Member: From my point of view we cannot have openSUSE images delivered for SUMA. We have a special SKU, so we must have SUSE-delivered images through registry.suse.com with access control matching the needed SKU.

Contributor: Correct! We must use the images through registry.suse.com.

Volume=/etc/prometheus:/etc/prometheus:ro
Member: Same question as before: it will be set on the host OS and mounted read-only into the container, right?

Volume=alertmanager.volume:/var/lib/prometheus/alertmanager
{% if entrypoint|length > 1 -%}
PodmanArgs=--entrypoint '{{ entrypoint|tojson }}'
{%- endif %}
PublishPort=9093:9093

[Install]
WantedBy=multi-user.target default.target
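The Jinja block in this unit only emits a `PodmanArgs=--entrypoint` override when optional flags were actually appended to the base binary. The same logic in Python, using `json.dumps` to approximate Jinja's `tojson` filter (function name and keyword defaults are illustrative, not part of the formula):

```python
import json

def build_podman_args(base, config_file=None, tls_web_config=None):
    """Mirror the template: start from the binary, append optional flags,
    and only emit an --entrypoint override when flags were added."""
    entrypoint = [base]
    if config_file:
        entrypoint.append("--config.file=" + config_file)
    if tls_web_config:
        entrypoint.append("--web.config.file=" + tls_web_config)
    if len(entrypoint) > 1:
        return "PodmanArgs=--entrypoint '%s'" % json.dumps(entrypoint)
    return None  # no flags, keep the image's default entrypoint

print(build_podman_args("/usr/bin/prometheus-alertmanager",
                        config_file="/etc/prometheus/alertmanager.yml"))
```

The length check matters: without it, a bare `--entrypoint '["/usr/bin/prometheus-alertmanager"]'` would be emitted even when the container's default entrypoint would do.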
@@ -0,0 +1,5 @@
[Unit]
Description=Alertmanager Container Volume

[Volume]
Label=app=alertmanager
@@ -0,0 +1,25 @@
{%- set tls_enabled = salt['pillar.get']('prometheus:tls:enabled', False) %}
{%- set args = salt['pillar.get']('prometheus:blackbox_exporter:args').split(' ') %}
{%- set entrypoint = ['/usr/bin/blackbox_exporter'] %}
{%- if args %}
{%- do entrypoint.extend(args) %}
{%- endif -%}
{%- if tls_enabled %}
{%- do entrypoint.append('--web.config.file=' ~ web_config_file) %}
{%- endif -%}

[Unit]
Description=Blackbox Exporter Container

[Container]
Label=app=blackbox_exporter
ContainerName=blackbox_exporter
Image=registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/blackbox_exporter:0.24.0
Member: Same question about the image coming from registry.suse.com.

Volume=/etc/prometheus:/etc/prometheus:ro
{% if entrypoint|length > 1 -%}
PodmanArgs=--entrypoint '{{ entrypoint|tojson }}'
{%- endif %}
PublishPort=9115:9115

[Install]
WantedBy=multi-user.target default.target