Add containerized Prometheus/Grafana stack deployment #214
base: containers
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| [Unit] | ||
| Description=Grafana Container | ||
|
|
||
| [Container] | ||
| Label=app=grafana | ||
| ContainerName=grafana | ||
| Image=registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/grafana:9.5.8 | ||
| Volume=/etc/grafana:/etc/grafana:ro | ||
|
Member
Does this mean the Grafana config files must exist on the host OS and will be mounted read-only into the container? So to adapt the configuration, users should change the config files on the host OS?
Contributor
Author
Yes, the formula takes care of providing the config files. |
||
| Volume=grafana.volume:/var/lib/grafana | ||
| PublishPort=3000:3000 | ||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target default.target | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| [Unit] | ||
| Description=Grafana Container Volume | ||
|
|
||
| [Volume] | ||
| Label=app=grafana |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| # check for supported os version | ||
| {%- set supported_vers = ['42.3', '12.3', '12.4', '12.5', '15.0', '15.1', '15.2', '15.3', '15.4', '15.5'] %} | ||
| {%- set supported_vers = ['42.3', '12.3', '12.4', '12.5', '15.0', '15.1', '15.2', '15.3', '15.4', '15.5', '15.6'] %} | ||
|
|
||
| # check if supported | ||
| {%- if (grains['os_family'] == 'Suse' and grains['osrelease'] in supported_vers) %} | ||
|
|
@@ -19,13 +19,31 @@ | |
| {%- else %} | ||
| {% set product_name = 'SUSE Manager' %} | ||
| {%- endif %} | ||
|
|
||
| {% set podman_version = salt['pkg.latest_version']('podman') %} | ||
| {% if not podman_version %} | ||
| {% set podman_version = salt['pkg.version']('podman') %} | ||
| {% endif %} | ||
| {% set use_podman = salt['pkg.version_cmp'](podman_version, '4.4.0') >= 0 %} | ||
|
|
||
| {% if use_podman %} | ||
| install_podman_for_grafana: | ||
| pkg.installed: | ||
| - name: podman | ||
|
|
||
| uninstall_grafana_package: | ||
| pkg.removed: | ||
| - name: grafana | ||
| {% endif %} | ||
|
|
||
| # setup and enable service | ||
| /etc/grafana/grafana.ini: | ||
| file.managed: | ||
| - source: salt://grafana/files/grafana.ini | ||
| - makedirs: True | ||
| - template: jinja | ||
|
|
||
|
|
||
| /etc/grafana/provisioning/datasources/datasources.yml: | ||
| file.managed: | ||
| - source: salt://grafana/files/datasources.yml | ||
|
|
@@ -136,6 +154,29 @@ grafana-sap-netweaver-dashboards: | |
| pkg.removed | ||
| {%- endif %} | ||
|
|
||
| {% if use_podman %} | ||
| grafana-container: | ||
| file.managed: | ||
| - names: | ||
| - /etc/containers/systemd/grafana.container: | ||
| - source: salt://grafana/files/containers/grafana.container | ||
| - /etc/containers/systemd/grafana.volume: | ||
| - source: salt://grafana/files/containers/grafana.volume | ||
| - user: root | ||
| - group: root | ||
| - mode: 644 | ||
| module.run: | ||
| - name: service.systemctl_reload | ||
| service.running: | ||
| - name: grafana | ||
| - enable: true | ||
| - watch: | ||
| - file: /etc/containers/systemd/grafana.* | ||
| - file: /etc/grafana/provisioning/datasources/datasources.yml | ||
|
Comment on lines +168 to +175
Member
The service state/execution module calls won't work on SLE Micro.
Contributor
Author
What are the limitations here? I read in the documentation that Podman integrates with systemd on SLE Micro.
Member
We have a card to enable SUMA to avoid the Just to be clear, systemd and podman work together. Controlling that with Salt won't work when targeting transactional systems.
Contributor
Author
Thanks. That's a good point. |
||
| - file: /etc/grafana/provisioning/dashboards/* | ||
| - file: /etc/grafana/grafana.ini | ||
|
|
||
| {% else %} | ||
| grafana-server: | ||
| pkg.installed: | ||
| - names: | ||
|
|
@@ -146,11 +187,26 @@ grafana-server: | |
| - file: /etc/grafana/provisioning/datasources/datasources.yml | ||
| - file: /etc/grafana/provisioning/dashboards/* | ||
| - file: /etc/grafana/grafana.ini | ||
| {% endif %} | ||
|
|
||
| {%- else %} | ||
| # disable service | ||
| {% if use_podman %} | ||
| grafana-container: | ||
| service.dead: | ||
| - name: grafana | ||
| - enable: false | ||
| file.absent: | ||
| - names: | ||
| - /etc/containers/systemd/grafana.container | ||
| - /etc/containers/systemd/grafana.volume | ||
| module.run: | ||
| - name: service.systemctl_reload | ||
|
Comment on lines +196 to +204
Member
The service state/execution module calls won't work on SLE Micro. |
||
|
|
||
| {% else %} | ||
| grafana-server: | ||
| service.dead: | ||
| - enable: False | ||
| {% endif %} | ||
| {%- endif %} | ||
| {%- endif %} | ||
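
Regarding the review comments above that the service state and execution module calls won't work on transactional systems such as SLE Micro: one possible way to act on that is to guard the service handling so it is skipped on such minions. The following is only a minimal sketch, not part of this PR. It assumes the minion's Salt version exposes a `transactional` grain, reuses the `use_podman` variable set earlier in this state file, and uses a hypothetical state ID `grafana-container-service`.

```jinja
{#- Sketch only. Assumptions: the 'transactional' grain is available on the
    minion, and 'use_podman' is set as in the diff above.
    'grafana-container-service' is a hypothetical state ID. -#}
{%- if use_podman and not grains.get('transactional', False) %}
grafana-container-service:
  service.running:
    - name: grafana
    - watch:
      - file: /etc/containers/systemd/grafana.*
      - file: /etc/grafana/grafana.ini
{%- endif %}
```

On transactional minions this simply leaves the unit files in place without touching the service, so starting the container would still need to be handled some other way (for example after a reboot into the new snapshot).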
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,126 @@ | ||
| # Sample configuration. | ||
| # See https://prometheus.io/docs/alerting/configuration/ for documentation. | ||
|
|
||
| global: | ||
| # The smarthost and SMTP sender used for mail notifications. | ||
| smtp_smarthost: "localhost:25" | ||
| smtp_from: "alertmanager@example.org" | ||
|
|
||
| # The root route on which each incoming alert enters. | ||
| route: | ||
| # The root route must not have any matchers as it is the entry point for | ||
| # all alerts. It needs to have a receiver configured so alerts that do not | ||
| # match any of the sub-routes are sent to someone. | ||
| receiver: "team-X-mails" | ||
|
|
||
| # The labels by which incoming alerts are grouped together. For example, | ||
| # multiple alerts coming in for cluster=A and alertname=LatencyHigh would | ||
| # be batched into a single group. | ||
| # | ||
| # To aggregate by all possible labels use '...' as the sole label name. | ||
| # This effectively disables aggregation entirely, passing through all | ||
| # alerts as-is. This is unlikely to be what you want, unless you have | ||
| # a very low alert volume or your upstream notification system performs | ||
| # its own grouping. Example: group_by: [...] | ||
| group_by: ["alertname", "cluster"] | ||
|
|
||
| # When a new group of alerts is created by an incoming alert, wait at | ||
| # least 'group_wait' to send the initial notification. | ||
| # This ensures that multiple alerts for the same group that start | ||
| # firing shortly after one another are batched together in the first | ||
| # notification. | ||
| group_wait: 30s | ||
|
|
||
| # When the first notification was sent, wait 'group_interval' to send a batch | ||
| # of new alerts that started firing for that group. | ||
| group_interval: 5m | ||
|
|
||
| # If an alert has successfully been sent, wait 'repeat_interval' to | ||
| # resend it. | ||
| repeat_interval: 3h | ||
|
|
||
| # All the above attributes are inherited by all child routes and can | ||
| # be overwritten on each. | ||
|
|
||
| # The child route trees. | ||
| routes: | ||
| # This route performs a regular expression match on alert labels to | ||
| # catch alerts that are related to a list of services. | ||
| - match_re: | ||
| service: ^(foo1|foo2|baz)$ | ||
| receiver: team-X-mails | ||
|
|
||
| # The service has a sub-route for critical alerts, any alerts | ||
| # that do not match, i.e. severity != critical, fall-back to the | ||
| # parent node and are sent to 'team-X-mails' | ||
| routes: | ||
| - match: | ||
| severity: critical | ||
| receiver: team-X-pager | ||
|
|
||
| - match: | ||
| service: files | ||
| receiver: team-Y-mails | ||
|
|
||
| routes: | ||
| - match: | ||
| severity: critical | ||
| receiver: team-Y-pager | ||
|
|
||
| # This route handles all alerts coming from a database service. If there's | ||
| # no team to handle it, it defaults to the DB team. | ||
| - match: | ||
| service: database | ||
|
|
||
| receiver: team-DB-pager | ||
| # Also group alerts by affected database. | ||
| group_by: [alertname, cluster, database] | ||
|
|
||
| routes: | ||
| - match: | ||
| owner: team-X | ||
| receiver: team-X-pager | ||
|
|
||
| - match: | ||
| owner: team-Y | ||
| receiver: team-Y-pager | ||
|
|
||
| # Inhibition rules allow muting a set of alerts given that another alert is | ||
| # firing. | ||
| # We use this to mute any warning-level notifications if the same alert is | ||
| # already critical. | ||
| inhibit_rules: | ||
| - source_match: | ||
| severity: "critical" | ||
| target_match: | ||
| severity: "warning" | ||
| # Apply inhibition if the alertname is the same. | ||
| # CAUTION: | ||
| # If all label names listed in `equal` are missing | ||
| # from both the source and target alerts, | ||
| # the inhibition rule will apply! | ||
| equal: ["alertname"] | ||
|
|
||
| receivers: | ||
| - name: "team-X-mails" | ||
| email_configs: | ||
| - to: "team-X+alerts@example.org, team-Y+alerts@example.org" | ||
|
|
||
| - name: "team-X-pager" | ||
| email_configs: | ||
| - to: "team-X+alerts-critical@example.org" | ||
| pagerduty_configs: | ||
| - routing_key: <team-X-key> | ||
|
|
||
| - name: "team-Y-mails" | ||
| email_configs: | ||
| - to: "team-Y+alerts@example.org" | ||
|
|
||
| - name: "team-Y-pager" | ||
| pagerduty_configs: | ||
| - routing_key: <team-Y-key> | ||
|
|
||
| - name: "team-DB-pager" | ||
| pagerduty_configs: | ||
| - routing_key: <team-DB-key> | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| modules: | ||
| http_2xx: | ||
| prober: http | ||
| http: | ||
| preferred_ip_protocol: "ip4" | ||
| http_post_2xx: | ||
| prober: http | ||
| http: | ||
| method: POST | ||
| tcp_connect: | ||
| prober: tcp | ||
| pop3s_banner: | ||
| prober: tcp | ||
| tcp: | ||
| query_response: | ||
| - expect: "^+OK" | ||
| tls: true | ||
| tls_config: | ||
| insecure_skip_verify: false | ||
| grpc: | ||
| prober: grpc | ||
| grpc: | ||
| tls: true | ||
| preferred_ip_protocol: "ip4" | ||
| grpc_plain: | ||
| prober: grpc | ||
| grpc: | ||
| tls: false | ||
| service: "service1" | ||
| ssh_banner: | ||
| prober: tcp | ||
| tcp: | ||
| query_response: | ||
| - expect: "^SSH-2.0-" | ||
| - send: "SSH-2.0-blackbox-ssh-check" | ||
| irc_banner: | ||
| prober: tcp | ||
| tcp: | ||
| query_response: | ||
| - send: "NICK prober" | ||
| - send: "USER prober prober prober :prober" | ||
| - expect: "PING :([^ ]+)" | ||
| send: "PONG ${1}" | ||
| - expect: "^:[^ ]+ 001" | ||
| icmp: | ||
| prober: icmp | ||
| icmp_ttl5: | ||
| prober: icmp | ||
| timeout: 5s | ||
| icmp: | ||
| ttl: 5 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| {%- set tls_enabled = salt['pillar.get']('prometheus:tls:enabled', False) %} | ||
| {% set config = salt['pillar.get']('prometheus:alerting:alertmanager_config') %} | ||
| {%- set entrypoint = ['/usr/bin/prometheus-alertmanager'] %} | ||
| {%- if config %} | ||
| {% do entrypoint.append('--config.file=' ~ config) %} | ||
| {% endif -%} | ||
| {%- if tls_enabled %} | ||
| {% do entrypoint.append('--web.config.file=' ~ web_config_file) %} | ||
| {%- endif -%} | ||
|
|
||
| [Unit] | ||
| Description=Alertmanager Container | ||
|
|
||
| [Container] | ||
| Label=app=alertmanager | ||
| ContainerName=alertmanager | ||
| Image=registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/alertmanager:0.26.0 | ||
|
Contributor
Do you want images from the openSUSE registry for SUMA as well?
Contributor
Author
That is the point I wanted to discuss. For now I have just hard-coded them here.
Contributor
Author
I checked, and the monitoring images are not being published to registry.suse.com. Please let me know if we want to have them there for our purposes.
Member
From my point of view, we cannot have openSUSE images delivered for SUMA. We have a special SKU, so we must have SUSE-delivered images through registry.suse.com, with special access control matching the needed SKU.
Contributor
Correct! We must use the images through registry.suse.com |
||
| Volume=/etc/prometheus:/etc/prometheus:ro | ||
|
Member
Same question as before: the config will be set on the host OS and mounted read-only into the container, right? |
||
| Volume=alertmanager.volume:/var/lib/prometheus/alertmanager | ||
| {% if entrypoint|length > 1 -%} | ||
| PodmanArgs=--entrypoint '{{ entrypoint|tojson }}' | ||
| {%- endif %} | ||
| PublishPort=9093:9093 | ||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target default.target | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| [Unit] | ||
| Description=Alertmanager Container Volume | ||
|
|
||
| [Volume] | ||
| Label=app=alertmanager |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| {%- set tls_enabled = salt['pillar.get']('prometheus:tls:enabled', False) %} | ||
| {%- set args = salt['pillar.get']('prometheus:blackbox_exporter:args').split(' ') %} | ||
| {%- set entrypoint = ['/usr/bin/blackbox_exporter'] %} | ||
| {%- if args %} | ||
| {%- do entrypoint.extend(args) %} | ||
| {%- endif -%} | ||
| {%- if tls_enabled %} | ||
| {%- do entrypoint.append('--web.config.file=' ~ web_config_file) %} | ||
| {%- endif -%} | ||
|
|
||
| [Unit] | ||
| Description=Blackbox Exporter Container | ||
|
|
||
| [Container] | ||
| Label=app=blackbox_exporter | ||
| ContainerName=blackbox_exporter | ||
| Image=registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/blackbox_exporter:0.24.0 | ||
|
Member
Same question about the image coming from registry.suse.com. |
||
| Volume=/etc/prometheus:/etc/prometheus:ro | ||
| {% if entrypoint|length > 1 -%} | ||
| PodmanArgs=--entrypoint '{{ entrypoint|tojson }}' | ||
| {%- endif %} | ||
| PublishPort=9115:9115 | ||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target default.target | ||
As with the other images, this needs to be configurable, and for SUSE Manager it needs to come from registry.suse.com with proper authentication.
ack
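
To illustrate the "configurable image" request from the comments above, here is a minimal sketch of how the Blackbox Exporter unit template could read the image reference from pillar instead of hard-coding it. This is an assumption about one possible shape, not what the PR implements: the pillar key `prometheus:blackbox_exporter:image` is hypothetical, and the fallback value is simply the image currently hard-coded in the diff. Authentication against registry.suse.com is not covered here.

```jinja
{#- Sketch only. 'prometheus:blackbox_exporter:image' is a hypothetical
    pillar key; the default is the image hard-coded in this PR. -#}
{%- set default_image = 'registry.opensuse.org/devel/bci/sle-15-sp6/containerfile/suse/blackbox_exporter:0.24.0' %}
{%- set image = salt['pillar.get']('prometheus:blackbox_exporter:image', default_image) %}

[Container]
Label=app=blackbox_exporter
ContainerName=blackbox_exporter
Image={{ image }}
```

With something like this, a SUSE Manager deployment could point the pillar value at an image on registry.suse.com, while other setups keep the default. The same pattern would apply to the Grafana and Alertmanager units.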