
Add prometheus metrics#1426

Open: wcawijngaards wants to merge 25 commits into master from add-prometheus-metrics

wcawijngaards (Member) commented Mar 27, 2026

This change adds prometheus metrics support.

It is enabled with metrics-enable: yes in the unbound.conf config file. It can have settings with metrics-interface: 127.0.0.1 and metrics-port: 9100 and metrics-path: "/metrics".
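As a rough sketch of how these three settings combine, the snippet below builds the scrape URL from the values quoted above and fetches it with Python's standard library. It assumes the endpoint is served over plain HTTP on that interface and port. The helper names `metrics_url` and `fetch_metrics` are made up for illustration and are not part of Unbound.

```python
from urllib.request import urlopen


def metrics_url(interface="127.0.0.1", port=9100, path="/metrics"):
    """Build the scrape URL from metrics-interface, metrics-port and metrics-path."""
    # Assumption: plain HTTP and an IPv4 address or hostname as the interface.
    return f"http://{interface}:{port}{path}"


def fetch_metrics(url, timeout=5.0):
    """Return the metrics page body as text. Needs a running server to succeed."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a server running and metrics-enable: yes set, `fetch_metrics(metrics_url())` would return the text that a Prometheus scraper sees.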

The settings statistics-cumulative: no and extended-statistics: yes, can make more out of the metrics printout. It can also work with cumulative enabled, but the graphs would go up. It prints the same output style as the contrib/metrics.awk script does, and it prints more values.

A unit test is also added that checks that the output works. The metrics support needs libevent 2 or later. The sockets that serve the metrics are put on the same event base that the remote control uses, which is the event base of the first worker.

Fixes #352 (feature request for prometheus metrics support).

  daemon/metrics.c and daemon/metrics.h for statistics in prometheus metrics.
…e disabled

  there is no http service created.
  add documentation in man page, and in the example config file.
wcawijngaards self-assigned this Mar 27, 2026
wcawijngaards requested a review from gthess March 27, 2026 12:39
edmonds (Contributor) commented Apr 1, 2026

> The settings statistics-cumulative: no and extended-statistics: yes, can make more out of the metrics printout. It can also work with cumulative enabled, but the graphs would go up.

It sounds a bit odd to me that a server would expose Prometheus metrics but there would be a configuration option that resets the metrics (or not). That would be a weird thing to do in the Prometheus ecosystem, e.g. sometimes you have multiple metrics scrapers running simultaneously for a high availability setup, or you hit the /metrics endpoint manually with curl at the same time that a scraper is running. So you don't want to reset the metrics after a fetch because that would be inherently racy.
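The race described here can be illustrated with a toy simulation. This is only a sketch: `ResettingServer` is a hypothetical stand-in for any server that zeroes its counters on each fetch, not Unbound's code. Two scrapers alternate reads, and each one ends up seeing only part of the traffic.

```python
class ResettingServer:
    """Toy server whose query counter is zeroed on every read."""

    def __init__(self):
        self.queries = 0

    def handle_query(self):
        self.queries += 1

    def scrape(self):
        # Return the current count and reset it, as a reset-on-fetch endpoint would.
        value, self.queries = self.queries, 0
        return value


def simulate(ticks=6, queries_per_tick=10):
    """Two scrapers alternate reads; return the totals each one observed."""
    server = ResettingServer()
    seen_a = seen_b = 0
    for tick in range(ticks):
        for _ in range(queries_per_tick):
            server.handle_query()
        if tick % 2 == 0:
            seen_a += server.scrape()  # scraper A reads on even ticks
        else:
            seen_b += server.scrape()  # scraper B reads on odd ticks
    return seen_a, seen_b


print(simulate())  # prints (30, 30), although 60 queries were handled
```

Neither scraper can reconstruct the true total on its own, which is the problem with resetting on fetch. With a counter that is never reset, both scrapers would read the same increasing value.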

Looking at the source in this PR it looks like every metric is a gauge, except for unbound_time_up_seconds_total which is a counter. In the Prometheus data model (https://prometheus.io/docs/concepts/metric_types/):

> A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors. [...] Do not use a counter to expose a value that can decrease. [...]
>
> A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. [...]
> Gauges are typically used for measured values like temperatures or current memory usage, but also "counts" that can go up and down, like the number of concurrent requests.

So metrics that count the number of events that occur, like DNS queries, should be counters, and metrics that count the size of data structures that can increase or decrease in size like the number of request list entries, should be gauges. There should not be an option to reset metrics to zero, and metrics that should be defined as counters should not be defined as gauges instead in order to support resetting them to zero.

I am not quite sure what you mean by "It can also work with cumulative enabled, but the graphs would go up." For something like a DNS server or an HTTP server that is counting the number of requests it has processed, you would typically want the server's Prometheus metrics endpoint to expose the total (i.e. cumulative) number of requests it has processed since startup, as a counter (and these counter values would necessarily increase monotonically over time), and then you would use a Prometheus function like rate or irate to calculate per-second rates based on that metric.
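To make the counter-plus-rate idea concrete, here is a simplified sketch of deriving a per-second rate from cumulative counter samples. It is not Prometheus's actual `rate` implementation, which also extrapolates to the edges of the query window. The function names and the sample values are illustrative assumptions.

```python
def counter_increase(samples):
    """Total increase over a list of (timestamp, value) counter samples.

    A drop in value is treated as a counter reset (for example a server
    restart), so counting resumes from zero at that sample.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase


def per_second_rate(samples):
    """Average per-second rate over the span covered by the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return counter_increase(samples) / elapsed


# Three scrapes, 15 seconds apart, of a cumulative query counter.
print(per_second_rate([(0, 100), (15, 250), (30, 400)]))  # prints 10.0
```

Because the rate is computed by the consumer from differences between samples, any number of scrapers can read the same cumulative counter without interfering with each other.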

wcawijngaards (Member, Author) commented

Thank you for looking at the statistics setup. Currently it is the same as the contrib/metrics.awk setup. But of course, with statistics-cumulative: yes, the output types could be set to counter. Would it be better to warn if statistics are not set to be cumulative, and then have the metric types as counter? Another, confusing option can be to expose type gauge when cumulative is disabled, and type counter when cumulative is enabled.

edmonds (Contributor) commented Apr 1, 2026

Looking at contrib/metrics.awk, it has the same problem of using gauges instead of counters for the metrics that are counting events, and recommending that the counters be reset, instead of fetching them with unbound-control stats_noreset (or setting statistics-cumulative: yes).

So contrib/metrics.awk could be fixed as well, I suppose, or it could be deprecated and removed in favor of the native HTTP export for Prometheus format metrics in this PR.

> Would it be better to warn if statistics are not set to be cumulative, and then have the metric types as counter?

Yes. If the statistics-cumulative: no setting is going to be retained (and if only a single set of server metrics is kept internally, then I guess it is not possible to keep both cumulative and non-cumulative metrics simultaneously), then I guess it would be better to have the metric types correctly defined (counter for the metrics that count events) and to warn if metrics-enable: yes is set but statistics-cumulative: yes is not.

Probably the safest thing to do would be to have the HTTP /metrics endpoint never reset the metrics (i.e. have it behave like unbound-control stats_noreset) and also generate a warning if metrics-enable: yes is set without statistics-cumulative: yes. That way if the warning is neglected by not setting statistics-cumulative: yes, the metrics will only be reset if the user intentionally runs unbound-control stats.
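The proposed check can be sketched as a small function over parsed config values. This is an illustration in Python, not the C code in this PR. The function name, the dict representation of unbound.conf, the warning wording, and the assumed default of "no" for both options are all assumptions made for the example.

```python
def metrics_config_warnings(config):
    """Return warning strings for a dict of option name -> value ("yes"/"no").

    Assumption: options that are absent from the dict default to "no".
    """
    warnings = []
    metrics_on = config.get("metrics-enable", "no") == "yes"
    cumulative = config.get("statistics-cumulative", "no") == "yes"
    if metrics_on and not cumulative:
        # Hypothetical message text, for illustration only.
        warnings.append(
            "metrics-enable: yes is set without statistics-cumulative: yes; "
            "counters may be reset by unbound-control stats"
        )
    return warnings
```

In this sketch the warning is advisory only. The endpoint itself would still never reset the statistics, so ignoring the warning only matters when unbound-control stats is run.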

> Another, confusing option can be to expose type gauge when cumulative is disabled, and type counter when cumulative is enabled.

That would probably be very confusing.

…f query

  metrics, and do not reset the stats. There is a warning when
  statistics-cumulative has the wrong value. But the stats are not reset
  from the metrics endpoint regardless. The contrib/metrics.awk script
  is also updated, and the documentation recommends the cumulative setting.
wcawijngaards (Member, Author) commented

Changed it: the stats are no longer reset when queried through the metrics interface, and the documentation suggests cumulative stats. The contrib version is fixed as well. The metrics are of type counter for the query count statistics, but not for the requestlist size and memory sizes.
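For readers unfamiliar with the Prometheus text format, the sketch below renders one counter and one gauge with their `# TYPE` lines. The metric names `example_queries_total` and `example_requestlist_entries`, the help texts, and the values are hypothetical and chosen only for illustration; the actual names emitted by this PR are not listed in this conversation apart from unbound_time_up_seconds_total.

```python
def render_metric(name, metric_type, value, help_text):
    """Render one metric in the Prometheus text exposition format."""
    if metric_type not in ("counter", "gauge"):
        raise ValueError("this sketch only handles counter and gauge")
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


# Hypothetical metrics: an event count is a counter, a current size is a gauge.
page = render_metric(
    "example_queries_total", "counter", 12345, "Queries received since startup."
) + render_metric(
    "example_requestlist_entries", "gauge", 7, "Current request list entries."
)
print(page, end="")
```

The type line is what tells Prometheus tooling that a value only increases, which is why query counts are declared as counter while sizes that can shrink stay as gauge.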
