
prometheus: Add Prometheus exporter and example Grafana dashboard#996

Draft
sbates130272 wants to merge 1 commit into main from batesste/prometheus-exporter

@sbates130272

  • Add prometheus/lemonade-exporter.py: a Prometheus exporter that scrapes Lemonade Server metrics from the /api/v1/stats and /api/v1/health endpoints
  • Add prometheus/grafana/grafana-dashboard.json: an example Grafana dashboard for monitoring CPU, GPU, and Lemonade Server metrics
  • Add prometheus/lemonade-exporter.service: a systemd unit file for running the Lemonade exporter as a service
  • Add README.md: documentation for exporter setup and usage
  • Update llamacpp_server.cpp: add the --metrics flag to enable llama.cpp's Prometheus metrics endpoint for backend monitoring
  • Add decode_token_times tracking to streaming_proxy so that per-token metrics are measured directly instead of being estimated from the average TPS (tokens per second)
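As a rough illustration of the exporter's job, the sketch below fetches the two Lemonade endpoints, converts the JSON into Prometheus text exposition format, and serves it on /metrics using only the Python standard library. It is not the code in lemonade-exporter.py: the Lemonade base URL, the exporter port, the metric names, and the JSON field names (`model_loaded`, `tokens_per_second`, `time_to_first_token`, with TTFT assumed to be in seconds) are all assumptions made for illustration.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Assumed Lemonade Server base URL; adjust for your deployment.
LEMONADE_URL = "http://localhost:8000"


def fetch_json(path: str, timeout: float = 2.0) -> Optional[dict]:
    """GET a Lemonade endpoint and decode its JSON body; None on any failure."""
    try:
        with urllib.request.urlopen(LEMONADE_URL + path, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError/timeouts are OSError subclasses; bad JSON raises ValueError.
        return None


def render_metrics(health: Optional[dict], stats: Optional[dict]) -> str:
    """Render health/stats dicts in Prometheus text format.

    The JSON field names read here are assumptions about the response shape.
    """
    lines = [
        "# HELP lemonade_up Whether the Lemonade health endpoint responded.",
        "# TYPE lemonade_up gauge",
        f"lemonade_up {1 if health is not None else 0}",
    ]
    if health is not None:
        lines += [
            "# HELP lemonade_model_loaded Whether a model is currently loaded.",
            "# TYPE lemonade_model_loaded gauge",
            f"lemonade_model_loaded {1 if health.get('model_loaded') else 0}",
        ]
    if stats:
        # Map assumed JSON keys to hypothetical metric names.
        for key, name in (("tokens_per_second", "lemonade_tokens_per_second"),
                          ("time_to_first_token", "lemonade_time_to_first_token_seconds")):
            value = stats.get(key)
            if isinstance(value, (int, float)):
                lines += [f"# TYPE {name} gauge", f"{name} {value}"]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Scrapes Lemonade on every GET /metrics and returns the rendered text."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_json("/api/v1/health"),
                              fetch_json("/api/v1/stats")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9877) -> None:
    """Run the exporter until interrupted (port 9877 is a hypothetical choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding the exporter's address as a Prometheus scrape target would complete the loop. Scraping Lemonade on each incoming request, rather than on a background timer, keeps the sketch small and makes the values exactly as fresh as the Prometheus scrape interval.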

The exporter exposes metrics for:

  • Server status and availability
  • Token generation performance (tokens/sec, TTFT)
  • Per-token decode timing (histogram)
  • Model loading and management
  • llama.cpp backend metrics (throughput, requests, decode calls)
  • Cache hit rates and token processing rates
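A per-token decode histogram needs individual decode times, because an average TPS value cannot show whether a few tokens were unusually slow. The sketch below shows one way a list of per-token decode times (for example, values like those recorded in decode_token_times) could be bucketed into a Prometheus-style cumulative histogram. The bucket boundaries, the metric name, and the assumption that times are in seconds are all hypothetical.

```python
import bisect
from typing import List, Sequence

# Hypothetical bucket upper bounds, in seconds.
BUCKETS: List[float] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0]


def histogram_lines(name: str, samples: Sequence[float],
                    buckets: Sequence[float] = BUCKETS) -> List[str]:
    """Render samples as a Prometheus histogram: cumulative _bucket, _sum, _count."""
    counts = [0] * len(buckets)
    for s in samples:
        # First bucket whose upper bound is >= s, matching Prometheus "le" semantics.
        i = bisect.bisect_left(buckets, s)
        if i < len(buckets):
            counts[i] += 1  # samples above the last bound only land in +Inf
    lines = [f"# TYPE {name} histogram"]
    cumulative = 0
    for bound, c in zip(buckets, counts):
        cumulative += c
        lines.append(f'{name}_bucket{{le="{bound}"}} {cumulative}')
    lines.append(f'{name}_bucket{{le="+Inf"}} {len(samples)}')
    lines.append(f"{name}_sum {sum(samples)}")
    lines.append(f"{name}_count {len(samples)}")
    return lines
```

Prometheus expects bucket counts to be counters that only grow, so a real exporter would add each request's decode times to running totals rather than re-rendering one request's list. A latency panel can then estimate percentiles from the buckets with PromQL's `histogram_quantile()`.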

The Grafana dashboard visualizes:

  • GPU utilization, VRAM, temperature, power consumption
  • CPU load, memory, disk and network I/O
  • Lemonade Server performance metrics
  • Backend-specific metrics per model
  • GPU kernel dispatch rate and workload patterns
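Throughput panels such as tokens/sec are usually drawn from monotonically increasing counters by taking a per-second rate in PromQL. llama-server started with --metrics exposes counters such as `llamacpp:tokens_predicted_total`. The sketch below is a simplified version of that rate calculation over (timestamp, value) samples. Like Prometheus, it treats any drop in value as a counter reset. Unlike PromQL's `rate()`, it omits extrapolation at the window edges. The sample values in the usage note are made up.

```python
from typing import List, Tuple


def counter_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    A decrease is treated as a reset (e.g. a server restart), so the post-reset
    value counts as the increase since the reset. No boundary extrapolation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed
```

For example, counter samples of 100, 400, and 700 predicted tokens taken 15 s apart give `counter_rate([(0, 100), (15, 400), (30, 700)])`, which is 20 tokens/sec.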

Together, these changes provide monitoring of Lemonade Server, its llama.cpp backend, and the host system, plus an example of how the metrics can be visualized in Grafana.

Signed-off-by: Stephen Bates <[email protected]>
sbates130272 self-assigned this Jan 30, 2026
sbates130272 added the enhancement (New feature or request) label Jan 30, 2026
