prometheus: Add Prometheus exporter and example Grafana dashboard by sbates130272 · Pull Request #996 · lemonade-sdk/lemonade

sbates130272 · 2026-01-30T22:57:32Z

Add prometheus/lemonade-exporter.py: Prometheus exporter that scrapes Lemonade Server metrics from /api/v1/stats and /api/v1/health endpoints
Add prometheus/grafana/grafana-dashboard.json: Example Grafana dashboard for monitoring CPU, GPU, and Lemonade Server metrics
Add prometheus/lemonade-exporter.service: systemd unit file for running the Lemonade exporter as a service
Add README.md: Documentation for exporter setup and usage
Update llamacpp_server.cpp: Add --metrics flag to enable a llama.cpp Prometheus metrics endpoint for backend monitoring
Add decode_token_times tracking to streaming_proxy for accurate per-token metrics instead of estimating from average TPS
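
To illustrate why recorded per-token times are more useful than a value derived from average TPS, here is a minimal sketch of how a list such as decode_token_times could be folded into Prometheus-style cumulative histogram buckets. The bucket boundaries and the function name `decode_histogram` are assumptions made for illustration; they are not taken from the exporter in this PR.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds; the real exporter may use others.
BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25)


def decode_histogram(decode_token_times, buckets=BUCKETS):
    """Return cumulative bucket counts, sum and count for per-token decode times."""
    counts = [0] * (len(buckets) + 1)  # the last slot is the +Inf bucket
    for t in decode_token_times:
        # bisect_left finds the first bound >= t, so "le" bounds are inclusive.
        counts[bisect_left(buckets, t)] += 1

    # Prometheus histogram buckets are cumulative: each one includes all smaller ones.
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(running)

    labels = [str(b) for b in buckets] + ["+Inf"]
    return {
        "buckets": dict(zip(labels, cumulative)),
        "sum": sum(decode_token_times),
        "count": len(decode_token_times),
    }
```

For example, `decode_histogram([0.004, 0.01, 0.02, 0.3])` places three samples at or below 25 ms and one only in `+Inf`. The mean of roughly 84 ms describes none of the four tokens, which is the kind of detail an average-TPS estimate would hide.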

The exporter exposes metrics for:

Server status and availability
Token generation performance (tokens/sec, TTFT)
Per-token decode timing (histogram)
Model loading and management
llama.cpp backend metrics (throughput, requests, decode calls)
Cache hit rates and token processing rates
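
As a rough sketch of the scrape-and-expose flow described above, the snippet below fetches JSON from the two endpoints and renders a few gauges in the Prometheus text exposition format using only the standard library. The metric names, the JSON field names (`status`, `tokens_per_second`, `time_to_first_token`), and the base URL used in the usage note are assumptions for illustration; the actual exporter in this PR may differ.

```python
import json
import urllib.request


def fetch_json(base_url, path, timeout=5.0):
    """Fetch and decode a JSON document from the server (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def render_metrics(health, stats):
    """Render a few gauges in Prometheus text exposition format."""
    # Field and metric names here are hypothetical, not taken from the PR.
    samples = [
        ("lemonade_up", "1 if the health endpoint reported status ok",
         1.0 if health.get("status") == "ok" else 0.0),
        ("lemonade_tokens_per_second", "Decode throughput of the last request",
         stats.get("tokens_per_second")),
        ("lemonade_time_to_first_token_seconds", "TTFT of the last request",
         stats.get("time_to_first_token")),
    ]
    lines = []
    for name, help_text, value in samples:
        if value is None:
            continue  # skip fields the server did not report
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"
```

Assuming a server reachable at a hypothetical `http://localhost:8000`, the two pieces would be combined as `render_metrics(fetch_json(base, "/api/v1/health"), fetch_json(base, "/api/v1/stats"))` and the result served on the exporter's own metrics endpoint.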

The Grafana dashboard visualizes:

GPU utilization, VRAM, temperature, power consumption
CPU load, memory, disk and network I/O
Lemonade Server performance metrics
Backend-specific metrics per model
GPU kernel dispatch rate and workload patterns

Together, these changes provide end-to-end monitoring of the Lemonade Server, along with an example Grafana dashboard for visualizing the collected metrics.

Signed-off-by: Stephen Bates <[email protected]>

sbates130272 requested review from amd-pworfolk and jeremyfowers January 30, 2026 22:57

sbates130272 self-assigned this Jan 30, 2026

sbates130272 added the enhancement New feature or request label Jan 30, 2026

sbates130272 wants to merge 1 commit into main from batesste/prometheus-exporter
