prometheus: Add Prometheus exporter and example Grafana dashboard #996
Draft
sbates130272 wants to merge 1 commit into main
- Add prometheus/lemonade-exporter.py: Prometheus exporter that scrapes Lemonade Server metrics from /api/v1/stats and /api/v1/health endpoints
- Add prometheus/grafana/grafana-dashboard.json: Example Grafana dashboard for monitoring CPU, GPU, and Lemonade Server metrics
- Add prometheus/lemonade-exporter.service: systemd service file for running the lemonade exporter as a systemd service
- Add README.md: Documentation for exporter setup and usage
- Update llamacpp_server.cpp: Add --metrics flag to enable a llama.cpp Prometheus metrics endpoint for backend monitoring
- Add decode_token_times tracking to streaming_proxy for accurate per-token metrics instead of estimating from average TPS

The exporter exposes metrics for:
- Server status and availability
- Token generation performance (tokens/sec, TTFT)
- Per-token decode timing (histogram)
- Model loading and management
- llama.cpp backend metrics (throughput, requests, decode calls)
- Cache hit rates and token processing rates

The Grafana dashboard visualizes:
- GPU utilization, VRAM, temperature, power consumption
- CPU load, memory, disk and network I/O
- Lemonade Server performance metrics
- Backend-specific metrics per model
- GPU kernel dispatch rate and workload patterns

This provides comprehensive monitoring of the Lemonade Server with an example of potential Grafana dashboard visualization.

Signed-off-by: Stephen Bates <[email protected]>
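For readers unfamiliar with how per-token decode times become a Prometheus histogram, here is a minimal, standard-library-only sketch of the idea. It is not the code in prometheus/lemonade-exporter.py. The JSON field names in `SAMPLE_STATS`, the metric names, and the bucket bounds are all assumptions made for illustration, since the /api/v1/stats payload is not shown here. The sketch only renders the Prometheus text exposition format from an already-fetched stats dictionary.

```python
import bisect

# Hypothetical /api/v1/stats payload: these field names are assumptions,
# not taken from the Lemonade Server API.
SAMPLE_STATS = {
    "tokens_per_second": 42.5,
    "time_to_first_token": 0.18,
    "decode_token_times": [0.021, 0.024, 0.019, 0.055, 0.023],
}

# Assumed histogram bucket upper bounds, in seconds.
BUCKETS = (0.01, 0.025, 0.05, 0.1)


def render_gauge(name, help_text, value):
    """Render one gauge in the Prometheus text exposition format."""
    return [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
        f"{name} {value}",
    ]


def render_histogram(name, help_text, samples, buckets=BUCKETS):
    """Render a histogram with cumulative `le` buckets from raw samples."""
    # counts[i] holds samples in bucket i; the last slot is the +Inf overflow.
    counts = [0] * (len(buckets) + 1)
    for sample in samples:
        # bisect_left finds the first bound >= sample, matching `le` semantics.
        counts[bisect.bisect_left(buckets, sample)] += 1

    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    running = 0
    for bound, count in zip(buckets, counts):
        running += count  # Prometheus buckets are cumulative
        lines.append(f'{name}_bucket{{le="{bound}"}} {running}')
    running += counts[-1]
    lines.append(f'{name}_bucket{{le="+Inf"}} {running}')
    lines.append(f"{name}_sum {sum(samples)}")
    lines.append(f"{name}_count {len(samples)}")
    return lines


def render_metrics(stats):
    """Turn a stats dict into a scrapeable text body (metric names are made up)."""
    lines = []
    lines += render_gauge(
        "lemonade_tokens_per_second",
        "Token generation rate of the last request.",
        stats["tokens_per_second"],
    )
    lines += render_gauge(
        "lemonade_time_to_first_token_seconds",
        "Time to first token of the last request.",
        stats["time_to_first_token"],
    )
    lines += render_histogram(
        "lemonade_decode_token_seconds",
        "Per-token decode time.",
        stats["decode_token_times"],
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(SAMPLE_STATS), end="")
```

Recording each token's decode time individually lets the histogram show latency outliers (the single 0.055 s sample above lands in its own bucket), which a value derived from average TPS would smooth away.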