feat(analytics): add read-only analytics API endpoints (#238) #243
tinu-hareesswar wants to merge 1 commit into main
Conversation
Add GET endpoints under /analytics/ that expose operational metrics from Prometheus counters for gateway scoring, decisions, feedbacks, and routing stats — giving operators a single API surface to monitor engine behavior without querying Prometheus/Redis directly.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Adds a new /analytics sub-router exposing read-only operational analytics endpoints backed by in-process Prometheus counters, intended for real-time monitoring without DB/Redis access.
Changes:
- Introduces `src/routes/analytics.rs` implementing 4 new `GET` endpoints (`gateway-scores`, `decisions`, `feedbacks`, `routing-stats`).
- Exposes the analytics module via `src/routes.rs`.
- Nests the analytics router under `/analytics` in `src/app.rs`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/routes/analytics.rs | Adds analytics router, request/response types, metric-gathering helpers, and handlers. |
| src/routes.rs | Exports the new analytics routes module. |
| src/app.rs | Wires the analytics router into the main app at /analytics. |
```rust
fn collect_status_counts() -> HashMap<String, HashMap<String, u64>> {
    let mut counts: HashMap<String, HashMap<String, u64>> = HashMap::new();
    let metric_families = prometheus::gather();
    for mf in &metric_families {
        if mf.get_name() == "api_requests_by_status" {
            for m in mf.get_metric() {
```
Same as collect_total_counts: this function calls prometheus::gather() again, so most analytics handlers gather the entire Prometheus registry twice per request. Refactor to share a single gather result across both totals + status extraction to reduce overhead on the hot path.
```rust
let decision_endpoints: Vec<&str> = match params.group_by.as_deref() {
    Some("gateway") => vec!["decide_gateway", "decision_gateway"],
    Some("approach") => vec!["decide_gateway"],
    _ => totals.keys().map(|k| k.as_str()).collect(),
};
```
group_by is described as a grouping control, but here it only switches between hard-coded endpoint lists and the response is still per-endpoint (no grouping by gateway/approach is actually performed). This also makes group_by=approach misleading since no "approach" label is available from these metrics. Consider rejecting unsupported group_by values with a 400, or implementing real grouping semantics that match the documented API.
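The 400-rejection option could look like the sketch below. `validate_group_by` and `SUPPORTED_GROUP_BY` are hypothetical names, and the accepted values mirror the documented API (gateway/approach):

```rust
/// Hypothetical validator for the `group_by` query param.
const SUPPORTED_GROUP_BY: [&str; 2] = ["gateway", "approach"];

fn validate_group_by(raw: Option<&str>) -> Result<Option<String>, String> {
    match raw {
        // Absent param is fine: no grouping requested.
        None => Ok(None),
        // Known value: pass it through to the handler.
        Some(v) if SUPPORTED_GROUP_BY.contains(&v) => Ok(Some(v.to_string())),
        // Unknown value: reject instead of silently falling through.
        Some(v) => Err(format!(
            "unsupported group_by '{}'; expected one of {:?}",
            v, SUPPORTED_GROUP_BY
        )),
    }
}

// In the handler, the Err branch would map to an HTTP 400, e.g. axum's
//   Err(msg) => return Err((StatusCode::BAD_REQUEST, msg)),
// before any metrics are gathered.
```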
```rust
pub async fn routing_stats(
    Query(_params): Query<RoutingStatsParams>,
) -> Json<RoutingStatsResponse> {
```
RoutingStatsParams includes a range query param but the handler ignores it entirely. Either apply the range to the returned stats (e.g., windowed rates) or remove the param to avoid silently ignoring client input.
Suggested change:

```diff
-pub async fn routing_stats(
-    Query(_params): Query<RoutingStatsParams>,
-) -> Json<RoutingStatsResponse> {
+pub async fn routing_stats() -> Json<RoutingStatsResponse> {
```
```rust
pub struct TimeRangeParams {
    /// Time range: 15m, 1h, 6h, 24h, 7d
    pub range: Option<String>,
    /// Bucket granularity: 10s, 1m, 5m, 1h
    pub granularity: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct GatewayScoreParams {
    pub merchant: Option<String>,
    pub pmt: Option<String>,
    pub gateway: Option<String>,
    #[serde(flatten)]
    pub time: TimeRangeParams,
}

#[derive(Debug, Deserialize)]
pub struct DecisionParams {
    pub group_by: Option<String>,
    #[serde(flatten)]
    pub time: TimeRangeParams,
```
range and granularity are defined as query params but are not used by any handler in this module. This makes the API contract misleading (clients may assume time-windowed/bucketed results). Either implement time-range bucketing/filtering, or remove these params until supported.
Suggested change:

```diff
-pub struct TimeRangeParams {
-    /// Time range: 15m, 1h, 6h, 24h, 7d
-    pub range: Option<String>,
-    /// Bucket granularity: 10s, 1m, 5m, 1h
-    pub granularity: Option<String>,
-}
 #[derive(Debug, Deserialize)]
 pub struct GatewayScoreParams {
     pub merchant: Option<String>,
     pub pmt: Option<String>,
     pub gateway: Option<String>,
-    #[serde(flatten)]
-    pub time: TimeRangeParams,
 }
 #[derive(Debug, Deserialize)]
 pub struct DecisionParams {
     pub group_by: Option<String>,
-    #[serde(flatten)]
-    pub time: TimeRangeParams,
```
```rust
pub merchant: Option<String>,
pub pmt: Option<String>,
pub gateway: Option<String>,
```
merchant and pmt query params are accepted but never used for filtering, and gateway filtering is implemented as a substring match against the endpoint label. Consider either implementing merchant/pmt filtering or removing those params, and renaming the gateway filter to endpoint (or making it an exact match) to avoid implying it filters by PSP/gateway name.
Suggested change:

```diff
-    pub merchant: Option<String>,
-    pub pmt: Option<String>,
-    pub gateway: Option<String>,
+    /// Filters by endpoint label.
+    /// Accept `gateway` as a backward-compatible alias for older clients.
+    #[serde(alias = "gateway")]
+    pub endpoint: Option<String>,
```
```rust
/// Collect per-endpoint totals from `API_REQUEST_TOTAL_COUNTER`.
fn collect_total_counts() -> HashMap<String, u64> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    let metric_families = prometheus::gather();
    for mf in &metric_families {
```
collect_total_counts calls prometheus::gather(). Since handlers also call collect_status_counts (which gathers again), a single request typically gathers all metrics twice. Consider gathering once per request and passing the gathered MetricFamily list into both parsing functions (or a single function that extracts both totals and status counts).
Suggested change:

```diff
-/// Collect per-endpoint totals from `API_REQUEST_TOTAL_COUNTER`.
-fn collect_total_counts() -> HashMap<String, u64> {
-    let mut totals: HashMap<String, u64> = HashMap::new();
-    let metric_families = prometheus::gather();
-    for mf in &metric_families {
+/// Collect per-endpoint totals from an already-gathered Prometheus snapshot.
+fn collect_total_counts(
+    metric_families: &[prometheus::proto::MetricFamily],
+) -> HashMap<String, u64> {
+    let mut totals: HashMap<String, u64> = HashMap::new();
+    for mf in metric_families {
```
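Concretely, the gather-once pattern both review comments ask for could look like the sketch below. To keep it self-contained it uses a simplified stand-in type instead of `prometheus::proto::MetricFamily`, and the family/label names (`api_requests_total`, `endpoint`, `status`) are assumptions based on the metrics mentioned in the diff:

```rust
use std::collections::HashMap;

// Simplified stand-in for a gathered metric family: (family name, samples),
// where each sample is (label map, counter value). The real code would pass
// the slice returned by a single `prometheus::gather()` call instead.
type Family = (String, Vec<(HashMap<String, String>, u64)>);

// Per-endpoint request totals from the (assumed) api_requests_total family.
fn collect_total_counts(families: &[Family]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for (name, samples) in families {
        if name == "api_requests_total" {
            for (labels, value) in samples {
                if let Some(ep) = labels.get("endpoint") {
                    *totals.entry(ep.clone()).or_insert(0) += value;
                }
            }
        }
    }
    totals
}

// Per-endpoint, per-status counts from api_requests_by_status.
fn collect_status_counts(families: &[Family]) -> HashMap<String, HashMap<String, u64>> {
    let mut counts: HashMap<String, HashMap<String, u64>> = HashMap::new();
    for (name, samples) in families {
        if name == "api_requests_by_status" {
            for (labels, value) in samples {
                if let (Some(ep), Some(st)) = (labels.get("endpoint"), labels.get("status")) {
                    *counts.entry(ep.clone()).or_default().entry(st.clone()).or_insert(0) += value;
                }
            }
        }
    }
    counts
}

// A handler would then gather once and feed the same snapshot to both:
//   let families = snapshot();                    // one gather per request
//   let totals   = collect_total_counts(&families);
//   let statuses = collect_status_counts(&families);
```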
Summary

GET endpoints under `/analytics/` for real-time operational monitoring:

- `/analytics/gateway-scores` — per-endpoint success rates and request counts from Prometheus counters
- `/analytics/decisions` — decision throughput with optional `group_by` filtering (gateway/approach)
- `/analytics/feedbacks` — feedback ingestion stats (`update_score`, `update_gateway_score`)
- `/analytics/routing-stats` — per-endpoint error rates and request volume

Adds `src/routes/analytics.rs` with a sub-router wired via `.nest("/analytics", ...)`, following the same pattern as `/health`.

Closes #238
Test plan

- `cargo check` passes (verified locally with `--features postgres`)
- `curl http://localhost:8080/analytics/gateway-scores`
- `curl "http://localhost:8080/analytics/gateway-scores?gateway=decide"`

🤖 Generated with Claude Code