Logos: UI Enhancements and Fair lane Eviction Algo#499

Open
JakubJakubczyk wants to merge 15 commits into main from ui-integration

Conversation


@JakubJakubczyk commented Apr 8, 2026 (Contributor)

Logos UI Integration, ETTFT-Corrected Classification Scheduling, and Capacity Planner Hardening

Summary

This PR brings three major workstreams together:

  1. Logos UI remaster — a rebuilt /statistics page with per-worker GPU panels, live lane-health metrics, VRAM allocation pie, and paginated request history wired to the new v2 stats WebSocket.
  2. ETTFT-Corrected Classification Scheduling (ECCS) — a reworked ETTFT estimator with a range-scaled additive penalty formulation, VRAM-aware reclaim tiers, multi-provider candidate expansion, and decision logging for ablation studies.
  3. Capacity planner hardening — replaces forward VRAM reservations with a per-provider capacity lock; adds anti-thrashing (minimum tenure + adaptive cooldown + queue-depth fairness), a VRAM headroom gate, sleeping-lane reclaim, multi-lane drain, and an UNAVAILABLE fallback.

Also included: dynamic logosnode deployment migration, worker-node p50 latency plumbing into the SDI, large benchmark/ablation infrastructure, and new architecture docs.


1. UI integration (logos-ui)

New /statistics page replaces the previous layout with a component-driven architecture.

New components (logos-ui/components/statistics/)

  • worker-gpu-panel.tsx — per-worker GPU card with connection state, device info, VRAM bars, and loaded/sleeping/cold lane counts.
  • lane-metrics-panel.tsx — live lane-health rows with state dot, runtime mode (vLLM badge), KV-cache usage bar (green/amber/red thresholds), and p95 TTFT color coding.
  • lane-vram-pie.tsx / lane-vram-pie.web.tsx — Plotly VRAM allocation pie (web) with native fallback.
  • paginated-request-list.tsx — paginated request history with filtering.
  • types.ts — shared types for DeviceInfo, LaneSignalData, and related scheduler-signal payloads.
  • constants.ts — centralized lane-state color palette and API base.

app/statistics.tsx (rewrite)

  • 40 / 60 column split, combined GPU + VRAM card, lane-health section, compact summary-header cards.
  • Dark-mode theming fixes across the pie chart and new panels.
  • New skeletons for all new sections.

hooks/use-stats-websocket-v2.ts

  • Typed VramV2Sample / VramV2Provider payloads with scheduler_signals.provider, scheduler_signals.lanes, and devices[].
  • Tracks connection state, runtime modes, and last heartbeat per provider.

Backend stats fix

  • fix(stats): handle the renamed vLLM kv_cache metric (f8e3227f) so GPU cache usage continues to render after vLLM's metric rename.

2. ETTFT / ECCS — pipeline

ettft_estimator.py — full rewrite

Replaces the fixed-penalty tier table with a range-scaled additive correction:

corrected(m, p) = w(m) - penalty(m, p)
penalty        = min(E[wait] / H, 1.0) × S × α
  • H = 60 s normalization horizon, α = 1.5 correction strength.
  • S = weight span with a 20 % floor and absolute floor of 1.0, so close or negative weights still yield meaningful corrections.
  • E[wait] = state_overhead + reclaim_overhead + queue_wait, where queue wait uses observed per-request service time and the 3× vLLM effective-parallel multiplier.
  • Six VRAM-aware tiers: WARM, SLEEPING, SLEEPING_RECLAIM, COLD, COLD_RECLAIM, UNAVAILABLE. Reclaim overhead distinguishes idle eviction (3 s) from busy drain (~30 s).
  • Preserves same-state classification ordering (same state → same penalty).
  • Backward-compatible ettft_ms property for older callers.
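As a rough illustration, the range-scaled correction above can be sketched in Python. The helper names and the standalone shape are assumptions, not the estimator's actual API; only the formula and the constants (H = 60 s, α = 1.5, the 20 % and 1.0 span floors) come from the description above.

```python
# Sketch (assumed names) of the range-scaled additive correction:
#   corrected(m, p) = w(m) - min(E[wait]/H, 1.0) * S * alpha

H_SECONDS = 60.0   # normalization horizon H
ALPHA = 1.5        # correction strength alpha


def weight_span(weights: list[float]) -> float:
    """Span S with a 20% relative floor and an absolute floor of 1.0."""
    span = max(weights) - min(weights)
    relative_floor = 0.2 * max(abs(w) for w in weights)
    return max(span, relative_floor, 1.0)


def corrected_score(w_m: float, expected_wait_s: float, span: float) -> float:
    """Classification weight minus the wait penalty, capped at one horizon."""
    penalty = min(expected_wait_s / H_SECONDS, 1.0) * span * ALPHA
    return w_m - penalty


# Two candidates with the same weight but different expected waits:
S = weight_span([0.8, 0.8, 0.2])      # span 0.6 is below the floors, so S = 1.0
warm = corrected_score(0.8, 0.0, S)   # no expected wait, no penalty
cold = corrected_score(0.8, 30.0, S)  # half-horizon wait costs 0.5 * S * alpha
```

Because the penalty is capped at one horizon, a candidate loses at most S × α regardless of how large E[wait] grows, and candidates in the same tier (same E[wait]) keep identical penalties, which is what preserves same-state classification ordering.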

correcting_scheduler.py

  • All schedulers (FCFS / utilization / correcting) now enumerate every (model_id, provider_id) deployment instead of next()-matching the first. Each pair is an independent scored candidate.
  • Queue fallback prefers logosnode deployments.
  • Lane-readiness gating: skips candidates whose lanes are not in a ready state.
  • ECCS decision logging — JSON-lines output (env ECCS_DECISION_LOG) with candidates, classification weights, corrected scores, tiers, selection, and a correction_changed flag for ablation.
  • Weight override — env ECCS_WEIGHT_OVERRIDE injects known weights for controlled experiments.
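The candidate-expansion change can be sketched as follows. Candidate, rank_candidates, and the score/readiness callables are hypothetical stand-ins for the scheduler's internals; the point is that every (model_id, provider_id) pair is scored independently and lane-readiness gating filters before ranking.

```python
# Illustrative sketch: every (model_id, provider_id) deployment becomes an
# independently scored candidate instead of next()-matching the first.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Candidate:
    model_id: int
    provider_id: int
    score: float


def rank_candidates(
    deployments: Iterable[tuple[int, int]],
    score_fn: Callable[[int, int], float],
    lane_ready_fn: Callable[[int, int], bool],
) -> list[Candidate]:
    """Score every ready deployment pair and sort best-first."""
    candidates = [
        Candidate(m, p, score_fn(m, p))
        for (m, p) in deployments
        if lane_ready_fn(m, p)  # lane-readiness gating
    ]
    candidates.sort(key=lambda c: c.score, reverse=True)
    return candidates


pairs = [(1, 10), (1, 11), (2, 10)]
ranked = rank_candidates(pairs, lambda m, p: m + p / 100, lambda m, p: p != 11)
# (1, 11) is gated out; the remaining pairs compete on score
```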

Supporting changes

  • sdi/logosnode_facade.py: new get_parallel_capacity, get_model_name, get_scheduler_queue_depth_by_model_name, is_model_lane_ready.
  • sdi/models.py, sdi/providers/logosnode_provider.py, logos/logosnode_registry.py, logos_worker_node/vllm_process.py: p50 latency propagated end-to-end into LaneSchedulerSignals and into ettft_estimator as observed service time.
  • capacity/capacity_planner.py: lane-ready signal fed to ETTFT.

3. Capacity planner — big fixes

Chronological fixes to src/logos/capacity/capacity_planner.py (1,562 net lines changed):

  • cf4c51c0 (thrashing under dual-model GPU contention): min-tenure, adaptive cooldown, queue fairness, fire-and-forget triggers, and lane-readiness gating; ~99 → ~6 drains.
  • f57d2747 (release() TOCTOU and insufficient resolver retries): added lane tenure protection; extended the context resolver retry window.
  • 9febf08b (forward VRAM reservation deadlocks; committed VRAM ballooned to 48 GB on a 32 GB GPU): replaced reservations with a per-provider asyncio.Lock; 147/150 success, 0 cold starts, 0 lock timeouts.
  • 74852e75 (demand-blind reclaim, per-lane VRAM gate blocking multi-lane drain, vestigial thrash code, mismatched drain-reason string): demand-gated reclaim, multi-lane drain, dead-code cleanup; 148/150 success, 0 thrash incidents.
  • bcf78771 (vLLM under-utilized by worst-case-context concurrency): VLLM_CONCURRENCY_OVERSUBSCRIPTION=3 (Qwen 7B 36→108, Qwen 14B 10→30, Mistral 7B 16→48); BACKEND_QUEUE_PRESSURE_THRESHOLD 2→8; _drain_lane also checks vLLM queue_waiting / requests_running. P50 90.3 s → 61.0 s (−32 %), P95 −11 %, 0 % failures.
  • 11ea3774 (drain overshoot during continuous batching): context resolver retry 30 s → 60 s; 100 % success at 150/300/600 requests vs 98.7 % / 81 % / 76 % before.
  • cc38dcd0 (unfair model switching under skewed scheduler queues; no fallback when a provider becomes UNAVAILABLE): queue-fair model switching, UNAVAILABLE fallback, anti-thrash v4.
  • 4431461a (vLLM utilization floor causing early evictions; sleeping lanes never reclaimed): VRAM headroom gate, sleeping-lane reclaim, vLLM utilization floor disabled, multi-provider scheduling.
  • da59856b (swap fairness): stable fairness algorithm for swaps; secured VRAM allocations/loads.
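A minimal sketch of the per-provider lock idea from 9febf08b, using illustrative names rather than the planner's actual API: instead of booking VRAM ahead of time (which could over-commit a GPU), all capacity changes for one provider serialize behind a single asyncio.Lock.

```python
# Sketch of a per-provider capacity lock. Names are assumptions; only the
# "one lock per provider, no forward reservations" idea comes from the PR.
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable


class CapacityLocks:
    def __init__(self) -> None:
        # One lock per provider, created lazily on first use.
        self._locks: dict[int, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def with_capacity_lock(
        self, provider_id: int, action: Callable[[], Awaitable[None]]
    ) -> None:
        """Run one capacity-changing action while holding the provider's lock."""
        async with self._locks[provider_id]:
            await action()


async def demo() -> list[str]:
    locks = CapacityLocks()
    log: list[str] = []

    async def load_lane(name: str) -> None:
        log.append(f"{name}:start")
        await asyncio.sleep(0)  # simulated capacity work
        log.append(f"{name}:done")

    # Two loads on the same provider cannot interleave and double-book
    # the GPU's free VRAM; they run strictly one after the other.
    await asyncio.gather(
        locks.with_capacity_lock(1, lambda: load_lane("a")),
        locks.with_capacity_lock(1, lambda: load_lane("b")),
    )
    return log
```

Serializing per provider rather than globally keeps unrelated GPUs independent while making the "check free VRAM, then load" sequence atomic for each one.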

Supporting

  • capacity/vram_ledger.py: cleaned up reservations API.
  • logosnode_registry.py, main.py: dynamic worker announcements, capacity trigger retry.
  • monitoring/prometheus_metrics.py: new counters (switch events, reclaim reasons).

4. Worker node

  • lane_manager.py, logos_bridge.py, calibration.py, model_profiles.py, vllm_process.py: p50 latency, per-GPU KV-cache size calculation fix (5b3de664), dynamic model announcement support.
  • config.yml / model_profiles.yml: updated tuning (3× oversubscription, queue threshold 8, lane profiles).
  • docker-compose.dev.yml: sdi/ bind-mount for live code reload.

5. Database

  • db/migrations/027_logosnode_dynamic_deployments.sql — adds logosnode_provider_keys table so workernode providers don't need per-model rows in model_api_keys; application layer auto-syncs model_provider rows from WebSocket capabilities.
  • db/init.sql updated with the new table so fresh dev environments match production.

6. Infrastructure / benchmarks / ablation

  • tests/performance/run_eccs_benchmark.py, run_eccs_ablation.sh, analyze_eccs_ablation.py, generate_eccs_ablation_workloads.py, plot_benchmark_distributions.py, plot_ollama_vs_vllm{,_3x,_ttft}.py, run_ollama_benchmark.py, run_ollama_benchmark_suite.sh, benchmark_spread.sh, regenerate_charts.py.
  • New workloads under performance/workloads/ablation_5model/, explicit/10m/, resource/10m/.
  • New comparative/ Ollama-vs-vLLM result sets (150/300/600) and refreshed logosworkernode-singular/ results.
  • tests/integration/test_capacity_fixes.sh, tests/smoke_test_capacity.sh.
  • Cleaned up the previous explicit/ result folders (replaced by the organized 6-folder resolver30s/60s layout).

Unit tests updated

  • tests/unit/capacity/test_capacity_planner.py (+335 lines)
  • tests/unit/pipeline/test_correcting_scheduler.py (+565 lines)
  • tests/unit/pipeline/test_ettft_estimator.py (+517 lines)

7. Docs

  • logos/docs/capacity-management-architecture.md (1 544 lines) — end-to-end capacity planner architecture.
  • logos/docs/ettft-scheduling.md (205 lines) — ECCS formal model.
  • logos/docs/benchmark-results-analysis.md (169 lines) — benchmark methodology & results.

JakubJakubczyk and others added 5 commits April 6, 2026 22:19
…ayout with combined GPU/VRAM card and 40/60 column split
…lane readiness gating

Add minimum tenure, adaptive cooldown, queue-depth fairness checks, fire-and-forget capacity triggers, lane readiness gating in scheduler, and context resolver retry to reduce model switching from ~99 to ~6 drains on dual-model GPU setups.
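The tenure and cooldown gating described above can be sketched as follows; threshold values and names here are illustrative, not the planner's actual configuration.

```python
# Hedged sketch of the anti-thrash guards (minimum tenure + adaptive
# cooldown). Thresholds and names are assumptions for illustration.
MIN_TENURE_S = 30.0     # a lane must hold its model at least this long
BASE_COOLDOWN_S = 10.0  # base gap required between model switches


def can_drain(lane_loaded_at: float, last_switch_at: float,
              recent_switches: int, now: float) -> bool:
    """Allow a drain only after tenure and an adaptive cooldown elapse."""
    if now - lane_loaded_at < MIN_TENURE_S:
        return False  # minimum tenure not reached
    # Cooldown grows as switches cluster, damping thrash under contention.
    cooldown = BASE_COOLDOWN_S * (1 + recent_switches)
    return now - last_switch_at >= cooldown
```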
@JakubJakubczyk JakubJakubczyk requested a review from a team as a code owner April 8, 2026 11:59

coderabbitai Bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Integrates lane- and GPU-aware telemetry into the UI and websocket typings, adds a paginated requests endpoint/UI, wires capacity-planner callbacks into scheduling, changes scheduler capacity/slot semantics, updates worker-node vLLM/runtime model configs, adjusts Docker/Traefik/Uvicorn dev deployment, and adds capacity-related Prometheus metrics.

Changes

  • Container & Compose (logos/Dockerfile, logos/docker-compose.dev.yaml, logos/logos-workernode/docker-compose.dev.yml): Uvicorn startup added websocket keepalive flags; dev compose moved to HTTP entrypoints, Traefik router labels/ports updated, dashboard disabled; added a GPU-enabled workernode service and an external ollama-models volume; worker compose switched from gpus: all to explicit NVIDIA device reservations.
  • Server — API & DB (logos/src/logos/dbutils/dbmanager.py, logos/src/logos/main.py): Added DBManager.get_paginated_requests(...) plus POST/OPTIONS /logosdb/paginated_requests; merged normalized devices into vram scheduler signals; wired the scheduler._on_capacity_needed callback in startup.
  • Capacity Planner & Scheduling (logos/src/logos/capacity/capacity_planner.py, logos/src/logos/pipeline/base_scheduler.py, logos/src/logos/pipeline/correcting_scheduler.py, logos/src/logos/pipeline/context_resolver.py, logos/src/logos/pipeline/scheduler_interface.py, logos/src/logos/pipeline/ettft_estimator.py, logos/src/logos/pipeline/fcfs_scheduler.py, logos/src/logos/pipeline/utilization_scheduler.py): Tuned competitive ratios and added tenure/cooldown/queue-depth guards, provider reconnect polling, per-provider capacity locks, deterministic waiting/retry paths, and drain suppression; prefers sleep_l1 over stop for reclaims; added on_capacity_needed wiring and fire-and-forget signaling; introduced slot_transferred in scheduling results; replaced the tiered penalty model with an expected-wait decomposition.
  • LogosNode Facade & Provider (logos/src/logos/sdi/logosnode_facade.py, logos/src/logos/sdi/providers/logosnode_provider.py): Added facade queries for parallel capacity, model name, scheduler queue depth, and lane readiness; the provider adds _is_model_lane_ready, blocks reservations for not-ready lanes, and applies a 256 capacity hint when vLLM num_parallel is missing.
  • VRAM Ledger & Runtime (logos/src/logos/capacity/vram_ledger.py, logos/src/logos/monitoring/prometheus_metrics.py, logos/logos-workernode/logos_worker_node/model_profiles.py, logos/logos-workernode/logos_worker_node/vllm_process.py): Adjusted VRAM release accounting to avoid unconditional clamping and relaxed per-GPU gating for multi-GPU placements; added Prometheus metrics for planner switches and switch gaps; model-profile metadata now tracks TP changes and influences residency updates; the vLLM process parses max concurrency and exposes max_concurrency.
  • Workernode Config & Data (logos/logos-workernode/config.yml, logos/logos-workernode/data/lanes.json, logos/logos-workernode/data/model_profiles.yml, logos/logos-workernode/logos_worker_node/config.py, logos/logos-workernode/logos_worker_node/logos_bridge.py): Converted capabilities models to structured objects, added a Mistral AWQ entry, applied vLLM engine overrides (eager/NCCL/all-reduce knobs), added lanes JSON and updated model profiles; removed lane-state persistence/load-save logic and bridge save calls.
  • UI — Statistics Page & Components (logos/logos-ui/app/statistics.tsx, logos/logos-ui/tsconfig.json, logos/logos-ui/components/statistics/*, logos/logos-ui/hooks/use-stats-websocket-v2.ts): Reworked the statistics page to ingest devices/lanes and derive provider/lane state; removed the provider select; added WorkerGpuPanel, LaneMetricsPanel, web LaneVramPie plus a native stub, PaginatedRequestList, Plotly prop extensions, lane-state colors, new TS types (DeviceInfo, LaneSignalData, paginated request shapes), and stronger websocket typings; set the JSX runtime to react-native.
  • Charts & Plotly (logos/logos-ui/components/statistics/plotly-vram-chart.tsx, .../plotly-vram-chart.web.tsx, plotly-pie-chart.tsx): Added laneStateByProvider propagation, derived per-lane time series and dashed lane traces (legendonly), included lanes in incremental trace updates, relabeled the y-axis to “VRAM (GB)”, and extended pie chart props (pieScale, hover options).
  • UI — Styling & Small Props (logos/logos-ui/components/statistics/chart-card.tsx): Minor spacing/typography adjustments (padding/margins and title size).
  • Requests UI & Hooks (logos/logos-ui/components/statistics/paginated-request-list.tsx): Added a paginated requests list UI that merges live WS requests into page 1, with paging controls, retry behavior, and card rendering for request stages and metrics.
  • Server — SDI Integration (logos/src/logos/sdi/logosnode_facade.py): Added scheduling/capacity query helpers used by the planner and schedulers (get_parallel_capacity, get_model_name, get_scheduler_queue_depth_by_model_name, is_model_lane_ready).
  • DB & Tests — Capacity (logos/tests/unit/capacity/test_capacity_planner.py): Extended unit tests and mocks for the new tenure, fairness, pending-capacity retry, and reclaim semantics; updated expectations to prefer sleeping over stopping and exercised queue-depth gating.
  • Performance Tools & Benchmarks (logos/tests/performance/*, logos/tests/benchmark_spread.sh, logos/tests/integration/test_capacity_fixes.sh): Added workload generators, benchmark runners, plotting/analyzers, bash orchestration scripts, and an integration test script for capacity fixes; added/removed performance result artifacts and updated .gitignore to ignore the results dir.
  • Misc — Deleted Artifacts (logos/tests/performance/results/*, logos/.gitignore): Removed several archived performance JSON/run_meta files and added an ignore rule for the performance results directory.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Scheduler as CorrectingScheduler
    participant CapacityPlanner
    participant Provider as LogosNodeProvider
    participant LogosNode as LogosNode

    Client->>Scheduler: Submit request (model)
    Scheduler->>Scheduler: Enqueue / rank candidates
    Scheduler->>CapacityPlanner: on_capacity_needed(provider_id, model_name)
    CapacityPlanner->>CapacityPlanner: Evaluate tenure, queue-depth, reclaim options
    CapacityPlanner->>Provider: prepare_lane_for_request(provider_id, model_name)
    alt Lane ready
        Provider->>LogosNode: wake/load lane
        LogosNode-->>Provider: lane active
    else Needs reclaim
        CapacityPlanner->>Provider: select reclaim (prefer sleep)
        Provider->>LogosNode: sleep/stop victim lane
        LogosNode-->>Provider: capacity freed
        Provider->>LogosNode: wake target lane
    end
    Scheduler->>Provider: Dispatch request to lane
    Provider-->>Scheduler: Dispatch result
    Scheduler-->>Client: Request accepted/started
sequenceDiagram
    participant UI as Statistics UI
    participant WS as WebSocket
    participant Backend as Backend Service
    participant DB as Database

    WS->>UI: vram_init / vram_delta (providers, devices, lanes)
    UI->>UI: derive devicesByProvider & lanesByProvider
    UI->>UI: render WorkerGpuPanel, LaneVramPie, LaneMetricsPanel
    UI->>Backend: POST /logosdb/paginated_requests (page=1)
    Backend->>DB: get_paginated_requests (LIMIT/OFFSET)
    DB-->>Backend: rows + total
    Backend-->>UI: JSON paginated response
    UI->>UI: render PaginatedRequestList merging liveRequests for page 1

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

github

Poem

🐰 I counted lanes and rustled VRAM bright,

Charts hop up like carrots in the night,
Planners tuck models snug, avoiding a fight,
Paginated logs keep requests in sight,
Workernodes hum—my whiskers twitch with delight!



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
logos/src/logos/pipeline/correcting_scheduler.py (1)

41-50: 🛠️ Refactor suggestion | 🟠 Major

Add type annotation to the on_capacity_needed parameter.

Line 48 adds the on_capacity_needed parameter, which is used as an async callable at runtime (invoked as self._on_capacity_needed(provider_id, model_name) and passed to asyncio.create_task()), but it lacks a type hint. The base class BaseScheduler also lacks this annotation. Both must be typed to comply with the coding guideline: "Use type hints consistently throughout all function signatures and class attributes."

Suggested annotation
-from typing import List, Tuple, Optional
+from typing import Awaitable, Callable, List, Optional, Tuple
@@
-        on_capacity_needed=None,
-    ):
+        on_capacity_needed: Optional[Callable[[int, str], Awaitable[None]]] = None,
+    ) -> None:

Update both base_scheduler.py and correcting_scheduler.py to apply this annotation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/correcting_scheduler.py` around lines 41 - 50, The
new on_capacity_needed parameter is an async callable but lacks a type hint;
update the signatures for BaseScheduler.__init__ and
CorrectingScheduler.__init__ to declare on_capacity_needed:
Optional[Callable[[str, str], Awaitable[Any]]] (and import Optional, Callable,
Awaitable, Any from typing), and also annotate the instance attribute (e.g.,
self._on_capacity_needed) with the same type so tools and linters recognize it's
an async callback invoked as self._on_capacity_needed(provider_id, model_name)
and passed to asyncio.create_task().
🧹 Nitpick comments (10)
logos/tests/performance/plot_benchmark_distributions.py (1)

46-69: Please give the public CSV helpers repo-standard docs and types.

percentile, stats_block, load_records, and main are public helpers in logos/**/*.py, but they currently either lack the required docstrings or pass around raw dict/list shapes. A small TypedDict for benchmark rows would make the CSV contract much easier to maintain.

As per coding guidelines, logos/**/*.py: Include docstrings for all public functions explaining parameters, return values, and exceptions; Use type hints consistently throughout all function signatures and class attributes.

Also applies to: 397-415

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 46 -
69, Add public-facing docs and precise types: write docstrings for percentile,
stats_block, load_records, and main describing parameters, returns, and raised
errors; replace raw dict/list shapes with a TypedDict (e.g., BenchmarkRow) for
CSV rows and use concrete typing like list[BenchmarkRow] and dict[str, float] in
signatures; update stats_block to accept Sorted list[float] or sort internally
but keep types explicit; ensure percentile and stats_block have type hints for
inputs/outputs and document edge-case behavior (empty data returning 0.0) in
their docstrings.
logos/tests/performance/generate_coder_dual_model_workloads.py (1)

131-162: Please tighten the workload schema typing and docstrings on the new public helpers.

VARIANTS is still a tuple of bare dicts keyed by magic strings, and helpers like choose_priority, choose_mode, build_payload, and main are public but undocumented. A small TypedDict/dataclass for variants and rows would make this script safer to evolve.

As per coding guidelines, logos/**/*.py: Include docstrings for all public functions explaining parameters, return values, and exceptions; Use type hints consistently throughout all function signatures and class attributes.

Also applies to: 169-186, 415-425

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/generate_coder_dual_model_workloads.py` around lines
131 - 162, VARIANTS is an untyped tuple of plain dicts and several helpers
(choose_priority, choose_mode, build_payload, main) are public without
docstrings or full type hints; define a Variant TypedDict or dataclass (e.g.,
Variant with name: str, total_requests: int, counts_override: dict[str,int],
seed_suffix: str, layout: str, burst_min: int|None, burst_max: int|None,
intra_step_ms: tuple[int,int]|None, output_relpath: Path) and a Row
TypedDict/dataclass for generated rows, update VARIANTS to be typed as
tuple[Variant, ...], add explicit return and parameter type hints to
choose_priority, choose_mode, build_payload, main and any other public helpers,
and add docstrings to each public function describing parameters, return values
and raised exceptions so the module follows the coding guidelines and is safer
to evolve.
logos/src/logos/dbutils/dbmanager.py (1)

1312-1320: Add the repository-required return type and full docstring here.

This new public method still omits a return type annotation and the docstring does not describe parameters, return values, or failure modes.

♻️ Suggested shape
     def get_paginated_requests(
         self,
         logos_key: str,
         page: int = 1,
         per_page: int = 20,
-    ):
+    ) -> Tuple[Dict[str, Any], int]:
         """
-        Fetch paginated request logs with provider type for Cloud/Local classification.
+        Fetch paginated request logs for the authenticated process.
+
+        Args:
+            logos_key: API key of the owning process.
+            page: 1-based page number.
+            per_page: Number of rows per page, clamped to [1, 100].
+
+        Returns:
+            Tuple of response payload and HTTP status code.
+
+        Raises:
+            ValueError: If `page` or `per_page` cannot be converted to integers.
         """

As per coding guidelines "Include docstrings for all public functions explaining parameters, return values, and exceptions" and "Use type hints consistently throughout all function signatures and class attributes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/dbutils/dbmanager.py` around lines 1312 - 1320, Add an
explicit return type annotation (e.g., -> Tuple[List[dict], int]) to
get_paginated_requests and import any needed typing names (List, Dict, Tuple) so
the signature consistently uses type hints; expand the docstring to a full
Google/NumPy-style description that documents parameters (logos_key: str, page:
int, per_page: int), the precise return structure (list of request-log dicts and
an integer total/count or total pages), and possible failure modes/exceptions
raised (e.g., KeyError/ValueError/DB errors) so callers know what to expect.
logos/logos-ui/hooks/use-stats-websocket-v2.ts (1)

7-33: Finish the typing pass by removing the new any escape hatches.

loaded_models, scheduler_signals.models, and the requests payload are still typed as any, so downstream statistics components still lose most compile-time protection where shape drift is most likely.

Also applies to: 92-105

logos/logos-ui/components/statistics/lane-vram-pie.web.tsx (1)

36-45: Reuse the shared lane-state sort order.

This file orders sleeping before starting, while lane-metrics-panel.tsx does the opposite. Keeping two local state-order tables in sync will drift quickly and makes the pie legend jump relative to the panel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-ui/components/statistics/lane-vram-pie.web.tsx` around lines 36 -
45, The local stateOrder in lane-vram-pie.web.tsx is duplicating and diverging
from the canonical order used in lane-metrics-panel.tsx; replace the local
stateOrder with the shared canonical sort order (import the shared constant used
by lane-metrics-panel.tsx) and use that in the sortedLanes comparator so the
runtime_state ordering (e.g., starting vs sleeping) is consistent across
components; update the import and ensure sortedLanes still falls back to a high
value (e.g., 99) for unknown states and keeps the tiebreaker using
a.model.localeCompare(b.model).
logos/logos-ui/app/statistics.tsx (1)

1227-1230: Type assertion for devices field.

Using (provider as any).devices suggests the VramProviderPayload type is missing the devices property. Consider extending the type definition to include devices?: DeviceInfo[] to ensure type safety and avoid the as any cast.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-ui/app/statistics.tsx` around lines 1227 - 1230, The code uses
(provider as any).devices because VramProviderPayload lacks a devices property;
update the VramProviderPayload type to include an optional devices?:
DeviceInfo[] (or import/define DeviceInfo) and then replace casts by accessing
provider.devices directly in the statistics.tsx block that assigns
nextDevices[provider.name], removing the (provider as any) cast and ensuring any
usage of devices is type-guarded (Array.isArray(provider.devices)) or null-safe.
logos/logos-ui/components/statistics/paginated-request-list.tsx (1)

255-271: Error response may not be JSON.

If the server returns an error response (e.g., 500), resp.json() is called unconditionally, which will throw a different error if the body isn't valid JSON. Consider parsing JSON only for successful responses.

🔧 Proposed fix
 async function fetchPage(
   apiKey: string,
   page: number,
   perPage = 20
 ): Promise<PaginatedRequestResponse> {
   const resp = await fetch(`${API_BASE}/logosdb/paginated_requests`, {
     method: "POST",
     headers: {
       "Content-Type": "application/json",
       logos_key: apiKey,
       Authorization: `Bearer ${apiKey}`,
     },
     body: JSON.stringify({ page, per_page: perPage }),
   });
-  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
-  return resp.json();
+  if (!resp.ok) {
+    const text = await resp.text().catch(() => "");
+    throw new Error(`HTTP ${resp.status}${text ? `: ${text}` : ""}`);
+  }
+  return resp.json() as Promise<PaginatedRequestResponse>;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-ui/components/statistics/paginated-request-list.tsx` around lines
255 - 271, fetchPage currently calls resp.json() unconditionally which can throw
if the error response isn't JSON; change fetchPage so it only parses JSON for
successful responses (when resp.ok) and for non-OK responses read the response
body as text (or attempt json parse in a try/catch) and then throw a descriptive
Error that includes resp.status and the response body; update references in the
function (resp, API_BASE, PaginatedRequestResponse) so successful returns still
return parsed JSON while error paths surface the raw body/status instead of
causing a secondary parse error.
logos/src/logos/pipeline/base_scheduler.py (2)

294-300: Silent exception swallowing loses diagnostic information.

The try-except-pass block silently discards any exceptions from is_model_lane_ready(). While the optimistic continuation is appropriate for resilience, logging the exception would help diagnose issues in production.

🔧 Proposed fix
             try:
                 if not self._logosnode.is_model_lane_ready(model_id, provider_id):
                     continue
-            except Exception:
-                pass
+            except Exception:
+                logger.debug(
+                    "Failed to check lane readiness for model %s provider %s",
+                    model_id, provider_id, exc_info=True,
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/base_scheduler.py` around lines 294 - 300, The
try/except currently swallows all errors from
self._logosnode.is_model_lane_ready(model_id, provider_id); change the except to
capture the exception and log it (e.g., except Exception as e:
self._logger.exception(...) or self._logger.error(..., exc_info=True)) while
preserving the continue behavior so the loop remains resilient; update the block
around is_model_lane_ready to log the error with context (include model_id and
provider_id) instead of silently passing.

32-32: Missing type hint for on_capacity_needed callback.

The on_capacity_needed parameter lacks a type annotation. Per coding guidelines, type hints should be used consistently throughout function signatures.

🔧 Proposed fix
+from typing import Awaitable, Callable, Dict
+
 class BaseScheduler(SchedulerInterface):
     def __init__(
         self,
         queue_manager: PriorityQueueManager,
         logosnode_facade: LogosNodeSchedulingDataFacade,
         azure_facade: AzureSchedulingDataFacade,
         model_registry: Dict[tuple[int, int], str] | None = None,
-        on_capacity_needed=None,
+        on_capacity_needed: Callable[[int, str], Awaitable[None]] | None = None,
     ):

As per coding guidelines: "Use type hints consistently throughout all function signatures and class attributes"

Also applies to: 38-41

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/base_scheduler.py` at line 32, Add a type annotation
for the on_capacity_needed callback in the BaseScheduler signature (and the
other unlabeled params around lines 38-41) by importing typing types and
declaring it as Optional[Callable[[int], None]] (or Optional[Callable[[int],
Awaitable[None]]] if the callback can be async); update the function/class
signature that defines on_capacity_needed to use that type (e.g.,
on_capacity_needed: Optional[Callable[[int], None]] = None) and ensure
typing.Optional and typing.Callable (and typing.Awaitable if needed) are
imported at the top of the module.
logos/src/logos/capacity/capacity_planner.py (1)

1136-1143: Consider logging Prometheus metric failures instead of silently ignoring.

The bare except Exception: pass silently swallows all errors when recording switch metrics. While defensive coding for non-critical metrics is reasonable, completely silent failures can hide configuration issues or bugs in the metrics module.

♻️ Suggested improvement
             try:
                 from logos.monitoring import prometheus_metrics as prom
                 prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc()
                 if len(self._switch_timestamps) >= 2:
                     gap = self._switch_timestamps[-1] - self._switch_timestamps[-2]
                     prom.CAPACITY_PLANNER_SWITCH_GAP_SECONDS.observe(gap)
-            except Exception:
-                pass
+            except Exception:
+                logger.debug("Failed to record switch metrics", exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 1136 - 1143, The
current try/except around prometheus updates in capacity_planner.py silently
swallows all errors; replace the bare except with logging of the exception so
metric recording failures are visible. Catch Exception as e around the block
that updates prom.CAPACITY_PLANNER_SWITCHES_TOTAL and
prom.CAPACITY_PLANNER_SWITCH_GAP_SECONDS (using self._switch_timestamps for the
gap calculation), and call the module logger (or the existing logger used in
this file) to emit an error/warning message including the exception details and
context (e.g., "Failed to record capacity planner prometheus metrics") so
failures are visible but do not propagate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@logos/docker-compose.dev.yaml`:
- Around line 196-199: Remove the hard-coded LOGOS_API_KEY value from the
docker-compose environment block and replace it with a reference to an external
secret or environment variable provider (e.g., use LOGOS_API_KEY from an .env
file or Docker secret) so the key is not checked into VCS; update the
environment entry for LOGOS_API_KEY to pull from the external source and add a
note in docs/README (or .env.example) describing how to supply/rotate the worker
API key and ensure the compromised key is removed from the repository history
and rotated.

In `@logos/logos-ui/components/statistics/lane-metrics-panel.tsx`:
- Around line 160-163: The current useMemo for computing `lanes` only falls back
when `selectedProvider` is null/undefined; update the provider selection logic
in the `useMemo` so that `providerName` is set to `selectedProvider` only if
`selectedProvider` exists as a key in `lanesByProvider`, otherwise pick the
first key from `Object.keys(lanesByProvider)` (or return [] if none); adjust
references to `providerName` in the `lanes` computation so stale
`selectedProvider` values won't produce an empty state when other providers have
data.

In `@logos/logos-ui/components/statistics/plotly-vram-chart.web.tsx`:
- Around line 384-407: The chart mixes provider freeGb (provider series) with
per-lane effective_vram_mb (lane traces) on the same y-axis causing misleading
scaling; update the plotting so these are comparable by either converting the
provider trace in traces to used VRAM (so traces and laneTraces share the same
unit) or move laneTraces onto a secondary y-axis: locate laneTraces and
allTraces in plotly-vram-chart.web.tsx and either (A) change the provider series
data generation (the trace object(s) inside traces) to compute used_vram_mb/GB
instead of freeGb, or (B) add yaxis: "y2" (and set layout.yaxis2 properties) to
each laneTraces entry so lane traces use a separate axis while keeping traces on
the primary axis; ensure hovertemplate units match the chosen axis.

In `@logos/logos-workernode/logos_worker_node/lane_manager.py`:
- Around line 1484-1489: The current logic sets num_parallel to 0 when vLLM's
handle.max_concurrency is missing/zero, which can temporarily mark a running
vLLM lane as having no capacity; change the fallback so that when
getattr(handle, "max_concurrency", None) is None or <= 0 you use lc.num_parallel
(or the previously known/default capacity) instead of 0. Update the block around
lc.vllm / handle.max_concurrency so: if vllm_max is a positive int use it,
otherwise set num_parallel = lc.num_parallel (preserving existing non-vLLM
behavior).

In `@logos/logos-workernode/logos_worker_node/model_profiles.py`:
- Around line 303-317: The calibrated base residency must be invalidated when
tensor-parallelism changes: update the logic around tp_changed in
model_profiles.py so that when tp_changed is true you clear
profile.base_residency_mb and reset profile.residency_source (e.g., to None or
"unknown") before any runtime measurement logic (this affects tp_changed,
profile.base_residency_mb, profile.residency_source, loaded_vram_mb,
sleeping_residual_mb and the estimate_vram_mb() behavior); alternatively
implement calibration keyed by (model_name, tensor_parallel_size) so calibrated
values are stored per-TP variant rather than reused across tp_changed events.

In `@logos/logos-workernode/logos_worker_node/vllm_process.py`:
- Around line 1009-1016: The issue is that _max_concurrency is only parsed once
in _stream_logs() when it's None but never reset on respawn; clear
self._max_concurrency (set to None) at the start of spawn() before creating the
new subprocess so the startup log from the new vLLM process will be re-parsed
and the lane_manager gets the updated concurrency for self.lane_id.
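The respawn fix can be sketched as follows; the `max_concurrency=` log line and the `handle_log_line` method are hypothetical simplifications of the real vLLM startup output and `_stream_logs()` parsing:

```python
class VllmProcessSketch:
    """Illustrates clearing the cached concurrency before each respawn."""

    def __init__(self) -> None:
        self._max_concurrency: int | None = None

    def spawn(self) -> None:
        # Reset before creating the new subprocess so the fresh startup
        # log is re-parsed instead of keeping the stale value.
        self._max_concurrency = None
        # ... create the subprocess and restart log streaming here ...

    def handle_log_line(self, line: str) -> None:
        # Mirrors _stream_logs(): parse only while the value is unset.
        if self._max_concurrency is None and line.startswith("max_concurrency="):
            self._max_concurrency = int(line.split("=", 1)[1])
```

Without the reset in `spawn()`, the parse-once guard would keep the previous process's value for the lane after a respawn.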

In `@logos/src/logos/dbutils/dbmanager.py`:
- Around line 1381-1383: The is_cloud computation treats only "logosnode" and
"ollama" as local; normalize legacy local provider types before checking by
lowercasing pt and mapping legacy values ("node", "node_controller",
"logos_worker_node") to a canonical local type (or include them in the local
set) so that is_cloud = pt not in ("logosnode", "ollama", "") correctly
classifies legacy rows; update the code around the pt/is_cloud computation in
dbmanager.py (the block that builds results.append) to convert or check these
legacy names as local.
- Around line 1371-1372: The ORDER BY clause that currently sorts only by
le.timestamp_request (ORDER BY le.timestamp_request DESC NULLS LAST LIMIT
:per_page OFFSET :offset) is non-deterministic for rows with identical
timestamps; update that ORDER BY to include a unique tiebreaker such as le.id
DESC (e.g., ORDER BY le.timestamp_request DESC NULLS LAST, le.id DESC) so
pagination via LIMIT/OFFSET is stable and pages don't skip or duplicate rows;
modify the SQL string where the ORDER BY is assembled in dbmanager.py to add the
secondary key (le.id) accordingly.
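The first point (legacy provider-type normalization) might look roughly like this; the helper name and the alias set are assumptions, since the real code computes `is_cloud` inline while building `results`:

```python
# Canonical local provider types; the empty string also classifies as local,
# matching the original `pt not in ("logosnode", "ollama", "")` check.
LOCAL_PROVIDER_TYPES = {"logosnode", "ollama", ""}
# Legacy names that should map to the canonical local type.
LEGACY_LOCAL_ALIASES = {"node", "node_controller", "logos_worker_node"}


def is_cloud_provider(provider_type) -> bool:
    # Lowercase first so mixed-case legacy rows also normalize correctly.
    pt = (provider_type or "").lower()
    if pt in LEGACY_LOCAL_ALIASES:
        pt = "logosnode"
    return pt not in LOCAL_PROVIDER_TYPES
```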

In `@logos/src/logos/main.py`:
- Around line 328-343: The memory_* fields in the provider_signals["devices"]
mapping are converted with float(...) which will raise on non-numeric
worker-supplied payloads; replace those float(...) conversions with the existing
helper _safe_float (use _safe_float(d.get("memory_used_mb")),
_safe_float(d.get("memory_total_mb")), _safe_float(d.get("memory_free_mb"))),
and update the other duplicated comprehension site that constructs devices (the
second similar block) to use _safe_float there as well so both call sites
consistently sanitize numeric fields.
- Around line 3281-3302: The paginated_requests_endpoint currently casts
page/per_page with int(...) which raises on invalid inputs; replace the ad-hoc
casting with a Pydantic request model from src/logos/dbutils/dbrequest.py (e.g.,
a model that defines page:int and per_page:int with defaults) and use it to
validate the incoming body (parse via model.parse_obj or model(**body)) after
reading JSON, and if validation fails raise
fastapi.HTTPException(status_code=400, detail=errors) so clients get a 400
instead of a 500; keep the call to DBManager().get_paginated_requests(logos_key,
page=..., per_page=...) unchanged.
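For the first point, a sketch of the sanitized device mapping; the `_safe_float` body shown here is an assumption about how the existing helper behaves:

```python
def _safe_float(value, default=None):
    """Assumed behavior of the existing helper: coerce to float or return default."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def device_signals(devices):
    """Build provider_signals["devices"] entries without raising on bad payloads."""
    return [
        {
            "memory_used_mb": _safe_float(d.get("memory_used_mb")),
            "memory_total_mb": _safe_float(d.get("memory_total_mb")),
            "memory_free_mb": _safe_float(d.get("memory_free_mb")),
        }
        for d in devices
    ]
```

A worker sending `"n/a"` or omitting a field yields `None` for that entry instead of crashing the whole signals update.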

In `@logos/src/logos/pipeline/context_resolver.py`:
- Around line 122-136: The comment and log around the retry loop for selecting a
lane are incorrect: update the comment near the loop to state the loop can wait
up to ~30s (10×1s + 10×2s) instead of “~10s”, and change the logger call in the
block that handles a found lane (inside the retry loop that calls
_logosnode_registry.select_lane_for_model(provider_id, model_name)) to compute
and log the real elapsed seconds using the retry index and the 1s/2s backoff
pattern (e.g., calculate elapsed_seconds = (attempt + 1) if attempt < 10 else 10
+ (attempt - 9) * 2) and log elapsed_seconds rather than attempt + 1 so the
message reflects actual wait time.
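The elapsed-time formula from that prompt, written out as a small helper (the name is illustrative):

```python
def elapsed_seconds(attempt: int) -> int:
    """Cumulative seconds waited after 0-based retry `attempt`, given a
    backoff of 1 s for the first 10 attempts and 2 s for the next 10,
    so the full loop waits 10 * 1 + 10 * 2 = 30 s."""
    return (attempt + 1) if attempt < 10 else 10 + (attempt - 9) * 2
```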

In `@logos/src/logos/pipeline/correcting_scheduler.py`:
- Around line 210-216: The detached asyncio.create_task created in the block
that checks self._on_capacity_needed and provider_type == "logosnode" can be
garbage-collected before it runs; keep a strong reference by adding the Task to
a short-lived set (e.g., self._pending_capacity_tasks), and register
task.add_done_callback(lambda t: self._pending_capacity_tasks.discard(t)) so
completed tasks are removed; locate the call site around
self._on_capacity_needed and self._logosnode.get_model_name(model_id,
provider_id) and replace the bare asyncio.create_task(...) with creating the
Task, adding it to the set, and attaching the cleanup callback to ensure the
wake-up callback runs reliably.

In `@logos/src/logos/pipeline/scheduler_interface.py`:
- Around line 56-60: The field slot_transferred on the scheduler request/slot
object is defaulting to True and thus misclassifies normal dispatches as
transferred slots; change the default to False on the declaration in
scheduler_interface.py (slot_transferred: bool = False) and then explicitly set
slot_transferred=True only at the release/reuse transfer call sites in
BaseScheduler (e.g., in base_scheduler.py where reuse_slot or release paths
perform slot handoff) so normal construction paths do not suppress active-count
increments.

In `@logos/tests/performance/generate_coder_dual_model_workloads.py`:
- Around line 342-355: The current build_poisson_offsets function biases
arrivals into the last millisecond by using an exponential walk with clamping;
instead, generate the arrival times as the sorted order statistics of uniform
samples across the window: draw total_requests samples via rng.random() *
duration_ms, sort them, convert to ints and clamp each to the range [0,
duration_ms - 1] (or use min(int(t), duration_ms - 1)), and return that list;
update the body of build_poisson_offsets accordingly so it produces sorted
uniform offsets rather than an exponential walk with tail-clamping.
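The suggested order-statistics rewrite, as a self-contained sketch (the signature and seed handling are assumptions about the real script): conditional on the number of arrivals, homogeneous Poisson arrival times are i.i.d. uniform over the window, so sorting uniform draws gives unbiased offsets.

```python
import random


def build_poisson_offsets(total_requests: int, duration_ms: int, seed: int = 0) -> list[int]:
    """Arrival offsets as sorted order statistics of uniform samples.

    Avoids the last-millisecond pileup that tail-clamping an
    exponential walk produces.
    """
    rng = random.Random(seed)
    samples = sorted(rng.random() * duration_ms for _ in range(total_requests))
    return [min(int(t), duration_ms - 1) for t in samples]
```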

In `@logos/tests/performance/plot_benchmark_distributions.py`:
- Around line 448-457: The timeline code is discarding each record's recorded
arrival_offset and forcing offsets by index; update the block that builds
arrival_offsets/sets r["_offset_s"] (handling ok_records and r entries) to
prefer the existing r.get("arrival_offset") (convert to seconds if it's in ms)
and set r["_offset_s"] from that when present; only fall back to computing from
queue_wait_ms + processing_ms or the sequential index (i * 0.5) when
arrival_offset is missing, so burst spacing and scheduler timing are preserved
in the plotted timeline.
- Line 28: Remove the unused import "matplotlib.ticker as ticker" (delete that
import), stop capturing unused outputs from the histogram calls by changing any
assignments that unpack plt.hist(...) into unused placeholders or by calling
plt.hist(...) without assignment, and remove the stray f-prefix from the plain
string literal (the f'...' that has no interpolation) so it is a normal string;
target the occurrences around the plt.hist calls and the f-prefixed string in
this script to fix the lint errors.

---

Outside diff comments:
In `@logos/src/logos/pipeline/correcting_scheduler.py`:
- Around line 41-50: The new on_capacity_needed parameter is an async callable
but lacks a type hint; update the signatures for BaseScheduler.__init__ and
CorrectingScheduler.__init__ to declare on_capacity_needed:
Optional[Callable[[str, str], Awaitable[Any]]] (and import Optional, Callable,
Awaitable, Any from typing), and also annotate the instance attribute (e.g.,
self._on_capacity_needed) with the same type so tools and linters recognize it's
an async callback invoked as self._on_capacity_needed(provider_id, model_name)
and passed to asyncio.create_task().

---

Nitpick comments:
In `@logos/logos-ui/app/statistics.tsx`:
- Around line 1227-1230: The code uses (provider as any).devices because
VramProviderPayload lacks a devices property; update the VramProviderPayload
type to include an optional devices?: DeviceInfo[] (or import/define DeviceInfo)
and then replace casts by accessing provider.devices directly in the
statistics.tsx block that assigns nextDevices[provider.name], removing the
(provider as any) cast and ensuring any usage of devices is type-guarded
(Array.isArray(provider.devices)) or null-safe.

In `@logos/logos-ui/components/statistics/lane-vram-pie.web.tsx`:
- Around line 36-45: The local stateOrder in lane-vram-pie.web.tsx is
duplicating and diverging from the canonical order used in
lane-metrics-panel.tsx; replace the local stateOrder with the shared canonical
sort order (import the shared constant used by lane-metrics-panel.tsx) and use
that in the sortedLanes comparator so the runtime_state ordering (e.g., starting
vs sleeping) is consistent across components; update the import and ensure
sortedLanes still falls back to a high value (e.g., 99) for unknown states and
keeps the tiebreaker using a.model.localeCompare(b.model).

In `@logos/logos-ui/components/statistics/paginated-request-list.tsx`:
- Around line 255-271: fetchPage currently calls resp.json() unconditionally
which can throw if the error response isn't JSON; change fetchPage so it only
parses JSON for successful responses (when resp.ok) and for non-OK responses
read the response body as text (or attempt json parse in a try/catch) and then
throw a descriptive Error that includes resp.status and the response body;
update references in the function (resp, API_BASE, PaginatedRequestResponse) so
successful returns still return parsed JSON while error paths surface the raw
body/status instead of causing a secondary parse error.

In `@logos/src/logos/capacity/capacity_planner.py`:
- Around line 1136-1143: The current try/except around prometheus updates in
capacity_planner.py silently swallows all errors; replace the bare except with
logging of the exception so metric recording failures are visible. Catch
Exception as e around the block that updates
prom.CAPACITY_PLANNER_SWITCHES_TOTAL and
prom.CAPACITY_PLANNER_SWITCH_GAP_SECONDS (using self._switch_timestamps for the
gap calculation), and call the module logger (or the existing logger used in
this file) to emit an error/warning message including the exception details and
context (e.g., "Failed to record capacity planner prometheus metrics") so
failures are visible but do not propagate.

In `@logos/src/logos/dbutils/dbmanager.py`:
- Around line 1312-1320: Add an explicit return type annotation (e.g., ->
Tuple[List[dict], int]) to get_paginated_requests and import any needed typing
names (List, Dict, Tuple) so the signature consistently uses type hints; expand
the docstring to a full Google/NumPy-style description that documents parameters
(logos_key: str, page: int, per_page: int), the precise return structure (list
of request-log dicts and an integer total/count or total pages), and possible
failure modes/exceptions raised (e.g., KeyError/ValueError/DB errors) so callers
know what to expect.

In `@logos/src/logos/pipeline/base_scheduler.py`:
- Around line 294-300: The try/except currently swallows all errors from
self._logosnode.is_model_lane_ready(model_id, provider_id); change the except to
capture the exception and log it (e.g., except Exception as e:
self._logger.exception(...) or self._logger.error(..., exc_info=True)) while
preserving the continue behavior so the loop remains resilient; update the block
around is_model_lane_ready to log the error with context (include model_id and
provider_id) instead of silently passing.
- Line 32: Add a type annotation for the on_capacity_needed callback in the
BaseScheduler signature (and the other unlabeled params around lines 38-41) by
importing typing types and declaring it as Optional[Callable[[int], None]] (or
Optional[Callable[[int], Awaitable[None]]] if the callback can be async); update
the function/class signature that defines on_capacity_needed to use that type
(e.g., on_capacity_needed: Optional[Callable[[int], None]] = None) and ensure
typing.Optional and typing.Callable (and typing.Awaitable if needed) are
imported at the top of the module.

In `@logos/tests/performance/generate_coder_dual_model_workloads.py`:
- Around line 131-162: VARIANTS is an untyped tuple of plain dicts and several
helpers (choose_priority, choose_mode, build_payload, main) are public without
docstrings or full type hints; define a Variant TypedDict or dataclass (e.g.,
Variant with name: str, total_requests: int, counts_override: dict[str,int],
seed_suffix: str, layout: str, burst_min: int|None, burst_max: int|None,
intra_step_ms: tuple[int,int]|None, output_relpath: Path) and a Row
TypedDict/dataclass for generated rows, update VARIANTS to be typed as
tuple[Variant, ...], add explicit return and parameter type hints to
choose_priority, choose_mode, build_payload, main and any other public helpers,
and add docstrings to each public function describing parameters, return values
and raised exceptions so the module follows the coding guidelines and is safer
to evolve.

In `@logos/tests/performance/plot_benchmark_distributions.py`:
- Around line 46-69: Add public-facing docs and precise types: write docstrings
for percentile, stats_block, load_records, and main describing parameters,
returns, and raised errors; replace raw dict/list shapes with a TypedDict (e.g.,
BenchmarkRow) for CSV rows and use concrete typing like list[BenchmarkRow] and
dict[str, float] in signatures; update stats_block to accept a sorted list[float]
or sort internally while keeping types explicit; ensure percentile and stats_block
have type hints for inputs/outputs and document edge-case behavior (empty data
returning 0.0) in their docstrings.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e78a1785-698b-4da9-a9c0-b80b68a03451

📥 Commits

Reviewing files that changed from the base of the PR and between 65b7618 and f57d274.

⛔ Files ignored due to path filters (3)
  • logos/tests/performance/workloads/explicit/10m/workload_coder2_big_bursty_2400_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_coder2_big_bursty_600_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_coder2_fully_random_600_10m.csv is excluded by !**/*.csv
📒 Files selected for processing (33)
  • logos/Dockerfile
  • logos/docker-compose.dev.yaml
  • logos/logos-ui/app/statistics.tsx
  • logos/logos-ui/components/statistics/chart-card.tsx
  • logos/logos-ui/components/statistics/constants.ts
  • logos/logos-ui/components/statistics/lane-metrics-panel.tsx
  • logos/logos-ui/components/statistics/lane-vram-pie.tsx
  • logos/logos-ui/components/statistics/lane-vram-pie.web.tsx
  • logos/logos-ui/components/statistics/paginated-request-list.tsx
  • logos/logos-ui/components/statistics/plotly-pie-chart.tsx
  • logos/logos-ui/components/statistics/plotly-vram-chart.tsx
  • logos/logos-ui/components/statistics/plotly-vram-chart.web.tsx
  • logos/logos-ui/components/statistics/types.ts
  • logos/logos-ui/components/statistics/worker-gpu-panel.tsx
  • logos/logos-ui/hooks/use-stats-websocket-v2.ts
  • logos/logos-ui/tsconfig.json
  • logos/logos-workernode/config.yml
  • logos/logos-workernode/docker-compose.dev.yml
  • logos/logos-workernode/logos_worker_node/lane_manager.py
  • logos/logos-workernode/logos_worker_node/model_profiles.py
  • logos/logos-workernode/logos_worker_node/vllm_process.py
  • logos/src/logos/capacity/capacity_planner.py
  • logos/src/logos/dbutils/dbmanager.py
  • logos/src/logos/main.py
  • logos/src/logos/monitoring/prometheus_metrics.py
  • logos/src/logos/pipeline/base_scheduler.py
  • logos/src/logos/pipeline/context_resolver.py
  • logos/src/logos/pipeline/correcting_scheduler.py
  • logos/src/logos/pipeline/scheduler_interface.py
  • logos/src/logos/sdi/logosnode_facade.py
  • logos/src/logos/sdi/providers/logosnode_provider.py
  • logos/tests/performance/generate_coder_dual_model_workloads.py
  • logos/tests/performance/plot_benchmark_distributions.py

Comment on lines +196 to +199
    environment:
      LOGOS_URL: "http://logos-server:8080"
      LOGOS_API_KEY: "DzlgIX6bEhjZexiVUgo0DuG575-6PcWOXoYvUq1xwUJ59pLG6_xvXxqWOtx9h4Lo"
      LOGOS_ALLOW_INSECURE_HTTP: "true"

⚠️ Potential issue | 🔴 Critical

Remove the checked-in worker API key.

Line 198 stores a concrete LOGOS_API_KEY in version control. That key is now compromised and can be reused to impersonate the dev worker until it is rotated.

🔐 Proposed fix
-      LOGOS_API_KEY: "DzlgIX6bEhjZexiVUgo0DuG575-6PcWOXoYvUq1xwUJ59pLG6_xvXxqWOtx9h4Lo"
+      LOGOS_API_KEY: "${LOGOS_WORKERNODE_API_KEY:?set LOGOS_WORKERNODE_API_KEY in .env}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    environment:
      LOGOS_URL: "http://logos-server:8080"
-     LOGOS_API_KEY: "DzlgIX6bEhjZexiVUgo0DuG575-6PcWOXoYvUq1xwUJ59pLG6_xvXxqWOtx9h4Lo"
+     LOGOS_API_KEY: "${LOGOS_WORKERNODE_API_KEY:?set LOGOS_WORKERNODE_API_KEY in .env}"
      LOGOS_ALLOW_INSECURE_HTTP: "true"
🧰 Tools
🪛 Betterleaks (1.1.1)

[high] 198-198: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/docker-compose.dev.yaml` around lines 196 - 199, Remove the hard-coded
LOGOS_API_KEY value from the docker-compose environment block and replace it
with a reference to an external secret or environment variable provider (e.g.,
use LOGOS_API_KEY from an .env file or Docker secret) so the key is not checked
into VCS; update the environment entry for LOGOS_API_KEY to pull from the
external source and add a note in docs/README (or .env.example) describing how
to supply/rotate the worker API key and ensure the compromised key is removed
from the repository history and rotated.

Comment on lines +160 to +163
  const lanes = useMemo(() => {
    const providerName = selectedProvider ?? Object.keys(lanesByProvider)[0];
    if (!providerName) return [];
    const lanesForProvider = lanesByProvider[providerName] ?? {};

⚠️ Potential issue | 🟡 Minor

Fallback when the selected provider disappears.

Line 161 only falls back for null/undefined. If selectedProvider is a stale string that is no longer present in lanesByProvider, this panel renders the empty state even though another provider still has lane data.

🔁 Suggested fallback
-    const providerName = selectedProvider ?? Object.keys(lanesByProvider)[0];
+    const providerName =
+      selectedProvider && lanesByProvider[selectedProvider]
+        ? selectedProvider
+        : Object.keys(lanesByProvider)[0];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  const lanes = useMemo(() => {
-   const providerName = selectedProvider ?? Object.keys(lanesByProvider)[0];
+   const providerName =
+     selectedProvider && lanesByProvider[selectedProvider]
+       ? selectedProvider
+       : Object.keys(lanesByProvider)[0];
    if (!providerName) return [];
    const lanesForProvider = lanesByProvider[providerName] ?? {};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-ui/components/statistics/lane-metrics-panel.tsx` around lines 160
- 163, The current useMemo for computing `lanes` only falls back when
`selectedProvider` is null/undefined; update the provider selection logic in the
`useMemo` so that `providerName` is set to `selectedProvider` only if
`selectedProvider` exists as a key in `lanesByProvider`, otherwise pick the
first key from `Object.keys(lanesByProvider)` (or return [] if none); adjust
references to `providerName` in the `lanes` computation so stale
`selectedProvider` values won't produce an empty state when other providers have
data.

Comment on lines +384 to +407
  /* ── Lane traces (dashed, hidden by default) ───────────────────────── */
  const laneTraces = useMemo(
    () =>
      laneEntries.map((entry) => ({
        type: "scattergl" as const,
        mode: "lines" as const,
        name: `${entry.laneId} (${entry.modelName})`,
        x: entry.points.map((pt) => new Date(pt.ts)),
        y: entry.points.map((pt) => pt.vramMb / 1024),
        line: { color: entry.color, width: 1.6, dash: "dot" as const },
        connectgaps: false,
        visible: "legendonly" as const,
        customdata: entry.points.map((pt) => [entry.runtimeState, entry.modelName, pt.vramMb]),
        hovertemplate:
          "Lane: <b>%{fullData.name}</b>" +
          "<br>State: %{customdata[0]}" +
          "<br>VRAM: %{customdata[2]:.0f} MB (%{y:.2f} GB)" +
          "<extra></extra>",
      })),
    [laneEntries],
  );

  /* ── Combined traces ────────────────────────────────────────────────── */
  const allTraces = useMemo(() => [...traces, ...laneTraces], [traces, laneTraces]);

⚠️ Potential issue | 🟠 Major

Don't plot free-memory and per-lane used-memory on the same y-axis.

Lines 361-362 still plot provider freeGb, but Lines 391-392 add lane effective_vram_mb, and Line 505 renames the axis to VRAM (GB). Those are different quantities, so the chart becomes misleading, and the auto-fit logic at Lines 578-582 can clip the lane traces completely on low-free-memory systems. Either switch the provider series to used VRAM as well, or move the lane traces onto a separate y-axis.

Also applies to: 505-505

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-ui/components/statistics/plotly-vram-chart.web.tsx` around lines
384 - 407, The chart mixes provider freeGb (provider series) with per-lane
effective_vram_mb (lane traces) on the same y-axis causing misleading scaling;
update the plotting so these are comparable by either converting the provider
trace in traces to used VRAM (so traces and laneTraces share the same unit) or
move laneTraces onto a secondary y-axis: locate laneTraces and allTraces in
plotly-vram-chart.web.tsx and either (A) change the provider series data
generation (the trace object(s) inside traces) to compute used_vram_mb/GB
instead of freeGb, or (B) add yaxis: "y2" (and set layout.yaxis2 properties) to
each laneTraces entry so lane traces use a separate axis while keeping traces on
the primary axis; ensure hovertemplate units match the chosen axis.

Comment on lines +1484 to +1489
            if lc.vllm:
                # Use vLLM-reported max concurrency (KV-budget-derived) when available.
                vllm_max = getattr(handle, "max_concurrency", None)
                num_parallel = vllm_max if vllm_max and vllm_max > 0 else 0
            else:
                num_parallel = lc.num_parallel

⚠️ Potential issue | 🟠 Major

Fallback to num_parallel = 0 can temporarily disable viable vLLM lanes.

Because max_concurrency is populated asynchronously, Line 1487 can report zero concurrency for running lanes. That can propagate as “no capacity” even when the lane is usable.

Proposed fix
             if lc.vllm:
                 # Use vLLM-reported max concurrency (KV-budget-derived) when available.
                 vllm_max = getattr(handle, "max_concurrency", None)
-                num_parallel = vllm_max if vllm_max and vllm_max > 0 else 0
+                if isinstance(vllm_max, int) and vllm_max > 0:
+                    num_parallel = vllm_max
+                else:
+                    # Conservative non-zero fallback until vLLM reports runtime value.
+                    num_parallel = max(1, int(lc.num_parallel or 1))
             else:
                 num_parallel = lc.num_parallel
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-workernode/logos_worker_node/lane_manager.py` around lines 1484 -
1489, The current logic sets num_parallel to 0 when vLLM's
handle.max_concurrency is missing/zero, which can temporarily mark a running
vLLM lane as having no capacity; change the fallback so that when
getattr(handle, "max_concurrency", None) is None or <= 0 you use lc.num_parallel
(or the previously known/default capacity) instead of 0. Update the block around
lc.vllm / handle.max_concurrency so: if vllm_max is a positive int use it,
otherwise set num_parallel = lc.num_parallel (preserving existing non-vLLM
behavior).

Comment on lines 303 to +317
if engine == "vllm" and kv_cache_sent_mb > 0:
measured_base = max(effective_vram_mb - kv_cache_sent_mb, 0.0)
if measured_base > 0:
profile.base_residency_mb = _ema(profile.base_residency_mb, measured_base)
profile.residency_source = "measured"
# Never let runtime measurements overwrite a calibrated
# base_residency — calibration measures on a clean GPU and
# is authoritative. Runtime measurements can be lower when
# multiple models share GPU memory.
if profile.residency_source == "calibrated":
pass # keep calibrated value
elif tp_changed or profile.base_residency_mb is None:
profile.base_residency_mb = measured_base
profile.residency_source = "measured"
else:
profile.base_residency_mb = _ema(profile.base_residency_mb, measured_base)
profile.residency_source = "measured"

⚠️ Potential issue | 🟠 Major

TP changes must invalidate calibrated residency too.

When tp_changed is true, you reset loaded_vram_mb and sleeping_residual_mb, but this branch still preserves an old "calibrated" base_residency_mb. After the TP switch, estimate_vram_mb() will keep returning the previous footprint, so placement can under-estimate VRAM and over-pack the GPU.

📏 Suggested fix
-                    if profile.residency_source == "calibrated":
-                        pass  # keep calibrated value
-                    elif tp_changed or profile.base_residency_mb is None:
+                    if tp_changed:
+                        profile.base_residency_mb = measured_base
+                        profile.residency_source = "measured"
+                    elif profile.residency_source == "calibrated":
+                        pass
+                    elif profile.base_residency_mb is None:
                         profile.base_residency_mb = measured_base
                         profile.residency_source = "measured"

If calibrated values need to survive across TP variants, they should be keyed by (model_name, tensor_parallel_size) instead of being carried over unchanged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-workernode/logos_worker_node/model_profiles.py` around lines 303
- 317, The calibrated base residency must be invalidated when tensor-parallelism
changes: update the logic around tp_changed in model_profiles.py so that when
tp_changed is true you clear profile.base_residency_mb and reset
profile.residency_source (e.g., to None or "unknown") before any runtime
measurement logic (this affects tp_changed, profile.base_residency_mb,
profile.residency_source, loaded_vram_mb, sleeping_residual_mb and the
estimate_vram_mb() behavior); alternatively implement calibration keyed by
(model_name, tensor_parallel_size) so calibrated values are stored per-TP
variant rather than reused across tp_changed events.
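The per-TP keying alternative mentioned above can be sketched like this (ResidencyProfile and ProfileStore are illustrative stand-ins, not the actual model_profiles.py types):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class ResidencyProfile:
    base_residency_mb: Optional[float] = None
    residency_source: str = "unknown"

@dataclass
class ProfileStore:
    """Calibrations keyed per (model, TP) variant, so a TP switch selects a
    fresh profile instead of reusing a stale calibrated footprint."""
    _profiles: Dict[Tuple[str, int], ResidencyProfile] = field(default_factory=dict)

    def get(self, model_name: str, tp_size: int) -> ResidencyProfile:
        return self._profiles.setdefault((model_name, tp_size), ResidencyProfile())

store = ProfileStore()
tp1 = store.get("llama-70b", 1)
tp1.base_residency_mb = 38_000.0
tp1.residency_source = "calibrated"

# Switching to TP=2 yields an uncalibrated profile; a VRAM estimator would
# fall back to measurement instead of reusing the TP=1 footprint.
tp2 = store.get("llama-70b", 2)
print(tp2.base_residency_mb, tp2.residency_source)
```

With this layout, the `tp_changed` special-casing in the update path disappears entirely, because each TP variant owns its own calibration record.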

Comment on lines +210 to +216
        if self._on_capacity_needed and provider_type == "logosnode":
            model_name = self._logosnode.get_model_name(model_id, provider_id)
            if model_name:
                asyncio.create_task(
                    self._on_capacity_needed(provider_id, model_name),
                    name=f"capacity-needed-{model_name}-{provider_id}",
                )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify this create_task call has no retention/discard pattern in the scheduler path.
sed -n '208,216p' logos/src/logos/pipeline/correcting_scheduler.py
rg -n -C2 '_background_tasks|add_done_callback|create_task' logos/src/logos/pipeline

Repository: ls1intum/edutelligence

Length of output: 1134


🏁 Script executed:

#!/bin/bash
# Get the full class definition and __init__ method
head -100 logos/src/logos/pipeline/correcting_scheduler.py | cat -n

# Also check what BaseScheduler looks like
rg -n "class BaseScheduler" logos/ -A 20

Repository: ls1intum/edutelligence

Length of output: 6086


🏁 Script executed:

#!/bin/bash
# Check if there are any existing task retention patterns in the codebase
rg -n "_background_tasks|_tasks" logos/src/logos/pipeline/ -B 2 -A 2

Repository: ls1intum/edutelligence

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the broader context around the create_task call
sed -n '1,50p' logos/src/logos/pipeline/correcting_scheduler.py | cat -n
sed -n '100,220p' logos/src/logos/pipeline/correcting_scheduler.py | cat -n

Repository: ls1intum/edutelligence

Length of output: 8198


Retain the wake-up task.

Lines 114-117 create a detached task but never keep a strong reference to it. The event loop holds only weak references to scheduled tasks, so an unreferenced task can be garbage-collected before it completes. If that task is dropped, the _on_capacity_needed callback will never fire, and the queued request can silently fall back to the slow 30-second background wake cycle this code is intended to avoid.

Store the task in a set and use add_done_callback() to clean it up:

Suggested fix
 class ClassificationCorrectingScheduler(BaseScheduler):
     def __init__(
         self,
         queue_manager,
         logosnode_facade,
         azure_facade,
         model_registry=None,
         ettft_enabled: bool = True,
         on_capacity_needed=None,
     ):
         super().__init__(queue_manager, logosnode_facade, azure_facade, model_registry, on_capacity_needed)
         self._ettft_enabled = ettft_enabled
+        self._background_tasks: set[asyncio.Task[None]] = set()
 
     async def _queue_and_wait(self, best_scored: tuple, request: SchedulingRequest) -> Optional[SchedulingResult]:
         ...
         if self._on_capacity_needed and provider_type == "logosnode":
             model_name = self._logosnode.get_model_name(model_id, provider_id)
             if model_name:
-                asyncio.create_task(
+                task = asyncio.create_task(
                     self._on_capacity_needed(provider_id, model_name),
                     name=f"capacity-needed-{model_name}-{provider_id}",
                 )
+                self._background_tasks.add(task)
+                task.add_done_callback(self._background_tasks.discard)
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 213-216: Store a reference to the return value of asyncio.create_task

(RUF006)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/correcting_scheduler.py` around lines 210 - 216, The
detached asyncio.create_task created in the block that checks
self._on_capacity_needed and provider_type == "logosnode" can be
garbage-collected before it runs; keep a strong reference by adding the Task to
a short-lived set (e.g., self._pending_capacity_tasks), and register
task.add_done_callback(lambda t: self._pending_capacity_tasks.discard(t)) so
completed tasks are removed; locate the call site around
self._on_capacity_needed and self._logosnode.get_model_name(model_id,
provider_id) and replace the bare asyncio.create_task(...) with creating the
Task, adding it to the set, and attaching the cleanup callback to ensure the
wake-up callback runs reliably.
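The retention pattern suggested above can be exercised in isolation. This sketch uses illustrative names (CapacityNotifier and the on_capacity_needed signature), not the actual scheduler classes:

```python
import asyncio

class CapacityNotifier:
    """Minimal sketch of the strong-reference pattern for fire-and-forget tasks."""

    def __init__(self) -> None:
        # The event loop holds only weak references to scheduled tasks,
        # so this set keeps each task alive until it finishes.
        self._background_tasks: set[asyncio.Task] = set()

    def notify(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        # discard() removes the task from the set once it completes.
        task.add_done_callback(self._background_tasks.discard)
        return task

async def main() -> list[str]:
    fired: list[str] = []

    async def on_capacity_needed(provider_id: str, model: str) -> None:
        await asyncio.sleep(0)
        fired.append(f"{provider_id}:{model}")

    notifier = CapacityNotifier()
    notifier.notify(on_capacity_needed("p1", "llama-3"))
    await asyncio.sleep(0.01)
    # The done callback has already pruned the finished task.
    assert not notifier._background_tasks
    return fired

print(asyncio.run(main()))
```

The same set can hold tasks from many call sites, so one `_background_tasks` attribute per scheduler instance is enough.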

Comment on lines +56 to +60
# True when capacity slot was transferred from a completing request
# (release path with reuse_slot=True). False when dispatched fresh
# (reevaluate_model_queues after load/wake). Controls whether
# on_request_begin_processing should increment the active count.
slot_transferred: bool = True

⚠️ Potential issue | 🟠 Major

Defaulting slot_transferred to True can misclassify normal schedules.

Line 60 creates a silent behavior change: constructors that don’t set this field (e.g., the provided base_scheduler.py construction paths) now look like slot transfers, which can suppress active-count increments.

Proposed fix
-    slot_transferred: bool = True
+    slot_transferred: bool = False

Then set slot_transferred=True only in explicit release/reuse transfer paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/scheduler_interface.py` around lines 56 - 60, The
field slot_transferred on the scheduler request/slot object is defaulting to
True and thus misclassifies normal dispatches as transferred slots; change the
default to False on the declaration in scheduler_interface.py (slot_transferred:
bool = False) and then explicitly set slot_transferred=True only at the
release/reuse transfer call sites in BaseScheduler (e.g., in base_scheduler.py
where reuse_slot or release paths perform slot handoff) so normal construction
paths do not suppress active-count increments.

Comment on lines +342 to +355
def build_poisson_offsets(
    duration_ms: int,
    total_requests: int,
    rng: random.Random,
) -> list[int]:
    """Exponential (memoryless) inter-arrival times, capped to the window."""
    mean_gap_ms = duration_ms / total_requests
    offsets: list[int] = []
    t = 0.0
    for _ in range(total_requests):
        # Exponential inter-arrival: -mean * ln(U), U ~ Uniform(0, 1)
        t += -mean_gap_ms * math.log(max(rng.random(), 1e-12))
        offsets.append(min(int(t), duration_ms - 1))
    return sorted(offsets)

⚠️ Potential issue | 🟠 Major

The fixed-size “Poisson” generator currently biases traffic into the last millisecond.

Here total_requests is fixed up front, so the arrival times should be the sorted order statistics of uniform samples over the window. The current exponential walk plus clamping to duration_ms - 1 will bunch any overshoot tail at the end, which changes the workload shape you are trying to benchmark.

💡 Suggested fix
 def build_poisson_offsets(
     duration_ms: int,
     total_requests: int,
     rng: random.Random,
 ) -> list[int]:
-    """Exponential (memoryless) inter-arrival times, capped to the window."""
-    mean_gap_ms = duration_ms / total_requests
-    offsets: list[int] = []
-    t = 0.0
-    for _ in range(total_requests):
-        # Exponential inter-arrival: -mean * ln(U), U ~ Uniform(0, 1)
-        t += -mean_gap_ms * math.log(max(rng.random(), 1e-12))
-        offsets.append(min(int(t), duration_ms - 1))
-    return sorted(offsets)
+    """Arrival times for a fixed-size Poisson sample over the benchmark window."""
+    if total_requests <= 0:
+        return []
+    return sorted(rng.randrange(duration_ms) for _ in range(total_requests))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/generate_coder_dual_model_workloads.py` around lines
342 - 355, The current build_poisson_offsets function biases arrivals into the
last millisecond by using an exponential walk with clamping; instead, generate
the arrival times as the sorted order statistics of uniform samples across the
window: draw total_requests samples via rng.random() * duration_ms, sort them,
convert to ints and clamp each to the range [0, duration_ms - 1] (or use
min(int(t), duration_ms - 1)), and return that list; update the body of
build_poisson_offsets accordingly so it produces sorted uniform offsets rather
than an exponential walk with tail-clamping.
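Under the review's premise that total_requests is fixed up front, the uniform-order-statistics version can be sketched next to the original walk (function names follow the snippet above; the comparison harness is illustrative):

```python
import math
import random

def build_poisson_offsets(duration_ms: int, total_requests: int,
                          rng: random.Random) -> list[int]:
    """Fixed-count arrivals: sorted order statistics of uniform draws."""
    if total_requests <= 0 or duration_ms <= 0:
        return []
    return sorted(rng.randrange(duration_ms) for _ in range(total_requests))

def build_clamped_walk(duration_ms: int, total_requests: int,
                       rng: random.Random) -> list[int]:
    """The original exponential walk; any overshoot is clamped, bunching
    the tail of the sample at duration_ms - 1."""
    mean_gap_ms = duration_ms / total_requests
    t, offsets = 0.0, []
    for _ in range(total_requests):
        t += -mean_gap_ms * math.log(max(rng.random(), 1e-12))
        offsets.append(min(int(t), duration_ms - 1))
    return sorted(offsets)

uniform = build_poisson_offsets(60_000, 500, random.Random(0))
walk = build_clamped_walk(60_000, 500, random.Random(0))
# How many arrivals land in the very last millisecond under each scheme.
print(sum(o == 59_999 for o in walk), sum(o == 59_999 for o in uniform))
```

The uniform version relies on the standard result that, conditioned on the number of Poisson events in a window, the event times are i.i.d. uniform over that window.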

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

⚠️ Potential issue | 🟡 Minor

This new script still has a few lint-blocking leftovers.

Line 28 imports matplotlib.ticker but never uses it, Line 114 keeps two unused histogram outputs, and Line 530 has an f prefix with no interpolation. If Ruff/Flake8 are part of CI, this file will fail as written.

Also applies to: 114-114, 530-530

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 28-28: 'matplotlib.ticker' imported but unused

(F401)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` at line 28, Remove
the unused import "matplotlib.ticker as ticker" (delete that import), stop
capturing unused outputs from the histogram calls by changing any assignments
that unpack plt.hist(...) into unused placeholders or by calling plt.hist(...)
without assignment, and remove the stray f-prefix from the plain string literal
(the f'...' that has no interpolation) so it is a normal string; target the
occurrences around the plt.hist calls and the f-prefixed string in this script
to fix the lint errors.

Comment on lines +448 to +457
    # Compute time offset for timeline
    arrival_offsets = []
    for r in ok_records:
        rid = r.get("request_id", "")
        # Try to extract from workload; fallback to sequential index
        arrival_offsets.append(r)
    # Use queue_wait_ms + processing_ms to infer relative time, or just use index
    for i, r in enumerate(ok_records):
        r["_offset_s"] = i  # will be overridden below if we have arrival data


⚠️ Potential issue | 🟠 Major

Use the recorded arrival_offset for the timeline.

The detailed CSV already contains arrival_offset, but this block discards it and replaces the x-axis with i * 0.5. That makes “seconds into benchmark” mean record order instead of actual benchmark time, so burst spacing and scheduler timing will be misrepresented.

💡 Suggested fix
-    # Compute time offset for timeline
-    arrival_offsets = []
-    for r in ok_records:
-        rid = r.get("request_id", "")
-        # Try to extract from workload; fallback to sequential index
-        arrival_offsets.append(r)
-    # Use queue_wait_ms + processing_ms to infer relative time, or just use index
     for i, r in enumerate(ok_records):
-        r["_offset_s"] = i  # will be overridden below if we have arrival data
+        arrival_offset = r.get("arrival_offset")
+        if isinstance(arrival_offset, (int, float)) and arrival_offset >= 0:
+            r["_offset_s"] = arrival_offset / 1000.0
+        else:
+            r["_offset_s"] = float(i)
-        r["_offset_s"] = i * 0.5  # rough timeline spacing

Also applies to: 502-502

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 451-451: local variable 'rid' is assigned to but never used

(F841)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 448 -
457, The timeline code is discarding each record's recorded arrival_offset and
forcing offsets by index; update the block that builds arrival_offsets/sets
r["_offset_s"] (handling ok_records and r entries) to prefer the existing
r.get("arrival_offset") (convert to seconds if it's in ms) and set
r["_offset_s"] from that when present; only fall back to computing from
queue_wait_ms + processing_ms or the sequential index (i * 0.5) when
arrival_offset is missing, so burst spacing and scheduler timing are preserved
in the plotted timeline.

@wasnertobias wasnertobias changed the title Logos- UI Enhancements and Fair lane Eviction Algo Logos: UI Enhancements and Fair lane Eviction Algo Apr 8, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (4)
logos/logos-workernode/logos_worker_node/vllm_process.py (1)

1014-1021: ⚠️ Potential issue | 🟠 Major

Reset cached max concurrency before each respawn.

_stream_logs() only parses this value while _max_concurrency is None. Without resetting it in spawn(), a restarted lane can keep stale concurrency and report wrong capacity.

🔧 Proposed fix
 async def spawn(self, lane_config: LaneConfig) -> ProcessStatus:
@@
-        self._recent_logs.clear()
+        self._recent_logs.clear()
+        self._max_concurrency = None
         self._lane_config = lane_config
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-workernode/logos_worker_node/vllm_process.py` around lines 1014 -
1021, Reset the cached max concurrency before each respawn by setting
self._max_concurrency = None at the start of the spawn() method (or immediately
before the process is launched) so that _stream_logs() will re-parse and update
the value for restarted lanes; reference the _stream_logs() method and the
_max_concurrency attribute when making the change.
logos/logos-workernode/logos_worker_node/lane_manager.py (1)

1492-1497: ⚠️ Potential issue | 🟠 Major

Avoid num_parallel=0 fallback while vLLM concurrency is still warming up.

When max_concurrency is not parsed yet, this can briefly advertise zero capacity for an otherwise healthy lane.

🔧 Proposed fix
             if lc.vllm:
                 # Use vLLM-reported max concurrency (KV-budget-derived) when available.
                 vllm_max = getattr(handle, "max_concurrency", None)
-                num_parallel = vllm_max if vllm_max and vllm_max > 0 else 0
+                if isinstance(vllm_max, int) and vllm_max > 0:
+                    num_parallel = vllm_max
+                else:
+                    num_parallel = max(1, int(lc.num_parallel or 1))
             else:
                 num_parallel = lc.num_parallel
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/logos-workernode/logos_worker_node/lane_manager.py` around lines 1492 -
1497, The current logic sets num_parallel to 0 when a vLLM handle lacks a parsed
max_concurrency, which temporarily advertises zero capacity; change the fallback
so that when lc.vllm is true and getattr(handle, "max_concurrency", None) is
None or non-positive, you do not set num_parallel to 0 but instead fall back to
using lc.num_parallel (or leave it unset/None to be handled by downstream
capacity logic); update the branch that inspects lc.vllm,
handle.max_concurrency, and assigns num_parallel to use lc.num_parallel as the
safe default instead of 0 so warming vLLM lanes aren’t advertised as
capacity-zero.
logos/src/logos/main.py (2)

3353-3374: ⚠️ Potential issue | 🟠 Major

Validate pagination input with a Pydantic request model and return 400 on bad input.

At lines 3369-3370, direct int(...) coercion still throws on invalid values ("foo", null) and returns 500 instead of a client error. Also, this new route bypasses the required request-model validation pattern.

💡 Suggested fix
 `@app.post`("/logosdb/paginated_requests", tags=["admin"])
 async def paginated_requests_endpoint(request: Request):
@@
-    try:
-        body = await request.json()
-    except Exception:
-        body = {}
-
-    if not isinstance(body, dict):
-        body = {}
-
-    page = int(body.get("page", 1))
-    per_page = int(body.get("per_page", 20))
+    try:
+        body = await request.json()
+    except json.JSONDecodeError:
+        raise HTTPException(status_code=400, detail="Invalid JSON body")
+    if body is None:
+        body = {}
+    if not isinstance(body, dict):
+        raise HTTPException(status_code=400, detail="JSON payload must be an object")
+
+    try:
+        parsed = PaginatedRequestsRequest(**body)  # define in logos.dbutils.dbrequest
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc))
+    page = parsed.page
+    per_page = parsed.per_page

As per coding guidelines "Use Pydantic request models from src/logos/dbutils/dbrequest.py for all new API request validation" and "Raise HTTPException with appropriate status codes for API error handling in routes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/main.py` around lines 3353 - 3374, The endpoint
paginated_requests_endpoint currently coerces page and per_page with int(...)
which can raise and cause 500s; instead, import and use the Pydantic request
model from src/logos/dbutils/dbrequest.py (the proper request model for
paginated requests) to validate the body at the top of
paginated_requests_endpoint, catch pydantic.ValidationError and raise
fastapi.HTTPException(status_code=400, detail=...) for bad input, then read
validated.page/validated.per_page and pass them to
db.get_paginated_requests(logos_key, page=..., per_page=...); ensure
non-dict/malformed JSON also results in HTTPException 400 rather than falling
back to defaults.

352-367: ⚠️ Potential issue | 🟠 Major

Use _safe_float() for worker-supplied device memory fields.

At lines 358-360 and 771-773, float(...) can raise on non-numeric runtime payloads and turn these endpoints into 500s. This is duplicated in both normalization blocks.

💡 Suggested fix
-            "memory_used_mb": float(d.get("memory_used_mb") or 0.0),
-            "memory_total_mb": float(d.get("memory_total_mb") or 0.0),
-            "memory_free_mb": float(d.get("memory_free_mb") or 0.0),
+            "memory_used_mb": _safe_float(d.get("memory_used_mb")) or 0.0,
+            "memory_total_mb": _safe_float(d.get("memory_total_mb")) or 0.0,
+            "memory_free_mb": _safe_float(d.get("memory_free_mb")) or 0.0,

Apply the same replacement in both changed comprehensions.

Also applies to: 762-781

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/main.py` around lines 352 - 367, The device normalization
comprehension that builds provider_signals["devices"] uses float(d.get(...)) for
memory_used_mb, memory_total_mb and memory_free_mb which can raise on malformed
inputs; update those three fields to use the existing _safe_float(d.get(...))
helper instead (replace float(d.get("memory_used_mb") or 0.0) etc. with
_safe_float(d.get("memory_used_mb"))), and make the identical change in the
other device-normalization comprehension later in the file (the second block
that also constructs device dicts around lines handling worker-supplied devices)
so both normalization blocks consistently use _safe_float for memory fields.
🧹 Nitpick comments (1)
logos/src/logos/capacity/capacity_planner.py (1)

1138-1145: Avoid silent except Exception: pass in switch metrics path

Swallowing all exceptions here removes signal when metric emission breaks and makes thrash telemetry debugging hard.

🔧 Proposed fix
-            except Exception:
-                pass
+            except Exception:
+                logger.debug(
+                    "Failed to emit capacity planner switch metrics",
+                    exc_info=True,
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 1138 - 1145, The
current try/except around importing and emitting metrics for
prom.CAPACITY_PLANNER_SWITCHES_TOTAL and
prom.CAPACITY_PLANNER_SWITCH_GAP_SECONDS silently swallows all errors; replace
the bare except with targeted handling that logs the exception instead of
suppressing it. Specifically, in capacity_planner.py around the block that
references logos.monitoring.prometheus_metrics, catch ImportError (for the
import) and Exception as e for metric emission failures and call the module or
instance logger (e.g., self.logger or a module-level logger) with
logger.exception or logger.error including the exception details and context
("failed to emit capacity planner switch metrics"), rather than using "except
Exception: pass"; ensure you still increment CAPACITY_PLANNER_SWITCHES_TOTAL and
observe gap only when prom is present and valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@logos/src/logos/capacity/capacity_planner.py`:
- Around line 2625-2635: The check in _is_reclaim_sleep only looks for the
string "Request-time reclaim" but the CapacityPlanAction reason uses
"Request-time drain+sleep …", so reclaim-sleep actions bypass the
drain/cold-mark safety path; update _is_reclaim_sleep to detect the request-time
reclaim pattern used in emitted actions (e.g., match
reason.startswith("Request-time") or allow both "Request-time reclaim" and
"Request-time drain+sleep") and/or ensure that when creating CapacityPlanAction
entries for sleep_l1 (see the CapacityPlanAction with action="sleep_l1" and
reason="Request-time drain+sleep ...") you also invoke the same pre-mark/call to
_drain_lane that the reclaim path uses so those sleeps go through the
drain+cold-mark safety flow.
- Around line 179-180: The thrashing logic currently uses a single global list
self._switch_timestamps which makes thrash detection cross-provider; change
self._switch_timestamps to a dict mapping provider_id → list[float] and update
all thrash-related methods (_is_thrashing, _get_effective_tenure and any
callers) to accept a provider_id and read/write timestamps from
self._switch_timestamps[provider_id]; ensure when recording a switch you append
to the per-provider list and when evaluating tenure you only consider that
provider's timestamps (also apply the same per-provider change to the other
occurrences of thrash logic referenced in the file, e.g., the code around
_is_thrashing/_get_effective_tenure call sites).
- Around line 1173-1178: The parameter type for lanes and the return type for
_wait_for_provider are missing; update _get_queue_depth_for_model to use a
concrete typed collection (e.g., lanes: Sequence[Lane] or lanes: List[Lane]) and
add an explicit return annotation for _wait_for_provider (likely -> None if it
only waits) and apply the same lanes typing change at the other occurrence noted
(around lines 2234-2235); also import the necessary typing symbols (List or
Sequence) and the Lane type (or use a forward reference) so all new helper
signatures are fully typed and consistent with project guidelines.
- Around line 118-123: The inline block comment under WAKE_PER_GPU_SAFETY_MARGIN
(and similarly the wrapped lines for TP_RANK0_VRAM_FRACTION) is indented like a
code continuation and triggers Flake8 E114/E116; fix it by reflowing the wrapped
comment lines so they align with the first comment marker (i.e., each wrapped
line begins with the same "# " column as the initial comment) or by shortening
the comment lines so they don't wrap, keeping the comments directly above the
constants WAKE_PER_GPU_SAFETY_MARGIN and TP_RANK0_VRAM_FRACTION and preserving
the explanatory text without extra leading indentation.

---

Duplicate comments:
In `@logos/logos-workernode/logos_worker_node/lane_manager.py`:
- Around line 1492-1497: The current logic sets num_parallel to 0 when a vLLM
handle lacks a parsed max_concurrency, which temporarily advertises zero
capacity; change the fallback so that when lc.vllm is true and getattr(handle,
"max_concurrency", None) is None or non-positive, you do not set num_parallel to
0 but instead fall back to using lc.num_parallel (or leave it unset/None to be
handled by downstream capacity logic); update the branch that inspects lc.vllm,
handle.max_concurrency, and assigns num_parallel to use lc.num_parallel as the
safe default instead of 0 so warming vLLM lanes aren’t advertised as
capacity-zero.

In `@logos/logos-workernode/logos_worker_node/vllm_process.py`:
- Around line 1014-1021: Reset the cached max concurrency before each respawn by
setting self._max_concurrency = None at the start of the spawn() method (or
immediately before the process is launched) so that _stream_logs() will re-parse
and update the value for restarted lanes; reference the _stream_logs() method
and the _max_concurrency attribute when making the change.

In `@logos/src/logos/main.py`:
- Around line 3353-3374: The endpoint paginated_requests_endpoint currently
coerces page and per_page with int(...) which can raise and cause 500s; instead,
import and use the Pydantic request model from src/logos/dbutils/dbrequest.py
(the proper request model for paginated requests) to validate the body at the
top of paginated_requests_endpoint, catch pydantic.ValidationError and raise
fastapi.HTTPException(status_code=400, detail=...) for bad input, then read
validated.page/validated.per_page and pass them to
db.get_paginated_requests(logos_key, page=..., per_page=...); ensure
non-dict/malformed JSON also results in HTTPException 400 rather than falling
back to defaults.
- Around line 352-367: The device normalization comprehension that builds
provider_signals["devices"] uses float(d.get(...)) for memory_used_mb,
memory_total_mb and memory_free_mb which can raise on malformed inputs; update
those three fields to use the existing _safe_float(d.get(...)) helper instead
(replace float(d.get("memory_used_mb") or 0.0) etc. with
_safe_float(d.get("memory_used_mb"))), and make the identical change in the
other device-normalization comprehension later in the file (the second block
that also constructs device dicts around lines handling worker-supplied devices)
so both normalization blocks consistently use _safe_float for memory fields.

---

Nitpick comments:
In `@logos/src/logos/capacity/capacity_planner.py`:
- Around line 1138-1145: The current try/except around importing and emitting
metrics for prom.CAPACITY_PLANNER_SWITCHES_TOTAL and
prom.CAPACITY_PLANNER_SWITCH_GAP_SECONDS silently swallows all errors; replace
the bare except with targeted handling that logs the exception instead of
suppressing it. Specifically, in capacity_planner.py around the block that
references logos.monitoring.prometheus_metrics, catch ImportError (for the
import) and Exception as e for metric emission failures and call the module or
instance logger (e.g., self.logger or a module-level logger) with
logger.exception or logger.error including the exception details and context
("failed to emit capacity planner switch metrics"), rather than using "except
Exception: pass"; ensure you still increment CAPACITY_PLANNER_SWITCHES_TOTAL and
observe gap only when prom is present and valid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f963b768-b209-4906-8486-a3a4a90458ac

📥 Commits

Reviewing files that changed from the base of the PR and between f57d274 and c30a40a.

📒 Files selected for processing (8)
  • logos/logos-workernode/docker-compose.dev.yml
  • logos/logos-workernode/logos_worker_node/lane_manager.py
  • logos/logos-workernode/logos_worker_node/vllm_process.py
  • logos/src/logos/capacity/capacity_planner.py
  • logos/src/logos/dbutils/dbmanager.py
  • logos/src/logos/main.py
  • logos/src/logos/pipeline/context_resolver.py
  • logos/src/logos/sdi/logosnode_facade.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • logos/logos-workernode/docker-compose.dev.yml
  • logos/src/logos/pipeline/context_resolver.py
  • logos/src/logos/sdi/logosnode_facade.py
  • logos/src/logos/dbutils/dbmanager.py

Comment thread logos/src/logos/capacity/capacity_planner.py
Comment thread logos/src/logos/capacity/capacity_planner.py Outdated
Comment thread logos/src/logos/capacity/capacity_planner.py
Comment thread logos/src/logos/capacity/capacity_planner.py Outdated
JakubJakubczyk and others added 2 commits April 9, 2026 22:12
…y lock

Forward reservations caused deadlocks when concurrent ensure_capacity
calls each added reservations that blocked each other — committed VRAM
ballooned to 48GB+ (exceeding total 32GB), driving available negative.

Replace with per-provider asyncio.Lock that serializes capacity
operations so only one reclaim/wake/load runs at a time per provider.
Benchmark result: 147/150 success (98%), 0 cold starts, 0 lock timeouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…capacity retry

Fixes six bugs in the capacity planner identified during deep architecture analysis:

1. Demand-gated idle reclaim: prevents low-demand targets from stealing VRAM
   from models with higher scheduler queue depth (10 activations in 150-req benchmark)
2. Scheduler queue checks in idle sleep and eviction: prevents sleeping/evicting
   lanes that have requests queued in the Logos scheduler
3. Pending capacity retry: re-attempts failed fire-and-forget capacity triggers
   when reclaim actions confirm and free VRAM
4. Dead code removal: removes vestigial _record_switch_event/_is_thrashing
   (~50 lines), preserves Prometheus counter inline
5. Multi-lane busy drain: removes per-lane VRAM feasibility gate that blocked
   draining two lanes when neither alone freed enough VRAM (6 drain+sleep
   actions in benchmark)
6. Fixes drain+sleep reason string to match drain gate substring check,
   ensuring busy lanes go through mark_cold + drain_lane path

Benchmark: 148/150 success (98.7%), 24 model switches, 10 demand gate blocks,
zero dead code references, zero thrashing incidents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (2)
logos/src/logos/capacity/capacity_planner.py (2)

1299-1304: ⚠️ Potential issue | 🟡 Minor

Complete the helper type hints (still incomplete).

lanes: list and missing return type on _wait_for_provider are still untyped/partial signatures.

As per coding guidelines, "Use type hints consistently throughout all function signatures and class attributes."

Also applies to: 2403-2403

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 1299 - 1304, The
signatures for _get_queue_depth_for_model and _wait_for_provider lack full type
hints; update _get_queue_depth_for_model to use a concrete type for lanes (e.g.,
List[Lane] or List[str]/List[int] as appropriate) and import typing List (or
Sequence) and the Lane type if defined, and add the correct return type for
_wait_for_provider (e.g., -> Optional[Provider] or -> None or -> bool depending
on its behavior). Ensure you update the function annotations for
_get_queue_depth_for_model (lanes: List[...]) and the _wait_for_provider
signature to include the precise return type, and run type checks to confirm no
other callers need adjusted annotations.

116-121: ⚠️ Potential issue | 🟡 Minor

Flake8 comment-indentation issue is still present.

The wrapped constant comments are still indented like continuation code, which keeps triggering E114/E116.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 116 - 121, The
block-level comments after the constant assignments WAKE_PER_GPU_SAFETY_MARGIN
and TP_RANK0_VRAM_FRACTION are indented as if they were continuation lines,
triggering Flake8 E114/E116; fix by left-aligning/dedenting each wrapped comment
line to the same column as the constant assignment (so the comment text starts
at the same indentation as the variable, not as a continuation), ensuring all
wrapped lines for both WAKE_PER_GPU_SAFETY_MARGIN and TP_RANK0_VRAM_FRACTION are
flush with the assignment rather than indented.
🧹 Nitpick comments (3)
logos/src/logos/capacity/vram_ledger.py (1)

129-144: Duplicate O(n) iteration on every release.

The check any(r.provider_id == res.provider_id for r in self._reservations.values()) iterates all reservations twice per release() call (lines 131 and 141). Since releases can be frequent, consider computing the result once:

♻️ Suggested optimization
         self._provider_committed[res.provider_id] = committed - res.vram_mb
-        # Clamp only when no reservations remain for this provider (floating-point cleanup)
-        if not any(r.provider_id == res.provider_id for r in self._reservations.values()):
+        # Clamp only when no reservations remain for this provider (floating-point cleanup)
+        provider_has_remaining = any(r.provider_id == res.provider_id for r in self._reservations.values())
+        if not provider_has_remaining:
             self._provider_committed[res.provider_id] = max(0.0, self._provider_committed[res.provider_id])
         # Release per-GPU committed
         if res.gpu_devices:
             per_gpu = res.vram_mb / len(res.gpu_devices)
             for dev in res.gpu_devices:
                 key = (res.provider_id, dev)
                 old = self._gpu_committed.get(key, 0.0)
                 self._gpu_committed[key] = old - per_gpu
             # Clamp per-GPU only when no reservations remain for this provider
-            if not any(r.provider_id == res.provider_id for r in self._reservations.values()):
+            if not provider_has_remaining:
                 for dev in res.gpu_devices:
                     key = (res.provider_id, dev)
                     self._gpu_committed[key] = max(0.0, self._gpu_committed[key])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/vram_ledger.py` around lines 129 - 144, The release
logic performs the same O(n) scan of self._reservations twice using
any(r.provider_id == res.provider_id ...) which is wasteful; in the release()
function compute a single boolean (e.g., has_remaining = any(r.provider_id ==
res.provider_id for r in self._reservations.values())) before adjusting
self._provider_committed and self._gpu_committed and reuse that flag for both
the provider-level clamp and the per-GPU clamp so the reservations are only
iterated once per release; update references to self._provider_committed,
self._gpu_committed, _reservations and res.provider_id accordingly.
logos/src/logos/pipeline/base_scheduler.py (2)

304-307: Consider narrowing the exception type.

Catching bare Exception for get_parallel_capacity() could hide unexpected errors. If specific exceptions are expected (e.g., KeyError, ValueError), catch those explicitly.

♻️ Suggested refinement
             try:
                 max_capacity, _ = self._logosnode.get_parallel_capacity(model_id, provider_id)
-            except (KeyError, Exception):
+            except (KeyError, ValueError):
                 max_capacity = 1

Alternatively, if truly any exception should fall back to max_capacity = 1, add a debug log to aid troubleshooting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/base_scheduler.py` around lines 304 - 307, The
current except block around self._logosnode.get_parallel_capacity(model_id,
provider_id) catches bare Exception which can mask unexpected errors; update the
handler in base_scheduler.py to catch specific expected exceptions (e.g.,
KeyError, ValueError) and set max_capacity = 1 for those, and if you must keep a
broad fallback also log the exception (using the module logger or self._logger)
at debug/error level before setting max_capacity = 1 so unexpected failures are
visible; refer to get_parallel_capacity, max_capacity, and self._logosnode to
locate the change.

32-41: Add type annotation for on_capacity_needed.

The parameter lacks a type hint. Per coding guidelines, type hints should be used consistently throughout function signatures.

♻️ Suggested fix
+from typing import Callable, Awaitable
+
 def __init__(
     self,
     queue_manager: PriorityQueueManager,
     logosnode_facade: LogosNodeSchedulingDataFacade,
     azure_facade: AzureSchedulingDataFacade,
     model_registry: Dict[tuple[int, int], str] | None = None,
-    on_capacity_needed=None,
+    on_capacity_needed: Callable[[int, str], Awaitable[None]] | None = None,
 ):

As per coding guidelines: "Use type hints consistently throughout all function signatures and class attributes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/base_scheduler.py` around lines 32 - 41, The
__init__ parameter on_capacity_needed is missing a type hint; add an annotation
like on_capacity_needed: Optional[Callable[[str, str], Awaitable[None]]] (or
Optional[Callable[[str, str], None]] if it's synchronous) and import the
necessary typing symbols (Optional, Callable, Awaitable) at the top of
logos/src/logos/pipeline/base_scheduler.py, and keep assigning it to
self._on_capacity_needed as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@logos/src/logos/capacity/capacity_planner.py`:
- Around line 2526-2534: The loop in ensure_capacity that uses "while True"
(referencing provider_id, target.model_name, and deadline) can busy-spin while
holding the provider lock because several continue paths re-enter the loop
without sleeping; update the logic inside ensure_capacity to avoid holding the
lock during tight retries—either release the provider lock before any
retry/continue and re-acquire it after a short backoff or insert an await/sleep
(e.g., time.sleep or asyncio.sleep consistent with the surrounding code) before
continuing; apply the same change to the similar retry loop around lines
handling the second instance (the block noted around 2835-2853) so no retry path
re-enters the while loop while holding the lock.
- Line 2526: Indentation in capacity_planner.py contains non-4-space indents
(e.g., the "while True:" loop and other occurrences), causing Flake8 E111/E114;
fix by normalizing those blocks to use 4-space indentation consistently so the
"while True:" line and its inner block align with surrounding code. Locate the
offending "while True:" and the other listed occurrences and replace tabs or
wrong-space counts with standard 4-space indents for the loop body and any
nested blocks so linting passes.
- Around line 615-617: The generator expression assigning current_target uses
the ambiguous single-letter variable l; change that to a descriptive name like
lane_signal (e.g., next((lane_signal for lane_signal in current_lanes if
lane_signal.lane_id == target.lane_id), None)) so it passes E741; apply the same
rename to the other generator/comprehension occurrences flagged (the ones
iterating over current_lanes or similar collections that use `l`) around the
later block where target.lane_id is compared (previously using `l`) to ensure
consistent, descriptive naming across capacity_planner.py.
- Around line 179-183: _pending_capacity is currently keyed only by model_name
and can be clobbered across providers; change its keying to be provider-scoped
(e.g., use key (provider_id, model_name) or a nested dict provider_id →
model_name → (registered_at)) and update the type annotation from dict[str,
tuple[int, float]] accordingly; then update every read/write/clear of
_pending_capacity in this module (places that set, check, or pop entries) to use
the new provider-scoped key so retries and reclaims are correctly isolated per
provider.
- Around line 928-932: The current try/except around the prometheus increment
(prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc()) swallows errors; replace the bare
except/pass with targeted error handling that logs the exception so
instrumentation failures are visible: catch import/attribute/runtime exceptions
(or catch Exception as e) and call a module logger (e.g.,
logging.getLogger(__name__) or the existing class logger) with logger.exception
or logger.error including context like "failed to increment
CAPACITY_PLANNER_SWITCHES_TOTAL" and the exception details; do this around the
import and the prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc() call so failures are
recorded instead of silently ignored.
- Around line 1358-1360: The fire-and-forget task created for
prepare_lane_for_request(provider_id, m, timeout_seconds=30.0) is untracked and
can swallow exceptions; change the call site to track the Task (e.g., add it to
a persistent container like self._background_tasks or similar), attach a
done-callback that logs/exposes exceptions, and remove the Task from the
container when complete; ensure any tracking container is created on the class
(initialize self._background_tasks if missing) and that the done-callback
records failures via the existing logger so retries/failures aren’t silent.

In `@logos/src/logos/pipeline/base_scheduler.py`:
- Around line 292-296: The catch-all except in the loop around
self._logosnode.is_model_lane_ready(model_id, provider_id) silently swallows
errors; modify the except to log the exception (at debug or appropriate level)
before continuing so connectivity or facade errors are visible—use the existing
logger (e.g., self._logger or module logger) and include context (model_id,
provider_id) and the exception when handling failures in the is_model_lane_ready
call.

In `@logos/tests/unit/capacity/test_capacity_planner.py`:
- Around line 2603-2622: The test function
test_pending_capacity_retry_on_reclaim_confirm should not use the
`@pytest.mark.asyncio` decorator; remove that decorator from the test definition
so the test runs under the project's asyncio_mode="auto". Ensure the async test
signature remains (async def test_pending_capacity_retry_on_reclaim_confirm():)
and keep the rest of the test logic that interacts with _make_planner,
planner._pending_capacity, planner.prepare_lane_for_request (mock
_mock_prepare), and planner._retry_pending_capacity(10) unchanged.

---

Duplicate comments:
In `@logos/src/logos/capacity/capacity_planner.py`:
- Around line 1299-1304: The signatures for _get_queue_depth_for_model and
_wait_for_provider lack full type hints; update _get_queue_depth_for_model to
use a concrete type for lanes (e.g., List[Lane] or List[str]/List[int] as
appropriate) and import typing List (or Sequence) and the Lane type if defined,
and add the correct return type for _wait_for_provider (e.g., ->
Optional[Provider] or -> None or -> bool depending on its behavior). Ensure you
update the function annotations for _get_queue_depth_for_model (lanes:
List[...]) and the _wait_for_provider signature to include the precise return
type, and run type checks to confirm no other callers need adjusted annotations.
- Around line 116-121: The block-level comments after the constant assignments
WAKE_PER_GPU_SAFETY_MARGIN and TP_RANK0_VRAM_FRACTION are indented as if they
were continuation lines, triggering Flake8 E114/E116; fix by
left-aligning/dedenting each wrapped comment line to the same column as the
constant assignment (so the comment text starts at the same indentation as the
variable, not as a continuation), ensuring all wrapped lines for both
WAKE_PER_GPU_SAFETY_MARGIN and TP_RANK0_VRAM_FRACTION are flush with the
assignment rather than indented.

---

Nitpick comments:
In `@logos/src/logos/capacity/vram_ledger.py`:
- Around line 129-144: The release logic performs the same O(n) scan of
self._reservations twice using any(r.provider_id == res.provider_id ...) which
is wasteful; in the release() function compute a single boolean (e.g.,
has_remaining = any(r.provider_id == res.provider_id for r in
self._reservations.values())) before adjusting self._provider_committed and
self._gpu_committed and reuse that flag for both the provider-level clamp and
the per-GPU clamp so the reservations are only iterated once per release; update
references to self._provider_committed, self._gpu_committed, _reservations and
res.provider_id accordingly.

In `@logos/src/logos/pipeline/base_scheduler.py`:
- Around line 304-307: The current except block around
self._logosnode.get_parallel_capacity(model_id, provider_id) catches bare
Exception which can mask unexpected errors; update the handler in
base_scheduler.py to catch specific expected exceptions (e.g., KeyError,
ValueError) and set max_capacity = 1 for those, and if you must keep a broad
fallback also log the exception (using the module logger or self._logger) at
debug/error level before setting max_capacity = 1 so unexpected failures are
visible; refer to get_parallel_capacity, max_capacity, and self._logosnode to
locate the change.
- Around line 32-41: The __init__ parameter on_capacity_needed is missing a type
hint; add an annotation like on_capacity_needed: Optional[Callable[[str, str],
Awaitable[None]]] (or Optional[Callable[[str, str], None]] if it's synchronous)
and import the necessary typing symbols (Optional, Callable, Awaitable) at the
top of logos/src/logos/pipeline/base_scheduler.py, and keep assigning it to
self._on_capacity_needed as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c1719eea-8a12-4640-b2fc-c7d40d4e11a9

📥 Commits

Reviewing files that changed from the base of the PR and between c30a40a and 74852e7.

📒 Files selected for processing (5)
  • logos/src/logos/capacity/capacity_planner.py
  • logos/src/logos/capacity/vram_ledger.py
  • logos/src/logos/pipeline/base_scheduler.py
  • logos/tests/performance/generate_explicit_model_workloads.py
  • logos/tests/unit/capacity/test_capacity_planner.py

Comment on lines +179 to +183
# Pending capacity: models whose fire-and-forget capacity trigger failed.
# Re-attempted when any reclaim action confirms (freed VRAM may suffice).
# Key: model_name → (provider_id, registered_at)
self._pending_capacity: dict[str, tuple[int, float]] = {}


⚠️ Potential issue | 🟠 Major

Scope pending-capacity keys by provider to avoid cross-provider overwrite.

_pending_capacity is keyed by model_name only, so two providers failing capacity prep for the same model will overwrite each other and one retry path is lost.

🔧 Proposed fix
-        # Key: model_name → (provider_id, registered_at)
-        self._pending_capacity: dict[str, tuple[int, float]] = {}
+        # Key: (provider_id, model_name) → registered_at
+        self._pending_capacity: dict[tuple[int, str], float] = {}

@@
-                self._pending_capacity[model_name] = (provider_id, time.time())
+                self._pending_capacity[(provider_id, model_name)] = time.time()

@@
-                self._pending_capacity[model_name] = (provider_id, time.time())
+                self._pending_capacity[(provider_id, model_name)] = time.time()

@@
-        to_retry: list[str] = []
-        stale: list[str] = []
+        to_retry: list[tuple[int, str]] = []
+        stale: list[tuple[int, str]] = []

-        for model_name, (pid, registered_at) in list(self._pending_capacity.items()):
-            if pid != provider_id:
+        for (pid, model_name), registered_at in list(self._pending_capacity.items()):
+            if pid != provider_id:
                 continue
@@
-                stale.append(model_name)
+                stale.append((pid, model_name))
                 continue
@@
-                stale.append(model_name)
+                stale.append((pid, model_name))
                 continue
-            to_retry.append(model_name)
+            to_retry.append((pid, model_name))

-        for m in stale:
-            self._pending_capacity.pop(m, None)
+        for key in stale:
+            self._pending_capacity.pop(key, None)

-        for m in to_retry:
-            self._pending_capacity.pop(m, None)
-            logger.info("Retrying pending capacity for %s on provider %d", m, provider_id)
+        for pid, model_name in to_retry:
+            self._pending_capacity.pop((pid, model_name), None)
+            logger.info("Retrying pending capacity for %s on provider %d", model_name, pid)
             asyncio.create_task(
-                self.prepare_lane_for_request(provider_id, m, timeout_seconds=30.0)
+                self.prepare_lane_for_request(pid, model_name, timeout_seconds=30.0)
             )

Also applies to: 607-607, 820-820, 1322-1360

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 179 - 183,
_pending_capacity is currently keyed only by model_name and can be clobbered
across providers; change its keying to be provider-scoped (e.g., use key
(provider_id, model_name) or a nested dict provider_id → model_name →
(registered_at)) and update the type annotation from dict[str, tuple[int,
float]] accordingly; then update every read/write/clear of _pending_capacity in
this module (places that set, check, or pop entries) to use the new
provider-scoped key so retries and reclaims are correctly isolated per provider.

Comment on lines +615 to +617
current_target = next(
(l for l in current_lanes if l.lane_id == target.lane_id),
None,

⚠️ Potential issue | 🟡 Minor

Rename ambiguous loop variable l to pass E741.

Single-letter l is flagged as ambiguous and should be replaced with a descriptive name (e.g., lane_signal).

Also applies to: 1342-1345

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 616-616: ambiguous variable name 'l'

(E741)

🪛 Ruff (0.15.9)

[error] 616-616: Ambiguous variable name: l

(E741)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 615 - 617, The
generator expression assigning current_target uses the ambiguous single-letter
variable l; change that to a descriptive name like lane_signal (e.g.,
next((lane_signal for lane_signal in current_lanes if lane_signal.lane_id ==
target.lane_id), None)) so it passes E741; apply the same rename to the other
generator/comprehension occurrences flagged (the ones iterating over
current_lanes or similar collections that use `l`) around the later block where
target.lane_id is compared (previously using `l`) to ensure consistent,
descriptive naming across capacity_planner.py.

Comment on lines +928 to +932
try:
from logos.monitoring import prometheus_metrics as prom
prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc()
except Exception:
pass

⚠️ Potential issue | 🟡 Minor

Don’t swallow metric errors silently.

The try/except Exception: pass around switch metrics hides instrumentation faults and makes regressions hard to diagnose.

🧰 Tools
🪛 Ruff (0.15.9)

[error] 931-932: try-except-pass detected, consider logging the exception

(S110)


[warning] 931-931: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 928 - 932, The
current try/except around the prometheus increment
(prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc()) swallows errors; replace the bare
except/pass with targeted error handling that logs the exception so
instrumentation failures are visible: catch import/attribute/runtime exceptions
(or catch Exception as e) and call a module logger (e.g.,
logging.getLogger(__name__) or the existing class logger) with logger.exception
or logger.error including context like "failed to increment
CAPACITY_PLANNER_SWITCHES_TOTAL" and the exception details; do this around the
import and the prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc() call so failures are
recorded instead of silently ignored.
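A logged fallback along the lines the prompt describes could look like this (the import path mirrors the snippet under review; the wrapping function name and the exception tuple are assumptions):

```python
import logging

logger = logging.getLogger(__name__)


def inc_switch_counter() -> None:
    """Increment the switch metric; log failures instead of swallowing them."""
    try:
        from logos.monitoring import prometheus_metrics as prom
        prom.CAPACITY_PLANNER_SWITCHES_TOTAL.inc()
    except (ImportError, AttributeError, RuntimeError):
        # Still non-fatal, but the failure is now visible in the logs.
        logger.exception("failed to increment CAPACITY_PLANNER_SWITCHES_TOTAL")
```

Instrumentation stays best-effort, but a broken metrics module now leaves a traceback in the logs rather than vanishing.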

Comment on lines +1358 to +1360
asyncio.create_task(
self.prepare_lane_for_request(provider_id, m, timeout_seconds=30.0)
)

⚠️ Potential issue | 🟡 Minor

Track fire-and-forget tasks to avoid silent failures.

asyncio.create_task(...) is spawned without retaining/observing the task, so exceptions can become unobserved and retries become opaque.

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 1358-1360: Store a reference to the return value of asyncio.create_task

(RUF006)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` around lines 1358 - 1360, The
fire-and-forget task created for prepare_lane_for_request(provider_id, m,
timeout_seconds=30.0) is untracked and can swallow exceptions; change the call
site to track the Task (e.g., add it to a persistent container like
self._background_tasks or similar), attach a done-callback that logs/exposes
exceptions, and remove the Task from the container when complete; ensure any
tracking container is created on the class (initialize self._background_tasks if
missing) and that the done-callback records failures via the existing logger so
retries/failures aren’t silent.
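One way to track the task, roughly as the prompt suggests (hypothetical `TaskTracker`; in the planner the set would live on `self._background_tasks`):

```python
import asyncio
import logging
from typing import Coroutine

logger = logging.getLogger(__name__)


class TaskTracker:
    """Holds strong references to fire-and-forget tasks and logs failures."""

    def __init__(self) -> None:
        self._background_tasks: set[asyncio.Task] = set()

    def spawn(self, coro: Coroutine) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # Keeping a reference prevents the task from being garbage-collected
        # mid-flight (RUF006); the done-callback removes it and logs errors.
        self._background_tasks.add(task)
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task: asyncio.Task) -> None:
        self._background_tasks.discard(task)
        if not task.cancelled() and task.exception() is not None:
            logger.error(
                "background capacity task failed", exc_info=task.exception()
            )
```

The call site would become `self._tracker.spawn(self.prepare_lane_for_request(...))`, so a failed retry produces a log line instead of an unobserved exception.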

Comment thread logos/src/logos/capacity/capacity_planner.py
return False

try:
while True:

⚠️ Potential issue | 🟠 Major

Inconsistent indentation currently breaks linting.

These blocks are indented with non-4-space levels, triggering Flake8 E111/E114 and failing CI style checks.

Also applies to: 4447-4447, 4512-4512, 4564-4564, 4652-4652, 4734-4739

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 2526-2526: indentation is not a multiple of 4

(E111)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/capacity/capacity_planner.py` at line 2526, Indentation in
capacity_planner.py contains non-4-space indents (e.g., the "while True:" loop
and other occurrences), causing Flake8 E111/E114; fix by normalizing those
blocks to use 4-space indentation consistently so the "while True:" line and its
inner block align with surrounding code. Locate the offending "while True:" and
the other listed occurrences and replace tabs or wrong-space counts with
standard 4-space indents for the loop body and any nested blocks so linting
passes.

Comment on lines +292 to +296
try:
if not self._logosnode.is_model_lane_ready(model_id, provider_id):
continue
except Exception:
continue

⚠️ Potential issue | 🟡 Minor

Silent exception swallowing hides failures.

The static analysis correctly flags this: exceptions from is_model_lane_ready() are caught and the loop silently continues without logging. This could mask connectivity issues or bugs in the facade.

Consider logging at debug level before continuing:

🛠️ Suggested fix
             try:
                 if not self._logosnode.is_model_lane_ready(model_id, provider_id):
                     continue
-            except Exception:
+            except Exception as e:
+                logger.debug(
+                    "Skipping model %s provider %s in reevaluate: %s",
+                    model_id, provider_id, e,
+                )
                 continue
🧰 Tools
🪛 Ruff (0.15.9)

[error] 295-296: try-except-continue detected, consider logging the exception

(S112)


[warning] 295-295: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/base_scheduler.py` around lines 292 - 296, The
catch-all except in the loop around
self._logosnode.is_model_lane_ready(model_id, provider_id) silently swallows
errors; modify the except to log the exception (at debug or appropriate level)
before continuing so connectivity or facade errors are visible—use the existing
logger (e.g., self._logger or module logger) and include context (model_id,
provider_id) and the exception when handling failures in the is_model_lane_ready
call.

Comment on lines +2603 to +2622
@pytest.mark.asyncio
async def test_pending_capacity_retry_on_reclaim_confirm():
"""After a confirmed sleep, _retry_pending_capacity re-fires for blocked models."""
facade = MockFacade(lanes=[])
planner = _make_planner(facade=facade)
planner._pending_capacity["model-x"] = (10, time.time())

retried: list[str] = []

async def _mock_prepare(pid, model, timeout_seconds=30.0):
retried.append(model)

planner.prepare_lane_for_request = _mock_prepare
planner._retry_pending_capacity(10)

# Let the created task run
await asyncio.sleep(0)

assert "model-x" in retried
assert "model-x" not in planner._pending_capacity

⚠️ Potential issue | 🟡 Minor

Remove @pytest.mark.asyncio decorator.

Per coding guidelines, tests should not use @pytest.mark.asyncio decorators because the test configuration uses asyncio_mode = "auto".

🛠️ Suggested fix
-@pytest.mark.asyncio
 async def test_pending_capacity_retry_on_reclaim_confirm():
     """After a confirmed sleep, _retry_pending_capacity re-fires for blocked models."""
     facade = MockFacade(lanes=[])

As per coding guidelines: "Do not use @pytest.mark.asyncio decorators on test functions — tests use asyncio_mode = 'auto' configuration."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/unit/capacity/test_capacity_planner.py` around lines 2603 - 2622,
The test function test_pending_capacity_retry_on_reclaim_confirm should not use
the `@pytest.mark.asyncio` decorator; remove that decorator from the test
definition so the test runs under the project's asyncio_mode="auto". Ensure the
async test signature remains (async def
test_pending_capacity_retry_on_reclaim_confirm():) and keep the rest of the test
logic that interacts with _make_planner, planner._pending_capacity,
planner.prepare_lane_for_request (mock _mock_prepare), and
planner._retry_pending_capacity(10) unchanged.

- Increase context_resolver retry window from 30s to 60s (range(20)→range(35))
  to survive worst-case multi-lane drain with continuous batching (50-60s)
- Add 300-req and 600-req hw3 workload variants to generator
- Fix distribution charts: remove stats card, move legend to center-right,
  remove density tick labels, x-axis starts at 0, thick solid Overall KDE
- Clean up old explicit/ benchmark results, replace with 6 organized folders:
  hw3_random_10m_{150,300,600}req_resolver{30s,60s} with all artifacts
- Results: resolver60s achieves 100% success at all loads (150/300/600 req)
  vs 98.7%/81.0%/76.0% with resolver30s
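The window arithmetic behind range(20)→range(35), assuming the resolver's backoff of 1s sleeps for the first 10 attempts and 2s thereafter:

```python
def resolver_window_seconds(attempts: int) -> float:
    """Worst-case wait: 1s sleeps for the first 10 attempts, 2s after."""
    return sum(1.0 if attempt < 10 else 2.0 for attempt in range(attempts))

# range(20): 10*1s + 10*2s = 30s; range(35): 10*1s + 25*2s = 60s
```

So 35 attempts covers the observed 50-60s worst-case multi-lane drain with continuous batching, while 20 attempts did not.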

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
logos/src/logos/pipeline/context_resolver.py (1)

134-137: ⚠️ Potential issue | 🟡 Minor

Logged elapsed time is inaccurate after backoff starts.

The log message uses attempt + 1 to represent seconds waited, but this doesn't account for the 2s backoff after attempt 9. For example, at attempt=20 it logs "21s" but actual elapsed time is 32s (10×1s + 11×2s).

Proposed fix to track actual elapsed time
                 if lane is None:
+                    elapsed_s = 0.0
                     for attempt in range(35):
-                        await asyncio.sleep(1.0 if attempt < 10 else 2.0)
+                        delay = 1.0 if attempt < 10 else 2.0
+                        await asyncio.sleep(delay)
+                        elapsed_s += delay
                         lane = await self._logosnode_registry.select_lane_for_model(provider_id, model_name)
                         if lane is not None:
                             logger.info(
                                 "Lane became available after %ds for provider=%s model=%s",
-                                attempt + 1, provider_name, model_name,
+                                int(elapsed_s), provider_name, model_name,
                             )
                             break
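The 32 s figure above can be checked directly from the sleep schedule; a minimal standalone sketch (function name is illustrative, not from the codebase):

```python
def elapsed_after(attempt: int) -> int:
    # total seconds slept once iteration `attempt` has completed:
    # attempts 0-9 sleep 1 s each, attempts 10 and later sleep 2 s each
    return sum(1 if a < 10 else 2 for a in range(attempt + 1))

print(elapsed_after(20))  # 32 -- while the log's "attempt + 1" would report 21
```

At attempt=34 (the last iteration of `range(35)`) the true elapsed time is 10 + 25 × 2 = 60 s, which matches the "30s to 60s" retry-window change described in the commit message.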
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/context_resolver.py` around lines 134 - 137, The log
uses attempt+1 as the elapsed seconds which is wrong once the 2s backoff kicks
in; update the loop in the context resolver (the block that calls logger.info
for "Lane became available..." and uses the attempt variable) to compute actual
elapsed time instead of using attempt+1 — either record start = time.monotonic()
before the retry loop and compute elapsed_secs = int(time.monotonic() - start)
when logging, or maintain a running accumulated_seconds counter that adds the
exact sleep duration used each retry (1s for early attempts, 2s after backoff)
and log that value; replace the attempt+1 usage in the logger.info call with the
computed elapsed_secs.
logos/tests/performance/plot_benchmark_distributions.py (2)

25-29: ⚠️ Potential issue | 🟡 Minor

Clear the remaining Ruff/Flake8 blockers.

ticker is still unused, bin_edges and patches are unused unpack targets, and Line 494 still has an f-string with no interpolation. This file will keep failing lint until those leftovers are removed.

🧹 Minimal lint fix
 import matplotlib
 matplotlib.use("Agg")
 import matplotlib.pyplot as plt
-import matplotlib.ticker as ticker
 import numpy as np
-    counts, bin_edges, patches = ax.hist(
+    counts, _, _ = ax.hist(
         data, bins=n_bins, density=True,
         color=COLOR_FILL, edgecolor=COLOR_EDGE, alpha=0.8,
         linewidth=0.5, zorder=2,
     )
-    print(f"\n  Operations:")
+    print("\n  Operations:")

Also applies to: 114-118, 494-494

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 25 -
29, Remove the unused matplotlib.ticker import and eliminate unused unpack
targets by replacing the tuple unpacking of (bin_edges, patches) with either a
single variable or underscores (e.g., _ or bin_edges_unused) where the values
are not used in the plotting function (look for the histogram call in
plot_benchmark_distributions or similar); also fix the plain f-string at the
noted location (line with an f-string but no interpolation) by converting it to
a regular string literal or adding the required interpolation so it is not an
unused-fstring. Ensure these changes target the symbols ticker, bin_edges,
patches and the plain f-string in the file.

412-420: ⚠️ Potential issue | 🟠 Major

Plot the timeline from real arrival offsets.

This block still ignores the CSV's arrival_offset, and Line 466 overwrites every _offset_s with i * 0.5. That makes “seconds into benchmark” mean row order instead of benchmark time, so burst spacing and scheduler behavior are distorted.

Also applies to: 466-466

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 412 -
420, The code currently ignores each record's arrival_offset and overwrites
r["_offset_s"] with a simple index, which distorts real timeline spacing; update
the loop over ok_records to extract a numeric arrival time from
r.get("arrival_offset") and set r["_offset_s"] to that value (converting from ms
to seconds if needed), otherwise compute a fallback using r.get("queue_wait_ms")
+ r.get("processing_ms") (converted to seconds), and only if neither exists fall
back to index-based spacing (e.g., i * 0.5); also populate arrival_offsets with
the computed offsets so downstream plotting uses the real benchmark arrival
timeline instead of row order.
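The fallback chain the prompt describes can be sketched in a few lines; field names (`arrival_offset` in milliseconds, `_offset_s`) follow the review text, but the ms-to-seconds conversion is an assumption:

```python
def compute_offsets(records: list[dict]) -> list[float]:
    # prefer the CSV's real arrival_offset (assumed milliseconds);
    # fall back to index-based spacing only when it is absent
    offsets = []
    for i, r in enumerate(records):
        raw = r.get("arrival_offset")
        off = float(raw) / 1000.0 if raw is not None else i * 0.5
        r["_offset_s"] = off
        offsets.append(off)
    return offsets

print(compute_offsets([{"arrival_offset": 1500}, {}]))  # [1.5, 0.5]
```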
🧹 Nitpick comments (1)
logos/tests/performance/plot_benchmark_distributions.py (1)

57-69: Hoist the mean out of the variance loop.

sum(s) / len(s) is recomputed for every element in the stdev expression, which makes stats_block() quadratic on larger CSVs. Cache it once and reuse it.

♻️ Suggested cleanup
 def stats_block(data: list[float]) -> dict[str, float]:
     s = sorted(data)
+    mean = sum(s) / len(s) if s else 0
     return {
         "count": len(s),
         "min": s[0] if s else 0,
         "max": s[-1] if s else 0,
-        "mean": sum(s) / len(s) if s else 0,
+        "mean": mean,
         "median": percentile(s, 50),
         "p90": percentile(s, 90),
         "p95": percentile(s, 95),
         "p99": percentile(s, 99),
-        "stdev": (sum((x - sum(s)/len(s))**2 for x in s) / len(s)) ** 0.5 if s else 0,
+        "stdev": (sum((x - mean) ** 2 for x in s) / len(s)) ** 0.5 if s else 0,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 57 -
69, In stats_block, avoid recomputing sum(s)/len(s) inside the stdev loop by
computing mean once (e.g., assign mean = sum(s)/len(s) after sorting and
checking for emptiness) and then use that mean for both the "mean" field and
inside the stdev calculation (stdev = (sum((x - mean)**2 for x in s) /
len(s))**0.5 if s else 0); keep other fields (median, p90, p95, p99) as-is and
ensure percentile(...) calls remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2207a357-7853-4c25-989a-2e31399aab4d

📥 Commits

Reviewing files that changed from the base of the PR and between 74852e7 and 11ea377.

⛔ Files ignored due to path filters (34)
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_client_duration.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_cumulative_success.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_queue_processing.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_client_duration.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_cumulative_success.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_queue_processing.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_client_duration.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_cumulative_success.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/detailed_queue_processing.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/detailed.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/detailed_client_duration.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/detailed_cumulative_success.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/detailed_queue_processing.png is excluded by !**/*.png
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_coder3_even_bursty_150_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_coder3_even_bursty_300_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_coder3_even_bursty_600_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_coder3_even_random_150_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_coder3_even_random_300_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_coder3_even_random_600_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_hw3_even_bursty_150_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_hw3_even_random_150_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_hw3_even_random_300_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_hw3_even_random_600_10m.csv is excluded by !**/*.csv
📒 Files selected for processing (25)
  • logos/.gitignore
  • logos/src/logos/pipeline/context_resolver.py
  • logos/tests/performance/generate_explicit_model_workloads.py
  • logos/tests/performance/plot_benchmark_distributions.py
  • logos/tests/performance/results/.gitkeep
  • logos/tests/performance/results/explicit/10m/.gitkeep
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/provider_vram.json
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/runtime_samples.jsonl
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/provider_vram.json
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/runtime_samples.jsonl
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/provider_vram.json
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/runtime_samples.jsonl
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/provider_vram.json
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/runtime_samples.jsonl
  • logos/tests/performance/results/explicit/60m/.gitkeep
  • logos/tests/performance/results/resource/10m/.gitkeep
  • logos/tests/performance/results/resource/60m/.gitkeep
💤 Files with no reviewable changes (8)
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/run_meta.json
  • logos/tests/performance/results/explicit/10m/20260329_205541 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_211405 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_202109 - workload_explicit_local2_mistral_deepseek_bursty_200_10m/request_log_stats.json
  • logos/tests/performance/results/explicit/10m/20260329_212531 - workload_explicit_local2_mistral_deepseek_bursty_500_10m/request_log_stats.json
✅ Files skipped from review due to trivial changes (1)
  • logos/.gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
  • logos/tests/performance/generate_explicit_model_workloads.py

Comment on lines +46 to +69
def percentile(data: list[float], p: float) -> float:
    if not data:
        return 0.0
    k = (len(data) - 1) * (p / 100.0)
    f = int(k)
    c = f + 1
    if c >= len(data):
        return data[-1]
    return data[f] + (k - f) * (data[c] - data[f])


def stats_block(data: list[float]) -> dict[str, float]:
    s = sorted(data)
    return {
        "count": len(s),
        "min": s[0] if s else 0,
        "max": s[-1] if s else 0,
        "mean": sum(s) / len(s) if s else 0,
        "median": percentile(s, 50),
        "p90": percentile(s, 90),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "stdev": (sum((x - sum(s)/len(s))**2 for x in s) / len(s)) ** 0.5 if s else 0,
    }
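The linear-interpolation behavior of `percentile` is easy to verify in isolation; a small usage sketch (the helper is reproduced from the snippet above so it runs standalone):

```python
def percentile(data: list[float], p: float) -> float:
    # linear interpolation between the two nearest sorted values
    if not data:
        return 0.0
    k = (len(data) - 1) * (p / 100.0)
    f = int(k)
    c = f + 1
    if c >= len(data):
        return data[-1]
    return data[f] + (k - f) * (data[c] - data[f])

# note: the helper expects pre-sorted input (stats_block sorts before calling)
print(percentile([1.0, 2.0, 3.0, 4.0], 50))  # 2.5, halfway between 2 and 3
```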

⚠️ Potential issue | 🟡 Minor

Document the public helpers and CLI entry points.

percentile(), stats_block(), load_records(), and main() are public top-level functions, but they currently have no docstrings. Please add parameter, return value, and exception docs so this script matches the repo rule. As per coding guidelines "Include docstrings for all public functions explaining parameters, return values, and exceptions".

Also applies to: 361-379

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 46 -
69, Add comprehensive docstrings to all public top-level functions: percentile,
stats_block, load_records, and main (also update the functions around lines
361-379). For each function include a short description, parameters with types
and meaning, the return value and type, and any exceptions that can be raised
(e.g., empty data, invalid percent values, file/IO errors, parsing errors). Use
triple-quoted docstrings placed immediately above each function so they satisfy
the repo guideline of documenting parameters, return values, and exceptions.

JakubJakubczyk and others added 4 commits April 10, 2026 19:55
… 8, drain fix

- Add VLLM_CONCURRENCY_OVERSUBSCRIPTION=3 multiplier to num_parallel:
  vLLM reports worst-case full-context concurrency, but real requests
  use ~5% of context window. 3x allows higher throughput while vLLM's
  PagedAttention handles fine-grained KV admission.
  (Qwen 7B: 36→108, Qwen 14B: 10→30, Mistral 7B: 16→48)

- Raise BACKEND_QUEUE_PRESSURE_THRESHOLD from 2→8: with oversubscription
  vLLM's internal scheduler handles queuing efficiently.

- Fix _drain_lane() to check vLLM internal queue_waiting and
  requests_running in addition to active_requests, preventing premature
  eviction while vLLM still has in-flight work.

- Add sdi/ bind mount in docker-compose.dev.yaml for live code reload.

Benchmark results at 600 req/10min: P50 90.3s→61.0s (-32%),
P95 160.3s→142.9s (-11%), 0% failures maintained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
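The new concurrency numbers follow directly from the multiplier; a minimal sketch of the arithmetic (the constant name comes from the commit message, the function name is illustrative):

```python
VLLM_CONCURRENCY_OVERSUBSCRIPTION = 3

def effective_parallel(reported_concurrency: int) -> int:
    # vLLM reports worst-case full-context concurrency; real requests use
    # only ~5% of the context window, so the slot count can be scaled up
    return reported_concurrency * VLLM_CONCURRENCY_OVERSUBSCRIPTION

for name, reported in [("Qwen 7B", 36), ("Qwen 14B", 10), ("Mistral 7B", 16)]:
    print(name, reported, "->", effective_parallel(reported))
```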
…rastructure

Multi-provider routing: all three schedulers now expand ALL deployments
per model_id instead of using next(), which only selected the first. Each
(model_id, provider_id) pair becomes a separate scored candidate.
Queue fallback prefers logosnode deployments.

ECCS decision logging: JSON-lines log recording every scheduling
decision — candidates, classification weights, corrected scores, tiers,
selection, and correction_changed flag. Enabled via ECCS_DECISION_LOG
env var.

Weight override: ECCS_WEIGHT_OVERRIDE env var injects known
classification weights for controlled ablation experiments.

Ablation benchmark tooling: workload generators (burst-gap and
tag-differentiated), analysis script for hit rate / tier distribution /
conditional TTFT, and shell runner for ECCS on/off comparison.

ETTFT scheduling documentation: docs/ettft-scheduling.md covering the
correction formula, scientific connections, and benchmark methodology.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
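A JSON-lines decision record can be appended with a few lines of Python; the field names below mirror the ones listed in the commit message, but the exact schema and ids are assumptions for illustration:

```python
import json
import tempfile

def log_decision(path: str, record: dict) -> None:
    # append one JSON object per line (JSON-lines): streamable and greppable
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

path = tempfile.NamedTemporaryFile(suffix=".jsonl", delete=False).name
log_decision(path, {
    "candidates": ["qwen-7b@p1", "qwen-7b@p2"],   # hypothetical candidate ids
    "weights": {"code": 0.7, "chat": 0.3},
    "corrected_scores": {"qwen-7b@p1": 1.2},
    "tier": "warm",
    "selection": "qwen-7b@p1",
    "correction_changed": True,
})
```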

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

♻️ Duplicate comments (4)
logos/src/logos/pipeline/context_resolver.py (1)

130-136: ⚠️ Potential issue | 🟡 Minor

Log the real backoff time, not attempt + 1.

After the first 10 retries this loop sleeps 2s per attempt, so the success log underreports elapsed time by up to 55s. That makes recovery timing misleading during incident triage.

Proposed fix
+                    elapsed_s = 0
                     for attempt in range(65):
-                        await asyncio.sleep(1.0 if attempt < 10 else 2.0)
+                        delay_s = 1 if attempt < 10 else 2
+                        await asyncio.sleep(float(delay_s))
+                        elapsed_s += delay_s
                         lane = await self._logosnode_registry.select_lane_for_model(provider_id, model_name)
                         if lane is not None:
                             logger.info(
                                 "Lane became available after %ds for provider=%s model=%s",
-                                attempt + 1, provider_name, model_name,
+                                elapsed_s, provider_name, model_name,
                             )
                             break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/context_resolver.py` around lines 130 - 136, The log
currently prints attempt + 1 which underreports elapsed sleep time because
iterations sleep 1s for the first 10 attempts and 2s thereafter; replace that by
tracking the real elapsed backoff seconds (e.g., initialize elapsed_seconds = 0
before the for loop, each iteration compute sleep_sec = 1.0 if attempt < 10 else
2.0, await asyncio.sleep(sleep_sec), then increment elapsed_seconds +=
sleep_sec) and use elapsed_seconds in the logger.info call where the code checks
the lane from _logosnode_registry.select_lane_for_model(provider_id, model_name)
so the log shows the true elapsed seconds rather than attempt + 1.
logos/tests/performance/plot_benchmark_distributions.py (2)

28-28: ⚠️ Potential issue | 🟡 Minor

This script still has lint-blocking leftovers.

ticker and rid are unused, ax.hist(...) unpacks values it never reads, and Line 494 still has an f prefix without interpolation. If Ruff/Flake8 gate CI, this file won't pass as written.

Also applies to: 114-114, 415-415, 494-494

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` at line 28, Remove
unused imports/variables and fix the stray f-string: delete the unused import
matplotlib.ticker as ticker and any unused variable named rid; change the
ax.hist(...) call so you don't unpack ignored return values (stop using
something like "n, bins, patches = ax.hist(...)" if n/bins/patches aren't used,
just call ax.hist(...)); and fix the literal with the stray "f" prefix on line
with string concatenation so it is a normal string or add proper interpolation.
Update references to these symbols (ticker, rid, ax.hist unpacking, and the
f-prefixed string) accordingly so Ruff/Flake8 no longer reports unused or
malformed string issues.

412-420: ⚠️ Potential issue | 🟠 Major

Plot real benchmark time here or rename the axis.

This block seeds _offset_s from record order and then hard-overwrites it with i * 0.5, so “seconds into benchmark” is really just request index. That distorts burst spacing and scheduler timing in the exported timeline.

Also applies to: 466-466

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/plot_benchmark_distributions.py` around lines 412 -
420, The axis is wrong because _offset_s is seeded from ok_records order and
later overwritten with a fixed spacing (i * 0.5); instead compute real timeline
offsets from available timing fields (e.g., use arrival_offsets built from
request metadata or infer from queue_wait_ms + processing_ms / 1000) and assign
r["_offset_s"] from those computed times for each record in ok_records (fall
back to the original index only when no timing data exists), and remove the
hard-coded i * 0.5 overwrite so plotted "seconds into benchmark" reflects actual
measured times; update references to arrival_offsets, ok_records, and the
_offset_s assignment accordingly.
logos/src/logos/pipeline/correcting_scheduler.py (1)

394-400: ⚠️ Potential issue | 🟠 Major

Make the post-enqueue capacity trigger non-fatal and retained.

enqueue() has already happened by this point. If get_model_name() raises, or if the detached create_task(...) is never retained, the request stays queued without the intended wake/load nudge and falls back to the slow reevaluation path.

Suggested fix
 class ClassificationCorrectingScheduler(BaseScheduler):
@@
         super().__init__(queue_manager, logosnode_facade, azure_facade, model_registry, on_capacity_needed)
         self._ettft_enabled = ettft_enabled
+        self._capacity_tasks: set[asyncio.Task[None]] = set()
@@
         if self._on_capacity_needed and provider_type == "logosnode":
-            model_name = self._logosnode.get_model_name(model_id, provider_id)
-            if model_name:
-                asyncio.create_task(
-                    self._on_capacity_needed(provider_id, model_name),
-                    name=f"capacity-needed-{model_name}-{provider_id}",
-                )
+            try:
+                model_name = self._logosnode.get_model_name(model_id, provider_id)
+            except KeyError:
+                logger.debug(
+                    "Skipping capacity trigger for unknown deployment model=%s provider=%s",
+                    model_id,
+                    provider_id,
+                )
+            else:
+                if model_name:
+                    task = asyncio.create_task(
+                        self._on_capacity_needed(provider_id, model_name),
+                        name=f"capacity-needed-{model_name}-{provider_id}",
+                    )
+                    self._capacity_tasks.add(task)
+                    task.add_done_callback(self._capacity_tasks.discard)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/correcting_scheduler.py` around lines 394 - 400,
Make the post-enqueue capacity trigger non-fatal and retained: wrap the call to
self._logosnode.get_model_name(model_id, provider_id) in a try/except to swallow
and log any exception (so a failure there doesn't leave the request queued), and
instead of fire-and-forget asyncio.create_task(...), store the returned Task in
a persistent container on the scheduler (e.g. self._capacity_tasks or
self._retained_tasks) so the task is retained; also register a done callback to
remove the task from that container when finished. Keep references to the
symbols _on_capacity_needed, provider_type == "logosnode",
_logosnode.get_model_name, asyncio.create_task, and the created task so the
capacity nudge survives GC and get_model_name exceptions are non-fatal.
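The retained-task pattern the prompt describes can be sketched in isolation (names here are illustrative, not the scheduler's real API): keep a strong reference in a set so the task survives garbage collection, and let a done callback remove it when it finishes.

```python
import asyncio

capacity_tasks: set[asyncio.Task] = set()

async def on_capacity_needed(provider_id: str, model_name: str) -> None:
    await asyncio.sleep(0)  # stand-in for the real wake/load nudge

async def trigger(provider_id: str, model_name: str) -> None:
    task = asyncio.create_task(on_capacity_needed(provider_id, model_name))
    capacity_tasks.add(task)                        # strong ref: survives GC
    task.add_done_callback(capacity_tasks.discard)  # self-cleaning on completion

async def main() -> None:
    await trigger("p1", "qwen-7b")
    await asyncio.gather(*capacity_tasks)

asyncio.run(main())
```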
🧹 Nitpick comments (2)
logos/tests/performance/generate_ablation_workloads.py (1)

273-273: Add strict=True to zip() for defensive checking.

Since all_offsets and request_archetypes should have the same length (both derived from total_requests), adding strict=True would catch any accidental mismatch.

🛠️ Suggested fix
-    for offset_ms, arch_key in zip(all_offsets, request_archetypes):
+    for offset_ms, arch_key in zip(all_offsets, request_archetypes, strict=True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/generate_ablation_workloads.py` at line 273, The
for-loop that iterates with for offset_ms, arch_key in zip(all_offsets,
request_archetypes) should defensively ensure both sequences are the same
length; update the zip call to zip(all_offsets, request_archetypes, strict=True)
so a length mismatch raises immediately, keeping iteration logic in the same
block (variables offset_ms and arch_key unchanged) and preserving behavior when
lengths match.
logos/src/logos/pipeline/utilization_scheduler.py (1)

152-152: Rename unused loop variable parallel to _parallel.

The static analysis correctly identifies that parallel is unpacked but never used within the loop body.

🛠️ Suggested fix
-        for model_id, weight, priority_int, parallel in candidates:
+        for model_id, weight, priority_int, _parallel in candidates:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/utilization_scheduler.py` at line 152, The loop in
utilization_scheduler.py unpacks four values but never uses the fourth variable
`parallel`; update the loop header in the relevant function (the for loop
currently written as "for model_id, weight, priority_int, parallel in
candidates:") to rename `parallel` to `_parallel` so the unused variable is
clearly marked as intentional and satisfies static analysis.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@logos/docs/benchmark-results-analysis.md`:
- Around line 150-169: The fenced code block showing the results directory tree
lacks a language tag; update the fenced block in
logos/docs/benchmark-results-analysis.md by adding a language identifier (e.g.,
```text) to the opening fence so Markdownlint stops flagging it—locate the block
that starts with "results/" and modify its opening backticks to include the
chosen language.

In `@logos/docs/capacity-management-architecture.md`:
- Line 29: The TOC entry anchor
`#18-fairness-inversion-new-arrivals-preempt-established-queues` is incorrect
and points to a non-existent heading; update the Table of Contents entry (the
line containing "18. [Fairness Inversion: New Arrivals Preempt Established
Queues]") to use the actual heading slug "Fairness Inversion: Last-Arrival Wins
the VRAM Race" (or change the heading text to match the TOC) so the link
resolves; specifically, either replace the TOC text/anchor with
`#18-fairness-inversion-last-arrival-wins-the-vram-race` to match the heading,
or rename the heading to "Fairness Inversion: New Arrivals Preempt Established
Queues" so the existing TOC anchor is valid.

In `@logos/src/logos/pipeline/correcting_scheduler.py`:
- Around line 291-314: The code currently swallows all exceptions from
get_parallel_capacity, get_capacity_info, get_model_name and get_model_profiles
which causes silent "infinite" VRAM and zero model VRAM fallbacks; update the
try/except blocks around calls to self._logosnode.get_parallel_capacity(...),
self._logosnode.get_capacity_info(...), and the block that calls
self._logosnode.get_model_name(...) / self._logosnode.get_model_profiles(...) to
log the unexpected exceptions (including model_id and provider_id) instead of
passing silently—use the scheduler's logger (e.g. self._logger.error(...,
exc_info=True) or self._logger.exception(...)) to record the full exception and
context but keep the existing fallback assignments (available_vram_mb =
float("inf"), model_vram_mb = 0.0, kv_budget_mb = 0.0) so behavior is unchanged
except for telemetry of failures.

In `@logos/tests/benchmark_spread.sh`:
- Around line 16-18: The file currently hardcodes a root API key in the API_KEY
variable; remove the embedded "lg-root-..." literal and instead read API_KEY
from the environment (e.g., keep API_KEY="${API_KEY:-}" or require it and exit
with an error), document in the script header that users must provide API_KEY
via environment or a local secrets file, and update any README or comments to
instruct using export API_KEY=... or a .env for local benchmarking; reference
the API_KEY and API_BASE variable definitions to locate the change.

In `@logos/tests/integration/test_capacity_fixes.sh`:
- Around line 6-7: The test script currently hardcodes a privileged root token
in the LOGOS_KEY variable; change the script to read LOGOS_KEY from the
environment (e.g., use an environment variable lookup for LOGOS_KEY) and remove
the embedded token literal, and add a guard that fails the test early with a
clear message if LOGOS_KEY is not set; update any references to LOGOS_KEY in
this file (test_capacity_fixes.sh) to rely on the environment-provided value
instead of the baked-in string.
- Around line 13-30: The send_request() helper currently writes a log line and
the PID to stdout which breaks callers that capture its output (used by wait and
PIDS_B array); change send_request() so the informational log echo (the "[$tag]
Sent request..." line) is written to stderr (e.g. echo "..." >&2) and only the
PID is printed to stdout (echo "$pid"), and ensure callers append the PID
without word-splitting (use PIDS_B+=("$pid") or PIDS_B+=("$PID") where the
variable is used) so wait receives a single valid pid per element.

In `@logos/tests/performance/analyze_eccs_ablation.py`:
- Line 228: Remove the unnecessary f-string prefixes from plain string print
statements in analyze_eccs_ablation.py (the print calls that currently start
with print(f"\n  Tier distribution (selected candidates):") and the similar
prints at the other noted locations), since they do not perform interpolation
and trigger F541; update each to a normal string literal (remove the leading f)
for the prints that output static text such as "\n  Tier distribution (selected
candidates):", and the other static messages at the other reported lines.

In `@logos/tests/performance/plot_antithrash_comparison.py`:
- Around line 225-234: BEFORE_CSV and AFTER_CSV may be None when the glob finds
no files, so before calling load_metric() in main() check that both BEFORE_CSV
and AFTER_CSV are not None (or at least the ones you intend to use); if either
is None, log a clear error via print/processLogger or raise SystemExit with a
helpful message and avoid calling load_metric("ttft_ms") with None. Locate the
variables BEFORE_CSV and AFTER_CSV and the load_metric(...) calls in main(),
perform explicit None checks, and short-circuit (error/exit) before invoking
load_metric to prevent a TypeError.

In `@logos/tests/performance/plot_ollama_vs_vllm_3x.py`:
- Around line 28-47: The RUNS list uses sorted(...)[-1] which will raise
IndexError when globs return no matches; add a helper function (e.g.,
_latest_csv(base: Path, pattern: str) -> Path | None) that returns the last
match or None, replace each sorted(...)[-1] usage for the vllm3x_csv and
ollama_csv entries with calls to _latest_csv(BASE / "...", "pattern"), and
update main() (or the code that consumes RUNS) to check for None (and handle or
error gracefully) before accessing .exists() or opening files.

In `@logos/tests/performance/plot_ollama_vs_vllm.py`:
- Around line 27-46: The RUNS dict construction uses sorted(...)[-1] for several
"vllm_csv" and "ollama_csv" entries which will raise IndexError if the glob
finds no files; update each occurrence (the expressions that build "vllm_csv"
and "ollama_csv" inside RUNS) to safely handle empty results by converting the
glob iterator to a list, checking if the list is non-empty and using the last
element, otherwise either set a clear fallback (e.g., None) or raise a
descriptive ValueError with the glob path, e.g., files = sorted((BASE /
"...").glob("pattern")); if not files: raise ValueError("no results found for
..."); value = files[-1]. This avoids import-time crashes and gives a clear
error when expected files are missing.

In `@logos/tests/performance/run_eccs_ablation.sh`:
- Around line 132-140: The script sets ECCS_ETTFT_ENABLED=false but the code
checks LOGOS_SCHEDULER_ETTFT_ENABLED, so the “ECCS off” run never disables the
correcting scheduler; update the script to export
LOGOS_SCHEDULER_ETTFT_ENABLED=false (replace ECCS_ETTFT_ENABLED=false) in both
ECCS_OFF_ENV and the docker compose exec -e flags, and preserve passing the
optional ECCS_WEIGHT_OVERRIDE via ${WEIGHT_OVERRIDE:+-e
"ECCS_WEIGHT_OVERRIDE=$WEIGHT_OVERRIDE"} so the run correctly toggles the
scheduler flag used by logos/src/logos/main.py.

In `@logos/tests/performance/run_ollama_benchmark_suite.sh`:
- Around line 9-23: The script's usage claims a --ollama-base flag but the code
only reads the first positional arg into OLLAMA_BASE, so calling it with
`--ollama-base URL` breaks; update run_ollama_benchmark_suite.sh to properly
parse arguments (e.g., iterate "$@" or use getopts/while loop), detect the
--ollama-base flag and set OLLAMA_BASE to its following value (fall back to
default http://localhost:11434 or accept a single positional URL if provided),
and ensure existing references to OLLAMA_BASE remain unchanged.

In `@logos/tests/performance/run_ollama_benchmark.py`:
- Around line 445-446: The current dict is incorrectly putting the synthetic
benchmarking normalization overhead_ms into "queue_wait_ms"; change
"queue_wait_ms" to use the observed queue wait value from the result record
(e.g., r.queue_wait_ms or a default 0) instead of overhead_ms, keep
"processing_ms": r.client_duration_ms as-is, and add a separate field (e.g.,
"synthetic_overhead_ms" or "normalization_overhead_ms") set to overhead_ms so
the synthetic adjustment is recorded but not mixed into real queue wait
measurements.
- Around line 172-179: The unload loop currently ignores the HTTP response from
the Ollama POST to f"{base_url}/api/generate" (the async block using
httpx.AsyncClient and the loop over running/models), so if an unload is rejected
we should fail fast: await the response for each unload request, inspect
response.status_code and response body (or response.json()) for errors, and if
the request is not successful raise an exception or exit the run immediately
with a clear error message indicating which model (name/model) failed to unload;
ensure this check is applied inside the same async loop that posts {"model":
name, "keep_alive": 0}.
- Around line 643-670: The plot currently uses the loop index `i` as the x-axis,
which shows completion order; replace that with each record's preserved arrival
time by using `r.get("arrival_offset_ms")` (or a converted seconds value) as the
x coordinate when calling `ax1.scatter` and `ax2.scatter`; ensure you handle
missing/None values like you do for `ttft_ms` and `total_latency_ms`, and
(optionally) sort `successful` by `arrival_offset_ms` before plotting so the
timeline is chronological; update references to `i` in the two scatter calls
(and axis label if you change units) so the chart reflects real benchmark time.
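The queue-wait/overhead field separation suggested for run_ollama_benchmark.py above can be sketched like this; the record shape is illustrative, with `queue_wait_ms`, `processing_ms`, and `synthetic_overhead_ms` taken from the comment's proposed naming:

```python
from types import SimpleNamespace

def build_record(r, overhead_ms: float) -> dict:
    """Keep the observed queue wait separate from the synthetic overhead."""
    return {
        # Real measured queue wait (default 0 when the field is absent).
        "queue_wait_ms": getattr(r, "queue_wait_ms", 0) or 0,
        "processing_ms": r.client_duration_ms,
        # Synthetic normalization overhead, recorded but not mixed in.
        "synthetic_overhead_ms": overhead_ms,
    }

r = SimpleNamespace(queue_wait_ms=12.5, client_duration_ms=840.0)
rec = build_record(r, overhead_ms=30.0)
assert rec["queue_wait_ms"] == 12.5
assert rec["synthetic_overhead_ms"] == 30.0
```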

---

Duplicate comments:
In `@logos/src/logos/pipeline/context_resolver.py`:
- Around line 130-136: The log currently prints attempt + 1 which underreports
elapsed sleep time because iterations sleep 1s for the first 10 attempts and 2s
thereafter; replace that by tracking the real elapsed backoff seconds (e.g.,
initialize elapsed_seconds = 0 before the for loop, each iteration compute
sleep_sec = 1.0 if attempt < 10 else 2.0, await asyncio.sleep(sleep_sec), then
increment elapsed_seconds += sleep_sec) and use elapsed_seconds in the
logger.info call where the code checks the lane from
_logosnode_registry.select_lane_for_model(provider_id, model_name) so the log
shows the true elapsed seconds rather than attempt + 1.
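The elapsed-time bookkeeping described above reduces to summing the actual sleep durations. This standalone sketch (helper names are illustrative) shows why logging `attempt + 1` underreports once the 2-second tier kicks in:

```python
def sleep_for(attempt: int) -> float:
    """Sleep duration per the backoff policy: 1 s for the first 10 attempts, 2 s after."""
    return 1.0 if attempt < 10 else 2.0

def elapsed_after(attempts: int) -> float:
    """Total seconds slept after `attempts` iterations."""
    return sum(sleep_for(a) for a in range(attempts))

# First ten attempts: attempt count and elapsed seconds agree.
assert elapsed_after(10) == 10.0
# After that they diverge: 12 attempts means 14 s slept, not 12 s.
assert elapsed_after(12) == 14.0
```

In the real loop the fix is to accumulate `elapsed_seconds += sleep_sec` next to each `await asyncio.sleep(sleep_sec)` and log `elapsed_seconds` instead of `attempt + 1`.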

In `@logos/src/logos/pipeline/correcting_scheduler.py`:
- Around line 394-400: Make the post-enqueue capacity trigger non-fatal and
retained: wrap the call to self._logosnode.get_model_name(model_id, provider_id)
in a try/except to swallow and log any exception (so a failure there doesn't
leave the request queued), and instead of fire-and-forget
asyncio.create_task(...), store the returned Task in a persistent container on
the scheduler (e.g. self._capacity_tasks or self._retained_tasks) so the task is
retained; also register a done callback to remove the task from that container
when finished. Keep references to the symbols _on_capacity_needed, provider_type
== "logosnode", _logosnode.get_model_name, asyncio.create_task, and the created
task so the capacity nudge survives GC and get_model_name exceptions are
non-fatal.

In `@logos/tests/performance/plot_benchmark_distributions.py`:
- Line 28: Remove unused imports/variables and fix the stray f-string: delete
the unused import matplotlib.ticker as ticker and any unused variable named rid;
change the ax.hist(...) call so you don't unpack ignored return values (stop
using something like "n, bins, patches = ax.hist(...)" if n/bins/patches aren't
used, just call ax.hist(...)); and fix the literal with the stray "f" prefix on
line with string concatenation so it is a normal string or add proper
interpolation. Update references to these symbols (ticker, rid, ax.hist
unpacking, and the f-prefixed string) accordingly so Ruff/Flake8 no longer
reports unused or malformed string issues.
- Around line 412-420: The axis is wrong because _offset_s is seeded from
ok_records order and later overwritten with a fixed spacing (i * 0.5); instead
compute real timeline offsets from available timing fields (e.g., use
arrival_offsets built from request metadata or infer from queue_wait_ms +
processing_ms / 1000) and assign r["_offset_s"] from those computed times for
each record in ok_records (fall back to the original index only when no timing
data exists), and remove the hard-coded i * 0.5 overwrite so plotted "seconds
into benchmark" reflects actual measured times; update references to
arrival_offsets, ok_records, and the _offset_s assignment accordingly.

---

Nitpick comments:
In `@logos/src/logos/pipeline/utilization_scheduler.py`:
- Line 152: The loop in utilization_scheduler.py unpacks four values but never
uses the fourth variable `parallel`; update the loop header in the relevant
function (the for loop currently written as "for model_id, weight, priority_int,
parallel in candidates:") to rename `parallel` to `_parallel` so the unused
variable is clearly marked as intentional and satisfies static analysis.

In `@logos/tests/performance/generate_ablation_workloads.py`:
- Line 273: The for-loop that iterates with for offset_ms, arch_key in
zip(all_offsets, request_archetypes) should defensively ensure both sequences
are the same length; update the zip call to zip(all_offsets, request_archetypes,
strict=True) so a length mismatch raises immediately, keeping iteration logic in
the same block (variables offset_ms and arch_key unchanged) and preserving
behavior when lengths match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e1b5cf5-50d8-47d1-a39a-a4fc2db446d6

📥 Commits

Reviewing files that changed from the base of the PR and between 11ea377 and 37a2702.

⛔ Files ignored due to path filters (40)
  • logos/tests/performance/results/comparative/150/comparison_total_latency.png is excluded by !**/*.png
  • logos/tests/performance/results/comparative/150/comparison_ttft.png is excluded by !**/*.png
  • logos/tests/performance/results/comparative/150/ollama_detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/comparative/150/ollama_summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/comparative/300/comparison_total_latency.png is excluded by !**/*.png
  • logos/tests/performance/results/comparative/300/comparison_ttft.png is excluded by !**/*.png
  • logos/tests/performance/results/comparative/300/ollama_detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/comparative/300/ollama_summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/comparative/600/comparison_total_latency.png is excluded by !**/*.png
  • logos/tests/performance/results/comparative/600/comparison_ttft.png is excluded by !**/*.png
  • logos/tests/performance/results/comparative/600/ollama_detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/comparative/600/ollama_summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/logosworkernode-singular/150/chart_queue_wait_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/150/chart_timeline.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/150/chart_total_latency_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/150/chart_ttft_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/150/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/logosworkernode-singular/150/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/logosworkernode-singular/300/chart_queue_wait_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/300/chart_timeline.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/300/chart_total_latency_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/300/chart_ttft_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/300/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/logosworkernode-singular/300/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/results/logosworkernode-singular/600/chart_queue_wait_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/600/chart_timeline.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/600/chart_total_latency_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/600/chart_ttft_distribution.png is excluded by !**/*.png
  • logos/tests/performance/results/logosworkernode-singular/600/detailed.csv is excluded by !**/*.csv
  • logos/tests/performance/results/logosworkernode-singular/600/summary.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_150_burst5_gap30.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_150_burst5_gap70.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_150_burst8_gap20.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_eccs_hw3_even_random_150_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_eccs_hw3_even_random_300_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_eccs_hw3_even_random_600_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_resource_hw3_even_random_150_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_resource_hw3_even_random_300_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/ablation/workload_resource_hw3_even_random_600_10m.csv is excluded by !**/*.csv
  • logos/tests/performance/workloads/explicit/10m/workload_explicit_hw3_even_random_1200_10m.csv is excluded by !**/*.csv
📒 Files selected for processing (39)
  • logos/docker-compose.dev.yaml
  • logos/docs/benchmark-results-analysis.md
  • logos/docs/capacity-management-architecture.md
  • logos/docs/ettft-scheduling.md
  • logos/logos-workernode/config.yml
  • logos/logos-workernode/data/lanes.json
  • logos/logos-workernode/data/model_profiles.yml
  • logos/logos-workernode/logos_worker_node/config.py
  • logos/logos-workernode/logos_worker_node/logos_bridge.py
  • logos/src/logos/capacity/capacity_planner.py
  • logos/src/logos/pipeline/context_resolver.py
  • logos/src/logos/pipeline/correcting_scheduler.py
  • logos/src/logos/pipeline/ettft_estimator.py
  • logos/src/logos/pipeline/fcfs_scheduler.py
  • logos/src/logos/pipeline/utilization_scheduler.py
  • logos/src/logos/sdi/providers/logosnode_provider.py
  • logos/tests/benchmark_spread.sh
  • logos/tests/integration/test_capacity_fixes.sh
  • logos/tests/performance/analyze_eccs_ablation.py
  • logos/tests/performance/generate_ablation_workloads.py
  • logos/tests/performance/generate_eccs_ablation_workloads.py
  • logos/tests/performance/generate_explicit_model_workloads.py
  • logos/tests/performance/plot_antithrash_comparison.py
  • logos/tests/performance/plot_benchmark_distributions.py
  • logos/tests/performance/plot_ollama_vs_vllm.py
  • logos/tests/performance/plot_ollama_vs_vllm_3x.py
  • logos/tests/performance/plot_ollama_vs_vllm_ttft.py
  • logos/tests/performance/results/logosworkernode-singular/150/request_log_stats.json
  • logos/tests/performance/results/logosworkernode-singular/150/run_meta.json
  • logos/tests/performance/results/logosworkernode-singular/300/request_log_stats.json
  • logos/tests/performance/results/logosworkernode-singular/300/run_meta.json
  • logos/tests/performance/results/logosworkernode-singular/600/request_log_stats.json
  • logos/tests/performance/results/logosworkernode-singular/600/run_meta.json
  • logos/tests/performance/run_eccs_ablation.sh
  • logos/tests/performance/run_ollama_benchmark.py
  • logos/tests/performance/run_ollama_benchmark_suite.sh
  • logos/tests/smoke_test_capacity.sh
  • logos/tests/unit/pipeline/test_correcting_scheduler.py
  • logos/tests/unit/pipeline/test_ettft_estimator.py
💤 Files with no reviewable changes (2)
  • logos/logos-workernode/logos_worker_node/logos_bridge.py
  • logos/logos-workernode/logos_worker_node/config.py
✅ Files skipped from review due to trivial changes (7)
  • logos/tests/performance/results/logosworkernode-singular/600/run_meta.json
  • logos/tests/performance/results/logosworkernode-singular/300/run_meta.json
  • logos/tests/performance/results/logosworkernode-singular/150/run_meta.json
  • logos/tests/performance/results/logosworkernode-singular/300/request_log_stats.json
  • logos/tests/performance/results/logosworkernode-singular/150/request_log_stats.json
  • logos/tests/performance/results/logosworkernode-singular/600/request_log_stats.json
  • logos/logos-workernode/data/lanes.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • logos/src/logos/sdi/providers/logosnode_provider.py

Comment on lines +150 to +169
```
results/
├── logosworkernode-singular/   # Per-model distribution charts + data
│   ├── 150/
│   │   ├── chart_ttft_distribution.png
│   │   ├── chart_total_latency_distribution.png
│   │   ├── chart_queue_wait_distribution.png
│   │   ├── chart_timeline.png
│   │   ├── summary.csv
│   │   └── detailed.csv
│   ├── 300/ (same structure)
│   └── 600/ (same structure)
├── comparative/                # Ollama vs LogosWorkerNode overlays
│   ├── 150/
│   │   ├── comparison_ttft.png
│   │   └── comparison_total_latency.png
│   ├── 300/ (same structure)
│   └── 600/ (same structure)
└── legacy/                     # Previous iteration results
```

⚠️ Potential issue | 🟡 Minor

Add a language to this fenced block.

Markdownlint will keep flagging this fence without a language. `text` is enough here.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 168-168: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/docs/benchmark-results-analysis.md` around lines 150 - 169, The fenced
code block showing the results directory tree lacks a language tag; update the
fenced block in logos/docs/benchmark-results-analysis.md by adding a language
identifier (e.g., ```text) to the opening fence so Markdownlint stops flagging
it—locate the block that starts with "results/" and modify its opening backticks
to include the chosen language.

15. [Deadlock, Starvation & Race Condition Analysis](#15-deadlock-starvation--race-condition-analysis)
16. [Benchmark Evidence](#16-benchmark-evidence)
17. [Refinement Proposals](#17-refinement-proposals)
18. [Fairness Inversion: New Arrivals Preempt Established Queues](#18-fairness-inversion-new-arrivals-preempt-established-queues)

⚠️ Potential issue | 🟡 Minor

Fix the section 18 TOC anchor.

The fragment here points to #18-fairness-inversion-new-arrivals-preempt-established-queues, but the actual heading at Line 1213 is “Fairness Inversion: Last-Arrival Wins the VRAM Race”. As written, the TOC entry is a dead link.

Suggested fix
-18. [Fairness Inversion: New Arrivals Preempt Established Queues](#18-fairness-inversion-new-arrivals-preempt-established-queues)
+18. [Fairness Inversion: Last-Arrival Wins the VRAM Race](#18-fairness-inversion-last-arrival-wins-the-vram-race)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-18. [Fairness Inversion: New Arrivals Preempt Established Queues](#18-fairness-inversion-new-arrivals-preempt-established-queues)
+18. [Fairness Inversion: Last-Arrival Wins the VRAM Race](#18-fairness-inversion-last-arrival-wins-the-vram-race)
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 29-29: Link fragments should be valid

(MD051, link-fragments)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/docs/capacity-management-architecture.md` at line 29, The TOC entry
anchor `#18-fairness-inversion-new-arrivals-preempt-established-queues` is
incorrect and points to a non-existent heading; update the Table of Contents
entry (the line containing "18. [Fairness Inversion: New Arrivals Preempt
Established Queues]") to use the actual heading slug "Fairness Inversion:
Last-Arrival Wins the VRAM Race" (or change the heading text to match the TOC)
so the link resolves; specifically, either replace the TOC text/anchor with
`#18-fairness-inversion-last-arrival-wins-the-vram-race` to match the heading,
or rename the heading to "Fairness Inversion: New Arrivals Preempt Established
Queues" so the existing TOC anchor is valid.

Comment on lines +291 to +314
try:
    effective_parallel, _ = self._logosnode.get_parallel_capacity(model_id, provider_id)
except (KeyError, Exception):
    pass

available_vram_mb = float("inf")
try:
    cap = self._logosnode.get_capacity_info(provider_id)
    available_vram_mb = float(cap.available_vram_mb)
except (KeyError, Exception):
    pass

model_vram_mb = 0.0
kv_budget_mb = 0.0
try:
    model_name = self._logosnode.get_model_name(model_id, provider_id)
    if model_name:
        profiles = self._logosnode.get_model_profiles(provider_id)
        if model_name in profiles:
            profile = profiles[model_name]
            model_vram_mb = profile.estimate_vram_mb()
            kv_budget_mb = float(profile.kv_budget_mb or 0)
except (KeyError, Exception):
    pass

⚠️ Potential issue | 🟠 Major

Log unexpected ETTFT fallback failures instead of silently assuming free capacity.

If get_capacity_info() or get_model_profiles() starts failing here, the estimator falls back to available_vram_mb = inf and model_vram_mb = 0.0, which suppresses reclaim penalties and can reorder providers as if VRAM were unconstrained. That makes scheduling decisions depend on telemetry failures rather than real capacity.

🧰 Tools
🪛 Ruff (0.15.9)

[error] 293-294: try-except-pass detected, consider logging the exception

(S110)


[warning] 293-293: Do not catch blind exception: Exception

(BLE001)


[error] 300-301: try-except-pass detected, consider logging the exception

(S110)


[warning] 300-300: Do not catch blind exception: Exception

(BLE001)


[error] 313-314: try-except-pass detected, consider logging the exception

(S110)


[warning] 313-313: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/src/logos/pipeline/correcting_scheduler.py` around lines 291 - 314, The
code currently swallows all exceptions from get_parallel_capacity,
get_capacity_info, get_model_name and get_model_profiles which causes silent
"infinite" VRAM and zero model VRAM fallbacks; update the try/except blocks
around calls to self._logosnode.get_parallel_capacity(...),
self._logosnode.get_capacity_info(...), and the block that calls
self._logosnode.get_model_name(...) / self._logosnode.get_model_profiles(...) to
log the unexpected exceptions (including model_id and provider_id) instead of
passing silently—use the scheduler's logger (e.g. self._logger.error(...,
exc_info=True) or self._logger.exception(...)) to record the full exception and
context but keep the existing fallback assignments (available_vram_mb =
float("inf"), model_vram_mb = 0.0, kv_budget_mb = 0.0) so behavior is unchanged
except for telemetry of failures.

Comment on lines +16 to +18
API_BASE="${API_BASE:-http://localhost:18080}"
API_KEY="${API_KEY:-lg-root-ifoY6N0fWiWD8_wUvrmeXY-Ft6aEGmbeBKQT47R_JtDrHVm1guuA9beCo3LEbQg8qZQKBP8iye73e8EVce1L6DHhaK3r3sMJjpcNDNWqPsGB6TmXd_dHqYOXz4GrKV9M}"


⚠️ Potential issue | 🔴 Critical

Remove the embedded root API key.

This commits a live-looking lg-root-... credential into source control. Even for local benchmarks, privileged keys should come from the environment or a local-only secrets file, not the repository.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/benchmark_spread.sh` around lines 16 - 18, The file currently
hardcodes a root API key in the API_KEY variable; remove the embedded
"lg-root-..." literal and instead read API_KEY from the environment (e.g., keep
API_KEY="${API_KEY:-}" or require it and exit with an error), document in the
script header that users must provide API_KEY via environment or a local secrets
file, and update any README or comments to instruct using export API_KEY=... or
a .env for local benchmarking; reference the API_KEY and API_BASE variable
definitions to locate the change.

Comment on lines +6 to +7
LOGOS_KEY="lg-root-ifoY6N0fWiWD8_wUvrmeXY-Ft6aEGmbeBKQT47R_JtDrHVm1guuA9beCo3LEbQg8qZQKBP8iye73e8EVce1L6DHhaK3r3sMJjpcNDNWqPsGB6TmXd_dHqYOXz4GrKV9M"
BASE_URL="http://localhost:18080"

⚠️ Potential issue | 🔴 Critical

Remove the embedded root key from the test script.

This bakes a privileged lg-root-... token into the repository. Pull it from the environment instead so the script can run locally without leaking credentials into source control.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/integration/test_capacity_fixes.sh` around lines 6 - 7, The test
script currently hardcodes a privileged root token in the LOGOS_KEY variable;
change the script to read LOGOS_KEY from the environment (e.g., use an
environment variable lookup for LOGOS_KEY) and remove the embedded token
literal, and add a guard that fails the test early with a clear message if
LOGOS_KEY is not set; update any references to LOGOS_KEY in this file
(test_capacity_fixes.sh) to rely on the environment-provided value instead of
the baked-in string.

Comment on lines +132 to +140
ECCS_OFF_ENV="ECCS_ETTFT_ENABLED=false"
if [[ -n "$WEIGHT_OVERRIDE" ]]; then
    ECCS_OFF_ENV="$ECCS_OFF_ENV ECCS_WEIGHT_OVERRIDE=$WEIGHT_OVERRIDE"
fi

docker compose exec \
    -e "ECCS_DECISION_LOG=/tmp/eccs_decisions_off.jsonl" \
    -e "ECCS_ETTFT_ENABLED=false" \
    ${WEIGHT_OVERRIDE:+-e "ECCS_WEIGHT_OVERRIDE=$WEIGHT_OVERRIDE"} \

⚠️ Potential issue | 🟠 Major

The “ECCS off” run never disables the correcting scheduler.

logos/src/logos/main.py:1307-1315 reads LOGOS_SCHEDULER_ETTFT_ENABLED, but this script exports ECCS_ETTFT_ENABLED=false. As written, Run 2 still uses the same scheduler mode, so the ablation baseline is invalid.

Proposed fix
-    ECCS_OFF_ENV="ECCS_ETTFT_ENABLED=false"
+    ECCS_OFF_ENV="LOGOS_SCHEDULER_ETTFT_ENABLED=false"
@@
-        -e "ECCS_ETTFT_ENABLED=false" \
+        -e "LOGOS_SCHEDULER_ETTFT_ENABLED=false" \
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 140-140: Quotes/backslashes in this variable will not be respected. (SC2090)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/run_eccs_ablation.sh` around lines 132 - 140, The
script sets ECCS_ETTFT_ENABLED=false but the code checks
LOGOS_SCHEDULER_ETTFT_ENABLED, so the “ECCS off” run never disables the
correcting scheduler; update the script to export
LOGOS_SCHEDULER_ETTFT_ENABLED=false (replace ECCS_ETTFT_ENABLED=false) in both
ECCS_OFF_ENV and the docker compose exec -e flags, and preserve passing the
optional ECCS_WEIGHT_OVERRIDE via ${WEIGHT_OVERRIDE:+-e
"ECCS_WEIGHT_OVERRIDE=$WEIGHT_OVERRIDE"} so the run correctly toggles the
scheduler flag used by logos/src/logos/main.py.

Comment on lines +9 to +23
# Usage:
# cd logos/
# bash tests/performance/run_ollama_benchmark_suite.sh [--ollama-base URL]
#
# Prerequisites:
# pip install httpx matplotlib numpy (if not already available)
# Ollama must be running (e.g. `ollama serve` or via Docker)
# ──────────────────────────────────────────────────────────────────────────
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
cd "$REPO_ROOT"

OLLAMA_BASE="${1:-http://localhost:11434}"

⚠️ Potential issue | 🟠 Major

The documented --ollama-base flag is not actually parsed.

Right now the script treats $1 as the URL directly, so bash tests/performance/run_ollama_benchmark_suite.sh --ollama-base http://... sets OLLAMA_BASE="--ollama-base" and the first curl fails. Parse the flag or change the usage text.

Proposed fix
-OLLAMA_BASE="${1:-http://localhost:11434}"
+OLLAMA_BASE="http://localhost:11434"
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --ollama-base)
+            OLLAMA_BASE="$2"
+            shift 2
+            ;;
+        *)
+            echo "Unknown option: $1"
+            exit 1
+            ;;
+    esac
+done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/run_ollama_benchmark_suite.sh` around lines 9 - 23,
The script's usage claims a --ollama-base flag but the code only reads the first
positional arg into OLLAMA_BASE, so calling it with `--ollama-base URL` breaks;
update run_ollama_benchmark_suite.sh to properly parse arguments (e.g., iterate
"$@" or use getopts/while loop), detect the --ollama-base flag and set
OLLAMA_BASE to its following value (fall back to default http://localhost:11434
or accept a single positional URL if provided), and ensure existing references
to OLLAMA_BASE remain unchanged.
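One way to satisfy both the documented flag and the old positional form is a small parsing helper (a sketch; `parse_ollama_base` is illustrative and not part of the script):

```shell
parse_ollama_base() {
  # Default matches the script; accepts --ollama-base URL, --ollama-base=URL,
  # or a bare positional URL for backwards compatibility.
  local base="http://localhost:11434"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --ollama-base)   base="$2"; shift 2 ;;
      --ollama-base=*) base="${1#*=}"; shift ;;
      *)               base="$1"; shift ;;
    esac
  done
  printf '%s\n' "$base"
}
```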

Comment on lines +172 to +179
async with httpx.AsyncClient(timeout=120.0) as client:
for m in running:
name = m.get("name", m.get("model", ""))
print(f"Unloading {name}...")
await client.post(
f"{base_url}/api/generate",
json={"model": name, "keep_alive": 0},
)

⚠️ Potential issue | 🟠 Major

Fail fast when an unload request is rejected.

This loop ignores the /api/generate response entirely. If Ollama rejects one unload, the run still proceeds and is reported as a cold start even though models may still be resident in memory.

Suggested fix
         for m in running:
             name = m.get("name", m.get("model", ""))
             print(f"Unloading {name}...")
-            await client.post(
+            resp = await client.post(
                 f"{base_url}/api/generate",
                 json={"model": name, "keep_alive": 0},
             )
+            resp.raise_for_status()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/run_ollama_benchmark.py` around lines 172 - 179, The
unload loop currently ignores the HTTP response from the Ollama POST to
f"{base_url}/api/generate" (the async block using httpx.AsyncClient and the loop
over running/models), so if an unload is rejected we should fail fast: await the
response for each unload request, inspect response.status_code and response body
(or response.json()) for errors, and if the request is not successful raise an
exception or exit the run immediately with a clear error message indicating
which model (name/model) failed to unload; ensure this check is applied inside
the same async loop that posts {"model": name, "keep_alive": 0}.
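A small, testable version of that check (the `UnloadError` name and the response-shape assumptions are illustrative; Ollama's unload response format should be verified against the running server):

```python
class UnloadError(RuntimeError):
    """Raised when Ollama refuses to unload a model."""

def check_unload_response(name: str, status_code: int, body: dict) -> None:
    # Fail fast instead of silently proceeding to a "cold start" run
    # while the model may still be resident in memory.
    if status_code != 200 or body.get("error"):
        raise UnloadError(
            f"failed to unload {name}: HTTP {status_code} {body}"
        )
```

In the async loop, this would be called as `resp = await client.post(...)` followed by `check_unload_response(name, resp.status_code, resp.json())`.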

Comment on lines +445 to +446
"queue_wait_ms": overhead_ms, # the 0.5s overhead stands in for queue wait
"processing_ms": r.client_duration_ms,

⚠️ Potential issue | 🟠 Major

Keep the synthetic overhead out of queue_wait_ms.

The fixed overhead_ms is a benchmarking normalization, not observed queue time. Writing it into queue_wait_ms fabricates a queueing distribution and will mislead downstream analyses that treat this column as real wait time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/run_ollama_benchmark.py` around lines 445 - 446, The
current dict is incorrectly putting the synthetic benchmarking normalization
overhead_ms into "queue_wait_ms"; change "queue_wait_ms" to use the observed
queue wait value from the result record (e.g., r.queue_wait_ms or a default 0)
instead of overhead_ms, keep "processing_ms": r.client_duration_ms as-is, and
add a separate field (e.g., "synthetic_overhead_ms" or
"normalization_overhead_ms") set to overhead_ms so the synthetic adjustment is
recorded but not mixed into real queue wait measurements.
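Sketch of the corrected row construction (the `synthetic_overhead_ms` field name is a suggestion, not an existing column):

```python
def build_row(r, overhead_ms: float) -> dict:
    # Keep the benchmarking normalization out of the observed queue-wait
    # column; record it under its own clearly named key instead.
    return {
        "queue_wait_ms": getattr(r, "queue_wait_ms", 0.0),
        "processing_ms": r.client_duration_ms,
        "synthetic_overhead_ms": overhead_ms,
    }
```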

Comment on lines +643 to +670
# Sort by request_id to get chronological order
model_colors = {}
color_cycle = ["#4C72B0", "#55A868", "#C44E52", "#8172B2", "#CCB974"]
for r in successful:
m = r.get("model_name", "")
if m not in model_colors:
model_colors[m] = color_cycle[len(model_colors) % len(color_cycle)]

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
fig.patch.set_facecolor(BG_COLOR)
for ax in (ax1, ax2):
ax.set_facecolor(BG_COLOR)

for i, r in enumerate(successful):
color = model_colors.get(r.get("model_name", ""), "#999999")
ttft = r.get("ttft_ms")
latency = r.get("total_latency_ms")
if isinstance(ttft, (int, float)):
ax1.scatter(i, ttft, c=color, s=15, alpha=0.7)
if isinstance(latency, (int, float)):
ax2.scatter(i, latency, c=color, s=15, alpha=0.7)

ax1.set_ylabel("TTFT (ms)")
ax1.set_title("TTFT over Request Order")
ax1.grid(True, alpha=0.3)
ax2.set_ylabel("Total Latency (ms)")
ax2.set_title("Total Latency over Request Order")
ax2.set_xlabel("Request index")

⚠️ Potential issue | 🟠 Major

Use arrival_offset_ms for the timeline instead of request order.

build_results() already preserves each workload's arrival offset, but this plot ignores it and scatters by loop index. The generated chart therefore shows completion order, not benchmark time, which hides burstiness and idle gaps.

Suggested fix
-    # Sort by request_id to get chronological order
     model_colors = {}
@@
-    for i, r in enumerate(successful):
+    for i, r in enumerate(successful):
         color = model_colors.get(r.get("model_name", ""), "#999999")
+        offset_ms = r.get("arrival_offset_ms")
+        x = (
+            float(offset_ms) / 1000.0
+            if isinstance(offset_ms, (int, float))
+            else float(i)
+        )
         ttft = r.get("ttft_ms")
         latency = r.get("total_latency_ms")
         if isinstance(ttft, (int, float)):
-            ax1.scatter(i, ttft, c=color, s=15, alpha=0.7)
+            ax1.scatter(x, ttft, c=color, s=15, alpha=0.7)
         if isinstance(latency, (int, float)):
-            ax2.scatter(i, latency, c=color, s=15, alpha=0.7)
+            ax2.scatter(x, latency, c=color, s=15, alpha=0.7)
@@
-    ax2.set_xlabel("Request index")
+    ax2.set_xlabel("Time (seconds into benchmark)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logos/tests/performance/run_ollama_benchmark.py` around lines 643 - 670, The
plot currently uses the loop index `i` as the x-axis, which shows completion
order; replace that with each record's preserved arrival time by using
`r.get("arrival_offset_ms")` (or a converted seconds value) as the x coordinate
when calling `ax1.scatter` and `ax2.scatter`; ensure you handle missing/None
values like you do for `ttft_ms` and `total_latency_ms`, and (optionally) sort
`successful` by `arrival_offset_ms` before plotting so the timeline is
chronological; update references to `i` in the two scatter calls (and axis label
if you change units) so the chart reflects real benchmark time.
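The x-coordinate logic in that fix can be isolated and unit-tested (a sketch; the `timeline_x` helper is illustrative and not in the benchmark script):

```python
def timeline_x(records):
    # Prefer the preserved arrival offset, converted to seconds into the
    # benchmark; fall back to the loop index when the field is missing.
    xs = []
    for i, r in enumerate(records):
        off = r.get("arrival_offset_ms")
        xs.append(
            float(off) / 1000.0 if isinstance(off, (int, float)) else float(i)
        )
    return xs
```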

@github-actions

There hasn't been any activity on this pull request recently. Therefore, this pull request has been automatically marked as stale and will be closed if no further activity occurs within seven days. Thank you for your contributions.

github-actions bot added and removed the stale label Apr 19, 2026
