📝 Walkthrough

This PR introduces Desktop Nova (a local AI-organization execution mode), production Oracle Cloud deployment infrastructure with Cloudflare Tunnel, multi-step reasoning chains for agents, specialized prompt templates, file-edit tooling for surgical code modifications, comprehensive test coverage, and upgrades to the Next.js/React/Go toolchains. It restructures agent design around structured LLM reasoning workflows with budget tracking, cost validation, and precision code generation.
Sequence Diagrams

sequenceDiagram
participant User as User (TUI)
participant Desktop as Desktop Nova
participant Orchestrator as OrchestratorEngine
participant CEO as CEO Agent
participant CTO as CTO Agent
participant Swarm as Agent Swarm
User->>Desktop: python desktop_nova.py "goal"
activate Desktop
Desktop->>Orchestrator: initialize(workspace="./output")
Desktop->>CEO: create with LLM client
Desktop->>CTO: create with LLM client
Desktop->>Swarm: create specialist agents
Desktop->>Orchestrator: register_agent(role, instance)
Orchestrator->>Orchestrator: setup event logger callback
Desktop->>Orchestrator: start_project(business_idea="goal")
activate Orchestrator
Orchestrator->>CEO: execute()
activate CEO
CEO->>CEO: ReasoningChain (analyze→generate→critique→refine)
CEO-->>Orchestrator: emit(AgentEvent) per step
CEO-->>Orchestrator: return BusinessPlan
deactivate CEO
Orchestrator->>CTO: execute()
activate CTO
CTO->>CTO: ReasoningChain (analyze→generate→critique→refine)
CTO-->>Orchestrator: emit(AgentEvent) per step
CTO-->>Orchestrator: return Architecture
deactivate CTO
Orchestrator->>Swarm: execute remaining agents sequentially
Swarm-->>Orchestrator: emit events + return artifacts
Orchestrator-->>Desktop: poll status until completed/failed
deactivate Orchestrator
Desktop->>Desktop: event_logger(event) - color format and print
Desktop->>User: print success with output path
deactivate Desktop
sequenceDiagram
participant Git as Git (main push)
participant GH as GitHub Actions
participant Oracle as Oracle Cloud VM
participant Tunnel as Cloudflare Tunnel
participant Services as Docker Services
Git->>GH: push to main
activate GH
GH->>GH: trigger deploy-oracle workflow
GH->>Oracle: SSH (secrets.ORACLE_HOST, SSH_KEY)
activate Oracle
Oracle->>Oracle: git pull origin main
Oracle->>Services: docker compose pull
Oracle->>Services: docker compose build --parallel
Oracle->>Services: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --remove-orphans
Oracle->>Services: sleep 45 (service stabilization)
Oracle->>Services: curl http://localhost:8080/health
Oracle->>Services: curl http://localhost:9091/healthz
Services-->>Oracle: health check response
Services->>Tunnel: cloudflared tunnel run
activate Tunnel
Tunnel->>Tunnel: read TUNNEL_TOKEN env
Tunnel->>Services: proxy external traffic to internal ports
deactivate Tunnel
Oracle-->>GH: health check result
deactivate Oracle
GH-->>Git: deployment status
deactivate GH
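The fixed `sleep 45` step in the flow above can be replaced with bounded polling against the health endpoints. Here is that retry logic sketched in Python (the workflow itself would implement this as a shell loop; the function name, timeouts, and injectable `clock`/`sleep` hooks are illustrative, not part of the project):

```python
import time

def wait_until(probe, timeout=120.0, interval=3.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    In the deploy script the probe would be a curl against
    /health and /healthz; `clock` and `sleep` are injectable so the
    logic can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True          # service became healthy in time
        if clock() >= deadline:
            return False         # give up: caller should fail the deploy
        sleep(interval)
```

Returning `False` (rather than raising) lets the workflow step translate the result into a non-zero exit code and fail fast instead of relying on a single fixed sleep.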
sequenceDiagram
participant LLM as LLM Provider
participant ReasoningChain as ReasoningChain
participant Step1 as Step: Analyze
participant Step2 as Step: Generate
participant Step3 as Step: Critique
participant Step4 as Step: Refine
participant Schema as Pydantic Schema
participant Callback as on_step Callback
ReasoningChain->>Step1: format_prompt(context)
Step1->>LLM: call with temperature, response_format
LLM-->>Step1: JSON response
Step1->>Step1: parse JSON
Step1-->>Callback: on_step("analyze", preview)
Step1-->>ReasoningChain: store output in context
ReasoningChain->>Step2: format_prompt(context + analyze_output)
Step2->>LLM: call with temperature, response_format
LLM-->>Step2: JSON response
Step2-->>Callback: on_step("generate", preview)
Step2-->>ReasoningChain: store output in context
ReasoningChain->>Step3: format_prompt(context + generate_output)
Step3->>LLM: call with temperature, response_format
LLM-->>Step3: JSON response
Step3-->>Callback: on_step("critique", preview)
Step3-->>ReasoningChain: store output in context
ReasoningChain->>Step4: format_prompt(context + critique_output)
Step4->>LLM: call with temperature, response_format
LLM-->>Step4: JSON response
Step4-->>Callback: on_step("refine", preview)
Step4-->>ReasoningChain: store output in context
ReasoningChain->>Schema: validate(output)
alt Validation Success
Schema-->>ReasoningChain: validated instance
else Validation Fails
ReasoningChain->>LLM: call with validation error details
LLM-->>ReasoningChain: corrected response
ReasoningChain->>Schema: retry validation (max_retries)
end
ReasoningChain-->>ReasoningChain: return final context dict
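The flow above can be sketched as a small driver loop. This is an illustrative reconstruction of the analyze→generate→critique→refine pattern with a bounded validation-retry, not the project's actual ReasoningChain API; `llm`, `steps`, and `validate` are stand-ins:

```python
def run_chain(llm, steps, validate, context, on_step=None, max_retries=2):
    """Run named steps sequentially, feeding each output into the shared
    context, then validate the final output with bounded correction retries."""
    for name, template in steps:
        output = llm(template.format(**context))   # one LLM call per step
        if on_step:
            on_step(name, output[:80])             # preview callback per step
        context[f"{name}_output"] = output         # visible to later steps
    final = context[f"{steps[-1][0]}_output"]
    for attempt in range(max_retries + 1):
        ok, err = validate(final)                  # schema validation
        if ok:
            context["validated"] = final
            return context
        if attempt == max_retries:
            raise ValueError(f"validation failed after retries: {err}")
        # re-prompt the LLM with the validation error details
        final = llm(f"Fix this output; validation error: {err}\n{final}")
```

In the real chain the validator would be a Pydantic schema and each step would carry its own temperature and response_format, as the diagram shows.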
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Code Review
This pull request focuses on production hardening and deployment infrastructure, introducing a Cloudflare Tunnel setup, a comprehensive Oracle Cloud deployment guide, and a service health check script. Key technical updates include upgrading the backend to Go 1.25, migrating the dashboard to React 19 and NextAuth v5, and refactoring the SagaCoordinator to use Redis for distributed state. Feedback identifies critical concurrency issues in the new Redis-based saga management, specifically race conditions in RegisterSaga and non-atomic updates in Advance, as well as a security concern regarding the fallback to a wildcard CORS policy.
exists, _ := sc.redis.Client.Exists(ctx, key).Result()
if exists > 0 {
	return
}

state := &SagaState{
	ProjectID:   projectID,
	CurrentStep: 0,
	Payload:     initialPayload,
	Status:      "forward",
}

data, _ := json.Marshal(state)
sc.redis.Client.Set(ctx, key, data, 24*time.Hour)
The RegisterSaga implementation contains a race condition and lacks error handling. The check-then-set pattern (lines 56-69) is not atomic. Additionally, errors from Redis operations are ignored, which is problematic for a 'Hardening' phase.
Consider using SetNX to atomically initialize the saga state only if it doesn't exist, and ensure Redis errors are handled or logged.
func (sc *SagaCoordinator) RegisterSaga(ctx context.Context, projectID string, initialPayload map[string]any) {
	key := fmt.Sprintf("saga:state:%s", projectID)
	state := &SagaState{
		ProjectID:   projectID,
		CurrentStep: 0,
		Payload:     initialPayload,
		Status:      "forward",
	}
	data, err := json.Marshal(state)
	if err != nil {
		return
	}
	if err := sc.redis.Client.SetNX(ctx, key, data, 24*time.Hour).Err(); err != nil {
		// Log error appropriately
	}
}

func (sc *SagaCoordinator) Advance(ctx context.Context, projectID string, agentRole string) {
	key := fmt.Sprintf("saga:state:%s", projectID)

	val, err := sc.redis.Client.Get(ctx, key).Result()
	if err != nil {
		return
	}

	var state SagaState
	if err := json.Unmarshal([]byte(val), &state); err != nil {
		return
	}

	state.History = append(state.History, fmt.Sprintf("%s_completed_at_%s", agentRole, time.Now().Format(time.RFC3339)))
	state.CurrentStep++

	data, _ := json.Marshal(&state)
	sc.redis.Client.Set(ctx, key, data, 24*time.Hour)
}
The Advance method implements a non-atomic read-modify-write pattern on the Redis state. In a distributed environment with multiple orchestrator instances, this can lead to lost updates if two agents finish tasks for the same project simultaneously.
To harden this, you should use a Redis Lua script to perform the update atomically or implement optimistic concurrency control using a version field in the SagaState.
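As an illustration of the optimistic-concurrency option, the read-modify-write can retry on a version mismatch. This is a pure-Python sketch of the pattern only, under the assumption that a version field is added to SagaState; the `load`/`cas_save` callables stand in for Redis GET and a version-checked SET (e.g. via WATCH/MULTI or a Lua script):

```python
class SagaConflict(Exception):
    """Raised when concurrent writers exhaust the retry budget."""

def advance(load, cas_save, project_id, agent_role, max_retries=3):
    """Retry the read-modify-write until the compare-and-swap succeeds."""
    for _ in range(max_retries):
        state, version = load(project_id)  # read state plus its version
        state = dict(state,
                     current_step=state["current_step"] + 1,
                     history=state["history"] + [f"{agent_role}_completed"])
        # cas_save must only write if the stored version is unchanged
        if cas_save(project_id, state, expected_version=version):
            return state
    raise SagaConflict(f"could not advance saga for {project_id}")
```

A concurrent write between `load` and `cas_save` fails the compare-and-swap, and the loop re-reads instead of silently losing the other writer's update.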
redis    *db.Redis
producer *kafka.Producer
sagas    map[string]*SagaState // In-memory for now, should be in DB/Redis for multi-node
mu       sync.RWMutex
if origins == "" {
	origins = "*"
}
The PR description mentions securing CORS by restricting the wildcard default, but this logic explicitly falls back to * if origins is empty. In a production-hardened environment, it is safer to default to a restricted origin (e.g., the primary dashboard domain) or return an error if the configuration is missing, rather than allowing all origins by default.
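The fail-closed behaviour can be sketched as follows. This is an illustrative Python mirror of the logic the Go middleware would need; the function names are hypothetical:

```python
def parse_allowlist(raw: str) -> frozenset:
    """Parse a comma-separated CORS_ORIGINS value.

    An empty or missing configuration yields an empty allowlist,
    meaning no origins are allowed -- never a wildcard fallback.
    """
    return frozenset(o.strip() for o in raw.split(",") if o.strip())

def allow_origin(request_origin: str, allowlist: frozenset):
    """Return the single origin to echo back in
    Access-Control-Allow-Origin, or None to reject the request
    (omit the header or respond 403)."""
    return request_origin if request_origin in allowlist else None
```

Echoing back only the matched origin also fixes the secondary problem that a raw comma-separated list is not a valid Access-Control-Allow-Origin value.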
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
go-backend/internal/shared/auth/auth.go (1)
154-169: ⚠️ Potential issue | 🟠 Major — Validate Google userinfo response before accepting identity.

Lines 154–169 lack resp.StatusCode validation and required field checks. Non-200 responses or incomplete payloads (empty sub/email) are currently accepted as a valid identity.

Suggested fix
 import (
 	"context"
 	"crypto/rsa"
 	"encoding/json"
 	"fmt"
 	"io"
+	"net/http"
 	"os"
 	"time"
@@
 	resp, err := client.Get("https://www.googleapis.com/oauth2/v3/userinfo")
 	if err != nil {
 		return "", "", "", fmt.Errorf("auth: get userinfo: %w", err)
 	}
 	defer resp.Body.Close() //nolint:errcheck
+	if resp.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
+		return "", "", "", fmt.Errorf("auth: userinfo status %d: %s", resp.StatusCode, string(body))
+	}
@@
 	if err := parseJSON(resp.Body, &info); err != nil {
 		return "", "", "", fmt.Errorf("auth: parse userinfo: %w", err)
 	}
+	if info.Sub == "" || info.Email == "" {
+		return "", "", "", fmt.Errorf("auth: incomplete userinfo payload")
+	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-backend/internal/shared/auth/auth.go` around lines 154 - 169, the Google userinfo HTTP response handling around client.Get("https://www.googleapis.com/oauth2/v3/userinfo"), parseJSON, and the returned info.Sub/info.Email/info.Name must validate resp.StatusCode and required fields: check resp.StatusCode == 200 (or 2xx) and return a wrapped error including status and optionally body if not OK, and after decoding ensure info.Sub and info.Email are non-empty (and Name if required) before returning; if any are missing return a clear error (e.g., "auth: invalid userinfo: missing sub/email") so downstream JWT/database code cannot accept incomplete identities.

dashboard/auth.ts (1)
10-12: ⚠️ Potential issue | 🔴 Critical — Fail closed in production when auth secrets are missing.

Lines 10–12 log a warning but still start with mock OAuth credentials and a known default secret (line 36). That weakens auth guarantees in production.

Proposed change
 if (process.env.NODE_ENV === 'production' && (!googleId || !googleSecret || !nextAuthSecret)) {
-  console.warn('Missing authentication environment variables in production. Falling back to mock values.');
+  throw new Error("Missing required auth environment variables in production");
 }
@@
-  clientId: googleId || "mock_client_id",
-  clientSecret: googleSecret || "mock_client_secret",
+  clientId: googleId ?? "mock_client_id",
+  clientSecret: googleSecret ?? "mock_client_secret",
@@
-  secret: nextAuthSecret || "super-secret-default-key-for-dev",
+  secret: process.env.NODE_ENV === "production"
+    ? nextAuthSecret
+    : (nextAuthSecret || "super-secret-default-key-for-dev"),

Also applies to: 17-18, 36-36
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dashboard/auth.ts` around lines 10 - 12, when running in production, do not fall back to mock OAuth credentials or a known default secret: replace the console.warn block that checks process.env.NODE_ENV === 'production' && (!googleId || !googleSecret || !nextAuthSecret) with a hard failure (throw an Error or call process.exit(1)) so the app refuses to start if googleId, googleSecret or nextAuthSecret are missing; additionally, ensure the code path that assigns mock OAuth credentials and the hardcoded `secret` (the fallback at the place where mock values/default secret are set) is only reachable when NODE_ENV !== 'production' so no mocks/default secret are used in production.

docker-compose.yml (2)
488-503: ⚠️ Potential issue | 🟡 Minor — Grafana is missing the obs profile.

Prometheus and Jaeger both have profiles: ["obs"], but Grafana does not. This means running make start-obs (which uses --profile obs) will start Prometheus and Jaeger but not Grafana, breaking the observability stack.

🔧 Proposed fix
 grafana:
   image: grafana/grafana:10.3.1
   restart: unless-stopped
+  profiles: ["obs"]
   ports:
     - "${GRAFANA_PORT:-3002}:3000"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 488 - 503, The grafana service is missing the profiles key so it won't start with `--profile obs`; add a profiles entry for the grafana service (e.g., profiles: ["obs"]) in the docker-compose definition for the grafana service to match prometheus and jaeger, ensuring the service name "grafana" includes the profiles field so `make start-obs` brings up Grafana as part of the observability stack.
150-165: ⚠️ Potential issue | 🔴 Critical — Container memory limit is lower than the Redis maxmemory setting.

Redis is configured with --maxmemory 256mb but the container memory limit is set to only 128M. The OOM killer will terminate Redis before it reaches its configured memory threshold, causing unexpected container crashes and potential data loss.

🐛 Proposed fix — align memory limits
Either increase container limit to accommodate Redis maxmemory plus overhead:
 command: >
   redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --save 60 1 --appendonly yes
 ...
 deploy:
   resources:
     limits:
-      memory: 128M
+      memory: 320M

Or reduce Redis maxmemory to fit within the container limit:
 command: >
-  redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --save 60 1 --appendonly yes
+  redis-server --maxmemory 100mb --maxmemory-policy allkeys-lru --save 60 1 --appendonly yes

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 150 - 165, The container memory limit (deploy.resources.limits.memory = 128M) is lower than the Redis maxmemory flag (--maxmemory 256mb) and will cause OOM kills; fix by aligning limits: either increase the container memory limit to exceed Redis maxmemory plus overhead (e.g., set deploy.resources.limits.memory to a value >256M) or reduce the Redis startup flag (--maxmemory) to a value below the current container limit; update the redis-server command or the deploy.resources.limits.memory entry accordingly so they match.
🧹 Nitpick comments (7)
moe-scoring/src/scorer.rs (1)
176-176: Reorder imports to follow Rust conventions.

Lines 174-176 have imports in non-standard order. Rust conventions place std imports first, but currently super and crate imports precede std::collections::HashMap. No custom rustfmt configuration was found in the repository. Consider reordering to:

use std::collections::HashMap;

use crate::models::{Expert, ExpertStats, Strategy};

use super::*;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@moe-scoring/src/scorer.rs` at line 176, imports in scorer.rs are out of conventional order; move the std import before crate and super. Specifically, place use std::collections::HashMap; above use crate::models::{Expert, ExpertStats, Strategy}; and ensure use super::*; comes last so the import order is: std (HashMap), crate (models), then super::*;

go-backend/internal/orchestrator/server/saga.go (1)
36-41: Unused mu field after Redis migration.

The mu sync.RWMutex field is no longer used after migrating saga state to Redis. The in-memory map synchronization has been replaced with Redis operations, making this field dead code.

♻️ Remove unused field

 type SagaCoordinator struct {
 	db       *db.Pool
 	redis    *db.Redis
 	producer *kafka.Producer
-	mu       sync.RWMutex
 }

Also remove the unused "sync" import at line 7.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-backend/internal/orchestrator/server/saga.go` around lines 36 - 41, remove the now-dead in-memory mutex and its import: delete the mu field from the SagaCoordinator struct (the declaration "mu sync.RWMutex") and remove the unused "sync" import from the top of the file; verify there are no remaining references to SagaCoordinator.mu elsewhere (search for ".mu" or "mu.RLock"/"mu.Lock") and run tests/build to confirm compilation after cleaning up the unused symbol.

dashboard/.npmrc (1)
1-1: Document or resolve the peer dependency conflicts.

Using legacy-peer-deps=true suppresses npm's peer dependency resolution, which can mask real incompatibilities. While this is sometimes necessary for beta packages (like the next-auth v5 beta mentioned in the PR), consider adding a comment explaining which packages require this workaround and plan to remove it once stable versions are available.

📝 Suggested documentation

+# Required for next-auth v5 beta peer dependency conflicts
+# TODO: Remove once next-auth v5 is stable
 legacy-peer-deps=true

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dashboard/.npmrc` at line 1, replace the unconditional legacy-peer-deps flag with a documented decision: either resolve the conflicting peer deps in package.json (bump or pin versions) or keep legacy-peer-deps=true but add an explanatory comment in this repository (e.g., in .npmrc and/or a short note in README/CONTRIBUTING) that lists the offending packages (for example next-auth v5 beta), why the workaround is necessary, and a TODO to remove the flag once stable releases arrive; if you can, prefer fixing the dependency versions in package.json (or adding overrides/resolutions) so the flag can be removed entirely.

.github/workflows/deploy-oracle.yml (1)
40-47: Replace fixed sleep with bounded readiness retries.

Using only sleep 45 is fragile during cold starts. Prefer retry loops around health probes with a timeout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-oracle.yml around lines 40 - 47, replace the fixed sleep 45 with a bounded readiness retry loop that polls the same health endpoints (the curl checks currently used for http://localhost:8080/health and http://localhost:9091/healthz) until they become healthy or a timeout/maximum attempts is reached; implement a short interval/backoff between attempts, log each retry (the existing echo lines can be reused), run docker compose ps between attempts if helpful, and exit non-zero if the max wait time is exceeded so the workflow fails fast instead of relying on a single fixed sleep.

go-backend/internal/gateway/middleware/middleware_production_test.go (1)
86-87: Replace os.Setenv/os.Unsetenv with t.Setenv.

Use t.Setenv instead of manually managing environment variables with defer os.Unsetenv. This provides automatic cleanup and eliminates the need for error suppression.

Proposed change

-	os.Setenv("LOCAL_USER_EMAIL", "dev@autonomousorg.ai") //nolint:errcheck
-	defer os.Unsetenv("LOCAL_USER_EMAIL") //nolint:errcheck
+	t.Setenv("LOCAL_USER_EMAIL", "dev@autonomousorg.ai")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-backend/internal/gateway/middleware/middleware_production_test.go` around lines 86 - 87, replace the manual environment management using os.Setenv("LOCAL_USER_EMAIL", ...) and defer os.Unsetenv(...) in the test with t.Setenv("LOCAL_USER_EMAIL", "dev@autonomousorg.ai") so the testing framework handles cleanup automatically; remove the //nolint:errcheck annotations and the defer call, and update the test function (where LOCAL_USER_EMAIL is set) to use t.Setenv instead.

Makefile (1)
58-77: Add new modular targets to .PHONY.

These new targets are command targets and should be declared phony to prevent Make from checking for files with these names.

-.PHONY: help start stop restart logs test build clean status shell demo
+.PHONY: help start stop restart logs test build clean status shell demo \
+	start-infra start-bus start-core start-agents start-obs status-links

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 58 - 77, the new Make targets start-infra, start-bus, start-core, start-agents, and start-obs are command targets and must be added to the .PHONY declaration; update the existing .PHONY line to include these symbols (start-infra, start-bus, start-core, start-agents, start-obs) so Make treats them as phony targets rather than filenames.

docker-compose.prod.yml (1)
16-21: Consider failing fast if CORS_ORIGINS is unset.

Unlike the base docker-compose.yml, which has a fallback (${CORS_ORIGINS:-http://localhost:3000,...}), this production override uses ${CORS_ORIGINS} without a default. If the variable is unset, the gateway may receive an empty CORS origin list, potentially breaking cross-origin requests silently.

🛡️ Option 1: Add a safe production default

 api-gateway:
   ports: []
   environment:
     AI_ORG_ENV: production
     AUTH_DISABLED: "false"
-    CORS_ORIGINS: "${CORS_ORIGINS}"
+    CORS_ORIGINS: "${CORS_ORIGINS:?CORS_ORIGINS must be set for production}"

Using :? will cause docker compose to fail with a clear error if the variable is unset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.prod.yml` around lines 16 - 21, The production compose override leaves the api-gateway CORS_ORIGINS unset-risky; update the api-gateway environment entry for CORS_ORIGINS to use Docker Compose variable validation (e.g., the ${CORS_ORIGINS:?...} form) so the process fails fast with a clear error if CORS_ORIGINS is not provided; modify the api-gateway environment key referencing CORS_ORIGINS to use the `:?` expansion (include a short error message) instead of the bare `${CORS_ORIGINS}` to ensure deployment breaks loudly when the variable is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-oracle.yml:
- Line 16: The environment.url expression uses the unsupported secrets context;
replace uses of secrets.ORACLE_DOMAIN with vars.ORACLE_DOMAIN in the
environment.url assignment (i.e., change url: https://${{ secrets.ORACLE_DOMAIN
}} to url: https://${{ vars.ORACLE_DOMAIN }}), and ensure ORACLE_DOMAIN is
declared under the workflow-level vars or passed into the job via vars so that
the vars.ORACLE_DOMAIN symbol resolves at runtime.
In `@agents/base_agent.py`:
- Around line 350-352: The current budget check in agents/base_agent.py (using
budget_key, self._current_project_id and self.redis_client) is race-prone
because it does separate GET → compare → INCRBYFLOAT; replace both occurrences
(around the shown block and the similar 374-375 block) with an atomic Redis
operation — implement a small Redis Lua script (EVAL/EVALSHA) that reads the
current value, compares against the limit, and if under the limit increments by
the requested amount and returns success (or returns failure without
incrementing); call that script via self.redis_client and handle the
success/failure result to enforce the budget atomically.
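A minimal sketch of that Lua approach, assuming a redis-py style `client.eval(script, numkeys, key, *args)`; the key name and the wrapper function are illustrative:

```python
# GET -> compare -> INCRBYFLOAT runs as a single atomic step inside Redis,
# so no other client can spend between the check and the increment.
CHECK_AND_SPEND = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
local amount  = tonumber(ARGV[1])
local limit   = tonumber(ARGV[2])
if current + amount > limit then
    return 0
end
redis.call('INCRBYFLOAT', KEYS[1], amount)
return 1
"""

def try_spend(client, budget_key: str, amount: float, limit: float) -> bool:
    """True if the spend fit under `limit` and was recorded atomically."""
    return client.eval(CHECK_AND_SPEND, 1, budget_key,
                       str(amount), str(limit)) == 1
```

In production the script would be loaded once and invoked via EVALSHA; error handling on the Redis call is elided here.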
In `@check_services.sh`:
- Line 58: The hardcoded PREFIX variable
("autonomousmulti-agentaiorganization-") is incorrect; update the script so
PREFIX is derived from the current project directory name and normalized the
same way Docker Compose does (lowercase and non-alphanumeric characters turned
into hyphens) then append the trailing hyphen; specifically change how PREFIX is
set in check_services.sh (the PREFIX variable) to compute basename="$(basename
"$(pwd)")" lowercased and non-alphanumerics replaced with "-" so the resulting
prefix matches "autonomous-multi-agent-ai-organization-".
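The normalization rule described above can be sketched like this; it is illustrative only, applying the review's stated rule (lowercase, non-alphanumeric runs become hyphens) rather than re-implementing Docker Compose itself:

```python
import re

def compose_prefix(project_dir_name: str) -> str:
    """Derive the container-name prefix from the project directory name,
    mirroring the normalization described in the review: lowercase the
    name, collapse runs of non-alphanumerics into hyphens, append '-'."""
    normalized = re.sub(r"[^a-z0-9]+", "-", project_dir_name.lower())
    return normalized + "-"
```

In the shell script this would replace the hardcoded PREFIX assignment with a value computed from `basename "$(pwd)"`.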
In `@dashboard/Dockerfile`:
- Around line 14-16: The Dockerfile currently copies only package.json and
.npmrc (COPY package.json .npmrc ./) and runs RUN npm install
--legacy-peer-deps; update the Dockerfile to also copy package-lock.json (so the
lockfile is present in the image) and replace the install step with a
deterministic install command such as RUN npm ci --legacy-peer-deps to ensure
reproducible builds.
In `@dashboard/package.json`:
- Around line 29-30: package.json declares TypeScript and `@types/react` ranges
that differ from the resolved versions in package-lock.json (typescript vs
typescript@6.0.2 and `@types/react`@^19.0.0 vs `@types/react`@18.3.28); to fix,
either update the package.json entries for "typescript" and "@types/react" to
the exact versions found in package-lock.json or regenerate the lockfile by
running the package manager (npm install / npm ci after adjusting ranges) so
package-lock.json and package.json are consistent; update the dependencies in
package.json (the "typescript" and "@types/react" entries) or run lockfile
regeneration and commit the updated lockfile.
In `@dashboard/types.d.ts`:
- Around line 2-3: The module declaration currently matches '*.css' which
incorrectly types global CSS (e.g., globals.css) as CSS modules; update the
declaration in types.d.ts to only match '*.module.css' so only CSS module
imports get the { [className: string]: string } default export typing and global
side-effect imports remain untyped/default-import-free. Locate the declare
module '*.css' block and change the pattern to '*.module.css' while keeping the
same shape for the exported content.
In `@DEPLOY.md`:
- Around line 7-18: Update the fenced code block that shows the architecture
diagram (the block that begins with ``` and contains "You → Cloudflare (WAF +
SSL) → Oracle VM → Docker Compose Stack" and the listed services) to include a
language specifier (for example change the opening fence to ```text) so the
markdown lint rule is satisfied; do not alter the ASCII diagram contents, only
add the language token after the opening backticks.
- Around line 35-39: Replace the advice that sets an SSH ingress of "0.0.0.0/0"
in the DEPLOY.md snippet (the commented block containing "Add Ingress rule: TCP
port 22 (SSH) from 0.0.0.0/0") with a recommendation to restrict SSH to specific
CIDR(s) or to require access via a bastion/jump host; mention examples like
"your office IP(s) / bastion host only" and/or suggest using Cloudflare Access
or ephemeral key-based access as the preferred alternative so production
environments are not world-open for port 22.
In `@docker-compose.prod.yml`:
- Around line 33-38: The dashboard environment currently sets
NEXT_PUBLIC_API_URL and NEXT_PUBLIC_WS_URL using the single ORACLE_DOMAIN which
points to the dashboard itself; change these to point to the correct subdomains
used by the Cloudflare tunnel by either (A) replacing NEXT_PUBLIC_API_URL and
NEXT_PUBLIC_WS_URL values to use the explicit subdomains
("https://api.yourdomain.com" and "wss://ws.yourdomain.com") or (B) introduce
new env vars ORACLE_API_DOMAIN and ORACLE_WS_DOMAIN and set
NEXT_PUBLIC_API_URL="https://${ORACLE_API_DOMAIN}" and
NEXT_PUBLIC_WS_URL="wss://${ORACLE_WS_DOMAIN}" so API traffic goes to
api.yourdomain.com (api-gateway:8080) and WS traffic goes to ws.yourdomain.com
(ws-hub:8081); update the dashboard service environment entries for
NEXT_PUBLIC_API_URL and NEXT_PUBLIC_WS_URL accordingly.
In `@docker-compose.yml`:
- Around line 527-536: The cloudflared service configuration lacks a memory
limit, healthcheck, and uses the unstable :latest image; update the cloudflared
service block (service name "cloudflared") to pin a specific image tag instead
of :latest, add a deploy.resources.limits.memory value to match other services,
add a healthcheck that queries the metrics endpoint (ensure the environment
includes TUNNEL_METRICS and that the command for the tunnel includes the
--metrics flag), and make sure TUNNEL_TOKEN remains in environment; adjust the
command (e.g., include --metrics <PORT>) and healthcheck probe to use that
metrics port so the healthcheck will succeed.
In `@go-backend/internal/gateway/middleware/middleware.go`:
- Around line 194-200: The CORS middleware currently defaults origins to "*" and
writes the raw origins string to Access-Control-Allow-Origin, which is insecure
and invalid for comma-separated lists; update the CORS(origins string) function
to parse the origins parameter into a validated allowlist at initialization
(split on commas, trim and validate each URL), remove the "*" fallback so empty
input yields a restricted allowlist, and in the returned handler read the
incoming request Origin (c.Get("Origin")), check it against the allowlist, and
if it matches set Access-Control-Allow-Origin to that single origin (otherwise
do not set the header or return a 403 for disallowed origins); also ensure
preflight OPTIONS requests still respond with the appropriate
Allow-Methods/Allow-Headers using the matched origin when allowed.
In `@go-backend/internal/orchestrator/server/saga.go`:
- Around line 123-141: The Advance method currently swallows errors from
sc.redis.Client.Get, json.Unmarshal, json.Marshal and sc.redis.Client.Set;
update SagaCoordinator.Advance to check and handle each error: log the error
with contextual fields (projectID, agentRole, key) and return early on failures
from Get and Unmarshal, and also log and handle failures from Marshal and Set
(including retry or returning the error as appropriate for your caller contract)
so failures to read/parse/update the saga are observable; reference the Advance
method and the calls to sc.redis.Client.Get, json.Unmarshal, json.Marshal and
sc.redis.Client.Set when making the changes.
- Around line 92-95: Handle errors from json.Marshal and Redis write in
HandleFailure: replace the ignored json.Marshal(&state) with code that checks
the returned err and if non-nil logs via the saga's logger (or returns the
error) and aborts further steps; then call sc.redis.Client.Set(ctx, key, data,
24*time.Hour) and check the returned error via .Err() (or the function's error)
and log/return on failure; ensure you do not proceed to publish the recovery
task (the publish call after line 118) unless both marshal and Redis Set succeed
so the saga.Status="compensating" is reliably persisted. Use the existing
symbols HandleFailure, state.Status, json.Marshal, sc.redis.Client.Set, ctx, key
and the publisher call to locate and gate the publish.
In `@go-backend/internal/shared/config/config.go`:
- Line 144: The default for gateway.cors_origins is currently set to an empty
string which ends up being treated as permissive; change the default to an
explicit empty list by setting v.SetDefault("gateway.cors_origins", []string{})
in config.go (replace the current v.SetDefault("gateway.cors_origins", "")), and
update the origin handling in gateway/middleware/middleware.go (the code that
converts empty origins to "*") to treat an empty slice/string as "no origins
allowed" (restricted) instead of converting it to "*". Ensure both the config
key and the middleware logic consistently handle the empty-slice case.
In `@scratch/verify_fixes.py`:
- Around line 11-18: The file has trailing whitespace on multiple blank lines
causing ruff W293 failures; open the class MockAgent (methods system_prompt and
run) and remove trailing spaces from empty lines (or run `ruff format
scratch/verify_fixes.py`) so no blank line ends with whitespace; optionally
enable your editor's "trim trailing whitespace on save" to prevent recurrence.
- Around line 1-4: The file-level imports (asyncio, os, sys, and from
unittest.mock import MagicMock, AsyncMock) are unsorted and there are
whitespace/formatting issues causing ruff violations; reorder and group imports
into standard-library then third-party/local (e.g., put asyncio, os, sys first,
then the unittest.mock import), remove any unused imports, ensure single blank
line between import groups and one newline at end of file, and fix any extra
trailing/leading whitespace so the module-level symbols (asyncio, os, sys,
MagicMock, AsyncMock) comply with ruff/isort rules.
- Around line 40-43: Replace the try/except-pass that awaits
asyncio.wait_for(agent.suspend_for_approval("test_action"), timeout=0.1) with
contextlib.suppress(TimeoutError): to suppress the builtin TimeoutError (Python
3.11+) instead of catching asyncio.TimeoutError; add "import contextlib" at the
top and call asyncio.wait_for(agent.suspend_for_approval("test_action"),
timeout=0.1) inside the contextlib.suppress block so the timeout is ignored
without an empty except.
In `@security-check/src/validator.rs`:
- Around line 57-65: The nested `if let` in is_dangerous_expr should be
collapsed into one pattern match so clippy's collapsible-match warning is
resolved; replace the two-level checks (if let Call { function, .. } =
&expr.node { if let Identifier { name } = &function.node { ... }}) with a single
match or single if-let pattern that directly matches Call { function: Identifier
{ name }, .. } and then compare name against the dangerous identifiers ("eval",
"exec", "compile", "getattr", "open"); keep the same return true branch and
surrounding logic in is_dangerous_expr.
In `@tools/git_tool.py`:
- Around line 89-96: The code builds authed_url and calls git remote set-url via
_run_subprocess which exposes the GitHub token in process args and .git/config
and leaves it if a crash occurs; replace this flow by using a credential helper
instead of embedding the token (e.g., invoke GitHub CLI or configure Git
Credential Manager) and remove any constructions of authed_url with the token;
update the logic in the method that calls authed_url and _run_subprocess to
authenticate via gh auth login or git config credential.helper (manager-core)
before pushing, and ensure any temporary remote changes are never written with
plain tokens and any existing remote URL restoration is done inside a
try/finally to guarantee cleanup if an error occurs.
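One way to keep the token out of both process arguments and .git/config is a one-shot GIT_ASKPASS helper, which is compatible with the try/finally cleanup the comment asks for. This is a hedged sketch, not the repository's actual GitTool API: `push_with_token` is a hypothetical helper, and the real `subprocess.run` call is left commented out so the sketch stays side-effect free.

```python
import os
import subprocess  # used by the commented-out real push call
import tempfile

def push_with_token(repo_path: str, remote: str, branch: str, token: str) -> list[str]:
    """Sketch: authenticate a push via GIT_ASKPASS so the token never
    appears in command-line args or .git/config (hypothetical helper)."""
    # One-shot askpass script that echoes the token from the environment.
    askpass = tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False)
    askpass.write('#!/bin/sh\necho "$GIT_TOKEN"\n')
    askpass.close()
    os.chmod(askpass.name, 0o700)
    env = {**os.environ, "GIT_ASKPASS": askpass.name, "GIT_TOKEN": token}
    cmd = ["git", "push", remote, branch]  # remote URL stays token-free
    try:
        # subprocess.run(cmd, cwd=repo_path, env=env, check=True)  # real call
        return cmd
    finally:
        os.unlink(askpass.name)  # cleanup runs even if the push fails

args = push_with_token(".", "origin", "main", "ghp_example")
assert "ghp_example" not in " ".join(args)
```

gh auth login or Git Credential Manager (as the comment recommends) achieve the same property with less custom code; the sketch only illustrates why the token must live in the environment rather than in the remote URL.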
---
Outside diff comments:
In `@dashboard/auth.ts`:
- Around line 10-12: When running in production, do not fall back to mock OAuth
credentials or a known default secret: replace the console.warn block that
checks process.env.NODE_ENV === 'production' && (!googleId || !googleSecret ||
!nextAuthSecret) with a hard failure (throw an Error or call process.exit(1)) so
the app refuses to start if googleId, googleSecret or nextAuthSecret are
missing; additionally, ensure the code path that assigns mock OAuth credentials
and the hardcoded `secret` (the fallback at the place where mock values/default
secret are set) is only reachable when NODE_ENV !== 'production' so no
mocks/default secret are used in production.
In `@docker-compose.yml`:
- Around line 488-503: The grafana service is missing the profiles key so it
won't start with `--profile obs`; add a profiles entry for the grafana service
(e.g., profiles: ["obs"]) in the docker-compose definition for the grafana
service to match prometheus and jaeger, ensuring the service name "grafana"
includes the profiles field so `make start-obs` brings up Grafana as part of the
observability stack.
- Around line 150-165: The container memory limit
(deploy.resources.limits.memory = 128M) is lower than the Redis maxmemory flag
(--maxmemory 256mb) and will cause OOM kills; fix by aligning limits: either
increase the container memory limit to exceed Redis maxmemory plus overhead
(e.g., set deploy.resources.limits.memory to a value >256M) or reduce the Redis
startup flag (--maxmemory) to a value below the current container limit; update
the redis-server command or the deploy.resources.limits.memory entry accordingly
so they match.
In `@go-backend/internal/shared/auth/auth.go`:
- Around line 154-169: The Google userinfo HTTP response handling around
client.Get("https://www.googleapis.com/oauth2/v3/userinfo"), parseJSON, and the
returned info.Sub/info.Email/info.Name must validate resp.StatusCode and
required fields: check resp.StatusCode == 200 (or 2xx) and return a wrapped
error including status and optionally body if not OK, and after decoding ensure
info.Sub and info.Email are non-empty (and Name if required) before returning;
if any are missing return a clear error (e.g., "auth: invalid userinfo: missing
sub/email") so downstream JWT/database code cannot accept incomplete identities.
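The validation being requested can be sketched outside Go; this Python model (function name and dict shape are illustrative assumptions, the real code operates on an HTTP response and a decoded struct) shows the two gates, status check first, then required-field check:

```python
def validate_userinfo(status_code: int, info: dict) -> dict:
    """Sketch of the requested validation: reject non-2xx responses and
    incomplete identities before any JWT/database code can see them."""
    if not 200 <= status_code < 300:
        raise ValueError(f"auth: userinfo request failed with status {status_code}")
    if not info.get("sub") or not info.get("email"):
        raise ValueError("auth: invalid userinfo: missing sub/email")
    return info

ok = validate_userinfo(200, {"sub": "123", "email": "a@b.c", "name": "A"})
assert ok["sub"] == "123"
try:
    validate_userinfo(200, {"email": "a@b.c"})
except ValueError as e:
    assert "missing sub/email" in str(e)
```

Ordering matters: a non-OK response body is often an error document, so decoding and field checks should only run after the status gate passes.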
---
Nitpick comments:
In @.github/workflows/deploy-oracle.yml:
- Around line 40-47: Replace the fixed sleep 45 with a bounded readiness retry
loop that polls the same health endpoints (the curl checks currently used for
http://localhost:8080/health and http://localhost:9091/healthz) until they
become healthy or a timeout/maximum attempts is reached; implement a short
interval/backoff between attempts, log each retry (the existing echo lines can
be reused), run docker compose ps between attempts if helpful, and exit non‑zero
if the max wait time is exceeded so the workflow fails fast instead of relying
on a single fixed sleep.
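The bounded retry loop described above would be implemented in shell inside the workflow, but its shape is easy to pin down with a short model. In this Python sketch the names (`wait_healthy`, `check`) are illustrative; the point is the capped exponential backoff and the hard failure once the attempt budget is exhausted:

```python
import time

def wait_healthy(check, attempts: int = 30, base_delay: float = 1.0,
                 max_delay: float = 10.0) -> bool:
    """Poll check() until it returns True, with capped exponential
    backoff; return False once the attempt budget is exhausted."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if check():
            return True
        print(f"attempt {attempt}/{attempts} not healthy; retrying in {delay:.1f}s")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, but cap the wait
    return False

# Simulate a service that becomes healthy on the third probe.
probes = iter([False, False, True])
assert wait_healthy(lambda: next(probes), attempts=5, base_delay=0.0) is True
```

In the workflow, `check` would be the existing curl probes against /health and /healthz, and a False result maps to `exit 1` so the job fails fast.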
In `@dashboard/.npmrc`:
- Line 1: Replace the unconditional legacy-peer-deps flag with a documented
decision: either resolve the conflicting peer deps in package.json (bump or pin
versions) or keep legacy-peer-deps=true but add an explanatory comment in this
repository (e.g., in .npmrc and/or a short note in README/CONTRIBUTING) that
lists the offending packages (for example next-auth v5 beta), why the workaround
is necessary, and a TODO to remove the flag once stable releases arrive; if you
can, prefer fixing the dependency versions in package.json (or adding
overrides/resolutions) so the flag can be removed entirely.
In `@docker-compose.prod.yml`:
- Around line 16-21: The production compose override leaves the api-gateway
CORS_ORIGINS unset-risky; update the api-gateway environment entry for
CORS_ORIGINS to use Docker Compose variable validation (e.g., the
${CORS_ORIGINS:?...} form) so the process fails fast with a clear error if
CORS_ORIGINS is not provided; modify the api-gateway environment key referencing
CORS_ORIGINS to use the `:?` expansion (include a short error message) instead
of the bare `${CORS_ORIGINS}` to ensure deployment breaks loudly when the
variable is missing.
In `@go-backend/internal/gateway/middleware/middleware_production_test.go`:
- Around line 86-87: Replace the manual environment management using
os.Setenv("LOCAL_USER_EMAIL", ...) and defer os.Unsetenv(...) in the test with
t.Setenv("LOCAL_USER_EMAIL", "dev@autonomousorg.ai") so the testing framework
handles cleanup automatically; remove the //nolint:errcheck annotations and the
defer call, and update the test function (where LOCAL_USER_EMAIL is set) to use
t.Setenv instead.
In `@go-backend/internal/orchestrator/server/saga.go`:
- Around line 36-41: Remove the now-dead in-memory mutex and its import: delete
the mu field from the SagaCoordinator struct (the declaration "mu sync.RWMutex")
and remove the unused "sync" import from the top of the file; verify there are
no remaining references to SagaCoordinator.mu elsewhere (search for ".mu" or
"mu.RLock"/"mu.Lock") and run tests/build to confirm compilation after cleaning
up the unused symbol.
In `@Makefile`:
- Around line 58-77: The new Make targets start-infra, start-bus, start-core,
start-agents, and start-obs are command targets and must be added to the .PHONY
declaration; update the existing .PHONY line to include these symbols
(start-infra, start-bus, start-core, start-agents, start-obs) so Make treats
them as phony targets rather than filenames.
In `@moe-scoring/src/scorer.rs`:
- Line 176: Imports in scorer.rs are out of conventional order; move the std
import before crate and super. Specifically, place use
std::collections::HashMap; above use crate::models::{Expert, ExpertStats,
Strategy}; and ensure use super::*; comes last so the import order is: std
(HashMap), crate (models), then super::*;.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc77696f-2a5a-4e4f-b1f6-5c42ba6a99bb
⛔ Files ignored due to path filters (1)
dashboard/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (43)
.env.example, .env.prod.example, .github/workflows/deploy-oracle.yml, .github/workflows/python-ci.yml, DEPLOY.md, Makefile, agents/base_agent.py, agents/devops_agent.py, check_services.sh, dashboard/.npmrc, dashboard/Dockerfile, dashboard/app/api/auth/[...nextauth]/route.ts, dashboard/app/dashboard/page.tsx, dashboard/auth.ts, dashboard/next.config.js, dashboard/package.json, dashboard/types.d.ts, docker-compose.prod.yml, docker-compose.yml, go-backend/cmd/gateway/main.go, go-backend/cmd/orchestrator/main.go, go-backend/deploy/dockerfiles/Dockerfile.gateway, go-backend/deploy/dockerfiles/Dockerfile.orchestrator, go-backend/deploy/dockerfiles/Dockerfile.proxy, go-backend/deploy/dockerfiles/Dockerfile.ws-hub, go-backend/internal/gateway/middleware/middleware.go, go-backend/internal/gateway/middleware/middleware_production_test.go, go-backend/internal/gateway/middleware/middleware_test.go, go-backend/internal/orchestrator/server/results.go, go-backend/internal/orchestrator/server/saga.go, go-backend/internal/proxy/egress.go, go-backend/internal/shared/auth/auth.go, go-backend/internal/shared/config/config.go, go-backend/internal/shared/config/config_test.go, moe-scoring/src/main.rs, moe-scoring/src/models.rs, moe-scoring/src/scorer.rs, orchestrator/memory/artifacts_store.py, scratch/verify_fixes.py, security-check/src/main.rs, security-check/src/scrubber.rs, security-check/src/validator.rs, tools/git_tool.py
💤 Files with no reviewable changes (2)
- moe-scoring/src/main.rs
- moe-scoring/src/models.rs
runs-on: ubuntu-latest
environment:
  name: production
  url: https://${{ secrets.ORACLE_DOMAIN }}
🧩 Analysis chain
🌐 Web query:
Is the secrets context allowed in GitHub Actions jobs.<job_id>.environment.url expressions? If not, which contexts are supported there?
💡 Result:
No, the secrets context is not allowed in GitHub Actions jobs.<job_id>.environment.url expressions. The supported contexts are: github, needs, strategy, matrix, job, runner, env, vars, steps, inputs.
Citations:
- 1: https://docs.github.com/en/actions/reference/workflows-and-actions/contexts
- 2: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/contexts
- 3: https://docs.github.com/en/actions/reference/accessing-contextual-information-about-workflow-runs
Replace secrets context with vars in environment.url.
The secrets context is not supported in jobs.<job_id>.environment.url expressions. Only these contexts are allowed: github, needs, strategy, matrix, job, runner, env, vars, steps, inputs. Using secrets will cause workflow validation to fail.
Proposed change
 environment:
   name: production
-  url: https://${{ secrets.ORACLE_DOMAIN }}
+  url: https://${{ vars.ORACLE_DOMAIN }}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
environment:
  name: production
  url: https://${{ vars.ORACLE_DOMAIN }}
🧰 Tools
🪛 actionlint (1.7.12)
[error] 16-16: context "secrets" is not allowed here. available contexts are "env", "github", "inputs", "job", "matrix", "needs", "runner", "steps", "strategy", "vars". see https://docs.github.com/en/actions/learn-github-actions/contexts#context-availability for more details
(expression)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/deploy-oracle.yml at line 16, The environment.url
expression uses the unsupported secrets context; replace uses of
secrets.ORACLE_DOMAIN with vars.ORACLE_DOMAIN in the environment.url assignment
(i.e., change url: https://${{ secrets.ORACLE_DOMAIN }} to url: https://${{
vars.ORACLE_DOMAIN }}), and ensure ORACLE_DOMAIN is declared under the
workflow-level vars or passed into the job via vars so that the
vars.ORACLE_DOMAIN symbol resolves at runtime.
budget_key = f"budget:{self._current_project_id or 'global'}:used"
used_str = await self.redis_client.get(budget_key)
used = float(used_str) if used_str else 0.0
Budget gate is still race-prone under concurrent calls.
Current GET → compare → later INCRBYFLOAT is not atomic. Multiple concurrent requests in the same project can all pass the check and exceed the limit.
Suggested direction (atomic check+increment in Redis)
- used_str = await self.redis_client.get(budget_key)
- used = float(used_str) if used_str else 0.0
- if used >= self.budget_limit:
- raise RuntimeError(...)
+ # Lua: atomically enforce limit and increment
+ script = """
+ local used = tonumber(redis.call('GET', KEYS[1]) or '0')
+ local limit = tonumber(ARGV[1])
+ local delta = tonumber(ARGV[2])
+ if used + delta > limit then
+ return -1
+ end
+ return redis.call('INCRBYFLOAT', KEYS[1], delta)
+ """
+ new_used = await self.redis_client.eval(script, 1, budget_key, self.budget_limit, 0.005)
+ if float(new_used) == -1:
+ raise RuntimeError(f"Budget exceeded! Limit: ${self.budget_limit:.2f}")

Also applies to: 374-375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/base_agent.py` around lines 350 - 352, The current budget check in
agents/base_agent.py (using budget_key, self._current_project_id and
self.redis_client) is race-prone because it does separate GET → compare →
INCRBYFLOAT; replace both occurrences (around the shown block and the similar
374-375 block) with an atomic Redis operation — implement a small Redis Lua
script (EVAL/EVALSHA) that reads the current value, compares against the limit,
and if under the limit increments by the requested amount and returns success
(or returns failure without incrementing); call that script via
self.redis_client and handle the success/failure result to enforce the budget
atomically.
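The semantics of the proposed Lua script can be modeled in pure Python to make the check-then-increment contract concrete. This is only a model: in Redis the atomicity comes from EVAL running the whole script as a single operation, which a dict stand-in cannot reproduce.

```python
def try_spend(store: dict, key: str, limit: float, delta: float) -> float:
    """Model of the Lua script's contract: check the budget and
    increment only if the new total stays within the limit.
    Returns the new total, or -1.0 when the spend would exceed it."""
    used = float(store.get(key, 0.0))
    if used + delta > limit:
        return -1.0        # refused: nothing is incremented
    store[key] = used + delta
    return store[key]

store = {}
assert try_spend(store, "budget:global:used", 0.01, 0.005) == 0.005
assert try_spend(store, "budget:global:used", 0.01, 0.005) == 0.01
assert try_spend(store, "budget:global:used", 0.01, 0.005) == -1.0
```

The refusal path returning a sentinel without incrementing is what closes the race: with GET/compare/INCRBYFLOAT as separate commands, two callers can both observe the old total and both increment past the limit.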
}

# ── Container name prefix ────────────────────────────────────────────────────
PREFIX="autonomousmulti-agentaiorganization-"
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
# Verify actual container names and compare with configured prefix.
docker compose ps --format '{{.Name}}'
echo "Configured prefix: autonomousmulti-agentaiorganization-"

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 213
🏁 Script executed:
# First, find and examine the check_services.sh file
find . -name "check_services.sh" -type f | head -20

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 115
🏁 Script executed:
# Look for docker-compose files
find . -name "docker-compose*" -type f | head -20

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 319
🏁 Script executed:
# Also check for any compose directory or files
git ls-files | grep -i compose | head -20

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 305
🏁 Script executed:
# Read check_services.sh to understand the context and usage of PREFIX
cat -n ./check_services.sh

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 6632
🏁 Script executed:
# Read the main docker-compose.yml to see how project name and services are configured
cat -n ./docker-compose.yml

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 18768
🏁 Script executed:
# Check the repository directory name and how Docker Compose would normalize it
pwd | xargs basename

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 99
🏁 Script executed:
# Also check if there's a .git config or similar that shows the actual project context
ls -la | head -20

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1333
🏁 Script executed:
# Verify Docker Compose's container naming convention by checking the actual working directory
git rev-parse --show-toplevel

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 114
🏁 Script executed:
# Docker Compose normalizes project names by taking the directory name, converting to lowercase
# and replacing non-alphanumeric characters with empty string (by default)
# Let's check what the actual directory would be called and verify the normalization
# For the repo "Autonomous-Multi-Agent-AI-Organization", the normalized name would be:
python3 << 'EOF'
import re
# The actual repo directory name (from GitHub)
repo_dir = "Autonomous-Multi-Agent-AI-Organization"
# Docker Compose's default normalization (converts to lowercase, replaces non-alphanumeric)
# However, recent versions preserve hyphens
normalized_default = repo_dir.lower() # Keep hyphens by default
print(f"Original directory: {repo_dir}")
print(f"Docker Compose normalized (with hyphens): {normalized_default}")
print(f"Hardcoded prefix in script: autonomousmulti-agentaiorganization-")
print(f"Expected container name: {normalized_default}-postgres-1")
print(f"Script looking for: autonomousmulti-agentaiorganization-postgres-1")
print(f"\nMismatch: {'YES - hyphens are missing in hardcoded prefix' if normalized_default != 'autonomousmulti-agentaiorganization' else 'NO'}")
EOF

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 499
Container prefix is hardcoded incorrectly and will cause all checks to fail.
Line 58 hardcodes the prefix as autonomousmulti-agentaiorganization- (without hyphens), but Docker Compose normalizes the directory name Autonomous-Multi-Agent-AI-Organization to autonomous-multi-agent-ai-organization (with hyphens preserved). This causes every container lookup to fail with (not found).
Recommended fix
-PREFIX="autonomousmulti-agentaiorganization-"
+PROJECT="${COMPOSE_PROJECT_NAME:-$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_-]//g')}"
+PREFIX="${PROJECT}-"

📝 Committable suggestion
PROJECT="${COMPOSE_PROJECT_NAME:-$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_-]//g')}"
PREFIX="${PROJECT}-"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@check_services.sh` at line 58, The hardcoded PREFIX variable
("autonomousmulti-agentaiorganization-") is incorrect; update the script so
PREFIX is derived from the current project directory name and normalized the
same way Docker Compose does (lowercase and non-alphanumeric characters turned
into hyphens) then append the trailing hyphen; specifically change how PREFIX is
set in check_services.sh (the PREFIX variable) to compute basename="$(basename
"$(pwd)")" lowercased and non-alphanumerics replaced with "-" so the resulting
prefix matches "autonomous-multi-agent-ai-organization-".
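The normalization the fix relies on can be sanity-checked with a short model. This Python sketch (function name is illustrative) mirrors the shell pipeline in the suggested fix, approximating Docker Compose's default project-name normalization:

```python
import re

def compose_project_name(directory: str) -> str:
    """Approximate Docker Compose's default project-name normalization:
    lowercase, keep [a-z0-9_-], drop everything else (recent Compose
    versions preserve hyphens). Mirrors the shell pipeline
    basename | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_-]//g'."""
    return re.sub(r"[^a-z0-9_-]", "", directory.lower())

name = compose_project_name("Autonomous-Multi-Agent-AI-Organization")
assert name == "autonomous-multi-agent-ai-organization"
assert f"{name}-postgres-1" == "autonomous-multi-agent-ai-organization-postgres-1"
```

Deriving the prefix this way (or honoring COMPOSE_PROJECT_NAME when set) keeps the script correct even if the checkout directory is renamed.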
COPY package.json .npmrc ./
# Use npm install with legacy-peer-deps to handle React 19 peer dep mismatches
RUN npm install --legacy-peer-deps
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify lockfile presence and how Dockerfile installs deps
fd -HI '^package-lock\.json$' dashboard
rg -n 'COPY .*package|npm (install|ci)' dashboard/Dockerfile
# Inspect resolved versions that would be used if lockfile exists
python - <<'PY'
import json, pathlib
lock = pathlib.Path("dashboard/package-lock.json")
if not lock.exists():
print("package-lock.json is missing under dashboard/")
else:
data = json.loads(lock.read_text())
print("lockfileVersion:", data.get("lockfileVersion"))
root = data.get("packages", {}).get("", {})
print("root deps keys:", sorted(list(root.get("dependencies", {}).keys()))[:10])
PY

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 432
Include package-lock.json in Docker build to ensure reproducible installs.
The package-lock.json exists in the repository but is not copied into the Docker image (line 14 only copies package.json and .npmrc). Without the lockfile, npm install will resolve dependencies fresh on each build, creating non-deterministic dependency graphs even though a lockfile is available. Consider copying the lockfile and using npm ci instead of npm install for deterministic builds.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dashboard/Dockerfile` around lines 14 - 16, The Dockerfile currently copies
only package.json and .npmrc (COPY package.json .npmrc ./) and runs RUN npm
install --legacy-peer-deps; update the Dockerfile to also copy package-lock.json
(so the lockfile is present in the image) and replace the install step with a
deterministic install command such as RUN npm ci --legacy-peer-deps to ensure
reproducible builds.
"@types/react": "^19.0.0",
"@types/react-dom": "^19.0.0",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
pkg = json.load(open("dashboard/package.json"))
lock = json.load(open("dashboard/package-lock.json"))
decl = pkg.get("devDependencies", {})
print("Declared:")
print(" typescript =", decl.get("typescript"))
print(" `@types/react` =", decl.get("@types/react"))
print(" `@types/react-dom` =", decl.get("@types/react-dom"))
root = lock.get("packages", {}).get("", {})
root_deps = {}
root_deps.update(root.get("dependencies", {}))
root_deps.update(root.get("devDependencies", {}))
print("\nLockfile root references:")
print(" typescript =", root_deps.get("typescript"))
print(" `@types/react` =", root_deps.get("@types/react"))
print(" `@types/react-dom` =", root_deps.get("@types/react-dom"))
pkgs = lock.get("packages", {})
print("\nResolved node_modules versions:")
print(" node_modules/typescript =", pkgs.get("node_modules/typescript", {}).get("version"))
print(" node_modules/@types/react =", pkgs.get("node_modules/@types/react", {}).get("version"))
print(" node_modules/@types/react-dom =", pkgs.get("node_modules/@types/react-dom", {}).get("version"))
PY

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 430
Resolve manifest ↔ lockfile version drift for TypeScript and @types/react.
package.json declares typescript@^5.4.5 and @types/react@^19.0.0, but package-lock.json resolves to typescript@6.0.2 and @types/react@18.3.28. This mismatch breaks build reproducibility and masks type compatibility issues. Update package.json to match locked versions or regenerate the lockfile.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dashboard/package.json` around lines 29 - 30, package.json declares
TypeScript and `@types/react` ranges that differ from the resolved versions in
package-lock.json (typescript@^5.4.5 vs typescript@6.0.2, and
`@types/react`@^19.0.0 vs `@types/react`@18.3.28); to fix, either update the
"typescript" and "@types/react" entries in package.json to the exact versions
found in package-lock.json, or regenerate the lockfile with the package manager
(npm install / npm ci after adjusting the ranges) so package-lock.json and
package.json are consistent, then commit the updated lockfile.
import asyncio
import os
import sys
from unittest.mock import MagicMock, AsyncMock
Fix linting errors to unblock the CI pipeline.
The pipeline is failing due to multiple ruff violations. The imports need sorting, and there are whitespace issues throughout the file.
🔧 Proposed fix for imports
-import asyncio
-import os
-import sys
-from unittest.mock import MagicMock, AsyncMock
+import asyncio
+import os
+import sys
+
+from unittest.mock import AsyncMock, MagicMock

📝 Committable suggestion
import asyncio
import os
import sys

from unittest.mock import AsyncMock, MagicMock
🧰 Tools
🪛 GitHub Actions: AI Organization — CI/CD Pipeline
[error] 1-1: ruff check failed (I001): Import block is un-sorted or un-formatted.
🪛 GitHub Actions: Python CI
[error] 1-4: ruff check . reported I001: Import block is un-sorted or un-formatted. Organize imports.
🪛 GitHub Check: 🧪 Test & Lint
[failure] 1-4: ruff (I001)
scratch/verify_fixes.py:1:1: I001 Import block is un-sorted or un-formatted
help: Organize imports
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scratch/verify_fixes.py` around lines 1 - 4, The file-level imports (asyncio,
os, sys, and from unittest.mock import MagicMock, AsyncMock) are unsorted and
there are whitespace/formatting issues causing ruff violations; reorder and
group imports into standard-library then third-party/local (e.g., put asyncio,
os, sys first, then the unittest.mock import), remove any unused imports, ensure
single blank line between import groups and one newline at end of file, and fix
any extra trailing/leading whitespace so the module-level symbols (asyncio, os,
sys, MagicMock, AsyncMock) comply with ruff/isort rules.
class MockAgent(BaseAgent):
    @property
    def system_prompt(self):
        return "You are a test agent."

    async def run(self, **kwargs):
        return {"status": "success"}
Remove whitespace from blank lines.
Multiple blank lines contain trailing whitespace (lines 15, 24, 27, 31, 39, etc.) causing ruff W293 failures. Run ruff format scratch/verify_fixes.py or configure your editor to strip trailing whitespace on save.
🧰 Tools
🪛 GitHub Actions: Python CI
[error] 15-15: ruff check . reported W293: Blank line contains whitespace. Remove whitespace from blank line.
🪛 GitHub Check: 🧪 Test & Lint
[failure] 15-15: ruff (W293)
scratch/verify_fixes.py:15:1: W293 Blank line contains whitespace
help: Remove whitespace from blank line
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scratch/verify_fixes.py` around lines 11 - 18, The file has trailing
whitespace on multiple blank lines causing ruff W293 failures; open the class
MockAgent (methods system_prompt and run) and remove trailing spaces from empty
lines (or run `ruff format scratch/verify_fixes.py`) so no blank line ends with
whitespace; optionally enable your editor's "trim trailing whitespace on save"
to prevent recurrence.
try:
    await asyncio.wait_for(agent.suspend_for_approval("test_action"), timeout=0.1)
except asyncio.TimeoutError:
    pass
Use contextlib.suppress(TimeoutError) instead of try-except-pass.
Per pipeline failures, replace the try-except-pass pattern and use the builtin TimeoutError (Python 3.11+).
🔧 Proposed fix
Add import at the top:
import contextlib

Then replace the try-except block:
- try:
- await asyncio.wait_for(agent.suspend_for_approval("test_action"), timeout=0.1)
- except asyncio.TimeoutError:
- pass
+ with contextlib.suppress(TimeoutError):
+ await asyncio.wait_for(agent.suspend_for_approval("test_action"), timeout=0.1)

🧰 Tools
🪛 GitHub Actions: Python CI
[error] 40-43: ruff check . reported SIM105: Use contextlib.suppress(asyncio.TimeoutError) instead of try-except-pass.
[error] 42-42: ruff check . reported UP041: Replace aliased errors with builtin TimeoutError (use TimeoutError instead of asyncio.TimeoutError).
🪛 GitHub Check: 🧪 Test & Lint
[failure] 42-42: ruff (UP041)
scratch/verify_fixes.py:42:12: UP041 Replace aliased errors with TimeoutError
help: Replace asyncio.TimeoutError with builtin TimeoutError
[failure] 40-43: ruff (SIM105)
scratch/verify_fixes.py:40:5: SIM105 Use contextlib.suppress(asyncio.TimeoutError) instead of try-except-pass
help: Replace try-except-pass with with contextlib.suppress(asyncio.TimeoutError): ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scratch/verify_fixes.py` around lines 40 - 43, Replace the try/except-pass
that awaits asyncio.wait_for(agent.suspend_for_approval("test_action"),
timeout=0.1) with contextlib.suppress(TimeoutError): to suppress the builtin
TimeoutError (Python 3.11+) instead of catching asyncio.TimeoutError; add
"import contextlib" at the top and call
asyncio.wait_for(agent.suspend_for_approval("test_action"), timeout=0.1) inside
the contextlib.suppress block so the timeout is ignored without an empty except.
authed_url = url.replace(
    "https://github.com/", f"https://{token}@github.com/"
)
# Temporarily update remote URL for push
await self._run_subprocess(["git", "remote", "set-url", remote, authed_url], cwd=self.repo_path)
res = await self._run_subprocess(["git", "push", remote, branch], cwd=self.repo_path)
await self._run_subprocess(
    ["git", "remote", "set-url", remote, authed_url],
    cwd=self.repo_path,
)
🧩 Analysis chain
🌐 Web query:
What is the recommended secure way to authenticate git push operations using GitHub personal access tokens without exposing the token in command-line arguments or git config files?
💡 Result:
The recommended secure way to authenticate git push operations using GitHub personal access tokens (PATs), without exposing the token in command-line arguments or git config files, is to use Git Credential Manager (GCM) or GitHub CLI (gh). These tools store credentials securely in the system's credential store (e.g., Windows Credential Manager, macOS Keychain, or Linux keyring) and handle authentication automatically for HTTPS remotes. GitHub's official documentation explicitly recommends GitHub CLI or Git Credential Manager over manually creating and storing PATs for command-line access.
Primary methods:
1. GitHub CLI (simplest, handles PAT generation and storage): install GitHub CLI (https://cli.github.com), run gh auth login, select the HTTPS protocol, and confirm Git credential setup (Y). The CLI generates and stores a PAT securely; subsequent git push works without token exposure.
2. Git Credential Manager (GCM, cross-platform): install GCM (https://github.com/git-ecosystem/git-credential-manager) and configure git config --global credential.helper manager-core (auto-configured on some platforms). On the first git push, the browser authenticates (supports 2FA) and GCM stores the token securely; no manual PAT creation is needed.
Avoid: git config --global credential.helper store (plaintext file ~/.git-credentials), command-line args like git push https://token@github.com/... (visible in process lists/history), and plaintext tokens in scripts/configs.
For manual PAT use (if avoiding CLI/GCM), perform the first push interactively (enter the PAT as the password when prompted), then rely on OS-native helpers like git config --global credential.helper osxkeychain (macOS); GitHub prioritizes GCM/CLI. Use HTTPS remotes (convert SSH if needed: git remote set-url origin https://github.com/user/repo.git). This ensures tokens never appear in args, configs, or history.
Citations:
- 1: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
- 2: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-authentication-to-github
- 3: https://docs.github.com/en/rest/authentication/keeping-your-api-credentials-secure?apiVersion=2026-03-10
- 4: https://docs.github.com/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
- 5: https://oneuptime.com/blog/post/2026-01-24-git-credential-storage/view
- 6: https://docs.github.com/en/github/using-git/caching-your-github-credentials-in-git
- 7: https://docs.github.com/get-started/getting-started-with-git/caching-your-github-credentials-in-git
- 8: https://docs.github.com/en/github/getting-started-with-github/caching-your-github-credentials-in-git
- 9: https://docs.github.com/en/github/getting-started-with-github/getting-started-with-git/caching-your-github-credentials-in-git
- 10: https://docs.github.com/en/github/using-git/caching-your-github-password-in-git
🏁 Script executed:
fd -t f "git_tool.py" -x cat -n {}
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 6608
🏁 Script executed:
fd -t f "base_tool.py" -x cat -n {}
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 7495
Critical security issue: GitHub token exposed in process listings and git configuration.
The current implementation embeds the token directly in the URL passed to git remote set-url, which creates multiple credential exposure vectors:
- Process listing visibility: the token appears in command-line arguments visible via `ps`, `top`, or `/proc/<pid>/cmdline` — exposing it to process monitoring, system tools, and other users.
- Git config exposure: the token is temporarily written to `.git/config`, readable by any process with filesystem access.
- Leak persistence: no error handling between lines 93 and 101. If the process crashes after `set-url` but before the restore, the token remains in git configuration indefinitely.
This approach contradicts GitHub's official security guidance. Replace it with one of the officially recommended credential management tools:
- GitHub CLI (`gh`): generates and stores credentials securely in the system credential store. Run `gh auth login` once; subsequent git operations authenticate automatically without token exposure.
- Git Credential Manager (GCM): cross-platform solution that stores credentials securely (Windows Credential Manager, macOS Keychain, or Linux keyring). Install and configure with `git config --global credential.helper manager-core`.
Both approaches eliminate token exposure in process arguments, git config files, and command history while relying on OS-native secure storage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/git_tool.py` around lines 89 - 96, The code builds authed_url and calls
git remote set-url via _run_subprocess which exposes the GitHub token in process
args and .git/config and leaves it if a crash occurs; replace this flow by using
a credential helper instead of embedding the token (e.g., invoke GitHub CLI or
configure Git Credential Manager) and remove any constructions of authed_url
with the token; update the logic in the method that calls authed_url and
_run_subprocess to authenticate via gh auth login or git config
credential.helper (manager-core) before pushing, and ensure any temporary remote
changes are never written with plain tokens and any existing remote URL
restoration is done inside a try/finally to guarantee cleanup if an error
occurs.
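The try/finally cleanup the prompt above asks for can be sketched as follows — a minimal illustration under stated assumptions, not the actual `git_tool.py` code; `run` stands in for whatever subprocess helper the tool uses, and the function name is hypothetical:

```python
def push_with_restore(run, original_url: str, authed_url: str) -> None:
    """Temporarily swap the remote URL, guaranteeing restoration.

    The try/finally ensures the token-bearing URL never persists in
    .git/config, even if the push itself raises. (Illustrative sketch;
    the preferred fix remains a credential helper with no token in the
    URL at all.)
    """
    run(["git", "remote", "set-url", "origin", authed_url])
    try:
        run(["git", "push", "origin", "HEAD"])
    finally:
        # Always restore the clean URL, even on failure.
        run(["git", "remote", "set-url", "origin", original_url])
```

Even when the push fails mid-flight, the finally block rewrites the remote back to the token-free URL, closing the "leak persistence" window the review describes.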
aad39c1 to
b3cc065
Codecov Report
❌ Patch coverage is
82cc218 to
ba2ec15
Resolves PII leak, persists saga state, adds project-scoped budgeting, and implements Docker Profiles for modular startup.
ba2ec15 to
dec569d
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docker-compose.yml (3)
147-167: ⚠️ Potential issue | 🔴 Critical

Redis `--maxmemory 256mb` exceeds the 128M container limit — guaranteed OOMKill under load.

Redis is configured with `--maxmemory 256mb` (Line 151) but `deploy.resources.limits.memory` is set to `128M` (Line 165). Redis's `maxmemory` is enforced in userspace based on its own accounting, so Redis will happily allocate up to ~256MB before its LRU eviction kicks in — long after the cgroup has OOM-killed the container. Add AOF replay on restart and you risk data loss too. Either lower `--maxmemory` below the cgroup limit (e.g. `96mb`) or raise the container limit to ≥ 320M to leave room for Redis overhead (buffers, replication backlog, COW during BGSAVE).

🔧 Proposed fix

```diff
   command: >
-    redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --save 60 1 --appendonly yes
+    redis-server --maxmemory 96mb --maxmemory-policy allkeys-lru --save 60 1 --appendonly yes
   ...
   deploy:
     resources:
       limits:
         memory: 128M
```

Or raise the limit to ~320M if you need the full 256MB working set.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 147 - 167, The redis service has a mismatch: the command uses "--maxmemory 256mb" while deploy.resources.limits.memory is "128M", which will cause OOMs; either lower Redis's maxmemory to below the cgroup limit (e.g., set "--maxmemory 96mb" in the redis service command) or raise deploy.resources.limits.memory to at least ~320M to allow 256MB plus Redis overhead; update the redis service block (the command string and/or deploy.resources.limits.memory) and ensure the chosen change is consistent with appendonly/AOF considerations before committing.
488-503: ⚠️ Potential issue | 🟠 Major

`grafana` is missing a `profiles:` entry and a memory limit — inconsistent with the rest of `obs`.

`prometheus` and `jaeger` are gated on `profiles: ["obs"]`, but `grafana` has no profile. Consequences:
- `make start-obs` will work for prom/jaeger, but grafana will also start under any other profile-activated `docker compose up`, while its `depends_on: prometheus` points to a service that may not be active → Compose will fail or pull in `prometheus` implicitly despite the operator not asking for `obs`.
- It's the only service in this file lacking `deploy.resources.limits.memory`, contradicting the PR's goal of "Applied memory limits to all services."

🔧 Proposed fix

```diff
   grafana:
     image: grafana/grafana:10.3.1
     ...
     depends_on:
       - prometheus
     networks:
       - ai-org-net
+    profiles: ["obs"]
+    deploy:
+      resources:
+        limits:
+          memory: 256M
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 488 - 503, The grafana service lacks a profiles entry and a memory limit; add profiles: ["obs"] to the grafana service block and add deploy.resources.limits.memory (matching the pattern used for other obs services) so it only starts with the obs profile and enforces the same memory cap; ensure depends_on still references prometheus and jaeger consistently with the other obs services.
257-342: ⚠️ Potential issue | 🟡 Minor

128 MB limit for Go services is aggressive — verify under load before production.

`orchestrator`, `ws-hub`, and `api-gateway` are each capped at 128 MB. Go's GC works against a soft target but will happily exceed it under burst traffic (especially with Kafka consumer batching and gRPC connection pooling), so a hard cgroup cap of 128 MB tends to trip OOM-kills on spikes. Consider setting `GOMEMLIMIT` (e.g. `GOMEMLIMIT=100MiB`) alongside the cgroup cap, or bump to 256 MB for request-path services, and monitor `container_memory_working_set_bytes` during load tests before locking these in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 257 - 342, The 128M cgroup cap is too low for the Go services (orchestrator, ws-hub, api-gateway) and will likely cause OOMs under spikes; update the docker-compose service definitions for orchestrator, ws-hub, and api-gateway to either increase the deploy.resources.limits.memory to 256M or add a GOMEMLIMIT environment variable (e.g. GOMEMLIMIT=100MiB) alongside the existing environment block to keep the Go runtime within the cgroup, and ensure you run load tests and monitor container_memory_working_set_bytes before finalizing the cap.
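If the `GOMEMLIMIT` route is taken, the shape of the change could look like this — an illustrative compose fragment, not the file's actual contents; the 100MiB value is a starting point to tune under load, not a recommendation:

```yaml
  orchestrator:
    environment:
      # Soft limit for the Go runtime: GC pressure rises well before
      # the kernel OOM-kills the container at the hard 128M cgroup cap.
      GOMEMLIMIT: 100MiB
    deploy:
      resources:
        limits:
          memory: 128M
```

The same pairing (soft runtime limit below the hard cgroup cap) would apply to `ws-hub` and `api-gateway`.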
🧹 Nitpick comments (2)
Makefile (1)
7-7: Add new modular targets to `.PHONY`.

`start-infra`, `start-bus`, `start-core`, `start-agents`, `start-obs`, and `status-links` are not listed in `.PHONY`, so a file matching any of those names in the repo root would silently skip the target.

♻️ Proposed fix

```diff
-.PHONY: help start stop restart logs test build clean status shell demo
+.PHONY: help start stop restart logs test build clean status shell demo \
+       start-infra start-bus start-core start-agents start-obs status-links
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` at line 7, The .PHONY declaration is missing several modular targets so they may be skipped if files with those names exist; update the .PHONY line (the .PHONY declaration in the Makefile) to include start-infra, start-bus, start-core, start-agents, start-obs, and status-links alongside the existing targets to ensure Make treats them as phony targets.
.github/workflows/deploy-oracle.yml (1)
40-42: Replace `sleep 45` with a wait-for-healthy loop.

A fixed 45s sleep is both flaky (Kafka/Postgres cold-starts vary) and wasteful. Consider polling `docker compose ps --format json` for health status (or looping `curl` with a timeout) so deployments fail fast on real problems and don't over-wait on fast restarts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-oracle.yml around lines 40 - 42, Replace the fixed "sleep 45" with a health-wait loop that polls "docker compose ps --format json" for the Kafka and Postgres service health states (or uses curl against their health endpoints) until both report "healthy" or a short timeout is reached; the loop should print intermediate status with "docker compose ps" output, exit non‑zero on timeout, and break immediately when services are healthy. Locate the block containing the literal "sleep 45" and update it to repeatedly call "docker compose ps --format json" (parse with jq or similar) to check each service's Health/State, use a configurable timeout/retry count, and fail fast if any service enters a non-recoverable state. Ensure logs show current health each iteration and that the workflow step returns a failure on timeout so deployments don't hang.
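The wait-for-healthy shape described above can be sketched generically — the `probe` callable stands in for parsing `docker compose ps --format json` or curling a health endpoint; names and defaults are illustrative, not part of the workflow:

```python
import time

def wait_for_healthy(probe, timeout_s: float = 120.0, interval_s: float = 3.0) -> bool:
    """Poll `probe()` until it returns True or the timeout elapses.

    Returns False on timeout so the caller can exit non-zero and fail
    the deploy fast instead of hanging; returns True as soon as the
    probe reports healthy, so fast restarts don't over-wait.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

In the workflow this would wrap a subprocess call per service (Kafka, Postgres, api-gateway), replacing the fixed `sleep 45`.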
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-oracle.yml:
- Around line 44-46: Health checks fail because api-gateway is not exposing port
8080 in prod and orchestrator has no port mapping and a port mismatch; modify
the deploy workflow to perform health checks inside the compose network (use
docker compose exec -T api-gateway wget/curl to probe
http://localhost:8080/health and docker compose exec -T orchestrator wget/curl
to probe http://localhost:9091/healthz) OR expose loopback-only ports for probes
in docker-compose.prod.yml, and update the orchestrator service in
docker-compose.yml/docker-compose.prod.yml to uncomment and correct the port
mapping to map host:container 9091 (ensure AI_ORG_SERVER__PORT is 9091) so the
health endpoint aligns with the container port.
In `@go-backend/internal/orchestrator/server/saga.go`:
- Around line 75-82: HandleFailure currently treats any error from
sc.redis.Client.Get(ctx, key).Result() as "no active saga", masking connectivity
issues; update HandleFailure to check for redis.Nil specifically (if err ==
redis.Nil) and return the "no active saga" error only in that case, otherwise
return the original Redis error so connection/timeouts surface; add imports for
"errors" and "github.com/redis/go-redis/v9" if not already present and ensure
the code path that unmarshals val into SagaState (variable state) runs only when
err is nil.
In `@Makefile`:
- Around line 53-56: The start target in the Makefile currently runs `docker
compose up -d` which won't start services that are gated behind profiles; update
the start recipe (target name "start") to invoke docker compose with all
required --profile flags (activate profiles for infra, bus, core, agents, obs —
i.e., the profiles that gate services like egress-proxy, postgres, qdrant,
redis, zookeeper, kafka, kafka-ui, orchestrator, ws-hub, api-gateway, agents,
dashboard, prometheus, jaeger) so the full stack comes up, and also extend the
.PHONY declaration to include the new modular targets start-infra, start-bus,
start-core, start-agents, start-obs so they are treated as phony targets.
---
Outside diff comments:
In `@docker-compose.yml`:
- Around line 147-167: The redis service has a mismatch: the command uses
"--maxmemory 256mb" while deploy.resources.limits.memory is "128M", which will
cause OOMs; either lower Redis's maxmemory to below the cgroup limit (e.g., set
"--maxmemory 96mb" in the redis service command) or raise
deploy.resources.limits.memory to at least ~320M to allow 256MB plus Redis
overhead; update the redis service block (the command string and/or
deploy.resources.limits.memory) and ensure the chosen change is consistent with
appendonly/AOF considerations before committing.
- Around line 488-503: The grafana service lacks a profiles entry and a memory
limit; add profiles: ["obs"] to the grafana service block and add
deploy.resources.limits.memory (matching the pattern used for other obs
services) so it only starts with the obs profile and enforces the same memory
cap; ensure depends_on still references prometheus and jaeger consistently with
the other obs services.
- Around line 257-342: The 128M cgroup cap is too low for the Go services
(orchestrator, ws-hub, api-gateway) and will likely cause OOMs under spikes;
update the docker-compose service definitions for orchestrator, ws-hub, and
api-gateway to either increase the deploy.resources.limits.memory to 256M or add
a GOMEMLIMIT environment variable (e.g. GOMEMLIMIT=100MiB) alongside the
existing environment block to keep the Go runtime within the cgroup, and ensure
you run load tests and monitor container_memory_working_set_bytes before
finalizing the cap.
---
Nitpick comments:
In @.github/workflows/deploy-oracle.yml:
- Around line 40-42: Replace the fixed "sleep 45" with a health-wait loop that
polls "docker compose ps --format json" for the Kafka and Postgres service
health states (or uses curl against their health endpoints) until both report
"healthy" or a short timeout is reached; the loop should print intermediate
status with "docker compose ps" output, exit non‑zero on timeout, and break
immediately when services are healthy. Locate the block containing the literal
"sleep 45" and update it to repeatedly call "docker compose ps --format json"
(parse with jq or similar) to check each service's Health/State, use a
configurable timeout/retry count, and fail fast if any service enters a
non-recoverable state. Ensure logs show current health each iteration and that
the workflow step returns a failure on timeout so deployments don't hang.
In `@Makefile`:
- Line 7: The .PHONY declaration is missing several modular targets so they may
be skipped if files with those names exist; update the .PHONY line (the .PHONY
declaration in the Makefile) to include start-infra, start-bus, start-core,
start-agents, start-obs, and status-links alongside the existing targets to
ensure Make treats them as phony targets.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 957fe76d-013d-4a11-96ae-dbbda89d8e0b
📒 Files selected for processing (14)
- .github/workflows/deploy-oracle.yml
- Makefile
- agents/base_agent.py
- check_services.sh
- dashboard/Dockerfile
- dashboard/next.config.js
- dashboard/package.json
- docker-compose.yml
- go-backend/cmd/orchestrator/main.go
- go-backend/go.mod
- go-backend/internal/orchestrator/server/results.go
- go-backend/internal/orchestrator/server/saga.go
- go-backend/internal/shared/config/config.go
- security-check/src/validator.rs
✅ Files skipped from review due to trivial changes (5)
- dashboard/Dockerfile
- go-backend/go.mod
- dashboard/next.config.js
- security-check/src/validator.rs
- check_services.sh
🚧 Files skipped from review as they are similar to previous changes (4)
- go-backend/cmd/orchestrator/main.go
- go-backend/internal/orchestrator/server/results.go
- agents/base_agent.py
- dashboard/package.json
```
echo "==> Health check..."
curl -f http://localhost:8080/health || (echo "FAIL: API Gateway not healthy" && exit 1)
curl -f http://localhost:9091/healthz || (echo "FAIL: Orchestrator not healthy" && exit 1)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether docker-compose.prod.yml clears ports for api-gateway / orchestrator.
fd -e yml -e yaml docker-compose.prod | xargs -I{} sh -c 'echo "=== {} ==="; cat -n {}'
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 2734
🏁 Script executed:
fd -e yml -e yaml docker-compose.yml | xargs -I{} sh -c 'echo "=== {} ==="; cat -n {} | head -100'
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 7119
🏁 Script executed:
rg -A 10 "^\s+api-gateway:|^\s+orchestrator:" docker-compose.yml
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1356
🏁 Script executed:
rg -B 5 -A 15 "api-gateway:" docker-compose.yml | grep -A 20 "api-gateway:"
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1179
Health checks fail: api-gateway cleared in prod, orchestrator never exposed.
The curl commands will fail when deployed:
- `api-gateway`: `ports` is cleared to `[]` in `docker-compose.prod.yml` (line 17), so `http://localhost:8080/health` fails in prod.
- `orchestrator`: `ports` is commented out in base `docker-compose.yml` and not overridden in prod, so `http://localhost:9091/healthz` fails in both dev and prod. Additionally, the commented port mapping specifies `9090` but the service runs on `9091` (`AI_ORG_SERVER__PORT: 9091`), creating a port mismatch.

Either expose loopback-only ports for health probes in prod, or run checks inside the container network (e.g., `docker compose exec -T api-gateway wget --spider -q http://localhost:8080/health`). For orchestrator, uncomment and correct the port mapping.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/deploy-oracle.yml around lines 44 - 46, Health checks fail
because api-gateway is not exposing port 8080 in prod and orchestrator has no
port mapping and a port mismatch; modify the deploy workflow to perform health
checks inside the compose network (use docker compose exec -T api-gateway
wget/curl to probe http://localhost:8080/health and docker compose exec -T
orchestrator wget/curl to probe http://localhost:9091/healthz) OR expose
loopback-only ports for probes in docker-compose.prod.yml, and update the
orchestrator service in docker-compose.yml/docker-compose.prod.yml to uncomment
and correct the port mapping to map host:container 9091 (ensure
AI_ORG_SERVER__PORT is 9091) so the health endpoint aligns with the container
port.
```go
	val, err := sc.redis.Client.Get(ctx, key).Result()
	if err != nil {
		return fmt.Errorf("no active saga found in redis for project %s", projectID)
	}

	var state SagaState
	if err := json.Unmarshal([]byte(val), &state); err != nil {
		return err
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm callers of HandleFailure ignore the error (which would hide Redis outages).
rg -nP --type=go -C2 '\bHandleFailure\s*\('
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1156
🏁 Script executed:
#!/bin/bash
# Check the actual code at lines 75-82 and surrounding context in saga.go
sed -n '1,100p' go-backend/internal/orchestrator/server/saga.go | head -100
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 3004
🏁 Script executed:
#!/bin/bash
# Look at imports in saga.go
rg -n '^import' go-backend/internal/orchestrator/server/saga.go -A 20
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 817
🏁 Script executed:
#!/bin/bash
# Search for redis.Nil usage in the codebase to see existing patterns
rg -n 'redis\.Nil' --type=go
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 164
🏁 Script executed:
#!/bin/bash
# Check the db/redis.go or similar to understand the Redis wrapper
find go-backend -name '*redis*' -type f
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 134
🏁 Script executed:
#!/bin/bash
# Check if errors package is imported anywhere else in the go-backend
rg -n '"errors"' go-backend --type=go | head -20
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 156
🏁 Script executed:
#!/bin/bash
# Check redis.Nil usage pattern in client.go
sed -n '60,75p' go-backend/internal/shared/redis/client.go
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 499
🏁 Script executed:
#!/bin/bash
# Check db/redis.go structure to understand the wrapper
cat -n go-backend/internal/shared/db/redis.go | head -40
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1261
🏁 Script executed:
#!/bin/bash
# Verify that HandleFailure is actually ignoring the error at the call site
sed -n '130,140p' go-backend/internal/orchestrator/server/results.go
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 407
Differentiate redis.Nil from connectivity errors in HandleFailure.
The code collapses both missing sagas and Redis outages into the same error message. Get(...).Result() returns redis.Nil when the key is absent but a different error for connection/timeout failures. Since the error is ignored at the call site (results.go line 134), real infrastructure failures become invisible.
Adopt the pattern already used in redis/client.go (line 66) to check if err == redis.Nil separately. This requires importing "github.com/redis/go-redis/v9" (already imported in db/redis.go) and "errors" (already used in keystore.go).
Proposed fix
```diff
-	val, err := sc.redis.Client.Get(ctx, key).Result()
-	if err != nil {
-		return fmt.Errorf("no active saga found in redis for project %s", projectID)
-	}
-
-	var state SagaState
-	if err := json.Unmarshal([]byte(val), &state); err != nil {
-		return err
-	}
+	val, err := sc.redis.Client.Get(ctx, key).Result()
+	if err != nil {
+		if errors.Is(err, redis.Nil) {
+			log.Warn("no active saga found for project", zap.String("project_id", projectID))
+			return fmt.Errorf("no active saga found in redis for project %s", projectID)
+		}
+		log.Error("failed to load saga state from redis", zap.Error(err), zap.String("project_id", projectID))
+		return fmt.Errorf("load saga state: %w", err)
+	}
+
+	var state SagaState
+	if err := json.Unmarshal([]byte(val), &state); err != nil {
+		log.Error("failed to unmarshal saga state", zap.Error(err), zap.String("project_id", projectID))
+		return fmt.Errorf("unmarshal saga state: %w", err)
+	}
```

Add imports: `"errors"` and `"github.com/redis/go-redis/v9"`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
val, err := sc.redis.Client.Get(ctx, key).Result()
if err != nil {
	if errors.Is(err, redis.Nil) {
		log.Warn("no active saga found for project", zap.String("project_id", projectID))
		return fmt.Errorf("no active saga found in redis for project %s", projectID)
	}
	log.Error("failed to load saga state from redis", zap.Error(err), zap.String("project_id", projectID))
	return fmt.Errorf("load saga state: %w", err)
}

var state SagaState
if err := json.Unmarshal([]byte(val), &state); err != nil {
	log.Error("failed to unmarshal saga state", zap.Error(err), zap.String("project_id", projectID))
	return fmt.Errorf("unmarshal saga state: %w", err)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-backend/internal/orchestrator/server/saga.go` around lines 75 - 82,
HandleFailure currently treats any error from sc.redis.Client.Get(ctx,
key).Result() as "no active saga", masking connectivity issues; update
HandleFailure to check for redis.Nil specifically (if err == redis.Nil) and
return the "no active saga" error only in that case, otherwise return the
original Redis error so connection/timeouts surface; add imports for "errors"
and "github.com/redis/go-redis/v9" if not already present and ensure the code
path that unmarshals val into SagaState (variable state) runs only when err is
nil.
```diff
 start:
-	@echo "🚀 Starting AI Organization..."
+	@echo "🚀 Starting AI Organization (Full Stack)..."
 	@docker compose up -d
 	@$(MAKE) status-links
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm how many services in docker-compose.yml are gated behind profiles vs. unprofiled.
rg -nP '^\s{2}[a-zA-Z0-9_-]+:\s*$|^\s{4}profiles:' docker-compose.yml
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1314
🏁 Script executed:
sed -n '527,540p' docker-compose.yml
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 480
🏁 Script executed:
sed -n '1,20p' Makefile
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 873
🏁 Script executed:
rg 'start-(infra|bus|core|agents|obs)' Makefile
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 505
make start will fail to launch the "Full Stack" — Docker Compose profiles gate out almost every service.
After the profiles change in docker-compose.yml, nearly every service (egress-proxy, postgres, qdrant, redis, zookeeper, kafka, kafka-ui, orchestrator, ws-hub, api-gateway, all agents, dashboard, prometheus, jaeger) is gated behind a profile. Per Compose semantics, services with `profiles:` set only start when their profile is explicitly activated. Running `docker compose up -d` without profile flags will start only grafana — not the full stack.
Fix line 55 to activate all profiles:

```diff
 start:
 	@echo "🚀 Starting AI Organization (Full Stack)..."
-	@docker compose up -d
+	@docker compose --profile infra --profile bus --profile core --profile agents --profile dashboard --profile obs up -d
 	@$(MAKE) status-links
```

Also, update the .PHONY declaration on line 7 to include the new modular targets (start-infra, start-bus, start-core, start-agents, start-obs).
📝 Committable suggestion
```makefile
start:
	@echo "🚀 Starting AI Organization (Full Stack)..."
	@docker compose --profile infra --profile bus --profile core --profile agents --profile dashboard --profile obs up -d
	@$(MAKE) status-links
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 53 - 56, The start target in the Makefile currently
runs `docker compose up -d` which won't start services that are gated behind
profiles; update the start recipe (target name "start") to invoke docker compose
with all required --profile flags (activate profiles for infra, bus, core,
agents, obs — i.e., the profiles that gate services like egress-proxy, postgres,
qdrant, redis, zookeeper, kafka, kafka-ui, orchestrator, ws-hub, api-gateway,
agents, dashboard, prometheus, jaeger) so the full stack comes up, and also
extend the .PHONY declaration to include the new modular targets start-infra,
start-bus, start-core, start-agents, start-obs so they are treated as phony
targets.
…A Agent validation. Resolve CI compilation error.
This commit enables the local-first workbench mode (AI_ORG_LOCAL_MODE), parallelizes the Implementation phase (Frontend/Backend), and introduces the Desktop Nova surgical file-editing tools.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
agents/agent_service.py (2)
328-399: ⚠️ Potential issue | 🟡 Minor

CI ruff failure: `_process_task` exceeds PLR0915 (51 > 50 statements).

The pipeline fails on this function. Extract the LLM-client-resolution preamble (lines ~332-359) into a small helper such as `_prepare_agent_for_task(task_msg) -> (agent_instance, provider, model_name, llm_client)` to drop below the threshold and simultaneously make `_process_task` easier to read.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/agent_service.py` around lines 328 - 399, The _process_task function is over the PLR0915 statement limit; extract the LLM client resolution and agent preparation block into a helper named _prepare_agent_for_task(task_msg) that returns (agent_instance, provider, model_name, llm_client). Move logic that reads llm_config from task_msg.input_data, calls _build_llm_client(llm_config, self.role), patches self.agent with llm_client/model_name/provider, and instantiates/configures agent_instance via _load_agent into this helper, then call it from _process_task and use the returned values in place of the inlined code.
400-420: ⚠️ Potential issue | 🔴 Critical

Fix critical type mismatch in mock LLM path and prevent stale cost/usage reporting.

Line 350 has a critical bug: `_mock_llm_response()` returns `tuple[str, dict[str, int]]` (per line 585), but `call_llm()` declares return type `-> str`. When `llm_client is None`, callers expecting a string will receive a tuple, causing crashes or unpredictable behavior.

Additionally, mock mode and semantic-cache hits bypass the cost/usage tracking at lines 87–88:
- Mock path (line 350): returns early without setting `_last_cost`/`_last_usage`, leaving stale values from the previous task. Since the agent instance persists across tasks, mock/cached tasks will incorrectly report the previous task's cost and tokens.
- Cache hit (lines 40–45): returns early without setting these values, same issue.

The code also does not reset `_last_cost`/`_last_usage` per task (no reset in `execute_task`). If an agent ever makes multiple LLM calls within a single task, only the last call's cost will be reported; earlier calls are lost.

Fixes needed:
- Fix mock mode to return only `str` (extract tuple destructuring in `call_llm` line 350, or refactor `_mock_llm_response` to return just text).
- Set `_last_cost` and `_last_usage` for mock and cache-hit paths (use mock's hardcoded `usage = {"prompt_tokens": 100, "completion_tokens": 50}` and `cost_usd=0`; for cache hits, cost is legitimately `0`).
- Reset `_last_cost = 0.0` and `_last_usage = {}` at the start of `execute_task()` to prevent cross-task contamination.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/agent_service.py` around lines 400 - 420, call_llm currently returns a tuple in the mock path causing a type mismatch and mock/cache paths never set/reset cost/usage; update so _mock_llm_response and call_llm agree on returning only str (or destructure the tuple inside call_llm to extract text) and ensure that both the mock branch in call_llm and the cache-hit branch set agent_instance._last_cost (0.0) and agent_instance._last_usage (e.g., {"prompt_tokens":100,"completion_tokens":50} for the mock response and zeros for cache hits) before returning; additionally, in execute_task, reset agent_instance._last_cost = 0.0 and agent_instance._last_usage = {} at the start of each task so stale values cannot carry over across tasks, and when multiple LLM calls occur within one task update _last_cost/_last_usage on every call so the final values reflect the last call.
README.md (3)
54-62: ⚠️ Potential issue | 🟡 Minor — Stale `make` command table inside the Tech Stack section conflicts with the new Desktop quick-start.
A "Command / Description" sub-table (`make start`, `make stop`, `make clean`, `make logs`) sits under `## Tech Stack` even though the rewritten Quick Start now runs `python3 desktop_nova.py` and the PR description calls out new modular `make` targets (`infra`, `bus`, `core`, `agents`, `obs`). The table is in the wrong section and out of date — either move it under a "Makefile Targets" section reflecting the new modular commands, or remove it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 54 - 62, The Tech Stack README contains a stale "Command / Description" make table listing `make start`, `make stop`, `make clean`, `make logs` that conflicts with the new Desktop quick-start (which instructs `python3 desktop_nova.py`) and the updated modular Make targets; either remove that table or move and refresh it under a new "Makefile Targets" section and replace entries with the new targets (`make infra`, `make bus`, `make core`, `make agents`, `make obs`) and their brief descriptions; ensure the Tech Stack keeps the desktop quick-start (`python3 desktop_nova.py`) and update any cross-references so the README reflects the new modular workflow.
25-45: ⚠️ Potential issue | 🟡 Minor — Broken "What It Does" numbered list — only step 1 remains.
The list starts at `1. CEO Agent` and then jumps directly to `---` followed by the new "Operational Modes" heading. The original steps for CTO / Engineers / QA / DevOps were dropped mid-list, leaving a visibly truncated narrative. Restore steps 2+ above the `---`, or convert step 1 into a paragraph if the per-agent sequence is intentionally being replaced by the mode table.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 25 - 45, The "What It Does" numbered list was truncated after "1. **CEO Agent**" before the horizontal rule (`---`) and "Operational Modes" section; restore the missing agent steps (e.g., "2. **CTO Agent**", "3. **Engineers**", "4. **QA**", "5. **DevOps**") above the `---`, preserving the numbered list format under the "## What It Does" heading, or if the intent is to remove the per-agent sequence, replace the single "1. **CEO Agent**" list item with a short explanatory paragraph summarizing the replaced workflow; update the content around the `---` and the "## 🛠️ Operational Modes" heading so the document reads coherently.
3-3: ⚠️ Potential issue | 🟡 Minor — Update Go badge version to 1.25.0.
README.md line 3 badges Go `1.24.0`, but `go-backend/go.mod` specifies `1.25.0`. Update the badge to match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 3, Update the Go version badge in README.md from 1.24.0 to 1.25.0 so it matches the go version declared in go-backend/go.mod; locate the badge line containing the Go shield (the markdown with https://img.shields.io/badge/go-1.24.0-00ADD8...) and change the version token to 1.25.0.
tools/docker_sandbox.py (1)
63-83: ⚠️ Potential issue | 🟠 Major — Secret leak risk: env values flow into debug logs and host `ps` output.
The `devops_agent.py:181-184` pattern (injecting AWS credentials via `env`) combined with line 83 — `logger.debug("Executing isolated tool command", raw_cmd=" ".join(docker_cmd))` — means every secret passed through the new `env` parameter is written verbatim to the debug log. Even without debug enabled, `docker run -e KEY=value` is visible to anyone on the host via `ps auxe` / `/proc/<pid>/cmdline`; prefer `--env-file` (a short-lived tempfile, 0600) or streaming via stdin of an env-reading wrapper.
At minimum, redact keys that look sensitive before logging.
🛡️ Safer: use --env-file and redact the debug log
```diff
         docker_cmd = [
             "docker",
             "run",
             "--rm",
             "--name",
             container_name,
             "-v",
             f"{os.path.abspath(self.host_workspace if self.local_mode else self.working_dir)}:/workspace",
             "-w",
             "/workspace",
         ]
-        # Add environment variables
-        if env:
-            for k, v in env.items():
-                docker_cmd.extend(["-e", f"{k}={v}"])
+        # Add environment variables via --env-file (avoids exposing secrets in
+        # `ps` output and reduces accidental log leakage).
+        env_file_path: str | None = None
+        if env:
+            import tempfile
+            fd, env_file_path = tempfile.mkstemp(prefix="ai-org-env-", suffix=".env")
+            try:
+                with os.fdopen(fd, "w") as ef:
+                    for k, v in env.items():
+                        ef.write(f"{k}={v}\n")
+                os.chmod(env_file_path, 0o600)
+            except Exception:
+                if env_file_path and os.path.exists(env_file_path):
+                    os.remove(env_file_path)
+                raise
+            docker_cmd.extend(["--env-file", env_file_path])
 @@
-        logger.debug("Executing isolated tool command", raw_cmd=" ".join(docker_cmd))
-        result = await self._run_subprocess(docker_cmd)
+        # Never log resolved env values.
+        logger.debug(
+            "Executing isolated tool command",
+            raw_cmd=" ".join(docker_cmd),
+            env_keys=list(env.keys()) if env else [],
+        )
+        try:
+            result = await self._run_subprocess(docker_cmd)
+        finally:
+            if env_file_path and os.path.exists(env_file_path):
+                try:
+                    os.remove(env_file_path)
+                except OSError:
+                    pass
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/docker_sandbox.py` around lines 63 - 83, The debug log and docker command currently expose plaintext env values; fix by writing env to a short-lived tempfile with 0o600 permissions and pass it to docker via "--env-file", remove any individual "-e" entries from docker_cmd; ensure the tempfile is securely deleted after use (use tempfile.NamedTemporaryFile with delete=False or similar and os.remove in finally). Also stop logging raw env values: replace logger.debug(..., raw_cmd=" ".join(docker_cmd)) with a redacted command string (e.g., build redacted_cmd that masks values for keys matching patterns like "KEY|SECRET|TOKEN|AWS|PASS|PASSWORD") or omit env values entirely from logs; update references to docker_cmd construction and logger.debug in this file accordingly.
agents/base_agent.py (1)
349-350: ⚠️ Potential issue | 🔴 Critical — `call_llm` now returns a tuple instead of `str` when no LLM client is configured.
`_mock_llm_response` was changed to return `tuple[str, dict[str, int]]` (line 598), but `call_llm` is declared `-> str` and simply forwards its result on line 350. Every caller (e.g., `self_critique` at line 702 via `json.loads(raw)`, and `agents/agent_service.py`, which consumes the string) will now receive a tuple and break in mock mode. This also leaves `_last_usage`/`_last_cost` unset, so downstream cost publishing gets stale data from the prior task.
🩹 Suggested fix
```diff
         if self.llm_client is None:
-            return self._mock_llm_response(messages)
+            text, usage = self._mock_llm_response(messages)
+            self._last_usage = usage
+            self._last_cost = 0.0
+            return text
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 349 - 350, call_llm currently declares -> str but forwards the new _mock_llm_response which returns (str, dict) causing callers (e.g., self_critique and agents/agent_service.py) to receive a tuple and leaving _last_usage/_last_cost unset; modify call_llm (and its type hint) to unpack the tuple when llm_client is None: call _mock_llm_response(messages), assign the returned (raw, usage) to local variables, set self._last_usage and self._last_cost from usage, then return raw (a str) so callers keep receiving a string; ensure the function signature/type hint reflects the returned str and that _mock_llm_response is only used in the mock branch.
♻️ Duplicate comments (2)
tools/git_tool.py (1)
89-103: ⚠️ Potential issue | 🔴 Critical — Token still leaks through the temporary remote URL.
This still embeds `GITHUB_TOKEN` in the `git remote set-url` argument and writes it into `.git/config`; if `git push` or the process fails before Line 101, the token can remain persisted. Prefer GitHub CLI/Git Credential Manager or a credential helper flow that never writes the token into the remote URL.
Read-only verification that the unsafe token-in-remote pattern is still present:

```bash
#!/bin/bash
set -euo pipefail
fd -t f "git_tool.py" -x awk '
  /authed_url = url\.replace/,/\["git", "remote", "set-url", remote, authed_url\]/ {
    print FILENAME ":" FNR ":" $0
  }
' {}
```

Safer shape to avoid persisting the token in the remote URL:

```diff
-        authed_url = url.replace(
-            "https://github.com/", f"https://{token}@github.com/"
-        )
-        # Temporarily update remote URL for push
-        await self._run_subprocess(
-            ["git", "remote", "set-url", remote, authed_url],
-            cwd=self.repo_path,
-        )
-        res = await self._run_subprocess(
-            ["git", "push", remote, branch], cwd=self.repo_path
-        )
-        # Restore original URL
-        await self._run_subprocess(
-            ["git", "remote", "set-url", remote, url], cwd=self.repo_path
-        )
-        return res
+        # Do not embed GITHUB_TOKEN into the remote URL. Require the
+        # environment to provide auth via gh/GCM/credential helper.
+        return await self._run_subprocess(
+            ["git", "push", remote, branch], cwd=self.repo_path
+        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/git_tool.py` around lines 89 - 103, The current code in git_tool.py builds authed_url and calls _run_subprocess to set-url (which writes the token into .git/config) and later restores it; instead avoid persisting the token by removing the git remote set-url calls around _run_subprocess and instead perform an authenticated push that never writes credentials to config — e.g. use an in-memory credential helper via GIT_ASKPASS (or GitHub CLI/credential helper) when calling _run_subprocess for the push so the token is supplied only via environment and not embedded in the remote URL; update the logic around the variables authed_url, remote and branch and the _run_subprocess calls to create and use a transient askpass helper (or invoke gh auth push) and ensure no code path writes authed_url into the repo config.
agents/base_agent.py (1)
370-396: ⚠️ Potential issue | 🟠 Major — Budget gate is still race-prone under concurrent calls (same project).
Even after the project-scoped key change, the GET → compare → INCRBYFLOAT pattern is still two round-trips, so multiple concurrent calls for the same `project_id` can each observe `used < limit` and collectively blow past the limit. Consider folding the check+increment into a single atomic Redis Lua script (`EVAL`) or `WATCH`/`MULTI`/`EXEC`. This was raised in a prior review round.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 370 - 396, The current budget gate using budget_key (built from self._current_project_id) reads used via redis_client.get, compares to self.budget_limit, then later calls redis_client.incrbyfloat — this GET → compare → INCRBYFLOAT is race-prone under concurrent calls; replace that pattern in the budget check and update around _execute_provider/_calculate_actual_cost with a single atomic Redis operation (either an EVAL Lua script that checks current value + increments by cost_usd and returns success/failure, or a WATCH/MULTI/EXEC transaction) so the check+increment is atomic; update the code paths referencing budget_key, redis_client, budget_limit, and preserve the existing logging/RuntimeError behavior when the increment would exceed the limit.
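A minimal sketch of the atomic gate the comment asks for, assuming a redis-py style client that exposes `eval`; the helper name `charge_budget` and the `-1` sentinel are illustrative, not the project's actual API:

```python
# Atomic budget gate: the check and the increment happen inside one Lua
# script, so concurrent callers cannot both pass the check and overspend.
# KEYS[1] = budget_key, ARGV[1] = cost_usd, ARGV[2] = budget_limit.
BUDGET_GATE_LUA = """
local used = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
if used + cost > limit then
    return '-1'
end
return redis.call('INCRBYFLOAT', KEYS[1], ARGV[1])
"""


def charge_budget(redis_client, budget_key: str, cost_usd: float,
                  budget_limit: float) -> float:
    """Return the new running total, or raise if the charge would exceed the limit."""
    result = redis_client.eval(BUDGET_GATE_LUA, 1, budget_key, cost_usd, budget_limit)
    if float(result) == -1:
        raise RuntimeError(f"Budget limit {budget_limit} exceeded for {budget_key}")
    return float(result)
```

Redis strings are returned for `INCRBYFLOAT`, so the wrapper normalizes with `float(...)`; the existing logging/RuntimeError behavior from the review can be kept around this call.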
🧹 Nitpick comments (3)
tools/docker_sandbox.py (1)
24-25: Minor: `AI_ORG_LOCAL_WORKSPACE` fallback can drift from `self.working_dir`.
`os.getenv("AI_ORG_LOCAL_WORKSPACE", os.getcwd())` resolves at `__init__`, while `self.working_dir` (from `BaseTool`) is also derived from `os.getcwd()` at its own `__init__`. If a caller later sets `sandbox.working_dir = project_path` (the pattern shown in `devops_agent.py`), toggling `local_mode=true` silently ignores that override and mounts the original CWD instead. Consider falling back to `self.working_dir` (evaluated lazily) when `AI_ORG_LOCAL_WORKSPACE` is unset.

```diff
-        self.local_mode = os.getenv("AI_ORG_LOCAL_MODE", "false").lower() == "true"
-        self.host_workspace = os.getenv("AI_ORG_LOCAL_WORKSPACE", os.getcwd())
+        self.local_mode = os.getenv("AI_ORG_LOCAL_MODE", "false").lower() == "true"
+        # Only set host_workspace if explicitly configured; otherwise defer to
+        # self.working_dir at run time so callers that mutate working_dir are honored.
+        self.host_workspace = os.getenv("AI_ORG_LOCAL_WORKSPACE") or None
@@
-            f"{os.path.abspath(self.host_workspace if self.local_mode else self.working_dir)}:/workspace",
+            f"{os.path.abspath((self.host_workspace or self.working_dir) if self.local_mode else self.working_dir)}:/workspace",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/docker_sandbox.py` around lines 24 - 25, The current init sets self.host_workspace to os.getenv("AI_ORG_LOCAL_WORKSPACE", os.getcwd()) which freezes in the original CWD and ignores later changes to self.working_dir from BaseTool; change the logic in DockerSandbox.__init__ to read AI_ORG_LOCAL_WORKSPACE with a None default (os.getenv("AI_ORG_LOCAL_WORKSPACE", None)) and if that returns None set self.host_workspace = self.working_dir (or delay evaluation by making host_workspace a property that returns os.getenv(...) or self.working_dir), so updates to self.working_dir after construction are respected; reference self.host_workspace, self.working_dir and BaseTool when locating where to apply the fix.
tools/file_edit_tool.py (1)
62-74: Clean up the `.tmp` file on write failures.
If `f.write(new_content)` raises (disk full, encoding error after a partial flush, etc.), the `.tmp` sibling is left on disk and the `except` at line 76 returns a generic error without cleanup. Over time this can litter the workspace — especially since agents iterate on edits.

```diff
-        tmp_path = full_path + ".tmp"
-        with open(tmp_path, 'w', encoding='utf-8') as f:
-            f.write(new_content)
-
-        os.replace(tmp_path, full_path)
+        tmp_path = full_path + ".tmp"
+        try:
+            with open(tmp_path, 'w', encoding='utf-8') as f:
+                f.write(new_content)
+            os.replace(tmp_path, full_path)
+        except Exception:
+            if os.path.exists(tmp_path):
+                try:
+                    os.remove(tmp_path)
+                except OSError:
+                    pass
+            raise
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/file_edit_tool.py` around lines 62 - 74, The write path currently creates a tmp_path and writes new_content but leaves the .tmp file behind if f.write or subsequent operations raise; update the editing routine around tmp_path (the block that opens tmp_path, calls f.write(new_content), and then os.replace(tmp_path, full_path)) to ensure the .tmp file is removed on any failure: perform the write inside a try/except/finally (or nested try) so that on exception you call os.remove(tmp_path) if it exists, only call os.replace if the write completed successfully, and preserve the existing ToolResult return behavior while logging the error; reference tmp_path, f.write(new_content), os.replace and the current exception handling to locate where to add the cleanup.
architecture.md (1)
56-64: Minor: duplicate 🧱 section icon / overlapping content with "Core Components".
The new "🧱 Expert Coordination Mesh" at line 56 and the existing "🧱 Core Components" at line 64 use the same emoji and cover overlapping ground (both describe the Go orchestrator + agent responsibilities). Consider merging, or at least picking a distinct icon/heading to avoid reader confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@architecture.md` around lines 56 - 64, The two adjacent headings "🧱 Expert Coordination Mesh" and "🧱 Core Components" duplicate the same emoji and overlap in content about the Go Orchestrator and agent responsibilities; either merge the "Expert Coordination Mesh" content into the "Core Components" section (consolidate mentions of Go Orchestrator, Strategic Experts, Surgical Execution, LocalFileEditTool, and AI_ORG_LOCAL_MODE) or rename one heading and pick a distinct icon (e.g., "🧭 Expert Coordination Mesh" or "⚙️ Core Components") and remove the redundant sentences so the Go Orchestrator, LocalFileEditTool, and AI_ORG_LOCAL_MODE descriptions appear only once and flow logically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@agents/base_agent.py`:
- Around line 154-165: The failing lint warnings are caused by trailing
whitespace on blank lines in the construction of the base_instructions string
(see base_instructions, local_mode, and self.ROLE); remove any trailing spaces
at the ends of the blank lines inside the f-string and the multi-line
concatenation so blank lines are truly empty (no trailing spaces) and re-run
ruff to verify W293 is resolved; repeat this cleanup for other similar
occurrences in the file (around where base_instructions is built and the other
reported locations).
- Around line 600-631: In _calculate_actual_cost replace the exact-key lookup of
pricing with prefix-matching against the pricing keys (e.g. use next(... if
model.startswith(k) ...)) so versioned or suffixed model IDs match their family
rates, and if no match is found log a warning (use the module/class logger) that
the fallback pricing {"prompt":1.0,"completion":4.0} was used for the given
model; compute the final cost and return round(cost, 6) instead of
float(f"{cost:.6f}").
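The prefix-match-plus-fallback lookup described above can be sketched as follows; the rate table values and helper names are illustrative, not the repository's actual pricing:

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative per-1M-token rates; families are keyed by model-ID prefix so
# versioned IDs like "gpt-4o-2024-08-06" still match their family rate.
PRICING = {
    "gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
    "gpt-4o": {"prompt": 2.50, "completion": 10.00},
}
FALLBACK = {"prompt": 1.0, "completion": 4.0}


def rates_for(model: str) -> dict:
    # Longest prefix first so "gpt-4o-mini" wins over "gpt-4o".
    match = next(
        (v for k, v in sorted(PRICING.items(), key=lambda kv: -len(kv[0]))
         if model.startswith(k)),
        None,
    )
    if match is None:
        logger.warning("No pricing for %s; using fallback rates", model)
        return FALLBACK
    return match


def cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    r = rates_for(model)
    cost = (prompt_tokens * r["prompt"] + completion_tokens * r["completion"]) / 1_000_000
    return round(cost, 6)  # round(...) instead of float(f"{cost:.6f}")
```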
In `@agents/qa_agent.py`:
- Around line 437-460: The heuristic parsing of pytest output using result,
output, passed_count and failed_count can misreport qa_passed when
result.success is False but no "FAILED" substrings are found; update the return
logic so that when result.success is False you force passed_count = 0,
failed_count = max(1, failed_count), total = passed_count + failed_count, errors
= 1, and include the failure detail in failures (e.g., keep {"test":
"sandbox_execution", "error": result.error}); keep the substring counts only
when result.success is True. Also consider switching to pytest-json-report (add
--json-report and parse structured output) for a more robust fix referenced in
the code that computes qa_passed from test_results and security_results.
- Around line 497-521: The _run_coverage_analysis function currently emits a
hardcoded 50.0% when sandbox_tool.run succeeds and swallows exceptions; change
it to produce truthful parsed metadata and log errors: instead of returning
line_coverage_pct: 50.0 in the success path, run pytest with a machine-readable
report (e.g., "pytest --cov=backend --cov-report=json:coverage.json"), then open
and parse the JSON's totals.percent_covered into line_coverage_pct and set
parsed: True; if parsing fails set line_coverage_pct: None (or 0) and parsed:
False (include tools_used and sandbox_id as before). Also replace the bare
except Exception: with an except Exception as e: that calls logger.warning(...)
including the exception and context (function _run_coverage_analysis and
context.project_id) and returns {"line_coverage_pct": None, "passed": False,
"parsed": False}. Use the existing symbols _run_coverage_analysis,
sandbox_tool.run, and context.project_id to locate and modify the code.
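Parsing the machine-readable report the comment recommends might look like this; `parse_coverage_json` is a hypothetical helper operating on the JSON emitted by `--cov-report=json` (whose `totals.percent_covered` field is part of coverage.py's JSON report format):

```python
import json


def parse_coverage_json(raw: str) -> dict:
    """Turn a `pytest --cov --cov-report=json` report into truthful metadata."""
    try:
        report = json.loads(raw)
        pct = float(report["totals"]["percent_covered"])
        return {"line_coverage_pct": round(pct, 2), "passed": True, "parsed": True}
    except (ValueError, KeyError, TypeError):
        # Never invent a number when the report is missing or malformed.
        return {"line_coverage_pct": None, "passed": False, "parsed": False}
```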
- Around line 462-495: The _run_security_scan currently runs "bandit" inside the
sandbox image (via sandbox_tool.run) but the default image ("python:3.11-slim")
doesn't include bandit, so the command silently fails; fix by ensuring bandit is
installed in the container before invoking it—either change the sandbox
invocation in _run_security_scan to use an image that already has bandit, or
prepend an install step to the command passed to sandbox_tool.run (e.g., run
"pip install bandit" before "bandit -r . -f json -q || true"); update the cmd
passed to sandbox_tool.run or the image parameter used by sandbox_tool.run in
_run_security_scan so result.output contains bandit JSON output when present.
In `@desktop_nova.py`:
- Around line 35-49: create_llm_client currently returns None or constructs SDKs
with possibly-missing API keys leading BaseAgent to silently fall back to
_mock_llm_response; update create_llm_client to validate the provider and
required env var (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY,
AWS_REGION/AWS creds for bedrock) and if the provider is unrecognized or the
required key is absent, raise a clear exception (ValueError or custom
InvalidProviderError) with a descriptive message including the provider name and
which key is missing; also add a short guard in BaseAgent (where
_mock_llm_response is invoked) to surface the same error rather than silently
using mocks so callers get an explicit failure instead of silent placeholder
output.
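The fail-fast validation could take roughly this shape; the provider-to-env-var map and the `validate_provider` name are assumptions for illustration:

```python
import os

# Assumed mapping of provider name to the env var it requires.
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "bedrock": "AWS_REGION",
}


def validate_provider(provider: str) -> str:
    """Raise a descriptive error instead of silently falling back to mocks."""
    key = REQUIRED_ENV.get(provider)
    if key is None:
        raise ValueError(
            f"Unknown provider '{provider}'; expected one of {sorted(REQUIRED_ENV)}"
        )
    if not os.getenv(key):
        raise ValueError(
            f"Provider '{provider}' selected but required env var {key} is not set"
        )
    return key
```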
- Around line 102-120: Replace the direct access to the private
engine._active_projects with the public API
engine.get_project_status(project_id) to retrieve the current status/metadata
(use project_id and engine.output_dir for messages); implement a polling timeout
(e.g., timeout_seconds = 7200) by recording start = time.monotonic() and
breaking with a failure message if elapsed >= timeout_seconds, otherwise poll
every 2 seconds via await asyncio.sleep(2); preserve the existing status checks
for "completed" and "failed" and on exception log via logger.error and exit as
before.
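The suggested polling loop with a hard timeout, sketched against the assumed public `engine.get_project_status(project_id)` API:

```python
import asyncio
import time


async def wait_for_completion(engine, project_id: str,
                              timeout_seconds: float = 7200,
                              poll_interval: float = 2.0) -> str:
    """Poll the public status API until terminal state or timeout."""
    start = time.monotonic()
    while True:
        status = engine.get_project_status(project_id)
        if status in ("completed", "failed"):
            return status
        if time.monotonic() - start >= timeout_seconds:
            return "timeout"  # caller logs a failure message and exits
        await asyncio.sleep(poll_interval)
```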
In `@go-backend/cmd/ws-hub/main.go`:
- Around line 103-116: The writePump currently closes the websocket on
WriteMessage error but there is no read deadline or ping/pong handling to detect
half-open TCP connections, so add a read timeout and pong handler in the reader
loop to ensure dead peers are discovered: in the connection setup / reader
goroutine (the code that calls c.conn.ReadMessage and the place that calls
reg.remove), set an initial SetReadDeadline and register a SetPongHandler that
updates the read deadline on each pong; also ensure a periodic ping ticker is
started (paired with the existing 5s write deadline in writePump) to send pings
so inactive clients are timed out; reference the reader goroutine,
c.conn.SetReadDeadline, c.conn.SetPongHandler, the ping ticker that sends
websocket.PingMessage, and reg.remove/c.send so the lifecycle remains coupled
and buffered messages are drained.
- Around line 64-100: The remove/broadcast race: ensure broadcast holds
r.mu.RLock() while iterating and sending to each client's c.send to prevent
concurrent map writes, and change remove so it closes c.send while still holding
r.mu (i.e., perform delete(r.conns[projectID], c) and then close(c.send) before
r.mu.Unlock()). Update registry.remove and registry.broadcast to use these
locking rules (references: registry.remove, registry.broadcast, r.conns map, and
client.c.send) so iteration and channel close cannot race.
In `@go-backend/internal/orchestrator/server/grpc.go`:
- Around line 150-163: The switch on status currently omits the "pending" case
so rows with status "pending" map to PROJECT_STATUS_UNSPECIFIED; update the
switch (the branch that sets p.Status based on the local variable status and
assigns to p.Status) to add a case for "pending" that sets p.Status =
pb.ProjectStatus_PROJECT_STATUS_PENDING so DB rows created with status 'pending'
(per migrations/001_init.up.sql) are reported correctly.
- Around line 134-145: The GetProject RPC currently queries projects by id only
and wraps all DB errors as "project not found"; update the SQL in
OrchestratorServer.GetProject to include tenant isolation (e.g., add "AND
tenant_id = $2" and optionally "AND user_id = $3") and validate the
authenticated caller from gRPC metadata or the req.GetTenantId() before
querying, using req.TenantId (or metadata) as the additional parameter(s).
Replace the local variable named status with a non-conflicting name like
statusStr to avoid shadowing the grpc status package, and change error handling
to distinguish no-rows vs other DB errors: use errors.Is(err, pgx.ErrNoRows) to
return status.Error(codes.NotFound, "...") for missing projects and return
status.Error(codes.Internal, fmt.Sprintf("db error: %v", err)) for other errors.
Add imports for errors, github.com/jackc/pgx/v5, google.golang.org/grpc/codes
and google.golang.org/grpc/status as needed.
In `@go-backend/internal/orchestrator/server/results.go`:
- Around line 54-63: Update the deferred panic handler in ResultHandler: fix the
log message text to "panic recovered in ResultHandler" (replace the typo
"Resulthandler") and change the log field from zap.Binary("payload", msg.Value)
to zap.ByteString("payload", msg.Value) so the JSON payload is rendered as text;
keep the existing zap.Any("panic", r) and the err = fmt.Errorf(...) behavior
intact.
- Around line 137-147: The code ignores the error from
h.db.QueryRow(...).Scan(&incompleteCount) and has a TOCTOU race where concurrent
Engineer_Backend/Engineer_Frontend handlers can both call dispatchNext and
create duplicate QA tasks; fix by checking and handling the Scan error (capture
the error, log via log.Error or zap and return the error to avoid treating
failure as zero) and make the QA dispatch atomic: either perform the
count+insert inside a DB transaction with appropriate locking (e.g., SELECT ...
FOR UPDATE on a project row or tasks aggregation) or add a partial unique index
on tasks(project_id) WHERE agent_role='QA' and update dispatchNext to catch and
ignore unique-violation errors (refer to dispatchNext, h.db.QueryRow, Scan,
incompleteCount) so duplicate QA tasks cannot be created.
In `@tools/file_edit_tool.py`:
- Around line 40-47: The computed variable snippet is never used, so update the
failure return in the code path where matches == 0 (the block that returns
ToolResult) to include the truncated file preview: append or embed snippet into
the ToolResult.error string (e.g., "String to replace not found in {file_path}.
Preview: {snippet}") so the agent gets the helpful context; locate the code that
constructs ToolResult in tools/file_edit_tool.py (the block referencing matches,
snippet, content, and ToolResult) and modify the error field to include snippet
(or remove the unused snippet line if you prefer dropping the preview).
- Around line 13-18: LocalFileEditTool is being instantiated without a
project-scoped working directory so it falls back to os.getcwd(); locate where
LocalFileEditTool() is created in the agent initialization (the
BaseAgent/constructor code that builds tools) and fix by either passing the
agent's project-scoped working_dir into the LocalFileEditTool constructor (same
approach used for DockerSandboxTool) or modify LocalFileEditTool.run() to derive
working_dir from the agent execution context/assigned project path at runtime so
edits are applied under the project-scoped directory rather than the server CWD.
- Around line 20-31: The run method in tools/file_edit_tool.py must validate
that the resolved file path stays inside self.working_dir to prevent path
traversal: after computing full_path (from file_path and self.working_dir)
resolve both to their absolute realpaths (to follow symlinks) and ensure the
file realpath startswith the working_dir realpath (or use os.path.commonpath)
and return ToolResult(success=False, error=...) if it does not; apply this check
in LocalFileEditTool.run before any os.path.exists/os operations and reject
absolute paths or paths with .. that escape the working_dir.
---
Outside diff comments:
In `@agents/agent_service.py`:
- Around line 328-399: The _process_task function is over the PLR0915 statement
limit; extract the LLM client resolution and agent preparation block into a
helper named _prepare_agent_for_task(task_msg) that returns (agent_instance,
provider, model_name, llm_client). Move logic that reads llm_config from
task_msg.input_data, calls _build_llm_client(llm_config, self.role), patches
self.agent with llm_client/model_name/provider, and instantiates/configures
agent_instance via _load_agent into this helper, then call it from _process_task
and use the returned values in place of the inlined code.
- Around line 400-420: call_llm currently returns a tuple in the mock path
causing a type mismatch and mock/cache paths never set/reset cost/usage; update
so _mock_llm_response and call_llm agree on returning only str (or destructure
the tuple inside call_llm to extract text) and ensure that both the mock branch
in call_llm and the cache-hit branch set agent_instance._last_cost (0.0) and
agent_instance._last_usage (e.g., {"prompt_tokens":100,"completion_tokens":50}
for the mock response and zeros for cache hits) before returning; additionally,
in execute_task, reset agent_instance._last_cost = 0.0 and
agent_instance._last_usage = {} at the start of each task so stale values cannot
carry over across tasks, and when multiple LLM calls occur within one task
update _last_cost/_last_usage on every call so the final values reflect the last
call.
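The agreed contract (mock path returns a plain string, sets usage, and `execute_task` resets per task) can be sketched as follows; the class and method bodies are illustrative, not the real `BaseAgent`:

```python
def _mock_llm_response(messages) -> tuple[str, dict]:
    # Mirrors the review's hardcoded mock usage numbers.
    return "mock response", {"prompt_tokens": 100, "completion_tokens": 50}


class AgentSketch:
    def __init__(self):
        self._last_cost = 0.0
        self._last_usage: dict = {}
        self.llm_client = None  # no client configured -> mock mode

    def call_llm(self, messages) -> str:
        if self.llm_client is None:
            # Destructure the tuple so callers always receive a string.
            text, usage = _mock_llm_response(messages)
            self._last_usage = usage
            self._last_cost = 0.0  # mock calls cost nothing
            return text
        raise NotImplementedError  # real provider path elided

    def execute_task(self, task: str) -> str:
        # Reset per task so stale cost/usage never leaks across tasks.
        self._last_cost = 0.0
        self._last_usage = {}
        return self.call_llm([{"role": "user", "content": task}])
```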
In `@agents/base_agent.py`:
- Around line 349-350: call_llm currently declares -> str but forwards the new
_mock_llm_response which returns (str, dict) causing callers (e.g.,
self_critique and agents/agent_service.py) to receive a tuple and leaving
_last_usage/_last_cost unset; modify call_llm (and its type hint) to unpack the
tuple when llm_client is None: call _mock_llm_response(messages), assign the
returned (raw, usage) to local variables, set self._last_usage and
self._last_cost from usage, then return raw (a str) so callers keep receiving a
string; ensure the function signature/type hint reflects the returned str and
that _mock_llm_response is only used in the mock branch.
In `@README.md`:
- Around line 54-62: The Tech Stack README contains a stale "Command /
Description" make table listing `make start`, `make stop`, `make clean`, `make
logs` that conflicts with the new Desktop quick-start (which instructs `python3
desktop_nova.py`) and the updated modular Make targets; either remove that table
or move and refresh it under a new "Makefile Targets" section and replace
entries with the new targets (`make infra`, `make bus`, `make core`, `make
agents`, `make obs`) and their brief descriptions; ensure the Tech Stack keeps
the desktop quick-start (`python3 desktop_nova.py`) and update any
cross-references so the README reflects the new modular workflow.
- Around line 25-45: The "What It Does" numbered list was truncated after "1.
**CEO Agent**" before the horizontal rule (`---`) and "Operational Modes"
section; restore the missing agent steps (e.g., "2. **CTO Agent**", "3.
**Engineers**", "4. **QA**", "5. **DevOps**") above the `---`, preserving the
numbered list format under the "## What It Does" heading, or if the intent is to
remove the per-agent sequence, replace the single "1. **CEO Agent**" list item
with a short explanatory paragraph summarizing the replaced workflow; update the
content around the `---` and the "## 🛠️ Operational Modes" heading so the
document reads coherently.
- Line 3: Update the Go version badge in README.md from 1.24.0 to 1.25.0 so it
matches the go version declared in go-backend/go.mod; locate the badge line
containing the Go shield (the markdown with
https://img.shields.io/badge/go-1.24.0-00ADD8...) and change the version token
to 1.25.0.
In `@tools/docker_sandbox.py`:
- Around line 63-83: The debug log and docker command currently expose plaintext
env values; fix by writing env to a short-lived tempfile with 0o600 permissions
and pass it to docker via "--env-file", remove any individual "-e" entries from
docker_cmd; ensure the tempfile is securely deleted after use (use
tempfile.NamedTemporaryFile with delete=False or similar and os.remove in
finally). Also stop logging raw env values: replace logger.debug(..., raw_cmd="
".join(docker_cmd)) with a redacted command string (e.g., build redacted_cmd
that masks values for keys matching patterns like
"KEY|SECRET|TOKEN|AWS|PASS|PASSWORD") or omit env values entirely from logs;
update references to docker_cmd construction and logger.debug in this file
accordingly.
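The env-file pattern described above might look like this. `build_docker_cmd_with_env_file` and `redact_cmd` are hypothetical helper names, and the sensitive-key patterns are the ones listed in the comment; the caller is still responsible for removing the tempfile in a `finally` block after the container exits:

```python
import os
import tempfile

SENSITIVE = ("KEY", "SECRET", "TOKEN", "AWS", "PASS", "PASSWORD")

def build_docker_cmd_with_env_file(image: str, env: dict[str, str]) -> tuple[list[str], str]:
    """Write env to a 0o600 tempfile and reference it via --env-file.

    Returns (docker_cmd, env_file_path); no plaintext values appear in the command.
    """
    fd, path = tempfile.mkstemp(prefix="sandbox-env-", suffix=".env")
    os.fchmod(fd, 0o600)  # readable only by the current user
    with os.fdopen(fd, "w") as f:
        for key, value in env.items():
            f.write(f"{key}={value}\n")
    cmd = ["docker", "run", "--rm", "--env-file", path, image]
    return cmd, path

def redact_cmd(cmd: list[str], env: dict[str, str]) -> str:
    """Loggable command string with values of sensitive-looking keys masked."""
    masked = {v for k, v in env.items() if any(p in k.upper() for p in SENSITIVE)}
    return " ".join("***" if part in masked else part for part in cmd)
```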
---
Duplicate comments:
In `@agents/base_agent.py`:
- Around line 370-396: The current budget gate using budget_key (built from
self._current_project_id) reads used via redis_client.get, compares to
self.budget_limit, then later calls redis_client.incrbyfloat — this GET →
compare → INCRBYFLOAT is race-prone under concurrent calls; replace that pattern
in the budget check and update around _execute_provider/_calculate_actual_cost
with a single atomic Redis operation (either an EVAL Lua script that checks
current value + increments by cost_usd and returns success/failure, or a
WATCH/MULTI/EXEC transaction) so the check+increment is atomic; update the code
paths referencing budget_key, redis_client, budget_limit, and preserve the
existing logging/RuntimeError behavior when the increment would exceed the
limit.
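The atomic EVAL variant could be sketched like this; `try_consume_budget` is a hypothetical helper and the key/limit names follow the comment rather than the actual code. The Lua script performs the check and the increment in one server-side step, so concurrent callers cannot interleave between them:

```python
# Lua script: check the running total and increment it atomically.
BUDGET_LUA = """
local used = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
if used + cost > limit then
    return -1
end
return redis.call('INCRBYFLOAT', KEYS[1], cost)
"""

def try_consume_budget(redis_client, budget_key: str, cost_usd: float, budget_limit: float) -> bool:
    """Atomically reserve cost_usd against budget_key; False if it would exceed the limit."""
    # redis-py signature: eval(script, numkeys, *keys_and_args)
    result = redis_client.eval(BUDGET_LUA, 1, budget_key, str(cost_usd), str(budget_limit))
    return result != -1
```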
In `@tools/git_tool.py`:
- Around line 89-103: The current code in git_tool.py builds authed_url and
calls _run_subprocess to set-url (which writes the token into .git/config) and
later restores it; instead avoid persisting the token by removing the git remote
set-url calls around _run_subprocess and instead perform an authenticated push
that never writes credentials to config — e.g. use an in-memory credential
helper via GIT_ASKPASS (or GitHub CLI/credential helper) when calling
_run_subprocess for the push so the token is supplied only via environment and
not embedded in the remote URL; update the logic around the variables
authed_url, remote and branch and the _run_subprocess calls to create and use a
transient askpass helper (or invoke gh auth push) and ensure no code path writes
authed_url into the repo config.
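A transient GIT_ASKPASS helper along these lines would keep the token out of `.git/config`; `push_with_askpass` and the `GIT_PUSH_TOKEN` variable are illustrative names, a sketch rather than the tool's actual implementation:

```python
import os
import subprocess
import tempfile

def push_with_askpass(repo_dir: str, remote: str, branch: str, token: str) -> None:
    """Push using a transient GIT_ASKPASS helper; the token lives only in the environment."""
    fd, helper = tempfile.mkstemp(prefix="askpass-", suffix=".sh")
    try:
        with os.fdopen(fd, "w") as f:
            # git invokes this script whenever it needs a password.
            f.write('#!/bin/sh\necho "$GIT_PUSH_TOKEN"\n')
        os.chmod(helper, 0o700)
        env = dict(
            os.environ,
            GIT_ASKPASS=helper,
            GIT_PUSH_TOKEN=token,        # supplied via environment only
            GIT_TERMINAL_PROMPT="0",     # never fall back to an interactive prompt
        )
        subprocess.run(["git", "-C", repo_dir, "push", remote, branch],
                       env=env, check=True)
    finally:
        os.remove(helper)  # the helper never outlives the push
```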
---
Nitpick comments:
In `@architecture.md`:
- Around line 56-64: The two adjacent headings "🧱 Expert Coordination Mesh" and
"🧱 Core Components" duplicate the same emoji and overlap in content about the
Go Orchestrator and agent responsibilities; either merge the "Expert
Coordination Mesh" content into the "Core Components" section (consolidate
mentions of Go Orchestrator, Strategic Experts, Surgical Execution,
LocalFileEditTool, and AI_ORG_LOCAL_MODE) or rename one heading and pick a
distinct icon (e.g., "🧭 Expert Coordination Mesh" or "⚙️ Core Components") and
remove the redundant sentences so the Go Orchestrator, LocalFileEditTool, and
AI_ORG_LOCAL_MODE descriptions appear only once and flow logically.
In `@tools/docker_sandbox.py`:
- Around line 24-25: The current init sets self.host_workspace to
os.getenv("AI_ORG_LOCAL_WORKSPACE", os.getcwd()) which freezes in the original
CWD and ignores later changes to self.working_dir from BaseTool; change the
logic in DockerSandbox.__init__ to read AI_ORG_LOCAL_WORKSPACE with a None
default (os.getenv("AI_ORG_LOCAL_WORKSPACE", None)) and if that returns None set
self.host_workspace = self.working_dir (or delay evaluation by making
host_workspace a property that returns os.getenv(...) or self.working_dir), so
updates to self.working_dir after construction are respected; reference
self.host_workspace, self.working_dir and BaseTool when locating where to apply
the fix.
In `@tools/file_edit_tool.py`:
- Around line 62-74: The write path currently creates a tmp_path and writes
new_content but leaves the .tmp file behind if f.write or subsequent operations
raise; update the editing routine around tmp_path (the block that opens
tmp_path, calls f.write(new_content), and then os.replace(tmp_path, full_path))
to ensure the .tmp file is removed on any failure: perform the write inside a
try/except/finally (or nested try) so that on exception you call
os.remove(tmp_path) if it exists, only call os.replace if the write completed
successfully, and preserve the existing ToolResult return behavior while logging
the error; reference tmp_path, f.write(new_content), os.replace and the current
exception handling to locate where to add the cleanup.
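The cleanup pattern reads roughly like this, a minimal sketch that keeps only the tmp-file handling and omits the tool's `ToolResult` plumbing and logging:

```python
import os

def atomic_write(full_path: str, new_content: str) -> None:
    """Write via a sibling .tmp file and os.replace, removing the .tmp on any failure."""
    tmp_path = full_path + ".tmp"
    try:
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(new_content)
        # Only promote the tmp file after the write completed successfully;
        # os.replace is atomic on POSIX.
        os.replace(tmp_path, full_path)
    except Exception:
        # Never leave a stray .tmp behind on a failed write.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```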
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b2051a8a-420e-4e69-a41c-7f93ba32b620
⛔ Files ignored due to path filters (1)
go-backend/coverage.out is excluded by !**/*.out
📒 Files selected for processing (23)
- .gitignore
- README.md
- agents/agent_service.py
- agents/backend_agent.py
- agents/base_agent.py
- agents/devops_agent.py
- agents/frontend_agent.py
- agents/qa_agent.py
- architecture.md
- desktop_nova.py
- go-backend/cmd/gateway/main.go
- go-backend/cmd/orchestrator/main.go
- go-backend/cmd/ws-hub/main.go
- go-backend/internal/gateway/middleware/middleware_production_test.go
- go-backend/internal/orchestrator/server/grpc.go
- go-backend/internal/orchestrator/server/results.go
- go-backend/internal/proxy/egress.go
- go-backend/internal/shared/auth/auth.go
- go-backend/test_output.txt
- orchestrator/planner.py
- tools/docker_sandbox.py
- tools/file_edit_tool.py
- tools/git_tool.py
✅ Files skipped from review due to trivial changes (5)
- go-backend/internal/proxy/egress.go
- agents/frontend_agent.py
- agents/backend_agent.py
- go-backend/test_output.txt
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (5)
- go-backend/internal/shared/auth/auth.go
- go-backend/cmd/gateway/main.go
- go-backend/internal/gateway/middleware/middleware_production_test.go
- go-backend/cmd/orchestrator/main.go
- agents/devops_agent.py
```python
base_instructions = f"YOU ARE {self.ROLE} IN THE PROXIMUS COLLABORATIVE AI ORGANIZATION.\n"

if local_mode:
    base_instructions += (
        "\n### LOCAL MASTERY MODE ACTIVE\n"
        "- You are working directly on the developer's LOCAL FILESYSTEM.\n"
        "- Your workspace is MOUNTED via Docker directly to the host's directory.\n"
        "- PRIORITIZE surgery over replacement: Use 'file_edit' for partial updates rather than overwriting whole files.\n"
        "- PRESERVE STYLE: Maintain the user's indentation and coding style exactly.\n"
    )
```
CI lint failures: blank lines contain trailing whitespace (ruff W293).
CI check is failing on W293 at lines 154, 156, 165 (and also 391, 465, 490, 520, 560, 610 elsewhere in this file). Strip the whitespace on those blank lines to get the build green.
🧰 Tools
🪛 GitHub Check: 🧪 Test & Lint
[failure] 165-165: ruff (W293)
agents/base_agent.py:165:1: W293 Blank line contains whitespace
help: Remove whitespace from blank line
[failure] 156-156: ruff (W293)
agents/base_agent.py:156:1: W293 Blank line contains whitespace
help: Remove whitespace from blank line
[failure] 154-154: ruff (W293)
agents/base_agent.py:154:1: W293 Blank line contains whitespace
help: Remove whitespace from blank line
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/base_agent.py` around lines 154 - 165, The failing lint warnings are
caused by trailing whitespace on blank lines in the construction of the
base_instructions string (see base_instructions, local_mode, and self.ROLE);
remove any trailing spaces at the ends of the blank lines inside the f-string
and the multi-line concatenation so blank lines are truly empty (no trailing
spaces) and re-run ruff to verify W293 is resolved; repeat this cleanup for
other similar occurrences in the file (around where base_instructions is built
and the other reported locations).
```python
def _calculate_actual_cost(self, model: str, usage: dict[str, int]) -> float:
    """Calculate USD cost based on token counts and model pricing."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)

    # Pricing per 1M tokens (Standard market rates as of Q1 2026)
    pricing = {
        # Google Gemini
        "gemini-1.5-pro": {"prompt": 1.25, "completion": 3.75},
        "gemini-1.5-flash": {"prompt": 0.075, "completion": 0.30},
        # OpenAI
        "gpt-4o": {"prompt": 2.50, "completion": 10.00},
        "gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
        # Anthropic
        "claude-3-5-sonnet-latest": {"prompt": 3.00, "completion": 15.00},
        "claude-3-haiku-20240307": {"prompt": 0.25, "completion": 1.25},
        # Bedrock (Nova)
        "amazon.nova-pro-v1:0": {"prompt": 0.80, "completion": 3.20},
        "amazon.nova-lite-v1:0": {"prompt": 0.06, "completion": 0.24},
        "amazon.nova-micro-v1:0": {"prompt": 0.035, "completion": 0.14},
    }

    # Fallback pricing for unknown models (Average mid-range)
    model_pricing = pricing.get(model, {"prompt": 1.0, "completion": 4.0})

    cost = (prompt_tokens / 1_000_000 * model_pricing["prompt"]) + \
           (completion_tokens / 1_000_000 * model_pricing["completion"])

    return float(f"{cost:.6f}")
```
🧩 Analysis chain
🌐 Web query:
What model ID strings does the Google google-genai Python SDK (v0.5.0) return in GenerateContentResponse, and what model IDs do current OpenAI and Anthropic SDKs echo back in response.model?
💡 Result:
The Google google-genai Python SDK (current versions; v0.5.0 appears to be from the deprecated google-gemini/deprecated-generative-ai-python repository, which has no v0.5.0 tag, and the latest is v0.8.1) returns GenerateContentResponse objects from generate_content. These responses do not contain a direct model ID string field like response.model. Instead:
- The underlying REST API (Gemini API and Vertex AI) returns modelVersion (a version string like "0001-06-24") in GenerateContentResponse, not the model ID (1,5,7,9).
- SDK documentation and code examples show access to response.text, response.parts, response.candidates, response.usage_metadata, but no response.model or similar model ID attribute (1,2,6,7,18,61,62).
- For Vertex AI, the response has modelVersion as a string version identifier (5).

Current OpenAI Python SDK (chat.completions.create) echoes back the input model ID string in response.model (e.g., "gpt-4o", "gpt-4o-mini") (22,24). Current Anthropic Python SDK (client.messages.create) echoes back the input model ID string in response.model (e.g., "claude-3-5-sonnet-20241022"), as confirmed by standard SDK behavior and API documentation patterns.
Citations:
- 1: https://github.com/googleapis/python-genai/
- 2: https://googleapis.github.io/python-genai/
- 3: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1/GenerateContentResponse
- 4: https://googleapis-python-genai-70.mintlify.app/guides/generate-content
- 5: https://googleapis-python-genai-70.mintlify.app/api/models/generate-content
- 6: https://ai.google.dev/api/generate-content
- 7: https://github.com/googleapis/python-genai/blob/main/google/genai/models.py
- 8: https://developers.openai.com/api/reference/python/resources/chat/
- 9: https://developers.openai.com/api/reference/python/resources/chat/subresources/completions/methods/create/
- 10: https://github.com/googleapis/python-genai/blob/main/google/genai/_interactions/_models.py
- 11: https://googleapis-python-genai-70.mintlify.app/concepts/types
🏁 Script executed:
# Find calls to _calculate_actual_cost to understand how model parameter is passed
rg "_calculate_actual_cost" -A 3 -B 3Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 989
🏁 Script executed:
# Also search for where model IDs are extracted from provider responses
rg "response\.model|model_id|model_name" -B 2 -A 2 | head -100Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 6909
🏁 Script executed:
# Check for any SDK imports and model response handling
rg "GenerateContentResponse|chat\.completions|messages\.create" -B 2 -A 5 | head -80Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1698
🏁 Script executed:
# Check the actual code around the cost formatting
sed -n '600,631p' agents/base_agent.py
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1606
🏁 Script executed:
# Search for any model name normalization/transformation before the function call
rg "model_name\s*=" agents/base_agent.py -B 2 -A 2Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 324
🏁 Script executed:
# Verify round() vs float(f"...") behavior with Python
python3 << 'EOF'
cost = 0.000001234567
# Method 1: float(f"{cost:.6f}")
result1 = float(f"{cost:.6f}")
print(f"float(f\"{{cost:.6f}}\"): {result1}")
print(f"repr: {repr(result1)}")
# Method 2: round(cost, 6)
result2 = round(cost, 6)
print(f"round(cost, 6): {result2}")
print(f"repr: {repr(result2)}")
# Test with a larger number
cost2 = 0.123456789
result3 = float(f"{cost2:.6f}")
result4 = round(cost2, 6)
print(f"\nFor {cost2}:")
print(f"float(f\"{{cost:.6f}}\"): {result3}")
print(f"round(cost, 6): {result4}")
print(f"Are they equal? {result3 == result4}")
EOF
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 264
Add prefix-matching and logging for unknown model IDs in cost calculation.
The pricing dict uses unqualified model names (gpt-4o, claude-3-5-sonnet-latest), but users may configure versioned IDs or newer model families not in the table. Exact-key lookup will silently fall through to the generic fallback ({"prompt": 1.0, "completion": 4.0}), causing incorrect cost tracking and false budget alerts.
Replace the lookup with prefix-matching:
model_pricing = next((v for k, v in pricing.items() if model.startswith(k)), {"prompt": 1.0, "completion": 4.0})
Add a warning log when fallback is used so cost drift becomes observable.
Also prefer round(cost, 6) over float(f"{cost:.6f}") for clarity.
🧰 Tools
🪛 GitHub Check: 🧪 Test & Lint
[failure] 610-610: ruff (W293)
agents/base_agent.py:610:1: W293 Blank line contains whitespace
help: Remove whitespace from blank line
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/base_agent.py` around lines 600 - 631, In _calculate_actual_cost
replace the exact-key lookup of pricing with prefix-matching against the pricing
keys (e.g. use next(... if model.startswith(k) ...)) so versioned or suffixed
model IDs match their family rates, and if no match is found log a warning (use
the module/class logger) that the fallback pricing
{"prompt":1.0,"completion":4.0} was used for the given model; compute the final
cost and return round(cost, 6) instead of float(f"{cost:.6f}").
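A prefix-matching lookup per the suggestion might look like this. One caveat the one-liner above glosses over: prefixes must be tried longest-first, or a versioned `gpt-4o-mini` ID would match the `gpt-4o` family first and bill at the wrong rate. `lookup_pricing` is a hypothetical helper name:

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK = {"prompt": 1.0, "completion": 4.0}

def lookup_pricing(pricing: dict[str, dict[str, float]], model: str) -> dict[str, float]:
    """Prefix-match versioned model IDs to their family rates, warning on fallback."""
    # Longest prefix first, so "gpt-4o-mini-2024-..." is not billed at "gpt-4o" rates.
    for family, rates in sorted(pricing.items(), key=lambda kv: -len(kv[0])):
        if model.startswith(family):
            return rates
    logger.warning("Unknown model %s; using fallback pricing %s", model, FALLBACK)
    return FALLBACK
```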
```python
    is_passed = result.success and result.output.find("failed") == -1

    # Heuristic parsing of pytest output if structured report is missing
    # In production, we'd use 'pytest-json-report'
    output = result.output
    passed_count = output.count("PASSED")
    failed_count = output.count("FAILED")

    return {
        "total": passed_count + failed_count,
        "passed": passed_count,
        "failed": failed_count if failed_count > 0 or not result.success else 0,
        "errors": 1 if not result.success else 0,
        "duration_seconds": 0,  # Could extract from output
        "failures": [
            {"test": "sandbox_execution", "error": result.error}
        ] if not result.success else [],
    }
except Exception as e:
    logger.error("QA Sandbox execution failed", error=str(e))
    return {
        "total": 0, "passed": 0, "failed": 1,
        "error": f"QA System Error: {str(e)}"
    }
```
Sandbox-execution failure can still report qa_passed=True.
When `result.success` is False but the output contains no "FAILED" substring (e.g., pytest import error, collection failure, timeout, non-zero exit before any tests ran), the expression
`failed_count if failed_count > 0 or not result.success else 0` evaluates to `failed_count`, which is 0. So `failed=0`, `errors=1`, and then at line 106 `qa_passed = test_results["failed"] == 0 and security_results["high_severity"] == 0` evaluates to True — a broken build gets shipped as QA-approved. Similarly, `passed_count` from `output.count("PASSED")` can be non-zero in partial-failure output, inflating passed/total.
🩹 Suggested fix

```diff
-        return {
-            "total": passed_count + failed_count,
-            "passed": passed_count,
-            "failed": failed_count if failed_count > 0 or not result.success else 0,
-            "errors": 1 if not result.success else 0,
+        effective_failed = failed_count if result.success else max(failed_count, 1)
+        return {
+            "total": passed_count + effective_failed,
+            "passed": passed_count if result.success else 0,
+            "failed": effective_failed,
+            "errors": 0 if result.success else 1,
             "duration_seconds": 0,
-            "failures": [
-                {"test": "sandbox_execution", "error": result.error}
-            ] if not result.success else [],
+            "failures": [] if result.success else [
+                {"test": "sandbox_execution", "error": result.error}
+            ],
         }
```

Also consider adding --json-report (pytest-json-report) to the command and parsing structured results instead of substring-counting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/qa_agent.py` around lines 437 - 460, The heuristic parsing of pytest
output using result, output, passed_count and failed_count can misreport
qa_passed when result.success is False but no "FAILED" substrings are found;
update the return logic so that when result.success is False you force
passed_count = 0, failed_count = max(1, failed_count), total = passed_count +
failed_count, errors = 1, and include the failure detail in failures (e.g., keep
{"test": "sandbox_execution", "error": result.error}); keep the substring counts
only when result.success is True. Also consider switching to pytest-json-report
(add --json-report and parse structured output) for a more robust fix referenced
in the code that computes qa_passed from test_results and security_results.
The diff replaces the earlier fallback-to-mock implementation (which returned a simulated baseline scan whenever the sandbox run failed or its output could not be parsed) with the version below:

```python
async def _run_security_scan(
    self, context: Any, sandbox_tool: Any
) -> dict[str, Any]:
    """Execute Bandit security scan via Docker Sandbox."""
    if not sandbox_tool or not context:
        return {"high_severity": 0, "medium_severity": 0, "low_severity": 0, "issues": []}

    work_dir = f"{context.artifacts.output_dir}/{context.project_id}"
    logger.info("Executing bandit safely in sandbox", work_dir=work_dir)
    try:
        result = await sandbox_tool.run(
            action="execute",
            cmd="bandit -r . -f json -q || true",
            work_dir=work_dir
        )
        import json
        try:
            bandit_data = json.loads(result.output if result.success else "{}")
            metrics = bandit_data.get("metrics", {}).get("_totals", {})
            return {
                "tool": "bandit",
                "files_scanned": metrics.get("loc", 0),
                "high_severity": metrics.get("SEVERITY.HIGH", 0),
                "medium_severity": metrics.get("SEVERITY.MEDIUM", 0),
                "low_severity": metrics.get("SEVERITY.LOW", 0),
                "issues": bandit_data.get("results", []),
                "raw_output": result.output[:1000] if not result.success else ""
            }
        except json.JSONDecodeError:
            return {"high_severity": 0, "medium_severity": 0, "issues": [], "error": "Invalid bandit output"}
    except Exception as e:
        logger.warning("Sandbox security scan failed", error=str(e))
        return {"high_severity": 0, "medium_severity": 0, "issues": [], "error": str(e)}
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check DockerSandboxTool's default image and whether bandit is ever installed in it.
fd -t f 'docker_sandbox.py'
rg -nP -C5 'class\s+DockerSandboxTool\b|def\s+run\b|default.*image|image\s*=' -g '**/docker_sandbox.py'
rg -nP -C2 '\bbandit\b' -g '!**/tests/**'
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 6478
🏁 Script executed:
sed -n '420,450p' agents/qa_agent.py
Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1472
_run_security_scan will fail silently because bandit is not installed in the default container image.
Both _run_test_suite and _run_security_scan use the default image="python:3.11-slim", but that image does not include bandit. When _run_security_scan executes bandit -r . -f json -q || true, the command fails (bandit not found), result.output is empty, json.loads("{}") returns an empty object, and every project incorrectly reports zero security issues.
Since bandit is listed in requirements.txt as a host dependency, not a container dependency, install it within the container before running the scan:
Fix

```diff
 result = await sandbox_tool.run(
     action="execute",
-    cmd="bandit -r . -f json -q || true",
+    cmd="pip install --quiet bandit && bandit -r . -f json -q || true",
     work_dir=work_dir
 )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
async def _run_security_scan(
    self, context: Any, sandbox_tool: Any
) -> dict[str, Any]:
    """Execute Bandit security scan via Docker Sandbox."""
    if not sandbox_tool or not context:
        return {"high_severity": 0, "medium_severity": 0, "low_severity": 0, "issues": []}
    work_dir = f"{context.artifacts.output_dir}/{context.project_id}"
    logger.info("Executing bandit safely in sandbox", work_dir=work_dir)
    try:
        result = await sandbox_tool.run(
            action="execute",
            cmd="pip install --quiet bandit && bandit -r . -f json -q || true",
            work_dir=work_dir
        )
        import json
        try:
            bandit_data = json.loads(result.output if result.success else "{}")
            metrics = bandit_data.get("metrics", {}).get("_totals", {})
            return {
                "tool": "bandit",
                "files_scanned": metrics.get("loc", 0),
                "high_severity": metrics.get("SEVERITY.HIGH", 0),
                "medium_severity": metrics.get("SEVERITY.MEDIUM", 0),
                "low_severity": metrics.get("SEVERITY.LOW", 0),
                "issues": bandit_data.get("results", []),
                "raw_output": result.output[:1000] if not result.success else ""
            }
        except json.JSONDecodeError:
            return {"high_severity": 0, "medium_severity": 0, "issues": [], "error": "Invalid bandit output"}
    except Exception as e:
        logger.warning("Sandbox security scan failed", error=str(e))
        return {"high_severity": 0, "medium_severity": 0, "issues": [], "error": str(e)}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/qa_agent.py` around lines 462 - 495, The _run_security_scan currently
runs "bandit" inside the sandbox image (via sandbox_tool.run) but the default
image ("python:3.11-slim") doesn't include bandit, so the command silently
fails; fix by ensuring bandit is installed in the container before invoking
it—either change the sandbox invocation in _run_security_scan to use an image
that already has bandit, or prepend an install step to the command passed to
sandbox_tool.run (e.g., run "pip install bandit" before "bandit -r . -f json -q
|| true"); update the cmd passed to sandbox_tool.run or the image parameter used
by sandbox_tool.run in _run_security_scan so result.output contains bandit JSON
output when present.
```python
async def _run_coverage_analysis(
    self, context: Any, sandbox_tool: Any
) -> dict[str, Any]:
    """Run pytest-cov and analyze results."""
    if not sandbox_tool or not context:
        return {"line_coverage_pct": 0, "passed": False, "error": "No sandbox"}

    work_dir = f"{context.artifacts.output_dir}/{context.project_id}"
    try:
        # Attempt to run coverage
        result = await sandbox_tool.run(
            action="execute",
            cmd="pytest --cov=backend --cov-report=term-missing",
            work_dir=work_dir
        )
        # In a real environment, we'd parse the .coverage file or JSON report
        # For now, we report success if the tool ran, but don't fake the percentage.
        return {
            "line_coverage_pct": 0.0 if not result.success else 50.0,  # Placeholder for real parsing
            "passed": result.success,
            "tools_used": ["pytest-cov"],
            "sandbox_id": context.project_id
        }
    except Exception:
        return {"line_coverage_pct": 0, "passed": False}
```
Hardcoded 50% coverage misleads downstream agents and dashboards.
line_coverage_pct: 50.0 is emitted whenever the sandbox exits 0, regardless of the real coverage — and it's then logged (line 126), persisted via decision_log.log (line 117), and could gate downstream decisions. The comment concedes it's a placeholder, but shipping fabricated coverage numbers is strictly worse than reporting None/0 with a flag indicating the report was not parsed. Suggest either (a) parsing term-missing's trailing "TOTAL … NN%" line, (b) emitting pytest --cov-report=json:coverage.json and parsing totals.percent_covered, or (c) returning line_coverage_pct=None and a clear "parsed": False until real parsing lands.
Additionally, the broad except Exception at line 520 swallows errors without logging — please at least logger.warning(...) like the sibling methods do.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/qa_agent.py` around lines 497 - 521, The _run_coverage_analysis
function currently emits a hardcoded 50.0% when sandbox_tool.run succeeds and
swallows exceptions; change it to produce truthful parsed metadata and log
errors: instead of returning line_coverage_pct: 50.0 in the success path, run
pytest with a machine-readable report (e.g., "pytest --cov=backend
--cov-report=json:coverage.json"), then open and parse the JSON's
totals.percent_covered into line_coverage_pct and set parsed: True; if parsing
fails set line_coverage_pct: None (or 0) and parsed: False (include tools_used
and sandbox_id as before). Also replace the bare except Exception: with an
except Exception as e: that calls logger.warning(...) including the exception
and context (function _run_coverage_analysis and context.project_id) and returns
{"line_coverage_pct": None, "passed": False, "parsed": False}. Use the existing
symbols _run_coverage_analysis, sandbox_tool.run, and context.project_id to
locate and modify the code.
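Option (b), parsing `totals.percent_covered` out of the JSON report, could be sketched as follows; `parse_coverage_json` is a hypothetical helper, and the input is assumed to be coverage.py's JSON report format:

```python
import json

def parse_coverage_json(raw: str) -> dict:
    """Parse a coverage.py JSON report into truthful metrics.

    Returns parsed=False with line_coverage_pct=None when the report cannot
    be read, instead of fabricating a number.
    """
    try:
        data = json.loads(raw)
        pct = data["totals"]["percent_covered"]
        return {"line_coverage_pct": round(float(pct), 2), "parsed": True}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"line_coverage_pct": None, "parsed": False}
```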
```go
// Hardening: Prevent panics from crashing the consumer group
defer func() {
    if r := recover(); r != nil {
        log.Error("panic recovered in Resulthandler",
            zap.Any("panic", r),
            zap.Binary("payload", msg.Value),
        )
        err = fmt.Errorf("panic in ResultHandler: %v", r)
    }
}()
```
Nit: typo "Resulthandler" and zap.Binary vs zap.ByteString.
- "Resulthandler" → "ResultHandler" in the panic log message (line 57).
- `msg.Value` is a JSON payload; `zap.ByteString("payload", msg.Value)` (already used at line 67) renders it as text in the encoded log, which is far more useful during incident triage than `zap.Binary`'s base64.
```diff
-	defer func() {
-		if r := recover(); r != nil {
-			log.Error("panic recovered in Resulthandler",
-				zap.Any("panic", r),
-				zap.Binary("payload", msg.Value),
-			)
-			err = fmt.Errorf("panic in ResultHandler: %v", r)
-		}
-	}()
+	defer func() {
+		if r := recover(); r != nil {
+			log.Error("panic recovered in ResultHandler",
+				zap.Any("panic", r),
+				zap.ByteString("payload", msg.Value),
+				zap.Stack("stack"),
+			)
+			err = fmt.Errorf("panic in ResultHandler: %v", r)
+		}
+	}()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Hardening: Prevent panics from crashing the consumer group
defer func() {
	if r := recover(); r != nil {
		log.Error("panic recovered in ResultHandler",
			zap.Any("panic", r),
			zap.ByteString("payload", msg.Value),
			zap.Stack("stack"),
		)
		err = fmt.Errorf("panic in ResultHandler: %v", r)
	}
}()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-backend/internal/orchestrator/server/results.go` around lines 54 - 63,
Update the deferred panic handler in ResultHandler: fix the log message text to
"panic recovered in ResultHandler" (replace the typo "Resulthandler") and change
the log field from zap.Binary("payload", msg.Value) to zap.ByteString("payload",
msg.Value) so the JSON payload is rendered as text; keep the existing
zap.Any("panic", r) and the err = fmt.Errorf(...) behavior intact.
```go
if (res.AgentRole == "Engineer_Backend" || res.AgentRole == "Engineer_Frontend") && status == "done" {
	// Efficiency: Check if all implementation tasks are complete before moving to QA
	var incompleteCount int
	h.db.QueryRow(ctx, "SELECT count(*) FROM tasks WHERE project_id = $1 AND name LIKE '%Implementation%' AND status != 'done'", res.ProjectID).Scan(&incompleteCount)

	if incompleteCount == 0 {
		log.Info("Implementation phase complete, moving to QA", zap.String("project_id", res.ProjectID))
		return h.dispatchNext(ctx, res, "QA", "System Integration Testing", "test")
	}
	return nil
}
```
🧩 Analysis chain
🏁 Script executed:
```sh
#!/bin/bash
# Confirm there's no existing uniqueness constraint preventing duplicate QA tasks per project.
rg -nP -C2 'CREATE\s+(UNIQUE\s+)?INDEX|UNIQUE\s*\(' go-backend/migrations
rg -nP -C2 'agent_role' go-backend/migrations
```

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 7593
🏁 Script executed:
```sh
# Read the specific lines in question
cat -n go-backend/internal/orchestrator/server/results.go | sed -n '135,150p'
```

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 966
🏁 Script executed:
```sh
# Check for any error handling patterns in the same file
rg -n "QueryRow.*Scan" go-backend/internal/orchestrator/server/results.go -A2
```

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 303
🏁 Script executed:
```sh
# Verify the dispatchNext function and QA task creation
rg -n "dispatchNext|QA" go-backend/internal/orchestrator/server/results.go -B1 -A1
```

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 1530
🏁 Script executed:
```sh
# Check if there are BEGIN/transaction patterns elsewhere in the codebase for reference
rg -n "db.Begin" go-backend --type go -A3
```

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 78
Critical: unchecked Scan error + race condition can dispatch QA twice.
Two confirmed bugs:
- Unchecked `Scan` error (line 140). If the query fails, `incompleteCount` stays `0` and QA is immediately dispatched before either engineer completes. The `Scan(...)` error is silently ignored.
- TOCTOU race condition (lines 131–147). Backend and Frontend engineers run in parallel. If both complete within the same window where both see `count == 0`, both handlers call `dispatchNext(...)` and insert duplicate QA tasks. No database-level guard prevents this.
🛡️ Suggested fix
```diff
-	if (res.AgentRole == "Engineer_Backend" || res.AgentRole == "Engineer_Frontend") && status == "done" {
-		// Efficiency: Check if all implementation tasks are complete before moving to QA
-		var incompleteCount int
-		h.db.QueryRow(ctx, "SELECT count(*) FROM tasks WHERE project_id = $1 AND name LIKE '%Implementation%' AND status != 'done'", res.ProjectID).Scan(&incompleteCount)
-
-		if incompleteCount == 0 {
-			log.Info("Implementation phase complete, moving to QA", zap.String("project_id", res.ProjectID))
-			return h.dispatchNext(ctx, res, "QA", "System Integration Testing", "test")
-		}
-		return nil
-	}
+	if (res.AgentRole == "Engineer_Backend" || res.AgentRole == "Engineer_Frontend") && status == "done" {
+		// Atomically check-and-claim the QA dispatch slot to prevent a race
+		// between concurrent engineer completions dispatching QA twice.
+		tx, err := h.db.Begin(ctx)
+		if err != nil {
+			return err
+		}
+		defer tx.Rollback(ctx) //nolint:errcheck
+
+		var incompleteCount int
+		if err := tx.QueryRow(ctx,
+			`SELECT count(*) FROM (
+				SELECT 1 FROM tasks
+				WHERE project_id = $1 AND name LIKE '%Implementation%' AND status != 'done'
+				FOR UPDATE
+			) AS pending`, res.ProjectID).Scan(&incompleteCount); err != nil {
+			log.Error("failed to count implementation tasks", zap.Error(err))
+			return err
+		}
+		if incompleteCount != 0 {
+			return tx.Commit(ctx)
+		}
+		// Guard: only dispatch QA if it hasn't already been inserted for this project.
+		var qaExists bool
+		if err := tx.QueryRow(ctx,
+			`SELECT EXISTS(SELECT 1 FROM tasks WHERE project_id = $1 AND agent_role = 'QA')`,
+			res.ProjectID).Scan(&qaExists); err != nil {
+			return err
+		}
+		if err := tx.Commit(ctx); err != nil {
+			return err
+		}
+		if qaExists {
+			return nil
+		}
+		log.Info("Implementation phase complete, moving to QA", zap.String("project_id", res.ProjectID))
+		return h.dispatchNext(ctx, res, "QA", "System Integration Testing", "test")
+	}
```

Note: Postgres does not allow `FOR UPDATE` directly alongside an aggregate such as `count(*)`, hence the subquery form above.

A simpler alternative: add a partial unique index on `tasks(project_id)` where `agent_role='QA'` and catch the unique-violation error on dispatch.
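The partial-unique-index alternative is a one-line migration (the index name here is illustrative); with it in place, the losing dispatcher's INSERT fails with SQLSTATE 23505 (`unique_violation`), which the handler can treat as "QA already dispatched":

```sql
-- At most one QA task per project; concurrent dispatchers now race on the
-- INSERT instead of the SELECT, and the loser receives SQLSTATE 23505.
CREATE UNIQUE INDEX IF NOT EXISTS tasks_one_qa_per_project
    ON tasks (project_id)
    WHERE agent_role = 'QA';
```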
🧰 Tools
🪛 GitHub Actions: Go CI
[error] 140-140: golangci-lint (errcheck): Error return value of (github.com/jackc/pgx/v5.Row).Scan is not checked.
🪛 GitHub Check: Build and Test Go
[failure] 140-140:
Error return value of (github.com/jackc/pgx/v5.Row).Scan is not checked (errcheck)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-backend/internal/orchestrator/server/results.go` around lines 137 - 147,
The code ignores the error from h.db.QueryRow(...).Scan(&incompleteCount) and
has a TOCTOU race where concurrent Engineer_Backend/Engineer_Frontend handlers
can both call dispatchNext and create duplicate QA tasks; fix by checking and
handling the Scan error (capture the error, log via log.Error or zap and return
the error to avoid treating failure as zero) and make the QA dispatch atomic:
either perform the count+insert inside a DB transaction with appropriate locking
(e.g., SELECT ... FOR UPDATE on a project row or tasks aggregation) or add a
partial unique index on tasks(project_id) WHERE agent_role='QA' and update
dispatchNext to catch and ignore unique-violation errors (refer to dispatchNext,
h.db.QueryRow, Scan, incompleteCount) so duplicate QA tasks cannot be created.
```python
class LocalFileEditTool(BaseTool):
    NAME = "file_edit"
    DESCRIPTION = "Performs exact string replacements in local files. Requires 'old_string' to match exactly (including indentation) and be unique."

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
```
🧩 Analysis chain
🏁 Script executed:
```sh
#!/bin/bash
# Verify where LocalFileEditTool is instantiated and whether any caller passes working_dir.
rg -nP -C2 '\bLocalFileEditTool\s*\('
# Also check how other tools (e.g., DockerSandboxTool in devops_agent.py) plumb working_dir
rg -nP -C2 '\.working_dir\s*=' --type=py
```

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 2510
Ensure LocalFileEditTool respects a project-scoped working directory.
At agents/base_agent.py:106, LocalFileEditTool() is instantiated with no working_dir argument, causing edits to fall back to os.getcwd(). If agents are expected to operate in a project-scoped directory (e.g., ./output/<project_id>), file edits will silently land in the server's CWD instead. Either pass working_dir at instantiation (as seen in other tools like DockerSandboxTool in agents/devops_agent.py:183+) or resolve it from the agent's execution context at run() time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/file_edit_tool.py` around lines 13 - 18, LocalFileEditTool is being
instantiated without a project-scoped working directory so it falls back to
os.getcwd(); locate where LocalFileEditTool() is created in the agent
initialization (the BaseAgent/constructor code that builds tools) and fix by
either passing the agent's project-scoped working_dir into the LocalFileEditTool
constructor (same approach used for DockerSandboxTool) or modify
LocalFileEditTool.run() to derive working_dir from the agent execution
context/assigned project path at runtime so edits are applied under the
project-scoped directory rather than the server CWD.
```python
async def run(self, file_path: str, old_string: str, new_string: str, replace_all: bool = False) -> ToolResult:
    """
    Args:
        file_path: Path to the file relative to working_dir
        old_string: The exact string to find (must be unique)
        new_string: The string to replace it with
        replace_all: If True, replace all occurrences. If False, fail if multiple matches exist.
    """
    full_path = os.path.join(self.working_dir, file_path)

    if not os.path.exists(full_path):
        return ToolResult(success=False, output="", error=f"File not found: {file_path}")
```
Path traversal: validate that the resolved path stays inside working_dir.
os.path.join(self.working_dir, file_path) provides zero containment: if an agent (or an LLM-produced argument) passes an absolute path (e.g., /etc/passwd) or a ..-laden relative path, os.path.join will happily return a location outside working_dir. Given LocalFileEditTool is wired directly into every agent's toolset (agents/base_agent.py:105-107) and working_dir defaults to os.getcwd() (tools/base_tool.py:71-74), an LLM hallucination is enough to overwrite arbitrary host files. Symlinks inside the tree are also not resolved.
🛡️ Suggested containment check
```diff
-        full_path = os.path.join(self.working_dir, file_path)
+        base = os.path.realpath(self.working_dir)
+        full_path = os.path.realpath(os.path.join(base, file_path))
+        # Enforce that the resolved target is inside working_dir (prevents absolute paths,
+        # `..` traversal, and symlink escapes).
+        if os.path.commonpath([full_path, base]) != base:
+            return ToolResult(success=False, output="",
+                              error=f"Refusing to edit path outside working_dir: {file_path}")
-        if not os.path.exists(full_path):
+        if not os.path.isfile(full_path):
             return ToolResult(success=False, output="", error=f"File not found: {file_path}")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
async def run(self, file_path: str, old_string: str, new_string: str, replace_all: bool = False) -> ToolResult:
    """
    Args:
        file_path: Path to the file relative to working_dir
        old_string: The exact string to find (must be unique)
        new_string: The string to replace it with
        replace_all: If True, replace all occurrences. If False, fail if multiple matches exist.
    """
    base = os.path.realpath(self.working_dir)
    full_path = os.path.realpath(os.path.join(base, file_path))
    # Enforce that the resolved target is inside working_dir (prevents absolute paths,
    # `..` traversal, and symlink escapes).
    if os.path.commonpath([full_path, base]) != base:
        return ToolResult(success=False, output="",
                          error=f"Refusing to edit path outside working_dir: {file_path}")
    if not os.path.isfile(full_path):
        return ToolResult(success=False, output="", error=f"File not found: {file_path}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/file_edit_tool.py` around lines 20 - 31, The run method in
tools/file_edit_tool.py must validate that the resolved file path stays inside
self.working_dir to prevent path traversal: after computing full_path (from
file_path and self.working_dir) resolve both to their absolute realpaths (to
follow symlinks) and ensure the file realpath startswith the working_dir
realpath (or use os.path.commonpath) and return ToolResult(success=False,
error=...) if it does not; apply this check in LocalFileEditTool.run before any
os.path.exists/os operations and reject absolute paths or paths with .. that
escape the working_dir.
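The containment rule the comment calls for can also live in a small, separately testable helper; a minimal sketch, assuming `working_dir` exists (the helper name `is_within_workspace` is illustrative, not part of the tool's current API):

```python
import os

def is_within_workspace(working_dir: str, candidate: str) -> bool:
    """True iff `candidate` (joined onto working_dir) resolves inside working_dir.

    realpath() collapses `..` segments and resolves symlinks, so absolute
    paths, traversal sequences, and symlink escapes all fail the check.
    """
    base = os.path.realpath(working_dir)
    target = os.path.realpath(os.path.join(base, candidate))
    return os.path.commonpath([base, target]) == base
```

A tool's `run()` would call this before any `os.path.exists` or file I/O and return a refusal `ToolResult` on `False`.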
Actionable comments posted: 15
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
agents/base_agent.py (3)
378-383: ⚠️ Potential issue | 🟡 Minor
Clear usage metadata on semantic-cache hits.

A cached response performs no provider call, but `_last_usage` and `_last_cost` keep their previous values. Downstream cost ledgers can attribute stale usage to the cached call.

🛠️ Proposed fix

```diff
 if cached_response:
     logger.info(
         "⚡ Served LLM response directly from Semantic Cache!",
         agent=self.ROLE,
     )
+    self._last_usage = {"prompt_tokens": 0, "completion_tokens": 0}
+    self._last_cost = 0.0
     return cached_response
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 378 - 383, Cached semantic-cache hits return cached_response but leave stale metrics in self._last_usage and self._last_cost; update the cached_response branch (the code that checks cached_response and returns it) to clear usage/cost metadata before returning by setting self._last_usage = None and self._last_cost = None (or appropriate zero/empty values your codebase uses) so downstream cost ledgers don't attribute previous provider usage to a cache hit.
368-370: ⚠️ Potential issue | 🔴 Critical
Keep `call_llm()` returning a string in mock mode.

`_mock_llm_response()` now returns `(text, usage)`, but `call_llm()` is annotated and consumed as returning `str`. The no-client path currently leaks the tuple to callers.

🐛 Proposed fix

```diff
     """
     if self.llm_client is None:
-        return self._mock_llm_response(messages)
+        text, usage = self._mock_llm_response(messages)
+        self._last_usage = usage
+        self._last_cost = 0.0
+        return text
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 368 - 370, call_llm currently returns whatever _mock_llm_response returns when self.llm_client is None, but _mock_llm_response now returns a (text, usage) tuple while call_llm is annotated to return str; adjust the no-client branch in call_llm to unpack the tuple from _mock_llm_response and return only the text (e.g., text, _ = self._mock_llm_response(messages) or return self._mock_llm_response(messages)[0]) so callers of call_llm continue to receive a string while preserving usage info internally.
197-197: ⚠️ Potential issue | 🟠 Major
Use `context.project_id` for budget isolation.

`AgentExecutionContext` carries `project_id`; if `task.project_id` is absent, line 197 falls back to `"demo"` and multiple projects share one Redis budget key.

🛠️ Proposed fix

```diff
-        self._current_project_id = getattr(task, "project_id", "demo")
+        self._current_project_id = (
+            getattr(context, "project_id", None)
+            or getattr(task, "project_id", None)
+            or "demo"
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` at line 197, The current assignment self._current_project_id = getattr(task, "project_id", "demo") uses a hardcoded "demo" fallback causing cross-project Redis budget collisions; change it to prefer AgentExecutionContext's project_id when task.project_id is absent (e.g., use context.project_id as the fallback) so self._current_project_id is set from task.project_id if present otherwise from the AgentExecutionContext.project_id, preserving unique Redis budget keys per project.
♻️ Duplicate comments (6)
scripts/check_services.sh (1)
58-58: ⚠️ Potential issue | 🔴 Critical
Derive container names from Docker Compose instead of a hardcoded prefix.

Line 58 still hardcodes `autonomousmulti-agentaiorganization-`, so container lookups fail when the Compose project name differs from that exact string. Prefer `docker compose ps -q <service>` or compute the project prefix from `COMPOSE_PROJECT_NAME`.

Robust direction

```diff
-PREFIX="autonomousmulti-agentaiorganization-"
+# Prefer resolving by Compose service name in `check_container`:
+# container_id=$(docker compose ps -q "$service" 2>/dev/null || true)
```

Then call `check_container "postgres" "PostgreSQL"` instead of constructing container names manually.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_services.sh` at line 58, The script currently hardcodes PREFIX="autonomousmulti-agentaiorganization-" which breaks container lookups; remove that hardcoded PREFIX and instead resolve container IDs via docker compose (e.g., use docker compose ps -q <service>) or derive the project prefix from COMPOSE_PROJECT_NAME if needed, then update any calls that construct container names to call check_container with service names (for example call check_container "postgres" "PostgreSQL" rather than building PREFIX+service); update references to PREFIX and any uses in functions like check_container to use the docker compose lookup or the derived COMPOSE_PROJECT_NAME value.

agents/base_agent.py (4)
389-413: ⚠️ Potential issue | 🟠 Major
Make budget check and increment atomic.

The current Redis `GET` → provider call → `INCRBYFLOAT` flow is race-prone; concurrent calls can all pass the gate and then push a project over budget.

Use a Redis Lua script or transaction for the final `used + cost_usd <= limit` check and increment, keyed by `budget:{project_id}:used`.
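The required semantics (a single indivisible check-then-reserve step) can be prototyped in-process before porting to a Redis Lua `EVAL`; in this sketch a `threading.Lock` stands in for the atomicity Redis guarantees scripts, and the class name is illustrative:

```python
import threading

class BudgetGate:
    """In-process stand-in for an atomic Redis budget check.

    In production the same compare-and-increment would run as one Lua EVAL
    against budget:{project_id}:used, so concurrent agents cannot all pass
    the gate before any of them records its spend.
    """

    def __init__(self, limit_usd: float):
        self._lock = threading.Lock()
        self._used = 0.0
        self._limit = limit_usd

    def try_reserve(self, cost_usd: float) -> bool:
        # Check and increment under one lock, mirroring the atomicity of EVAL.
        with self._lock:
            if self._used + cost_usd > self._limit:
                return False
            self._used += cost_usd
            return True
```

Reserving before the provider call (and refunding on failure) avoids the window where the spend has happened but is not yet recorded.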
624-650: ⚠️ Potential issue | 🟠 Major
Avoid silent fallback pricing for versioned model IDs.
Exact-key lookup misses suffixed/versioned model names and silently falls back to average pricing, which can distort budget enforcement. Prefer prefix/family matching and log when fallback pricing is used.
🛠️ Proposed direction
```diff
-    # Fallback pricing for unknown models (Average mid-range)
-    model_pricing = pricing.get(model, {"prompt": 1.0, "completion": 4.0})
+    model_pricing = next(
+        (value for key, value in pricing.items() if model.startswith(key)),
+        None,
+    )
+    if model_pricing is None:
+        logger.warning("Using fallback LLM pricing", model=model)
+        model_pricing = {"prompt": 1.0, "completion": 4.0}
-    cost = (prompt_tokens / 1_000_000 * model_pricing["prompt"]) + \
-           (completion_tokens / 1_000_000 * model_pricing["completion"])
+    cost = (
+        prompt_tokens / 1_000_000 * model_pricing["prompt"]
+        + completion_tokens / 1_000_000 * model_pricing["completion"]
+    )
-    return float(f"{cost:.6f}")
+    return round(cost, 6)
```

Please verify current provider model IDs and pricing before locking these rates:

Current OpenAI, Anthropic, Google Gemini, and Amazon Bedrock Nova model ID formats and token prices as of April 2026

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 624 - 650, The current exact-key lookup against the pricing dict (pricing, model_pricing, model) silently falls back to average rates for versioned/suffixed model IDs; update the lookup logic in agents/base_agent.py so that instead of only pricing.get(model, ...), you first attempt an exact match, then attempt family/prefix matches (e.g., for keys like "gpt-4o" match model.startswith(key) or normalize by splitting on ":" or "-" for Bedrock/version suffixes) and select the most specific match; if no prefix match is found, log a warning including the provided model id and the fallback rates before using the default, and keep the existing cost calculation (cost, prompt_tokens, completion_tokens) unchanged. Ensure you reference and update the pricing lookup that produces model_pricing and add the logging call where fallback occurs.
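The prefix match in the proposed direction can be made order-independent by preferring the longest matching key, so a versioned `gpt-4o-mini-…` ID never resolves to a plain `gpt-4o` entry when both exist; a hedged sketch, with a made-up pricing table in the tests:

```python
from typing import Dict, Optional

Pricing = Dict[str, Dict[str, float]]

def resolve_pricing(model: str, pricing: Pricing) -> Optional[Dict[str, float]]:
    """Exact match first, then the longest prefix match.

    Returns None when nothing matches, so the caller can log a warning and
    apply an explicit fallback rate instead of silently mispricing.
    """
    if model in pricing:
        return pricing[model]
    candidates = [key for key in pricing if model.startswith(key)]
    if not candidates:
        return None
    # Longest key wins: "gpt-4o-mini" beats "gpt-4o" for a mini-family ID.
    return pricing[max(candidates, key=len)]
```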
106-107: ⚠️ Potential issue | 🟠 Major
Pass a project-scoped workspace into `LocalFileEditTool`.

Constructing `LocalFileEditTool()` here leaves it on the default working directory, so agent edits can land in the server CWD rather than the active project workspace.

Consider registering this tool after `execute_task()` receives `context`, or make the tool resolve `context.project_id`/workspace at run time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 106 - 107, The LocalFileEditTool is being instantiated without project context so edits may occur in the server CWD; change registration so the tool receives the active workspace: either defer creating/registering LocalFileEditTool until execute_task(context) and pass context.project_id (or context.workspace) into its constructor, or modify LocalFileEditTool.run to resolve context.project_id at runtime (e.g., accept context or project_id and compute workspace before file operations). Update the self.tools["file_edit"] assignment to bind a project-aware callable (e.g., a lambda that calls LocalFileEditTool(project_id).run or LocalFileEditTool().run(context)) so edits always target the project-scoped workspace.
173-184: ⚠️ Potential issue | 🟡 Minor
Remove trailing whitespace reported by Ruff.
CI is failing W293 on these blank lines. Strip the trailing spaces to unblock lint.
Also applies to: 410-410, 484-484, 509-509, 539-539, 579-579
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/base_agent.py` around lines 173 - 184, The multiline string assigned to base_instructions in agents/base_agent.py contains trailing whitespace on several blank lines (causing Ruff W293); remove the trailing spaces from those blank/empty lines inside the string literal (e.g., in the LOCAL MASTERY MODE block appended to base_instructions) and the other similar string blocks referenced (around the other reported locations) so the string lines end with no extra spaces; ensure any concatenated instruction blocks (the base_instructions variable and its appended local_mode block) contain no trailing whitespace.

tools/file_edit_tool.py (1)
51-56: ⚠️ Potential issue | 🔴 Critical
Validate resolved paths stay inside `working_dir`.

`os.path.join()` does not contain absolute paths, `..`, or symlink escapes. Reject targets whose real path leaves the workspace before any filesystem access.

🛡️ Proposed containment check

```diff
-        full_path = os.path.join(self.working_dir, file_path)
+        base_path = os.path.realpath(self.working_dir)
+        full_path = os.path.realpath(os.path.join(base_path, file_path))
+
+        if os.path.commonpath([base_path, full_path]) != base_path:
+            return ToolResult(
+                success=False,
+                output="",
+                error=f"Refusing to edit path outside working_dir: {file_path}",
+            )
-        if not os.path.exists(full_path):
+        if not os.path.isfile(full_path):
             return ToolResult(
                 success=False, output="", error=f"File not found: {file_path}"
             )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/file_edit_tool.py` around lines 51 - 56, The code currently builds full_path with os.path.join(self.working_dir, file_path) but does not ensure the resolved path stays inside the workspace; compute the canonical real paths for both self.working_dir and the joined path (using os.path.realpath or pathlib.Path(...).resolve()), then verify containment (e.g. with os.path.commonpath or Path.commonpath) and return a ToolResult failure if the resolved path escapes the working_dir (rejecting paths with .., absolute inputs, or symlink traversal) before any file access; keep using the existing ToolResult error message pattern and the variables full_path, self.working_dir, and file_path to find where to add this check.
🟡 Minor comments (13)
agents/prompts/finance_analysis.yaml-12-12 (1)
12-12: ⚠️ Potential issue | 🟡 Minor
Align the budget alert thresholds.
Line 12 says alert at 80% and halt at 95%, but lines 55-58 classify WARNING at 60-80 and CRITICAL above 80. Pick one policy so generated reports do not contradict themselves.
Suggested threshold alignment
```diff
-  - Alert at 80% budget utilization, halt non-critical at 95%
+  - Warn at 80% budget utilization, halt non-critical work at 95%
@@
-  - INFO: utilization < 60%
-  - WARNING: utilization 60-80%
-  - CRITICAL: utilization > 80%
+  - INFO: utilization < 80%
+  - WARNING: utilization >= 80% and < 95%
+  - CRITICAL: utilization >= 95%
```

Also applies to: 55-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/prompts/finance_analysis.yaml` at line 12, The prompt YAML contains conflicting budget thresholds: the summary line "Alert at 80% budget utilization, halt non-critical at 95%" conflicts with the classification block that defines WARNING at 60-80 and CRITICAL above 80; pick and apply one consistent policy across the file by either (A) changing the summary line to match the classification (e.g., "Alert at 60% and escalate at 80%") or (B) updating the classification block to match the summary (e.g., WARNING 60-80, ALERT at 80, HALT non-critical at 95), and update both the summary string and the classification block so the strings "Alert at 80% budget utilization, halt non-critical at 95%" and the WARNING/CRITICAL range entries are consistent.

README.md-95-97 (1)
95-97: ⚠️ Potential issue | 🟡 Minor
Point WebSocket troubleshooting at the ws-hub port.

Line 96 references `localhost:8080`, which is the API Gateway in the provided compose context. The ws-hub is exposed on host port `8082`.

Suggested fix

```diff
-Ensure `ws-hub` is running. Check browser console for WebSocket connection errors to `localhost:8080`.
+Ensure `ws-hub` is running. Check browser console for WebSocket connection errors to `localhost:8082`.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 95 - 97, The README section titled "Dashboard/WebSockets" incorrectly points troubleshooting to localhost:8080; update the text under that header to mention the ws-hub host port 8082 instead of 8080 so WebSocket connection errors are checked against ws-hub (port 8082) in the browser console; ensure the sentence references "ws-hub" and "localhost:8082" under the Dashboard/WebSockets heading.

scripts/check_services.sh-102-105 (1)
102-105: ⚠️ Potential issue | 🟡 Minor
Include `cloudflared` in the service health report.

`docker-compose.yml` defines `cloudflared`, but this script omits it, so tunnel outages will not show up in the summary.

Suggested addition

```diff
 check_container "${PREFIX}prometheus-1" "Prometheus"
 check_container "${PREFIX}grafana-1" "Grafana"
 check_container "${PREFIX}jaeger-1" "Jaeger Tracing"
+check_container "${PREFIX}cloudflared-1" "Cloudflare Tunnel"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_services.sh` around lines 102 - 105, The observability summary omits the cloudflared service, so add a check for it using the existing helper: call check_container with "${PREFIX}cloudflared-1" (or the exact service name used in docker-compose) and a human label like "cloudflared" alongside the other checks in the Observability Stack block; reuse the same pattern as check_container "${PREFIX}prometheus-1" "Prometheus" so cloudflared health appears in the summary.

agents/prompts/frontend_codegen.yaml-50-55 (1)
50-55: ⚠️ Potential issue | 🟡 Minor
Clarify that only the backend can set httpOnly JWT cookies.

Line 55 can lead generated frontend code to attempt direct JWT cookie management. The frontend should send credentialed requests and let the backend issue `Set-Cookie: HttpOnly`.

Suggested prompt refinement

```diff
-  - JWT stored in httpOnly cookies, NOT localStorage
+  - Auth tokens are set by the backend via Set-Cookie with HttpOnly/Secure/SameSite
+  - Frontend code must not read or write JWTs directly; use credentialed requests (`credentials: "include"`)
+  - Never store JWTs in localStorage or non-httpOnly cookies
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/prompts/frontend_codegen.yaml` around lines 50 - 55, Update the RULES entry that currently reads "JWT stored in httpOnly cookies, NOT localStorage" to explicitly state that only the backend can set HttpOnly cookies and the frontend must never attempt to write them; instruct generators to send credentialed requests (fetch/axios with credentials: 'include' or equivalent) so the backend issues Set-Cookie: HttpOnly, and to avoid client-side cookie manipulation in generated components or helpers (refer to the RULES list and the "JWT stored in httpOnly cookies, NOT localStorage" rule for where to change text).

agents/prompts/frontend_codegen.yaml-1-2 (1)
1-2: ⚠️ Potential issue | 🟡 Minor
Wire this template into `FrontendAgent` before relying on it.

`agents/prompts/frontend_codegen.yaml` is not loaded by `FrontendAgent.run()`. The agent still builds an inline `FRONTEND_SYSTEM_PROMPT` and never references the YAML file, making these rules inert.

Additionally, the httpOnly cookie rule ("JWT stored in httpOnly cookies, NOT localStorage") is incomplete. httpOnly cookies cannot be read or set by frontend JavaScript; the rule should specify that the frontend must: (1) send credentialed requests with `withCredentials: true`/`credentials: 'include'`, and (2) rely on the backend to set the JWT via `Set-Cookie` response headers. Without this context, the LLM may generate invalid frontend code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/prompts/frontend_codegen.yaml` around lines 1 - 2, The frontend prompt YAML (agents/prompts/frontend_codegen.yaml) is never loaded because FrontendAgent.run() still constructs and uses an inline FRONTEND_SYSTEM_PROMPT; update FrontendAgent.run to load and use the YAML-based prompt instead of the inline constant (e.g., import/parse agents/prompts/frontend_codegen.yaml and pass it into the prompt construction used by FrontendAgent.run). Also clarify and expand the httpOnly cookie rule in the YAML: state that frontend code must send credentialed requests (withCredentials: true for axios or fetch with credentials: 'include') and must not attempt to read/set the JWT in JS, instead relying on the backend to set the cookie via Set-Cookie response headers; ensure FRONTEND_SYSTEM_PROMPT semantics mirror this updated YAML text so generated code uses credentialed requests and backend-set httpOnly cookies.

agents/prompts/finance_analysis.yaml-1-2 (1)
1-2: ⚠️ Potential issue | 🟡 Minor

Remove the false usage claim from line 2 or integrate this template into FinanceAgent.
Line 2 claims the template is "Used by FinanceAgent.run()" but `agents/finance_agent.py` constructs the cost report directly using internal methods (`_generate_optimizations()`, `_calculate_roi()`, etc.) without loading any YAML template. Unlike CEOAgent and CTOAgent, which follow the `_load_prompt()` → `ReasoningChain` pattern, FinanceAgent does not reference `finance_analysis.yaml` at all.

Additionally, the thresholds in the template differ from FinanceAgent's implementation: the YAML defines WARNING as 60–80% and CRITICAL as >80%, but FinanceAgent's `_budget_status()` method uses different cutoffs (<60% on-track, <80% monitor, <95% alert, ≥95% critical).

Either update line 2 to remove the usage claim, or refactor FinanceAgent to load and use this template alongside a ReasoningChain for multi-turn cost analysis.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/prompts/finance_analysis.yaml` around lines 1 - 2, The README line claiming "Used by FinanceAgent.run()" is incorrect because FinanceAgent constructs reports directly via methods like _generate_optimizations(), _calculate_roi(), and _budget_status() instead of loading finance_analysis.yaml via _load_prompt() and a ReasoningChain; either remove or change that line to not assert usage, or refactor FinanceAgent to actually load and use the YAML: update FinanceAgent.run() (and its instantiation in agents/finance_agent.py) to call _load_prompt("finance_analysis.yaml"), instantiate a ReasoningChain, feed the same inputs currently passed into the internal methods, and update _budget_status() to use the threshold values defined in finance_analysis.yaml (or sync the YAML thresholds to match _budget_status()) so code and template thresholds are consistent.

Proximus_learning_roadmap.md-41-41 (1)
41-41: ⚠️ Potential issue | 🟡 Minor

Update Go version on line 133 to match repository declaration.

The repository declares `go 1.25.0` in go-backend/go.mod, but line 133 references Go 1.24. Update line 133 to Go 1.25.0 to maintain consistency with line 41 and the actual toolchain.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Proximus_learning_roadmap.md` at line 41, Update the Go version mention in the roadmap so it matches the repository toolchain: change the textual reference to "Go 1.24" to "Go 1.25.0" (ensure the "Modern Go" bullet and any other occurrence of "Go 1.24" in the document match the go.mod declaration of Go 1.25.0).

tests/unit/test_reasoning_chain.py-205-236 (1)
205-236: ⚠️ Potential issue | 🟡 Minor

`test_extract_keys` doesn't actually verify `extract_keys` behavior.

`mock_llm` is a plain `async def`, not an `AsyncMock`, so `hasattr(mock_llm, 'call_args_list')` is False and `second_call_prompt` is always `[]`. The only remaining assertion is `isinstance(result, dict)`, which passes regardless of whether `extract_keys` works. This test would pass even if `extract_keys` were silently broken.

🧪 Proposed fix — capture prompts and assert interpolation
```diff
-        call_index = {"i": 0}
-
-        async def mock_llm(messages, **kwargs):
-            resp = responses[call_index["i"]]
-            call_index["i"] += 1
-            return resp
+        call_index = {"i": 0}
+        seen_messages: list = []
+
+        async def mock_llm(messages, **kwargs):
+            seen_messages.append(messages)
+            resp = responses[call_index["i"]]
+            call_index["i"] += 1
+            return resp
@@
         result = await chain.execute(call_llm=mock_llm)
-        # The second prompt should have had {budget} and {tier} filled in
-        second_call_prompt = mock_llm.call_args_list if hasattr(mock_llm, 'call_args_list') else []
-        assert isinstance(result, dict)
+        assert isinstance(result, dict)
+        # Second prompt must have {budget} and {tier} interpolated from step 1 output
+        second_prompt = json.dumps(seen_messages[1])
+        assert "Budget is 100" in second_prompt
+        assert "tier is small" in second_prompt
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_reasoning_chain.py` around lines 205 - 236, Replace the handwritten async mock with an AsyncMock so you can inspect calls and actually assert interpolation: change the local mock_llm to a unittest.mock.AsyncMock with side_effect returning the two JSON strings, run chain.execute(call_llm=mock_llm), then examine mock_llm.await_args_list (or call_args_list for sync) to get the second call's message/prompt and assert it contains the interpolated values ("Budget is 100" and "tier is small" or that {budget}/{tier} were filled); keep the existing assertions about result being a dict and also assert the extract_keys behavior using ReasoningChain/ReasoningStep names from the test.

agents/cto_agent.py-25-32 (1)
25-32: ⚠️ Potential issue | 🟡 Minor

`_PROMPT_TEMPLATE` can be `None` → AttributeError on every `.get(...)`.

`yaml.safe_load` returns `None` for an empty or whitespace-only YAML file (not `{}`). If `cto_architecture.yaml` is ever present but empty (partial write, merge artifact, accidental truncation), `_PROMPT_TEMPLATE.get("system", ...)` at Line 65 and `.get("chain_of_thought", {})` at Line 108 raise `AttributeError: 'NoneType' object has no attribute 'get'`. Also specify `encoding="utf-8"` for cross-platform safety (Windows default is cp1252).

🛡️ Proposed fix
```diff
 def _load_prompt() -> dict:
     path = os.path.join(_PROMPT_DIR, "cto_architecture.yaml")
     try:
-        with open(path, "r") as f:
-            return yaml.safe_load(f)
+        with open(path, "r", encoding="utf-8") as f:
+            data = yaml.safe_load(f)
+            return data if isinstance(data, dict) else {}
     except FileNotFoundError:
         logger.warning("CTO prompt template not found, using inline prompts")
         return {}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/cto_agent.py` around lines 25 - 32, yaml.safe_load in _load_prompt may return None for empty YAML, causing AttributeError when _PROMPT_TEMPLATE is used; change _load_prompt to open the file with encoding="utf-8", call yaml.safe_load, and if the result is None or not a dict return an empty dict so _PROMPT_TEMPLATE is always a dict; ensure the FileNotFoundError branch still returns {}. This touches the _load_prompt function and the use sites where _PROMPT_TEMPLATE.get(...) are called.

tests/integration/test_full_lifecycle.py-127-134 (1)
127-134: ⚠️ Potential issue | 🟡 Minor

`test_events_emitted_in_order` silently passes when `to_dict` is absent.

If `hasattr(event, "to_dict")` returns False (e.g., someone emits a raw dict instead of `AgentEvent`), the loop becomes a no-op and the test still passes. Since the whole point of the PR-wide typed-event refactor is to emit `AgentEvent` objects, consider asserting at least one emission was an `AgentEvent` so regressions to untyped events are caught.

🧪 Proposed stronger assertion
```diff
+        from agents.events import AgentEvent
+        typed_events = [
+            call[0][0] for call in mock_context.emit_event.call_args_list
+            if isinstance(call[0][0], AgentEvent)
+        ]
+        assert typed_events, "Expected at least one AgentEvent emission"
         # Check that events have proper structure (AgentEvent.to_dict())
-        for call in mock_context.emit_event.call_args_list:
-            event = call[0][0]
-            if hasattr(event, "to_dict"):
-                d = event.to_dict()
-                assert "type" in d
-                assert "agent" in d
-                assert "message" in d
+        for event in typed_events:
+            d = event.to_dict()
+            assert {"type", "agent", "message"}.issubset(d)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_full_lifecycle.py` around lines 127 - 134, The current loop silently skips non-objects with to_dict so the test can pass even if raw dicts are emitted; update the assertion to ensure at least one emitted item is an AgentEvent (or has to_dict) by checking mock_context.emit_event.call_args_list for any call where the event object (call[0][0]) is an instance of AgentEvent or has a to_dict, and fail the test if none match; change the loop around mock_context.emit_event.call_args_list (used in test_events_emitted_in_order / test_full_lifecycle.py) to assert any(hasattr(event, "to_dict") or isinstance(event, AgentEvent) for event in emitted_events) before validating keys, then keep the existing per-event to_dict checks for those that do implement to_dict.

agents/devops_agent.py-168-178 (1)
168-178: ⚠️ Potential issue | 🟡 Minor

Use public GitTool API for git remote operations instead of calling private `_run_subprocess()` directly.

The code at line 177 calls `git._run_subprocess()` to add a remote. Instead, use the `git.run("init")` pattern or add a public method to GitTool if needed. Note: The authentication via `GITHUB_TOKEN` already works correctly — when `git.run("push", ...)` is invoked at line 178, GitTool's `_push()` method reads the token and injects it into the remote URL before pushing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/devops_agent.py` around lines 168 - 178, In _run_git_flow replace the direct call to the private GitTool._run_subprocess with the public GitTool API: either call git.run(...) to perform the remote add (e.g., use git.run("remote", "add", "origin", github_url)) or add a public GitTool.add_remote(repo_url, name="origin") method and call that from _run_git_flow; keep the existing logic that constructs github_url and let GitTool._push() handle token injection when git.run("push", remote="origin", branch="main") is invoked.

agents/prompts/ceo_strategy.yaml-43-45 (1)
43-45: ⚠️ Potential issue | 🟡 Minor

Avoid using an invalid enum value in the JSON template.

`"P0/P1/P2"` is not accepted by `FeaturePriority`; models often copy examples literally, which would trigger validation retries or fallback.

📝 Proposed prompt fix
```diff
   "problem_statement": "The pain point in their own words",
   "mvp_features": [
-    {{"name": "Feature", "priority": "P0/P1/P2", "description": "What it does and WHY"}}
+    {{"name": "Feature", "priority": "P0", "description": "What it does and WHY"}}
   ],
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/prompts/ceo_strategy.yaml` around lines 43 - 45, The template for "mvp_features" uses an invalid combined enum value "P0/P1/P2" which will be copied literally and fail FeaturePriority validation; update the example so each feature's "priority" is a single valid enum value (e.g., "P0" or "P1" or "P2") and optionally show multiple entries to demonstrate different priorities (keep the key names "mvp_features" and the example object structure intact but replace the combined enum with one concrete valid value or a clear placeholder like "<P0|P1|P2>").

agents/ceo_agent.py-24-31 (1)
24-31: ⚠️ Potential issue | 🟡 Minor

Harden prompt loading against empty or malformed YAML.

`yaml.safe_load(f)` can return `None` or a non-dict, and YAML parse errors currently fail module import instead of falling back to inline prompts.

🛡️ Proposed fix
```diff
 def _load_prompt() -> dict:
     path = os.path.join(_PROMPT_DIR, "ceo_strategy.yaml")
     try:
         with open(path, "r") as f:
-            return yaml.safe_load(f)
-    except FileNotFoundError:
-        logger.warning("CEO prompt template not found, using inline prompts")
+            loaded = yaml.safe_load(f) or {}
+        if not isinstance(loaded, dict):
+            logger.warning("CEO prompt template is not a mapping, using inline prompts")
+            return {}
+        return loaded
+    except (FileNotFoundError, yaml.YAMLError) as e:
+        logger.warning("CEO prompt template unavailable, using inline prompts", error=str(e))
         return {}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/ceo_agent.py` around lines 24 - 31, The _load_prompt function currently uses yaml.safe_load(f) which can raise yaml.YAMLError or return None/non-dict and that causes import-time failures or malformed prompts; update _load_prompt to catch yaml.YAMLError (and other exceptions) around safe_load, log a warning including the exception, and ensure the function returns an empty dict fallback whenever safe_load returns None or a non-dict (e.g., wrap the result in a type check and if not isinstance(result, dict) return {}). Keep the existing FileNotFoundError branch and use the existing logger to emit clear messages when falling back.
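The behavior both loader findings hinge on is easy to demonstrate. A minimal sketch, assuming PyYAML is installed; `load_prompt_template` is a hypothetical helper mirroring the pattern suggested above, not the repo's `_load_prompt`:

```python
import yaml


def load_prompt_template(path: str) -> dict:
    """Load a YAML prompt template, falling back to {} on any problem."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = yaml.safe_load(f)
    except (FileNotFoundError, yaml.YAMLError):
        return {}
    # safe_load returns None for empty input, and can return non-dict types
    return data if isinstance(data, dict) else {}


# The trap both findings describe: empty YAML parses to None, not {}
assert yaml.safe_load("") is None
assert load_prompt_template("/nonexistent/prompt.yaml") == {}
```

With this guard, an empty or truncated YAML file degrades to the same inline-prompt fallback as a missing one, instead of crashing later at `.get(...)`.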
🧹 Nitpick comments (7)
agents/events.py (1)
21-24: Constrain event type and level values.

The docstring says this is type-safe, but `event_type` and `level` still accept any string. A small `Literal` alias prevents invalid event names from leaking to subscribers.

Suggested type tightening
```diff
-from typing import Any
+from typing import Any, Literal
+
+AgentEventType = Literal["thinking", "tool_call", "artifact", "progress", "error"]
+AgentEventLevel = Literal["info", "success", "warning", "error"]
@@
-    event_type: str  # "thinking", "tool_call", "artifact", "progress", "error"
+    event_type: AgentEventType
     agent_role: str
     message: str
-    level: str = "info"  # "info", "success", "warning", "error"
+    level: AgentEventLevel = "info"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/events.py` around lines 21 - 24, The event_type and level fields currently accept any string; tighten their types by adding Literal aliases and using them on the dataclass fields: define EventType = Literal["thinking","tool_call","artifact","progress","error"] and EventLevel = Literal["info","success","warning","error"] (import Literal from typing), then change the annotated types on the existing fields event_type: EventType and level: EventLevel (preserving the default "info"); update any imports or type references in agents/events.py as needed.

tests/unit/test_base_agent.py (1)
88-104: Cover the public mock-mode `call_llm()` path.

These tests now bless `_mock_llm_response()` returning a tuple, but they do not catch that `call_llm()` must still return only the text string when no LLM client is configured.

🧪 Suggested test coverage
```diff
+    @pytest.mark.asyncio
+    async def test_call_llm_mock_mode_returns_text(self, agent):
+        result = await agent.call_llm(
+            [{"role": "user", "content": "Hello world"}]
+        )
+
+        assert isinstance(result, str)
+        data = json.loads(result)
+        assert data["agent"] == "TestAgent"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_base_agent.py` around lines 88 - 104, The tests exercise _mock_llm_response() which returns a (result, usage) tuple, but they miss the public call_llm() path which should return only the text string when no LLM client is configured; add a test that ensures call_llm(messages) on the TestAgent (or agent fixture) with no llm client set returns a str (not a tuple) and contains the expected response text (e.g., contains the last user message or "Hello world"), and assert type(result) is str and that json.loads(result) or substring checks succeed; reference call_llm() and _mock_llm_response() to locate the code under test.

tests/unit/test_file_edit_tool.py (2)
76-84: Guard `result.error` before substring check.

If the tool ever returns `result.error is None` on failure, `"2" in result.error` raises `TypeError` rather than a readable assertion failure, making CI debugging painful.

```diff
-        assert not result.success
-        assert "2" in result.error  # Should mention count
+        assert not result.success
+        assert result.error and "2" in result.error  # Should mention count
```
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_file_edit_tool.py` around lines 76 - 84, The assertion assumes result.error is a string and can raise TypeError if it's None; update the test_multiple_matches_without_flag_fails test to first assert that result.error is present (e.g., assert result.error is not None or assert result.error) before checking the substring, then check that "2" is in result.error; locate the assertion in the test function test_multiple_matches_without_flag_fails that currently does assert not result.success followed by assert "2" in result.error and add the guard prior to the substring check.
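The failure mode is quick to reproduce in isolation: membership tests against `None` raise `TypeError`, which CI reports as an error rather than a readable assertion failure. A standalone sketch:

```python
error = None  # what an unguarded failure path might hand back

# Mirrors the bare `assert "2" in result.error` in the test
try:
    _ = "2" in error
    raised = False
except TypeError:
    raised = True

assert raised  # the unguarded check blows up instead of failing cleanly

# The guarded form short-circuits on a falsy error and fails readably instead
assert not (error and "2" in error)
```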
57-73: Use a context manager to avoid file-handle leaks.

`open(full_path).read()` leaves the descriptor open until GC; pytest + `-W error::ResourceWarning` (used by several CI presets) will flag this.

```diff
-        content = open(full_path).read()
+        with open(full_path) as f:
+            content = f.read()
```

Apply the same at Line 72.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_file_edit_tool.py` around lines 57 - 73, The tests open files with open(full_path).read() which leaks file descriptors; update both places (the earlier "Verify file was actually changed" block and the test_exact_match_preserves_indentation) to use context managers when reading files: locate the uses of full_path and content in the tests (in test_exact_match_preserves_indentation and the preceding verification block) and replace the raw open(...).read() calls with a with open(full_path, 'r') as f: content = f.read() pattern to ensure files are closed.

agents/devops_agent.py (1)
180-207: Dead assignment: `sandbox.working_dir` set twice.

Line 183 assigns `deliverables_path`, then Line 199 unconditionally overwrites it with `tf_dir`. Drop the first assignment to avoid confusion.

♻️ Proposed cleanup
```diff
 async def _run_terraform_command(self, cmd: str, deliverables_path: str, context: Any):
     """Execute a terraform command via DockerSandboxTool."""
     sandbox = DockerSandboxTool()
-    sandbox.working_dir = deliverables_path
-
     # Inject AWS credentials if they exist
     env = {
         "AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID", ""),
         "AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
         "AWS_REGION": os.getenv("AWS_REGION", "us-east-1"),
         "TF_IN_AUTOMATION": "1"
     }

     # Mount the terraform directory specifically
     tf_dir = os.path.join(deliverables_path, "terraform")
     if not os.path.exists(tf_dir):
         # Fallback if the agent didn't put it in a 'terraform/' subdir
         tf_dir = deliverables_path
     sandbox.working_dir = tf_dir
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/devops_agent.py` around lines 180 - 207, The method _run_terraform_command sets sandbox.working_dir twice (first to deliverables_path, then to tf_dir), making the first assignment dead; remove the initial sandbox.working_dir = deliverables_path (or refactor to compute tf_dir first and then set sandbox.working_dir once) so sandbox.working_dir is assigned only after computing tf_dir (references: _run_terraform_command, sandbox.working_dir, deliverables_path, tf_dir).

tests/unit/test_ceo_agent.py (1)
70-81: Make this test actually exercise string-feature normalization.

Right now the fallback path returns dict features, so this can pass without covering the branch that converts string features into dicts.
🧪 Proposed test adjustment
```diff
 @pytest.mark.asyncio
 async def test_run_normalizes_string_features(self, ceo_agent):
     """If the reasoning chain returns string features, they should be normalized."""
-    # The mock LLM will trigger the fallback path
-    result = await ceo_agent.run(
-        business_idea="Test app",
-        budget_usd=50.0,
-    )
+    raw_plan = ceo_agent._extract_plan_fallback("Test app")
+    raw_plan["mvp_features"] = ["Login", "Dashboard"]
+
+    with patch(
+        "agents.ceo_agent.ReasoningChain.execute",
+        new=AsyncMock(return_value=raw_plan),
+    ):
+        result = await ceo_agent.run(
+            business_idea="Test app",
+            budget_usd=50.0,
+        )
+
     if "mvp_features" in result:
         for f in result["mvp_features"]:
             assert isinstance(f, dict)
             assert "name" in f
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_ceo_agent.py` around lines 70 - 81, The test test_run_normalizes_string_features currently never forces the fallback branch that returns string features; update it to mock or stub the component that produces the reasoning chain output (e.g., the LLM or method used inside ceo_agent.run) so it returns mvp_features as a list of strings (e.g., ["Login", "Search"]) instead of dicts, then call ceo_agent.run(business_idea="Test app", budget_usd=50.0) and assert the result["mvp_features"] elements are converted to dicts with a "name" key and string value; reference ceo_agent.run and the test function name to locate where to inject the mocked response.

agents/prompts/ceo_strategy.yaml (1)
87-107: Complete the few-shot output so it matches `BusinessPlan`.

The example omits required `milestones`, and also skips `risk_assessment`, `estimated_users_year1`, and `go_to_market`, which can bias structured outputs toward schema-incomplete JSON.

📝 Proposed prompt addition
```diff
   success_metrics:
     - "500 meal plans generated in first month"
     - "40% week-2 retention rate"
     - "Average plan generation time < 30 seconds"
   revenue_model: "Freemium: free 1 plan/week, $9.99/month for unlimited plans + nutrition tracking"
+  estimated_users_year1: 5000
+  go_to_market: "LinkedIn creator partnerships, Product Hunt launch, and nutrition newsletter sponsorships"
+  milestones:
+    - phase: "MVP"
+      duration_days: 30
+      deliverables: ["Preference quiz", "Meal plan generator", "Grocery list export"]
+  risk_assessment:
+    - risk: "Generated plans may not match dietary restrictions"
+      impact: "High"
+      mitigation: "Validate restrictions with deterministic filters before returning plans"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/prompts/ceo_strategy.yaml` around lines 87 - 107, The few_shot_examples example is missing required BusinessPlan fields (milestones, risk_assessment, estimated_users_year1, go_to_market); update the single example under few_shot_examples so its output fully matches the BusinessPlan schema by adding a milestones array (3–5 dated, measurable milestones tied to MVP features), a risk_assessment object (key risks with mitigation plans), an estimated_users_year1 numeric or range, and a go_to_market section (channels, launch strategy, and target KPIs); ensure you keep existing keys (vision, target_users, problem_statement, mvp_features, success_metrics, revenue_model) and add the new fields with realistic, concise entries so the example serves as a complete schema-compliant template for the LLM.
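A cheap way to stop few-shot drift like this is a unit test that diffs each example's keys against the schema's required fields. A sketch under assumptions: the field names below are taken from this review, and `missing_fields` is a hypothetical helper, not repo API:

```python
# Required BusinessPlan fields per the review (assumed, not verified against schemas.py)
REQUIRED_PLAN_FIELDS = {
    "vision", "target_users", "problem_statement", "mvp_features",
    "success_metrics", "revenue_model", "milestones", "risk_assessment",
    "estimated_users_year1", "go_to_market",
}


def missing_fields(example_output: dict) -> set:
    """Return required plan fields absent from a few-shot example's output."""
    return REQUIRED_PLAN_FIELDS - example_output.keys()


# The example as reviewed: complete on six fields, missing the other four
example = {
    "vision": "...", "target_users": "...", "problem_statement": "...",
    "mvp_features": [], "success_metrics": [], "revenue_model": "...",
}
assert missing_fields(example) == {
    "milestones", "risk_assessment", "estimated_users_year1", "go_to_market",
}
```

Run against the parsed YAML, such a check would have caught this finding in CI before the prompt shipped.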
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d7f37a30-e281-46cc-97d2-a4adac4fe2e4
📒 Files selected for processing (41)
- .editorconfig
- .github/workflows/python-ci.yml
- .gitignore
- CHANGELOG.md
- CONTRIBUTING.md
- PRIORITIES.md
- Proximus_learning_roadmap.md
- README.md
- agents/__init__.py
- agents/base_agent.py
- agents/ceo_agent.py
- agents/cto_agent.py
- agents/devops_agent.py
- agents/events.py
- agents/frontend_agent.py
- agents/prompts/_shared_critique.yaml
- agents/prompts/backend_codegen.yaml
- agents/prompts/ceo_strategy.yaml
- agents/prompts/cto_architecture.yaml
- agents/prompts/devops_infra.yaml
- agents/prompts/finance_analysis.yaml
- agents/prompts/frontend_codegen.yaml
- agents/prompts/qa_test_gen.yaml
- agents/reasoning.py
- agents/schemas.py
- merge_my_prs.sh
- scripts/check_services.sh
- scripts/create_prs.sh
- tests/__init__.py
- tests/conftest.py
- tests/integration/__init__.py
- tests/integration/test_full_lifecycle.py
- tests/unit/__init__.py
- tests/unit/test_base_agent.py
- tests/unit/test_ceo_agent.py
- tests/unit/test_cto_agent.py
- tests/unit/test_events_schemas.py
- tests/unit/test_file_edit_tool.py
- tests/unit/test_messaging_schemas.py
- tests/unit/test_reasoning_chain.py
- tools/file_edit_tool.py
✅ Files skipped from review due to trivial changes (9)
- tests/unit/__init__.py
- tests/integration/__init__.py
- CHANGELOG.md
- tests/__init__.py
- .editorconfig
- PRIORITIES.md
- .gitignore
- CONTRIBUTING.md
- agents/prompts/_shared_critique.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/python-ci.yml
- agents/frontend_agent.py
```python
__all__ = [
    "AgentEvent",
    "Architecture",
    "BackendAgent",
    "BaseAgent",
    "BusinessPlan",
    "CEOAgent",
    "CTOAgent",
    "DevOpsAgent",
    "FinancialReport",
    "FinanceAgent",
    "FrontendAgent",
    "QAAgent",
    "QAReport",
    "ReasoningChain",
    "ReasoningStep",
]
```
Fix RUF022: __all__ must be sorted (CI failure).
"FinancialReport" sorts after "FinanceAgent" (e < i at position 6), so swap them.
🛠️ Proposed fix
```diff
 __all__ = [
     "AgentEvent",
     "Architecture",
     "BackendAgent",
     "BaseAgent",
     "BusinessPlan",
     "CEOAgent",
     "CTOAgent",
     "DevOpsAgent",
-    "FinancialReport",
     "FinanceAgent",
+    "FinancialReport",
     "FrontendAgent",
     "QAAgent",
     "QAReport",
     "ReasoningChain",
     "ReasoningStep",
 ]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
__all__ = [
    "AgentEvent",
    "Architecture",
    "BackendAgent",
    "BaseAgent",
    "BusinessPlan",
    "CEOAgent",
    "CTOAgent",
    "DevOpsAgent",
    "FinanceAgent",
    "FinancialReport",
    "FrontendAgent",
    "QAAgent",
    "QAReport",
    "ReasoningChain",
    "ReasoningStep",
]
```
🧰 Tools
🪛 GitHub Actions: AI Organization — CI/CD Pipeline
[error] 23-23: ruff check . failed: RUF022 __all__ is not sorted
🪛 GitHub Check: 🧪 Test & Lint
[failure] 23-39: ruff (RUF022)
agents/__init__.py:23:11: RUF022 __all__ is not sorted
help: Apply an isort-style sorting to __all__
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/__init__.py` around lines 23 - 39, The __all__ list is not
alphabetically sorted causing RUF022; reorder the exported symbols so they are
in ascending alphabetical order—specifically swap "FinancialReport" and
"FinanceAgent" (ensure "FinanceAgent" appears before "FinancialReport") and
verify the entire list (AgentEvent, Architecture, BackendAgent, BaseAgent,
BusinessPlan, CEOAgent, CTOAgent, DevOpsAgent, FinanceAgent, FinancialReport,
FrontendAgent, QAAgent, QAReport, ReasoningChain, ReasoningStep) is sorted.
```python
# In Standalone/Local mode, we simulate the success if Terraform is missing
# to prevent blocking the entire local mission.
return {
    "status": "deployed",
    "github_repo": f"local://{project_name}",
    "local_path": deliverables_path,
    "public_url": f"Local Folder: {deliverables_path}",
    "deployment_time_seconds": 1,
    "health_checks_passed": True,
    "provisioning_status": "success",
}
```
Deployment reports success even when Terraform init/plan fail.
init_res and plan_res failures only emit a logger.warning (Lines 143, 150), then the function unconditionally returns status: "deployed", provisioning_status: "success", health_checks_passed: True. This corrupts deployment_info, the persisted deployment_url artifact, and the decision log's recorded outcome — downstream consumers (orchestrator/planner.py, artifact URL fallbacks) will advertise a successful deployment for a mission whose infrastructure provisioning actually failed. The comment acknowledges simulating success "if Terraform is missing", but the code doesn't distinguish "binary missing" from "plan errored" — both paths collapse to success.
🛡️ Proposed fix — propagate real status and only simulate on `terraform not found`
```diff
-    # Init
-    init_res = await self._run_terraform_command(
-        "init", deliverables_path, context
-    )
-    if not init_res.success:
-        logger.warning(f"Terraform init failed: {init_res.error}")
-
-    # Plan
-    plan_res = await self._run_terraform_command(
-        "plan -no-color -out=tfplan", deliverables_path, context
-    )
-    if not plan_res.success:
-        logger.warning(f"Terraform plan failed: {plan_res.error}")
-
-    # In Standalone/Local mode, we simulate the success if Terraform is missing
-    # to prevent blocking the entire local mission.
-    return {
-        "status": "deployed",
-        "github_repo": f"local://{project_name}",
-        "local_path": deliverables_path,
-        "public_url": f"Local Folder: {deliverables_path}",
-        "deployment_time_seconds": 1,
-        "health_checks_passed": True,
-        "provisioning_status": "success",
-    }
+    # Init
+    init_res = await self._run_terraform_command("init", deliverables_path, context)
+    plan_res = None
+    if init_res.success:
+        plan_res = await self._run_terraform_command(
+            "plan -no-color -out=tfplan", deliverables_path, context
+        )
+
+    tf_ok = init_res.success and (plan_res is not None and plan_res.success)
+    # Only mark a local-mode simulated success if the binary is genuinely unavailable.
+    binary_missing = "not found" in (init_res.error or "").lower() or \
+        "no such file" in (init_res.error or "").lower()
+
+    return {
+        "status": "deployed" if tf_ok else ("simulated" if binary_missing else "failed"),
+        "local_path": deliverables_path,
+        "public_url": f"Local Folder: {deliverables_path}",
+        "health_checks_passed": tf_ok,
+        "provisioning_status": "success" if tf_ok else ("simulated" if binary_missing else "failed"),
+        "terraform_init_error": None if init_res.success else init_res.error,
+        "terraform_plan_error": None if (plan_res and plan_res.success) else (plan_res.error if plan_res else "skipped"),
+    }
```
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/devops_agent.py` around lines 152 - 162, The current branch always
returns a successful deployment dict even when Terraform init/plan failed
(init_res and plan_res); change the logic so that you only simulate success when
the failure is due to Terraform binary missing (detect via init_res/plan_res
error type/message or a sentinel like "terraform: not found"), otherwise return
a failed deployment result: set "status" to "failed", "provisioning_status" to
"error" (or include the specific error code), "health_checks_passed" to False,
and include error details (from init_res/plan_res) in the returned dict so
downstream consumers see the real failure; update the code paths that create the
local simulated return (the current return block that yields
"status":"deployed", "provisioning_status":"success", etc.) to only execute for
the explicit terraform-not-found condition and otherwise propagate/record the
real error from init_res/plan_res.
```yaml
few_shot_examples:
  - input: "E-commerce API with user auth, product catalog, and cart"
    output:
      frontend:
        framework: "Next.js 14"
        language: "TypeScript"
        rationale: "SSR for SEO on product pages, React for interactive cart"
      backend:
        framework: "FastAPI"
        language: "Python 3.12"
        rationale: "Async I/O for catalog queries, automatic OpenAPI docs"
      database:
        framework: "PostgreSQL 15"
        language: "SQL"
        rationale: "ACID for cart transactions, JSONB for flexible product attributes"
      estimated_monthly_cost_usd: 85.0
      cost_breakdown:
        - service: "ECS Fargate (backend)"
          monthly_usd: 25.0
        - service: "RDS db.t3.micro"
          monthly_usd: 15.0
        - service: "ALB"
          monthly_usd: 20.0
        - service: "S3 (product images)"
          monthly_usd: 5.0
        - service: "CloudWatch"
          monthly_usd: 10.0
        - service: "ECR + Secrets Manager"
          monthly_usd: 10.0
```
Make the few-shot example match the required schema.
The example omits cache, database_schema, api_contracts, security, scaling_policy, and disaster_recovery, despite the prompt requiring the same JSON schema. This can bias the CTO output toward invalid/incomplete architecture objects.
Either remove the few-shot example or add the missing fields with realistic values.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/prompts/cto_architecture.yaml` around lines 98 - 126, The
few_shot_examples entry for input "E-commerce API with user auth, product
catalog, and cart" is missing required schema fields (cache, database_schema,
api_contracts, security, scaling_policy, disaster_recovery); update the example
under few_shot_examples to include realistic values for each missing field
(e.g., cache: Redis config and rationale; database_schema: high-level
tables/relations for users/products/orders; api_contracts: sample endpoints and
request/response shapes; security: auth, encryption, OWASP mitigations;
scaling_policy: autoscaling rules, instance sizes; disaster_recovery: RTO/RPO
and backup/replication plan) so the example fully matches the prompt schema, or
alternatively remove the entire few_shot_examples item if you prefer not to
provide an example.
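To keep the example and the schema from drifting apart, the check can be automated in review or CI; a minimal sketch, assuming the field names listed in this finding are the full required set:

```python
# Required top-level fields per this finding (assumed to be complete).
REQUIRED_FIELDS = {
    "frontend", "backend", "database", "cache",
    "database_schema", "api_contracts", "security",
    "scaling_policy", "disaster_recovery",
    "estimated_monthly_cost_usd", "cost_breakdown",
}

def missing_fields(example: dict) -> set[str]:
    """Return required top-level keys absent from a few-shot example."""
    return REQUIRED_FIELDS - example.keys()

# The current few-shot example only carries these keys:
current = {
    "frontend": {}, "backend": {}, "database": {},
    "estimated_monthly_cost_usd": 85.0, "cost_breakdown": [],
}
print(sorted(missing_fields(current)))
# → ['api_contracts', 'cache', 'database_schema', 'disaster_recovery', 'scaling_policy', 'security']
```

Run against each entry under `few_shot_examples` after loading the YAML, and fail the check when the returned set is non-empty.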
- terraform/vpc.tf (VPC, subnets, NAT gateway)
- terraform/ecs.tf (cluster, task definition, service)
- terraform/rds.tf (PostgreSQL instance, subnet group)
- terraform/alb.tf (load balancer, target groups, listeners)
- terraform/iam.tf (roles, policies, instance profiles)
- terraform/cloudwatch.tf (log groups, alarms, dashboards)
- Dockerfile (multi-stage build, non-root user)
- .github/workflows/deploy.yml (CI/CD pipeline)

COST RULES:
- If budget < $50: db.t3.micro, 256 CPU Fargate, no NAT gateway
- If budget < $200: db.t3.small, 512 CPU Fargate, single NAT
- If budget >= $200: db.t3.medium, 1024 CPU Fargate, multi-AZ
Resolve the NAT and Multi-AZ budget contradictions.
The file list always asks for a NAT gateway, while the low-budget rule says no NAT. The system prompt also says databases are Multi-AZ, but the cost tiers reserve Multi-AZ for higher budgets. This can push generated Terraform over budget.
🛠️ Proposed prompt adjustment
- - terraform/vpc.tf (VPC, subnets, NAT gateway)
+ - terraform/vpc.tf (VPC, subnets, NAT gateway only when budget permits)
...
- - Multi-AZ for databases, single-AZ for compute (cost savings)
+ - Multi-AZ for production/high-budget databases; single-AZ is acceptable for low-budget MVPs

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/prompts/devops_infra.yaml` around lines 46 - 58, The prompt currently
contradicts itself by always including a NAT gateway in the file list
(terraform/vpc.tf) and enforcing Multi-AZ databases while the COST RULES reserve
"no NAT" and non-Multi-AZ for low budgets; update the prompt so terraform/vpc.tf
and terraform/rds.tf are conditional: make the NAT gateway optional based on
budget (no NAT for budget < $50, single NAT for $50–$199, multi-AZ NAT/HA for
budget >= $200) and make Multi-AZ for the DB only enabled when budget >= $200,
and remove any system-level assertion that databases are always Multi-AZ; also
ensure the COST RULES block explicitly references these conditionals so
.github/workflows/deploy.yml and generated Terraform respect budget-driven
choices.
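The budget conditionals can be captured in one place so the file list and the COST RULES cannot contradict each other; a sketch using the tiers quoted above (the dict keys are illustrative, not the prompt's actual output schema):

```python
def infra_tier(budget_usd: float) -> dict:
    """Map a monthly budget to the cost-tier choices in the prompt.
    NAT gateways and Multi-AZ are enabled only when the budget permits."""
    if budget_usd < 50:
        return {"db": "db.t3.micro", "fargate_cpu": 256,
                "nat_gateways": 0, "multi_az": False}
    if budget_usd < 200:
        return {"db": "db.t3.small", "fargate_cpu": 512,
                "nat_gateways": 1, "multi_az": False}
    return {"db": "db.t3.medium", "fargate_cpu": 1024,
            "nat_gateways": 2, "multi_az": True}

print(infra_tier(85))
# → {'db': 'db.t3.small', 'fargate_cpu': 512, 'nat_gateways': 1, 'multi_az': False}
```

With a single tier function, the VPC and RDS file descriptions can both reference `nat_gateways` and `multi_az` instead of asserting NAT/Multi-AZ unconditionally.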
Generate a JSON dictionary of test files:
- tests/conftest.py (shared fixtures: test DB, client, auth_headers)
- tests/test_auth.py (register, login, token validation)
- tests/test_items.py (CRUD operations, pagination, ownership)
- tests/test_security.py (SQL injection, XSS, JWT tampering, large payloads)
- tests/test_health.py (health check, root, docs endpoints)
Do not hard-code items tests for every API contract.
For APIs without an item resource, this prompt can generate failing tests unrelated to the provided contracts. Make the file layout contract-derived and keep test_items.py only as an example when matching endpoints exist.
🛠️ Proposed prompt adjustment
- Generate a JSON dictionary of test files:
- - tests/conftest.py (shared fixtures: test DB, client, auth_headers)
- - tests/test_auth.py (register, login, token validation)
- - tests/test_items.py (CRUD operations, pagination, ownership)
- - tests/test_security.py (SQL injection, XSS, JWT tampering, large payloads)
- - tests/test_health.py (health check, root, docs endpoints)
+ Generate a JSON dictionary of test files derived from the API contracts:
+ - tests/conftest.py (shared fixtures: test DB, client, auth_headers)
+ - tests/test_auth.py only when auth endpoints exist
+ - tests/test_<resource>.py for each resource group present in the contracts
+ - tests/test_security.py for contract-applicable security scenarios
+ - tests/test_health.py only when health/root/docs endpoints exist
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/prompts/qa_test_gen.yaml` around lines 43 - 48, The prompt currently
always generates a tests/test_items.py file causing failing tests for APIs that
lack an item resource; update the prompt text to make test file generation
contract-driven by checking the API contract for item-like endpoints (e.g.,
resource named "items", CRUD paths like GET /items, POST /items/{id}, or OpenAPI
tags/paths); only include tests/test_items.py when such endpoints exist and
otherwise omit or replace it with resource-specific test file names (reference
the generated filenames tests/test_items.py, tests/test_auth.py,
tests/test_security.py and the logic that decides to emit them) so the test
layout is derived from the contract rather than hard-coded.
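A contract-derived layout might be computed roughly like this (a sketch; the path-based grouping heuristic is an assumption, not the prompt's actual logic):

```python
def derive_test_files(paths: list[str]) -> list[str]:
    """Build a contract-driven test file list from API paths.
    conftest.py is always emitted; resource files only when endpoints exist."""
    files = {"tests/conftest.py"}
    resources = set()
    for p in paths:
        first = p.strip("/").split("/")[0]
        if first in {"auth", "login", "register", "token"}:
            files.add("tests/test_auth.py")
        elif first in {"", "health", "healthz", "docs"}:
            files.add("tests/test_health.py")
        else:
            resources.add(first)
    files.update(f"tests/test_{r}.py" for r in resources)
    if resources:
        # Security scenarios only make sense against real resources.
        files.add("tests/test_security.py")
    return sorted(files)

print(derive_test_files(["/auth/login", "/products", "/health"]))
```

An API with no item-like resource then never receives a `tests/test_items.py`, which is the failure mode this finding describes.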
* **[Developer Hub](./docs/DEVELOPER_HUB.md)**: The central technical entry point for the polyglot codebase.
* **[Architecture Guide](./architecture.md)**: High-level system design and data flow.
* **[API Reference](./docs/API_REFERENCE.md)**: Go Gateway routes and WebSocket events.
* **[Desktop Mastery](./docs/DESKTOP_MASTERY.md)**: Internals of the local standalone engine.
* **[Enterprise Setup](./docs/ENTERPRISE_SAS_GUIDE.md)**: How to run the full Docker-based SaaS stack.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
# Check local README markdown links that point to repository files.
python - <<'PY'
from pathlib import Path
import re
readme = Path("README.md")
text = readme.read_text()
failed = False
for raw in re.findall(r'\]\((\./[^)#]+)\)', text):
target = Path(raw[2:])
if not target.exists():
print(f"Missing README link target: {raw}")
failed = True
raise SystemExit(1 if failed else 0)
PY

Repository: DsThakurRawat/Autonomous-Multi-Agent-AI-Organization
Length of output: 312
Add missing documentation files or remove broken links from README.
The README references four documentation files that do not exist in the repository:
- ./docs/DEVELOPER_HUB.md
- ./docs/API_REFERENCE.md
- ./docs/DESKTOP_MASTERY.md
- ./docs/ENTERPRISE_SAS_GUIDE.md
Either create these files or remove the corresponding links from lines 56–60 to prevent shipping broken documentation references.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 56 - 60, The README contains broken documentation
links for "Developer Hub", "API Reference", "Desktop Mastery", and "Enterprise
Setup"; either add the missing markdown files with the matching link targets
(creating docs/DEVELOPER_HUB.md, docs/API_REFERENCE.md, docs/DESKTOP_MASTERY.md,
docs/ENTERPRISE_SAS_GUIDE.md) populated with the appropriate content, or
remove/replace those four link lines in README.md so they no longer point to
non-existent documentation; update any navigation or TOC that references these
link labels to keep consistency (search for the exact link labels "Developer
Hub", "API Reference", "Desktop Mastery", "Enterprise Setup" to locate the lines
to change).
# Step 0: Commit all uncommitted changes to the feature branch
echo ""
echo "── Step 0: Commit all changes to feature branch ──"
git checkout "$FEATURE_BRANCH"
git add -A
git commit -m "hardening v2: test repairs, prompt templates, project hygiene" || echo "Nothing new to commit"
git push origin "$FEATURE_BRANCH" --force
Do not auto-commit and force-push the caller’s entire working tree.
This can publish unrelated edits or secrets and rewrite the shared feature branch. Require a clean tree and let the user commit intentionally.
Safer preflight
-# Step 0: Commit all uncommitted changes to the feature branch
+# Step 0: Require explicit commits before automation
echo ""
-echo "── Step 0: Commit all changes to feature branch ──"
+echo "── Step 0: Verify clean working tree ──"
git checkout "$FEATURE_BRANCH"
-git add -A
-git commit -m "hardening v2: test repairs, prompt templates, project hygiene" || echo "Nothing new to commit"
-git push origin "$FEATURE_BRANCH" --force
+if ! git diff --quiet || ! git diff --cached --quiet; then
+ echo "Working tree has uncommitted changes. Commit or stash them before running this script." >&2
+ exit 1
+fi
+git push origin "$FEATURE_BRANCH" --force-with-lease
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/create_prs.sh` around lines 15 - 21, The script currently auto-adds,
commits everything and force-pushes the FEATURE_BRANCH (git add -A; git commit
-m ... || echo ...; git push ... --force), which can publish unrelated changes
or secrets; change it to abort if the working tree is dirty and prompt the user
to review/commit explicitly: check git status --porcelain and if non-empty print
a clear message instructing the caller to review and commit (or provide an
explicit --auto-commit flag), do not run git add -A or git commit automatically,
and replace the unconditional git push ... --force with a safe push (git push
origin "$FEATURE_BRANCH") or require a confirmation prompt before force-pushing.
Ensure you reference FEATURE_BRANCH, the git commit invocation, and the git push
--force call when making the change.
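The clean-tree check reduces to a pure function over `git status --porcelain` output, which keeps it unit-testable without a repository; a sketch (the `subprocess` wiring in the comment is illustrative):

```python
def is_clean_tree(porcelain_output: str) -> bool:
    """True when `git status --porcelain` reports no pending changes.
    Untracked files (`??`) also count as dirty, matching `git add -A` scope."""
    return not any(line.strip() for line in porcelain_output.splitlines())

# Assumed wiring in the script's preflight step:
# import subprocess
# out = subprocess.run(["git", "status", "--porcelain"],
#                      capture_output=True, text=True, check=True).stdout
# if not is_clean_tree(out):
#     raise SystemExit("Working tree dirty; commit or stash before running.")
```

Because porcelain output is a stable, script-oriented format, this check behaves the same regardless of the user's git configuration or locale.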
gh pr create \
  --title "fix(tests): repair broken test imports and assertions" \
  --body "## What
- \`test_messaging_schemas.py\`: rewrote to match actual schema class names (\`ResultMessage\`, not \`TaskResultMessage\`)
- \`test_base_agent.py\`: updated mock LLM test for new \`(text, usage)\` tuple return
- \`test_file_edit_tool.py\`: fixed difflib fuzzy test — 8-line block keeps SequenceMatcher ratio ≥ 0.85

## Tests
All 236 unit tests passing." \
  --base "$MAIN_BRANCH" \
  --head "$BRANCH" || echo "PR already exists"
echo "Merging PR 1..."
gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
sleep 8
git checkout "$MAIN_BRANCH"
git pull origin "$MAIN_BRANCH"

# ─── PR 2: DevOps _emit_thinking cleanup ─────────────────────
BRANCH="fix/devops-emit-cleanup"
echo ""
echo "══ PR 2: $BRANCH ══"
git checkout -b "$BRANCH" "$MAIN_BRANCH" 2>/dev/null || git checkout "$BRANCH"
git checkout "$FEATURE_BRANCH" -- agents/devops_agent.py
git add -A
git commit -m "fix(devops): remove _emit_thinking wrapper, use self.emit() directly

Replaced 3 call sites with self.emit() and deleted the unnecessary
indirection wrapper. Consistent with all other agents." || echo "Already committed"
git push origin "$BRANCH" --force
gh pr create \
  --title "fix(devops): remove _emit_thinking wrapper" \
  --body "## What
Removed \`_emit_thinking\` wrapper method from \`DevOpsAgent\` and replaced all 3 call sites with direct \`self.emit()\` calls.

## Why
Every other agent uses \`self.emit()\` directly. This wrapper was unnecessary indirection." \
  --base "$MAIN_BRANCH" \
  --head "$BRANCH" || echo "PR already exists"
echo "Merging PR 2..."
gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
sleep 8
git checkout "$MAIN_BRANCH"
git pull origin "$MAIN_BRANCH"

# ─── PR 3: Add 5 missing prompt templates ────────────────────
BRANCH="feat/agent-prompt-templates"
echo ""
echo "══ PR 3: $BRANCH ══"
git checkout -b "$BRANCH" "$MAIN_BRANCH" 2>/dev/null || git checkout "$BRANCH"
git checkout "$FEATURE_BRANCH" -- \
  agents/prompts/backend_codegen.yaml \
  agents/prompts/frontend_codegen.yaml \
  agents/prompts/qa_test_gen.yaml \
  agents/prompts/devops_infra.yaml \
  agents/prompts/finance_analysis.yaml
git add -A
git commit -m "feat(prompts): add YAML prompt templates for 5 remaining agents

- backend_codegen.yaml: FastAPI code gen with security critique
- frontend_codegen.yaml: Next.js/React with accessibility critique
- qa_test_gen.yaml: pytest generation with coverage gap analysis
- devops_infra.yaml: Terraform/AWS with cost-tier rules
- finance_analysis.yaml: AWS cost analysis with alert thresholds

All templates follow the analyze→generate→critique→refine pattern." || echo "Already committed"
git push origin "$BRANCH" --force
gh pr create \
  --title "feat(prompts): add YAML prompt templates for 5 remaining agents" \
  --body "## What
Created structured YAML prompt templates for Backend, Frontend, QA, DevOps, and Finance agents.

All 8/8 agents now have externalized YAML-based prompts following the \`analyze→generate→critique→refine\` chain-of-thought pattern." \
  --base "$MAIN_BRANCH" \
  --head "$BRANCH" || echo "PR already exists"
echo "Merging PR 3..."
gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
sleep 8
git checkout "$MAIN_BRANCH"
git pull origin "$MAIN_BRANCH"

# ─── PR 4: Frontend agent dead code removal ──────────────────
BRANCH="refactor/frontend-agent-cleanup"
echo ""
echo "══ PR 4: $BRANCH ══"
git checkout -b "$BRANCH" "$MAIN_BRANCH" 2>/dev/null || git checkout "$BRANCH"
git checkout "$FEATURE_BRANCH" -- agents/frontend_agent.py
git add -A
git commit -m "refactor(frontend): remove dead backend code from FrontendAgent

- Removed BACKEND_SYSTEM_PROMPT constant (copy-paste from backend_agent)
- Removed _generate_backend() method (~50 lines of dead code)
- Fixed module docstring to describe frontend-only scope" || echo "Already committed"
git push origin "$BRANCH" --force
gh pr create \
  --title "refactor(frontend): remove dead backend code from FrontendAgent" \
  --body "## What
Removed \`BACKEND_SYSTEM_PROMPT\` and \`_generate_backend()\` from \`frontend_agent.py\`.

## Impact
~65 lines of dead code removed. No behavior change." \
  --base "$MAIN_BRANCH" \
  --head "$BRANCH" || echo "PR already exists"
echo "Merging PR 4..."
gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
sleep 8
git checkout "$MAIN_BRANCH"
git pull origin "$MAIN_BRANCH"

# ─── PR 5: Agent exports + test __init__.py ──────────────────
BRANCH="chore/agent-exports-test-init"
echo ""
echo "══ PR 5: $BRANCH ══"
git checkout -b "$BRANCH" "$MAIN_BRANCH" 2>/dev/null || git checkout "$BRANCH"
git checkout "$FEATURE_BRANCH" -- \
  agents/__init__.py \
  tests/__init__.py \
  tests/unit/__init__.py \
  tests/integration/__init__.py
git add -A
git commit -m "chore: update agents/__init__.py exports + add test __init__.py files

- Export AgentEvent, ReasoningChain, ReasoningStep, and Pydantic schemas
- Add __init__.py to tests/, tests/unit/, tests/integration/" || echo "Already committed"
git push origin "$BRANCH" --force
gh pr create \
  --title "chore: update agent exports + test package init files" \
  --body "## What
1. **\`agents/__init__.py\`**: Added exports for \`AgentEvent\`, \`ReasoningChain\`, schemas
2. **Test \`__init__.py\`**: Prevents \`ModuleNotFoundError\` in some pytest configs" \
  --base "$MAIN_BRANCH" \
  --head "$BRANCH" || echo "PR already exists"
echo "Merging PR 5..."
gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
sleep 8
git checkout "$MAIN_BRANCH"
git pull origin "$MAIN_BRANCH"

# ─── PR 6: Project hygiene ───────────────────────────────────
BRANCH="chore/project-hygiene"
echo ""
echo "══ PR 6: $BRANCH ══"
git checkout -b "$BRANCH" "$MAIN_BRANCH" 2>/dev/null || git checkout "$BRANCH"
git checkout "$FEATURE_BRANCH" -- \
  .editorconfig \
  .gitignore \
  .github/workflows/python-ci.yml
git add -A
git commit -m "chore: add .editorconfig, update .gitignore, add CI coverage gate

- .editorconfig: enforce 4-space Python, 2-space YAML/JS
- .gitignore: add uvicorn.log, dashboard_build*.log patterns
- python-ci.yml: add --cov-fail-under=70" || echo "Already committed"
git push origin "$BRANCH" --force
gh pr create \
  --title "chore: .editorconfig + CI coverage gate + gitignore updates" \
  --body "## What
1. **\`.editorconfig\`**: Consistent formatting rules
2. **\`.gitignore\`**: Added log file patterns
3. **\`python-ci.yml\`**: Added \`--cov-fail-under=70\` coverage gate

## Impact
No code changes. CI now enforces minimum 70% test coverage." \
  --base "$MAIN_BRANCH" \
  --head "$BRANCH" || echo "PR already exists"
echo "Merging PR 6..."
gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
Remove checked-in admin auto-merge automation.
The script creates PRs and immediately merges them, using --admin first. That can bypass review/branch-protection controls for multiple branches from a local script. Prefer create-only automation and let normal CI/review gates merge the PRs.
Safer pattern
-echo "Merging PR 1..."
-gh pr merge "$BRANCH" --squash --delete-branch --admin || gh pr merge "$BRANCH" --squash --delete-branch
-sleep 8
-git checkout "$MAIN_BRANCH"
-git pull origin "$MAIN_BRANCH"
+echo "Created PR for $BRANCH. Wait for CI and merge through the normal review process."Apply the same removal to each PR block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/create_prs.sh` around lines 47 - 212, The script currently creates
PRs and then immediately merges them using gh pr merge with the --admin flag
(see the repeated "echo 'Merging PR ...'" blocks and gh pr merge "$BRANCH"
--squash --delete-branch --admin || gh pr merge ...), which bypasses review and
branch-protection; remove the auto-merge steps by deleting or commenting out the
gh pr merge invocations (and their fallback duplicates) and the surrounding
"Merging PR ..." echo/sleep/git checkout post-merge lines for each PR block
(references: occurrences of gh pr merge "$BRANCH" and the echo "Merging PR X..."
lines, and the final git checkout "$MAIN_BRANCH"/git pull calls tied to those
merges), leaving only gh pr create so PRs are created but not auto-merged so
normal CI/review gates can handle merges.
# ------- Tier 1: Exact match -------
exact_count = content.count(old_string)

if exact_count == 1 or (exact_count > 1 and replace_all):
    new_content = content.replace(
        old_string, new_string, -1 if replace_all else 1
    )
Reject empty old_string before replacement.
content.count("") is non-zero, and content.replace("", new_string, -1) inserts text between every character when replace_all=True, corrupting the file.
🛡️ Proposed guard
+ if old_string == "":
+ return ToolResult(
+ success=False,
+ output="",
+ error="old_string must not be empty.",
+ )
+
# ------- Tier 1: Exact match -------
     exact_count = content.count(old_string)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/file_edit_tool.py` around lines 62 - 68, The code performs a
replacement using content.count(old_string) and content.replace(..., -1 if
replace_all else 1) without guarding against an empty old_string, which causes
content.replace("", ...) to insert text between every character; add an explicit
guard early (before computing exact_count or calling content.replace) that
rejects empty old_string (e.g., raise a ValueError or return an error) when
old_string == "" so functions using exact_count, content.count(old_string),
content.replace(...), replace_all and new_content never execute with empty
old_string.
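The failure mode follows directly from Python's `str` semantics: an empty needle matches at every character boundary, so both the count check and the replacement misbehave:

```python
content = "abc"

# count("") returns len + 1 boundary positions, so the
# "exactly one match" check can never hold for an empty needle.
print(content.count(""))          # → 4

# replace("", ...) with no limit inserts the new text at every
# boundary, which is why replace_all=True would corrupt the file.
print(content.replace("", "-"))   # → -a-b-c-
```

Rejecting `old_string == ""` up front, as the proposed guard does, is the only safe behavior for a surgical edit tool.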
# ------- Tier 2: Normalized whitespace -------
match = self._normalized_find(content, old_string)
if match is not None:
    start, end = match
    new_content = content[:start] + new_string + content[end:]
    return self._write_and_respond(
        full_path,
        file_path,
        content,
        new_content,
        match_count=1,
        strategy="normalized_whitespace",
        confidence=0.95,
    )

# ------- Tier 3: Line-level difflib -------
match = self._difflib_find(content, old_string)
if match is not None:
    start, end, ratio = match
    new_content = content[:start] + new_string + content[end:]
    return self._write_and_respond(
        full_path,
        file_path,
        content,
        new_content,
        match_count=1,
        strategy="difflib",
        confidence=round(ratio, 4),
    )
Apply the same ambiguity safety to fuzzy matches.
Exact matching fails on multiple matches, but normalized and difflib tiers silently replace the first/best match. For surgical agent edits, ambiguous fuzzy candidates should fail unless the caller provides more context.
Require fuzzy helpers to return all above-threshold candidates, then reject len(matches) > 1 with an error asking for more surrounding context.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/file_edit_tool.py` around lines 92 - 120, The normalized and difflib
fuzzy-match tiers currently take the first/ best candidate and apply edits;
change _normalized_find and _difflib_find to return all above-threshold
candidate spans (e.g., lists of (start,end) or (start,end,ratio)), then in the
caller logic (the block using _normalized_find and the block using
_difflib_find) check the returned list: if len(matches) == 0 continue to next
tier, if len(matches) > 1 return an error/abort asking the caller to provide
more surrounding context (don’t perform any replacement), and only when
len(matches) == 1 proceed to build new_content and call _write_and_respond (use
the single match tuple and preserve using confidence from ratio for difflib).
Ensure _write_and_respond is used unchanged for the single-match path.
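One way to make the fuzzy tiers ambiguity-safe is to collect every above-threshold candidate and refuse to edit when more than one survives; a line-window sketch with `difflib` (the helper name and window strategy are assumptions, not the tool's actual internals):

```python
import difflib

def fuzzy_candidates(content: str, target: str, threshold: float = 0.85):
    """Return (start_line, end_line, ratio) for every line window
    whose similarity to `target` meets the threshold."""
    lines = content.splitlines()
    width = len(target.splitlines())
    out = []
    for i in range(len(lines) - width + 1):
        window = "\n".join(lines[i:i + width])
        ratio = difflib.SequenceMatcher(None, window, target).ratio()
        if ratio >= threshold:
            out.append((i, i + width, ratio))
    return out

content = "foo = 1\nbar = 2\nfoo = 1\n"
matches = fuzzy_candidates(content, "foo = 1")
if len(matches) > 1:
    # Surgical-edit policy: ambiguous, ask the caller for more context.
    print(f"ambiguous: {len(matches)} candidates")  # → ambiguous: 2 candidates
```

Only the `len(matches) == 1` path would proceed to `_write_and_respond`; zero matches falls through to the next tier, and multiple matches returns an error asking for more surrounding context.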
Hardening & Optimization Phase
- `await` for `_scrub_text` in `base_agent.py`.
- Redis budget key (`budget:{project_id}:used`).
- `SagaCoordinator` to use Redis for storage (persists through restarts).
- `*` default in `config.go`.
- Profiles (`infra`, `bus`, `core`, `agents`, `obs`) for modular startup.
- `docker-compose.yml`.
- `Makefile` with modular start commands.

This PR addresses all critical blockers identified in the local scorecard and optimizes the 22-service stack for better resource management.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Documentation