The Nerves of the AGI OS: training, fine-tuning, experience collection, and model deployment
JARVIS Reactor (Reactor-Core) is the training and learning layer of the JARVIS AGI ecosystem. It provides ML training (DPO, RLHF, curriculum, meta-learning, world models, causal reasoning), model serving with hot-reload, experience collection from JARVIS Body, model deployment to JARVIS-Prime, and Trinity Protocol integration for cross-repo coordination. As of v244.0 (JARVIS Body-side), command lifecycle events now flow through TrinityEventBus providing richer training signals (intent, domain, execution outcomes) for DPO pair generation, and brain vacuum fallback properly classifies commands during J-Prime downtime (producing valid telemetry even during outages). It is started either standalone (run_reactor.py) or by the unified supervisor in JARVIS (python3 unified_supervisor.py).
This session extends Reactor-Core's Trinity ingestion surface so biometric unlock outcomes become first-class learning signals instead of being dropped or collapsed into generic events.
backend/core/telemetry/events.py now defines three unlock-specific events:
- VOICE_UNLOCK_GRANTED
- VOICE_UNLOCK_DENIED
- VOICE_UNLOCK_ROUTING
These events distinguish route correctness from auth outcome, enabling cleaner causal analysis in training data.
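A sketch of what this event separation might look like. The three event names come from events.py; the Enum shape and the string values are assumptions for illustration:

```python
from enum import Enum

class TelemetryEvent(str, Enum):
    # Hypothetical enum shape; the real events.py may organize these differently.
    VOICE_UNLOCK_GRANTED = "voice_unlock_granted"   # auth succeeded
    VOICE_UNLOCK_DENIED = "voice_unlock_denied"     # auth failed
    VOICE_UNLOCK_ROUTING = "voice_unlock_routing"   # route decision, separate from auth

# Route correctness and auth outcome stay separable downstream:
routing_events = {e for e in TelemetryEvent if e.name.endswith("_ROUTING")}
auth_events = set(TelemetryEvent) - routing_events
```

Keeping routing and auth in disjoint event types is what lets a trainer attribute a failure to the router versus the authenticator.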
backend/api/unified_command_processor.py::_handle_voice_unlock_action() now emits auth outcome telemetry after each unlock attempt using fire-and-forget semantics.
- Preserves unlock UX latency while still producing training telemetry.
- Captures both positive and negative authentication outcomes.
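A minimal sketch of the fire-and-forget pattern described above (function names are hypothetical; the real handler lives in unified_command_processor.py):

```python
import asyncio

async def emit_telemetry(event: str, payload: dict) -> None:
    # Stand-in for the real async telemetry sink.
    await asyncio.sleep(0)

def emit_fire_and_forget(event: str, payload: dict) -> "asyncio.Task":
    # Schedule emission without awaiting it, so the unlock path returns
    # to the user immediately; errors are consumed, never raised to UX.
    task = asyncio.create_task(emit_telemetry(event, payload))
    task.add_done_callback(lambda t: t.cancelled() or t.exception())
    return task

async def handle_unlock(granted: bool) -> str:
    event = "VOICE_UNLOCK_GRANTED" if granted else "VOICE_UNLOCK_DENIED"
    emit_fire_and_forget(event, {"granted": granted})
    return "unlocked" if granted else "denied"
```

The done-callback retrieves any exception so a failed emission cannot surface as an unhandled-task warning in the unlock path.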
reactor_core/ingestion/telemetry_ingestor.py now maps unlock event types to InteractionOutcome categories used by the training pipeline.
This allows unlock routing/auth behavior to be represented in downstream datasets with consistent labels rather than ad hoc parsing.
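The mapping might look like the following sketch. Only the unlock event names come from the source; the InteractionOutcome members shown here are assumed labels, not the pipeline's real set:

```python
from enum import Enum, auto

class InteractionOutcome(Enum):
    # Hypothetical subset of the training pipeline's label set.
    SUCCESS = auto()
    FAILURE = auto()
    ROUTING = auto()

UNLOCK_EVENT_OUTCOMES = {
    "VOICE_UNLOCK_GRANTED": InteractionOutcome.SUCCESS,
    "VOICE_UNLOCK_DENIED": InteractionOutcome.FAILURE,
    "VOICE_UNLOCK_ROUTING": InteractionOutcome.ROUTING,
}

def classify(event_type: str):
    # Unknown event types fall through to None instead of ad hoc parsing.
    return UNLOCK_EVENT_OUTCOMES.get(event_type)
```

A static table like this gives downstream datasets one consistent label per event type, which is the hygiene property the ingestor is after.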
With unlock-specific event separation, Reactor can now support:
- Routing quality analysis: detect cases where unlock intent took non-unlock routes before eventual correction.
- Auth outcome baselines: track unlock grant/deny trends over time.
- Preference data hygiene: avoid mixing biometric events into unrelated workspace/general quality metrics.
Cross-repo routing nuance tests for unlock phrasing passed 50/50 this session, indicating stable end-to-end routing for the tested command variants.
| Role | Repository | Responsibility |
|---|---|---|
| Body | JARVIS (JARVIS-AI-Agent) | macOS integration, computer use, unified supervisor, voice/vision |
| Mind | JARVIS-Prime | LLM inference, Neural Orchestrator Core, OpenAI-compatible API |
| Nerves | Reactor-Core (this repo) | Training, fine-tuning, experience collection, model deployment, Trinity coordination |
Reactor-Core is the nervous system: it trains and improves models, collects experience from JARVIS, and deploys models to JARVIS-Prime. The unified supervisor in JARVIS discovers and starts Reactor-Core (default port 8090) alongside JARVIS-Prime (8000) and the JARVIS backend (8010).
JARVIS Reactor is a production-grade ML infrastructure combining:
- Advanced Training Methods: DPO, RLHF, Constitutional AI, Curriculum Learning, Meta-Learning, World Models, Causal Reasoning
- Model Serving: Hot-reload model server with multi-backend support (vLLM, llama.cpp, MLX, Transformers)
- Async Infrastructure: Circuit breakers, backpressure, bulkheads, dead letter queues, structured concurrency
- API Platform: FastAPI server with telemetry, scheduling, model registry, health monitoring
- Trinity Orchestration: Multi-repo coordination with heartbeat monitoring and state sync
- Event Streaming: Real-time WebSocket/Redis pub-sub across JARVIS ecosystem
- GCP Integration: Spot VM resilience, Cloud SQL storage, auto-checkpointing
- MLForge C++ Core: High-performance ML primitives (optional submodule)
- Unified Supervisor: One-command startup for the entire AGI OS ecosystem (python3 run_supervisor.py)
- Connection Pooling: Efficient HTTP/Redis connection management with automatic lifecycle
- Dynamic Configuration: Zero hardcoding, XDG-compliant paths, environment-driven config
- Structured Concurrency: Python 3.11+ TaskGroup patterns for robust async operations
- Architecture
- Key Features
- Installation
- Quick Start
- Unified Supervisor (One-Command Startup)
- Advanced Features
- Integration Architecture
- Configuration Guide
- API Documentation
- Troubleshooting
- Development Guide
- Version History
- Roadmap
- Links
+--------------------------------------------------------------+
|               AGI OS UNIFIED SUPERVISOR v92.0                |
|                  (Central Coordination Hub)                  |
|                  python3 run_supervisor.py                   |
+--------------------------------------------------------------+
                               |
           +-------------------+--------------------+
           |                   |                    |
           v                   v                    v
     +-----------+      +--------------+      +-----------+
     |  JARVIS   |----->|   TRINITY    |----->|  J-PRIME  |
     |  (Body)   |Events| ORCHESTRATOR |Events|  (Mind)   |
     |           |      |              |      |           |
     |   macOS   |      |  Heartbeats  |      |    LLM    |
     |  Actions  |      |   Commands   |      | Inference |
     +-----------+      |  State Sync  |      +-----------+
                        +--------------+
                               |
             +-----------------+-----------------+
             |                 |                 |
             v                 v                 v
      +--------------+  +--------------+  +--------------+
      | REACTOR CORE |  |    ONLINE    |  | DISTRIBUTED  |
      |   (Nerves)   |  |   LEARNING   |  |   TRAINING   |
      |              |  |              |  |              |
      |   Training   |  |  Experience  |  |   Multi-VM   |
      |   Learning   |  |    Replay    |  |   Gradient   |
      |   Serving    |  |  EWC/Drift   |  |     Sync     |
      +--------------+  +--------------+  +--------------+
+----------------------------------------------------------------------+
|             REACTOR CORE v2.10.0 (AGI OS Nervous System)             |
+----------------------------------------------------------------------+

  UNIFIED API SERVER (v77.0)
    - Telemetry Collector (+ WebSocket)
    - Night Scheduler (+ Cron Jobs)
    - Model Registry (+ A/B Testing)
    - Health Aggregator
    - Cost Tracker

  HOT-RELOAD MODEL SERVER (v77.1)
    - Multi-backend: vLLM, llama.cpp, MLX, Transformers
    - Zero-downtime model swaps via file watcher
    - LRU cache with memory-aware eviction
    - Priority request queue for SLA compliance
    - Semantic response caching (hash-based deduplication)
    - Circuit breaker for backend failure protection

  ADVANCED TRAINING ENGINE (v76.0-v80.0)
    Experience Buffer -> Data Selector -> Training Router
    - DPO Trainer: preference learning, memory efficient
    - RLHF Pipeline: PPO algorithm, reward modeling, value functions
    - Constitutional AI: self-supervised safety alignment
    - Curriculum Learning: progressive difficulty, adaptive scheduling
    - Meta-Learning: MAML/Reptile, few-shot learning, task adaptation
    - World Models: latent dynamics, planning, counterfactual reasoning
    - Causal Reasoning: SCMs, do-calculus, causal discovery
    - FSDP Training: multi-GPU/node, gradient sharding, memory efficient
    - Federated Learning: cross-repo, Byzantine-robust, privacy-preserving

  ASYNC INFRASTRUCTURE (v76.1, v92.0)
    - CircuitBreaker, Backpressure, DeadLetterQueue
    - Bulkhead, HealthMonitor, AdaptiveRateLimiter
    - TimeoutPolicy, MetricsCollector, AsyncRetry
    - StructuredTaskGroup, ConnectionPool, AsyncBarrier
    - AsyncContextGroup, AsyncLatch, ScatterGather

  TRINITY ORCHESTRATOR (v75.0)
    - Multi-repo heartbeat monitoring (JARVIS, Prime, Reactor)
    - Command routing with intelligent load balancing
    - State reconciliation across the distributed system
    - Dead Letter Queue for failed commands with auto-retry
    - Atomic file I/O (zero-corruption operations)
    - Circuit breakers for fault tolerance

  ONLINE LEARNING & DATA VERSIONING (v91.0)
    - Prioritized experience replay with importance sampling
    - Elastic Weight Consolidation (EWC) - prevents forgetting
    - Concept Drift Detection (Page-Hinkley test)
    - Data Versioning: content-addressed storage (DVC compatible)
    - Lineage tracking and reproducibility

  DISTRIBUTED TRAINING (v91.0)
    - Multi-VM coordination with gradient compression
    - GCP Spot VM checkpointing with predictive preemption
    - Dynamic resource allocation with cost-aware decisions
    - Gradient checksum validation

  EVENT STREAMING (v10.3)
    - WebSocket real-time events with priority queues
    - Redis pub/sub (optional) for scale
    - Safety audit trail with kill switch
    - Cost tracking & budget alerts
    - Multi-transport: WebSocket, file-watching, Redis

         |                     |                      |
         v                     v                      v
  +--------------+      +--------------+      +---------------+
  | MLForge C++  |      |  Cloud SQL   |      |  GCP Storage  |
  |  (Optional)  |      | (Events DB)  |      | (Checkpoints) |
  |   pybind11   |      |  PostgreSQL  |      |  GCS Bucket   |
  +--------------+      +--------------+      +---------------+
JARVIS-Reactor/
├── reactor_core/
│   ├── training/                    # Advanced training methods
│   │   ├── advanced_training.py     # DPO, RLHF, Constitutional AI (2,899 lines)
│   │   ├── unified_pipeline.py      # End-to-end training orchestration
│   │   ├── trainer.py               # Base trainer class
│   │   └── lora.py                  # LoRA/QLoRA implementations
│   ├── serving/                     # Model serving infrastructure
│   │   ├── model_server.py          # Hot-reload model server (1,545 lines)
│   │   └── inference_engine.py      # Multi-backend inference (1,891 lines)
│   ├── api/                         # REST API server
│   │   ├── server.py                # FastAPI endpoints (2,252 lines)
│   │   ├── telemetry.py             # Metrics & observability (1,128 lines)
│   │   ├── scheduler.py             # Night Shift scheduler (1,030 lines)
│   │   ├── model_registry.py        # Model versioning (1,301 lines)
│   │   └── health_aggregator.py     # Health monitoring (999 lines)
│   ├── orchestration/               # Trinity coordination
│   │   └── trinity_orchestrator.py  # Multi-repo orchestrator
│   ├── utils/                       # Core utilities
│   │   ├── async_helpers.py         # Async patterns (1,746 lines)
│   │   └── dependencies.py          # Dependency injection (913 lines)
│   ├── integration/                 # Cross-repo integration
│   │   ├── event_bridge.py          # Event streaming
│   │   ├── cost_bridge.py           # Cost tracking
│   │   ├── jarvis_connector.py      # JARVIS integration
│   │   └── prime_connector.py       # Prime integration
│   ├── eval/                        # Model evaluation
│   │   └── advanced_evaluation.py   # Comprehensive eval suite (1,536 lines)
│   ├── data/                        # Data loading & preprocessing
│   ├── gcp/                         # GCP Spot VM support
│   └── config/                      # Configuration management
├── run_supervisor.py                # AGI OS unified supervisor (1,635 lines)
├── mlforge/                         # C++ ML core (submodule)
├── docker/                          # Docker configurations
├── scripts/                         # Utility scripts
└── tests/                           # Test suite
Total: ~18,996+ lines of production code added in v75.0-v77.1
- DPO (Direct Preference Optimization): Preference learning without reward models
- RLHF (Reinforcement Learning from Human Feedback): Full PPO pipeline
- Constitutional AI: Self-supervised safety alignment
- Curriculum Learning: Progressive difficulty scheduling
- Memory Management: Dynamic batch sizing, gradient checkpointing, CPU offloading
- FSDP Support: Fully Sharded Data Parallel for large models
- Experience Replay: Priority-based sampling from interaction logs
- CircuitBreaker: Automatic failure detection and recovery
- Backpressure: Adaptive load management with queue shedding
- Bulkhead: Failure isolation between components
- DeadLetterQueue: Failed operation tracking and replay
- HealthMonitor: Real-time component health tracking
- AdaptiveRateLimiter: Dynamic rate limiting based on success rates
- TimeoutPolicy: Configurable timeouts with fallback strategies
- MetricsCollector: Comprehensive observability
- FastAPI Server: Production-grade REST API with auto-docs
- Telemetry Collector: Real-time metrics ingestion with WebSocket streaming
- Night Shift Scheduler: Automated training during off-peak hours
- Model Registry: Version management, A/B testing, rollback support
- Health Aggregator: Multi-service health dashboard
- Cost Tracking: Budget alerts and spend analytics
- WebSocket Events: Real-time training progress streaming
- Hot-Reload: Zero-downtime model updates via file watcher
- Multi-Backend Support: vLLM, llama.cpp, MLX, Transformers
- LRU Model Cache: Memory-aware model eviction
- Priority Queue: Request prioritization for SLA compliance
- Semantic Caching: Hash-based response deduplication
- Circuit Breaker: Backend failure protection
- Async Loading: Non-blocking model initialization
- Version Management: Seamless model version switching
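For illustration, semantic response caching with hash-based deduplication and LRU eviction can be sketched as follows (the class name and API are hypothetical, not the real model_server internals):

```python
import hashlib
from collections import OrderedDict

class SemanticResponseCache:
    """Identical (model, prompt, params) requests are served from cache
    instead of re-running inference. A sketch; the real server likely
    adds TTLs and memory-aware eviction on top of this."""

    def __init__(self, max_entries: int = 1024):
        self._cache: OrderedDict[str, str] = OrderedDict()
        self._max = max_entries

    @staticmethod
    def _key(model_id: str, prompt: str, max_tokens: int) -> str:
        # NUL separators prevent ambiguous concatenations from colliding.
        raw = f"{model_id}\x00{prompt}\x00{max_tokens}".encode()
        return hashlib.sha256(raw).hexdigest()

    def get(self, model_id: str, prompt: str, max_tokens: int):
        key = self._key(model_id, prompt, max_tokens)
        if key in self._cache:
            self._cache.move_to_end(key)  # LRU touch
            return self._cache[key]
        return None

    def put(self, model_id: str, prompt: str, max_tokens: int, response: str):
        key = self._key(model_id, prompt, max_tokens)
        self._cache[key] = response
        self._cache.move_to_end(key)
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict least recently used
```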
- Multi-Repo Coordination: Heartbeat monitoring across JARVIS, Prime, Reactor
- Command Routing: Intelligent load balancing with priority queues
- State Reconciliation: Consistent state across distributed system
- Dead Letter Queue: Failed command tracking and retry
- Atomic File I/O: Zero-corruption file operations (v73.0)
- Self-Heartbeat: Liveness monitoring (v72.0)
- Circuit Breakers: Fault tolerance with automatic recovery
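The atomic file I/O guarantee can be illustrated with the standard temp-file-plus-rename pattern (a sketch of the technique, not Reactor's actual implementation):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON so readers never observe a partial file: write to a
    temp file in the same directory, fsync, then os.replace(), which is
    atomic on POSIX when source and destination share a filesystem."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # data on disk before the rename
        os.replace(tmp, path)     # atomic rename over the target
    except BaseException:
        os.unlink(tmp)            # never leave a stray temp file behind
        raise
```

Writing into the same directory matters: renaming across filesystems is not atomic, so the temp file must live next to its destination.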
- WebSocket Streaming: Real-time event broadcasting
- Redis Pub/Sub: Optional Redis backend for scale
- Event Deduplication: Hash-based duplicate prevention
- Priority System: Safety-critical event prioritization
- Safety Audit Trail: Comprehensive action logging
- Cost Events: Budget tracking with alerts
- Multi-Transport: WebSocket, file-watching, Redis
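Hash-based event deduplication over a sliding window can be sketched as follows (hypothetical class; the real stream presumably hashes a canonical event envelope):

```python
import hashlib
import json
from collections import deque

class EventDeduplicator:
    """Drop events whose canonical hash was seen within the window."""

    def __init__(self, window: int = 10_000):
        self._seen: set = set()
        self._order: deque = deque()
        self._window = window

    def accept(self, event: dict) -> bool:
        # Canonical JSON so key order does not change the hash.
        digest = hashlib.sha256(
            json.dumps(event, sort_keys=True).encode()
        ).hexdigest()
        if digest in self._seen:
            return False  # duplicate: drop
        self._seen.add(digest)
        self._order.append(digest)
        if len(self._order) > self._window:
            self._seen.discard(self._order.popleft())  # age out oldest
        return True
```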
- Spot VM Resilience: Auto-resume from preemption
- Cloud SQL Storage: Event and metric persistence
- GCS Checkpointing: Distributed checkpoint storage
- Auto-Detection: M1 local vs GCP remote environment detection
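A rough sketch of how M1-vs-GCP auto-detection could work (the heuristics and the REACTOR_FORCE_ENV override are assumptions, not the real detector, which would likely also probe the GCE metadata server):

```python
import os
import platform

def detect_environment() -> str:
    """Best-effort guess of the runtime environment."""
    if os.environ.get("REACTOR_FORCE_ENV"):  # explicit override wins
        return os.environ["REACTOR_FORCE_ENV"]
    # Apple Silicon reports Darwin/arm64; GCP training VMs are Linux.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "m1_local"
    if platform.system() == "Linux":
        return "gcp_remote"
    return "unknown"
```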
pip install jarvis-reactor

# Clone with submodules
git clone --recursive https://github.com/drussell23/JARVIS-Reactor.git
cd JARVIS-Reactor
# Install dependencies (requires CMake and pybind11)
pip install pybind11 cmake
# Build and install
pip install -e .

# For local development (M1 Mac)
pip install jarvis-reactor[local]
# For GCP training (32GB+ VM)
pip install jarvis-reactor[gcp]
# For full development (includes testing, linting, docs)
pip install -e ".[dev]"

# Build Docker image
docker-compose build
# Run API server
docker-compose up api
# Run model server
docker-compose up model-server
# Run unified supervisor
docker-compose up supervisor

# From JARVIS-AI-Agent repo: starts Body + Prime + Reactor-Core
cd /path/to/JARVIS-AI-Agent
python3 unified_supervisor.py

Reactor-Core will start on port 8090 and register with Trinity. Health: http://localhost:8090/health.
# From Reactor-Core repo
cd /path/to/Reactor-Core
python3 run_reactor.py --port 8090

from reactor_core import Trainer, TrainingConfig
from reactor_core.gcp import SpotVMCheckpointer
# Configure training
config = TrainingConfig(
model_name="llama-2-7b",
use_lora=True,
lora_rank=16,
num_epochs=3,
batch_size=4,
gradient_checkpointing=True,
)
# Auto-detect environment (M1 local vs GCP remote)
trainer = Trainer(config)
# Train with auto-resume on Spot VM preemption
trainer.train("./data/train.jsonl")

from reactor_core.training.advanced_training import (
DPOTrainer,
DPOConfig,
PreferenceDataset,
)
# Configure DPO
dpo_config = DPOConfig(
model_name="llama-2-7b",
beta=0.1, # KL divergence penalty
learning_rate=5e-7,
max_length=512,
batch_size=4,
)
# Initialize DPO trainer
dpo_trainer = DPOTrainer(dpo_config)
# Train on preference pairs
await dpo_trainer.train(
preference_dataset=PreferenceDataset(
chosen_responses=chosen_data,
rejected_responses=rejected_data,
),
num_epochs=3,
)

from reactor_core.serving.model_server import ModelServer, ModelServerConfig
# Configure model server
config = ModelServerConfig(
models_dir="/path/to/models",
enable_hot_reload=True,
backend="vllm", # or "transformers", "llamacpp", "mlx"
max_cached_models=3,
)
# Initialize server
server = ModelServer(config)
await server.start()
# Serve inference requests
response = await server.predict(
prompt="What is machine learning?",
model_id="llama-2-7b",
max_tokens=256,
)
print(response.text)
# Hot-reload: just update the model file; the server auto-reloads!

# Start API server
uvicorn reactor_core.api.server:app --host 0.0.0.0 --port 8003 --reload

import requests
# Trigger training via API
response = requests.post(
"http://localhost:8003/training/trigger",
json={
"model_name": "llama-2-7b",
"training_type": "dpo",
"config": {
"num_epochs": 3,
"batch_size": 4,
"learning_rate": 5e-7,
},
},
)
# Schedule nightly training
response = requests.post(
"http://localhost:8003/scheduler/schedule",
json={
"name": "nightly_dpo_training",
"schedule_type": "cron",
"cron_expression": "0 2 * * *", # 2 AM daily
"job_config": {
"training_type": "dpo",
"model_name": "llama-2-7b",
},
},
)

from reactor_core.orchestration.trinity_orchestrator import (
initialize_orchestrator,
get_orchestrator,
)
# Initialize orchestrator
orchestrator = await initialize_orchestrator()
# Dispatch command to JARVIS/Prime
await orchestrator.dispatch_command(
intent="start_surveillance",
payload={
"app_name": "Chrome",
"trigger_text": "bouncing ball",
},
target_components=["jarvis"],
)
# Check component health
health = await orchestrator.get_health_status()
print(f"JARVIS: {health['jarvis'].status}")
print(f"Prime: {health['prime'].status}")
print(f"Reactor: {health['reactor'].status}")

| Entry Point | Purpose | When to Use |
|---|---|---|
| Unified Supervisor (JARVIS) | python3 unified_supervisor.py in JARVIS-AI-Agent | Recommended: starts Body + Prime + Reactor-Core with Trinity coordination; discovers Reactor via REACTOR_CORE_REPO_PATH or the default path |
| run_reactor.py | Trinity-integrated Reactor entry point | Standalone Reactor, or when the supervisor calls it (e.g. python3 run_reactor.py --port 8090) |
| run_supervisor.py (in this repo) | Legacy/alternative supervisor in the Reactor repo | When running orchestration from the Reactor repo instead of JARVIS |
The unified supervisor lives in JARVIS-AI-Agent. It starts Reactor-Core by running run_reactor.py (or the configured script) in this repo, typically on port 8090. Reactor exposes /health for supervisor health checks and Trinity state sync.
The unified supervisor is in JARVIS-AI-Agent (unified_supervisor.py). It is the single entry point for the entire AGI OS ecosystem and automatically discovers, starts, and coordinates JARVIS (Body), JARVIS-Prime (Mind), and Reactor-Core (Nerves).
# From JARVIS-AI-Agent repo β start entire AGI OS ecosystem (recommended)
python3 unified_supervisor.py
# With options (see JARVIS-AI-Agent unified_supervisor.py for full CLI)
# python3 unified_supervisor.py --mode supervisor --skip-trinity ...

What the Supervisor Does (in JARVIS-AI-Agent):
- Component Discovery: Automatically finds JARVIS, JARVIS Prime, and Reactor Core repos
- Health Monitoring: Continuous health checks with automatic recovery
- Event Bridge: Sets up real-time event streaming between components
- Trinity Orchestration: Initializes multi-repo coordination
- Service Startup: Starts all Reactor Core services (API, Model Server, Training, etc.)
- Experience Collection: Continuous learning from JARVIS interactions
- Graceful Shutdown: Clean shutdown of all components on Ctrl+C
Startup Phases:
Phase 1: Initialize Trinity Orchestrator
Phase 2: Initialize Event Bridge
Phase 3: Discover Components
Phase 4: Start Reactor Core Services
Phase 5: Initialize v91.0 Advanced Services
Phase 6: Start JARVIS (Body)
Phase 7: Start J-Prime (Mind)
Phase 8: Start Background Tasks
Phase 9: Wait for Component Health
Output Example:
======================================================================
AGI OS UNIFIED SUPERVISOR - PROJECT TRINITY
======================================================================
[Phase 1] Initializing Trinity Orchestrator...
[OK] Trinity Orchestrator running
[Phase 2] Initializing Event Bridge...
[OK] Event Bridge running
[Phase 3] Discovering components...
Found JARVIS at /path/to/JARVIS-AI-Agent
Found J-Prime at /path/to/jarvis-prime
Reactor Core at /path/to/reactor-core
[Phase 4] Starting Reactor Core services...
[OK] Telemetry Collector started
[OK] Model Registry initialized (5 models)
[OK] Health Aggregator started
[OK] Scheduler started (daily/weekly training)
[OK] Model Server started
[Phase 5] Initializing v91.0 Advanced Services...
[OK] Online Learning Engine started
[OK] Distributed Coordinator started
[OK] Data Version Controller started
[OK] Spot VM Checkpointer started
[Phase 6] Starting JARVIS (Body)...
[OK] JARVIS started (PID: 12345)
[Phase 7] Starting J-Prime (Mind)...
[OK] J-Prime started (PID: 12346)
[Phase 8] Starting background services...
[OK] Health monitoring started
[OK] Experience collection started
[OK] Event processing started
[Phase 9] Waiting for component health...
======================================================================
AGI OS READY - All Systems Operational
======================================================================
Component Status:
  JARVIS:        [OK] Running (http://localhost:8000)
  J-Prime:       [OK] Running (http://localhost:8001)
  Reactor API:   [OK] Running (http://localhost:8003)
  Model Server:  [OK] Running (http://localhost:8004)

Background Services:
  Health Monitor:               [OK] Active
  Experience Collector:         [OK] Active (0 experiences collected)
  Event Processor:              [OK] Active
  Trinity Experience Receiver:  [OK] Active

Press Ctrl+C to shutdown gracefully...
Train models on preference pairs without reward models:
from reactor_core.training.advanced_training import DPOTrainer, DPOConfig
config = DPOConfig(
model_name="llama-2-7b",
beta=0.1, # KL divergence penalty
learning_rate=5e-7,
max_length=512,
)
trainer = DPOTrainer(config)
await trainer.train(
preference_dataset=PreferenceDataset(
chosen_responses=chosen_data,
rejected_responses=rejected_data,
),
num_epochs=3,
)

Variants Supported:
- Standard DPO
- IPO (Identity Preference Optimization)
- KTO (Kahneman-Tversky Optimization)
- ORPO (Odds Ratio Preference Optimization)
Full PPO pipeline with reward modeling:
from reactor_core.training.advanced_training import RLHFTrainer, RLHFConfig
config = RLHFConfig(
model_name="llama-2-7b",
reward_model_name="reward-model",
ppo_config={
"clip_epsilon": 0.2,
"value_coef": 0.1,
"entropy_coef": 0.01,
},
)
trainer = RLHFTrainer(config)
await trainer.train(
preference_dataset=preference_data,
num_epochs=3,
)

Progressive difficulty scheduling for faster convergence:
from reactor_core.training.curriculum_learning import CurriculumLearner
curriculum = CurriculumLearner(
model=model,
dataset=dataset,
difficulty_metric="perplexity",
progression_strategy="exponential", # or "linear", "adaptive"
)
# Automatic difficulty progression
await curriculum.train(num_epochs=10)

Benefits: 30-50% faster convergence, better generalization
Few-shot learning with MAML, Reptile, Meta-SGD:
from reactor_core.training.meta_learning import MAMLTrainer
maml = MAMLTrainer(
model=model,
inner_lr=0.01,
outer_lr=0.001,
adaptation_steps=5,
)
# Learn to learn from few examples
await maml.meta_train(
tasks=task_distribution,
meta_batch_size=4,
num_meta_iterations=1000,
)

Learn latent dynamics for planning and counterfactual reasoning:
from reactor_core.training.world_model_training import WorldModelTrainer
world_model = WorldModelTrainer(
latent_dim=512,
action_dim=128,
reward_dim=1,
)
await world_model.train(
trajectories=trajectory_data,
num_epochs=100,
)
# Counterfactual reasoning: "What if I had done X?"
counterfactual = await world_model.imagine_rollout(
initial_state=state,
alternative_action=action,
horizon=10,
)

Understand cause-effect relationships:
from reactor_core.training.causal_reasoning import CausalReasoner
reasoner = CausalReasoner(
model=model,
causal_graph=graph,
)
# Do-calculus: P(Y | do(X))
interventional_prob = await reasoner.interventional_inference(
intervention={"X": value},
query="Y",
)
# Causal discovery
discovered_graph = await reasoner.discover_causality(data)

Python 3.11+ compatible structured concurrency with TaskGroup:
from reactor_core.utils.async_helpers import StructuredTaskGroup, run_in_task_group
# Structured task group with automatic error handling
async with StructuredTaskGroup(
name="training_pipeline",
max_concurrent=5,
cancel_on_error=True,
timeout_seconds=3600.0,
) as tg:
tg.create_task(load_data(), name="data_loading")
tg.create_task(preprocess_data(), name="preprocessing")
tg.create_task(train_model(), name="training")
tg.create_task(validate_model(), name="validation")
# Get results
results = tg.results
for result in results:
if result.success:
print(f"{result.name}: {result.result}")
else:
print(f"{result.name}: {result.exception}")
# Convenience function
results = await run_in_task_group(
[fetch_url(url) for url in urls],
names=[f"fetch_{i}" for i in range(len(urls))],
max_concurrent=10,
)

Efficient HTTP and Redis connection management:
from reactor_core.config.unified_config import (
HTTPConnectionPool,
RedisConnectionPool,
ConnectionPoolConfig,
)
# HTTP connection pool
pool = await HTTPConnectionPool.get_instance("api_client")
async with pool.request("GET", "https://api.example.com/data") as response:
data = await response.json()
# Redis connection pool
redis_pool = await RedisConnectionPool.get_instance()
client = await redis_pool.get_client(host="localhost", port=6379)
await client.set("key", "value")

Features:
- Singleton pattern per configuration
- Automatic session lifecycle management
- Connection reuse with keepalive
- Configurable pool sizes via environment variables
Automatic failure detection and recovery:
from reactor_core.utils.async_helpers import CircuitBreaker
breaker = CircuitBreaker(
failure_threshold=5,
recovery_timeout=60.0,
half_open_max_calls=3,
)
@breaker.protect
async def risky_operation():
# This will be protected by circuit breaker
return await external_api_call()
# Circuit states: CLOSED -> OPEN -> HALF_OPEN -> CLOSED

Prevents memory exhaustion under high load:
from reactor_core.utils.async_helpers import BackpressureController
controller = BackpressureController(
max_queue_size=1000,
queue_full_strategy="reject", # or "block", "drop_oldest"
)
async def process_item(item):
await controller.acquire()
try:
await process(item)
finally:
controller.release()

Failed operation tracking and automatic retry:
from reactor_core.utils.async_helpers import DeadLetterQueue
dlq = DeadLetterQueue(
name="training_operations",
persist_path=Path("/tmp/dlq"),
auto_retry_interval=300.0, # Retry every 5 minutes
)
# Register operation for retry
dlq.register_operation("publish_model_ready", publish_model_ready_func)
# Add failed operation
await dlq.add(
operation="publish_model_ready",
args=(model_name, model_path),
kwargs={},
exception=exception,
)
# Automatic retry with exponential backoff

Learn continuously from JARVIS interactions:
from reactor_core.training.online_learning import OnlineLearningEngine
engine = OnlineLearningEngine(
buffer_size=100000,
importance_sampling=True,
ewc_lambda=100.0, # Elastic Weight Consolidation
)
# Add experiences from JARVIS
await engine.add_experience({
"user_input": "Hello",
"assistant_output": "Hi there!",
"feedback": "positive",
})
# Trigger incremental update
await engine.incremental_update(
model=model,
batch_size=32,
num_steps=100,
)

Automatic model adaptation when data distribution changes:
from reactor_core.training.online_learning import DriftDetector
detector = DriftDetector(
threshold=0.1,
window_size=1000,
test_type="page_hinkley", # or "adwin", "kswin"
)
# Monitor for drift
drift_detected = await detector.check_drift(
current_batch=recent_data,
reference_batch=historical_data,
)
if drift_detected:
# Trigger model retraining
await retrain_model()

Content-addressed storage with lineage tracking:
from reactor_core.data.versioning import DataVersionController
controller = DataVersionController(
version_store_path=Path("/data/versions"),
)
# Version a dataset
version = await controller.create_version(
dataset_path=Path("/data/train.jsonl"),
metadata={"source": "jarvis_interactions", "date": "2025-01-15"},
)
# Get version lineage
lineage = await controller.get_lineage(version.id)
print(f"Version {version.id} derived from {lineage.parent_id}")
# Reproduce exact dataset
dataset = await controller.load_version(version.id)

Train across multiple GCP Spot VMs with gradient compression:
from reactor_core.training.distributed_coordinator import DistributedCoordinator
coordinator = DistributedCoordinator(
num_workers=8,
gradient_compression="fp16", # or "int8", "sparse"
checkpoint_interval=300, # seconds
)
# Start distributed training
await coordinator.start_training(
model=model,
dataset=dataset,
num_epochs=10,
)
# Automatic checkpoint/resume on VM preemption

Predictive preemption detection and automatic resume:
from reactor_core.gcp.checkpointer import SpotVMCheckpointer
checkpointer = SpotVMCheckpointer(
gcs_bucket="my-checkpoints",
checkpoint_interval=300,
enable_preemption_prediction=True,
)
# Automatic checkpointing during training
async with checkpointer.protect_training():
await train_model()
# Resume from latest checkpoint
await checkpointer.resume_training()

Preemption Signals Monitored:
- GCP metadata API warnings
- System load spikes
- Network latency increases
- Memory pressure indicators
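The first of these signals, the GCE metadata API, can be polled directly. The endpoint and Metadata-Flavor header below are GCE's documented preemption interface; the function wrapper itself is an illustrative sketch, not the checkpointer's code:

```python
import urllib.request

# Documented GCE metadata flag; returns "TRUE" once the VM is preempted.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)

def is_preempted(timeout: float = 1.0) -> bool:
    """Poll the GCE metadata server's 'preempted' flag; off-GCP the
    request fails and we conservatively report False."""
    req = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode().strip() == "TRUE"
    except OSError:
        return False  # not on GCP, or metadata server unreachable
```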
Reactor-Core is the Nerves in the three-repo Trinity architecture. It is started and monitored by the JARVIS unified supervisor and coordinates with JARVIS-Prime for inference and model deployment.
How JARVIS (Body) uses Reactor-Core:
- Discovery: Supervisor resolves REACTOR_CORE_REPO_PATH (or the default ~/Documents/repos/Reactor-Core).
- Startup: Supervisor runs run_reactor.py (or the configured script) with port 8090; Reactor starts its HTTP server and health endpoint.
- Health: Supervisor polls GET /health on port 8090; Reactor reports training readiness and Trinity connection state.
- State: Reactor reads/writes shared state under ~/.jarvis/ (e.g. Trinity state, experience queue) for coordination.
How Reactor-Core uses JARVIS-Prime:
- Inference: Reactor can call Prime's OpenAI-compatible API for generation during training, evaluation, or distillation.
- Model deployment: Trained/updated models can be deployed to Prime (e.g. hot swap, model registry).
- Trinity Protocol: Events and heartbeats flow via file IPC and/or WebSocket; Reactor participates in Trinity state sync and experience collection from JARVIS Body.
- DPO Training from Telemetry (v238.0+): JARVIS Body's `TelemetryEmitter` captures every interaction: query, complexity classification, response, latency, and source. Reactor-Core uses this telemetry to build DPO preference pairs (e.g., chosen: `"10"`, rejected: `"Of course, the sum of five and five is ten..."`) for fine-tuning Mistral-7B. This training loop makes the v236.0/v238.0 adaptive prompt system's conciseness enforcement permanent by encoding terse-vs-detailed behavior in the model's weights instead of relying on prompt instructions. See the JARVIS-Prime README for the full training loop architecture.
- Voice Conversation Training Data (v238.0+): JARVIS Body's real-time voice conversation pipeline generates a new class of training data: multi-turn conversation traces (20-turn sliding window sessions), barge-in events (a proxy for response quality), turn detection accuracy logs, and conversation-mode-specific telemetry (`session_type: "conversation"`, `time_to_first_audio_ms`, `barge_in_count`). This data enables conversational DPO pairs (sustained-engagement vs. conversation-ending responses), conciseness training (shorter responses preferred in voice mode), and turn-detection classifier training (replacing the heuristic V1 with an ML-based V2). See the v248.0 roadmap below.
- Autonomy Event Ingestion (Phase 2): JARVIS Body emits 7 canonical autonomy lifecycle events (`intent_written`, `committed`, `failed`, `policy_denied`, `deduplicated`, `superseded`, `no_journal_lease`) through the existing experience forwarder. Reactor ingests these via `AutonomyEventIngestor` with strict validation (7 required metadata keys), composite-key deduplication (50K window), and disk-based quarantine for malformed events. The centralized `AutonomyEventClassifier` maps each event type to a training label: only `committed` and `failed` feed the training pipeline; infrastructure events (`policy_denied`, `no_journal_lease`) are excluded via `InteractionOutcome.INFRASTRUCTURE`.
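The label mapping described above can be sketched as a table-driven classifier (a minimal sketch: the real `AutonomyEventClassifier` lives in `reactor_core/ingestion/autonomy_classifier.py`, and this reconstruction of its mapping is an assumption based solely on the event list above):

```python
from enum import Enum

class InteractionOutcome(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    INFRASTRUCTURE = "infrastructure"

# Assumed mapping from autonomy event type to (outcome, trainable),
# following the classification table in this README.
_LABELS = {
    "committed":        (InteractionOutcome.POSITIVE, True),
    "failed":           (InteractionOutcome.NEGATIVE, True),
    "policy_denied":    (InteractionOutcome.INFRASTRUCTURE, False),
    "no_journal_lease": (InteractionOutcome.INFRASTRUCTURE, False),
    "deduplicated":     (InteractionOutcome.NEUTRAL, False),
    "intent_written":   (InteractionOutcome.NEUTRAL, False),
    "superseded":       (InteractionOutcome.NEUTRAL, False),
}

def classify(event_type: str) -> tuple:
    """Return (outcome, trainable); unknown event types are never trainable."""
    return _LABELS.get(event_type, (InteractionOutcome.NEUTRAL, False))
```

Keeping the mapping in one dict is what makes the classifier a "single source of truth": ingestion and dataset building consult the same table.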
Reactor-Core serves as the ingestion and classification layer for autonomy events. It ensures only well-formed, non-duplicate, trainable events reach the training pipeline.
```
REACTOR AUTONOMY ROLE

Inbound (from Body via ExperienceForwarder):

  ExperienceEvent (type=METRIC)
    .metadata = {
      autonomy_event_type: "committed",
      autonomy_schema_version: "1.0",
      idempotency_key: "...",
      trace_id: "...",
      correlation_id: "...",
      action: "workspace:send_email",
      request_kind: "autonomous"
    }
        |
        v
  AutonomyEventIngestor
    Step 1: VALIDATE
      - Check 7 required keys present
      - Verify event_type in the known set
      - Verify schema_version supported
      - Reject -> quarantine to disk
    Step 2: DEDUPLICATE
      - Composite key: (idempotency_key, autonomy_event_type, trace_id)
      - 50K sliding window
      - Duplicate -> skip silently
    Step 3: CLASSIFY (AutonomyEventClassifier.classify())
      - committed        -> POSITIVE        (trainable=true)
      - failed           -> NEGATIVE        (trainable=true)
      - policy_denied    -> INFRASTRUCTURE  (trainable=false)
      - no_journal_lease -> INFRASTRUCTURE  (trainable=false)
      - deduplicated     -> NEUTRAL         (trainable=false)
      - intent_written   -> NEUTRAL         (trainable=false)
      - superseded       -> NEUTRAL         (trainable=false)
    Step 4: BUILD RawInteraction
      - Passes to UnifiedPipeline
        |
        v
  UnifiedPipeline._build_dataset()
    Training Exclusion Filter:
      if outcome in {INFRASTRUCTURE, DEFERRED}:
          skip (not trainable)
      else:
          include in DPO/LoRA dataset

Quarantine: ~/.jarvis/quarantine/autonomy_events/
  - Retention: 7 days
  - Max size: 100 MB
  - Alert threshold: 10 malformed events
```
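Step 2's composite-key deduplication can be sketched with an insertion-ordered dict capped at the 50K window (a minimal sketch; the key fields come from the diagram above, while the class name `DedupWindow` is hypothetical):

```python
from collections import OrderedDict

class DedupWindow:
    """Sliding-window deduplication over composite autonomy-event keys."""

    def __init__(self, max_size: int = 50_000):
        self.max_size = max_size
        self._seen = OrderedDict()  # insertion-ordered; oldest evicted first

    def is_duplicate(self, metadata: dict) -> bool:
        # Composite key per the diagram: (idempotency_key, event_type, trace_id).
        key = (
            metadata["idempotency_key"],
            metadata["autonomy_event_type"],
            metadata["trace_id"],
        )
        if key in self._seen:
            return True  # duplicate -> caller skips silently
        self._seen[key] = None
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the oldest key
        return False
```

The composite key matters: the same `idempotency_key` can legitimately appear under two event types (e.g. `intent_written` then `committed`), so deduplicating on the key alone would drop valid lifecycle transitions.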
New files:
- `reactor_core/ingestion/autonomy_classifier.py`: centralized `AutonomyEventClassifier` (single source of truth for training eligibility)
- `reactor_core/ingestion/autonomy_event_ingestor.py`: full `AbstractIngestor` with validation, dedup, and quarantine

Modified files:
- `reactor_core/ingestion/base_ingestor.py`: added `INFRASTRUCTURE` to the `InteractionOutcome` enum
- `reactor_core/training/unified_pipeline.py`: training exclusion filter for non-trainable outcomes
- `reactor_core/ingestion/__init__.py`: exports for the new modules
run_reactor.py:
- Trinity-integrated entry point for Reactor-Core. Designed to be started by the unified supervisor (`python3 run_reactor.py --port 8090`).
- Exposes health (`/health`) for supervisor monitoring and training/API endpoints for the ecosystem.
- Environment: `REACTOR_PORT` (default 8090), `JARVIS_PRIME_URL`, `TRINITY_ENABLED`, `MODEL_OUTPUT_DIR`, `LOG_LEVEL`.
```
JARVIS AGI ECOSYSTEM

  JARVIS-AI-Agent (Claude Body)   <-- Events -->   JARVIS Prime (LLM Mind)
    - Computer Use                                   - Local LLM
    - macOS Control                                  - Reasoning
    - Voice Auth                                     - Context
            \                                       /
             Event Bridge (WebSocket/Redis)
                          |
                          v
  Reactor Core (Nervous System)
    Trinity Orchestrator
      - Heartbeat monitoring
      - Command routing
      - State reconciliation
    Training & Serving
      - DPO, RLHF, Constitutional AI
      - Hot-reload model server
      - Night Shift scheduler
    Event Streaming
      - Safety audit trail
      - Cost tracking
      - Telemetry collection
            |                  |
            v                  v
  Cloud SQL (Events DB)   GCP Storage (Checkpoints)
```
JARVIS Reactor uses environment variables for all configuration (zero hardcoding):
```bash
# Path Configuration (XDG-compliant defaults)
export JARVIS_EVENTS_DIR="/custom/path/events"
export TRINITY_EVENTS_DIR="/custom/path/trinity/events"
export EXPERIENCE_QUEUE_DIR="/custom/path/experience_queue"
export MODEL_REGISTRY_PATH="/custom/path/models"
export DATA_VERSION_PATH="/custom/path/data_versions"

# API Configuration
export AGI_API_PORT=8003
export AGI_SERVING_PORT=8001
export AGI_JPRIME_PORT=8000

# Connection Pooling
export HTTP_POOL_SIZE=100
export HTTP_POOL_PER_HOST=10
export HTTP_KEEPALIVE_TIMEOUT=30.0
export REDIS_POOL_SIZE=10

# Training Configuration
export REACTOR_EXPERIENCE_BUFFER_THRESHOLD=100
export REACTOR_AUTO_TRAINING_THRESHOLD=1000
export REACTOR_CHECKPOINT_INTERVAL=300

# GCP Configuration
export GCP_PROJECT_ID="my-project"
export GCP_CHECKPOINT_BUCKET="my-checkpoints"
export GCP_SPOT_VM_ENABLED=true

# Feature Flags
export REACTOR_ENABLE_ONLINE_LEARNING=true
export REACTOR_ENABLE_DISTRIBUTED_TRAINING=true
export REACTOR_ENABLE_DATA_VERSIONING=true
```

Configuration is loaded in this priority order:

1. Environment variables (highest priority)
2. `~/.jarvis/reactor/config.json` (user config)
3. `reactor_core/config/default_config.json` (defaults)
Example config file:
```json
{
  "api": {
    "port": 8003,
    "host": "0.0.0.0"
  },
  "training": {
    "default_model": "llama-2-7b",
    "use_lora": true,
    "lora_rank": 16
  },
  "serving": {
    "max_cached_models": 5,
    "enable_hot_reload": true,
    "default_backend": "auto"
  },
  "trinity": {
    "heartbeat_interval": 5.0,
    "health_check_timeout": 10.0
  }
}
```

All paths are resolved dynamically with XDG compliance:

1. Environment variable (if set)
2. `base_config.resolve_path()` (if available)
3. `$XDG_DATA_HOME/jarvis/` (fallback)

No hardcoded `Path.home()` calls; fully portable across systems.
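The resolution order can be sketched as follows (a minimal sketch; the function name `resolve_data_dir` is hypothetical, and step 2, the `base_config.resolve_path()` hook, is elided since it only applies when that helper is available):

```python
import os
from pathlib import Path

def resolve_data_dir(env_var: str, subdir: str = "jarvis") -> Path:
    """Resolve a data directory: explicit env var first, XDG fallback last."""
    explicit = os.environ.get(env_var)
    if explicit:
        return Path(explicit)              # 1. environment variable wins
    # (2. base_config.resolve_path() would be consulted here if available.)
    xdg = os.environ.get("XDG_DATA_HOME")  # 3. XDG fallback
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / subdir
```

Note the fallback uses `~/.local/share` only when `XDG_DATA_HOME` is unset, which is the default the XDG Base Directory spec prescribes.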
Symptoms: `run_supervisor.py` shows component failures

Solutions:

```bash
# Check component paths
python3 run_supervisor.py --dev --log-level DEBUG

# Verify component health
curl http://localhost:8003/health

# Check logs
tail -f ~/.jarvis/reactor/logs/supervisor.log
```

Symptoms: Out of memory errors during training
Solutions:
```python
# Enable gradient checkpointing
config = TrainingConfig(
    gradient_checkpointing=True,
    use_qlora=True,    # 4-bit quantization
    cpu_offload=True,  # offload to CPU
)

# Use a smaller batch size
config.batch_size = 1
config.gradient_accumulation_steps = 8
```

Symptoms: Model updates don't appear in the server
Solutions:
```python
# Verify the file watcher is enabled
config = ModelServerConfig(
    enable_hot_reload=True,
    watch_directories=["/path/to/models"],
)
```

```bash
# Check file permissions
ls -la /path/to/models

# Verify model format
# Server supports: .gguf, .safetensors, .bin
```

Symptoms: Events not flowing between JARVIS, Prime, and Reactor
Solutions:
```bash
# Check event bridge status
curl http://localhost:8003/api/v1/events/status

# Verify event directories exist
ls -la ~/.jarvis/events/
ls -la ~/.jarvis/trinity/events/

# Check the WebSocket connection
# Open browser console: ws://localhost:8003/ws
```

Symptoms: Training stuck at barrier synchronization
Solutions:
```python
# Check network connectivity
await coordinator.check_connectivity()

# Verify all workers are healthy
health = await coordinator.get_worker_health()

# Enable gradient checksum validation
coordinator.enable_gradient_verification = True
```

Enable comprehensive debugging:
```bash
# Set debug environment
export REACTOR_DEBUG=true
export REACTOR_LOG_LEVEL=DEBUG

# Run with debug flags
python3 run_supervisor.py --dev --log-level DEBUG

# Check debug logs
tail -f ~/.jarvis/reactor/logs/debug.log
```

```bash
# Clone the repository
git clone --recursive https://github.com/drussell23/JARVIS-Reactor.git
cd JARVIS-Reactor

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/

# Run linting
black reactor_core/
ruff check reactor_core/
```

```
reactor_core/
├── training/        # Training methods and pipelines
├── serving/         # Model serving infrastructure
├── api/             # REST API endpoints
├── orchestration/   # Trinity coordination
├── integration/     # Cross-repo integration
├── utils/           # Utilities (async_helpers, etc.)
├── config/          # Configuration management
├── data/            # Data processing and versioning
├── eval/            # Model evaluation
└── gcp/             # GCP-specific features
```
1. Create a feature branch:

   ```bash
   git checkout -b feature/my-feature
   ```

2. Follow code style:
   - Use `black` for formatting
   - Follow type hints (use `mypy`)
   - Add docstrings (Google style)

3. Write tests:

   ```python
   # tests/test_my_feature.py
   import pytest

   from reactor_core.my_module import MyFeature

   @pytest.mark.asyncio
   async def test_my_feature():
       feature = MyFeature()
       result = await feature.do_something()
       assert result is not None
   ```

4. Update documentation:
   - Add to README.md
   - Update the API docs
   - Add examples

5. Submit a PR:
   - Ensure all tests pass
   - Update the version in `__init__.py`
   - Add to CHANGELOG.md
```bash
# Run all tests
pytest

# Run a specific test file
pytest tests/test_training.py

# Run with coverage
pytest --cov=reactor_core --cov-report=html

# Run integration tests
pytest tests/integration/ -v
```

```bash
# Format code
black reactor_core/

# Lint code
ruff check reactor_core/

# Type checking
mypy reactor_core/

# Security scanning
bandit -r reactor_core/
```

JARVIS Body v238.0 introduced a complete real-time voice conversation pipeline: continuous, bidirectional, streaming voice dialogue that transforms JARVIS from a command-response system into a conversational AI companion. This has significant implications for Reactor Core's training pipeline.
Architecture (7 layers in JARVIS Body):
- Layer -1: FullDuplexDevice (synchronized mic + speaker on the same sample clock) + PlaybackRingBuffer
- Layer 0: AudioBus singleton (all audio I/O flows through it) + AEC via speexdsp
- Layer 1: Streaming TTS (Piper, local neural TTS, ~50ms time-to-first-audio)
- Layer 2: Streaming STT (faster-whisper, partial + final transcripts)
- Layer 3: Turn Detector (adaptive silence thresholds: 300/600/900ms)
- Layer 4: Barge-In Controller (user interrupts JARVIS mid-sentence via the AEC-cleaned signal)
- Layer 5: Conversation Pipeline (full orchestrator with a 20-turn sliding context)
- Layer 6: Mode Dispatcher (COMMAND / CONVERSATION / BIOMETRIC routing)
- Bootstrap: Two-phase factory (AudioBus before the narrator, pipeline after Intelligence)
Impact on Reactor Core training data:
| Data Type | Source | Training Application |
|---|---|---|
| Multi-turn conversation traces | ConversationSession (20-turn sliding window) | Fine-tune on conversational coherence: multi-turn dialogue pairs instead of single-turn Q&A |
| Barge-in events | BargeInController logs | Train models to produce "checkpoint sentences": natural pause points where interruption feels organic |
| Turn detection accuracy | TurnDetector heuristic vs. actual user behavior | Training signal for an ML-based turn detection classifier (v248.0 in JARVIS Body) |
| Conversation mode LLM telemetry | J-Prime `/v1/chat/completions` with conversation context | DPO pairs from multi-turn context: "which response kept the conversation flowing vs. which caused a restart?" |
| Self-voice echo transcripts (dropped) | STT hallucination guard `conversation_mode` filter | Negative examples for STT fine-tuning: these are NOT user speech and should be classified as noise |
New telemetry fields flowing to Reactor Core:
- `session_type`: `"conversation"` vs. `"command"`; enables per-mode DPO pair generation
- `conversation_turn_count`: tracks where in a multi-turn session the response occurred
- `barge_in_count`: how many times the user interrupted (a proxy for response quality)
- `time_to_first_audio_ms`: end-to-end latency metric for the streaming pipeline
What Reactor Core should prepare for (v248.0+):
- Conversational DPO pairs: responses that sustained multi-turn engagement ("chosen") vs. responses that ended the conversation ("rejected")
- Latency-aware training: shorter, more concise responses score higher in conversation mode (unlike detailed analysis mode)
- Turn-detection classifier training data: when the heuristic V1 makes mistakes (too-early or too-late turn detection), log the error as training data for an ML-based V2
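The turn-detection error logging described above can be sketched as a small record builder (a minimal sketch; the field names `silence_threshold_ms`, `detected_end_of_turn`, and `user_kept_speaking` are assumptions chosen to match the adaptive thresholds mentioned earlier):

```python
from dataclasses import dataclass, asdict

@dataclass
class TurnDetectionError:
    """One V1-heuristic decision, logged as training data for the ML V2."""
    session_id: str
    silence_threshold_ms: int   # 300, 600, or 900 per the adaptive thresholds
    detected_end_of_turn: bool  # what the heuristic decided
    user_kept_speaking: bool    # ground truth observed shortly afterwards

    @property
    def error_type(self) -> str:
        if self.detected_end_of_turn and self.user_kept_speaking:
            return "too_early"  # heuristic cut the user off
        if not self.detected_end_of_turn and not self.user_kept_speaking:
            return "too_late"   # heuristic waited through a real end of turn
        return "correct"

    def to_record(self) -> dict:
        """Flatten to a JSONL-ready dict for the training pipeline."""
        return {**asdict(self), "error_type": self.error_type}
```

Labeling both error directions matters: a classifier trained only on "too early" examples would learn to wait longer everywhere, inflating response latency.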
- JARVIS Body v238.0 fixes degenerate LLM responses ("...") via a 3-layer defense-in-depth
- SIMPLE classification narrowed: "what is X?" queries promoted to MODERATE (512 tokens)
- Backend degenerate response detection with safe retry using MODERATE parameters
- Client-side degenerate suppression with zombie timeout re-arming
- `requestId` echo in WebSocket responses enables frontend deduplication
- Reactor-Core's DPO training pipeline receives improved telemetry (complexity + source fields) for preference pair generation
- Note: v238.0 changes are in JARVIS (Body) and documented here for ecosystem coherence
- Structured Concurrency: Python 3.11+ TaskGroup patterns for robust async operations
- Connection Pooling: Efficient HTTP/Redis connection management with automatic lifecycle
- Dynamic Path Resolution: Zero hardcoding, XDG-compliant paths, environment-driven config
- Atomic File Writes: Prevents checkpoint corruption from partial writes
- Circuit Breaker Pattern: Protects external service calls with auto-recovery
- Backpressure Control: Prevents memory exhaustion under high load
- Proper Async Patterns: Deadlock-free async/await with timeouts
- Gradient Verification: Checksum validation for distributed training
- Memory Pressure Awareness: Adaptive behavior under resource constraints
- Unified Error Handling: Centralized error classification and routing
- Online/Incremental Learning: Prioritized experience replay with importance sampling
- Elastic Weight Consolidation (EWC): Prevents catastrophic forgetting during updates
- Concept Drift Detection: Page-Hinkley test for automatic model adaptation
- Data Versioning: Content-addressed storage with lineage tracking (DVC compatible)
- GCP Spot VM Checkpointing: Predictive preemption with multi-signal detection
- Distributed Training: Multi-VM coordination with gradient compression
- Dynamic Resource Allocation: Auto-scaling with cost-aware decisions
- MLForge C++ Bindings: High-performance matrix/neural ops with pybind11
- Hot-reload model server with zero-downtime updates (1,545 lines)
- Multi-backend inference engine: vLLM, llama.cpp, MLX, Transformers (1,891 lines)
- Unified supervisor for one-command AGI OS startup (1,635 lines)
- LRU model cache with memory-aware eviction
- Priority request queue for SLA compliance
- Semantic response caching with hash-based deduplication
- Telemetry collection system with WebSocket streaming (1,128 lines)
- Night Shift scheduler for automated training (1,030 lines)
- Model registry with versioning and A/B testing (1,301 lines)
- Health aggregator with multi-service dashboard (999 lines)
- Enhanced FastAPI server (2,252 lines)
- Advanced async patterns library (1,746 lines)
- Circuit breaker, backpressure, bulkhead patterns
- Dead letter queue, health monitor, adaptive rate limiter
- Dependency injection system (913 lines)
- DPO, RLHF, Constitutional AI, Curriculum Learning (2,899 lines)
- Memory manager with dynamic batch sizing
- Advanced evaluation suite (1,536 lines)
- World model training with latent dynamics and planning
- Causal reasoning with SCMs and do-calculus
- Advanced data preprocessing with quality gates
- Synthetic data generation (3-10x augmentation)
- Active learning for efficient labeling
- Curriculum learning with progressive difficulty
- Meta-learning (MAML, Reptile, Meta-SGD)
- Dependency injection framework
- DLQ for failed/expired commands
- Automatic retry with exponential backoff
- Zero-corruption file operations via atomic renames
- Safety audit trail and kill switch mechanism
- Real-time event streaming across JARVIS ecosystem
- PyTorch-first ML training framework
- LoRA/QLoRA, DPO, FSDP support
- GCP Spot VM resilience
Status: Infrastructure ~95% built. Schemas verified identical across repos. All code exists. Zero training jobs have ever run. This version activates the pipeline with ~200-400 lines of wiring changes, zero new Python files.
Approach: Supervisor-Driven Activation. The unified supervisor already has `ReactorCoreClient` with `trigger_training()`, `stream_experience()`, `get_experience_count()`, and health monitoring. v239.0 wires these existing methods into the startup and runtime loops.
What v239.0 changes:
1. Wire supervisor startup (~40 lines in `unified_supervisor.py`)
   - Call `initialize_reactor_core()` during Phase 5 (Trinity)
   - Start `ReactorCoreWatcher` as a background task
   - Check for in-progress training jobs on startup (prevents duplicates after restart)
   - Wire shutdown calls for clean cleanup

2. Verify HTTP connectivity (debug, no code change)
   - Confirm the Reactor Core API server binds to port 8090 and accepts POSTs
   - Verify the `/api/v1/experiences/stream` endpoint is registered and reachable
   - Manual `curl` test before any code changes

3. First training job (manual trigger)
   - Trigger via `ReactorCoreClient.trigger_training()` using accumulated telemetry
   - Validate that the `DatasetBuilder` → `LoRATrainer` → `GGUFExporter` chain works
   - Goal: `jobs.json` goes from `{}` to having one completed job

4. Deployment smoke test gate (~60 lines in `reactor_core_watcher.py`)
   - Before deploying a GGUF to Prime, load the model in a subprocess (avoids OOM on a 16GB Mac)
   - Run 5 test prompts, verify non-garbage output
   - Block deployment if the smoke test fails
   - Extensible `DeploymentGate` interface for future JARVIS-Bench integration

5. Deployment feedback loop (~30 lines in `reactor_core_watcher.py`)
   - After the hot-swap notification to Prime, wait 10s, then check Prime health
   - Write `deployment_status.json` to `~/.jarvis/reactor/feedback/` with this schema:

     ```json
     {
       "schema_version": "1.0",
       "model_path": "...",
       "deployed_at": "2026-02-15T10:30:00Z",
       "smoke_test_passed": true,
       "hot_swap_notified": true,
       "health_verified": true,
       "previous_model": "qwen2.5-coder-7b-v1",
       "deployment_latency_ms": 12400
     }
     ```

   - Reactor Core consumes these to track deployment success rate
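Writing the feedback file benefits from the atomic-rename pattern this README lists under production hardening (a minimal sketch; `write_deployment_status` is a hypothetical helper, and the schema fields come from the feedback-loop description above):

```python
import json
import os
import tempfile
from pathlib import Path

FEEDBACK_DIR = Path.home() / ".jarvis" / "reactor" / "feedback"

def write_deployment_status(status: dict, feedback_dir: Path = FEEDBACK_DIR) -> Path:
    """Atomically write deployment_status.json (no partial files on crash)."""
    feedback_dir.mkdir(parents=True, exist_ok=True)
    final = feedback_dir / "deployment_status.json"
    # Write to a temp file in the same directory, then rename atomically;
    # os.replace is atomic on POSIX when source and target share a filesystem.
    fd, tmp = tempfile.mkstemp(dir=feedback_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(status, f, indent=2)
    os.replace(tmp, final)
    return final
```

Because readers only ever see the old file or the complete new one, Reactor Core can poll the feedback directory without guarding against half-written JSON.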
When this works, the full loop is:

```
User → JARVIS Body → J-Prime (inference + telemetry capture)
  → ~/.jarvis/telemetry/ (JSONL logs with model_id, task_type)
  → Reactor Core TelemetryIngestor (via file watch or HTTP POST)
  → Experience accumulation (threshold: 100 weighted experiences)
  → Supervisor auto-triggers training via ReactorCoreClient
  → LoRA fine-tuning (SFT first, DPO in v242.0)
  → GGUF export → Smoke test gate (subprocess)
  → Deploy to J-Prime → Feedback file
  → Models improve, automatically
```
Status: Depends on v239.0 pipeline activation. DPO pair generation code exists in `dpo_pair_generator.py` but has never run on real data.

What v242.0 adds on top of v239.0:

1. Automatic DPO preference pairs from multi-model routing
   - v241.1's task-type routing creates implicit quality comparisons:

     ```
     Query: "solve 5x+3=18" routed to Mistral-7B  → "x=11" (wrong)
     Same query type routed to Qwen-Math-7B       → "x=3"  (correct)
     → Automatic DPO pair: {prompt, chosen: "x=3", rejected: "x=11"}
     ```

   - Multi-model routing IS the labeling mechanism. No human annotation needed.
   - `model_id` in telemetry (via the `X-Model-Id` response header) enables per-model performance tracking

2. Ground truth sources for DPO pairs (not just self-assessment)
   - User corrections: when a user re-asks or explicitly corrects, the correction is "chosen" and the original is "rejected"
   - Claude-as-judge: use the Claude API to evaluate which of two outputs is better (a stronger model judging weaker ones)
   - Objective metrics: for code tasks, does the code compile/run? For math, is the answer correct?
   - Avoids circular reasoning (the system training on its own quality judgments)

3. Fine-tune and export
   - `UnifiedTrainingPipeline` supports DPO training with LoRA/QLoRA
   - Training requires full-precision FP16 base models (~14 GB for 7B), not the GGUFs
   - Elastic Weight Consolidation (EWC) prevents catastrophic forgetting
   - Per-task-type regression tests run after every training run (all task types, not just the one trained on)
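Pair construction from routed outputs can be sketched as follows (a minimal sketch; the `build_dpo_pair` helper and its `correct_model` argument are hypothetical, standing in for whatever verdict an objective check or judge model produces):

```python
def build_dpo_pair(prompt: str, outputs: dict, correct_model: str) -> dict:
    """Build one DPO preference pair from two models' answers to a prompt.

    outputs maps model_id -> response text; correct_model names the model
    whose answer was verified correct (e.g. by an objective math check).
    """
    chosen = outputs[correct_model]
    rejected = next(v for k, v in outputs.items() if k != correct_model)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Example mirroring the routed-query comparison above.
pair = build_dpo_pair(
    "solve 5x+3=18",
    {"mistral-7b": "x=11", "qwen-math-7b": "x=3"},
    correct_model="qwen-math-7b",
)
```

The verdict step is the hard part in practice; the pair assembly itself, shown here, is mechanical once a winner is known.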
A comprehensive three-way architectural audit of the JARVIS ecosystem was conducted across JARVIS Body, JARVIS Prime, and Reactor Core. Three independent analyses were cross-verified against actual code, producing the corrected status below.
The training data pipeline from JARVIS Body → J-Prime → Reactor Core is ~95% built but never activated. All infrastructure exists, schemas are verified identical, and handoff code is implemented. The gap is operational: nobody has run the pipeline.
| Component | Location | Status | Verified State (Feb 2026) |
|---|---|---|---|
| `TelemetryEmitter` | JARVIS Body | Built and active | Writes JSONL to `~/.jarvis/telemetry/`. Telemetry files confirmed present (e.g., `interactions_20260210.jsonl`). |
| `TelemetryIngestor` | Reactor Core | Built | Reads from `~/.jarvis/telemetry/`. Schema verified byte-identical to TelemetryEmitter output (v1.0 canonical). Not actively polling; only runs when UnifiedTrainingPipeline is explicitly invoked. |
| `ReactorCoreBridge.upload_training_data()` | J-Prime | Fully implemented | 992 LOC, v242.0. Includes batch upload, file fallback, job tracking. |
| `ExperienceEvent` schema | All 3 repos | Unified | One canonical `ExperienceEvent` dataclass with 5 adapter functions for legacy formats. |
| `UnifiedTrainingPipeline` | Reactor Core | Built | DatasetBuilder → LoRATrainer → GGUFExporter chain exists. Zero training jobs have ever run (`jobs.json` is empty). |
| `HotSwapManager` | J-Prime | Built | Accepts GGUF files for zero-downtime swap. ReactorCoreWatcher in Prime detects new model files. |
| `ModelDeploymentManager` | Reactor Core | Built | GGUF export and deployment signaling exist. Untested end-to-end. |
| `initialize_reactor_core()` | JARVIS Body | Built but never called | Function exists in `backend/autonomy/reactor_core_integration.py` but the supervisor does not invoke it during startup. |
| `start_reactor_core_watcher()` | JARVIS Body | Built but never called | Function exists in `backend/autonomy/reactor_core_watcher.py` but the supervisor does not start it. |
Root Cause (Corrected): The pipeline is not "broken"; it was never turned on. The schemas match, the code is written, the APIs exist. The supervisor needs to call `initialize_reactor_core()` and `start_reactor_core_watcher()` during its startup sequence, and Reactor Core's API server needs to be verified as actively listening and accepting experience POSTs on port 8090. This is a wiring problem, not an architecture problem. Target: v239.0 Supervisor-Driven Pipeline Activation.
What v239.0 will wire:
- Supervisor calls `initialize_reactor_core()` during Phase 5 (Trinity)
- Supervisor starts `ReactorCoreWatcher` as a background task
- Verify the Reactor Core API accepts POSTs on port 8090
- First manual training job triggered via `ReactorCoreClient.trigger_training()`
- Deployment feedback file (`~/.jarvis/reactor/feedback/deployment_status.json`) closes the loop
- Smoke test gate validates the GGUF before deployment (runs in a subprocess to avoid OOM)
The v245.0 Google Workspace fixes in JARVIS Body have a direct impact on Reactor Core's future training data:
- Draft email body generation now works: previously, silent failures meant no email body generation telemetry was captured. Now every draft email request generates a real LLM inference call (with `X-Model-Id`), producing training-relevant interaction data.
- Agent singleton fix eliminates noise: the 49s recreation bug caused timeout errors that would have polluted training data with failed interactions. Clean request/response pairs are now the norm.
- Task-type metadata flows correctly: workspace commands now carry proper task-type metadata, enabling Reactor Core to generate per-model DPO pairs from workspace interactions.
All 9 LangGraph reasoning graphs in JARVIS Body are dead code because `langgraph` is not installed. This means:
- The reasoning engine uses a linear fallback (analysis → planning → validation → execution → reflection → learning) instead of conditional graph routing
- The `route_after_reflection()` loop-back (for iterative reasoning on low confidence) has never executed
- Training data from the reasoning engine reflects single-pass linear thinking, not the intended iterative, graph-based reasoning
- Impact on Reactor Core: when the training pipeline activates, the quality of reasoning traces available for fine-tuning will be lower than designed until LangGraph is installed (v246.0 in JARVIS Body)
The JARVIS Body Unified Agent Runtime will generate a new class of training data for Reactor Core:
- Multi-step goal traces: complete autonomous workflows (sense → think → act → verify → reflect) with sub-step decomposition, producing rich sequential decision-making data
- Cross-agent coordination traces: when the Runtime dispatches work to Neural Mesh agents, the coordination patterns become training data for improving multi-agent orchestration
- Failure recovery traces: when autonomous goals fail and the Runtime retries or replans, the recovery patterns become training data for improving resilience
- Human escalation signals: when the Runtime escalates to the user for approval, the decision boundary becomes a training signal for the safety classifier
New training data types Reactor Core should prepare for:
| Data Type | Source | Training Method |
|---|---|---|
| Goal decomposition traces | Agent Runtime THINK phase | Supervised fine-tuning on planning |
| Sub-step success/failure | Agent Runtime VERIFY phase | DPO pairs (successful vs. failed approaches) |
| Escalation decisions | Agent Runtime escalation protocol | Constitutional AI for safety boundaries |
| Multi-agent coordination | Neural Mesh dispatch logs | Curriculum learning on orchestration complexity |
✅ v243.0/v243.1: Command Lifecycle Events + Event Bus Lifecycle (COMPLETED, JARVIS Body-side)
v243.0/v243.1 shipped as Command Lifecycle Events and Event Infrastructure Lifecycle Management in the JARVIS Body repo. This directly impacts Reactor Core because command lifecycle events create a new source of training data.
What this means for Reactor Core:
Command lifecycle events (`command.received`, `command.classified`, `command.completed`, `command.failed`) now flow through TrinityEventBus. NeuralMesh's Knowledge Graph subscribes to these events, building semantic memory of command patterns. This creates richer training signals for the DPO pipeline:

```
BEFORE v243.0:
  User command → J-Prime inference → response
  Training data: (query, response) pairs only

AFTER v243.0:
  User command → command.received event
    → J-Prime inference → command.classified event
    → Execution → command.completed/failed event
  Training data: (query, response, intent, domain, execution_outcome, latency)

→ Reactor Core TelemetryIngestor can now consume:
  - Successful vs. failed executions as quality signals
  - Intent classification accuracy as routing feedback
  - Domain distribution for curriculum learning
  - Latency metrics for performance optimization
```
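Folding the lifecycle events back into per-command training records can be sketched by grouping on a correlation id (a minimal sketch; `correlation_id` appears in the autonomy metadata earlier in this README, and reusing it here to join command events, along with the per-event field names, is an assumption):

```python
from collections import defaultdict

def build_training_records(events: list) -> list:
    """Fold command.* lifecycle events into per-command training records.

    Each event is a dict with at least "correlation_id" and "type";
    classified events carry intent/domain, terminal events the outcome.
    """
    by_command = defaultdict(dict)
    for ev in events:
        rec = by_command[ev["correlation_id"]]
        if ev["type"] == "command.received":
            rec["query"] = ev.get("query")
        elif ev["type"] == "command.classified":
            rec["intent"] = ev.get("intent")
            rec["domain"] = ev.get("domain")
        elif ev["type"] in ("command.completed", "command.failed"):
            rec["outcome"] = ("success" if ev["type"] == "command.completed"
                              else "failure")
            rec["latency_ms"] = ev.get("latency_ms")
    # Only commands that reached a terminal event are usable for training.
    return [r for r in by_command.values() if "outcome" in r]
```

Dropping commands without a terminal event mirrors the exclusion-filter idea: an interaction that never resolved carries no quality signal.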
Impact on Reactor Core training pipeline:
- DPO pair quality improvement: command outcomes (success/failure) provide ground truth for preference pairs. A response that correctly classified `intent="action"` and executed successfully is a stronger "chosen" signal than one based solely on response text quality.
- Curriculum learning data: domain distribution from `command.classified` events enables a data-driven curriculum: train on high-frequency domains first (general, system), then expand to rare domains (smart_home, media).
- Drift detection signals: `command.failed` events with `intent` metadata enable per-domain quality monitoring. A spike in failures for `domain="workspace"` suggests the workspace model needs retraining.
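The per-domain monitoring could reuse the Page-Hinkley test this README already lists under concept drift detection (a minimal sketch of the standard test; the class here is a from-scratch illustration, not the repo's `PageHinkleyDriftDetector`, and the `delta`/`threshold` values are assumptions that would need tuning):

```python
class PageHinkley:
    """Page-Hinkley test for an upward drift in a failure-rate stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 2.0):
        self.delta = delta          # tolerated deviation from the mean
        self.threshold = threshold  # alarm threshold (lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation statistic
        self.cum_min = 0.0          # running minimum of the statistic

    def update(self, x: float) -> bool:
        """Feed one observation (1.0 = command.failed, 0.0 = success).

        Returns True when drift is detected, suggesting retraining.
        """
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold
```

One detector instance per domain (keyed by the `domain` field of `command.failed` events) would turn a sustained failure spike into a retraining trigger while ignoring isolated failures.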
Event infrastructure lifecycle (v243.1):
- TrinityEventBus explicitly started in Phase 4 (before any subscriber)
- Health checks registered with HealthAggregator
- Graceful shutdown in correct dependency order
- Boot-order races eliminated (NeuralMesh no longer needs 10s retry)
Files modified (all in JARVIS Body repo):
unified_supervisor.pyβ Event state tracking, explicit startup, health checks, shutdownbackend/core/trinity_event_bus.pyβ Command lifecycle event typesbackend/api/unified_command_processor.pyβ Event emission at each command stagebackend/neural_mesh/neural_mesh_coordinator.pyβ Knowledge Graph subscription
✅ v244.0: Startup Warning Root Fix + Brain Vacuum Classification (COMPLETED, JARVIS Body-side)
v244.0 shipped in the JARVIS Body repo with three fix categories. The brain vacuum classification fix is most relevant to Reactor Core's training pipeline:
Brain Vacuum Classification Fix:
When J-Prime is unreachable, `_brain_vacuum_fallback()` in `jarvis_prime_client.py` now includes a classification prompt prefix. The fallback LLM (Claude/Gemini) outputs a `CLASSIFICATION: {"intent", "domain", "requires_action", "suggested_actions"}` line before its response. This means:
- Better training data during downtime: fallback responses now include proper intent/domain classification, not a hardcoded `intent="answer"`. Telemetry events from brain vacuum periods produce valid DPO pairs.
- Action commands execute: "Lock my screen" during J-Prime downtime returns `intent="action"` and actually executes, instead of becoming a text explanation.
Other v244.0 changes:
- 858 lines of dead code removed (orphaned tiered routing system)
- Cloud SQL proxy startup reduced from ~47s to ~3-5s (learning_database initializes faster)
Impact on Reactor Core: Faster Cloud SQL proxy startup means the learning_database (which stores voiceprints, command history, and training metadata) initializes sooner, reducing the window where telemetry events might be lost during boot.
Support the training side of JARVIS self-programming:
- Code quality evaluation – Evaluate generated code diffs for correctness, style, and security. Feed scores back as DPO signals.
- Self-programming telemetry – Capture Ouroboros cycles (architect plan → generated code → verifier review → human decision) as training data.
- Architect/Implementer specialization – Fine-tune DeepSeek-R1-14B on architectural reasoning traces and Qwen-Coder-14B on code generation from plans, using Ouroboros interaction data.
- Constitutional AI for code – Apply Constitutional AI training to code generation: "Is this code safe? Does it follow the existing patterns? Does it handle errors?"
- Night Shift automation – `NightShiftScheduler` already exists. Wire it to trigger DPO training runs during off-peak hours using accumulated telemetry.
- Concept drift detection – `PageHinkleyDriftDetector` already exists. Monitor model performance metrics and trigger retraining when quality degrades.
- A/B model testing – `ModelRegistry` supports versioned models and A/B testing. Deploy fine-tuned models alongside originals, compare performance, promote winners.
- Curriculum learning – Start fine-tuning on easy tasks (general chat), then progressively add harder tasks (math, code, reasoning) using the curriculum learning infrastructure already built in v79.0.
- Multi-VM gradient aggregation – v91.0 built distributed training with gradient compression. Activate it for 14B model fine-tuning, which exceeds single-VM memory.
- Spot VM resilience – Predictive preemption with checkpoint save is already built. Test with real training runs.
- Cost-aware scheduling – Train on spot VMs during cheap hours, pause during expensive hours. `DynamicResourceAllocator` has the framework.
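The drift-detection item rests on the Page-Hinkley test over a streaming quality metric. A self-contained sketch of the statistic (illustrative only; the repo's `PageHinkleyDriftDetector` may expose a different interface):

```python
class PageHinkley:
    """Minimal Page-Hinkley mean-shift detector over a streaming metric.
    Illustrative sketch, not the repo's PageHinkleyDriftDetector."""

    def __init__(self, delta: float = 0.005, threshold: float = 0.5):
        self.delta = delta          # tolerated drift magnitude per step
        self.threshold = threshold  # alarm level for the PH statistic
        self.mean = 0.0             # running mean of the metric
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # historical minimum of m_t
        self.n = 0

    def update(self, x: float) -> bool:
        """Feed one observation (e.g. eval loss); True means drift alarm."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


detector = PageHinkley()
stream = [0.30] * 50 + [0.55] * 10   # steady eval loss, then a regression
alarms = [detector.update(v) for v in stream]
# no alarms while the metric is stable; alarms appear shortly after the shift
```

When `update()` returns True, the scheduler would enqueue a retraining job instead of waiting for the next fixed-interval run.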
Ingest and process training data from JARVIS Body's real-time voice conversation infrastructure (v238.0):
- Conversation trace schema – Define a JSONL schema for multi-turn conversation sessions: session_id, turns (role + text + audio_duration_ms), barge_in_events, turn_detection_errors, latency metrics. Compatible with the existing `TelemetryIngestor`.
- Conversational DPO pairs – Responses that sustained multi-turn engagement (user continued the conversation) are "chosen"; responses that ended the conversation (user said "goodbye" or restarted with a new topic) are "rejected". Barge-in count per response is a quality proxy (more interruptions → worse response).
- Conciseness training for conversation mode – In conversation mode, shorter responses feel more natural. Generate DPO pairs where concise, direct responses are "chosen" over verbose, over-explained responses. This is the inverse of detailed-analysis mode training.
- Turn detection classifier training – Log `TurnDetector` heuristic decisions (silence_duration_ms, threshold_used, was_correct) as training data. When the heuristic triggers too early (user wasn't done) or too late (awkward pause), these errors become labeled examples for a small ML classifier.
- Self-voice echo negative examples – Transcripts dropped by the `stt_hallucination_guard` in conversation mode are negative examples for STT fine-tuning. These represent JARVIS's own speech (imperfect AEC residual) and should be classified as noise.
- Session-level metrics for curriculum learning – Use conversation session duration, turn count, and user engagement as difficulty metrics for curriculum learning: short exchanges first, then longer multi-topic conversations.
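The conversational labeling rule above can be sketched as a pure function over one session trace. Field names (`turns`, `barge_ins`) mirror the schema bullet and are assumptions, not a fixed contract:

```python
GOODBYE_CUES = {"goodbye", "bye", "that's all"}  # illustrative end-of-session cues


def label_assistant_turns(session: dict) -> list[dict]:
    """Label assistant turns for conversational DPO pair generation:
    'chosen' when the user kept talking without interrupting, 'rejected'
    when the turn was barged in on or ended the session. Hypothetical
    field names following the sketched JSONL schema."""
    turns = session["turns"]
    labels = []
    for i, turn in enumerate(turns):
        if turn["role"] != "assistant":
            continue
        nxt = turns[i + 1] if i + 1 < len(turns) else None
        ended = nxt is None or nxt["text"].strip().lower() in GOODBYE_CUES
        interrupted = turn.get("barge_ins", 0) > 0
        labels.append({
            "session_id": session["session_id"],
            "text": turn["text"],
            "label": "rejected" if (ended or interrupted) else "chosen",
        })
    return labels
```

Downstream pair generation would then match "chosen" and "rejected" turns that answered similar prompts before emitting actual DPO records.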
Prepare Reactor Core to ingest and process training data from the JARVIS Body Unified Agent Runtime:
- Goal trace schema – Define a JSONL schema for multi-step autonomous goal traces (goal → sub-steps → outcomes → reflections) compatible with `TelemetryIngestor`.
- Sequential DPO pairs – Generate preference pairs from goal execution sequences: successful multi-step approaches vs. failed approaches for the same goal type.
- Escalation boundary training – Use human escalation decisions (approve/reject) as Constitutional AI training signals for the safety classifier.
- Multi-agent coordination curriculum – Build a progressive difficulty curriculum from simple single-agent tasks to complex multi-agent workflows.
- Failure recovery fine-tuning – Fine-tune reasoning models on recovery traces: when a sub-step fails, which replanning strategies worked vs. didn't.
- Cross-model comparison at scale – With the Agent Runtime generating higher request volume across all specialist models, DPO pair generation becomes more statistically significant.
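The sequential DPO idea can be sketched by bucketing goal traces by type and pairing successes against failures. Field names (`goal_type`, `steps`, `outcome`) are assumptions following the goal trace bullet:

```python
from collections import defaultdict
from itertools import product


def sequential_dpo_pairs(traces: list[dict]) -> list[dict]:
    """Pair successful vs. failed goal traces of the same goal type into
    DPO preference records. Hypothetical schema, for illustration."""
    by_type: dict = defaultdict(lambda: {"success": [], "failure": []})
    for trace in traces:
        bucket = "success" if trace["outcome"] == "success" else "failure"
        by_type[trace["goal_type"]][bucket].append(trace)
    pairs = []
    for goal_type, group in by_type.items():
        # every (success, failure) combination for the same goal type
        for won, lost in product(group["success"], group["failure"]):
            pairs.append({
                "prompt": f"Plan steps to achieve goal type: {goal_type}",
                "chosen": " -> ".join(won["steps"]),
                "rejected": " -> ".join(lost["steps"]),
            })
    return pairs


traces = [
    {"goal_type": "file_cleanup", "outcome": "success",
     "steps": ["scan", "classify", "delete"]},
    {"goal_type": "file_cleanup", "outcome": "failure",
     "steps": ["delete", "scan"]},
]
pairs = sequential_dpo_pairs(traces)
# one pair: the successful step sequence is "chosen", the failed one "rejected"
```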
Cross-repo verification and integration testing. Many items previously planned here were resolved during the Feb 2026 audit:
- JSONL format contract – ~~Define and enforce shared schema~~ VERIFIED: Schemas are byte-identical across all three repos (v1.0 canonical `ExperienceEvent`). No action needed.
- Implement `ReactorCoreBridge.upload_training_data()` – ~~Broken link in J-Prime~~ VERIFIED: Fully implemented (992 LOC, v242.0) with batch upload, fallback, and job tracking. No action needed.
- Deployment signal verification – Test the Reactor Core → Trinity Protocol → J-Prime `HotSwapManager` path end-to-end with a dummy GGUF (partially addressed in v239.0 smoke test).
- Integration test suite – Automated test that writes a telemetry event in JARVIS Body format, ingests it in Reactor Core, runs a mock training step, exports a GGUF, and signals J-Prime for hot swap.
- Monitoring dashboard – Track pipeline health: events written/day, events ingested/day, training runs completed, models deployed, deployment feedback success rate.
- Model lineage tracking – Every deployed model records: base model, training method, training steps, dataset hash, evaluation scores, previous model scores.
- Data versioning activation – Content-addressed dataset storage with DVC-compatible lineage tracking (infrastructure exists, needs activation).
Once the API server is running (`python3 run_supervisor.py`), access:
- API Base URL: `http://localhost:8003`
- Interactive Docs: `http://localhost:8003/docs` (Swagger UI)
- ReDoc: `http://localhost:8003/redoc`
- Health Check: `http://localhost:8003/health`
# Trigger training
POST /api/v1/training/trigger
{
"model_name": "llama-2-7b",
"training_type": "dpo",
"config": {
"num_epochs": 3,
"batch_size": 4
}
}
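A stdlib-only Python client for the trigger endpoint might look like this (a sketch; it assumes the API server is reachable on port 8003 as documented above):

```python
import json
import urllib.request

REACTOR_API = "http://localhost:8003/api/v1"  # base URL from the docs above


def build_training_request(model_name: str, training_type: str = "dpo",
                           **config) -> dict:
    """Assemble the JSON body shown above; kept separate so it can be
    tested without a running server."""
    return {"model_name": model_name, "training_type": training_type,
            "config": config}


def trigger_training(model_name: str, **config) -> dict:
    """POST a training job and return the server's JSON reply.
    Requires a live Reactor-Core API server."""
    body = json.dumps(build_training_request(model_name, **config)).encode("utf-8")
    req = urllib.request.Request(
        f"{REACTOR_API}/training/trigger",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `trigger_training("llama-2-7b", num_epochs=3, batch_size=4)` would submit the same job as the raw request shown above.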
# Get training status
GET /api/v1/training/status/{job_id}
# Cancel training
POST /api/v1/training/cancel/{job_id}

# List models
GET /api/v1/models
# Get model info
GET /api/v1/models/{model_id}
# Register model
POST /api/v1/models/register
{
"model_id": "my-model-v1",
"model_path": "/path/to/model",
"metadata": {...}
}

# Schedule job
POST /api/v1/scheduler/schedule
{
"name": "nightly_training",
"schedule_type": "cron",
"cron_expression": "0 2 * * *",
"job_config": {...}
}
# List scheduled jobs
GET /api/v1/scheduler/jobs

# Submit telemetry
POST /api/v1/telemetry/submit
{
"event_type": "interaction",
"data": {...}
}
# Query metrics
GET /api/v1/telemetry/metrics?start_time=...&end_time=...

Connect to `ws://localhost:8003/ws` for real-time events:
const ws = new WebSocket('ws://localhost:8003/ws');
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Event:', data.type, data.payload);
};
// Subscribe to training events
ws.send(JSON.stringify({
type: 'subscribe',
channels: ['training:progress', 'training:complete']
}));

Model server runs on port 8001 (configurable):
# Inference
POST http://localhost:8001/predict
{
"prompt": "What is machine learning?",
"model_id": "llama-2-7b",
"max_tokens": 256,
"temperature": 0.7
}
# List loaded models
GET http://localhost:8001/models
# Load model
POST http://localhost:8001/models/load
{
"model_id": "my-model",
"model_path": "/path/to/model",
"backend": "vllm"
}

| Role | Repository | URL |
|---|---|---|
| Body | JARVIS (JARVIS-AI-Agent) | https://github.com/drussell23/JARVIS-AI-Agent |
| Mind | JARVIS-Prime | https://github.com/drussell23/jarvis-prime |
| Nerves | Reactor-Core (this repo) | https://github.com/drussell23/JARVIS-Reactor |
| C++ Core | MLForge | https://github.com/drussell23/MLForge |
- Architecture Docs: See `ARCHITECTURE_ADVANCED.md`
- Trinity Integration: See `TRINITY_INTEGRATION_COMPLETE.md`
- Version History: See `CHANGELOG.md` (if available)
- Issues: https://github.com/drussell23/JARVIS-Reactor/issues
- Discussions: https://github.com/drussell23/JARVIS-Reactor/discussions
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes following our code style
- Add tests for new features
- Update documentation
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Formatting: Black (line length 100)
- Linting: Ruff
- Type Hints: Required for all functions
- Docstrings: Google style
Reactor-Core is the learning and adaptation layer for autonomous Gmail triage. It does not triage inboxes directly; it consumes outcome signals and feeds safe, bounded improvements back into scoring behavior.
- Ingest behavioral outcomes (opened, replied, ignored, relabeled) with confidence controls.
- Track sender/domain reputation and outcome distributions over time.
- Drive bounded adaptive weight proposals for Body-side scoring.
- Preserve auditability: explainable adaptation events, rollback capability, and deterministic safety bounds.
flowchart TD
A[Gmail triage decision in JARVIS Body] --> B[User behavior outcome observed]
B --> C[OutcomeCollector classification]
C --> D[Experience queue + reputation updates]
D --> E[Reactor-Core learning pipelines]
E --> F[Bounded adapted weights]
F --> G[Shadow validation + drift checks]
G --> H[Safe activation in Body scoring]
- Adaptation is not applied immediately or unguarded; by default it runs in bounded mode with guardrails.
- Low-confidence outcomes are excluded from adaptation input.
- Weight changes remain bounded (no runaway drift), and disagreements trigger rollback behavior.
- User-facing notifications and UI delivery continue through Body-side channels; Reactor-Core influences prioritization quality over time.
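The bounded-adaptation guardrail above can be sketched as a clamped update step: each proposed weight moves at most a fixed amount per cycle and stays inside hard bounds. Parameter names and limits are illustrative, not the repo's actual safety configuration:

```python
def propose_weights(current: dict, observed_delta: dict,
                    max_step: float = 0.05,
                    floor: float = 0.0, ceil: float = 1.0) -> dict:
    """Bounded adaptive weight proposal: clamp each learned delta to
    [-max_step, +max_step] per cycle, then clamp the result to
    [floor, ceil] so no weight can drift out of range. Illustrative
    sketch of the guardrail, not the actual implementation."""
    proposed = {}
    for key, weight in current.items():
        step = max(-max_step, min(max_step, observed_delta.get(key, 0.0)))
        proposed[key] = max(floor, min(ceil, weight + step))
    return proposed


weights = {"sender_reputation": 0.40, "recency": 0.25, "reply_history": 0.35}
delta = {"sender_reputation": 0.30, "recency": -0.02}  # raw learned deltas
new_weights = propose_weights(weights, delta)
# sender_reputation moves only by +max_step despite the large raw delta;
# recency moves by its small delta; reply_history is unchanged
```

Because each cycle is bounded, a bad batch of outcome signals can shift scoring only slightly, and rollback just restores the previous weight set.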
MIT License - See LICENSE file for details.
Built with ❤️ for the JARVIS AGI Ecosystem
Special Thanks:
- PyTorch team for the excellent ML framework
- Hugging Face for transformers and PEFT
- FastAPI for the amazing async web framework
- All contributors and users of the JARVIS ecosystem
Version: 2.12.0 (v239.0 target)
Last Updated: February 2026
Status: ✅ Infrastructure Complete | ⏳ Pipeline Activation In Progress (v239.0 – wiring existing components, ~200-400 lines across 4 files, zero new Python files)
Feb 2026 Audit Corrections: A three-way cross-verification against actual code corrected several previously reported issues: ReactorCoreBridge.upload_training_data() IS fully implemented (992 LOC), experience schemas ARE byte-identical across repos, and ExperienceEvent IS the single canonical schema with legacy adapters. The remaining gap is operational activation, not missing code.
- Training data pipeline built but never activated – All infrastructure exists and schemas are verified identical across repos (v1.0 canonical). `TelemetryEmitter`, `TelemetryIngestor`, `ReactorCoreBridge.upload_training_data()` (992 LOC, v242.0), `UnifiedTrainingPipeline`, and `ReactorCoreWatcher` are all built. The gap is activation: zero training jobs have ever run (`jobs.json` is empty). The Reactor Core API server on port 8090 needs to be verified as accepting POSTs, and the supervisor needs to call `initialize_reactor_core()` and `start_reactor_core_watcher()` during startup. Target: v239.0 (wiring, not building).
- Deployment feedback loop is one-way – Reactor Core can export GGUF and deploy to J-Prime, and `ReactorCoreWatcher` in Prime can detect new models. But there is no feedback mechanism: Prime never tells Reactor Core "model deployed successfully" or "model caused regression, rolling back." A `deployment_status.json` feedback file is needed. Target: v239.0.
- No deployment quality gate – The pipeline goes Training → GGUF export → Deploy with no validation step. A smoke test (load model, run test inference, verify non-garbage output) must be inserted before deployment. Must run in a subprocess to avoid OOM on a 16GB Mac. Target: v239.0.
- No real production training data yet – `UnifiedTrainingPipeline` has never run on actual user interaction data. Telemetry JSONL files exist in `~/.jarvis/telemetry/` but have never been ingested into a training run. Target: v239.0 (first run).
- LangGraph reasoning traces unavailable – JARVIS Body's reasoning engine produces linear fallback traces, not rich graph-based reasoning data (depends on JARVIS Body v246.0).
- Agent Runtime training data schema undefined – When autonomous goal pursuit generates multi-step traces, Reactor Core needs a new ingestion schema (v246.0 target).
- Voice conversation training data schema undefined – JARVIS Body v238.0 generates multi-turn conversation traces, barge-in events, and turn detection logs. Reactor Core needs a conversation trace ingestion schema and a conversational DPO pair generator (v248.0 target).
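The missing quality gate could be prototyped as a subprocess-isolated smoke test, so a hung or memory-hungry model load cannot take down the pipeline process itself. The child script below is a placeholder for the real load-and-infer step:

```python
import subprocess
import sys
import textwrap


def smoke_test_model(model_path: str, timeout: int = 300) -> bool:
    """Run a pre-deployment smoke test in a child process and gate on its
    exit code. The child body is a placeholder; a real gate would load
    the GGUF at model_path and run a test prompt. Illustrative sketch."""
    child = textwrap.dedent(f"""
        import sys
        # Placeholder for the real gate: load {model_path!r}, run a test
        # prompt, and exit non-zero if the output looks like garbage.
        output = "Paris is the capital of France."
        ok = len(output.split()) >= 3 and any(c.isalpha() for c in output)
        sys.exit(0 if ok else 1)
    """)
    try:
        result = subprocess.run([sys.executable, "-c", child], timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # a hung model load counts as a failed gate


# Deployment would proceed only when the gate passes:
# if smoke_test_model("/path/to/exported.gguf"): deploy(...)
```

Running the check in a subprocess means an OOM kill or crash during model load surfaces as a non-zero exit code rather than bringing down the supervisor.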