Integrate ROSE + SmartSim Data Client #73

This ticket tracks progress on integrating ROSE with the SmartSim DataClient (SmartSim is developed by CrayLabs at HPE).

Very high-level design:

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   ROSE                                       │
│──────────────────────────────────────────────────────────────────────────────│
│ • Manages and scales distributed AL/RL and ML workflows across HPC resources │
│ • Submits multiple Learners in parallel (each manages Simulation + Training) │
│ • Owns an integrated DataClient (leveraging SmartSim) for in-memory exchange │
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐     │
│   │                         DataClient (SmartSim)                      │     │
│   │────────────────────────────────────────────────────────────────────│     │
│   │ • ROSE-integrated interface for data movement                      │     │
│   │ • Uses SmartSim/Redis backend for in-memory tensor exchange        │     │
│   │ • Enables fast communication between Simulations and AI tasks      │     │
│   └────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐     │
│   │                            Learners (Parallel)                     │     │
│   │────────────────────────────────────────────────────────────────────│     │
│   │  Each Learner orchestrates coupled Simulation and Training tasks:  │     │
│   │                                                                    │     │
│   │   ┌────────────────────────┐       ┌────────────────────────┐      │     │
│   │   │   Simulation Task      │──────▶│  AI / Training Task    │      │     │
│   │   │  (Data Producer)       │◀──────│ (Data Consumer/Updater)│      │     │
│   │   └────────────────────────┘       └────────────────────────┘      │     │
│   │            │                                │                      │     │
│   │            └────────── uses DataClient ─────┘                      │     │
│   │                                                                    │     │
│   │   ... (many Learners running concurrently, managed by ROSE) ...    │     │
│   └────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐     │
│   │       Underlying SmartSim / Redis/Dragon Backend (In-memory Store) │     │
│   │────────────────────────────────────────────────────────────────────│     │
│   │ • Global in-memory database shared across all Learners             │     │
│   │ • Handles tensors, metadata, and model states                      │     │
│   │ • Scales across HPC nodes with SmartSim integration                │     │
│   └────────────────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────────────┘

```
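The data flow in the diagram can be illustrated with a minimal sketch. Everything named here is hypothetical and is not the ROSE or SmartSim API: `InMemoryDataClient` is a stand-in that replaces the SmartSim/Redis store with a plain dictionary, and the `put_tensor`/`get_tensor` method names, the key layout, and the toy "training" rule are assumptions chosen only to show how a Simulation task and a Training task inside one Learner could exchange data through a shared client.

```python
import threading
from typing import Dict, List

Tensor = List[float]  # toy tensor type; a real backend would hold arrays


class InMemoryDataClient:
    """Hypothetical stand-in for the DataClient (a dict replaces SmartSim/Redis)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tensor] = {}
        self._cond = threading.Condition()

    def put_tensor(self, key: str, value: Tensor) -> None:
        # Store a copy and wake up any task waiting for this key.
        with self._cond:
            self._store[key] = list(value)
            self._cond.notify_all()

    def get_tensor(self, key: str, timeout: float = 5.0) -> Tensor:
        # Block until the key exists, or raise after `timeout` seconds.
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._store, timeout=timeout):
                raise TimeoutError(f"tensor {key!r} was not produced in time")
            return list(self._store[key])


def simulation_task(client: InMemoryDataClient, learner_id: int, step: int, weight: float) -> None:
    # Data producer: generates samples from the current model weight.
    samples = [weight * x for x in range(4)]
    client.put_tensor(f"learner{learner_id}/sim/{step}", samples)


def training_task(client: InMemoryDataClient, learner_id: int, step: int) -> float:
    # Data consumer/updater: reads samples, publishes an updated model state.
    samples = client.get_tensor(f"learner{learner_id}/sim/{step}")
    new_weight = sum(samples) / len(samples)  # toy update rule (assumption)
    client.put_tensor(f"learner{learner_id}/model/{step}", [new_weight])
    return new_weight


def run_learner(client: InMemoryDataClient, learner_id: int, steps: int = 2) -> float:
    # One Learner: couples a Simulation task and a Training task per step.
    weight = 1.0
    for step in range(steps):
        sim = threading.Thread(target=simulation_task, args=(client, learner_id, step, weight))
        sim.start()
        weight = training_task(client, learner_id, step)  # waits for the samples
        sim.join()
    return weight


if __name__ == "__main__":
    from concurrent.futures import ThreadPoolExecutor

    shared_client = InMemoryDataClient()  # one store shared by all Learners
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(lambda i: run_learner(shared_client, i), range(3)))
    print(results)
```

Keys are namespaced per Learner so that concurrent Learners can share one store without overwriting each other; whether the actual integration uses the same scheme is not stated in this ticket.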

Current work is on the following branch: https://github.com/radical-cybertools/ROSE/tree/integration/smartsim
Example (not a final version): https://github.com/radical-cybertools/ROSE/blob/integration/smartsim/examples/data_flow_learner.py
