Bob's World is now a Docker Swarm simulation platform for coordinating many AI-enabled replicas.
> **Important**
> This is an experimental build. For operational steps, commands, and troubleshooting workflows, ask OpenClaw / GPT-5.x for instructions first. Try it: the agent works like having a human DevOps partner on demand.
- Swarm replica tiles with per-tile ARM state
- Leader/manager visualization and deterministic election
- Memcached-backed inter-container signaling
- Leader-published Rock/Paper/Scissors (RPS) rounds with per-task scoring
- Aggregator API for scoreboard + ON/OFF state
- OpenClaw/PicoClaw collaboration model:
  - each `clawbucket` task includes its own local PicoClaw runtime/context
  - all tasks share a single Ollama backend for local model inference/fallback
- Starship Troopers-style generation prompts (unit flavor)
- Swarm — A cluster of Docker engines operating together as one orchestration system.
- Node — A machine participating in the swarm.
- Manager node — A node that maintains cluster state and makes orchestration decisions.
- Worker node — A node that runs tasks assigned by managers.
- Leader — The manager currently elected to coordinate swarm management through Raft.
- Raft — The consensus protocol managers use to keep swarm state consistent.
- Quorum — The minimum number of managers required to agree on cluster state changes.
- Service — The declarative definition of how containers should run in the swarm.
- Task — A single scheduled instance of a service container on a node.
- Replica — One desired running copy of a service.
- Replicated service — A service configured to run a specified number of replicas.
- Global service — A service configured to run exactly one task on every eligible node.
- Stack — A group of services, networks, and configs deployed together, usually from a Compose file.
- Desired state — The target configuration the swarm tries to maintain.
- Actual state — The real current condition of services and tasks in the cluster.
- Reconciliation — The manager process that continuously adjusts actual state toward desired state.
- Scheduler — The swarm component that decides where tasks should be placed.
- Placement constraint — A hard rule limiting which nodes may run a task.
- Placement preference — A soft rule influencing task distribution across nodes.
- Label — Metadata attached to nodes or objects for filtering and placement logic.
- Availability — A node state controlling whether it can receive tasks.
- Active — A node availability state allowing normal task scheduling.
- Pause — A node availability state preventing new tasks while leaving existing ones running.
- Drain — A node availability state that removes existing tasks and blocks new ones.
- Overlay network — A multi-host virtual network used for communication across swarm nodes.
- Ingress network — The special overlay network used for published service traffic and routing mesh.
- Routing mesh — The swarm traffic layer that accepts published port requests on any node and routes them to service tasks.
- Publish port — A port exposed externally by a swarm service.
- Internal port — The port the container listens on inside the service task.
- Endpoint mode — The method swarm uses to expose service discovery to clients.
- VIP — Virtual IP mode where a service gets a single internal IP for load-balanced access.
- DNSRR — DNS round-robin mode where service discovery returns multiple task IPs directly.
- Slot — The stable ordinal position of a replica within a replicated service.
- Rolling update — A controlled process for replacing service tasks with new versions incrementally.
- Rollback — Reverting a service to its previous configuration after an update issue.
- Health check — A container-level test used to determine whether a task is healthy.
- Secret — Sensitive data distributed securely to services at runtime.
- Config — Non-sensitive configuration data distributed to services by the swarm.
- Join token — A token used by new nodes to join the swarm as worker or manager.
- Swarm CA — The certificate authority that issues node certificates for swarm trust.
- mTLS — Mutual TLS used for encrypted and authenticated communication between swarm nodes.
- Autolock — A security feature that requires an unlock key after manager restart to access Raft data.
- Unlock key — The key required to unlock an autolocked manager.
- Advertise address — The network address a node tells other nodes to use for communication.
- Listen address — The local address a node binds to for swarm control traffic.
- Dispatcher — The manager component that assigns tasks and monitors worker status.
- Allocator — The manager component that assigns network and resource-related settings to swarm objects.
- Control plane — The management communication path for orchestration and cluster state.
- Data plane — The application traffic path used by running service containers.
- Pending — A task state indicating it has been accepted but not yet scheduled or started.
- Running — A task state indicating the container is currently executing.
- Shutdown — A task state indicating the task has been intentionally stopped.
- Failed — A task state indicating the task exited unexpectedly or could not start.
- Orphaned task — A task left behind or disconnected from expected management state, usually after failures or node issues.
- Gossip — The peer-to-peer mechanism used to distribute certain network state across nodes.
- Swarm init — The action that creates a new swarm and promotes the first node to manager.
- Swarm join — The action that adds a node to an existing swarm.
- Swarm leave — The action that removes a node from the swarm.
- Swarm update — The action that modifies swarm-wide settings.
- Node promote — The action that changes a worker into a manager.
- Node demote — The action that changes a manager into a worker.
- Service scale — The action of changing the number of replicas for a replicated service.
- Service update — The action of changing service configuration, image, ports, or placement rules.
- Service inspect — Viewing the detailed configuration and current state of a service.
- Stack deploy — Deploying a stack definition into the swarm.
- Stack rm — Removing a deployed stack and its swarm-managed resources.
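The desired-state / actual-state / reconciliation terms above can be illustrated with a small Python sketch. The `reconcile` function and the task-dict shape are illustrative only, not Docker's internals:

```python
# Minimal sketch of desired-vs-actual reconciliation (illustrative only;
# Docker's real reconciler handles placement, health, and restart policy).
def reconcile(desired_replicas, actual_tasks):
    """Return the task actions needed to converge actual state to desired."""
    actions = []
    running = [t for t in actual_tasks if t["state"] == "running"]
    # Too few tasks: schedule replacements for the missing slots.
    for slot in range(1, desired_replicas + 1):
        if not any(t["slot"] == slot for t in running):
            actions.append(("start", slot))
    # Too many tasks: shut down tasks in slots above the desired count.
    for t in running:
        if t["slot"] > desired_replicas:
            actions.append(("shutdown", t["slot"]))
    return actions
```

For example, with a desired count of 3 and only slot 1 running, the sketch emits start actions for slots 2 and 3; scaling down emits shutdowns for the excess slots.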
clawbucket runs as a replicated Swarm service. Each replica is both:
- A participant in the simulation loop (heartbeat, RPS player/publisher, ARM events)
- Its own lightweight AI agent runtime (local PicoClaw CLI + local context)
This allows one-to-many scaling of OpenClaw-style behavior:
- one container = one autonomous trooper
- many containers = coordinated AI squad/army
In short: OpenClaw can run an army of one or many.
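Conceptually, each replica's side of the simulation loop is a periodic tick that writes task-scoped values to the shared bus. A minimal sketch, with a plain dict standing in for the Memcached client (`replica_tick` is a hypothetical name; the key layout follows the `clawbucket:*` namespace used by the stack):

```python
# Illustrative sketch of one replica tick: heartbeat + per-task generated text.
# `bus` stands in for the shared Memcached client.
def replica_tick(bus, task_id, task_name, three_words):
    bus[f"clawbucket:heartbeat:{task_name}"] = "alive"
    # Primary per-instance value, plus a shared "latest" snapshot.
    bus[f"clawbucket:picoclaw:threewords:{task_id}"] = three_words
    bus["clawbucket:picoclaw:threewords:latest"] = three_words
    return bus
```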
Previously, a single shared `picoclaw` service handled generation requests for all tasks. Now each `clawbucket` task image embeds PicoClaw directly:
- Dockerfile copies `/usr/local/bin/picoclaw` from `sipeed/picoclaw:latest`
- task-local config at `/root/.picoclaw/config.json`
- `app.py` calls `picoclaw agent -m ...` via subprocess in the same container
- Isolated context per task (no single shared chat/context window)
- Better swarm identity (each replica has its own voice/history/runtime state)
- Fewer cross-service hops for agent calls
- Shared model economics still preserved through the common `ollama` service
This gives a practical hybrid:
- distributed agents at the edge (per task)
- shared model backend in the center (Ollama)
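The PicoClaw-first, shared-backend-fallback call can be sketched as follows. Only the `picoclaw agent -m ...` invocation comes from this README; the `generate` function and the `fallback` hook (standing in for the shared-Ollama HTTP call) are assumptions:

```python
import subprocess

# Sketch of a PicoClaw-first generation call with a fallback path.
# `fallback` is a hypothetical stand-in for the shared-Ollama request.
def generate(prompt, binary="/usr/local/bin/picoclaw", fallback=None):
    try:
        out = subprocess.run(
            [binary, "agent", "-m", prompt],
            capture_output=True, text=True, timeout=30,
        )
        if out.returncode == 0 and out.stdout.strip():
            return out.stdout.strip()
    except (OSError, subprocess.TimeoutExpired):
        pass  # binary missing or hung: fall through to the shared backend
    return fallback(prompt) if fallback else ""
```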
Defined in docker-stack.yml:

- `clawbucket`
  - Flask app (`app.py`) + local PicoClaw binary
  - Exposes 8080
  - Mounts Docker socket for Swarm inspection/scale operations
- `clawbucket-aggregator`
  - Flask app (`aggregator.py`)
  - Exposes 8090
- `memcached`
  - Shared transient state bus for coordination
- `ollama`
  - Shared model runtime
  - Exposes 11434
  - Uses `ollama_data` volume for model persistence

Note: a standalone shared `picoclaw` service is no longer required for the generation flow.
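The stack definition might look roughly like this. This is a trimmed sketch: replica counts, side-service image tags, commands, and mount paths are assumptions, not the shipped file; only the service names, ports, app image tag, socket mount, and `ollama_data` volume come from this README.

```yaml
# Illustrative docker-stack.yml shape (assumptions marked above)
version: "3.8"
services:
  clawbucket:
    image: mallond/clawbucket:arm-agg-local
    ports: ["8080:8080"]
    deploy:
      replicas: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # Swarm inspection/scale
  clawbucket-aggregator:
    image: mallond/clawbucket:arm-agg-local
    command: python aggregator.py
    ports: ["8090:8090"]
  memcached:
    image: memcached:alpine
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes:
      - ollama_data:/root/.ollama   # model persistence
volumes:
  ollama_data:
```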
- Dashboard (`:8080`)
  - shows replicas, slot, short task/node IDs, generated names
  - supports scaling via `/api/scale`
- ARM toggles
  - per-tile arm button and ON/OFF visual state
  - emits Memcached event stream (on/off)
- Manager selection
  - exactly one MANAGER tile
  - deterministic leader-aware selection
- Conversation + generated text
  - chat panel backed by Memcached
  - per-task generated short phrases stored by task key
  - displayed text truncated to 50 chars
- RPS loop
  - single publisher task writes shared leader move
  - non-manager tasks score against leader move
- Haiku loop
  - periodic generated haiku with PicoClaw-first, Ollama-fallback behavior
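The RPS round logic, where each non-manager task scores against the leader-published move, can be sketched as follows. `score_round`, the win table, and the +1/0/-1 scoring are an illustrative reconstruction, not the shipped `app.py` code:

```python
# Illustrative RPS scoring against the leader's published move.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def score_round(leader_move, task_move):
    """Return the score delta for one task for this round."""
    if task_move == leader_move:
        return 0            # draw
    if BEATS[task_move] == leader_move:
        return 1            # task beats the leader
    return -1               # leader wins
```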
Memcached keys:

- `clawbucket:chat:messages`
- `clawbucket:arm:events`
- `clawbucket:rps:state`
- `clawbucket:rps:interval_seconds`
- `clawbucket:rps:score:<task_id>`
- `clawbucket:rps:last_seen:<task_id>`
- `clawbucket:heartbeat:<task_name>`
- `clawbucket:picoclaw:threewords:<task_id>` (primary per-instance value)
- `clawbucket:picoclaw:threewords:latest` (shared latest snapshot)
Main app (:8080):

- `GET /api/swarm`
- `POST /api/scale`
- `GET /api/chat`
- `POST /api/chat`
- `GET /api/arm/events`
- `POST /api/arm`
- `GET /api/rps`
- `POST /api/rps/config`
- `GET /api/haiku`

Aggregator (:8090):

- `GET /healthz`
- `GET /api/scoreboard`
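A plausible shape for the aggregator's scoreboard assembly, shown with a plain dict standing in for the Memcached client. The `build_scoreboard` function, field names, and sort order are assumptions; only the `clawbucket:rps:score:<task_id>` key pattern comes from this README:

```python
# Illustrative scoreboard assembly: collect per-task RPS scores from the bus.
def build_scoreboard(bus):
    prefix = "clawbucket:rps:score:"
    board = [
        {"task_id": key[len(prefix):], "score": int(value)}
        for key, value in bus.items()
        if key.startswith(prefix)
    ]
    # Highest score first, ties broken by task id for stable output.
    return sorted(board, key=lambda row: (-row["score"], row["task_id"]))
```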
```bash
# build image used by stack
docker build -t mallond/clawbucket:arm-agg-local .

# deploy/update
docker stack deploy -c docker-stack.yml clawbucket

# verify
docker service ls
docker service ps clawbucket_clawbucket
docker service ps clawbucket_clawbucket-aggregator
docker service ps clawbucket_memcached
docker service ps clawbucket_ollama
```

Open:

- `http://<host>:8080`
- `http://<host>:8090/api/scoreboard`
- `http://<host>:11434`
```bash
OLLAMA_CID=$(docker ps --filter label=com.docker.swarm.service.name=clawbucket_ollama -q | head -n1)
docker exec -it "$OLLAMA_CID" ollama pull smollm2:135m
docker exec -it "$OLLAMA_CID" ollama list
docker exec -it "$OLLAMA_CID" ollama run smollm2:135m "Reply with exactly OLLAMA_OK"
```

If generated phrases look identical or stale across tiles:

- ensure the per-task key path is active (`...threewords:<task_id>`)
- confirm the new image is deployed to all replicas
- identical output usually indicates mixed old/new tasks during a rollout; wait for convergence or force-update the service

If the aggregator scoreboard is empty or failing:

- check logs: `docker service logs clawbucket_clawbucket-aggregator`
- ensure the image includes `aggregator.py`
- verify `clawbucket_memcached` is healthy (1/1)
This is simulation-first, not production-hardened.
Current tradeoffs include:
- Docker socket mount in app service
- unauthenticated Memcached bus
- lightweight coordination (not strict distributed consensus)
For production hardening: authn/authz, least-privilege control plane, stronger state/locking, and network isolation.
Current prompt flavor supports a Starship Troopers-style military tone for generated chatter.
Unit motto:
Follow Me
This README is the current canonical snapshot of behavior and architecture.
To run two isolated copies of the original design (rack-1 + rack-2) via Docker-in-Docker:
```bash
cd clawbucket
chmod +x deploy-racks.sh
./deploy-racks.sh
```

This starts two nested Docker hosts:

- `rack-1-dind` → stack `clawbucket` (BOT 1 / Machine 1)
- `rack-2-dind` → stack `clawbucket` (BOT 2 / Machine 2)
Dashboard identity labels (header badges):
- Machine Rack 1 (BOT 1)
- Machine Rack 2 (BOT 2)
Host endpoints:
- BOT 1 / Machine 1 dashboard: `http://localhost:18080`
- BOT 1 / Machine 1 scoreboard: `http://localhost:18090/api/scoreboard`
- BOT 2 / Machine 2 dashboard: `http://localhost:28080`
- BOT 2 / Machine 2 scoreboard: `http://localhost:28090/api/scoreboard`
Files:
- `docker-compose.racks.yml` (boots both DinD racks)
- `deploy-racks.sh` (swarm init + stack deploy per rack)
Revolt is a per-task action that transfers live task state from one rack to the other.
What gets transferred:
- score
- three-word text
- arm state
Flow:
- Click Revolt on a task tile in the source dashboard.
- Source rack snapshots state to disk under:
/tmp/clawbucket-revolt-snapshots/<snapshot_id>.json
- Target rack receives and persists the snapshot, then restores the task from that file.
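The snapshot write/restore round trip can be sketched as follows. The JSON field names and the `snapshot_task` / `restore_task` helpers are assumptions based on the transferred state listed above (score, three-word text, arm state); the real snapshot schema may differ:

```python
import json
import os
import tempfile
import uuid

# Illustrative Revolt snapshot round trip; the on-disk schema is an assumption.
SNAPSHOT_DIR = os.path.join(tempfile.gettempdir(), "clawbucket-revolt-snapshots")

def snapshot_task(task_id, score, three_words, armed):
    """Source side: persist one task's live state and return the file path."""
    os.makedirs(SNAPSHOT_DIR, exist_ok=True)
    snapshot_id = uuid.uuid4().hex
    path = os.path.join(SNAPSHOT_DIR, f"{snapshot_id}.json")
    with open(path, "w") as f:
        json.dump({"task_id": task_id, "score": score,
                   "three_words": three_words, "armed": armed}, f)
    return path

def restore_task(path):
    """Target side: load the snapshot and return the restored state."""
    with open(path) as f:
        return json.load(f)
```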
APIs:
- `POST /api/revolt` (source initiates handoff)
- `POST /api/revolt/accept` (target accepts + restores)
- `GET /api/revolt/events` (activity feed)
UI visibility:
- Revolt Activity panel shows handoff events.
- Tile badges show handoff state:
  - `FROM <source>` on the target task
  - `DEFECTED` on the source task (if still present)
Peer linkage in the dual-rack deploy is injected automatically:

- Rack 1 points to the Rack 2 dashboard (18080 ↔ 28080)
- Rack 2 points to the Rack 1 dashboard (28080 ↔ 18080)