How the codebase is structured and how to modify it.
```
PersonalDataHub/
├── src/
│   ├── index.ts                 # Entry point: loads config, creates DB, starts server
│   ├── cli.ts                   # CLI commands (init, start, stop, status, mcp, reset)
│   ├── config/                  # Configuration system
│   │   ├── types.ts             # TypeScript interfaces (HubConfig, SourceConfig, etc.)
│   │   ├── schema.ts            # Zod schemas for config validation
│   │   └── loader.ts            # YAML loading with ${ENV_VAR} resolution
│   ├── db/                      # Database layer
│   │   ├── db.ts                # SQLite connection (WAL mode, foreign keys)
│   │   ├── schema.ts            # CREATE TABLE statements
│   │   └── encryption.ts        # AES-256-GCM encrypt/decrypt for OAuth tokens
│   ├── auth/                    # Authentication
│   │   ├── oauth-routes.ts      # OAuth flows (Gmail, GitHub)
│   │   └── token-manager.ts     # Encrypted token storage and refresh
│   ├── connectors/              # Source service integrations
│   │   ├── types.ts             # DataRow interface, SourceConnector interface
│   │   ├── gmail/
│   │   │   └── connector.ts     # Gmail fetch, executeAction
│   │   └── github/
│   │       ├── connector.ts     # GitHub access validation, fetch issues/PRs
│   │       └── setup.ts         # Grant/revoke collaborator access
│   ├── filters.ts               # Quick filter types, catalog, and apply logic
│   ├── mcp/                     # MCP server
│   │   ├── server.ts            # Stdio MCP server with source-specific tools
│   │   └── server.test.ts       # MCP server tests
│   ├── audit/
│   │   └── log.ts               # AuditLog class: typed write methods + filtered queries
│   ├── server/                  # HTTP layer
│   │   ├── server.ts            # Hono app setup, mounts API + GUI + OAuth routes
│   │   └── app-api.ts           # POST /pull, POST /propose, GET /sources
│   ├── gui/
│   │   └── routes.ts            # Self-contained HTML GUI with inline JS
│   └── test-utils.ts            # Shared test utilities
├── packages/
│   └── personaldatahub/         # OpenClaw skill
│       └── src/
│           ├── index.ts         # Plugin registration
│           ├── hub-client.ts    # HTTP client for PersonalDataHub API
│           ├── tools.ts         # Tool definitions
│           └── prompts.ts       # System prompt for teaching agents
├── tests/
│   └── e2e/                     # End-to-end integration tests
│       ├── helpers.ts           # Shared setup (mock connector, in-memory DB)
│       ├── gmail-recent-readonly.test.ts
│       ├── gmail-metadata-only.test.ts
│       ├── gmail-full-access-redacted.test.ts
│       └── gmail-staged-action.test.ts
├── systemdesigns/               # Documentation
├── hub-config.example.yaml
├── package.json
├── tsconfig.json
├── vitest.config.ts
└── eslint.config.js
```
| Component | Choice |
|---|---|
| Language | TypeScript (strict, ESM, NodeNext modules) |
| Runtime | Node.js >= 22 |
| Database | better-sqlite3 (WAL mode) |
| Encryption | AES-256-GCM (application-level, for OAuth tokens) |
| HTTP Server | Hono (bound to 127.0.0.1) |
| Agent Protocol | MCP via @modelcontextprotocol/sdk |
| Config | YAML + Zod validation |
| Gmail API | googleapis |
| GitHub API | octokit |
| Tests | Vitest |
| Package Manager | pnpm |
```bash
# Build
pnpm build

# Watch mode for TypeScript
pnpm dev

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Lint
pnpm lint
```

Tables in SQLite:
| Table | Purpose |
|---|---|
| `oauth_tokens` | Encrypted OAuth tokens per source |
| `owner_auth` | Bcrypt-hashed owner password for GUI access |
| `filters` | Quick filter definitions per source (type, value, enabled) |
| `staging` | Outbound actions pending owner review |
| `audit_log` | Every data movement with timestamps and purpose |
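The `oauth_tokens` table holds tokens encrypted at the application level with AES-256-GCM. The sketch below shows one way to do that with Node's built-in `node:crypto`. The function names, the `iv:tag:ciphertext` hex layout, and the way the key is supplied are assumptions for illustration; they are not taken from `src/db/encryption.ts`.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers: the real module's names and storage format may differ.
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag, checked on decrypt
  return [iv, tag, ciphertext].map((part) => part.toString("hex")).join(":");
}

function decryptToken(payload: string, key: Buffer): string {
  const [iv, tag, ciphertext] = payload.split(":").map((hex) => Buffer.from(hex, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the payload was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The key must be exactly 32 bytes. Because GCM is authenticated, a modified ciphertext or a wrong key fails loudly instead of returning garbage.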
The normalized shape for all source data:
```typescript
type DataRow = {
  source: string;                 // "gmail", "github"
  source_item_id: string;         // original ID in source system
  type: string;                   // "email", "issue", "pr", "commit"
  timestamp: string;              // ISO 8601
  data: Record<string, unknown>;  // all content fields
}
```

Stored in the `filters` table:
```typescript
type QuickFilter = {
  id: string;
  source: string;   // "gmail", "github"
  type: string;     // "time_after", "from_include", "subject_include", etc.
  value: string;    // filter value (e.g., date, sender name, field name)
  enabled: number;  // 1 = active, 0 = disabled
}
```

Available filter types: `time_after`, `from_include`, `subject_include`, `exclude_sender`, `exclude_keyword`, `has_attachment`, `hide_field`.
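To make the filter types concrete, here is a minimal sketch of applying a set of `QuickFilter` rows to `DataRow[]`. The matching semantics (case-insensitive substring checks) and the `data` field names `from`, `subject`, `body`, and `attachments` are assumptions; the actual logic in `src/filters.ts` may differ.

```typescript
type DataRow = {
  source: string;
  source_item_id: string;
  type: string;
  timestamp: string;
  data: Record<string, unknown>;
};
type QuickFilter = { id: string; source: string; type: string; value: string; enabled: number };

// Case-insensitive substring test on one data field (assumed semantics).
const contains = (row: DataRow, field: string, needle: string): boolean =>
  String(row.data[field] ?? "").toLowerCase().includes(needle.toLowerCase());

// Returns true when the row passes a single row-level filter.
function keepRow(row: DataRow, f: QuickFilter): boolean {
  switch (f.type) {
    case "time_after":
      return Date.parse(row.timestamp) > Date.parse(f.value);
    case "from_include":
      return contains(row, "from", f.value);
    case "subject_include":
      return contains(row, "subject", f.value);
    case "exclude_sender":
      return !contains(row, "from", f.value);
    case "exclude_keyword":
      return !contains(row, "subject", f.value) && !contains(row, "body", f.value);
    case "has_attachment": {
      const attachments = row.data["attachments"];
      return Array.isArray(attachments) && attachments.length > 0;
    }
    default:
      return true; // hide_field and unknown types never drop a row
  }
}

// Expects the filters already loaded for one source; disabled filters are ignored.
function applyFilters(rows: DataRow[], filters: QuickFilter[]): DataRow[] {
  const active = filters.filter((f) => f.enabled === 1);
  const hidden = active.filter((f) => f.type === "hide_field").map((f) => f.value);
  return rows
    .filter((row) => active.every((f) => keepRow(row, f))) // row predicates first
    .map((row) => {
      const data = { ...row.data }; // copy so the input rows stay untouched
      for (const field of hidden) delete data[field]; // then field removal
      return { ...row, data };
    });
}
```

With a `from_include` filter of `alice` and a `hide_field` filter of `body`, only rows whose sender contains "alice" survive, and each survivor is returned without its `body` field.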
- Create `src/connectors/<source>/connector.ts` implementing `SourceConnector`:

  ```typescript
  interface SourceConnector {
    name: string;
    fetch(boundary: SourceBoundary, params?: Record<string, unknown>): Promise<DataRow[]>;
    executeAction(actionType: string, actionData: Record<string, unknown>): Promise<ActionResult>;
  }
  ```

- Register the connector in `src/index.ts` by adding it to the connector registry.
- Add MCP tools for the new source in `src/mcp/server.ts`: add a `registerXxxTools()` function and wire it to the source name in `startMcpServer()`.
- Add tests in `src/connectors/<source>/<source>.test.ts`.
- The connector maps source API responses into `DataRow[]`: all content goes in `data`, and the four fixed fields (`source`, `source_item_id`, `type`, `timestamp`) are set at the top level.
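As an illustration of these steps, the sketch below implements `SourceConnector` for a made-up `notes` source backed by an in-memory array. The `SourceBoundary` and `ActionResult` shapes, the `Note` type, and the `create_note` action are all assumptions, since their real definitions are not shown here.

```typescript
type DataRow = {
  source: string;
  source_item_id: string;
  type: string;
  timestamp: string;
  data: Record<string, unknown>;
};

// Assumed shapes: the real types live in src/connectors/types.ts and may differ.
type SourceBoundary = { after?: string };
type ActionResult = { success: boolean; message?: string };

interface SourceConnector {
  name: string;
  fetch(boundary: SourceBoundary, params?: Record<string, unknown>): Promise<DataRow[]>;
  executeAction(actionType: string, actionData: Record<string, unknown>): Promise<ActionResult>;
}

// Hypothetical upstream record standing in for a real API response.
type Note = { id: string; title: string; body: string; updatedAt: string };

class NotesConnector implements SourceConnector {
  name = "notes";
  private notes: Note[];

  constructor(notes: Note[]) {
    this.notes = notes;
  }

  async fetch(boundary: SourceBoundary): Promise<DataRow[]> {
    const cutoff = boundary.after ? Date.parse(boundary.after) : -Infinity;
    return this.notes
      .filter((note) => Date.parse(note.updatedAt) > cutoff)
      .map((note) => ({
        // The four fixed fields sit at the top level...
        source: this.name,
        source_item_id: note.id,
        type: "note",
        timestamp: note.updatedAt,
        // ...and all content goes in data.
        data: { title: note.title, body: note.body },
      }));
  }

  async executeAction(actionType: string, actionData: Record<string, unknown>): Promise<ActionResult> {
    if (actionType !== "create_note") {
      return { success: false, message: `Unsupported action: ${actionType}` };
    }
    this.notes.push({
      id: String(this.notes.length + 1),
      title: String(actionData["title"] ?? ""),
      body: String(actionData["body"] ?? ""),
      updatedAt: new Date().toISOString(),
    });
    return { success: true };
  }
}
```

Returning a failed `ActionResult` for unknown action types, rather than throwing, is a design choice of this sketch only.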
On a pull request (`POST /app/v1/pull`), the Hub:

- Fetches live data from the connector using the configured boundary
- Loads enabled quick filters from the `filters` table for that source
- Applies filters: row predicates first (`time_after`, `from_include`, etc.), then field removal (`hide_field`)
- Logs the access to the audit log
- Returns the filtered data

```
Request → fetch data (live from source) → apply filters → audit log → response
```
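The ordering of that flow can be sketched as a small handler with its collaborators injected. `handlePull`, `PullDeps`, the audit entry fields, and the `{ ok, data }` response shape are hypothetical names chosen for this example; the real handler in `src/server/app-api.ts` may be wired differently.

```typescript
type DataRow = {
  source: string;
  source_item_id: string;
  type: string;
  timestamp: string;
  data: Record<string, unknown>;
};

// Hypothetical dependency bundle so the flow can be exercised without a real connector or DB.
type PullDeps = {
  fetchRows: (source: string) => Promise<DataRow[]>; // live fetch via the connector
  filterRows: (source: string, rows: DataRow[]) => DataRow[]; // enabled quick filters for the source
  audit: (entry: { source: string; purpose: string; returned: number }) => void;
};

async function handlePull(
  deps: PullDeps,
  source: string,
  purpose: string,
): Promise<{ ok: true; data: DataRow[] }> {
  const rows = await deps.fetchRows(source); // 1. fetch data live from the source
  const filtered = deps.filterRows(source, rows); // 2. apply filters
  deps.audit({ source, purpose, returned: filtered.length }); // 3. write the audit log
  return { ok: true, data: filtered }; // 4. respond with filtered rows only
}
```

Auditing after filtering means the log entry in this sketch records how many rows actually left the Hub, not how many were fetched.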
An MCP tool call follows this path:

- Agent calls the MCP tool via stdio
- The MCP server builds an HTTP request body from the tool arguments
- Calls `POST /app/v1/pull` on the local HTTP server
- Returns the JSON response as MCP text content

```
Agent → MCP stdio → fetch(hubUrl/app/v1/pull) → HTTP server → connector → filters → response
```
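A tool handler along these lines might look like the following sketch. The `callPull` name, the request body layout (`source` plus the raw tool arguments), and the error text are assumptions; only the `/app/v1/pull` path and the MCP text-content result shape come from the description above. The fetch function is passed in so the sketch can run without a Hub.

```typescript
// Minimal fetch-like signature, narrow enough to fake in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// MCP tool results carry a list of content items; here a single text item.
type McpTextResult = { content: { type: "text"; text: string }[] };

// Hypothetical handler: turns tool arguments into a pull request and wraps the reply.
async function callPull(
  hubUrl: string,
  source: string,
  args: Record<string, unknown>,
  fetchImpl: FetchLike,
): Promise<McpTextResult> {
  const response = await fetchImpl(`${hubUrl}/app/v1/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, ...args }), // assumed body layout
  });
  if (!response.ok) {
    return { content: [{ type: "text", text: `Hub returned HTTP ${response.status}` }] };
  }
  const payload = await response.json();
  return { content: [{ type: "text", text: JSON.stringify(payload, null, 2) }] };
}
```

On Node.js 22 the global `fetch` can be passed as `fetchImpl`, with `hubUrl` taken from the Hub configuration.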
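When an agent proposes an outbound action (`POST /propose`), the Hub handles it in stages:

- Inserts the action into the `staging` table with status `pending`
- Logs the proposal to the audit log
- Owner reviews in the GUI and approves/rejects
- On approval, the connector's `executeAction` is called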
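The staged-action lifecycle can be sketched with an in-memory stand-in for the `staging` table. The `StagingQueue` class, the `approved` and `rejected` status names, the audit strings, and the `ActionResult` shape are assumptions; only the `pending` status and the approval-then-`executeAction` order come from the steps above.

```typescript
type ActionResult = { success: boolean; message?: string }; // assumed shape

type StagedAction = {
  id: number;
  source: string;
  actionType: string;
  actionData: Record<string, unknown>;
  status: "pending" | "approved" | "rejected";
};

// Same signature as SourceConnector.executeAction.
type Executor = (actionType: string, actionData: Record<string, unknown>) => Promise<ActionResult>;

// In-memory stand-in for the staging table plus a simplified audit trail.
class StagingQueue {
  private rows: StagedAction[] = [];
  readonly auditTrail: string[] = [];

  propose(source: string, actionType: string, actionData: Record<string, unknown>): StagedAction {
    const row: StagedAction = {
      id: this.rows.length + 1,
      source,
      actionType,
      actionData,
      status: "pending", // nothing is sent until the owner decides
    };
    this.rows.push(row);
    this.auditTrail.push(`proposed:${row.id}`);
    return row;
  }

  // Owner decision: only an approval triggers the connector's executeAction.
  async review(id: number, approve: boolean, execute: Executor): Promise<ActionResult | null> {
    const row = this.rows.find((r) => r.id === id && r.status === "pending");
    if (!row) throw new Error(`No pending action with id ${id}`);
    row.status = approve ? "approved" : "rejected";
    this.auditTrail.push(`${row.status}:${row.id}`);
    return approve ? execute(row.actionType, row.actionData) : null;
  }
}
```

Looking up only `pending` rows in `review` keeps an action from being executed twice in this sketch.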
Tests are co-located with source files (*.test.ts) for unit tests, and in tests/e2e/ for integration tests.
The e2e tests use an in-memory SQLite database and a mock Gmail connector (defined in tests/e2e/helpers.ts) so they run without external services.
To run a specific test file:
```bash
npx vitest run tests/e2e/gmail-recent-readonly.test.ts
```

To run just the MCP server tests:

```bash
npx vitest run src/mcp/server.test.ts
```

The skill in `packages/personaldatahub/` is a standalone package with its own `tsconfig.json` and test suite. It wraps the PersonalDataHub API endpoints as OpenClaw tools.
To work on it:
```bash
cd packages/personaldatahub
pnpm test
```

The skill has no dependency on the main PersonalDataHub source; it only talks to the Hub over HTTP.
The MCP server (src/mcp/server.ts) provides a stdio transport for MCP-compatible agents. It:
- Reads `~/.pdh/config.json` for the hub URL
- Health-checks the running HTTP server
- Queries `GET /app/v1/sources` to discover connected sources
- Registers source-specific tools dynamically (only for connected sources)
To test the MCP server manually:
```bash
npx pdh mcp
# Logs registered tools to stderr, then listens on stdio for MCP protocol messages
```

To add tools for a new source, add a `registerXxxTools()` function in `src/mcp/server.ts` and call it conditionally based on the source's connection status.
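The conditional registration pattern can be sketched as below. `ToolRegistry` is a deliberately tiny hypothetical interface, not the `@modelcontextprotocol/sdk` API, and the tool names, the `SourceStatus` shape, and `registerConnectedTools` are placeholders for illustration.

```typescript
// Hypothetical minimal registry; the real server registers tools through the MCP SDK.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;
type ToolRegistry = { tool(name: string, handler: ToolHandler): void };

// Assumed shape of one entry returned by GET /app/v1/sources.
type SourceStatus = { name: string; connected: boolean };

function registerGmailTools(registry: ToolRegistry): void {
  registry.tool("read_emails", async (args) => ({ source: "gmail", args })); // placeholder tool
}

function registerGithubTools(registry: ToolRegistry): void {
  registry.tool("search_github_issues", async (args) => ({ source: "github", args })); // placeholder tool
}

// One registrar per source name; a new source adds one entry here.
const registrars: Record<string, (registry: ToolRegistry) => void> = {
  gmail: registerGmailTools,
  github: registerGithubTools,
};

// Registers tools only for sources that are connected and have a registrar.
function registerConnectedTools(registry: ToolRegistry, sources: SourceStatus[]): string[] {
  const registered: string[] = [];
  for (const source of sources) {
    const register = registrars[source.name];
    if (source.connected && register) {
      register(registry);
      registered.push(source.name);
    }
  }
  return registered;
}
```

Keeping the registrars in a lookup table means an unknown or disconnected source is skipped without special-casing.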