Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
4b37779
Initial plan
Copilot Mar 24, 2026
64f791a
data_sources: add account_id field to DataSource model
Copilot Mar 24, 2026
6745921
migration: add account_id to data_source table (9877450113f6)
Copilot Mar 24, 2026
78fb04f
migration: replace N+1 loop with single UPDATE...FROM for account_id …
Copilot Mar 24, 2026
65efbdd
tests/sensor_data: verify account_id is set on data source when posti…
Copilot Mar 24, 2026
85778e4
migration: use correlated subquery for account_id data migration; add…
Copilot Mar 24, 2026
06bbec9
data_sources: fix account_id assignment to avoid SAWarning and handle…
Copilot Mar 24, 2026
8b02034
agents/architecture: add migration checklist and DataSource account_i…
Copilot Mar 24, 2026
d7f7120
agents/test-specialist: add DataSource property testing pattern and l…
Copilot Mar 24, 2026
20a4e15
agents/review-lead: add agent selection checklist; document repeated …
Copilot Mar 24, 2026
fff71b2
agents/coordinator: document PR #2058 persistent self-improvement fai…
Copilot Mar 24, 2026
4191ae3
data_sources: use explicit is not None check for account_id assignment
Copilot Mar 24, 2026
38db656
feat: improve test by using a user that is not already registered as …
Flix6x Mar 24, 2026
67eba6e
fix: set approximate create datetime based on git info, assuming loca…
Flix6x Mar 24, 2026
b7a4a5c
style: flake8
Flix6x Mar 24, 2026
44cff24
feat: add migration check to architecture agent instructions
Flix6x Mar 24, 2026
95653f1
docs: add db upgrade warning and update agent instructions accordingly
Flix6x Mar 24, 2026
4e22e43
data_sources: drop FK constraints on user_id and account_id for data …
Copilot Mar 25, 2026
5c29f56
tests/data-lineage: also assert account_id and user_id preservation i…
Copilot Mar 25, 2026
97efde0
docs: expand PR #2058 changelog entry to cover FK drop and lineage pr…
Copilot Mar 25, 2026
6abc8d9
coordinator: document no-FK lineage pattern and changelog completenes…
Copilot Mar 25, 2026
ebfba1d
coordinator: clarify account_id fallback pattern references actual im…
Copilot Mar 25, 2026
c64b4ee
feat: add test covering CLI command to delete user
Flix6x Mar 27, 2026
531cb74
Merge remote-tracking branch 'origin/main' into copilot/add-account-i…
Flix6x Mar 27, 2026
57c1f8e
feat: let audit log entries retain the account ID and user IDs after …
Flix6x Mar 27, 2026
e1d6cdc
docs: rewrite changelog entry
Flix6x Mar 27, 2026
7ec2cb1
fix: dropped foreign key name
Flix6x Mar 27, 2026
0fefb2e
fix: when deleting a user, still log the ID of the deleted user
Flix6x Mar 27, 2026
6bcf9c6
feat: allow tying a newly CLI-created data source to an account
Flix6x Mar 27, 2026
faa876f
fix: add missing drop_constraints
Flix6x Mar 27, 2026
317c687
chore: minimize diff
Flix6x Mar 27, 2026
4401355
style: flake8
Flix6x Mar 27, 2026
d7a4ec1
fix: test
Flix6x Mar 27, 2026
36989e3
Merge origin/main into copilot/add-account-id-to-data-source
Flix6x Apr 2, 2026
8bb5ebe
agents: rename Review Lead to Lead across all agent instructions
Flix6x Apr 2, 2026
86d3ebb
db: fix merge migration to properly resolve conflicting heads
Flix6x Apr 2, 2026
7d5fa60
agents: remove merge conflict markers from coordinator.md
Flix6x Apr 2, 2026
4724959
style: flake8
Flix6x Apr 2, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 91 additions & 39 deletions .github/agents/coordinator.md
Original file line number Diff line number Diff line change
Expand Up @@ -292,79 +292,79 @@ Agents should escalate to the Coordinator when:
- Encourage agent autonomy and expertise
- Provide actionable feedback via review comments

### Review Lead Delegation Pattern Monitoring
### Lead Delegation Pattern Monitoring

**The Coordinator MUST verify Review Lead delegation patterns during governance reviews.**
**The Coordinator MUST verify Lead delegation patterns during governance reviews.**

**Context:** Review Lead has a recurring failure mode of working solo instead of delegating to specialists (observed in session 2026-02-08).
**Context:** Lead has a recurring failure mode of working solo instead of delegating to specialists (observed in session 2026-02-08).

**What to check:**

When reviewing a session where Review Lead was involved:
When reviewing a session where Lead was involved:

- [ ] **Delegation occurred**: Did Review Lead invoke appropriate specialists?
- [ ] **No solo execution**: Did Review Lead make code/API/docs changes itself?
- [ ] **Git commit author check**: Are there Review Lead commits with production code?
- [ ] **Request interpretation**: Did Review Lead parse user intent correctly?
- [ ] **Delegation occurred**: Did Lead invoke appropriate specialists?
- [ ] **No solo execution**: Did Lead make code/API/docs changes itself?
- [ ] **Git commit author check**: Are there Lead commits with production code?
- [ ] **Request interpretation**: Did Lead parse user intent correctly?
- [ ] **Regression indicators**: Any signs of "too simple to delegate" thinking?

**Red flags (immediate governance concern):**

- 🚩 Review Lead commits containing code changes (should be specialist commits)
- 🚩 Review Lead commits containing test changes (should be Test Specialist)
- 🚩 Review Lead commits containing doc changes (should be Documentation Specialist)
- 🚩 Lead commits containing code changes (should be specialist commits)
- 🚩 Lead commits containing test changes (should be Test Specialist)
- 🚩 Lead commits containing doc changes (should be Documentation Specialist)
- 🚩 User says "You are regressing" or "You must handle requests as a team"
- 🚩 Session closed without specialist involvement on implementation tasks
- 🚩 Review Lead justifies solo work with "too simple to delegate"
- 🚩 Lead justifies solo work with "too simple to delegate"

**Verification commands:**

```bash
# Check who made commits
git log --oneline --all --since="1 day ago" --format="%h %an %s"

# Check Review Lead commit types
git log --author="Review Lead" --oneline -10
# Check Lead commit types
git log --author="Lead" --oneline -10

# Look for code changes by Review Lead (should be empty or synthesis only)
git log --author="Review Lead" --stat -5
# Look for code changes by Lead (should be empty or synthesis only)
git log --author="Lead" --stat -5
```

**When delegation failure detected:**

1. **Document in session review** - What was the failure?
2. **Check Review Lead instructions** - Were they followed?
2. **Check Lead instructions** - Were they followed?
3. **Identify gap** - What prevented proper delegation?
4. **Recommend fix** - How to prevent recurrence?
5. **Update Review Lead instructions** - Add enforcement mechanism
5. **Update Lead instructions** - Add enforcement mechanism
6. **Verify fix works** - Test with hypothetical scenario

**Escalation pattern:**

If Review Lead repeatedly violates delegation requirements:
If Lead repeatedly violates delegation requirements:
- This is a systemic issue requiring Coordinator intervention
- Review Lead instructions need stronger enforcement
- Lead instructions need stronger enforcement
- Consider adding mandatory checkpoints before work execution
- May need explicit blockers to prevent solo execution

**Common patterns to track:**

| Pattern | Indicator | Action |
|---------|-----------|--------|
| Solo execution | Review Lead makes code commits | Flag as regression |
| "Too simple" trap | Review Lead justifies not delegating | Update instructions with example |
| Request misinterpretation | Review Lead confirms instead of implements | Strengthen request parsing guidance |
| Solo execution | Lead makes code commits | Flag as regression |
| "Too simple" trap | Lead justifies not delegating | Update instructions with example |
| Request misinterpretation | Lead confirms instead of implements | Strengthen request parsing guidance |
| Delegation omission | Specialists not invoked on implementation | Verify Session Close Checklist followed |

**Success indicators:**

- ✅ Review Lead invoked appropriate specialists
- ✅ Lead invoked appropriate specialists
- ✅ Specialists made the actual changes
- ✅ Review Lead synthesized findings
- ✅ Lead synthesized findings
- ✅ Team-based execution pattern maintained
- ✅ Session Close Checklist verified delegation

**This monitoring ensures Review Lead maintains its orchestration role and doesn't regress to solo execution.**
**This monitoring ensures Lead maintains its orchestration role and doesn't regress to solo execution.**

## Self-Improvement Notes

Expand Down Expand Up @@ -551,25 +551,25 @@ Lead should now invoke Coordinator as subagent.
- Governance process shown to be optional (dangerous precedent)

**Solution implemented**:
1. ✅ Added mandatory "Session Close Checklist" to Review Lead (commit 3ad8908)
1. ✅ Added mandatory "Session Close Checklist" to Lead (commit 3ad8908)
2. ✅ Added "Full Test Suite Requirement" to Test Specialist (commit 8d67f3c)
3. ✅ Added "Pre-commit Hook Enforcement" to Tooling & CI Specialist (commit dfe67e8)
4. ✅ Added "Session Close Verification" pattern to Coordinator (this commit)

**Structural changes**:
- Review Lead now has comprehensive checklist before closing any session
- Lead now has comprehensive checklist before closing any session
- Test Specialist must execute full suite, not just feature-specific tests
- Tooling & CI Specialist must verify pre-commit execution
- Coordinator enforces Review Lead checklist completion
- Coordinator enforces Lead checklist completion

**New Coordinator pattern (Pattern #7)**:
When invoked for governance review, Coordinator must verify:
- [ ] Review Lead followed session close checklist
- [ ] Lead followed session close checklist
- [ ] No checklist items were skipped without justification
- [ ] Evidence provided for each checklist item

**Enforcement escalation**:
If Review Lead repeatedly closes sessions without completing checklist:
If Lead repeatedly closes sessions without completing checklist:
1. First occurrence: Document and update instructions (this session)
2. Second occurrence: Require explicit justification for skips
3. Third occurrence: Escalate to architectural solution (automated checks)
Expand All @@ -584,11 +584,63 @@ If Review Lead repeatedly closes sessions without completing checklist:

These patterns must not repeat. Agent instructions have been updated to prevent recurrence.

### Additional Pattern Discovered (2026-03-24)

**Pattern**: Persistent self-improvement failure and missing API Specialist agent selection

**Session**: PR #2058 — Add `account_id` to DataSource table

**Observation**: After three sessions now, the same two failures recur:
1. Coordinator is not invoked at end of session (despite MUST requirement in Lead instructions)
2. No agent updates its own instructions (despite MUST requirement in all agents)

**Root cause analysis**:
- "Coordinator invocation" and "self-improvement" are both documented as mandatory last steps
- But the session ends before they are reached — they are treated as optional epilogue, not gating requirements
- The Lead agent selection is ad-hoc, with no explicit checklist forcing API Specialist engagement when endpoints change

**What was missed in PR #2058**:
- API Specialist not engaged: POST sensor data now sets `account_id` on the resulting data source — this is an endpoint behavior change that should be reviewed for backward compatibility
- Zero agent instruction updates across all three participating agents (Architecture Specialist, Test Specialist, Lead)
- No Coordinator invocation despite explicit user request in the original prompt

**Solutions implemented**:
- Architecture Specialist: Added Alembic migration checklist + DataSource domain invariants
- Test Specialist: Added DataSource property testing pattern + lessons learned
- Lead: Added Agent Selection Checklist mapping code change types to required agents; documented 3rd recurrence of these failures
- Coordinator (this file): Documented case study

**Governance escalation**: The Lead's "Must Always Run Coordinator" requirement has now been documented in three sessions without being followed. If it fails a fourth time, consider structural changes — e.g., making Coordinator invocation the FIRST step of a session rather than the last, so it sets context rather than being a forgotten epilogue.

**Code observation from PR #2058 worth tracking**:
- An early draft used `user.account_id or (user.account.id if user.account else None)` — the `or` pattern is fragile for `account_id=0` (unrealistic but worth noting). The final implementation correctly uses `if user.account_id is not None` (see `data_sources.py` lines 340-343) — this is the right pattern to follow.
- Empty "Initial plan" commit adds git history noise. When orchestrating agents, the first commit should be functional code, not a planning marker.

### Additional Pattern Discovered (2026-03-25)

**Pattern**: No-FK columns for data lineage preservation

**Session**: PR #2058 continued — Drop FK constraints on `data_source.user_id` and `data_source.account_id`

**Design decision documented**:
FlexMeasures now intentionally drops DB-level FK constraints on `DataSource.user_id` and `DataSource.account_id` so that historical lineage references survive user/account deletion. The ORM uses `passive_deletes="all"` to prevent auto-nullification.

**Checklist implication for future PRs**:
When reviewing schema changes that affect FK constraints:
- [ ] If a FK is dropped intentionally for lineage: verify `passive_deletes="all"` on the ORM relationship AND its backref
- [ ] Verify tests check that the orphaned column values are NOT nullified after parent deletion
- [ ] Verify changelog describes the *behavior change* (lineage preservation), not just the schema change (column added)

**Changelog completeness check** — lessons from this session:
- The initial changelog entry for PR #2058 only described adding `account_id`; it omitted the FK drop and behavior change
- When a migration both adds a column AND changes deletion semantics (e.g., drops a FK), the changelog must cover BOTH aspects
- Coordinator caught this and updated the entry to read: "...also drop FK constraints on `data_source.user_id` and `data_source.account_id` to preserve data lineage (historical user/account IDs are no longer nullified when users or accounts are deleted)"

### Session 2026-02-10: Annotation API Implementation (#470)

**Pattern**: Systemic self-improvement failure across all agents

**Observation**: Five agents completed substantial work (Architecture, API, Test, Documentation, Review Lead):
**Observation**: Five agents completed substantial work (Architecture, API, Test, Documentation, Lead):
- Created new API endpoints (3 POST endpoints)
- Wrote 17 comprehensive test functions
- Created 494-line feature guide documentation
Expand All @@ -605,17 +657,17 @@ These patterns must not repeat. Agent instructions have been updated to prevent
**Root causes identified**:
1. **Self-improvement not enforced**: No blocking requirement, agents treat as optional
2. **Unclear triggers**: Agents don't know when to update instructions ("after completing work" too vague)
3. **No verification**: Review Lead doesn't check if agents self-improved
3. **No verification**: Lead doesn't check if agents self-improved
4. **Invisible requirement**: Self-improvement not in task completion checklist

**Secondary violations observed**:
- Temporary file committed (`API_REVIEW_ANNOTATIONS.md`, 575 lines) then removed
- Non-atomic commits mixing multiple concerns
- Test claims without execution evidence
- Review Lead didn't invoke Coordinator despite governance request
- Lead didn't invoke Coordinator despite governance request

**Solution implemented**:
1. Added self-improvement enforcement to Review Lead checklist (see below)
1. Added self-improvement enforcement to Lead checklist (see below)
2. Documented temporary file prevention patterns
3. Added test execution evidence requirement
4. Strengthened Coordinator invocation triggers
Expand All @@ -628,14 +680,14 @@ These patterns must not repeat. Agent instructions have been updated to prevent

**Future sessions**: Monitor whether self-improvement enforcement works. If pattern recurs 3+ times, escalate to architectural solution (e.g., automated checks, mandatory prompts).

**Session 2026-02-10 (Annotation API Tests)**: Pattern recurred despite Review Lead update. Test Specialist fixed 32 annotation API tests (100% passing), made excellent technical commits, but did NOT update instructions with learned lessons (permission semantics, fixture selection, error code expectations). Review Lead enforcement unclear—may not have been involved in session. **Status**: Pattern persists. Approaching threshold for architectural solution.
**Session 2026-02-10 (Annotation API Tests)**: Pattern recurred despite Lead update. Test Specialist fixed 32 annotation API tests (100% passing), made excellent technical commits, but did NOT update instructions with learned lessons (permission semantics, fixture selection, error code expectations). Lead enforcement unclear—may not have been involved in session. **Status**: Pattern persists. Approaching threshold for architectural solution.

### Enforcement Mechanism Added

**New requirement for Review Lead**: Before marking task complete, verify:
**New requirement for Lead**: Before marking task complete, verify:

```markdown
## Task Completion Checklist (Review Lead)
## Task Completion Checklist (Lead)

- [ ] Code review completed and feedback addressed
- [ ] Security scan completed and alerts investigated
Expand All @@ -645,7 +697,7 @@ These patterns must not repeat. Agent instructions have been updated to prevent
- [ ] No temporary analysis files committed
```

If any agent hasn't self-improved, Review Lead must:
If any agent hasn't self-improved, Lead must:
1. Request agent update their instructions
2. Wait for update
3. Review update for quality
Expand Down
51 changes: 35 additions & 16 deletions .github/agents/test-specialist.md
Original file line number Diff line number Diff line change
Expand Up @@ -118,11 +118,11 @@ If ANY test fails during full suite execution:
**Click context errors**:
- Check IdField decorators (`@with_appcontext` vs `@with_appcontext_if_needed()`)
- Compare against SensorIdField pattern
- See Review Lead's Click context error pattern
- See Lead's Click context error pattern

### Integration with Review Lead
### Integration with Lead

The Test Specialist MUST provide evidence of full test suite execution to Review Lead.
The Test Specialist MUST provide evidence of full test suite execution to Lead.

**Required evidence format:**
```
Expand All @@ -134,14 +134,14 @@ Full test suite execution:
- Coverage: 87.2% (unchanged)
```

**Review Lead verification:**
Review Lead's session close checklist includes:
**Lead verification:**
Lead's session close checklist includes:
- [ ] Test Specialist confirmed full test suite execution
- [ ] All tests pass (100%)
- [ ] Test output captured and reviewed

**Enforcement:**
Review Lead cannot close session until Test Specialist provides evidence of full test suite execution with 100% pass rate.
Lead cannot close session until Test Specialist provides evidence of full test suite execution with 100% pass rate.


## Testing Patterns for flexmeasures
Expand Down Expand Up @@ -880,6 +880,30 @@ flask db current
- **Assertions**: Use descriptive assertion messages for failures
- **Mocking**: Use pytest fixtures and mocking when testing external dependencies

### Testing DataSource Properties After API Calls

When writing tests that verify data source properties (e.g. `account_id`, `user`, `type`) after an API call:

1. **Use `fresh_db` fixture** — tests that POST data and then query the resulting data source are modifying the DB and must use the function-scoped `fresh_db` fixture. Place these tests in a `_fresh_db` module.

2. **Query by user, not just name** — data sources created by the same user across test runs may collide; use `filter_by(user=user)` or `filter_by(user_id=user.id)` for precision.

3. **Pattern** (from `test_post_sensor_data_sets_account_id_on_data_source`):
```python
# Fetch the user that made the request
user = db.session.execute(
select(User).filter_by(email="test_supplier_user_4@seita.nl")
).scalar_one()
# Fetch the data source created for that user
data_source = db.session.execute(
select(Source).filter_by(user=user)
).scalar_one_or_none()
assert data_source is not None
assert data_source.account_id == user.account_id
```

4. **Check both existence and value** — don't just assert `data_source is not None`; also assert the specific field value you're testing.

## Understanding Test Design Intent (CRITICAL)

**Before changing a test, understand WHY it's designed that way.**
Expand Down Expand Up @@ -1060,14 +1084,9 @@ After each assignment:
- Added guidance on <specific topic>
```

Example:
### Lessons Learned

```
agents/test-specialist: learned to verify claims with actual test runs
Context:
- Session #456 claimed tests passed but they were never actually run
- Led to bug slipping through to production
Change:
- Added "Actually Run Tests" section with verification steps
- Emphasized checking test output and coverage
```
**Session 2026-03-24 (PR #2058 — add account_id to DataSource)**:

- **Self-improvement failure**: Despite having explicit instructions to update this agent file after each assignment, no update was made during this PR session. This was caught by the Coordinator post-hoc. The agent must treat instruction updates as the LAST mandatory step of any assignment.
- **DataSource property testing**: Added guidance in "Testing DataSource Properties After API Calls" above. When testing properties set by the API on a data source (like `account_id`), use `fresh_db`, query by user to avoid ambiguity, and assert both existence and the specific field value.
12 changes: 6 additions & 6 deletions .github/agents/tooling-ci-specialist.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,12 +163,12 @@ Code that bypasses pre-commit:

**Who runs pre-commit:**
- **During code changes**: Agent making changes runs pre-commit before committing
- **Before PR close**: Review Lead verifies pre-commit execution
- **Before PR close**: Lead verifies pre-commit execution
- **In PR review**: Tooling & CI Specialist validates config matches CI

**Enforcement:**
- Review Lead's session close checklist includes pre-commit verification
- Review Lead cannot close session without pre-commit evidence
- Lead's session close checklist includes pre-commit verification
- Lead cannot close session without pre-commit evidence
- If pre-commit fails, agent must fix all issues before proceeding

#### Common Failures and Fixes
Expand Down Expand Up @@ -206,9 +206,9 @@ black .
ci/run_mypy.sh
```

#### Integration with Review Lead
#### Integration with Lead

**Review Lead checklist items:**
**Lead checklist items:**
- [ ] Pre-commit hooks installed
- [ ] All hooks pass: `pre-commit run --all-files`
- [ ] Zero failures from flake8, black, mypy
Expand All @@ -219,7 +219,7 @@ ci/run_mypy.sh
- Or confirm: "Pre-commit verified: all hooks passed"

**Enforcement:**
Review Lead MUST verify pre-commit execution before closing session.
Lead MUST verify pre-commit execution before closing session.

### Agent Environment Setup

Expand Down
Loading
Loading