feat(api): Add drift detection, remediation API and modernize response format #6360

Open
jamengual wants to merge 18 commits into runatlantis:main from jamengual:feat/api-drift-detection-v2

@jamengual (Contributor)

Summary

This PR adds drift detection and remediation API endpoints to Atlantis, along with a modernized API response framework and drift webhook notifications. It supersedes #6098 with a clean rebase on current main, DCO sign-off on all commits, and fixes for all review feedback.

Key Features

  • Drift Detection API (POST /api/drift/detect) — triggers Terraform plan runs to detect infrastructure drift, supports auto-discovery via pre_workflow_hooks, stores results
  • Drift Status API (GET /api/drift/status) — returns cached drift detection results (read-only)
  • Drift Remediation API (POST /api/drift/remediate) — executes plan or apply to remediate drift, with dry-run mode
  • Remediation Results APIs (GET /api/drift/remediate, GET /api/drift/remediate/{id}) — list and retrieve remediation results
  • Drift Webhook Notifications — Slack messages and HTTP POST payloads when drift is detected, configured via event: drift in webhooks YAML
  • Modernized API Response Framework — standardized JSON envelope format (APIResponder, APIMiddleware) with structured errors, request tracing, consistent HTTP status codes
  • --enable-drift-detection server flag (default: false) — gates all drift functionality
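As a rough illustration of how a client might call the detection endpoint, the Go sketch below builds a POST /api/drift/detect request. The X-Atlantis-Token header is the one the existing Atlantis API uses for authentication. The request body fields (repository, ref, type) and the helper name newDetectRequest are assumptions made for this example and are not taken from the PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// detectRequest is a hypothetical request body; the real field names are
// defined by the PR's DriftDetectionRequest model and may differ.
type detectRequest struct {
	Repository string `json:"repository"`
	Ref        string `json:"ref"`
	Type       string `json:"type"`
}

// newDetectRequest builds (but does not send) a drift detection request.
func newDetectRequest(baseURL, token string, body detectRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, fmt.Errorf("encoding request: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/drift/detect", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Atlantis-Token", token)
	return req, nil
}

func main() {
	req, err := newDetectRequest("http://localhost:4141", "my-api-secret",
		detectRequest{Repository: "org/repo", Ref: "main", Type: "Github"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

Sending the request with http.DefaultClient.Do(req) only succeeds against a server started with --enable-drift-detection; per the PR description, the routes otherwise answer with 503.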

Changes from #6098

  • Rebased cleanly on current main (no merge conflicts)
  • DCO sign-off on all commits
  • Fixed driftKey to include git ref (prevents different branches from overwriting drift data)
  • Updated tests to use gomock.NewController for NewMockLocker (compatible with the upstream pegomock-to-gomock migration in #6253, "feat: migrate locking mocks from pegomock to uber-go/mock (Phase 1)")
  • Fixed gofmt formatting in drift webhook struct
  • Added drift detection cross-references to FAQ, locking, and usage docs

Previously Addressed (in #6098 commits)

  • Fixed race condition in getAPIMiddleware() using sync.Once
  • Used crypto/subtle.ConstantTimeCompare for API token validation
  • InternalError returns generic message to clients, logs details server-side
  • Complete() resets counters before recomputing to prevent double-counting
  • VCS type validation accepts all supported providers (BitbucketCloud, BitbucketServer, AzureDevops, Gitea)
  • Drift routes registered unconditionally (graceful 503 when feature disabled)
  • omitempty used instead of invalid omitzero JSON tag
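The double-counting fix in Complete() can be pictured with a small sketch. The types below are hypothetical stand-ins for the PR's remediation result model; only the idea of resetting the counters before recomputing them comes from the list above.

```go
package main

import "fmt"

// projectOutcome and remediationResult are hypothetical stand-ins for the
// PR's remediation models.
type projectOutcome struct {
	Name    string
	Success bool
}

type remediationResult struct {
	Projects     []projectOutcome
	SuccessCount int
	FailureCount int
}

// Complete recomputes the summary counters from scratch. Without the reset,
// a second call would add every project to the totals again.
func (r *remediationResult) Complete() {
	r.SuccessCount, r.FailureCount = 0, 0
	for _, p := range r.Projects {
		if p.Success {
			r.SuccessCount++
		} else {
			r.FailureCount++
		}
	}
}

func main() {
	r := &remediationResult{Projects: []projectOutcome{{"network", true}, {"database", false}}}
	r.Complete()
	r.Complete() // calling twice leaves the totals unchanged
	fmt.Println(r.SuccessCount, r.FailureCount) // prints "1 1"
}
```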

Architecture

  • All drift functionality is feature-gated behind --enable-drift-detection
  • Drift webhooks are a parallel system that reuses existing Slack/HTTP clients but is independent from the apply webhook path
  • Non-PR workflows (PR: 0) skip PR-specific requirements (approved, mergeable) via ctx.API && ctx.Pull.Num == 0
  • In-memory storage with thread-safe sync.RWMutex (can be replaced with DB backend)
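A minimal sketch of the storage idea, assuming a map guarded by sync.RWMutex and a key that includes the git ref (the driftKey fix mentioned earlier). The type names, fields, and key layout are illustrative and not the PR's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// driftRecord is a hypothetical, trimmed-down drift result.
type driftRecord struct {
	Repository string
	Project    string
	Workspace  string
	Ref        string
	HasDrift   bool
}

// driftKey includes the ref so results for different branches of the same
// project are stored side by side instead of overwriting each other.
func driftKey(repo, project, workspace, ref string) string {
	return repo + "|" + project + "|" + workspace + "|" + ref
}

// memoryStore is an in-memory store that is safe for concurrent use.
type memoryStore struct {
	mu      sync.RWMutex
	records map[string]driftRecord
}

func newMemoryStore() *memoryStore {
	return &memoryStore{records: make(map[string]driftRecord)}
}

// Put takes the write lock because it mutates the map.
func (s *memoryStore) Put(r driftRecord) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records[driftKey(r.Repository, r.Project, r.Workspace, r.Ref)] = r
}

// Get takes only the read lock, so concurrent readers do not block each other.
func (s *memoryStore) Get(repo, project, workspace, ref string) (driftRecord, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	r, ok := s.records[driftKey(repo, project, workspace, ref)]
	return r, ok
}

func main() {
	store := newMemoryStore()
	store.Put(driftRecord{"org/repo", "network", "default", "main", true})
	store.Put(driftRecord{"org/repo", "network", "default", "feature-x", false})

	onMain, _ := store.Get("org/repo", "network", "default", "main")
	onFeature, _ := store.Get("org/repo", "network", "default", "feature-x")
	fmt.Println(onMain.HasDrift, onFeature.HasDrift) // prints "true false"
}
```

Keeping the store behind a small Put/Get surface like this is what would make a later swap to a database backend straightforward.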

Files Changed

  • New: server/core/drift/ (storage, parser, remediation), server/events/models/drift.go, server/events/models/remediation.go, server/events/webhooks/drift*.go, server/controllers/api_response.go, server/controllers/api_types.go, docs/adr/0002-api-enhancement-drift-detection.md
  • Modified: server/controllers/api_controller.go, server/server.go, cmd/server.go, server/user_config.go, webhook infrastructure, API documentation

Test plan

  • All drift storage tests pass (including new ref-based key differentiation test)
  • All API controller drift/remediation tests pass
  • go build ./... succeeds
  • go vet passes on all affected packages
  • golangci-lint passes (only pre-existing issues in unrelated files)
  • CI pipeline (tests, linting, image build, DCO)
  • Manual testing with --enable-drift-detection flag

Closes #6098 (superseded by this PR)

Related: jamengual#6, jamengual#7

jamengual and others added 16 commits April 3, 2026 14:03
This commit implements Phases 1 and 2 of the API enhancement plan:

Phase 1 - Fix JSON error serialization:
- Add MarshalJSON to ProjectResult for proper error serialization
- Add MarshalJSON to Result for proper error serialization
- Maintain backwards-compatible flat JSON structure (no wrapper objects)
- Add comprehensive tests for JSON serialization

Phase 2 - Enable non-PR API workflows (drift detection):
- Add API field to ProjectContext for tracking API-triggered commands
- Modify CommandRequirementHandler to skip PR-specific requirements
  (approved, mergeable) when API=true and Pull.Num=0
- Propagate API flag from command.Context to ProjectContext
- Add tests for non-PR API requirement skipping

The error serialization fix addresses the issue where Go's error interface
serializes as {} by default. Custom MarshalJSON methods convert errors to
strings for proper JSON output while maintaining the existing flat API
response structure for backwards compatibility.
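The behaviour described here is easy to reproduce. In the sketch below, rawResult shows the default encoding of an error field and projectResult shows a custom MarshalJSON that emits the message string while keeping the flat structure. Both types are simplified, hypothetical versions of the real result structs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

var errPlanFailed = errors.New("plan failed")

// rawResult relies on the default encoder, which sees no exported fields
// on the error value and writes an empty object.
type rawResult struct {
	RepoRelDir string
	Error      error
}

// projectResult is a simplified, hypothetical version of the real type.
type projectResult struct {
	RepoRelDir string
	Error      error
}

// MarshalJSON keeps the same flat field layout but converts the error to
// its message string (or null when there is no error).
func (p projectResult) MarshalJSON() ([]byte, error) {
	var errMsg any
	if p.Error != nil {
		errMsg = p.Error.Error()
	}
	return json.Marshal(struct {
		RepoRelDir string
		Error      any
	}{p.RepoRelDir, errMsg})
}

// mustJSON is a small helper for the demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(rawResult{"infra", errPlanFailed}))     // {"RepoRelDir":"infra","Error":{}}
	fmt.Println(mustJSON(projectResult{"infra", errPlanFailed})) // {"RepoRelDir":"infra","Error":"plan failed"}
}
```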

The non-PR workflow support enables drift detection and other API workflows
that don't operate in a PR context. PR-specific requirements are skipped
when appropriate, while security-relevant requirements (policies_passed,
undiverged) are still enforced.
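One way to picture the skip rule is as a filter over the configured requirements. The condition (API request with pull number 0) and the requirement names come from the commit message; the function and variable names are hypothetical, and the real CommandRequirementHandler is structured differently.

```go
package main

import "fmt"

// prOnlyRequirements lists the requirements that only make sense when a
// pull request exists.
var prOnlyRequirements = map[string]bool{
	"approved":  true,
	"mergeable": true,
}

// requirementsToCheck returns the requirements that still apply. For
// API-triggered runs without a PR, PR-specific requirements are dropped and
// everything else (for example policies_passed, undiverged) is kept.
func requirementsToCheck(api bool, pullNum int, configured []string) []string {
	if !(api && pullNum == 0) {
		return configured
	}
	kept := make([]string, 0, len(configured))
	for _, req := range configured {
		if !prOnlyRequirements[req] {
			kept = append(kept, req)
		}
	}
	return kept
}

func main() {
	configured := []string{"approved", "policies_passed", "mergeable", "undiverged"}
	fmt.Println(requirementsToCheck(true, 0, configured))  // prints "[policies_passed undiverged]"
	fmt.Println(requirementsToCheck(true, 42, configured)) // all four requirements remain
}
```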

Refs: ADR-0002
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Update API documentation to reflect recent enhancements:

Error Serialization:
- Document that Error fields contain error message strings when errors
  occur (not empty objects)
- Add error response example showing proper error format
- Add Error Handling tip explaining Error/Failure field semantics

Non-PR Workflows (Drift Detection):
- Document that when PR is omitted or set to 0, Atlantis operates in
  non-PR mode
- PR-specific requirements (approved, mergeable) are automatically
  skipped in this mode
- Security requirements (policies_passed, undiverged) remain enforced
- Add drift detection request example without PR parameter

These documentation updates correspond to code changes in:
- server/events/command/project_result.go (error serialization)
- server/events/command/result.go (error serialization)
- server/events/command_requirement_handler.go (non-PR workflow)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Phase 3: Drift Detection
- Add DriftSummary and ProjectDrift models for tracking infrastructure drift
- Add drift parser to extract drift metrics from plan output
- Add drift storage interface with in-memory implementation
- Add GET /api/drift/status endpoint to query drift state

Phase 4: Drift Remediation
- Add RemediationRequest/Result models with status tracking
- Add RemediationService interface with in-memory implementation
- Add POST /api/drift/remediate endpoint with plan/apply actions
- Support filtering by projects, workspaces, and drift-only mode

Both endpoints include comprehensive tests and API documentation.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Add three new API endpoints for drift management:

- POST /api/drift/detect: Trigger drift detection for projects
- GET /api/drift/remediate/{id}: Get a specific remediation result
- GET /api/drift/remediate?repository=X: List remediation results

Changes:
- Add DriftDetectionRequest, DriftDetectionResult models to drift.go
- Add DetectDrift, GetRemediationResult, ListRemediationResults handlers
- Add 12 comprehensive tests for new endpoints
- Update API documentation with new endpoint descriptions

This commit can be reverted independently if Phase 4+5 need to be
separated into different PRs (Phase 3+4 are in previous commit).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

- Fix gofmt formatting in api_controller.go
- Use strings.CutPrefix instead of HasPrefix+TrimPrefix
- Use slices.Contains instead of manual loops in remediation.go
- Use omitzero instead of omitempty for time.Time field
- Use tagged switch statement for status checks
- Fix markdown list style and spacing in api-endpoints.md

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Adds workflow trigger and push condition for the feature branch so that
test images are published to ghcr.io/jamengual/atlantis:drift-detection

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

This reverts commit c6722af.

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

- Add API response envelope with structured error codes
- Create API DTOs (api_types.go) to separate API contract from internal models
- Add APIMiddleware and APIResponder for consistent error handling
- Refactor Plan, Apply, ListLocks endpoints to use new patterns
- Add request tracing via request_id in all responses
- Update tests to verify new API response format

All responses now follow pattern: {success, data, error, request_id, timestamp}
Error codes: VALIDATION_ERROR, UNAUTHORIZED, FORBIDDEN, INTERNAL_ERROR, SERVICE_UNAVAILABLE
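The envelope pattern above might look roughly like this in Go. The top-level field names (success, data, error, request_id, timestamp) and the error code come from the commit message; the shape of the error object (code plus message) and the constructor name are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// apiError is a hypothetical structured error; the PR may carry more fields.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// envelope wraps every response in the same outer structure.
type envelope struct {
	Success   bool      `json:"success"`
	Data      any       `json:"data,omitempty"`
	Error     *apiError `json:"error,omitempty"`
	RequestID string    `json:"request_id"`
	Timestamp time.Time `json:"timestamp"`
}

// newErrorEnvelope builds a failed response with a structured error.
func newErrorEnvelope(code, message, requestID string) envelope {
	return envelope{
		Success:   false,
		Error:     &apiError{Code: code, Message: message},
		RequestID: requestID,
		Timestamp: time.Now().UTC(),
	}
}

func main() {
	env := newErrorEnvelope("VALIDATION_ERROR", "repository is required", "req-123")
	out, err := json.MarshalIndent(env, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A client can branch on success and error.code without parsing free-form messages, and request_id gives operators a value to search for in server logs.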

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

- Add Response Format section explaining the envelope pattern
- Document error codes table (VALIDATION_ERROR, UNAUTHORIZED, etc.)
- Update all endpoint sample responses with envelope format:
  - POST /api/plan
  - POST /api/apply
  - GET /api/locks
  - POST /api/drift/detect
  - GET /api/drift/status
  - POST /api/drift/remediate
  - GET /api/drift/remediate
  - GET /api/drift/remediate/{id}
- Use snake_case for all JSON field names per API DTOs
- Add request_id and timestamp to all responses

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

The drift and remediation endpoint handlers were implemented but never
registered in the router, causing 404s for all drift API endpoints.

Changes:
- Register 5 missing routes in server.go (drift/status, drift/detect,
  drift/remediate GET/POST, drift/remediate/{id})
- Extract SetupRoutes() from Start() to enable route registration testing
- Add TestSetupRoutes_APIRoutesRegistered covering all 8 API routes
- Fix docs: move authenticated remediate GET endpoints to Main Endpoints
  section, add auth headers to curl examples, correct default limit value
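The difference between an unregistered route (404) and a registered but disabled one (503, as described for the feature flag in the PR summary) can be checked with a small route test. This sketch uses the standard library's http.ServeMux for brevity rather than the router Atlantis uses, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// driftHandler answers 503 when the feature is disabled and 200 otherwise.
// A real handler would do the actual work instead of returning 200.
func driftHandler(enabled bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.Error(w, "drift detection is disabled", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// newRouter registers the drift routes unconditionally.
func newRouter(enabled bool) *http.ServeMux {
	mux := http.NewServeMux()
	for _, path := range []string{"/api/drift/detect", "/api/drift/status", "/api/drift/remediate"} {
		mux.Handle(path, driftHandler(enabled))
	}
	return mux
}

// statusFor sends a request through the router and returns the status code.
func statusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(newRouter(false), "/api/drift/status")) // prints "503"
	fmt.Println(statusFor(newRouter(true), "/api/drift/status"))  // prints "200"
	fmt.Println(statusFor(newRouter(true), "/api/drift/unknown")) // prints "404"
}
```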

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Change the PUSH condition to also push images on pull_request events,
enabling PR-based image builds for testing. Images are tagged with the
PR number (e.g., pr-6-alpine). Cosign signing and attestation remain
gated to non-PR events.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

- Add --enable-drift-detection config flag to conditionally initialize
  DriftStorage and RemediationService on the APIController
- Fix race condition in getAPIMiddleware() using sync.Once
- Use crypto/subtle.ConstantTimeCompare for API token validation to
  prevent timing attacks
- Fix HTTP status from 400 to 503 when API is disabled (RequireAuth)
- Add defer Locker.UnlockByPull in remediation executor's ExecutePlan
  and ExecuteApply to prevent lock leaks
- Fix VCS type validation: replace "Bitbucket" with "BitbucketCloud"
  and "BitbucketServer" in both drift and remediation validators
- Stop leaking internal error details to API clients in InternalError()
- Return success:false envelope when Plan/Apply have errors (HTTP 500)
- Fix omitzero -> omitempty on RemediationResult.CompletedAt
- Reset SuccessCount/FailureCount in Complete() before recomputing
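Two of the fixes in this list, the sync.Once initialization and the constant-time token check, can be sketched together. The name getAPIMiddleware comes from the commit message; the struct layout and the authorized method are assumptions for illustration.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"sync"
)

// apiMiddleware is a hypothetical, trimmed-down middleware holding the secret.
type apiMiddleware struct {
	token string
}

// authorized compares tokens in constant time so the comparison itself does
// not reveal how many leading bytes matched.
func (m *apiMiddleware) authorized(provided string) bool {
	if m.token == "" {
		return false // no secret configured: reject everything
	}
	return subtle.ConstantTimeCompare([]byte(provided), []byte(m.token)) == 1
}

// apiController lazily creates its middleware exactly once.
type apiController struct {
	apiSecret  string
	once       sync.Once
	middleware *apiMiddleware
}

// getAPIMiddleware is safe to call from concurrent request handlers:
// sync.Once guarantees a single initialization and publishes the result.
func (c *apiController) getAPIMiddleware() *apiMiddleware {
	c.once.Do(func() {
		c.middleware = &apiMiddleware{token: c.apiSecret}
	})
	return c.middleware
}

func main() {
	c := &apiController{apiSecret: "s3cret"}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.getAPIMiddleware()
		}()
	}
	wg.Wait()
	fmt.Println(c.getAPIMiddleware().authorized("s3cret"), c.getAPIMiddleware().authorized("guess"))
}
```

An unsynchronized "if c.middleware == nil" check would let two concurrent requests both initialize the field, which is the kind of race the commit describes fixing.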

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Add a parallel DriftWebhookSender system that sends notifications to
Slack and HTTP endpoints when drift is detected. This reuses the
existing SlackClient and HttpClient infrastructure but is completely
independent of the apply webhook path (easy to remove if needed).

- DriftSender interface + DriftWebhookSender for fire-and-forget dispatch
- DriftSlackWebhook with drift-specific Slack attachment formatting
- DriftHttpWebhook that POSTs JSON drift payloads to HTTP endpoints
- PostDriftMessage added to SlackClient interface
- NewDriftWebhookSender factory filters configs to event: drift
- Wired into APIController.DetectDrift (sends when drift is found)
- DriftDetectionResult now includes unique detection ID (uuid)
- 18 new tests covering sender, factory, Slack, HTTP, and validation
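The filtering and fire-and-forget behaviour listed above might be sketched as follows. Apart from the "drift" event value, every type and function name here is a hypothetical simplification. The real DriftWebhookSender wraps Slack and HTTP clients, which this example replaces with plain functions.

```go
package main

import (
	"errors"
	"fmt"
)

var errWebhookDown = errors.New("webhook endpoint unreachable")

// webhookConfig mirrors one entry of the webhooks YAML in simplified form.
type webhookConfig struct {
	Event string
	Kind  string
	URL   string
}

// driftResult is a minimal, hypothetical notification payload.
type driftResult struct {
	Repository        string
	ProjectsWithDrift int
}

type driftSender interface {
	Send(driftResult) error
}

// funcSender adapts a plain function to the driftSender interface.
type funcSender func(driftResult) error

func (f funcSender) Send(r driftResult) error { return f(r) }

// driftConfigs keeps only the entries configured with event: drift.
func driftConfigs(all []webhookConfig) []webhookConfig {
	var kept []webhookConfig
	for _, c := range all {
		if c.Event == "drift" {
			kept = append(kept, c)
		}
	}
	return kept
}

// multiSender notifies every sender; failures are logged and never returned,
// so a broken webhook cannot fail the drift detection request.
type multiSender struct {
	senders []driftSender
	logf    func(format string, args ...any)
}

func (m multiSender) Send(r driftResult) {
	for _, s := range m.senders {
		if err := s.Send(r); err != nil {
			m.logf("drift webhook failed: %v\n", err)
		}
	}
}

func main() {
	configs := driftConfigs([]webhookConfig{
		{Event: "apply", Kind: "slack"},
		{Event: "drift", Kind: "http", URL: "https://example.invalid/hook"},
	})
	fmt.Println(len(configs)) // prints "1"

	m := multiSender{
		senders: []driftSender{
			funcSender(func(driftResult) error { return errWebhookDown }),
			funcSender(func(r driftResult) error { fmt.Println("notified:", r.Repository); return nil }),
		},
		logf: func(format string, args ...any) { fmt.Printf(format, args...) },
	}
	m.Send(driftResult{Repository: "org/repo", ProjectsWithDrift: 2})
}
```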

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

- Add "Drift detection webhooks" section to webhook notifications doc
  with Slack message format, HTTP JSON payload schema, and config examples
- Update /api/drift/detect docs to mention webhook notifications
- Add id and detection_id fields to drift detect sample response
- Update --enable-drift-detection flag docs to mention webhook support

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

…matting

- Include git ref in driftKey to prevent different branches from
  overwriting each other's drift data
- Update api_controller_test.go to use gomock.NewController for
  NewMockLocker (required after upstream pegomock-to-gomock migration)
- Fix gofmt formatting in drift.go webhook struct alignment
- Add test for ref-based key differentiation in drift storage

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

… docs

- Add drift detection FAQ entries (how to detect drift, scheduled detection)
- Document locking behavior during drift detection/remediation in locking.md
- Add API-based workflows section to using-atlantis.md with links to
  drift detection, status, and remediation endpoints

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 3, 2026 21:23
@dosubot (bot) added the feature (New functionality/enhancement) and go (Pull requests that update Go code) labels on Apr 3, 2026
@github-actions (bot) added the docs (Documentation) and github-actions labels on Apr 3, 2026

Copilot AI left a comment

Pull request overview

This PR adds comprehensive drift detection and remediation API endpoints to Atlantis, along with a modernized API response framework with structured error handling and request tracing. The feature is gated behind a --enable-drift-detection server flag and includes webhook notifications, in-memory storage, and comprehensive test coverage.

Changes:

  • API Modernization: Introduces a standardized envelope response format with structured errors, request IDs, and timestamps for all API endpoints
  • Drift Detection APIs: New endpoints for detecting infrastructure drift (POST /api/drift/detect), querying cached results (GET /api/drift/status), and executing remediation (POST /api/drift/remediate)
  • Error Serialization Fix: Custom JSON marshaling to properly serialize Go error types as strings instead of empty objects, maintaining backward compatibility with existing API response structure
  • Non-PR Workflow Support: Enables API operations without PR context by skipping PR-specific requirements while maintaining other security checks
  • Drift Webhooks: Sends Slack and HTTP notifications when drift is detected
  • Storage & Remediation: In-memory storage for drift results and a service to execute plan/apply remediation operations

Reviewed changes

Copilot reviewed 46 out of 46 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| server/user_config.go | Adds EnableDriftDetection configuration flag |
| server/server.go | Initializes drift storage, remediation service, and webhook sender when enabled; extracts route registration to SetupRoutes() |
| server/controllers/api_response.go | New file implementing modernized response envelope with structured errors and authentication middleware |
| server/controllers/api_types.go | New API DTO types for drift, remediation, and plan/apply results with conversion functions |
| server/controllers/api_controller.go | Implements drift detection, remediation, and status endpoints; modernizes existing plan/apply endpoints |
| server/events/command/project_result.go | Adds custom JSON marshaling to fix error serialization while preserving backward-compatible flat structure |
| server/events/command/result.go | Adds custom JSON marshaling for top-level result errors |
| server/events/command/project_context.go | Adds API field to support non-PR workflows |
| server/events/command_requirement_handler.go | Skips PR-specific requirements for non-PR API calls |
| server/events/models/drift.go | New drift models and validation for detection requests |
| server/events/models/remediation.go | New remediation models with status tracking and validation |
| server/events/webhooks/drift.go | New drift webhook sender with Slack and HTTP support |
| server/core/drift/storage.go | In-memory implementation of drift storage with repository/project keying |
| server/core/drift/remediation.go | In-memory remediation service executing plan/apply operations |
| docs/adr/0002-api-enhancement-drift-detection.md | Comprehensive architecture decision record documenting design rationale |
| Documentation files | Updated API endpoints, server configuration, FAQ, webhooks, and locking docs |

- Fix goimports ordering in cmd/server.go, server/server.go, user_config.go
- Remove unused apiReportError and respond methods from APIController
- Use omitzero instead of omitempty for time.Time (Go 1.24+ modernize lint)
- Convert if/else chain to tagged switch for remediation status codes
- Fix markdown lint: add blank line before list in using-atlantis.md

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>

The push-on-PR change was only needed for fork image testing
and should not be included in the upstream PR.

Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
Labels: docs (Documentation), feature (New functionality/enhancement), github-actions, go (Pull requests that update Go code)
