feat(api): Add drift detection, remediation API and modernize response format #6360
Open
jamengual wants to merge 18 commits into runatlantis:main from
This commit implements Phases 1 and 2 of the API enhancement plan:
Phase 1 - Fix JSON error serialization:
- Add MarshalJSON to ProjectResult for proper error serialization
- Add MarshalJSON to Result for proper error serialization
- Maintain backwards-compatible flat JSON structure (no wrapper objects)
- Add comprehensive tests for JSON serialization
Phase 2 - Enable non-PR API workflows (drift detection):
- Add API field to ProjectContext for tracking API-triggered commands
- Modify CommandRequirementHandler to skip PR-specific requirements
(approved, mergeable) when API=true and Pull.Num=0
- Propagate API flag from command.Context to ProjectContext
- Add tests for non-PR API requirement skipping
The error serialization fix addresses the issue where Go's error interface
serializes as {} by default. Custom MarshalJSON methods convert errors to
strings for proper JSON output while maintaining the existing flat API
response structure for backwards compatibility.
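A minimal sketch of the serialization problem and the fix described above. The `projectResult` type and the helper names are hypothetical stand-ins, not the real Atlantis types; only the technique (a custom `MarshalJSON` that turns the error into a string while keeping the flat layout) comes from the commit message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// projectResult is a hypothetical, trimmed-down stand-in for a result type
// that carries a Go error. It is not the real Atlantis ProjectResult.
type projectResult struct {
	ProjectName string
	Error       error
	Failure     string
}

// MarshalJSON keeps the flat field layout but emits Error as a string
// (or null when there is no error) instead of the default {}.
func (p projectResult) MarshalJSON() ([]byte, error) {
	var errMsg *string
	if p.Error != nil {
		s := p.Error.Error()
		errMsg = &s
	}
	return json.Marshal(struct {
		ProjectName string
		Error       *string
		Failure     string
	}{p.ProjectName, errMsg, p.Failure})
}

// encode returns the JSON form of v, or the marshal error text.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

// naiveJSON shows the default behaviour: the error value has no exported
// fields, so encoding/json writes an empty object.
func naiveJSON(msg string) string {
	return encode(struct{ Error error }{errors.New(msg)})
}

// fixedJSON shows the same error going through the custom marshaler.
func fixedJSON(name, msg string) string {
	return encode(projectResult{ProjectName: name, Error: errors.New(msg)})
}

func main() {
	fmt.Println(naiveJSON("plan failed"))
	fmt.Println(fixedJSON("vpc", "plan failed"))
}
```

Using a pointer for the error string lets a nil error serialize as `null` rather than as an empty string, which keeps "no error" distinguishable from "empty message"; whether the PR makes the same choice is not stated.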
The non-PR workflow support enables drift detection and other API workflows
that don't operate in a PR context. PR-specific requirements are skipped
when appropriate, while security-relevant requirements (policies_passed,
undiverged) are still enforced.
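The skip rule above can be sketched as follows. The `projectContext` type, the `prOnly` set and `requirementsToCheck` are hypothetical simplifications; the real CommandRequirementHandler is structured differently. Only the condition (API flag set and pull number 0) and the requirement names are taken from the commit message.

```go
package main

import "fmt"

// projectContext is a hypothetical, simplified context.
type projectContext struct {
	API     bool     // set when the command was triggered through the API
	PullNum int      // 0 means there is no pull request
	Reqs    []string // configured requirements for the command
}

// prOnly lists requirements that only make sense for a pull request.
var prOnly = map[string]bool{"approved": true, "mergeable": true}

// requirementsToCheck drops PR-only requirements for non-PR API calls
// and leaves every other requirement in place.
func requirementsToCheck(ctx projectContext) []string {
	nonPR := ctx.API && ctx.PullNum == 0
	var out []string
	for _, r := range ctx.Reqs {
		if nonPR && prOnly[r] {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	reqs := []string{"approved", "policies_passed", "mergeable", "undiverged"}
	fmt.Println(requirementsToCheck(projectContext{API: true, PullNum: 0, Reqs: reqs}))
	fmt.Println(requirementsToCheck(projectContext{API: true, PullNum: 42, Reqs: reqs}))
}
```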
Refs: ADR-0002
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
Update API documentation to reflect recent enhancements:
Error Serialization:
- Document that Error fields contain error message strings when errors occur (not empty objects)
- Add error response example showing proper error format
- Add Error Handling tip explaining Error/Failure field semantics
Non-PR Workflows (Drift Detection):
- Document that when PR is omitted or set to 0, Atlantis operates in non-PR mode
- PR-specific requirements (approved, mergeable) are automatically skipped in this mode
- Security requirements (policies_passed, undiverged) remain enforced
- Add drift detection request example without PR parameter
These documentation updates correspond to code changes in:
- server/events/command/project_result.go (error serialization)
- server/events/command/result.go (error serialization)
- server/events/command_requirement_handler.go (non-PR workflow)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
Phase 3: Drift Detection
- Add DriftSummary and ProjectDrift models for tracking infrastructure drift
- Add drift parser to extract drift metrics from plan output
- Add drift storage interface with in-memory implementation
- Add GET /api/drift/status endpoint to query drift state
Phase 4: Drift Remediation
- Add RemediationRequest/Result models with status tracking
- Add RemediationService interface with in-memory implementation
- Add POST /api/drift/remediate endpoint with plan/apply actions
- Support filtering by projects, workspaces, and drift-only mode
Both endpoints include comprehensive tests and API documentation.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
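One way a drift parser could pull metrics out of plan output is to read the counts from Terraform's human-readable summary line. This is a sketch under that assumption; `driftCounts` and `parsePlan` are hypothetical names, and the PR's actual parser and DriftSummary fields may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// driftCounts is a hypothetical summary type, not the PR's DriftSummary.
type driftCounts struct {
	Add, Change, Destroy int
}

// HasDrift reports whether the plan proposes any change at all.
func (d driftCounts) HasDrift() bool {
	return d.Add+d.Change+d.Destroy > 0
}

// planSummary matches the counts in Terraform's summary line,
// e.g. "Plan: 1 to add, 2 to change, 0 to destroy."
var planSummary = regexp.MustCompile(`(\d+) to add, (\d+) to change, (\d+) to destroy`)

// parsePlan extracts the counts from plan output. When no summary line is
// present (for example a "No changes." plan) it returns zero counts.
func parsePlan(output string) driftCounts {
	m := planSummary.FindStringSubmatch(output)
	if m == nil {
		return driftCounts{}
	}
	// The pattern only admits digits, so conversion errors are ignored here.
	add, _ := strconv.Atoi(m[1])
	change, _ := strconv.Atoi(m[2])
	destroy, _ := strconv.Atoi(m[3])
	return driftCounts{Add: add, Change: change, Destroy: destroy}
}

func main() {
	out := "Terraform will perform the following actions:\n\nPlan: 1 to add, 2 to change, 0 to destroy.\n"
	d := parsePlan(out)
	fmt.Println(d.Add, d.Change, d.Destroy, d.HasDrift())
}
```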
Add three new API endpoints for drift management:
- POST /api/drift/detect: Trigger drift detection for projects
- GET /api/drift/remediate/{id}: Get a specific remediation result
- GET /api/drift/remediate?repository=X: List remediation results
Changes:
- Add DriftDetectionRequest, DriftDetectionResult models to drift.go
- Add DetectDrift, GetRemediationResult, ListRemediationResults handlers
- Add 12 comprehensive tests for new endpoints
- Update API documentation with new endpoint descriptions
This commit can be reverted independently if Phase 4+5 need to be
separated into different PRs (Phase 3+4 are in previous commit).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
- Fix gofmt formatting in api_controller.go
- Use strings.CutPrefix instead of HasPrefix+TrimPrefix
- Use slices.Contains instead of manual loops in remediation.go
- Use omitzero instead of omitempty for time.Time field
- Use tagged switch statement for status checks
- Fix markdown list style and spacing in api-endpoints.md
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
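For readers unfamiliar with the idioms named in this commit, here is a small illustration of `strings.CutPrefix`, `slices.Contains` and a tagged switch. The function names, the path prefix usage and the status strings are assumptions chosen for illustration; the commit does not show where each idiom is applied.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// remediationID extracts the trailing ID from a request path in one step.
// strings.CutPrefix replaces the HasPrefix + TrimPrefix pair.
func remediationID(path string) (string, bool) {
	return strings.CutPrefix(path, "/api/drift/remediate/")
}

// wantsProject reports whether name passes the filter. An empty filter
// selects everything. slices.Contains replaces a manual loop.
func wantsProject(filter []string, name string) bool {
	return len(filter) == 0 || slices.Contains(filter, name)
}

// isFinished uses a tagged switch (switching on the value itself) rather
// than an if/else chain. The status names are hypothetical.
func isFinished(status string) bool {
	switch status {
	case "success", "failed":
		return true
	case "pending", "running":
		return false
	default:
		return false
	}
}

func main() {
	id, ok := remediationID("/api/drift/remediate/abc123")
	fmt.Println(id, ok)
	fmt.Println(wantsProject([]string{"vpc", "dns"}, "dns"))
	fmt.Println(isFinished("running"))
}
```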
Adds workflow trigger and push condition for the feature branch so that test images are published to ghcr.io/jamengual/atlantis:drift-detection
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
This reverts commit c6722af.
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
- Add API response envelope with structured error codes
- Create API DTOs (api_types.go) to separate API contract from internal models
- Add APIMiddleware and APIResponder for consistent error handling
- Refactor Plan, Apply, ListLocks endpoints to use new patterns
- Add request tracing via request_id in all responses
- Update tests to verify new API response format
All responses now follow pattern: {success, data, error, request_id, timestamp}
Error codes: VALIDATION_ERROR, UNAUTHORIZED, FORBIDDEN, INTERNAL_ERROR, SERVICE_UNAVAILABLE
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
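The envelope pattern {success, data, error, request_id, timestamp} might look roughly like this in Go. The top-level field names come from the commit message; the shape of the error object (`code`, `message`) and all type and function names are assumptions, since the commit does not spell them out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// apiError is an assumed shape for the structured error.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// envelope wraps every response in the same top-level structure.
type envelope struct {
	Success   bool      `json:"success"`
	Data      any       `json:"data,omitempty"`
	Error     *apiError `json:"error,omitempty"`
	RequestID string    `json:"request_id"`
	Timestamp time.Time `json:"timestamp"`
}

// success builds an envelope for a successful call.
func success(data any, requestID string, at time.Time) envelope {
	return envelope{Success: true, Data: data, RequestID: requestID, Timestamp: at}
}

// failure builds an envelope carrying a structured error code.
func failure(code, msg, requestID string, at time.Time) envelope {
	return envelope{Error: &apiError{Code: code, Message: msg}, RequestID: requestID, Timestamp: at}
}

// render returns the JSON form of the envelope, or the marshal error text.
func render(e envelope) string {
	b, err := json.Marshal(e)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

// sampleError renders a fixed validation error so the output is repeatable.
func sampleError() string {
	at := time.Date(2025, 1, 2, 3, 4, 5, 0, time.UTC)
	return render(failure("VALIDATION_ERROR", "repository is required", "req-1", at))
}

// sampleSuccess renders a fixed success response.
func sampleSuccess() string {
	at := time.Date(2025, 1, 2, 3, 4, 5, 0, time.UTC)
	return render(success(map[string]int{"locks": 0}, "req-2", at))
}

func main() {
	fmt.Println(sampleSuccess())
	fmt.Println(sampleError())
}
```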
- Add Response Format section explaining the envelope pattern
- Document error codes table (VALIDATION_ERROR, UNAUTHORIZED, etc.)
- Update all endpoint sample responses with envelope format:
  - POST /api/plan
  - POST /api/apply
  - GET /api/locks
  - POST /api/drift/detect
  - GET /api/drift/status
  - POST /api/drift/remediate
  - GET /api/drift/remediate
  - GET /api/drift/remediate/{id}
- Use snake_case for all JSON field names per API DTOs
- Add request_id and timestamp to all responses
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
The drift and remediation endpoint handlers were implemented but never
registered in the router, causing 404s for all drift API endpoints.
Changes:
- Register 5 missing routes in server.go (drift/status, drift/detect,
drift/remediate GET/POST, drift/remediate/{id})
- Extract SetupRoutes() from Start() to enable route registration testing
- Add TestSetupRoutes_APIRoutesRegistered covering all 8 API routes
- Fix docs: move authenticated remediate GET endpoints to Main Endpoints
section, add auth headers to curl examples, correct default limit value
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
Change the PUSH condition to also push images on pull_request events, enabling PR-based image builds for testing. Images are tagged with the PR number (e.g., pr-6-alpine). Cosign signing and attestation remain gated to non-PR events.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
- Add --enable-drift-detection config flag to conditionally initialize DriftStorage and RemediationService on the APIController
- Fix race condition in getAPIMiddleware() using sync.Once
- Use crypto/subtle.ConstantTimeCompare for API token validation to prevent timing attacks
- Fix HTTP status from 400 to 503 when API is disabled (RequireAuth)
- Add defer Locker.UnlockByPull in remediation executor's ExecutePlan and ExecuteApply to prevent lock leaks
- Fix VCS type validation: replace "Bitbucket" with "BitbucketCloud" and "BitbucketServer" in both drift and remediation validators
- Stop leaking internal error details to API clients in InternalError()
- Return success:false envelope when Plan/Apply have errors (HTTP 500)
- Fix omitzero -> omitempty on RemediationResult.CompletedAt
- Reset SuccessCount/FailureCount in Complete() before recomputing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
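A sketch of constant-time token validation with `crypto/subtle.ConstantTimeCompare`, together with the 503 response for a disabled API. Two things here are assumptions rather than facts from the commit: that "API disabled" means no token is configured, and the SHA-256 hashing step, which is added in this sketch so both inputs have equal length (ConstantTimeCompare returns immediately when lengths differ). The function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"net/http"
)

// tokenMatches compares the presented token with the configured one without
// an early exit on the first differing byte. Hashing first (an addition in
// this sketch) gives both inputs the same length.
func tokenMatches(presented, configured string) bool {
	if configured == "" {
		return false
	}
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(configured))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

// authStatus maps the outcome to an HTTP status. Treating an empty
// configured token as "API disabled" is an assumption of this sketch.
func authStatus(presented, configured string) int {
	switch {
	case configured == "":
		return http.StatusServiceUnavailable
	case !tokenMatches(presented, configured):
		return http.StatusUnauthorized
	default:
		return http.StatusOK
	}
}

func main() {
	fmt.Println(authStatus("s3cret", ""))
	fmt.Println(authStatus("wrong", "s3cret"))
	fmt.Println(authStatus("s3cret", "s3cret"))
}
```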
Add a parallel DriftWebhookSender system that sends notifications to Slack and HTTP endpoints when drift is detected. This reuses the existing SlackClient and HttpClient infrastructure but is completely independent of the apply webhook path (easy to remove if needed).
- DriftSender interface + DriftWebhookSender for fire-and-forget dispatch
- DriftSlackWebhook with drift-specific Slack attachment formatting
- DriftHttpWebhook that POSTs JSON drift payloads to HTTP endpoints
- PostDriftMessage added to SlackClient interface
- NewDriftWebhookSender factory filters configs to event: drift
- Wired into APIController.DetectDrift (sends when drift is found)
- DriftDetectionResult now includes unique detection ID (uuid)
- 18 new tests covering sender, factory, Slack, HTTP, and validation
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
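The factory filtering and the fire-and-forget dispatch described in this commit could be sketched as below. All types and names (`webhookConfig`, `driftSender`, `driftConfigs`, `dispatch`, `recorder`) are hypothetical, and "fire-and-forget" is interpreted here as "failures are logged and never returned to the caller"; the PR's implementation may instead dispatch asynchronously.

```go
package main

import (
	"errors"
	"fmt"
)

// webhookConfig is a hypothetical, simplified webhook entry.
type webhookConfig struct {
	Event  string // e.g. "apply" or "drift"
	Kind   string // e.g. "slack" or "http"
	Target string // channel name or URL
}

// driftSender is a hypothetical interface for anything that can deliver
// a drift notification.
type driftSender interface {
	Send(summary string) error
}

// driftConfigs keeps only the entries whose event is "drift", so the
// apply webhook path is left untouched.
func driftConfigs(all []webhookConfig) []webhookConfig {
	var out []webhookConfig
	for _, c := range all {
		if c.Event == "drift" {
			out = append(out, c)
		}
	}
	return out
}

// dispatch notifies every hook. Failures are logged and never propagate,
// so a broken webhook cannot fail the detection request.
func dispatch(hooks []driftSender, summary string, logf func(format string, args ...any)) {
	for _, h := range hooks {
		if err := h.Send(summary); err != nil {
			logf("drift webhook failed: %v", err)
		}
	}
}

// recorder is a test double that stores what it was sent and can fail on demand.
type recorder struct {
	fail bool
	got  []string
}

func (r *recorder) Send(summary string) error {
	if r.fail {
		return errors.New("unreachable")
	}
	r.got = append(r.got, summary)
	return nil
}

func main() {
	cfgs := []webhookConfig{
		{Event: "apply", Kind: "slack", Target: "#deploys"},
		{Event: "drift", Kind: "http", Target: "https://example.com/hook"},
	}
	fmt.Println(len(driftConfigs(cfgs)))

	ok, bad := &recorder{}, &recorder{fail: true}
	dispatch([]driftSender{bad, ok}, "2 projects drifted", func(f string, a ...any) {
		fmt.Printf(f+"\n", a...)
	})
	fmt.Println(ok.got)
}
```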
- Add "Drift detection webhooks" section to webhook notifications doc with Slack message format, HTTP JSON payload schema, and config examples
- Update /api/drift/detect docs to mention webhook notifications
- Add id and detection_id fields to drift detect sample response
- Update --enable-drift-detection flag docs to mention webhook support
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
…matting
- Include git ref in driftKey to prevent different branches from overwriting each other's drift data
- Update api_controller_test.go to use gomock.NewController for NewMockLocker (required after upstream pegomock-to-gomock migration)
- Fix gofmt formatting in drift.go webhook struct alignment
- Add test for ref-based key differentiation in drift storage
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
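Why the git ref belongs in the storage key can be shown with a small in-memory store guarded by `sync.RWMutex`. The struct-typed `driftKey`, the `driftRecord` type and the store methods are hypothetical; the PR's driftKey may well be a formatted string, and its storage interface is not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// driftKey identifies one stored result. Including Ref keeps results for
// different branches of the same project apart. (Hypothetical shape.)
type driftKey struct {
	Repository, Ref, Project, Workspace string
}

// driftRecord is a hypothetical stored result.
type driftRecord struct {
	HasDrift bool
	Summary  string
}

// memoryStore is a minimal in-memory store safe for concurrent use.
type memoryStore struct {
	mu      sync.RWMutex
	records map[driftKey]driftRecord
}

func newMemoryStore() *memoryStore {
	return &memoryStore{records: make(map[driftKey]driftRecord)}
}

// Put stores or overwrites the record for k under the write lock.
func (s *memoryStore) Put(k driftKey, r driftRecord) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records[k] = r
}

// Get reads the record for k under the read lock.
func (s *memoryStore) Get(k driftKey) (driftRecord, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	r, ok := s.records[k]
	return r, ok
}

func main() {
	store := newMemoryStore()
	onMain := driftKey{"org/infra", "main", "vpc", "default"}
	onFeature := driftKey{"org/infra", "feature-x", "vpc", "default"}

	store.Put(onMain, driftRecord{HasDrift: false, Summary: "no changes"})
	store.Put(onFeature, driftRecord{HasDrift: true, Summary: "1 to change"})

	a, _ := store.Get(onMain)
	b, _ := store.Get(onFeature)
	fmt.Println(a.Summary, "|", b.Summary)
}
```

Without the ref in the key, the second Put would overwrite the first, which is the overwrite problem this commit fixes.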
… docs
- Add drift detection FAQ entries (how to detect drift, scheduled detection)
- Document locking behavior during drift detection/remediation in locking.md
- Add API-based workflows section to using-atlantis.md with links to drift detection, status, and remediation endpoints
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
Pull request overview
This PR adds comprehensive drift detection and remediation API endpoints to Atlantis, along with a modernized API response framework with structured error handling and request tracing. The feature is gated behind a --enable-drift-detection server flag and includes webhook notifications, in-memory storage, and comprehensive test coverage.
Changes:
- API Modernization: Introduces a standardized envelope response format with structured errors, request IDs, and timestamps for all API endpoints
- Drift Detection APIs: New endpoints for detecting infrastructure drift (POST /api/drift/detect), querying cached results (GET /api/drift/status), and executing remediation (POST /api/drift/remediate)
- Error Serialization Fix: Custom JSON marshaling to properly serialize Go error types as strings instead of empty objects, maintaining backward compatibility with existing API response structure
- Non-PR Workflow Support: Enables API operations without PR context by skipping PR-specific requirements while maintaining other security checks
- Drift Webhooks: Sends Slack and HTTP notifications when drift is detected
- Storage & Remediation: In-memory storage for drift results and a service to execute plan/apply remediation operations
Reviewed changes
Copilot reviewed 46 out of 46 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| server/user_config.go | Adds EnableDriftDetection configuration flag |
| server/server.go | Initializes drift storage, remediation service, and webhook sender when enabled; extracts route registration to SetupRoutes() |
| server/controllers/api_response.go | New file implementing modernized response envelope with structured errors and authentication middleware |
| server/controllers/api_types.go | New API DTO types for drift, remediation, and plan/apply results with conversion functions |
| server/controllers/api_controller.go | Implements drift detection, remediation, and status endpoints; modernizes existing plan/apply endpoints |
| server/events/command/project_result.go | Adds custom JSON marshaling to fix error serialization while preserving backward-compatible flat structure |
| server/events/command/result.go | Adds custom JSON marshaling for top-level result errors |
| server/events/command/project_context.go | Adds API field to support non-PR workflows |
| server/events/command_requirement_handler.go | Skips PR-specific requirements for non-PR API calls |
| server/events/models/drift.go | New drift models and validation for detection requests |
| server/events/models/remediation.go | New remediation models with status tracking and validation |
| server/events/webhooks/drift.go | New drift webhook sender with Slack and HTTP support |
| server/core/drift/storage.go | In-memory implementation of drift storage with repository/project keying |
| server/core/drift/remediation.go | In-memory remediation service executing plan/apply operations |
| docs/adr/0002-api-enhancement-drift-detection.md | Comprehensive architecture decision record documenting design rationale |
| Documentation files | Updated API endpoints, server configuration, FAQ, webhooks, and locking docs |
- Fix goimports ordering in cmd/server.go, server/server.go, user_config.go
- Remove unused apiReportError and respond methods from APIController
- Use omitzero instead of omitempty for time.Time (Go 1.24+ modernize lint)
- Convert if/else chain to tagged switch for remediation status codes
- Fix markdown lint: add blank line before list in using-atlantis.md
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
The push-on-PR change was only needed for fork image testing and should not be included in the upstream PR.
Signed-off-by: PePe Amengual <2208324+jamengual@users.noreply.github.com>
Summary
This PR adds drift detection and remediation API endpoints to Atlantis, along with a modernized API response framework and drift webhook notifications. It supersedes #6098 with a clean rebase on current main, DCO sign-off on all commits, and fixes for all review feedback.
Key Features
- POST /api/drift/detect: triggers Terraform plan runs to detect infrastructure drift, supports auto-discovery via pre_workflow_hooks, stores results
- GET /api/drift/status: returns cached drift detection results (read-only)
- POST /api/drift/remediate: executes plan or apply to remediate drift, with dry-run mode
- GET /api/drift/remediate, GET /api/drift/remediate/{id}: list and retrieve remediation results
- Drift webhooks: configured with event: drift in webhooks YAML
- API response framework (APIResponder, APIMiddleware) with structured errors, request tracing, consistent HTTP status codes
- --enable-drift-detection server flag (default: false): gates all drift functionality
Changes from #6098
- Rebased on main (no merge conflicts)
- driftKey now includes the git ref (prevents different branches from overwriting drift data)
- Tests use gomock.NewController for NewMockLocker (compatible with the upstream pegomock-to-gomock migration in #6253, "feat: migrate locking mocks from pegomock to uber-go/mock (Phase 1)")
- Fixed gofmt formatting in drift webhook struct
Previously Addressed (in #6098 commits)
- Race condition in getAPIMiddleware() fixed using sync.Once
- crypto/subtle.ConstantTimeCompare for API token validation
- InternalError returns generic message to clients, logs details server-side
- Complete() resets counters before recomputing to prevent double-counting
- omitempty used instead of invalid omitzero JSON tag
Architecture
- All drift functionality is gated behind --enable-drift-detection
- Non-PR API requests (PR: 0) skip PR-specific requirements (approved, mergeable) via ctx.API && ctx.Pull.Num == 0
- In-memory storage guarded by sync.RWMutex (can be replaced with DB backend)
Files Changed
- New: server/core/drift/ (storage, parser, remediation), server/events/models/drift.go, server/events/models/remediation.go, server/events/webhooks/drift*.go, server/controllers/api_response.go, server/controllers/api_types.go, docs/adr/0002-api-enhancement-drift-detection.md
- Modified: server/controllers/api_controller.go, server/server.go, cmd/server.go, server/user_config.go, webhook infrastructure, API documentation
Test plan
- go build ./... succeeds
- go vet passes on all affected packages
- golangci-lint passes (only pre-existing issues in unrelated files)
- --enable-drift-detection flag
Closes #6098 (superseded by this PR)
Related: jamengual#6, jamengual#7