[Preview] Amazon SageMaker Unified Studio CI/CD CLI is currently in preview and is subject to change. Commands, configuration formats, and APIs may evolve based on customer feedback. We recommend evaluating this tool in non-production environments during preview. For feedback and bug reports, please open an issue at https://github.com/aws/CICD-for-SageMakerUnifiedStudio/issues
[IAM Domains Only] This CLI currently supports SMUS domains using IAM-based authentication only. Support for IAM Identity Center (IdC)-based domains is coming soon.
Automate deployment of data applications across SageMaker Unified Studio environments
Deploy Airflow DAGs, Jupyter notebooks, and ML workflows from development to production with confidence. Built for data scientists, data engineers, ML engineers, and GenAI app developers working with DevOps teams.
Works with your deployment strategy: Whether you use git branches (branch-based), versioned artifacts (bundle-based), git tags (tag-based), or direct deployment - this CLI supports your workflow. Define your application once, deploy it your way.
✅ AWS Abstraction Layer - CLI encapsulates all AWS analytics, ML, and SMUS complexity - DevOps teams never call AWS APIs directly
✅ Separation of Concerns - Data teams define WHAT to deploy (manifest.yaml), DevOps teams define HOW and WHEN (CI/CD workflows)
✅ Generic CI/CD Workflows - Same workflow works for Glue, SageMaker, Bedrock, QuickSight, or any AWS service combination
✅ Deploy with Confidence - Automated testing and validation before production
✅ Multi-Environment Management - Test → Prod with environment-specific configuration
✅ Infrastructure as Code - Version-controlled application manifests and reproducible deployments
✅ Event-Driven Workflows - Trigger workflows automatically via EventBridge on deployment
Install from source:
git clone https://github.com/aws/CICD-for-SageMakerUnifiedStudio.git
cd CICD-for-SageMakerUnifiedStudio
pip install -e .
Deploy your first application:
# Validate configuration
smus-cli describe --manifest manifest.yaml --connect
# Create deployment bundle (optional)
smus-cli bundle --manifest manifest.yaml
# Deploy to test environment
smus-cli deploy --targets test --manifest manifest.yaml
# Run validation tests
smus-cli test --manifest manifest.yaml --targets test
See it in action: Live GitHub Actions Example
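The quick-start commands above can also be assembled in a script. The following is a minimal sketch, not part of the CLI: the function name `first_deploy_commands` is hypothetical, and it only builds argument lists from the flags shown above without running anything.

```python
import shlex


def first_deploy_commands(manifest="manifest.yaml", target="test"):
    """Return the quick-start commands as argv lists, in the order shown above.

    Hypothetical helper for illustration; only flags from the quick start are used.
    """
    return [
        ["smus-cli", "describe", "--manifest", manifest, "--connect"],
        ["smus-cli", "bundle", "--manifest", manifest],  # optional step
        ["smus-cli", "deploy", "--targets", target, "--manifest", manifest],
        ["smus-cli", "test", "--manifest", manifest, "--targets", target],
    ]


if __name__ == "__main__":
    # Print each command as a shell-safe string instead of executing it.
    for argv in first_deploy_commands():
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the CLI is installed and AWS credentials are configured.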
You focus on: Your application - what to deploy, where to deploy, and how it runs
You define: Application manifest (manifest.yaml) with your code, workflows, and configurations
You don't need to know: CI/CD pipelines, GitHub Actions, deployment automation
→ Quick Start Guide - Deploy your first application in 10 minutes
Includes examples for:
- Data Engineering (Glue, Notebooks, Athena)
- ML Workflows (SageMaker, Notebooks)
- GenAI Applications (Bedrock, Notebooks)
You focus on: CI/CD best practices, security, compliance, and deployment automation
You define: Workflow templates that enforce testing, approvals, and promotion policies
You don't need to know: Application-specific details, AWS services used, DataZone APIs, SMUS project structures, or business logic
→ Admin Guide - Configure infrastructure and pipelines in 15 minutes
→ GitHub Workflow Templates - Generic, reusable workflow templates for automated deployment
The CLI is your abstraction layer: You just call smus-cli deploy - the CLI handles all AWS service interactions (DataZone, Glue, Athena, SageMaker, MWAA, S3, IAM, etc.). Your workflows stay simple and generic.
📊 Analytics & BI
- Glue ETL jobs and crawlers
- Athena queries
- QuickSight dashboards
- EMR jobs (future)
- Redshift queries (future)
🤖 Machine Learning
- SageMaker training jobs
- ML models and endpoints
- MLflow experiments
- Feature Store (future)
- Batch transforms (future)
🧠 Generative AI
- Bedrock agents
- Knowledge bases
- Foundation model configurations (future)
📓 Code & Workflows
- Jupyter notebooks
- Python scripts
- Airflow DAGs (MWAA and Amazon MWAA Serverless)
- Lambda functions (future)
💾 Data & Storage
- S3 data files
- Git repositories
- Data catalogs (future)
Deploy workflows using these AWS services through Airflow YAML syntax:
Amazon Athena • AWS Glue • Amazon EMR • Amazon Redshift • Amazon QuickSight • Lake Formation
SageMaker Training • SageMaker Pipelines • Feature Store • Model Registry • Batch Transform
Amazon Bedrock • Bedrock Agents • Bedrock Knowledge Bases • Guardrails
S3 • Lambda • Step Functions • DynamoDB • RDS • SNS/SQS • Batch
See complete list: Airflow AWS Operators Reference
The Problem: Traditional deployment approaches force DevOps teams to learn AWS analytics services (Glue, Athena, DataZone, SageMaker, MWAA, etc.) and understand SMUS project structures, or force data teams to become CI/CD experts.
The Solution: SMUS CLI is the abstraction layer that encapsulates all AWS and SMUS complexity.
Example workflow:
1. DevOps Team
Defines the PROCESS:
- Test on merge
- Approval for prod
- Security scans
- Notification rules
Defines INFRASTRUCTURE:
- Account & region
- IAM roles
- Resources
Works for ANY app! No ML/Analytics/GenAI service knowledge needed!
2. Data Team
Defines the CONTENT:
- Glue jobs
- SageMaker training
- Athena queries
- File structure
3. SMUS CLI (The Abstraction)
Workflow calls: smus-cli deploy --manifest manifest.yaml
CLI handles ALL AWS complexity:
- DataZone APIs
- Glue/Athena/SageMaker APIs
- MWAA deployment
- S3 management
- IAM configuration
- Infrastructure provisioning
DevOps teams focus on:
- CI/CD best practices (testing, approvals, notifications)
- Security and compliance gates
- Deployment orchestration
- Monitoring and alerting
SMUS CLI handles ALL AWS complexity:
- DataZone domain and project management
- AWS Glue, Athena, SageMaker, MWAA APIs
- S3 storage and artifact management
- IAM roles and permissions
- Connection configurations
- Catalog asset subscriptions
- Workflow deployment to Airflow
- Infrastructure provisioning
- Testing and validation
Data teams focus on:
- Application code and workflows
- Which AWS services to use (Glue, Athena, SageMaker, etc.)
- Environment configurations
- Business logic
Result:
- DevOps teams never call AWS APIs directly - they just call smus-cli deploy
- CI/CD workflows are generic - same workflow works for Glue apps, SageMaker apps, or Bedrock apps
- Data teams never touch CI/CD configs
- Both teams work independently using their expertise
A declarative YAML file (manifest.yaml) that defines your data application:
- Application details - Name, version, description
- Content - Code from git repositories, data/models from storage, QuickSight dashboards
- Workflows - Airflow DAGs for orchestration and automation
- Stages - Where to deploy (dev, test, prod environments)
- Configuration - Environment-specific settings, connections, and bootstrap actions
Created and owned by data teams. Defines what to deploy and where. No CI/CD knowledge required.
Your data/analytics workload being deployed:
- Airflow DAGs and Python scripts
- Jupyter notebooks and data files
- ML models and training code
- ETL pipelines and transformations
- GenAI agents and MCP servers
- Foundation model configurations
A deployment environment (dev, test, prod) mapped to a SageMaker Unified Studio project:
- Domain and region configuration
- Project name and settings
- Resource connections (S3, Airflow, Athena, Glue)
- Environment-specific parameters
- Optional branch mapping for git-based deployments
Each application stage deploys to a dedicated SageMaker Unified Studio (SMUS) project. A project can host a single application or multiple applications depending on your architecture and CI/CD methodology. Stage projects are independent entities with their own governance:
- Ownership & Access: Each stage project has its own set of owners and contributors, which may differ from the development project. Production projects typically have restricted access compared to development environments.
- Multi-Domain & Multi-Region: Stage projects can belong to different SMUS domains, AWS accounts, and regions. For example, your dev stage might deploy to a development domain in us-east-1, while prod deploys to a production domain in eu-west-1.
- Flexible Architecture: Organizations can choose between dedicated projects per application (isolation) or shared projects hosting multiple applications (consolidation), based on security, compliance, and operational requirements.
This separation enables true environment isolation with independent access controls, compliance boundaries, and regional data residency requirements.
Orchestration logic that executes your application. Workflows serve two purposes:
1. Deployment-time: Create required AWS resources during deployment
- Provision infrastructure (S3 buckets, databases, IAM roles)
- Configure connections and permissions
- Set up monitoring and logging
2. Runtime: Execute ongoing data and ML pipelines
- Scheduled execution (daily, hourly, etc.)
- Event-driven triggers (S3 uploads, API calls)
- Data processing and transformations
- Model training and inference
Workflows are defined as Airflow DAGs (Directed Acyclic Graphs) in YAML format. Supports MWAA (Managed Workflows for Apache Airflow) and Amazon MWAA Serverless (User Guide).
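To illustrate what "directed acyclic graph" means for task ordering, here is a minimal sketch that derives a valid execution order from per-task `dependencies` lists. It assumes the tasks have already been loaded into a plain dictionary; the task names are borrowed from the dashboard example in this README, and `execution_order` is a hypothetical helper, not how Airflow or the CLI schedules tasks internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(tasks):
    """Order task names so each one comes after everything in its `dependencies`."""
    graph = {name: set(spec.get("dependencies", [])) for name, spec in tasks.items()}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical in-memory form of a three-task workflow.
tasks = {
    "set_permission_check_task": {"dependencies": ["data_summary_task"]},
    "data_summary_task": {"dependencies": ["setup_covid_db_task"]},
    "setup_covid_db_task": {},
}

print(execution_order(tasks))
```

Because the three tasks form a single chain, the only valid order is setup, then summary, then the permission check.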
GitHub Actions workflows (or other CI/CD systems) that automate deployment:
- Created and owned by DevOps teams
- Defines how and when to deploy
- Runs tests and quality gates
- Manages promotion across targets
- Enforces security and compliance policies
- Example: .github/workflows/deploy.yml
Key insight: DevOps teams create generic, reusable workflows that work for ANY application. They don't need to know if the app uses Glue, SageMaker, or Bedrock - the CLI handles all AWS service interactions. The workflow just calls smus-cli deploy and the CLI does the rest.
Bundle-based (Artifact): Create versioned archive → deploy archive to stages
- Good for: audit trails, rollback capability, compliance
- Command: smus-cli bundle, then smus-cli deploy --manifest app.tar.gz
Direct (Git-based): Deploy directly from sources without intermediate artifacts
- Good for: simpler workflows, rapid iteration, git as source of truth
- Command: smus-cli deploy --manifest manifest.yaml --stage test
Both modes work with any combination of storage and git content sources.
Real-world examples showing how to deploy different workloads with SMUS CI/CD.
Deploy interactive BI dashboards with automated Glue ETL pipelines for data preparation. Uses QuickSight asset bundles, Athena queries, and GitHub dataset integration with environment-specific configurations.
AWS Services: QuickSight • Glue • Athena • S3 • MWAA Serverless
GitHub Workflow: analytic-dashboard-glue-quicksight.yml
What happens during deployment: Application code is deployed to S3, Glue jobs and Airflow workflows are created and executed, QuickSight dashboard/data source/dataset are created, and QuickSight ingestion is initiated to refresh the dashboard with latest data.
📁 App Structure
dashboard-glue-quick/
├── manifest.yaml  # Deployment configuration
├── covid_etl_workflow.yaml  # Airflow workflow definition
├── glue_setup_covid_db.py  # Glue job: Create database & tables
├── glue_covid_summary_job.py  # Glue job: ETL transformations
├── glue_set_permission_check.py  # Glue job: Permission validation
├── quicksight/
│   └── TotalDeathByCountry.qs  # QuickSight dashboard bundle
└── app_tests/
    └── test_covid_data.py  # Integration tests
Key Files:
- Glue Jobs: Python scripts for database setup, ETL, and validation
- Workflow: YAML defining Airflow DAG for orchestration
- QuickSight Bundle: Dashboard, datasets, and data sources
- Tests: Validate data quality and dashboard functionality
View Airflow Workflow
workflow_combined:
  dag_id: 'covid_dashboard_glue_quick_pipeline'
  tasks:
    setup_covid_db_task:
      operator: airflow.providers.amazon.aws.operators.glue.GlueJobOperator
      retries: 0
      job_name: setup-covid-db-job
      script_location: '{proj.connection.default.s3_shared.s3Uri}dashboard-glue-quick/bundle/glue_setup_covid_db.py'
      s3_bucket: '{proj.connection.default.s3_shared.bucket}'
      iam_role_name: '{proj.iam_role_name}'
      region_name: '{domain.region}'
      update_config: true
      script_args:
        '--BUCKET_NAME': '{proj.connection.default.s3_shared.bucket}'
        '--REGION_NAME': '{domain.region}'
      create_job_kwargs:
        GlueVersion: '4.0'
        MaxRetries: 0
        Timeout: 180
    data_summary_task:
      operator: airflow.providers.amazon.aws.operators.glue.GlueJobOperator
      retries: 0
      job_name: summary-glue-job
      script_location: '{proj.connection.default.s3_shared.s3Uri}dashboard-glue-quick/bundle/glue_covid_summary_job.py'
      s3_bucket: '{proj.connection.default.s3_shared.bucket}'
      iam_role_name: '{proj.iam_role_name}'
      region_name: '{domain.region}'
      update_config: true
      script_args:
        '--DATABASE_NAME': 'covid19_db'
        '--TABLE_NAME': 'us_simplified'
        '--SUMMARY_DATABASE_NAME': 'covid19_summary_db'
        '--S3_DATABASE_PATH': '{proj.connection.default.s3_shared.s3Uri}dashboard-glue-quick/output/databases/covid19_summary_db/'
        '--BUCKET_NAME': '{proj.connection.default.s3_shared.bucket}'
      dependencies: [setup_covid_db_task]
      create_job_kwargs:
        GlueVersion: '4.0'
        MaxRetries: 0
        Timeout: 180
    set_permission_check_task:
      operator: airflow.providers.amazon.aws.operators.glue.GlueJobOperator
      retries: 0
      job_name: set-permission-check-job
      script_location: '{proj.connection.default.s3_shared.s3Uri}dashboard-glue-quick/bundle/glue_set_permission_check.py'
      s3_bucket: '{proj.connection.default.s3_shared.bucket}'
      iam_role_name: '{proj.iam_role_name}'
      region_name: '{domain.region}'
      update_config: true
      script_args:
        '--BUCKET_NAME': '{proj.connection.default.s3_shared.bucket}'
        '--REGION_NAME': '{domain.region}'
        '--ROLES': '{env.GRANT_TO}'
      dependencies: [data_summary_task]
      create_job_kwargs:
        GlueVersion: '4.0'
        MaxRetries: 0
        Timeout: 180
View Manifest
applicationName: IntegrationTestETLWorkflow
content:
  storage:
    - name: dashboard-glue-quick
      include:
        - "*.py"
    - name: workflows
      include:
        - "*.yaml"
  git:
    - repository: covid-19-dataset
      url: https://github.com/datasets/covid-19.git
  quicksight:
    - name: TotalDeathByCountry
      type: dashboard
  workflows:
    - workflowName: covid_dashboard_glue_quick_pipeline
      connectionName: default.workflow_serverless
stages:
  test:
    stage: TEST
    domain:
      tags:
        purpose: smus-cicd-testing
      region: ${TEST_DOMAIN_REGION:us-east-1}
    project:
      name: test-marketing
      owners:
        - Eng1
        - arn:aws:iam::${AWS_ACCOUNT_ID}:role/GitHubActionsRole-SMUS-CLI-Tests
        - arn:aws:iam::${AWS_ACCOUNT_ID}:role/Admin
    environment_variables:
      S3_PREFIX: test
      AWS_REGION: ${TEST_DOMAIN_REGION:us-east-1}
      GRANT_TO: Admin,service-role/aws-quicksight-service-role-v0
    bootstrap:
      actions:
        - type: workflow.create
          workflowName: covid_dashboard_glue_quick_pipeline
        - type: workflow.run
          workflowName: covid_dashboard_glue_quick_pipeline
          trailLogs: true
        - type: quicksight.refresh_dataset
          refreshScope: IMPORTED
          ingestionType: FULL_REFRESH
          wait: false
    deployment_configuration:
      storage:
        - name: dashboard-glue-quick
          connectionName: default.s3_shared
          targetDirectory: dashboard-glue-quick/bundle
        - name: workflows
          connectionName: default.s3_shared
          targetDirectory: dashboard-glue-quick/bundle/workflows
      git:
        - name: covid-19-dataset
          connectionName: default.s3_shared
          targetDirectory: repos
      quicksight:
        assets:
          - name: TotalDeathByCountry
            owners:
              - arn:aws:quicksight:${TEST_DOMAIN_REGION:us-east-1}:${AWS_ACCOUNT_ID}:user/default/Admin/*
            viewers:
              - arn:aws:quicksight:${TEST_DOMAIN_REGION:us-east-1}:${AWS_ACCOUNT_ID}:user/default/Admin/*
        overrideParameters:
          ResourceIdOverrideConfiguration:
            PrefixForAllResources: deployed-{stage.name}-covid-
Deploy Jupyter notebooks with parallel execution orchestration for data analysis and ETL workflows. Demonstrates notebook deployment with MLflow integration for experiment tracking.
AWS Services: SageMaker Notebooks • MLflow • S3 • MWAA Serverless
GitHub Workflow: analytic-data-notebooks.yml
What happens during deployment: Notebooks and workflow definitions are uploaded to S3, Airflow DAG is created for parallel notebook execution, MLflow connection is provisioned for experiment tracking, and notebooks are ready to run on-demand or scheduled.
📁 App Structure
data-notebooks/
├── manifest.yaml  # Deployment configuration
├── notebooks/
│   ├── customer_churn_prediction.ipynb  # Customer churn ML
│   ├── retail_sales_forecasting.ipynb  # Sales forecasting
│   ├── customer_segmentation_analysis.ipynb  # Customer segmentation
│   └── requirements.txt  # Python dependencies
├── workflows/
│   └── parallel_notebooks_workflow.yaml  # Airflow orchestration
└── app_tests/
    └── test_notebooks_execution.py  # Integration tests
Key Files:
- Notebooks: 3 Jupyter notebooks for ML and analytics workflows
- Workflow: Parallel execution orchestration with Airflow
- Tests: Validate notebook execution and outputs
View Manifest
applicationName: IntegrationTestNotebooks
content:
  storage:
    - name: notebooks
      connectionName: default.s3_shared
      include:
        - notebooks/
        - workflows/
  workflows:
    - workflowName: parallel_notebooks_execution
      connectionName: default.workflow_serverless
stages:
  test:
    domain:
      region: us-east-1
    project:
      name: test-marketing
      owners:
        - Eng1
        - arn:aws:iam::${AWS_ACCOUNT_ID}:role/GitHubActionsRole-SMUS-CLI-Tests
    environment_variables:
      S3_PREFIX: test
    deployment_configuration:
      storage:
        - name: notebooks
          connectionName: default.s3_shared
          targetDirectory: notebooks/bundle/notebooks
    bootstrap:
      actions:
        - type: datazone.create_connection
          name: mlflow-server
          connection_type: MLFLOW
          properties:
            trackingServerArn: arn:aws:sagemaker:${STS_REGION}:${STS_ACCOUNT_ID}:mlflow-tracking-server/smus-integration-mlflow-use2
            trackingServerName: smus-integration-mlflow-use2
        - type: workflow.create
          workflowName: parallel_notebooks_execution
        - type: workflow.run
          workflowName: parallel_notebooks_execution
          trailLogs: true
View Airflow Workflow
notebooks_workflow:
  dag_id: notebooks_parallel
  tasks:
    nb_churn:
      operator: airflow.providers.amazon.aws.operators.sagemaker_unified_studio.SageMakerNotebookOperator
      retries: 0
      input_config:
        input_path: notebooks/bundle/notebooks/customer_churn_prediction.ipynb
        input_params: {}
      output_config:
        output_formats:
          - NOTEBOOK
      compute:
        instance_type: ml.c5.xlarge
        image_details:
          image_name: sagemaker-distribution-prod
          image_version: '3'
      wait_for_completion: true
    nb_sales:
      operator: airflow.providers.amazon.aws.operators.sagemaker_unified_studio.SageMakerNotebookOperator
      retries: 0
      input_config:
        input_path: notebooks/bundle/notebooks/retail_sales_forecasting.ipynb
        input_params: {}
      output_config:
        output_formats:
          - NOTEBOOK
      compute:
        instance_type: ml.c5.xlarge
        image_details:
          image_name: sagemaker-distribution-prod
          image_version: '3'
      wait_for_completion: true
    nb_segment:
      operator: airflow.providers.amazon.aws.operators.sagemaker_unified_studio.SageMakerNotebookOperator
      retries: 0
      input_config:
        input_path: notebooks/bundle/notebooks/customer_segmentation_analysis.ipynb
        input_params: {}
      output_config:
        output_formats:
          - NOTEBOOK
      compute:
        instance_type: ml.c5.xlarge
        image_details:
          image_name: sagemaker-distribution-prod
          image_version: '3'
      wait_for_completion: true
Train ML models with SageMaker using the SageMaker SDK and SageMaker Distribution images. Track experiments with MLflow and automate training pipelines with environment-specific configurations.
AWS Services: SageMaker Training • MLflow • S3 • MWAA Serverless
GitHub Workflow: analytic-ml-training.yml
What happens during deployment: Training code and workflow definitions are uploaded to S3 with compression, Airflow DAG is created for training orchestration, MLflow connection is provisioned for experiment tracking, and SageMaker training jobs are created and executed using SageMaker Distribution images.
📁 App Structure
ml/training/
├── manifest.yaml  # Deployment configuration
├── code/
│   ├── sagemaker_training_script.py  # Training script
│   └── requirements.txt  # Python dependencies
├── workflows/
│   ├── ml_training_workflow.yaml  # Airflow orchestration
│   └── ml_training_notebook.ipynb  # Training notebook
└── app_tests/
    └── test_model_registration.py  # Integration tests
Key Files:
- Training Script: SageMaker training job implementation
- Workflow: Airflow DAG for training orchestration
- Notebook: Interactive training workflow
- Tests: Validate model registration and training
View Manifest
applicationName: IntegrationTestMLTraining
content:
  storage:
    - name: training-code
      connectionName: default.s3_shared
      include: [ml/training/code]
    - name: training-workflows
      connectionName: default.s3_shared
      include: [ml/training/workflows]
  workflows:
    - workflowName: ml_training_workflow
      connectionName: default.workflow_serverless
stages:
  test:
    domain:
      region: us-east-1
    project:
      name: test-ml-training
      owners:
        - Eng1
        - arn:aws:iam::${AWS_ACCOUNT_ID}:role/GitHubActionsRole-SMUS-CLI-Tests
      role:
        arn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/SMUSCICDTestRole
    environment_variables:
      S3_PREFIX: test
    deployment_configuration:
      storage:
        - name: training-code
          connectionName: default.s3_shared
          targetDirectory: ml/bundle/training-code
          compression: gz
        - name: training-workflows
          connectionName: default.s3_shared
          targetDirectory: ml/bundle/training-workflows
    bootstrap:
      actions:
        - type: datazone.create_connection
          name: mlflow-server
          connection_type: MLFLOW
          properties:
            trackingServerArn: arn:aws:sagemaker:${STS_REGION}:${STS_ACCOUNT_ID}:mlflow-tracking-server/smus-integration-mlflow-use2
        - type: workflow.create
          workflowName: ml_training_workflow
        - type: workflow.run
          workflowName: ml_training_workflow
          trailLogs: true
View Airflow Workflow
ml_training_workflow:
  dag_id: "ml_training_workflow"
  tasks:
    ml_training_notebook:
      operator: "airflow.providers.amazon.aws.operators.sagemaker_unified_studio.SageMakerNotebookOperator"
      retries: 0
      input_config:
        input_path: "ml/bundle/training-workflows/ml_training_notebook.ipynb"
        input_params:
          mlflow_tracking_server_arn: "{proj.connection.mlflow-server.trackingServerArn}"
          mlflow_artifact_location: "{proj.connection.default.s3_shared.s3Uri}ml/mlflow-artifacts"
          sklearn_version: "1.2-1"
          python_version: "py3"
          training_instance_type: "ml.m5.large"
          model_name: "realistic-classifier-v1"
      output_config:
        output_formats: ['NOTEBOOK']
      wait_for_completion: True
Deploy trained ML models as SageMaker real-time inference endpoints. Uses SageMaker SDK for endpoint configuration and SageMaker Distribution images for serving.
AWS Services: SageMaker Endpoints • S3 • MWAA Serverless
GitHub Workflow: analytic-ml-deployment.yml
What happens during deployment: Model artifacts, deployment code, and workflow definitions are uploaded to S3, Airflow DAG is created for endpoint deployment orchestration, SageMaker endpoint configuration and model are created, and the inference endpoint is deployed and ready to serve predictions.
📁 App Structure
ml/deployment/
├── manifest.yaml  # Deployment configuration
├── code/
│   └── inference.py  # Inference handler
├── workflows/
│   ├── ml_deployment_workflow.yaml  # Airflow orchestration
│   └── ml_deployment_notebook.ipynb  # Deployment notebook
└── app_tests/
    └── test_endpoint_deployment.py  # Integration tests
Key Files:
- Inference Handler: Custom inference logic for endpoint
- Workflow: Airflow DAG for endpoint deployment
- Notebook: Interactive deployment workflow
- Tests: Validate endpoint deployment and predictions
View Manifest
applicationName: IntegrationTestMLDeployment
content:
  storage:
    - name: deployment-code
      connectionName: default.s3_shared
      include: [ml/deployment/code]
    - name: deployment-workflows
      connectionName: default.s3_shared
      include: [ml/deployment/workflows]
    - name: model-artifacts
      connectionName: default.s3_shared
      include: [ml/output/model-artifacts/latest]
  workflows:
    - workflowName: ml_deployment_workflow
      connectionName: default.workflow_serverless
stages:
  test:
    domain:
      region: us-east-1
    project:
      name: test-ml-deployment
      owners:
        - Eng1
        - arn:aws:iam::${AWS_ACCOUNT_ID}:role/GitHubActionsRole-SMUS-CLI-Tests
      role:
        arn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/SMUSCICDTestRole
    environment_variables:
      S3_PREFIX: test
    deployment_configuration:
      storage:
        - name: deployment-code
          connectionName: default.s3_shared
          targetDirectory: ml/bundle/deployment-code
        - name: deployment-workflows
          connectionName: default.s3_shared
          targetDirectory: ml/bundle/deployment-workflows
        - name: model-artifacts
          connectionName: default.s3_shared
          targetDirectory: ml/bundle/model-artifacts
    bootstrap:
      actions:
        - type: workflow.create
          workflowName: ml_deployment_workflow
        - type: workflow.run
          workflowName: ml_deployment_workflow
          trailLogs: true
View Airflow Workflow
ml_deployment_workflow:
  dag_id: "ml_deployment_workflow"
  tasks:
    ml_deployment_notebook:
      operator: "airflow.providers.amazon.aws.operators.sagemaker_unified_studio.SageMakerNotebookOperator"
      retries: 0
      input_config:
        input_path: "ml/bundle/deployment-workflows/ml_deployment_notebook.ipynb"
        input_params:
          model_s3_uri: "{proj.connection.default.s3_shared.s3Uri}ml/output/model-artifacts/latest/output/model.tar.gz"
          sklearn_version: "1.2-1"
          python_version: "py3"
          inference_instance_type: "ml.m5.large"
      output_config:
        output_formats: ['NOTEBOOK']
      wait_for_completion: True
Deploy GenAI applications with Bedrock agents and knowledge bases. Demonstrates RAG (Retrieval Augmented Generation) workflows with automated agent deployment and testing.
AWS Services: Amazon Bedrock • S3 • MWAA Serverless
GitHub Workflow: analytic-genai-workflow.yml
What happens during deployment: Agent configuration and workflow definitions are uploaded to S3, Airflow DAG is created for agent deployment orchestration, Bedrock agents and knowledge bases are configured, and the GenAI application is ready for inference and testing.
📁 App Structure
genai/
├── manifest.yaml  # Deployment configuration
├── job-code/
│   ├── requirements.txt  # Python dependencies
│   ├── test_agent.yaml  # Agent test configuration
│   ├── lambda_mask_string.py  # Lambda function
│   └── utils/
│       ├── bedrock_agent.py  # Agent management
│       ├── bedrock_agent_helper.py  # Agent utilities
│       └── knowledge_base_helper.py  # Knowledge base utilities
├── workflows/
│   ├── genai_dev_workflow.yaml  # Airflow orchestration
│   └── bedrock_agent_notebook.ipynb  # Agent deployment notebook
└── app_tests/
    └── test_genai_workflow.py  # Integration tests
Key Files:
- Agent Code: Bedrock agent and knowledge base management
- Workflow: Airflow DAG for GenAI deployment
- Notebook: Interactive agent deployment
- Tests: Validate agent functionality
View Manifest
applicationName: IntegrationTestGenAIWorkflow
content:
  storage:
    - name: agent-code
      connectionName: default.s3_shared
      include: [genai/job-code]
    - name: genai-workflows
      connectionName: default.s3_shared
      include: [genai/workflows]
  workflows:
    - workflowName: genai_dev_workflow
      connectionName: default.workflow_serverless
stages:
  test:
    domain:
      region: us-east-1
    project:
      name: test-marketing
      owners:
        - Eng1
        - arn:aws:iam::${AWS_ACCOUNT_ID}:role/GitHubActionsRole-SMUS-CLI-Tests
    environment_variables:
      S3_PREFIX: test
    deployment_configuration:
      storage:
        - name: agent-code
          connectionName: default.s3_shared
          targetDirectory: genai/bundle/agent-code
        - name: genai-workflows
          connectionName: default.s3_shared
          targetDirectory: genai/bundle/workflows
View Airflow Workflow
genai_dev_workflow:
  dag_id: "genai_dev_workflow"
  tasks:
    bedrock_agent_notebook:
      operator: "airflow.providers.amazon.aws.operators.sagemaker_unified_studio.SageMakerNotebookOperator"
      retries: 0
      input_config:
        input_path: "genai/bundle/workflows/bedrock_agent_notebook.ipynb"
        input_params:
          agent_name: "calculator_agent"
          agent_llm: "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
          force_recreate: "True"
          kb_name: "mortgage-kb"
      output_config:
        output_formats: ['NOTEBOOK']
      wait_for_completion: True
See All Examples with Detailed Walkthroughs →
Legend: ✅ Supported | 🔄 Planned | 🔮 Future
| Feature | Status | Notes |
|---|---|---|
| YAML configuration | ✅ | Manifest Guide |
| Infrastructure as Code | ✅ | Deploy Command |
| Multi-environment deployment | ✅ | Stages |
| CLI tool | ✅ | CLI Commands |
| Version control integration | ✅ | GitHub Actions |
Automated Deployment - Define your application content, workflows, and deployment targets in YAML. Bundle-based (artifact) or direct (git-based) deployment modes. Deploy to test and prod with a single command. Dynamic configuration using ${VAR} substitution. Track deployments in S3 or git for deployment history.
| Feature | Status | Notes |
|---|---|---|
| Artifact bundling | ✅ | Bundle Command |
| Bundle-based deployment | ✅ | Deploy Command |
| Direct deployment | ✅ | Deploy Command |
| Deployment validation | ✅ | Describe Command |
| Incremental deployment | 🔄 | Upload only changed files |
| Rollback support | 🔮 | Automated rollback |
| Blue-green deployment | 🔮 | Zero-downtime deployments |
| Feature | Status | Notes |
|---|---|---|
| Project templates | 🔄 | smus-cli init with templates |
| Manifest initialization | ✅ | Create Command |
| Interactive setup | 🔄 | Guided configuration prompts |
| Local development | ✅ | CLI Commands |
| VS Code extension | 🔮 | IntelliSense and validation |
Environment Variables & Dynamic Configuration - Flexible configuration for any environment using variable substitution. Environment-specific settings with validation and connection management.
| Feature | Status | Notes |
|---|---|---|
| Variable substitution | ✅ | Substitutions Guide |
| Environment-specific config | ✅ | Stages |
| Secrets management | 🔮 | AWS Secrets Manager integration |
| Config validation | ✅ | Manifest Schema |
| Connection management | ✅ | Connections Guide |
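As a rough illustration of variable substitution, the sketch below resolves `${VAR}` and `${VAR:default}` placeholders in a string. The default-value behavior is inferred from manifest values such as `${TEST_DOMAIN_REGION:us-east-1}`; the function name `substitute` and the error behavior are assumptions, and the CLI's real rules are described in the Substitutions Guide.

```python
import re

# ${NAME} or ${NAME:default}; the default may be empty but cannot contain "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def substitute(text, env):
    """Replace ${VAR} / ${VAR:default} using `env` (hypothetical helper)."""

    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Assumption: a missing variable without a default is an error.
        raise KeyError(f"no value or default for ${{{name}}}")

    return _PLACEHOLDER.sub(resolve, text)


print(substitute("region: ${TEST_DOMAIN_REGION:us-east-1}", {}))
print(substitute("arn:aws:iam::${AWS_ACCOUNT_ID}:role/Admin", {"AWS_ACCOUNT_ID": "123456789012"}))
```

In a CI/CD job, `env` would typically be `os.environ`, so the same manifest can resolve differently per environment.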
Deploy Any AWS Service - Airflow DAGs, Jupyter notebooks, Glue ETL jobs, Athena queries, SageMaker training and endpoints, QuickSight dashboards, Bedrock agents, Lambda functions, EMR jobs, and Redshift queries.
| Feature | Status | Notes |
|---|---|---|
| Airflow DAGs | ✅ | Workflows |
| Jupyter notebooks | ✅ | SageMakerNotebookOperator |
| Glue ETL jobs | ✅ | GlueJobOperator |
| Athena queries | ✅ | AthenaOperator |
| SageMaker training | ✅ | SageMakerTrainingOperator |
| SageMaker endpoints | ✅ | SageMakerEndpointOperator |
| QuickSight dashboards | ✅ | QuickSight Deployment |
| Bedrock agents | ✅ | BedrockInvokeModelOperator |
| Lambda functions | 🔄 | LambdaInvokeFunctionOperator |
| EMR jobs | ✅ | EmrAddStepsOperator |
| Redshift queries | ✅ | RedshiftDataOperator |
Automated Workflow Execution & Event-Driven Workflows - Trigger workflows automatically during deployment with workflow.run (use trailLogs: true to stream logs and wait for completion). Fetch workflow logs for validation and debugging with workflow.logs. Automatically refresh QuickSight dashboards after ETL deployment with quicksight.refresh_dataset. Emit custom events for downstream automation and CI/CD orchestration with eventbridge.put_events. Provision MLflow and other DataZone connections during deployment. Actions run in order during smus-cli deploy for reliable initialization and validation.
| Feature | Status | Notes |
|---|---|---|
| Workflow execution | ✅ | workflow.run |
| Log retrieval | ✅ | workflow.logs |
| QuickSight refresh | ✅ | quicksight.refresh_dataset |
| EventBridge events | ✅ | eventbridge.put_events |
| DataZone connections | ✅ | datazone.create_connection |
| Sequential execution | ✅ | Execution Flow |
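The "actions run in order" behavior can be pictured with a small dispatcher. This is a sketch under stated assumptions, not the CLI's implementation: `run_bootstrap` and the handler functions are hypothetical stand-ins, and stopping at the first failing action is an assumption about how sequential execution would behave.

```python
def run_bootstrap(actions, handlers):
    """Run bootstrap actions in list order and return the types that completed."""
    completed = []
    for action in actions:
        kind = action["type"]
        if kind not in handlers:
            raise ValueError(f"no handler for action type {kind!r}")
        handlers[kind](action)  # an exception here aborts the remaining actions
        completed.append(kind)
    return completed


# Hypothetical handlers that only record what they were asked to do.
log = []
handlers = {
    "workflow.create": lambda a: log.append(("create", a["workflowName"])),
    "workflow.run": lambda a: log.append(("run", a["workflowName"])),
}

# Action list shaped like the bootstrap section of the manifests in this README.
actions = [
    {"type": "workflow.create", "workflowName": "ml_training_workflow"},
    {"type": "workflow.run", "workflowName": "ml_training_workflow", "trailLogs": True},
]

print(run_bootstrap(actions, handlers))
```

Keeping the order explicit in the manifest is what lets a `workflow.run` rely on the preceding `workflow.create` having finished.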
Pre-built CI/CD Pipeline Workflows - GitHub Actions, GitLab CI, Azure DevOps, and Jenkins support for automated deployment. Flexible configuration for any CI/CD platform. Trigger deployments from external events with webhook support.
| Feature | Status | Notes |
|---|---|---|
| GitHub Actions | ✅ | GitHub Actions Guide |
| GitLab CI | ✅ | CLI Commands |
| Azure DevOps | ✅ | CLI Commands |
| Jenkins | ✅ | CLI Commands |
| Service principals | ✅ | GitHub Actions Guide |
| OIDC federation | ✅ | GitHub Actions Guide |
Automated Tests & Quality Gates - Run validation tests before promoting to production. Block deployments if tests fail. Track execution status and logs. Verify deployment correctness with health checks.
| Feature | Status | Notes |
|---|---|---|
| Unit testing | ✅ | Test Command |
| Integration testing | ✅ | Test Command |
| Automated tests | ✅ | Test Command |
| Quality gates | ✅ | Test Command |
| Workflow monitoring | ✅ | Monitor Command |
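A quality gate of the kind described above can be sketched as a short promotion script. Several things here are assumptions: that `smus-cli` commands exit with a non-zero status on failure (the usual CLI convention), that a target named `prod` exists in the manifest, and the function name `promote` itself. The `run` parameter is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess


def promote(manifest, run=subprocess.run):
    """Deploy to test, run tests, and deploy to prod only if both steps succeed."""
    gate_steps = [
        ["smus-cli", "deploy", "--targets", "test", "--manifest", manifest],
        ["smus-cli", "test", "--manifest", manifest, "--targets", "test"],
    ]
    for argv in gate_steps:
        # Assumption: a non-zero return code signals failure.
        if run(argv).returncode != 0:
            return False  # gate closed: production is never touched
    prod = ["smus-cli", "deploy", "--targets", "prod", "--manifest", manifest]
    return run(prod).returncode == 0
```

In a real pipeline the same gate is usually expressed as dependent jobs in the CI/CD system, with a manual approval step before the production deploy.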
| Feature | Status | Notes |
|---|---|---|
| Deployment monitoring | ✅ | Deploy Command |
| Workflow monitoring | ✅ | Monitor Command |
| Custom alerts | ✅ | Deployment Metrics |
| Metrics collection | ✅ | Deployment Metrics |
| Deployment history | ✅ | Bundle Command |
| Feature | Status | Notes |
|---|---|---|
| Amazon MWAA | ✅ | Workflows |
| MWAA Serverless | ✅ | Workflows |
| AWS Glue | ✅ | Airflow Operators |
| Amazon Athena | ✅ | Airflow Operators |
| SageMaker | ✅ | Airflow Operators |
| Amazon Bedrock | ✅ | Airflow Operators |
| Amazon QuickSight | ✅ | QuickSight Deployment |
| DataZone | ✅ | Manifest Schema |
| EventBridge | ✅ | Deployment Metrics |
| Lake Formation | ✅ | Connections Guide |
| Amazon S3 | ✅ | Storage |
| AWS Lambda | 🔄 | Airflow Operators |
| Amazon EMR | ✅ | Airflow Operators |
| Amazon Redshift | ✅ | Airflow Operators |
| Feature | Status | Notes |
|---|---|---|
| Multi-region deployment | ✅ | Stages |
| Cross-project deployment | ✅ | Stages |
| Dependency management | ✅ | Airflow Operators |
| Catalog subscriptions | ✅ | Manifest Schema |
| Multi-service orchestration | ✅ | Airflow Operators |
| Drift detection | 🔮 | Detect configuration drift |
| State management | 🔄 | Comprehensive state tracking |
- Quick Start Guide - Deploy your first application (10 min)
- Admin Guide - Set up infrastructure (15 min)
- Application Manifest - Complete YAML configuration reference
- CLI Commands - All available commands and options
- Bootstrap Actions - Automated deployment actions and event-driven workflows
- Substitutions & Variables - Dynamic configuration
- Connections Guide - Configure AWS service integrations
- GitHub Actions Integration - CI/CD automation setup
- Deployment Metrics - Monitoring with EventBridge
- Manifest Schema - YAML schema validation and structure
- Airflow AWS Operators - Custom operator reference
- Examples Guide - Walkthrough of example applications
- Data Notebooks - Jupyter notebooks with Airflow
- ML Training - SageMaker training with MLflow
- ML Deployment - SageMaker endpoint deployment
- QuickSight Dashboard - BI dashboards with Glue
- GenAI Application - Bedrock agents and knowledge bases
- Developer Guide - Complete development guide with architecture, testing, and workflows
- AI Assistant Context - Context for AI assistants (Amazon Q, Kiro)
- Tests Overview - Testing infrastructure
- Issues: GitHub Issues
- Documentation: docs/
- Examples: examples/
# ✅ Correct - Install from official AWS repository
git clone https://github.com/aws/CICD-for-SageMakerUnifiedStudio.git
cd CICD-for-SageMakerUnifiedStudio
pip install -e .
# ❌ Wrong - Do not use PyPI
pip install smus-cicd-cli # May contain malicious code
This project is licensed under the MIT-0 License. See LICENSE for details.
