Commit fa043c0

Sequential copies to backup acct: backup region 1st, resource region 2nd (#2)
* Sequential copies to backup acct: to backup region 1st, to resource region 2nd
* ReadMe: troubleshooting advice
* ReadMe: new, sequential diagram; fix region cardinality
1 parent b2ed690 commit fa043c0

File tree

3 files changed

+174
-85
lines changed

README.md

Lines changed: 115 additions & 48 deletions
@@ -14,10 +14,10 @@ Backup Events automatically **copies on‑demand backups to**:
1414
- **and a second region**, for compliance, disaster recovery, or
1515
just peace-of-mind.
1616

17-
Then, it **saves money** by scheduling the original backup for deletion.
17+
It also **saves money** by scheduling the original backup for deletion.
1818

19-
It also **monitors** on-demand backup and copy jobs, sending a message to an
20-
error queue if something goes wrong.
19+
It **monitors** on-demand backups and copies, sending messages to an error
20+
queue if they fail.
2121

2222
You can get started immediately, or customize Backup Events.
2323

@@ -67,8 +67,10 @@ in **bold**. In this example,
6767

6868
|Element|||Value|
6969
|:---|:---|:---|:---|
70+
|||||
7071
|BackupSelection|[Resources](https://docs.aws.amazon.com/aws-backup/latest/devguide/API_BackupSelection.html#Backup-Type-BackupSelection-Resources)[0]||`arn:aws:rds:us‑east‑1:888866664444:db:Your‑Database`|
7172
||Resources[1]||`arn:aws:rds:us‑east‑1:888866664444:db‑cluster:Your‑Cluster`|
73+
|||||
7274
|[BackupPlan](https://docs.aws.amazon.com/aws-backup/latest/devguide/API_BackupPlan.html).Rules[0]|[TargetBackupVault](https://docs.aws.amazon.com/aws-backup/latest/devguide/API_BackupRule.html#Backup-Type-BackupRule-TargetBackupVaultName)||`Default` (resource region and resource account are implicit)|
7375
||**Lifecycle.[DeleteAfterDays](https://docs.aws.amazon.com/aws-backup/latest/devguide/API_Lifecycle.html#Backup-Type-Lifecycle-DeleteAfterDays)** _updated_||NewDeleteAfterDays CloudFormation parameter value|
7476
||**CopyActions[0]** _new_|[DestinationBackupVaultArn](https://docs.aws.amazon.com/aws-backup/latest/devguide/API_CopyAction.html#Backup-Type-CopyAction-DestinationBackupVaultArn)|`arn:aws:backup:us‑west‑2:999977775555:backup‑vault:Default`|
@@ -103,14 +105,17 @@ retain the sample vaults, disable Backup Events instead, by changing the
103105
<br/>
104106

105107
If you sometimes take on-demand backups, update your Backup Events
106-
CloudFormation StackSet or stacks. `v2.0.0`&nbsp;:
108+
CloudFormation StackSet or stacks. `v2.1.0`&nbsp;:
107109

108110
- Ignores scheduled backups from backup plans (because plans support
109111
CopyActions) but still copies on-demand backups.
110-
- Directly copies an on-demand backup from the resource account to _both_ the
111-
resource and backup regions in the backup account.
112-
- Reduces retention of an on-demand backup after the more important of the two
113-
copies, to the backup region, has been completed.
112+
- Copies an on-demand backup from the resource account directly to the backup
113+
account, backup region.
114+
- If the first copy completes successfully, copies the on-demand backup from
115+
the resource account to the backup account, resource region.
116+
- If the second copy completes successfully, reduces retention of the original
117+
on-demand backup.
118+
- Tracks on-demand backup and on-demand copy failures in the error queue.
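
The sequence in the list above can be sketched as a small decision function. This is only an illustration, not the Backup Events source code: the `JobEvent` names, the `next_action` function and the action strings are hypothetical, and the two flags are assumed to mirror the `EnableCopy` and `EnableUpdateLifecycle` CloudFormation parameters.

```python
from enum import Enum, auto


class JobEvent(Enum):
    """Hypothetical event names; not the identifiers Backup Events uses."""
    BACKUP_COMPLETED = auto()
    FIRST_COPY_COMPLETED = auto()
    SECOND_COPY_COMPLETED = auto()
    JOB_FAILED = auto()


def next_action(event, enable_copy=True, enable_update_lifecycle=True):
    """Return the next step for an on-demand backup, one step per event.

    Each step is triggered only by successful completion of the previous
    one, so a failure anywhere leaves the original backup untouched.
    """
    if event is JobEvent.JOB_FAILED:
        return "send message to error queue"
    if event is JobEvent.BACKUP_COMPLETED and enable_copy:
        return "copy to backup account, backup region"
    if event is JobEvent.FIRST_COPY_COMPLETED and enable_copy:
        return "copy to backup account, resource region"
    if event is JobEvent.SECOND_COPY_COMPLETED and enable_update_lifecycle:
        return "reduce retention of original backup"
    return "no action"
```

For example, `next_action(JobEvent.FIRST_COPY_COMPLETED, enable_copy=False)` returns `"no action"` in this sketch, which matches the idea of temporarily disabling a trigger.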
114119

115120
</details>
116121

@@ -131,7 +136,7 @@ Click to view the architecture diagram:
131136

132137
## Quick Start
133138

134-
1. Check prerequisites.
139+
1. Check AWS&nbsp;Backup prerequisites.
135140

136141
If you have already used AWS&nbsp;Backup from the console, to back up a
137142
resource in one AWS account (your "main account") and copy the backup to
@@ -240,16 +245,66 @@ Click to view the architecture diagram:
240245
Switch to your backup AWS account and check for copies of your backup in
241246
the main region and the backup region.
242247

243-
13. In case of trouble, focus on the resource region and check the following,
244-
in the main/resource account:
245-
246-
- The [BackupEvents CloudWatch log group](https://console.aws.amazon.com/cloudwatch/home#logsV2:log-groups$3FlogGroupNameFilter$3DBackupEvents)
248+
13. In case of trouble, check the following, in the main/resource account, in
249+
the resource region:
247250

251+
- The
252+
[BackupEvents CloudWatch log group](https://console.aws.amazon.com/cloudwatch/home#logsV2:log-groups$3FlogGroupNameFilter$3DBackupEvents)
248253
- The `BackupEvents-ErrorQueue`
249254
[SQS queue](https://console.aws.amazon.com/sqs/v3/home#/queues)
250-
- [CloudTrail&rarr;Event&nbsp;history](https://console.aws.amazon.com/cloudtrailv2/home#/events).
251-
Tips: Change "Read-only" to `true` to see more events. Select the gear
252-
icon at the right to add the "Error code" column.
255+
- AWS&nbsp;Backup
256+
[backup&nbsp;jobs](https://console.aws.amazon.com/backup/home#/jobs/backup)
257+
and
258+
[copy&nbsp;jobs](https://console.aws.amazon.com/backup/home#/jobs/copy).
259+
Select a longer time window than "Last 24 hours", if necessary.
260+
- [CloudTrail Event history](https://console.aws.amazon.com/cloudtrailv2/home#/events).
261+
Change "Read-only" to `true` to see more events. Select the gear icon at
262+
the right to add the "Error code" column.
263+
264+
<details>
265+
<summary>Troubleshooting advice...</summary>
266+
267+
<br/>
268+
269+
If a copy job did not start, or if it started but failed, intervene before
270+
the deletion day (if any) that you specified when you started the on-demand
271+
backup. The original backup might be available for you to re-copy.
272+
273+
Keep in mind that successful completion of certain on-demand copy jobs will
274+
trigger Backup Events actions. Completion of the first copy will trigger
275+
the second, and completion of the second copy will trigger reduction of the
276+
original backup's retention period. To disable the triggers, temporarily
277+
set the `EnableCopy` and/or `EnableUpdateLifecycle` CloudFormation
278+
parameters to `false`&nbsp;.
279+
280+
When you start an on-demand backup, keep the start window and the
281+
completion window as short as possible so that you will not have to wait
282+
many hours or days for error feedback from AWS&nbsp;Backup.
283+
284+
Sometimes, a resource is not in the expected state when AWS&nbsp;Backup
285+
actually starts a requested backup. For example, an RDS database instance
286+
must be in the `available` state.
287+
288+
Timeouts and cross-region network problems are rare but permissions
289+
problems are a likely cause of errors. When you start an on-demand backup
290+
or copy, make sure you have permission to pass your chosen backup role or
291+
backup copy role to AWS&nbsp;Backup. Start a new on-demand backup or copy
292+
job after checking and correcting:
293+
294+
- policies and the permissions boundary for a custom backup role
295+
- policies and the permissions boundary for a custom backup copy role
296+
- availability of the AWSBackupDefaultServiceRole in the backup account
297+
(even if you use a custom backup copy role, AWS&nbsp;Backup uses the
298+
default role to complete a cross-account copy)
299+
- backup vault policies in all relevant AWS accounts and regions (if you
300+
write custom policies, compare the policies for the sample vaults)
301+
- central service control policies and resource control policies (SCPs and
302+
RCPs)
303+
- key policies for customer-managed KMS encryption keys applied to backup
304+
vaults (and to resources, if the resource types do not support
305+
[full management and independent encryption in AWS&nbsp;Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/encryption.html#independent-encryption))
306+
307+
</details>
253308

254309
14. Delete the EFS file system and all of its AWS&nbsp;Backup backups (or let
255310
the backups expire, at a small cost).
@@ -360,7 +415,7 @@ resources potentially deployed to the backup account.
360415

361416
```terraform
362417
module "backup_events_stackset" {
363-
source = "git::https://github.com/sqlxpert/backup-events-aws.git//terraform-multi?ref=v2.0.0"
418+
source = "git::https://github.com/sqlxpert/backup-events-aws.git//terraform-multi?ref=v2.1.0"
364419
# Reference a specific version from github.com/sqlxpert/backup-events-aws/releases
365420
366421
backup_events_stackset_regions = ["us-east-1", "us-west-2", ]
@@ -420,22 +475,24 @@ software at your own risk. You are encouraged to evaluate the source code._
420475
they have been copied can access backups in any vault in the same AWS
421476
account and region. Tampering with the function's source code, environment
422477
variable or event input would allow switching vaults. The backup account is
423-
a security barrier; the function is never deployed there. The problem?
424-
[Backup or recoveryPoint ARNs](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsbackup.html#awsbackup-recoveryPoint)
478+
a security barrier; the function is never deployed there. The problem? Flat
479+
[backup or "recoveryPoint" ARNs](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsbackup.html#awsbackup-recoveryPoint)
425480
do not include the vault name, and
426481
[UpdateRecoveryPointLifecycle](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsbackup.html#awsbackup-UpdateRecoveryPointLifecycle)
427-
does not support an IAM condition key for vault name or ARN.
482+
does not support an IAM condition key for vault ARN.
428483
- Readable IAM policies, broken down into discrete statements by service,
429-
resource or principal. Policies are formatted as CloudFormation YAML rather
430-
than JSON.
431-
- Tolerance for slow operations and clock drift in a distributed system
432-
- The function that reduces retention of original backups after they have
433-
been copied applies a full-day margin.
434-
- Options to encrypt the log and the error queue at rest, using the AWS Key
435-
Management System (KMS)
484+
resource or principal. Policies, except those open to customization, are
485+
formatted as CloudFormation YAML rather than native JSON.
436486
- Least-privilege SQS queue policy with support for customization
437-
- Option to use custom vaults (with custom KMS keys) and a custom role for
438-
AWS&nbsp;Backup
487+
- Options to encrypt the log and the error queue at rest, using the AWS Key
488+
Management Service
489+
- Options to use a custom, multi-region KMS key for the sample backup vaults or
490+
to use custom backup vaults, with KMS keys and vault access policies of your
491+
choice
492+
- Option to use a custom role for AWS&nbsp;Backup copy jobs
493+
- Tolerance for slow operations and clock drift in a distributed system. The
494+
function that reduces retention of original backups after they have been
495+
copied applies a full-day margin.
439496
440497
### Security Steps You Can Take
441498
@@ -450,6 +507,23 @@ software at your own risk. You are encouraged to evaluate the source code._
450507
- Instead of relying on sample vaults, on default `aws/` KMS keys, and on the
451508
AWSBackupDefaultServiceRole, define custom equivalents with least-privilege
452509
resource- and/or identity-based policies tailored to your needs.
510+
- To prevent use of backups if an AWS account containing a backup vault is
511+
removed from the organization, encrypt backups (and original resources, for
512+
resource types that do _not_ support
513+
[full management and independent encryption in AWS&nbsp;Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/encryption.html#independent-encryption))
514+
with a custom KMS key housed in an account separate from the original
515+
resources and the backup vault. In the key policy, deny usage by principals
516+
outside the organization. Control over key usage is a major benefit of
517+
creating a customer-managed KMS key. Having the key policy serve as a
518+
security barrier is a major benefit of housing the key in an account separate
519+
from the account where it is used. Limit access to this separate account to
520+
people authorized to change key policies.
521+
522+
I am not publishing my custom KMS encryption key policies or AWS&nbsp;Backup
523+
backup and copy role policies. If you need help with least-privilege,
524+
cross-account, multi-region KMS key policies, or with least-privilege IAM
525+
policies for AWS&nbsp;Backup roles, please contact me. This is part of what I
526+
do for a living.
453527
454528
</details>
455529
@@ -467,7 +541,7 @@ software at your own risk. You are encouraged to evaluate the source code._
467541
from the second day, and so on, within reason) is a better investment of
468542
engineering effort. Consider
469543
[AWS&nbsp;Backup restore testing](https://docs.aws.amazon.com/aws-backup/latest/devguide/restore-testing.html)!
470-
- Set lifecycles when making on-demand backups, but **specify 7&nbsp;days
544+
- Set lifecycles when starting on-demand backups, but **specify 7&nbsp;days
471545
minimum before backups are transitioned to cold storage** / the "archive
472546
tier". Allow time for cross-account and cross-region copies to complete, and
473547
for original backups to be scheduled for deletion. If an original backup
@@ -531,10 +605,10 @@ So, I decided to write a new solution from scratch. The benefits?
531605
532606
- **Up-to-date:** AWS never returned to update the sample solution for
533607
multi-region encryption keys or direct cross-account Lambda function
534-
invocation. A multi-region key make it easy to move backups between regions.
535-
Direct cross-account invocation eliminates several components. (My
536-
February,&nbsp;2026 update has eliminated the need for a cross-account
537-
invocation mechanism.)
608+
invocation. A multi-region key makes it easy to move backups between regions.
609+
Direct cross-account Lambda function invocation eliminates several
610+
infrastructure components. (My February,&nbsp;2026 update has eliminated the
611+
need for a cross-account invocation mechanism.)
538612
539613
- **Centrally deployable:** 1&nbsp;CloudFormation template replaces AWS's
540614
3&nbsp;separate templates. Advanced users can use the template to create a
@@ -546,28 +620,21 @@ So, I decided to write a new solution from scratch. The benefits?
546620
547621
- **Supports on-demand backups:** AWS's solution depends on the copy step
548622
available in backup plans but not in on-demand backup requests. Waiting for
549-
an on-demand backup job to complete before you manually start copy jobs is
623+
an on-demand backup job to complete before you start on-demand copy jobs is
550624
tedious and prone to error. Also, you might forget to check for copy
551625
completion. (As of my February,&nbsp;2026, update, only on-demand backups are
552626
supported.)
553627
554-
- **Supports a multi-region, cross-account encryption key:** A multi-region KMS
555-
key makes moving encrypted backups from one region to another easier. Housing
556-
the key in a central, limited-access account increases control. (I am not
557-
publishing my custom key policy for AWS&nbsp;Backup. If you need multi-region
558-
KMS encryption keys and least privilege key policies, contact me! It's the
559-
kind of work I do for a living.)
560-
561-
- **Streamlined:** Object-oriented Python code interprets backup job events
562-
and copy job events. An abstract base class covers the many similarities and
628+
- **Streamlined:** Object-oriented Python code interprets backup events and
629+
copy events. An abstract base class covers the many similarities and
563630
subclasses, the few differences. This way, the same primitives serve for
564-
copying a backup and for reducing the backup's retention period. If AWS had
565-
chosen more consistent key names, subclasses would not be necessary.
631+
copying a backup and for reducing the backup's retention period. (If AWS had
632+
chosen more consistent key names, subclasses would not be necessary.)
566633
567634
- **Simple:** The function to reduce the retention period after copying is easy
568635
to understand. Minimum retention periods under various rules are added to a
569-
list. At the end, the highest minimum is applied. The original backup can be
570-
deleted as soon as AWS allows, but no sooner!
636+
list. The longest one stands. The original backup can be deleted as soon as
637+
AWS allows, but no sooner!
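
A minimal sketch of the "list of minimums" idea, under stated assumptions. The function and argument names are hypothetical, and the specific rules shown (the `NewDeleteAfterDays` value, the backup's age plus a one-day margin, and any further minimums supplied by the caller) are assumptions for illustration rather than the rules the actual function applies.

```python
def reduced_retention_days(new_delete_after_days, days_since_creation,
                           other_minimums=()):
    """Pick the shortest retention period that satisfies every rule.

    Each rule contributes a minimum number of days; the longest minimum
    is the only value that violates none of them.
    """
    minimums = [new_delete_after_days]         # assumed: NewDeleteAfterDays value
    minimums.append(days_since_creation + 1)   # assumed: full-day safety margin
    minimums.extend(other_minimums)            # assumed: any service-imposed minimums
    return max(minimums)
```

Collecting the minimums in a list keeps each rule independent: adding a rule means appending one value, and the final `max()` never needs to change.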
571638
572639
</details>
573640
