Support multiple S3 buckets (same path pattern) in a single raw table #84

@takahiro-oda-paypay

Feature request

Use case: Ingest the same log schema from multiple S3 buckets (e.g. different regions or accounts) into a single raw (and downstream bronze) table, using the same path pattern per bucket.

Current behavior: Each CloudFiles load action has one path (one bucket + prefix). To combine multiple buckets, we have to:

  • Define multiple load actions (one per bucket)
  • Add a custom transform that unions the streams
  • Use a custom template or inline actions
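For reference, the manual union in the workaround above looks roughly like the sketch below. This is an illustration, not our actual template: `read_bucket` and `union_buckets` are hypothetical helper names, the `json` format is an assumed default, and schema and checkpoint options (for example `cloudFiles.schemaLocation`) are left out for brevity.

```python
from functools import reduce


def read_bucket(spark, path, file_format="json"):
    """Open one Auto Loader (cloudFiles) stream for a single bucket path."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        .load(path)
    )


def union_buckets(spark, paths, file_format="json"):
    """Union one stream per path into a single streaming DataFrame."""
    if not paths:
        raise ValueError("at least one path is required")
    # One stream per bucket, all read with the same format and options.
    streams = [read_bucket(spark, p, file_format) for p in paths]
    # unionByName matches columns by name, which suits "same schema" buckets.
    return reduce(lambda left, right: left.unionByName(right), streams)
```

Every new bucket means another load action plus a change to the union step, which is the duplication this request is trying to remove.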

Desired behavior: Either of:

  1. Multi-path load: Allow the path field (or a new paths field) to accept multiple S3 URIs, so that a single load action reads from several buckets (e.g. path: ["s3://bucket-a/prefix/*", "s3://bucket-b/prefix/*"]), with the same schema and options applied to all of them.

  2. Documented append-flow pattern: If multiple flowgroups writing to the same table (one create + N append flows) is already supported and stable, document it clearly so we can rely on “N flowgroups, same target table” instead of maintaining a custom multi-bucket template with manual union.
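To make option 1 concrete, here is a minimal sketch of how a loader could accept either a single string or a list. The `normalize_paths` helper and the dict-shaped `source` config are assumptions for illustration only, not the project's actual internals.

```python
def normalize_paths(source):
    """Return the load paths of a source config as a list of S3 URIs.

    Accepts the existing single-string 'path' field, a list under 'path',
    or a hypothetical new 'paths' field (which takes precedence if present).
    """
    raw = source.get("paths", source.get("path"))
    if raw is None:
        raise ValueError("source needs a 'path' or 'paths' field")
    # A bare string is treated as a one-element list, so existing configs keep working.
    paths = [raw] if isinstance(raw, str) else list(raw)
    if not paths:
        raise ValueError("at least one path is required")
    for p in paths:
        if not isinstance(p, str) or not p.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {p!r}")
    return paths
```

With something like this, a single-bucket config would behave as it does today, and a multi-bucket config could fan out into one stream per path with identical options.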

Impact: Would reduce custom templates and duplicated union logic for multi-bucket ingestion and make the “same schema, multiple buckets, one table” pattern a first-class option.

Thank you for considering this.

Labels: question, wontfix
