[SS-44] Add COPY FROM s3 Docs #35301

Open
patrickwwbutler wants to merge 3 commits into MaterializeInc:main from patrickwwbutler:patrick/copy-from-docs

@patrickwwbutler
Contributor

Updates the documentation for the COPY FROM mzsql command to include information and syntax for the new COPY FROM S3 feature.

Motivation

https://linear.app/materializeinc/issue/SS-44/write-user-facing-docs-for-copy-from-s3-statement-csv

Description

Adds a new syntax file for COPY FROM S3/URL, adds a tab to the SQL command reference page, and documents how to use the new syntax.

@patrickwwbutler patrickwwbutler requested a review from a team March 2, 2026 15:27
@patrickwwbutler patrickwwbutler requested a review from a team as a code owner March 2, 2026 15:27
github-actions bot commented Mar 2, 2026

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@patrickwwbutler patrickwwbutler requested a review from kay-kim March 2, 2026 15:27
@patrickwwbutler patrickwwbutler changed the title [SS-44] COPY FROM s3 Docs [SS-44] Add COPY FROM s3 Docs Mar 2, 2026

@martykulma martykulma left a comment

Nice! The doc should also enumerate the S3 bucket/object ACLs needed for MZ to read the data.
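
For reference, a read-only policy for this kind of access might look like the sketch below. This is an assumption, not something confirmed in this PR: it uses the common S3 read pattern (`s3:ListBucket` on the bucket, `s3:GetObject` on the objects), and the exact set Materialize needs should be verified before it goes into the docs. The bucket name, prefix, and the helper `read_only_s3_policy` are hypothetical.

```python
import json


def read_only_s3_policy(bucket, prefix=""):
    """Build a minimal read-only IAM policy document for one bucket.

    Assumption: listing the bucket and reading objects is sufficient.
    The permissions Materialize actually requires are not stated in this PR.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading is granted on the object ARNs under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = read_only_s3_policy("example-bucket", "exports/")
print(json.dumps(policy, indent=2))
```

Splitting the statement in two reflects that bucket-level and object-level actions apply to different resource ARNs.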

`bool` | `BOOLEAN` | | [`boolean`](/sql/types/boolean/)
`date32` | `INT32` | `DATE` | [`date`](/sql/types/date/)
`decimal128[38, 10 or max-scale]` | `FIXED_LEN_BYTE_ARRAY` | `DECIMAL` | [`numeric`](/sql/types/numeric/)
`fixed_size_binary(16)` | `FIXED_LEN_BYTE_ARRAY` | | [`bytea`](/sql/types/bytea/) <--- what about using the UUID logical type?

Extra comment in the Materialize type column

Comment on lines +25 to +29
`DELIMITER` | Single-quoted one-byte character | Format-dependent | Overrides the format's default column delimiter.
`NULL` | Single-quoted strings | Format-dependent | Specifies the string that represents a _NULL_ value.
`QUOTE` | Single-quoted one-byte character | `"` | Specifies the character to signal a quoted string, which may contain the `DELIMITER` value (without beginning new columns). To include the `QUOTE` character itself in column, wrap the column's value in the `QUOTE` character and prefix all instance of the value you want to literally interpret with the `ESCAPE` value. _`FORMAT CSV` only_
`ESCAPE` | Single-quoted strings | `QUOTE`'s value | Specifies the character to allow instances of the `QUOTE` character to be parsed literally as part of a column's value. _`FORMAT CSV` only_
`HEADER` | `boolean` | `false` | Specifies that the file contains a header line with the names of each column in the file. The first line is ignored on input. _`FORMAT CSV` only._

Do these apply only to CSV, or also to TEXT? Do we need to add something like _FORMAT X,Y only_ for each of these?
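
To make the `QUOTE`, `ESCAPE`, and `HEADER` semantics in the rows above concrete, here is a minimal sketch using Python's standard `csv` module. It is not Materialize's parser and says nothing about how `FORMAT TEXT` behaves; the sample data and the helper `parse_csv` are hypothetical and only mirror the option descriptions quoted in the diff.

```python
import csv
import io

# Hypothetical sample file contents; not taken from the PR.
raw = 'id,comment\n1,"hello, world"\n2,"she said ""hi"""\n'


def parse_csv(text, delimiter=",", quote='"', escape=None, header=False):
    """Parse CSV text, roughly mirroring DELIMITER, QUOTE, ESCAPE, and HEADER."""
    # When ESCAPE is unset or equal to QUOTE (the documented default), a
    # doubled quote character inside a quoted field is one literal quote.
    doubled = escape is None or escape == quote
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quote,
        doublequote=doubled,
        escapechar=None if doubled else escape,
    )
    rows = list(reader)
    # HEADER true: the first line holds column names and is skipped on input.
    return rows[1:] if header else rows


rows = parse_csv(raw, header=True)
print(rows)  # [['1', 'hello, world'], ['2', 'she said "hi"']]
```

The quoted field in the first data row keeps its embedded delimiter, and the doubled quote in the second row is read as a literal quote because `ESCAPE` defaults to the `QUOTE` value.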

`QUOTE` | Single-quoted one-byte character | `"` | Specifies the character to signal a quoted string, which may contain the `DELIMITER` value (without beginning new columns). To include the `QUOTE` character itself in column, wrap the column's value in the `QUOTE` character and prefix all instance of the value you want to literally interpret with the `ESCAPE` value. _`FORMAT CSV` only_
`ESCAPE` | Single-quoted strings | `QUOTE`'s value | Specifies the character to allow instances of the `QUOTE` character to be parsed literally as part of a column's value. _`FORMAT CSV` only_
`HEADER` | `boolean` | `false` | Specifies that the file contains a header line with the names of each column in the file. The first line is ignored on input. _`FORMAT CSV` only._
`AWS CONNECTION` | _connection_name_ | | The name of the AWS connection to use in the `COPY FROM` command. If using an s3 URI, must be specified. For details on creating connections, check the [`CREATE CONNECTION`](/sql/create-connection/#aws) documentation page. _Only valid with an S3._

"with an S3" should probably just be "with S3".

`large_binary` | `BYTE_ARRAY` | | [`bytea`](/sql/types/bytea/)
`large_utf8` | `BYTE_ARRAY` | | [`jsonb`](/sql/types/jsonb/)
`list` | Nested | | [`list`](/sql/types/list/)
CUSTOM TYPE `struct` | Nested | | [Arrays](/sql/types/array/) (`[]`)

Remove CUSTOM TYPE

`uint32` | `INT32` | `INT(32, false)` | [`uint4`](/sql/types/uint/#uint4-info)
`uint64` | `INT64` | `INT(64, false)` | [`uint8`](/sql/types/uint/#uint8-info)
`utf8` or `large_utf8` | `BYTE_ARRAY` | `STRING` | [`text`](/sql/types/text/)
(COMING SOON) `map` (`struct` with fields `keys` and `values`) | Nested | `MAP` | [`map`](/sql/types/map/)

Should this get marked as unsupported along with interval?
