
Commit 30d3546

feat: Support schemas in the migration tool (#4236)
- Add schema support to the migration tool. TODO: add an acceptance test.
- Rename the `handleIf` func.
1 parent a8b6122 commit 30d3546

File tree: 9 files changed (+560, -11 lines)


pkg/scripts/migration_script/CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ We would appreciate that any planned changes are discussed first (e.g., in an is
 
 If you want to add a new object type, you need to define it in the migration script.
 The [`program.go`](./program.go) file contains the main logic for the user interactions with the migration script through the terminal.
-At the top of the file, you need to add a new constant for the object type. Remember to reflect this change
+At the top of the file, you need to add a new constant for the object type. Remember to reflect this change
 in the help text (in the `parseInputArguments` method of the Program struct) and the readme file ([syntax section](./README.md#syntax)).
 
 Next, you can use the newly defined object type in the `generateOutput` Program method.
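For illustration, a minimal, self-contained sketch of that dispatch pattern is shown below. The constant names, the free-standing `generateOutput` signature, and the stub outputs are assumptions made for this example; they are not the actual code in `program.go`.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Illustrative object-type constants; the real ones live at the top of program.go.
const (
	objectTypeGrants  = "grants"
	objectTypeSchemas = "schemas"
)

// generateOutput dispatches on the requested object type; the real Program
// method delegates to handlers such as HandleGrants.
func generateOutput(objectType, csvInput string) (string, error) {
	switch objectType {
	case objectTypeGrants:
		return "# grants resources would be rendered here\n", nil
	case objectTypeSchemas:
		return "# schemas resources would be rendered here\n", nil
	default:
		return "", fmt.Errorf("unsupported object type: %q", objectType)
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: migration_sketch <object_type> < input.csv")
		os.Exit(1)
	}
	csvInput, err := io.ReadAll(os.Stdin) // the script reads its CSV from STDIN
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	output, err := generateOutput(os.Args[1], string(csvInput))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(output)
}
```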
@@ -65,13 +65,13 @@ type databaseRow struct {
 Under the schema definition, you need to provide a conversion function that takes the CSV schema struct as input and returns the corresponding SDK object.
 In our case, it would be `func (row databaseRow) convert() (*sdk.Database, error)`.
 
-> The SDK's databaseRow has the same convert method.
+> The SDK's databaseRow has the same convert method.
 > It's worth checking as it may contain the necessary parts for converting Snowflake output for certain cases.
 
 ## 3. Providing an object migration function
 
 Now, you need to provide a function that takes the CSV input and returns the generated resources and imports in the form of a string (see the `HandleGrants` function).
-The file with mapping function should be placed in the file named `<object_type>_migration.go`, where `<object_type>` is the name of the object type you are working with.
+The file with the mapping function should be placed in a file named `mappings_<object_type>.go`, where `<object_type>` is the name of the object type you are working with.
 
 At the top of the function, you need to parse the CSV input into the CSV schema struct you have defined in the previous step.
 This is done by the predefined `ConvertCsvInput` function (see `HandleGrants` for example usage).
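For illustration, a self-contained sketch of that pattern follows. The `Database` struct is a trimmed-down stand-in for `sdk.Database` and its fields are invented for the example; only the shape (a csv-tagged row struct whose `convert` method returns the SDK object or an error) mirrors the real code.

```go
package main

import (
	"fmt"
	"strconv"
)

// Database is a stand-in for sdk.Database, reduced to a few example fields.
type Database struct {
	Name          string
	Comment       string
	RetentionTime int
}

// databaseRow mirrors the CSV schema struct pattern: one string field per CSV
// column, annotated with the column name.
type databaseRow struct {
	Name          string `csv:"name"`
	Comment       string `csv:"comment"`
	RetentionTime string `csv:"retention_time"`
}

// convert turns a parsed CSV row into the SDK object, validating values on the way.
func (row databaseRow) convert() (*Database, error) {
	retention, err := strconv.Atoi(row.RetentionTime)
	if err != nil {
		return nil, fmt.Errorf("invalid retention_time %q: %w", row.RetentionTime, err)
	}
	return &Database{
		Name:          row.Name,
		Comment:       row.Comment,
		RetentionTime: retention,
	}, nil
}

func main() {
	db, err := databaseRow{Name: "TEST_DB", Comment: "example", RetentionTime: "1"}.convert()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *db)
}
```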
@@ -81,7 +81,7 @@ To do this, you should use the model package in our project that contains the lo
 It should contain the resource model struct and functions for transforming the model (if not, you should add them; see the [generators documentation](https://github.com/snowflakedb/terraform-provider-snowflake/blob/main/pkg/acceptance/bettertestspoc/README.md)).
 In combination with the `TransformResourceModel` function, they produce the final resource definitions output.
 
-To generate the import statements or blocks, you should use the `TransformImportModel` function that expects you
+To generate the import statements or blocks, you should use the `TransformImportModel` function that expects you
 to provide the import model with the resource address and the identifier used for importing a given object.
 You should look at the given resource's documentation to understand how to construct the resource import (e.g., https://registry.terraform.io/providers/snowflakedb/snowflake/latest/docs/resources/database#import).
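To make the end-to-end shape of such a mapping function concrete, here is a rough, self-contained sketch. It parses CSV with the standard library and renders HCL with plain string formatting; the real script goes through `ConvertCsvInput`, the model package, `TransformResourceModel`, and `TransformImportModel`, so the helpers, the column order, and the rendered output below are illustrative assumptions only.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// exampleRow stands in for a CSV schema struct; the real code fills such a
// struct via ConvertCsvInput rather than encoding/csv.
type exampleRow struct {
	Database string
	Name     string
}

// handleExampleObjects sketches the shape of a Handle<ObjectType> function:
// parse the CSV input, then emit a resource definition and an import block per row.
func handleExampleObjects(csvInput string) (string, error) {
	records, err := csv.NewReader(strings.NewReader(csvInput)).ReadAll()
	if err != nil {
		return "", fmt.Errorf("parsing csv input: %w", err)
	}
	if len(records) == 0 {
		return "", nil
	}
	var sb strings.Builder
	for i, rec := range records[1:] { // records[0] is the header
		row := exampleRow{Database: rec[0], Name: rec[1]}
		fmt.Fprintf(&sb, "resource \"snowflake_schema\" \"schema_%d\" {\n  name     = %q\n  database = %q\n}\n", i, row.Name, row.Database)
		// The import id format follows the schema resource's import documentation.
		fmt.Fprintf(&sb, "import {\n  to = snowflake_schema.schema_%d\n  id = %q\n}\n\n", i, fmt.Sprintf("%q.%q", row.Database, row.Name))
	}
	return sb.String(), nil
}

func main() {
	out, err := handleExampleObjects("\"database_name\",\"name\"\n\"MY_DB\",\"MY_SCHEMA\"\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```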

pkg/scripts/migration_script/README.md

Lines changed: 86 additions & 2 deletions
@@ -69,6 +69,12 @@ where script options are:
 Limitations:
 - grants on 'future' or on 'all' objects are not supported
 - all_privileges and always_apply fields are not supported
+- `schemas`, which expects a converted CSV output from the `snowflake_schemas` data source.
+To support object parameters, one should use the SHOW PARAMETERS output and combine it with the SHOW SCHEMAS output, so the CSV header looks like `"comment","created_on",...,"catalog_value","catalog_level","data_retention_time_in_days_value","data_retention_time_in_days_level",...`
+When the additional columns are present, the resulting resource will have the parameter values, if the parameter level is set to "SCHEMA".
+For more details about using multiple sources, visit the [Multiple sources section](#multiple-sources).
+Supported resources:
+- snowflake_schema
 - **INPUT**:
 - Migration script operates on STDIN input in CSV format. You can redirect the input from a file or pipe it from another command.
 - **OUTPUT**:
@@ -288,7 +294,7 @@ No changes. Your infrastructure matches the configuration.
 #### 5. Update generated resources
 
 The last step is optional, but highly recommended. The generated resources have generic names, which are not very user-friendly,
-and they do not depend on the existing role resources which they refer to in their configuration.
+and they do not depend on the existing role resources which they refer to in their configuration.
 
 To rename the resources, you can use the [terraform state mv](https://developer.hashicorp.com/terraform/cli/commands/state/mv) command or [moved block](https://developer.hashicorp.com/terraform/language/moved).
@@ -377,7 +383,7 @@ SHOW GRANTS ON ACCOUNT;
 and filter the output to only include the grants to the roles we are interested in.
 
 > If you use SnowSight, you can click on the "Download" button and select "CSV" format.
->
+>
 > ![Download CSV button in SnowSight](./images/csv_output_download.png)
 
 Whatever way you choose, save the output to a CSV file as `example.csv`.
@@ -526,6 +532,84 @@ Remember that, if you chose to use the import block approach, [after importing y

By following the above steps, you can migrate other existing Snowflake objects into Terraform and start managing them!

### Multiple sources

Some Snowflake objects (like schemas) have fields returned by more than one SQL command, so simply using one `SHOW ...` output will not work; fields from `DESCRIBE` or `SHOW PARAMETERS` must also be processed.
The outputs of all of these commands then have to be mapped to the CSV input of the migration script. To make this easy, we can use the data source output for a given object, which already has the logic for combining multiple
SQL queries and returning all the necessary information.

In general, what we need to do is:
1. Define a data source for the objects you want to import.
1. Use HCL (Terraform's configuration language) to transform the data: merge `show_output` with flattened `parameters` for each object.
1. Write the transformed data to a CSV file using the `local_file` resource.
1. Run the migration script with the generated CSV file.

> **Note:** It's recommended to create a fresh Terraform environment (e.g., a new local directory with a clean state) for this data extraction process to avoid collisions with any existing data sources or state in your main Terraform workspace.

Now, let's look into the details.

As an example, let's import all schemas in a given database. First, we need to define a data source for schemas and use Terraform's HCL to transform the data into CSV format:

```terraform
terraform {
  required_providers {
    snowflake = {
      source = "Snowflake-Labs/snowflake"
    }
    local = {
      source = "hashicorp/local"
    }
  }
}

data "snowflake_schemas" "test" {
  in {
    database = "DATABASE"
  }
}

locals {
  # Transform each schema by merging show_output and flattened parameters
  schemas_flattened = [
    for schema in data.snowflake_schemas.test.schemas : merge(
      schema.show_output[0],
      # Flatten parameters: convert each parameter to {param_name}_value and {param_name}_level
      {
        for param_key, param_values in schema.parameters[0] :
        "${param_key}_value" => param_values[0].value
      },
      {
        for param_key, param_values in schema.parameters[0] :
        "${param_key}_level" => param_values[0].level
      }
    )
  ]

  # Get all unique keys from the first schema to create the CSV header
  csv_header = join(",", [for key in keys(local.schemas_flattened[0]) : "\"${key}\""])

  # Convert each schema object to a CSV row
  csv_rows = [
    for schema in local.schemas_flattened :
    join(",", [for key in keys(local.schemas_flattened[0]) : "\"${lookup(schema, key, "")}\""])
  ]

  # Combine header and rows
  csv_content = join("\n", concat([local.csv_header], local.csv_rows))
}

resource "local_file" "schemas_csv" {
  content  = local.csv_content
  filename = "${path.module}/schemas.csv"
}
```

After running `terraform apply`, the CSV file will be generated at `schemas.csv`. Now, we can run the migration script like this:
```shell
go run github.com/Snowflake-Labs/terraform-provider-snowflake/pkg/scripts/migration_script@main -import=block schemas < ./schemas.csv
```

This will output the generated configuration and import blocks for the specified schemas.

## Limitations

### Generated resource names
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
```go
package main

import (
	"strconv"

	"github.com/Snowflake-Labs/terraform-provider-snowflake/pkg/acceptance/bettertestspoc/config/model"
	"github.com/Snowflake-Labs/terraform-provider-snowflake/pkg/sdk"
)

func handleOptionalFieldWithBuilder[T any](parameter *T, builder func(T) *model.SchemaModel) {
	if parameter != nil {
		builder(*parameter)
	}
}

func handleIfNotEmpty(value string, builder func(string) *model.SchemaModel) {
	if value != "" {
		builder(value)
	}
}

func handleIf(condition bool, builder func(string) *model.SchemaModel) {
	if condition {
		builder("true")
	}
}

type parameterHandler struct {
	level sdk.ParameterType
}

func newParameterHandler(level sdk.ParameterType) parameterHandler {
	return parameterHandler{
		level: level,
	}
}

func (h *parameterHandler) handleIntegerParameter(level sdk.ParameterType, value string, setField **int) error {
	if h.level != level {
		return nil
	}
	v, err := strconv.Atoi(value)
	if err != nil {
		return err
	}
	*setField = &v
	return nil
}

func (h *parameterHandler) handleBooleanParameter(level sdk.ParameterType, value string, setField **bool) error {
	if h.level != level {
		return nil
	}
	b, err := strconv.ParseBool(value)
	if err != nil {
		return err
	}
	*setField = &b
	return nil
}

func (h *parameterHandler) handleStringParameter(level sdk.ParameterType, value string, setField **string) error {
	if h.level != level {
		return nil
	}
	*setField = &value
	return nil
}
```
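A short usage sketch for the handlers above (it assumes the same package and the `sdk` import shown in the file; the function and the values are illustrative): a parameter is copied only when the CSV reports it at the level the handler was created for, so values merely inherited from a higher level stay `nil`.

```go
// exampleParameterHandling is illustrative; it assumes the same package as the
// handlers above and the sdk import shown there.
func exampleParameterHandling() (*int, *int, error) {
	handler := newParameterHandler(sdk.ParameterTypeSchema)

	// Reported at SCHEMA level: the value is parsed and set.
	var dataRetention *int
	if err := handler.handleIntegerParameter(sdk.ParameterType("SCHEMA"), "7", &dataRetention); err != nil {
		return nil, nil, err
	}

	// Reported at ACCOUNT level: skipped, the field stays nil, so the generated
	// resource does not pin a value the schema merely inherits.
	var maxExtension *int
	if err := handler.handleIntegerParameter(sdk.ParameterType("ACCOUNT"), "14", &maxExtension); err != nil {
		return nil, nil, err
	}

	return dataRetention, maxExtension, nil // dataRetention points at 7, maxExtension is nil
}
```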
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
```go
package main

import (
	"errors"

	"github.com/Snowflake-Labs/terraform-provider-snowflake/pkg/sdk"
)

var _ ConvertibleCsvRow[SchemaRepresentation] = new(SchemaCsvRow)

type SchemaCsvRow struct {
	Comment                                      string `csv:"comment"`
	CreatedOn                                    string `csv:"created_on"`
	DatabaseName                                 string `csv:"database_name"`
	DroppedOn                                    string `csv:"dropped_on"`
	IsCurrent                                    string `csv:"is_current"`
	IsDefault                                    string `csv:"is_default"`
	Name                                         string `csv:"name"`
	Options                                      string `csv:"options"`
	Owner                                        string `csv:"owner"`
	OwnerRoleType                                string `csv:"owner_role_type"`
	RetentionTime                                string `csv:"retention_time"`
	CatalogValue                                 string `csv:"catalog_value"`
	CatalogLevel                                 string `csv:"catalog_level"`
	DataRetentionTimeInDaysValue                 string `csv:"data_retention_time_in_days_value"`
	DataRetentionTimeInDaysLevel                 string `csv:"data_retention_time_in_days_level"`
	DefaultDdlCollationValue                     string `csv:"default_ddl_collation_value"`
	DefaultDdlCollationLevel                     string `csv:"default_ddl_collation_level"`
	EnableConsoleOutputValue                     string `csv:"enable_console_output_value"`
	EnableConsoleOutputLevel                     string `csv:"enable_console_output_level"`
	ExternalVolumeValue                          string `csv:"external_volume_value"`
	ExternalVolumeLevel                          string `csv:"external_volume_level"`
	LogLevelValue                                string `csv:"log_level_value"`
	LogLevelLevel                                string `csv:"log_level_level"`
	MaxDataExtensionTimeInDaysValue              string `csv:"max_data_extension_time_in_days_value"`
	MaxDataExtensionTimeInDaysLevel              string `csv:"max_data_extension_time_in_days_level"`
	PipeExecutionPausedValue                     string `csv:"pipe_execution_paused_value"`
	PipeExecutionPausedLevel                     string `csv:"pipe_execution_paused_level"`
	QuotedIdentifiersIgnoreCaseValue             string `csv:"quoted_identifiers_ignore_case_value"`
	QuotedIdentifiersIgnoreCaseLevel             string `csv:"quoted_identifiers_ignore_case_level"`
	ReplaceInvalidCharactersValue                string `csv:"replace_invalid_characters_value"`
	ReplaceInvalidCharactersLevel                string `csv:"replace_invalid_characters_level"`
	StorageSerializationPolicyValue              string `csv:"storage_serialization_policy_value"`
	StorageSerializationPolicyLevel              string `csv:"storage_serialization_policy_level"`
	SuspendTaskAfterNumFailuresValue             string `csv:"suspend_task_after_num_failures_value"`
	SuspendTaskAfterNumFailuresLevel             string `csv:"suspend_task_after_num_failures_level"`
	TaskAutoRetryAttemptsValue                   string `csv:"task_auto_retry_attempts_value"`
	TaskAutoRetryAttemptsLevel                   string `csv:"task_auto_retry_attempts_level"`
	TraceLevelValue                              string `csv:"trace_level_value"`
	TraceLevelLevel                              string `csv:"trace_level_level"`
	UserTaskManagedInitialWarehouseSizeValue     string `csv:"user_task_managed_initial_warehouse_size_value"`
	UserTaskManagedInitialWarehouseSizeLevel     string `csv:"user_task_managed_initial_warehouse_size_level"`
	UserTaskMinimumTriggerIntervalInSecondsValue string `csv:"user_task_minimum_trigger_interval_in_seconds_value"`
	UserTaskMinimumTriggerIntervalInSecondsLevel string `csv:"user_task_minimum_trigger_interval_in_seconds_level"`
	UserTaskTimeoutMsValue                       string `csv:"user_task_timeout_ms_value"`
	UserTaskTimeoutMsLevel                       string `csv:"user_task_timeout_ms_level"`
}

type SchemaRepresentation struct {
	sdk.Schema

	// parameters
	Catalog                                 *string
	DataRetentionTimeInDays                 *int
	DefaultDdlCollation                     *string
	EnableConsoleOutput                     *bool
	ExternalVolume                          *string
	LogLevel                                *string
	MaxDataExtensionTimeInDays              *int
	PipeExecutionPaused                     *bool
	QuotedIdentifiersIgnoreCase             *bool
	ReplaceInvalidCharacters                *bool
	StorageSerializationPolicy              *string
	SuspendTaskAfterNumFailures             *int
	TaskAutoRetryAttempts                   *int
	TraceLevel                              *string
	UserTaskManagedInitialWarehouseSize     *string
	UserTaskMinimumTriggerIntervalInSeconds *int
	UserTaskTimeoutMs                       *int
}

func (row SchemaCsvRow) convert() (*SchemaRepresentation, error) {
	schemaRepresentation := &SchemaRepresentation{
		Schema: sdk.Schema{
			Name:          row.Name,
			IsDefault:     row.IsDefault == "Y",
			IsCurrent:     row.IsCurrent == "Y",
			DatabaseName:  row.DatabaseName,
			Owner:         row.Owner,
			Comment:       row.Comment,
			RetentionTime: row.RetentionTime,
			OwnerRoleType: row.OwnerRoleType,
		},
	}
	if row.Options != "" {
		schemaRepresentation.Options = &row.Options
	}

	handler := newParameterHandler(sdk.ParameterTypeSchema)
	errs := errors.Join(
		handler.handleIntegerParameter(sdk.ParameterType(row.DataRetentionTimeInDaysLevel), row.DataRetentionTimeInDaysValue, &schemaRepresentation.DataRetentionTimeInDays),
		handler.handleIntegerParameter(sdk.ParameterType(row.MaxDataExtensionTimeInDaysLevel), row.MaxDataExtensionTimeInDaysValue, &schemaRepresentation.MaxDataExtensionTimeInDays),
		handler.handleStringParameter(sdk.ParameterType(row.ExternalVolumeLevel), row.ExternalVolumeValue, &schemaRepresentation.ExternalVolume),
		handler.handleStringParameter(sdk.ParameterType(row.CatalogLevel), row.CatalogValue, &schemaRepresentation.Catalog),
		handler.handleBooleanParameter(sdk.ParameterType(row.PipeExecutionPausedLevel), row.PipeExecutionPausedValue, &schemaRepresentation.PipeExecutionPaused),
		handler.handleBooleanParameter(sdk.ParameterType(row.ReplaceInvalidCharactersLevel), row.ReplaceInvalidCharactersValue, &schemaRepresentation.ReplaceInvalidCharacters),
		handler.handleStringParameter(sdk.ParameterType(row.DefaultDdlCollationLevel), row.DefaultDdlCollationValue, &schemaRepresentation.DefaultDdlCollation),
		handler.handleStringParameter(sdk.ParameterType(row.StorageSerializationPolicyLevel), row.StorageSerializationPolicyValue, &schemaRepresentation.StorageSerializationPolicy),
		handler.handleStringParameter(sdk.ParameterType(row.LogLevelLevel), row.LogLevelValue, &schemaRepresentation.LogLevel),
		handler.handleStringParameter(sdk.ParameterType(row.TraceLevelLevel), row.TraceLevelValue, &schemaRepresentation.TraceLevel),
		handler.handleIntegerParameter(sdk.ParameterType(row.SuspendTaskAfterNumFailuresLevel), row.SuspendTaskAfterNumFailuresValue, &schemaRepresentation.SuspendTaskAfterNumFailures),
		handler.handleIntegerParameter(sdk.ParameterType(row.TaskAutoRetryAttemptsLevel), row.TaskAutoRetryAttemptsValue, &schemaRepresentation.TaskAutoRetryAttempts),
		handler.handleStringParameter(sdk.ParameterType(row.UserTaskManagedInitialWarehouseSizeLevel), row.UserTaskManagedInitialWarehouseSizeValue, &schemaRepresentation.UserTaskManagedInitialWarehouseSize),
		handler.handleIntegerParameter(sdk.ParameterType(row.UserTaskTimeoutMsLevel), row.UserTaskTimeoutMsValue, &schemaRepresentation.UserTaskTimeoutMs),
		handler.handleIntegerParameter(sdk.ParameterType(row.UserTaskMinimumTriggerIntervalInSecondsLevel), row.UserTaskMinimumTriggerIntervalInSecondsValue, &schemaRepresentation.UserTaskMinimumTriggerIntervalInSeconds),
		handler.handleBooleanParameter(sdk.ParameterType(row.QuotedIdentifiersIgnoreCaseLevel), row.QuotedIdentifiersIgnoreCaseValue, &schemaRepresentation.QuotedIdentifiersIgnoreCase),
		handler.handleBooleanParameter(sdk.ParameterType(row.EnableConsoleOutputLevel), row.EnableConsoleOutputValue, &schemaRepresentation.EnableConsoleOutput),
	)
	if errs != nil {
		return nil, errs
	}

	return schemaRepresentation, nil
}
```
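As an illustration of how `convert` treats the parameter columns (assuming the same package as the types above; the values are made up), only columns whose `_level` column equals `SCHEMA` end up set on the representation:

```go
// exampleSchemaRowConvert is illustrative only.
func exampleSchemaRowConvert() (*SchemaRepresentation, error) {
	row := SchemaCsvRow{
		Name:         "MY_SCHEMA",
		DatabaseName: "MY_DB",
		IsCurrent:    "Y",
		IsDefault:    "N",
		Options:      "MANAGED ACCESS",

		// Reported at SCHEMA level: copied into the representation.
		DataRetentionTimeInDaysValue: "7",
		DataRetentionTimeInDaysLevel: "SCHEMA",

		// Reported at ACCOUNT level: skipped, LogLevel stays nil.
		LogLevelValue: "INFO",
		LogLevelLevel: "ACCOUNT",
	}
	// DataRetentionTimeInDays -> pointer to 7, LogLevel -> nil,
	// Options -> pointer to "MANAGED ACCESS".
	return row.convert()
}
```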
