feat(Datastream): Add SQL Server (MSSQL) source support #3396

Open
pabloqc wants to merge 2 commits into GoogleCloudPlatform:main from pabloqc:feat/add-sqlserver-support

pabloqc commented Feb 22, 2026

Summary

  • Add full SQL Server CDC support to the Datastream-to-BigQuery pipeline, resolving the Source Connection Profile Type Not Supported error when using SQL Server as a Datastream source
  • Add SQL Server branch in the Avro format processing path (FormatDatastreamRecordToJson) for correct metadata extraction (_metadata_schema, _metadata_lsn, _metadata_tx_id)
  • Add SQL Server sort key definitions (_metadata_timestamp, _metadata_lsn) used by BigQuery MERGE operations
  • Add SQL Server schema discovery via the Datastream API (DataStreamClient), including table/column discovery, primary key extraction, and SQL Server-to-BigQuery type conversion
  • Add _metadata_lsn to BigQuery default staging table schema
  • Update template documentation to list SQL Server as a supported source
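
To illustrate why the two sort keys matter for MERGE, here is a minimal sketch of picking the latest change for a row by ordering on `_metadata_timestamp` and then `_metadata_lsn`. The `Change` class, the `latest` helper, and the sample LSN strings are hypothetical and are not taken from this PR's code; the sketch also assumes LSN values are fixed-width strings, so that string order matches log order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SqlServerSortKeySketch {

  /** Sort keys for SQL Server, as listed in this PR's summary. */
  public static final List<String> SQLSERVER_SORT_FIELDS =
      Arrays.asList("_metadata_timestamp", "_metadata_lsn");

  /** Hypothetical stand-in for a staged change record; not a class from the repository. */
  public static final class Change {
    public final long timestamp; // value of _metadata_timestamp
    public final String lsn;     // value of _metadata_lsn (assumed fixed-width)
    public final String payload; // illustrative row content

    public Change(long timestamp, String lsn, String payload) {
      this.timestamp = timestamp;
      this.lsn = lsn;
      this.payload = payload;
    }
  }

  /** Orders by timestamp first, then uses the LSN to break ties. */
  public static final Comparator<Change> BY_SORT_KEYS =
      Comparator.comparingLong((Change c) -> c.timestamp)
          .thenComparing((Change c) -> c.lsn);

  /** Returns the change that sorts last under the keys above. */
  public static Change latest(List<Change> changes) {
    return Collections.max(changes, BY_SORT_KEYS);
  }
}
```

Two changes committed within the same timestamp resolution would otherwise be ambiguous; the LSN tie-break is what lets a merge step choose one of them deterministically.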

Test plan

  • Added testSqlServerSortFields in DatastreamRowTest — verifies sort keys are ["_metadata_timestamp", "_metadata_lsn"]
  • Added testProcessElement_sqlServer in FormatDatastreamJsonToJsonTest — verifies JSON format processing produces correct SQL Server metadata fields
  • All 7 tests pass (mvn test -pl v2/datastream-common)
  • Compilation succeeds across all affected modules (mvn compile -pl v2/datastream-common,v2/common,v2/datastream-to-bigquery -am)

pabloqc requested review from a team as code owners on February 22, 2026 15:20

google-cla bot commented Feb 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Summary of Changes

Hello @pabloqc, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds SQL Server as a supported source database for the Datastream-to-BigQuery pipeline. It enables change data capture (CDC) replication from SQL Server instances to BigQuery and covers schema discovery, metadata handling, and data type conversion.

Highlights

  • SQL Server CDC Support: Full SQL Server Change Data Capture (CDC) support has been added to the Datastream-to-BigQuery pipeline, resolving the 'Source Connection Profile Type Not Supported' error.
  • Metadata Extraction: SQL Server specific metadata fields such as _metadata_schema, _metadata_lsn, and _metadata_tx_id are now correctly extracted during Avro format processing.
  • Sort Key Definitions: Sort key definitions (_metadata_timestamp, _metadata_lsn) for SQL Server have been added to facilitate BigQuery MERGE operations.
  • Schema Discovery: The Datastream API (DataStreamClient) now supports SQL Server schema discovery, including table/column identification, primary key extraction, and SQL Server-to-BigQuery type conversion.
  • BigQuery Schema Update: The _metadata_lsn field has been added to the default BigQuery staging table schema.
  • Documentation Update: Template documentation has been updated to list SQL Server as a newly supported source.
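
The metadata extraction described above can be pictured with a minimal sketch. It assumes the SQL Server source metadata arrives as a map with `schema`, `lsn`, and `tx_id` keys; those input key names, the `extractMetadata` helper, and the class itself are illustrative assumptions, not the code in FormatDatastreamRecordToJson.

```java
import java.util.HashMap;
import java.util.Map;

public class SqlServerMetadataSketch {

  /**
   * Copies SQL Server specific source metadata into the _metadata_* output
   * fields named in this PR. The input key names are assumptions.
   */
  public static Map<String, String> extractMetadata(Map<String, Object> sourceMetadata) {
    Map<String, String> out = new HashMap<>();
    copyIfPresent(sourceMetadata, "schema", out, "_metadata_schema");
    copyIfPresent(sourceMetadata, "lsn", out, "_metadata_lsn");
    copyIfPresent(sourceMetadata, "tx_id", out, "_metadata_tx_id");
    return out;
  }

  /** Skips absent or null values so the output never holds the string "null". */
  private static void copyIfPresent(
      Map<String, Object> from, String key, Map<String, String> to, String outKey) {
    Object value = from.get(key);
    if (value != null) {
      to.put(outKey, String.valueOf(value));
    }
  }
}
```

Skipping missing keys rather than writing empty strings keeps the output distinguishable from a record whose metadata value is genuinely empty.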

Changelog

  • v2/common/src/main/java/com/google/cloud/teleport/v2/cdc/mappers/BigQueryDefaultSchemas.java
    • Added _metadata_lsn field to the default BigQuery schema for PostgreSQL and SQL Server.
  • v2/datastream-common/src/main/java/com/google/cloud/teleport/v2/datastream/transforms/FormatDatastreamRecordToJson.java
    • Implemented SQL Server specific metadata extraction for _metadata_schema, _metadata_lsn, and _metadata_tx_id.
  • v2/datastream-common/src/main/java/com/google/cloud/teleport/v2/datastream/utils/DataStreamClient.java
    • Imported SQL Server model classes.
    • Extended schema discovery logic to include SQL Server.
    • Added methods for retrieving SQL Server primary keys.
    • Implemented SQL Server to BigQuery data type conversion.
  • v2/datastream-common/src/main/java/com/google/cloud/teleport/v2/datastream/values/DatastreamRow.java
    • Configured _metadata_timestamp and _metadata_lsn as sort fields for SQL Server source types.
  • v2/datastream-common/src/test/java/com/google/cloud/teleport/v2/datastream/transforms/FormatDatastreamJsonToJsonTest.java
    • Added example SQL Server JSON data.
    • Introduced a new test method testProcessElement_sqlServer to verify SQL Server JSON record processing.
  • v2/datastream-common/src/test/java/com/google/cloud/teleport/v2/datastream/values/DatastreamRowTest.java
    • Added testSqlServerSortFields to verify the correct sort fields are returned for SQL Server.
  • v2/datastream-to-bigquery/src/main/java/com/google/cloud/teleport/v2/templates/DataStreamToBigQuery.java
    • Updated template documentation to include SQL Server as a supported source database.
    • Modified help text for datastreamSourceType to include 'sqlserver' as a valid override value.
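
As a rough picture of what a SQL Server-to-BigQuery type conversion involves, the sketch below maps a few common SQL Server type names to BigQuery types. The specific pairs and the STRING fallback are assumptions chosen for illustration; the mapping actually implemented in DataStreamClient may differ, and `toBigQueryType` is a hypothetical name.

```java
import java.util.Locale;

public class SqlServerTypeSketch {

  /**
   * Maps a SQL Server type name to a BigQuery type name.
   * Illustrative mapping only; not the table used by this PR.
   */
  public static String toBigQueryType(String sqlServerType) {
    switch (sqlServerType.toLowerCase(Locale.ROOT)) {
      case "tinyint":
      case "smallint":
      case "int":
      case "bigint":
        return "INT64";
      case "bit":
        return "BOOL";
      case "real":
      case "float":
        return "FLOAT64";
      case "decimal":
      case "numeric":
        return "NUMERIC";
      case "date":
        return "DATE";
      case "time":
        return "TIME";
      case "datetime":
      case "datetime2":
      case "smalldatetime":
        return "DATETIME";
      case "datetimeoffset":
        return "TIMESTAMP";
      case "binary":
      case "varbinary":
        return "BYTES";
      default:
        // Assumed fallback: unknown or textual types are kept as strings.
        return "STRING";
    }
  }
}
```

Falling back to STRING is a conservative choice in this sketch: an unrecognized type still lands in the staging table instead of failing schema creation.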

Activity

  • The author added testSqlServerSortFields in DatastreamRowTest to verify SQL Server sort keys.
  • The author added testProcessElement_sqlServer in FormatDatastreamJsonToJsonTest to verify SQL Server JSON format processing.
  • All 7 tests passed (mvn test -pl v2/datastream-common).
  • Compilation succeeded across all affected modules (v2/datastream-common, v2/common, v2/datastream-to-bigquery).

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

…-to-BigQuery pipeline

Add full SQL Server CDC support including Avro format processing,
sort key definitions, schema discovery via Datastream API, BigQuery
metadata schema, and type conversion mappings.
pabloqc force-pushed the feat/add-sqlserver-support branch from 01e4e94 to 8155b42 on February 22, 2026 15:28
pabloqc force-pushed the feat/add-sqlserver-support branch from 0b624da to 41b1627 on March 9, 2026 14:29