Fix DLQWriteTransform: DLQ file overwrites by adding window token to shard template #3465
darshan-sj wants to merge 1 commit into GoogleCloudPlatform:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #3465 +/- ##
============================================
- Coverage 54.06% 51.80% -2.27%
- Complexity 1937 5893 +3956
============================================
Files 536 1029 +493
Lines 30502 62342 +31840
Branches 3269 6836 +3567
============================================
+ Hits 16492 32297 +15805
- Misses 13027 27816 +14789
- Partials 983 2229 +1246
Summary of Changes (Gemini Code Assist)
This pull request addresses a critical data integrity issue.
This PR fixes a bug in DLQWriteTransform that caused DLQ files written to GCS to be silently overwritten.
The Problem
DLQWriteTransform applies 1-minute FixedWindows before writing to GCS. However, the getShardTemplate() method previously returned a shard template (-SSSSS-of-NNNNN or -P-SSSSS-of-NNNNN) that lacked a window token.
Because the window token was missing, files generated in different 1-minute windows but assigned to the same shard number were given identical filenames. Newer files therefore silently overwrote older files in the GCS bucket, leading to a discrepancy between the expected and actual number of DLQ events.
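The collision described above can be illustrated with a small stand-alone sketch. It is not the template's or Beam's actual code: `shardName`, `writeSameShardPerWindow`, the `dlq/events` prefix, and the in-memory map standing in for a GCS bucket are all hypothetical names chosen for illustration. The only point it makes is that a name built from shard number and shard count alone is the same in every window.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ShardCollisionSketch {

  // Hypothetical helper: expands a "-SSSSS-of-NNNNN" style template,
  // which has no window component.
  static String shardName(String prefix, int shard, int numShards, String suffix) {
    return String.format("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix);
  }

  // Writes shard 5 of 20 once per window into a map that stands in for a GCS bucket
  // (object name -> contents). The prefix and shard numbers are illustrative.
  static Map<String, String> writeSameShardPerWindow(String[] windows) {
    Map<String, String> bucket = new LinkedHashMap<>();
    for (String window : windows) {
      // The window is not part of the name, so every iteration produces the same key.
      String objectName = shardName("dlq/events", 5, 20, ".json");
      bucket.put(objectName, "events from window " + window);
    }
    return bucket;
  }

  public static void main(String[] args) {
    String[] windows = {"06:47-06:48", "06:48-06:49"};
    Map<String, String> bucket = writeSameShardPerWindow(windows);
    System.out.println(bucket.size()); // prints 1: the second write replaced the first
    System.out.println(bucket);
  }
}
```

Two windows were written, but only one object remains, and it holds the later window's events.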
The Solution
This PR adds the -W (window) token to the shard template returned by getShardTemplate().
With the -W token in the template, the DefaultFilenamePolicy resolves the window start and end times (in ISO-8601 format) and injects them into the filename. Files written for the same shard number in different time windows therefore get distinct names, which prevents the overwrites.
Example of the new filename format:
[prefix]-2026-03-10T06:47:00.000Z-2026-03-10T06:48:00.000Z-00005-of-00020.json
GCS Quota Considerations
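For a sense of how much the window component adds to an object name, the sketch below formats a one-minute window the way the example filename shows it (start and end in ISO-8601 with milliseconds, joined by a hyphen) and measures the result. It uses `java.time` from the standard library rather than Beam's own formatting code, and `windowToken` is a hypothetical helper, so treat it as an approximation of the expanded token rather than the actual implementation.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class WindowTokenLengthSketch {

  // ISO-8601 in UTC with millisecond precision, matching the timestamps
  // in the example filename.
  private static final DateTimeFormatter ISO_MILLIS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

  // Hypothetical helper: renders a window as "<start>-<end>".
  static String windowToken(Instant start, Instant end) {
    return ISO_MILLIS.format(start) + "-" + ISO_MILLIS.format(end);
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2026-03-10T06:47:00Z");
    Instant end = start.plusSeconds(60); // 1-minute fixed window
    String token = windowToken(start, end);
    System.out.println(token);          // prints 2026-03-10T06:47:00.000Z-2026-03-10T06:48:00.000Z
    System.out.println(token.length()); // prints 49 (all ASCII, so 49 bytes in UTF-8)
  }
}
```

Each timestamp is 24 characters and the joining hyphen is one more, giving 49. Any separator the shard template places around the token would count on top of that.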
The addition of the -W token adds exactly 49 bytes to the filename.
While this decreases the filepath/filename length available to customers by 49 characters, GCS object names can be up to 1024 bytes long, so the change is highly unlikely to cause Invalid Object Name errors. It is a necessary trade-off to ensure data integrity.
Testing
MySQLAllDataTypesBulkAndLiveIT