Translation infrastructure: style guides, glossary audit, and robustness improvements#12401
Merged
Translation infrastructure: style guides, glossary audit, and robustness improvements#12401
Conversation
Create scripts/styleguides/*.md with language-specific rules (gender conventions, register, terminology preferences). Move per-language rules out of the shared prompt into individual files. Wire style guides into both translate and review passes via load_styleguide(). This makes it easy to add language-specific feedback (e.g., from the Japan team) without bloating the shared translation prompt. Made-with: Cursor
Fixes silent truncation of long files (e.g., integrations.md at 738 lines) by quadrupling the token limit and logging a warning when the limit is hit. Made-with: Cursor
Files exceeding 130 KB (configurable via TRANSLATION_MAX_FILE_KB) are skipped with a warning in the workflow logs and listed in the PR summary so they don't silently go untranslated. Made-with: Cursor
The Anthropic SDK requires streaming for operations with high max_tokens that may exceed 10 minutes. Replaces client.messages.create() with client.messages.stream() to fix all-tasks-failing error. Made-with: Cursor
Made-with: Cursor
…m UI - New: scripts/audit_glossaries.py compares glossary entries against platform dashboard, Android SDK, Swift SDK, and GrapesJS locale files to detect mismatches and missing high-value terms. - New: .github/workflows/audit-glossaries.yml runs the audit weekly (Sundays) or on-demand, creating a GitHub Issue with findings. - Fix: Add "Everyone Else" to all 6 glossaries from platform source (e.g., ja: その他のユーザー, de: Alle anderen). - Fix: 33 clear-cut glossary corrections where platform UI translations differed (e.g., de: Aktions-Pfade→Aktionspfade, ko: SDK kept as SDK). - Add glossary_audit_report.* to .gitignore. Made-with: Cursor
Terms sourced from the platform dashboard locale files that appear frequently in docs content but were missing from the translation glossaries. Deduplicated against existing entries (case-insensitive). Per language: de +173, es +172, fr +171, ja +172, ko +169, pt-br +173. Made-with: Cursor
Contributor
|
🤖 Automated Reviewer Assignment: I have automatically added reviewers based on the following:
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR refactors the translation prompting system to support per-language style guidance and expands glossary/automation tooling to improve translation consistency across docs.
Changes:
- Added per-language style guide markdown files under
scripts/styleguides/and updated the shared translation prompt to defer language-specific rules to the appended style guide. - Updated
scripts/auto_translate.pyto load and append language style guides (and to skip translating oversized files with reporting in the PR summary). - Expanded multiple language glossaries and introduced a new glossary-audit script + scheduled workflow to detect drift vs source UI localization files.
Reviewed changes
Copilot reviewed 14 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
scripts/translation_prompt.md |
Removes embedded language-specific rules and references appended language style guides. |
scripts/styleguides/pt-br.md |
Adds pt-BR-specific gender/register rules. |
scripts/styleguides/ko.md |
Adds Korean register/tone guidance. |
scripts/styleguides/ja.md |
Adds Japanese register/tone guidance. |
scripts/styleguides/fr.md |
Adds French brand-article guidance + register/tone. |
scripts/styleguides/es.md |
Adds Spanish brand-article guidance + terminology + register/tone. |
scripts/styleguides/de.md |
Adds German brand-article guidance + register/tone. |
scripts/glossaries/pt-br.json |
Adds many new pt-BR glossary entries for UI/term consistency. |
scripts/glossaries/ko.json |
Adds many new Korean glossary entries for UI/term consistency. |
scripts/glossaries/ja.json |
Adds many new Japanese glossary entries for UI/term consistency. |
scripts/glossaries/fr.json |
Adds many new French glossary entries for UI/term consistency. |
scripts/glossaries/es.json |
Adds many new Spanish glossary entries for UI/term consistency. |
scripts/glossaries/de.json |
Adds many new German glossary entries for UI/term consistency. |
scripts/auto_translate.py |
Appends style guides to translate/review prompts; adds file-size skipping and summary reporting; switches Claude calls to streaming. |
scripts/audit_glossaries.py |
New script to compare doc glossaries against platform/SDK/GrapesJS locale sources and generate JSON/MD reports. |
.gitignore |
Ignores generated glossary audit reports. |
.github/workflows/auto-translate.yml |
Comments out orphaned-translation cleanup step in the auto-translate workflow. |
.github/workflows/audit-glossaries.yml |
New scheduled workflow to run glossary audits, then open/rotate issues when findings exist. |
Introduces structural quality checks that run after translation and before build verification. Auto-repairs front matter, code blocks, and URLs; flags Liquid tag mismatches, glossary compliance, completeness, and untranslated blocks. Results are summarized in the PR body. Lazy-loads the Anthropic SDK so qc and summary commands work without it. Made-with: Cursor
Remove non-breaking spaces (U+00A0) from glossary values in fr.json (4 entries), de.json (1 entry), and es.json (1 trailing space). Add explanatory comment for disabled orphan cleanup step in workflow. Made-with: Cursor
bre-fitzgerald
pushed a commit
that referenced
this pull request
Mar 4, 2026
…ess improvements (#12401)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR makes changes to the auto-translation workflow.
Summary
This PR adds several improvements to the automated translation infrastructure: per-language style guides, a glossary audit system that compares our glossaries against the Braze platform UI source of truth, and robustness improvements for handling large files.
Changes
Per-language style guides (
scripts/styleguides/*.md)translation_prompt.mdinto individual files so feedback from regional partners can be incorporated per-language without bloating the shared promptload_styleguide()inauto_translate.pytranslation_prompt.mdupdated to reference appended style guides instead of embedding per-language rules inlineGlossary audit system
scripts/audit_glossaries.py— Compares glossary entries against source-of-truth localization files from:.github/workflows/audit-glossaries.yml— Runs weekly (Sundays) or on-demand, clones reference repos, runs the audit, and creates a GitHub Issue with findingsREFERENCE_REPO_TOKENsecret (fine-grained PAT with Contents read access to platform/SDK repos)Glossary updates (sourced from platform UI)
Robustness improvements to
auto_translate.pyTRANSLATION_MAX_TOKENS) to prevent silent truncation of long filesTRANSLATION_MAX_FILE_KB) are skipped with clear logging and included in the PR summaryclient.messages.stream()instead ofclient.messages.create()to comply with Anthropic's requirements for largemax_tokensoperationsOther
glossary_audit_report.*to.gitignoreFiles changed
scripts/styleguides/*.md(6 files)scripts/audit_glossaries.py.github/workflows/audit-glossaries.ymlscripts/auto_translate.pyscripts/translation_prompt.mdscripts/glossaries/*.json(6 files).github/workflows/auto-translate.yml.gitignoreSetup required
After merging, add a
REFERENCE_REPO_TOKENsecret to the repo (fine-grained GitHub PAT with Contents: Read-only access toAppboy/platform,braze-inc/grapesjs,braze-inc/braze-android-sdk,braze-inc/braze-swift-sdk).