-
Notifications
You must be signed in to change notification settings - Fork 79
Add -f codepage flag for input/output encoding #638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add -f/--code-page flag with ODBC-compatible format parsing - Support 50+ codepages: Unicode, Windows, OEM/DOS, ISO-8859, CJK, EBCDIC, Macintosh - Apply input codepage in IncludeFile() for :r command - Apply output codepage in outCommand() for :OUT file writes - Add --list-codepages flag to display all supported codepages - Add comprehensive unit tests for parsing and encoding lookup
- Fix nil encoding panic in errorCommand when OutputCodePage is 65001 (UTF-8) - Close file handle in outCommand when GetEncoding returns an error - Handle close error properly in errorCommand - Apply UTF-8 BOM stripping when input codepage is 65001 - Fix test subtest names to use strconv.Itoa instead of string(rune)
- Use localizer.Errorf for all user-facing error messages - Fix UTF-16 BOM handling using ExpectBOM for input decoding - Add transformWriteCloser to properly close underlying file handles - Use transformWriteCloser in outCommand and errorCommand for both UnicodeOutputFile and CodePage transforms to prevent file handle leaks - Add integration tests for output/error codepage encoding
Use strconv.Itoa instead of %d to avoid locale-specific thousands separators in error message.
- Create codepageRegistry map as single source of truth for codepages - GetEncoding() now uses the registry instead of switch statement - SupportedCodePages() now generates list from registry - Removes duplicate codepage definitions between the two functions - Sort SupportedCodePages result by codepage number for consistency
- Verify all returned codepages are valid in GetEncoding - Ensure results are sorted by codepage number - Check well-known codepages are present
- Fix IncludeFile to strip BOM from UTF-16 encoded files using BOMOverride - Add Windows API fallback (MultiByteToWideChar/WideCharToMultiByte) for codepages not in built-in registry (e.g., Japanese EBCDIC 20290) - Add helpful error on non-Windows when codepage not in registry - Add TestIncludeFileWithInputCodePage for Windows-1252 and UTF-16 LE/BE - Add TestGetEncodingWindowsFallback to verify Windows API fallback - Update README.md to document Windows codepage availability - Update code comments to document cross-platform vs Windows-only support
- Change UTF-16 BOM handling from IgnoreBOM to UseBOM for codepages 1200 (UTF-16 LE) and 1201 (UTF-16 BE) to properly strip BOMs on decode (pkg/sqlcmd/codepage.go) - Eliminate redundant CodePage parsing by storing parsed settings in codePageSettings field after validation in Validate(), then reusing in run() (cmd/sqlcmd/sqlcmd.go) - Add comprehensive Code Page Support documentation section to README.md with format guide, common codepages table, practical examples, and notes on default behavior Note: Integration test for IncludeFile with non-UTF8 input already exists in TestIncludeFileWithInputCodePage (pkg/sqlcmd/sqlcmd_test.go)
- Add validation to error when -f arg is non-empty but no codepage parsed (e.g., ',' or whitespace-only input) - Add unit tests for comma-only, whitespace-only, and multiple-comma inputs - Fix misleading BOM comments to accurately describe BOMOverride behavior - Remove unused skipOnEncError field from test table
|
Supersedes #627 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds configurable input/output code page handling to the legacy sqlcmd CLI, improving compatibility with ODBC sqlcmd and enabling legacy-encoding workflows.
Changes:
- Introduces a central code page registry and parsing logic (
-f/--code-page) with support for many Windows, OEM, ISO-8859, CJK, and EBCDIC encodings and a--list-codepageslisting mode. - Wires code page settings through the legacy CLI into the core
Sqlcmdengine so:R,:OUT, and:ERRORcommands correctly decode input files and encode output/error files according to the configured code pages. - Adds comprehensive unit tests for code page parsing/lookup and CLI argument validation, plus README documentation and examples for the new functionality.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
pkg/sqlcmd/sqlcmd_test.go |
Adds tests to verify IncludeFile decodes input files correctly under various input code pages (Windows-1252, UTF-16LE/BE with and without BOM). |
pkg/sqlcmd/sqlcmd.go |
Extends Sqlcmd with a CodePage field and updates IncludeFile to select an appropriate io.Reader based on configured input code page or BOM auto-detection. |
pkg/sqlcmd/commands_test.go |
Adds tests confirming :OUT and :ERROR commands honor the configured output code page when writing to files. |
pkg/sqlcmd/commands.go |
Introduces a transformWriteCloser helper and updates :OUT/:ERROR commands to encode output using the selected code page while ensuring underlying files are closed correctly. |
pkg/sqlcmd/codepage_windows.go |
Implements Windows-specific encoding.Encoding support backed by MultiByteToWideChar/WideCharToMultiByte for system-installed code pages not in the built-in registry. |
pkg/sqlcmd/codepage_test.go |
Adds tests for ParseCodePage, GetEncoding, SupportedCodePages, and Windows-specific fallback behavior, including round-trip checks for a Windows-only EBCDIC code page. |
pkg/sqlcmd/codepage_other.go |
Provides a non-Windows implementation of getSystemCodePageEncoding that surfaces a clear error for unsupported system code pages. |
pkg/sqlcmd/codepage.go |
Defines the cross-platform code page registry, CodePageSettings, parsing/validation logic for -f, and a SupportedCodePages API used by the CLI. |
cmd/sqlcmd/sqlcmd_test.go |
Extends CLI argument tests to cover -f / --code-page, --list-codepages, and various invalid code page argument combinations. |
cmd/sqlcmd/sqlcmd.go |
Adds CodePage and ListCodePages arguments, validates and stores parsed CodePageSettings, applies them to the Sqlcmd instance, and implements --list-codepages output formatting. |
README.md |
Documents the new -f flag, describes supported code pages, provides usage examples, and explains --list-codepages and default BOM auto-detection behavior. |
Use strconv.Itoa instead of %d format verb to avoid locale-based number formatting that adds thousands separators (99,999 vs 99999). This ensures consistent error messages across all platforms.
Address Copilot review: The windowsDecoder and windowsEncoder now properly handle the atEOF parameter and buffer incomplete sequences between Transform calls. This ensures correct behavior when transform.Reader/Writer splits multibyte sequences across chunks. Changes: - Add buffer fields to windowsDecoder and windowsEncoder structs - Use MB_ERR_INVALID_CHARS to detect incomplete sequences in decoder - Use utf8.Valid to detect incomplete UTF-8 sequences in encoder - Return transform.ErrShortSrc when more input is needed - Return error for incomplete sequences at EOF - Add TestWindowsEncodingStreaming test for streaming behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
- Fix getSystemCodePageEncoding comment to describe error return behavior - Clarify test comment in TestWindowsEncodingStreaming to match actual coverage - Improve comments in IncludeFile explaining BOMOverride behavior - Update README notes to clarify that --list-codepages shows built-in set only and that Windows may have additional codepages via OS API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.
Summary
Implements the
-fflag from ODBC sqlcmd for specifying input and output file code pages.Changes
CodePageandListCodePagesfields,-f/--code-pageflag, validation, and runtime application:Rcommand to respect codepage settings when reading filesCodePagefield to Sqlcmd structUsage
Supported Codepages
Testing
Improves ODBC sqlcmd compatibility.