Skip to content

fix: use Go-compatible JSON encoding for OCI manifests#348

Open
sajayantony wants to merge 7 commits intooras-project:mainfrom
sajayantony:agent/fix-plus-escaping-346
Open

fix: use Go-compatible JSON encoding for OCI manifests#348
sajayantony wants to merge 7 commits intooras-project:mainfrom
sajayantony:agent/fix-plus-escaping-346

Conversation

@sajayantony
Copy link
Contributor

Fix: + escaped as \u002B in JSON serialization

Fixes #346

Note

This PR was generated with GitHub Copilot CLI. The second commit trims and updates .github/copilot-instructions.md with learnings from this fix.

Problem

System.Text.Json's default JavaScriptEncoder escapes + as \u002B. OCI media types like application/vnd.oci.image.manifest.v1+json become ...v1\u002Bjson, producing different digests than oras-go and breaking CAS interoperability.

Root cause: JavaScriptEncoder.Default hardcodes + on its block list. AllowCharacter('+') doesn't work. UnsafeRelaxedJsonEscaping fixes + but stops escaping <>& — unsafe.

Solution

Custom JsonConverter classes with Utf8JsonWriter.WriteRawValue() and a Go-compatible escape function matching encoding/json.Marshal semantics.

Design

┌─────────────────────────────────────────────────────┐
│                    Call Sites                        │
│  Packer · ManifestStore · Repository · Registry     │
│  FetchableExtensions · ResponseException · Error    │
└───────────────────┬─────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────┐
│           OciJsonSerializer (internal)              │
│  ─────────────────────────────────────────────────  │
│  SerializeToUtf8Bytes<T>()  ← size guard (4 MiB)   │
│  Deserialize<T>(byte[])                             │
│  Deserialize<T>(string)                             │
│  DeserializeAsync<T>(Stream)                        │
│  GenerateIndex()   FormatErrorDetail()              │
│  ─────────────────────────────────────────────────  │
│  Private: s_options (cached JsonSerializerOptions)   │
│           EscapeJsonString() ← Go compat escaping   │
└───────────────────┬─────────────────────────────────┘
                    │ registers
        ┌───────────┴───────────┐
        ▼                       ▼
┌──────────────────┐  ┌─────────────────────────┐
│ OciStringConverter│  │ OciDictionaryConverter   │
│ JsonConverter     │  │ JsonConverter            │
│ <string>          │  │ <IDictionary<str,str>>   │
│                   │  │                          │
│ Write: escape +   │  │ Write: escape keys+vals  │
│   WriteRawValue() │  │   WriteRawValue()        │
│ Read: GetString() │  │ Read: loop key/value     │
└──────────────────┘  └─────────────────────────┘

Why two converters? JsonConverter<string> is NOT invoked for dictionary keys — only values. OciDictionaryConverter handles annotation dictionaries where both keys and values may contain +.

Escaping Rules (Go encoding/json.Marshal compatible)

Input Output Input Output
" \" \ \\
\b \f \n \r \t \b \f \n \r \t < > & \u003C \u003E \u0026
U+2028, U+2029 \u2028 \u2029 0x00–0x1F (other) \uXXXX
+ + (literal — the fix) all other literal

Changes

New files (all internal):

  • Serialization/OciJsonSerializer.cs — single entry point for all JSON ops
  • Serialization/OciStringConverter.cs — string value converter
  • Serialization/OciDictionaryConverter.cs — annotation dictionary converter
  • Content/OciLimits.cs — shared MaxManifestBytes constant (4 MiB)

Modified call sites:

  • Packer.cs, ManifestStore.cs, Repository.cs, Registry.cs, FetchableExtensions.cs, ResponseException.cs, Error.cs — all JSON through OciJsonSerializer
  • Index.csGenerateIndex() removed (moved to serializer)
  • CopyGraphOptions.cs, RepositoryOptions.cs — use shared OciLimits.MaxManifestBytes

Tests (36 new, 429 total passing):

  • OciStringConverterTest.cs — escape branch coverage + round-trip
  • OciDictionaryConverterTest.cs — keys, values, empty, null, invalid types, round-trip
  • ManifestSerializationTest.cs — JSON fixture deserialization, annotation fidelity, index, round-trip, size limit
  • PackerTest.cs — 2 regression tests checking raw bytes for literal +

Coverage: 99.1% line, 97.5% branch, 100% method for serialization code.

Copilot Instructions Update (second commit)

Trimmed .github/copilot-instructions.md from 128→92 lines (−28%) by removing duplicate sections (Go→.NET table, Async & I/O, Copilot Prompt Hints, Non-Goals) and adding serialization/testing guidance learned from this fix.

Implementation Plan

Approach: Custom JsonConverters with WriteRawValue

Register custom JsonConverter classes that:

  1. Escape strings using Go-compatible rules via EscapeJsonString()
  2. Write pre-escaped JSON via WriteRawValue() — bypasses STJ's encoder entirely
  3. Are registered in a private JsonSerializerOptions inside OciJsonSerializer

Key decisions

  • Layered architecture: All JSON serialization AND deserialization goes through OciJsonSerializer
  • Domain types clean: GenerateIndex() moved out of Index.cs, Serialize moved out of Error.ToString()
  • Internal only: Zero public API surface added
  • Size guard: 4 MiB limit on serialized output (matches oras-go)
  • Shared constant: OciLimits.MaxManifestBytes replaces duplicate private consts

What the original plan missed (retrospective)

  1. Dictionary keys bypass JsonConverter<string> — needed OciDictionaryConverter
  2. All deserialization sites also needed routing through the serializer for consistency
  3. Wire-format JSON fixtures are better than object-based test factories
  4. Dead code from defensive guards should be removed, not tested
  5. \x01b in C# is \x01B not \x01 + b — use \u0001 for control chars

Copilot AI review requested due to automatic review settings February 23, 2026 23:05
@sajayantony sajayantony marked this pull request as draft February 23, 2026 23:09
@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 52000ac to 017a00b Compare February 23, 2026 23:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical JSON serialization issue where System.Text.Json's default encoder escapes + as \u002B, breaking content-addressable storage (CAS) interoperability with oras-go and other OCI tools. The solution introduces a custom JSON serialization layer with Go-compatible escaping rules.

Changes:

  • Introduced internal OciJsonSerializer with custom converters (OciStringConverter, OciDictionaryConverter) that bypass .NET's default encoder to preserve literal + characters while maintaining proper escaping for quotes, backslashes, control characters, and HTML-sensitive characters (<, >, &)
  • Centralized all OCI content JSON operations through OciJsonSerializer across 7 source files
  • Moved GenerateIndex from Index class to OciJsonSerializer for architectural consistency
  • Consolidated duplicate 4 MiB size limit constants into shared OciLimits.MaxManifestBytes
  • Updated Copilot instructions with serialization conventions and testing best practices learned from this fix

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

Show a summary per file
File Description
OciJsonSerializer.cs Central serialization entry point with Go-compatible EscapeJsonString method and 4 MiB size guard
OciStringConverter.cs Custom JsonConverter<string> using WriteRawValue to bypass default encoder
OciDictionaryConverter.cs Custom dictionary converter handling both keys and values (framework doesn't invoke string converter for dict keys)
OciLimits.cs Shared 4 MiB manifest size constant
Packer.cs Updated to use OciJsonSerializer.SerializeToUtf8Bytes
ManifestStore.cs Updated to use OciJsonSerializer.DeserializeAsync and GenerateIndex
Repository.cs Updated to use OciJsonSerializer.Deserialize for OCI content
Registry.cs Updated to use OciJsonSerializer.Deserialize
FetchableExtensions.cs Updated to use OciJsonSerializer.Deserialize
ResponseException.cs Updated to use OciJsonSerializer.Deserialize for error responses
Error.cs Updated to use OciJsonSerializer.FormatErrorDetail
Index.cs Removed GenerateIndex method (moved to serializer)
CopyGraphOptions.cs Uses OciLimits.MaxManifestBytes instead of duplicate constant
RepositoryOptions.cs Uses OciLimits.MaxManifestBytes instead of duplicate constant
OciStringConverterTest.cs Tests escaping rules, round-trip, verifies literal + not \u002B
OciDictionaryConverterTest.cs Tests key/value escaping, null handling, invalid types, round-trip
ManifestSerializationTest.cs Wire-format JSON fixtures, annotation fidelity, size limit enforcement
PackerTest.cs Two new regression tests inspecting raw bytes for literal +
.github/copilot-instructions.md Trimmed 28%, added serialization/testing conventions

@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 017a00b to 3f67c33 Compare February 23, 2026 23:12
@sajayantony sajayantony marked this pull request as ready for review February 23, 2026 23:13
Copilot AI review requested due to automatic review settings February 23, 2026 23:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 3f67c33 to 57092d1 Compare February 23, 2026 23:28
Copilot AI review requested due to automatic review settings February 23, 2026 23:29
@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 57092d1 to 0489b9a Compare February 23, 2026 23:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.

@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 0489b9a to 14934ba Compare February 23, 2026 23:35
@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 95.15152% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.09%. Comparing base (5953fb1) to head (0cbdb13).

Files with missing lines Patch % Lines
...rc/OrasProject.Oras/Content/FetchableExtensions.cs 0.00% 0 Missing and 2 partials ⚠️
.../OrasProject.Oras/Registry/Remote/ManifestStore.cs 90.00% 0 Missing and 2 partials ⚠️
src/OrasProject.Oras/Registry/Remote/Repository.cs 88.88% 0 Missing and 2 partials ⚠️
...oject.Oras/Serialization/OciDictionaryConverter.cs 94.59% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #348      +/-   ##
==========================================
+ Coverage   91.76%   92.09%   +0.33%     
==========================================
  Files          64       67       +3     
  Lines        2755     2897     +142     
  Branches      364      373       +9     
==========================================
+ Hits         2528     2668     +140     
- Misses        138      139       +1     
- Partials       89       90       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI review requested due to automatic review settings February 24, 2026 01:21
@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 14934ba to af9d966 Compare February 24, 2026 01:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from af9d966 to cd1e45b Compare February 24, 2026 01:49
Copilot AI review requested due to automatic review settings February 24, 2026 02:00
@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from cd1e45b to f16fe71 Compare February 24, 2026 02:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from f16fe71 to ba708a3 Compare February 24, 2026 02:39
Copilot AI review requested due to automatic review settings February 24, 2026 02:46
@sajayantony
Copy link
Contributor Author

Resolved by Copilot: Added section comments throughout the EscapeJsonString switch: JSON structural characters, Named control character escapes, HTML-sensitive characters (Go escapes these), Unicode line/paragraph separators (Go escapes these), and a default-case comment explaining it escapes remaining control chars (U+0000-U+001F) as \uXXXX while passing all other characters through literally, including +.

@sajayantony
Copy link
Contributor Author

Resolved by Copilot: Good idea. The serialization path is only used for manifests/indexes (bounded at 4 MiB by OciLimits.MaxManifestBytes), so the overhead relative to network I/O should be negligible. That said, adding a benchmark suite would be valuable for tracking regressions — filed as a follow-up concern.

@sajayantony
Copy link
Contributor Author

Resolved by Copilot: Added a comment explaining the relationship — _maxMetadataBytes defaults to OciLimits.MaxManifestBytes because manifests are the primary metadata cached during copy operations. The constant was extracted to OciLimits so both the serializer size check and this default share the same source of truth.

@sajayantony
Copy link
Contributor Author

Resolved by Copilot: Named the tuple elements — (Descriptor Descriptor, byte[] Content). Also added a tuple naming convention to copilot-instructions.md.

@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from d150eec to 1d6f5f4 Compare February 26, 2026 07:18
sajayantony and others added 2 commits February 25, 2026 23:24
…ance

Add learnings from issue oras-project#346 fix:
- All JSON through OciJsonSerializer (no direct JsonSerializer)
- Serialization types are internal
- Domain types must not contain serialization logic
- Test invalid inputs, wire format fixtures, round-trip
- No dead code; no committing local-only tooling

Also trim redundancy: remove duplicate Go→.NET table, collapse
overlapping sections. 128→92 lines.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Sajay Antony <sajaya@microsoft.com>
- Add section comments in EscapeJsonString switch cases
- Name tuple elements in GenerateIndex return type
- Add comment explaining _maxMetadataBytes default
- Add tuple naming convention to copilot-instructions.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Sajay Antony <sajaya@microsoft.com>
@sajayantony sajayantony force-pushed the agent/fix-plus-escaping-346 branch from 1d6f5f4 to d7a9498 Compare February 26, 2026 07:25
Copy link
Collaborator

@akashsinghal akashsinghal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

This review was generated by Copilot using Claude Opus 4.6, mimicking the review style of shizhMSFT. This is not the real shizhMSFT.

Solid fix for a real CAS interoperability problem. The escaping logic matches Go's encoding/json.Marshal semantics correctly, and the converter architecture (separate string + dictionary converters) is the right approach given System.Text.Json's limitation that JsonConverter<string> doesn't intercept dictionary keys. A few observations below.

sajayantony and others added 2 commits March 4, 2026 22:39
- Revert copilot-instructions.md to upstream (split to issue oras-project#356)
- Move GenerateIndex back to Index.cs, call OciJsonSerializer internally
- Add byte-level idempotency assertion in round-trip tests
- Document size-guard materialization behavior in OciJsonSerializer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Sajay Antony <sajaya@microsoft.com>
Copilot AI review requested due to automatic review settings March 5, 2026 07:19
@sajayantony
Copy link
Contributor Author

Addressed all 4 unresolved review threads in commit 6eacd7d:

  1. GenerateIndex in serializer — Moved back to Index.cs. Calls OciJsonSerializer.SerializeToUtf8Bytes internally so the serializer stays focused on encoding only.
  2. Byte-level idempotency test — Round-trip test now asserts serialize-deserialize-serialize produces byte-identical output for CAS compatibility.
  3. copilot-instructions.md rewrite — Reverted to upstream. Filed docs: trim and update .github/copilot-instructions.md #356 to track the rewrite as a separate PR.
  4. Size-guard materialization — Added remarks documenting that SerializeToUtf8Bytes fully materializes before the size check.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +59 to +67
internal static (Descriptor Descriptor, byte[] Content) GenerateIndex(
IList<Descriptor> manifests)
{
var index = new Index(manifests);
var indexContent = JsonSerializer.SerializeToUtf8Bytes(index);
return (Descriptor.Create(indexContent, Oci.MediaType.ImageIndex), indexContent);
var indexContent =
OciJsonSerializer.SerializeToUtf8Bytes(index);
return (
Descriptor.Create(indexContent, Oci.MediaType.ImageIndex),
indexContent);
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description mentions Index.GenerateIndex() being removed/moved into the serializer, but this method still exists (now just using OciJsonSerializer). Either update the PR description to match the implementation, or remove/move the method if the intent is to centralize index generation in OciJsonSerializer.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Sajay Antony <sajaya@microsoft.com>
Copilot AI review requested due to automatic review settings March 5, 2026 07:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +71 to +73
var first = true;
foreach (var kvp in value.OrderBy(k => k.Key, StringComparer.Ordinal))
{
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OciDictionaryConverter.Write sorts keys with StringComparer.Ordinal, which compares UTF-16 code units. Go’s encoding/json sorts map keys by Go string order (bytewise over UTF-8), which can differ from UTF-16 ordinal ordering for non-BMP characters (surrogate pairs) and some BMP ranges, potentially reintroducing cross-runtime digest mismatches for non-ASCII annotation keys. Consider sorting by UTF-8 byte sequence (e.g., Encoding.UTF8.GetBytes(key) with lexicographic compare) to match Go’s ordering semantics.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@ridhoq ridhoq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Packer.PackManifestAsync escapes '+' in media types as \\u002B

5 participants