MetadataTransferError: isomorphic-git "Invalid checksum in GitIndex buffer" on massive parallel retrieve — 1424/2114 packages fail consistently #3509

@Alfystar

Description

Summary

We are performing a large-scale metadata download from a production Salesforce org (what we call internally a "full-retrieve"), which splits all org metadata into batches of ~50 members and retrieves them in parallel using sf project retrieve start.

After a ~10-hour run, 1424 out of 2114 retrieve packages fail with the following error — consistently and reproducibly, even when the failed packages are re-run individually days later:

Error (MetadataTransferError): Metadata API request failed: An internal error caused this command to fail. isomorphic-git error:
Invalid checksum in GitIndex buffer: expected 9c5c572820b3a6b4497ccc765f9516316e9d1f07 but saw a792b2660a21e9d230fb3bba873862e8cbcb1e8f

Context and Background

  • We are intentionally running an older version of the CLI (2.110.3) because upgrading is currently blocked by open bug #3493 — "Using --flags-dir on (third-party) sf plugins does not work", which is still in progress. We are aware a newer version is available; updating is not currently feasible for us.

  • The analysis of this issue took several days because the problem does not reproduce on small metadata slices or in isolation; it only manifests at full-org scale. We could not pin it down in our CI/CD pipeline and had to run the full retrieve locally over approximately 10 hours of wall-clock time.

  • During the run, we observed that each sf process was consuming close to 4 GB of RAM (this does not happen on other orgs we work with). As a consequence, we limited concurrency to 3 parallel workers to avoid memory exhaustion. We also lowered the per-package timeout from the usual 30 minutes to 10 minutes to reduce total run time while still capturing failures.

  • The execution produced 3.6 GB of plain-text log output, which we had to parse with a custom Python script using mmap to avoid loading the entire file into memory.
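
For readers who want to reproduce the log analysis: one way to scan a multi-gigabyte log with mmap, without loading it into memory, is sketched below. This is illustrative only, not our production script; it merely counts occurrences of the error marker.

```python
import mmap
import os
import tempfile

# Literal substring from the failure message reported above.
ERROR_MARKER = b"Invalid checksum in GitIndex buffer"

def count_errors(log_path):
    """Count occurrences of the checksum error in a large log file,
    memory-mapping it instead of reading it into RAM."""
    with open(log_path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(ERROR_MARKER)
            while pos != -1:
                count += 1
                pos = mm.find(ERROR_MARKER, pos + 1)
            return count

# Tiny demo on a synthetic log (the real file was 3.6 GB):
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".log") as tmp:
    tmp.write(b"ok\n" + ERROR_MARKER + b": ...\nok\n" + ERROR_MARKER + b"\n")
    demo_path = tmp.name
print(count_errors(demo_path))  # 2
os.remove(demo_path)
```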


Steps To Reproduce

We cannot provide a public repository because the org is a private production environment. However, the pattern is:

  1. Enumerate all metadata members from the org (using sf org list metadata).
  2. Split them into batches of ~50 members per package.xml.
  3. Run sf project retrieve start for each batch in parallel (3 workers), with --wait 10 (10-minute timeout).
  4. Observe that the majority of batches fail with the isomorphic-git checksum error.
  5. Retry any single failed batch individually — the error persists deterministically.
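
Steps 1–3 can be sketched roughly as follows. This is a minimal illustration, not our actual tooling; the member names and batch size are placeholders, and the generated manifest deliberately mirrors the shape of the sample package.xml shown further down (no `<version>` element).

```python
def chunk(members, size=50):
    """Step 2: split the full member list into batches of ~50."""
    return [members[i:i + size] for i in range(0, len(members), size)]

def package_xml(type_name, members):
    """Render one batch as a package.xml manifest for
    `sf project retrieve start --manifest ...` (step 3)."""
    rows = "\n".join(f"        <members>{m}</members>" for m in members)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<Package xmlns="http://soap.sforce.com/2006/04/metadata">\n'
        "    <types>\n"
        f"{rows}\n"
        f"        <name>{type_name}</name>\n"
        "    </types>\n"
        "</Package>\n"
    )

# Example: 120 hypothetical ApexClass members -> 3 batches (50/50/20).
members = [f"Class_{i}" for i in range(120)]
batches = chunk(members)
print(len(batches))      # 3
print(len(batches[-1]))  # 20
# Each batch is written to its own package_ApexClass_<N>.xml and handed
# to one of the 3 parallel workers running `sf project retrieve start`.
```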

Minimal repro of a failing single-package retrieve (run after the full session):

sf project retrieve start \
  --target-org smartflow \
  --ignore-conflicts \
  --manifest fullRetrieve_Artifact_3/package_ApexClass_29.xml \
  --output-dir fullRetrieve_Artifact_3/retrievePack/dir_087_29 \
  --wait 10

Output:

 ✔ Preparing retrieve request 5ms
 ✔ Sending request to org 155ms
 ✘ Waiting for the org to respond 2.05s
 ◼ Done

 Status: In Progress
 Elapsed Time: 2.22s

Error (MetadataTransferError): Metadata API request failed: An internal error caused this command to fail. isomorphic-git error:
Invalid checksum in GitIndex buffer: expected 9c5c572820b3a6b4497ccc765f9516316e9d1f07 but saw a792b2660a21e9d230fb3bba873862e8cbcb1e8f

The package ApexClass_29 exclusively contains Apex classes from managed (installed) packages (namespace prefix NE__), e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
    <types>
        <members>NE__Bit2Win_Translation_Controller</members>
        <members>NE__Bit2winChangeOfferImplementation</members>
        <members>NE__Bit2winPostInstallUtilities</members>
        <members>NE__Bit2winSyncArchetypesService</members>
        <members>NE__Bit2winTranslationController</members>
        <members>NE__ChangePricingAdministrationExtension</members>
        <members>NE__ChangePricingAdministrationExtensionTest</members>
        <!-- ... more NE__* managed package classes ... -->
        <name>ApexClass</name>
    </types>
</Package>

We note this as a possible clue; however, the failure is not limited to managed-package components (see the statistics below).


Expected result

All sf project retrieve start invocations should complete successfully and download the requested metadata to the output directory, or fail with a clear, actionable error tied to a specific problematic member.


Actual result

1424 out of 2114 packages (67.4%) fail with the same isomorphic-git checksum error. The failure is deterministic: packages that fail during the full run also fail when retried individually days later.

Aggregate statistics from the full run

File size          : 3.6 GB of raw log output
Total packages     : 2114
  ✔  Successes     :  690  (32.6%)
  ✘  Failures      : 1424  (67.4%)
     ├ failed after all retries (definitive) : 1415
     └ eventually succeeded after ≥1 retry   :    9

Each failing package was retried up to 3 times by our automation before being marked as a definitive failure.
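
The retry policy our automation applied can be expressed as a small helper. The two stand-in tasks below are purely illustrative; they mirror the two observed behaviors (deterministic failure on every attempt vs. eventual success after a retry).

```python
def retry(task, attempts=3):
    """Run `task` up to `attempts` times; return (succeeded, tries_used)."""
    for i in range(1, attempts + 1):
        if task():
            return True, i
    return False, attempts

# Stand-in for the 1415 packages that fail deterministically:
always_fails = lambda: False
print(retry(always_fails))         # (False, 3)

# Stand-in for one of the 9 packages that eventually succeeded:
flaky = iter([False, True])
print(retry(lambda: next(flaky)))  # (True, 2)
```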

Top 28 failing packages by total elapsed time

The suffix _N in the metadata name indicates the batch split index (batches of 50 members each):

Metadata                    Retry 0    Retry 1    Retry 2     TOTAL
--------------------------------------------------------------------
Report_575                    5m 0s      5m 0s      5m 0s    15m 0s
Flow_6                       10m 0s      1m 0s      16.4s   11m 16s
CustomApplication_0           3m 0s      2m 0s      3m 0s     8m 0s
Report_541                    2m 0s      2m 0s      2m 0s     6m 0s
Flow_3                        5m 0s      25.1s      17.3s    5m 42s
Flow_10                       4m 0s      22.1s      19.1s    4m 41s
Flow_4                        4m 0s      19.7s      16.9s    4m 36s
Flow_8                        4m 0s      16.1s       7.9s    4m 23s
CustomObject_11               2m 0s      1m 0s      1m 0s     4m 0s
Flow_7                        3m 0s      20.5s      10.8s    3m 31s
Flow_5                        3m 0s      18.9s      12.1s    3m 31s
CustomObject_27               2m 0s      25.6s      34.5s    3m 0s
Report_585                    1m 0s      1m 0s      1m 0s     3m 0s
ExperienceBundle_0            1m 0s      1m 0s      1m 0s     3m 0s
CustomObject_2                1m 0s      1m 0s      58.6s    2m 58s
CustomObject_25               2m 0s      39.9s      18.6s    2m 58s
CustomObject_28               2m 0s      29.5s      19.2s    2m 48s
PermissionSet_3               53.4s      51.2s      1m 0s    2m 44s
CustomObject_31               2m 0s      20.5s      19.6s    2m 40s
CustomObject_17               2m 0s      20.8s      15.5s    2m 36s
CustomObject_7                1m 0s      40.5s      49.3s    2m 29s
CustomObject_26               1m 0s      37.7s      36.4s    2m 14s
CustomObject_16               55.8s      36.4s      28.3s     2m 0s
Report_568                    35.2s      37.9s      40.4s    1m 53s
Report_436                    35.1s      36.8s      38.3s    1m 50s
CustomObject_20               1m 0s      25.2s      21.9s    1m 47s
CustomObject_21               1m 0s      18.6s      27.1s    1m 45s
CustomObject_22               1m 0s      20.4s      22.8s    1m 43s

Observations on affected metadata types

The failures span virtually every metadata category retrieved from this org: Report, Flow, CustomObject, ApexClass, AuraDefinitionBundle, CustomMetadata, CustomApplication, EmailTemplate, ExperienceBundle, PermissionSet, Queue, ConversationMessageDefinition, and more.

Crucially, the same metadata type appears among both successes and failures: for example, Flow_1 may succeed while Flow_6 fails. This rules out any single metadata type as the root cause and strongly suggests a server-side or CLI-internal state issue rather than a problem with specific metadata content.

The checksum values in the error are always identical across all failures and all retries:

  • expected: 9c5c572820b3a6b4497ccc765f9516316e9d1f07
  • actual: a792b2660a21e9d230fb3bba873862e8cbcb1e8f

The stability of these two hashes across thousands of independent API calls (different packages, different timing, different retry attempts) strongly suggests this is a corruption or inconsistency in a cached/persisted Git index file on the CLI side, not a transient network or server issue.
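
If this hypothesis is right, the corruption should be directly verifiable on disk: a git index file ends with a 20-byte SHA-1 trailer computed over everything that precedes it, and isomorphic-git raises this error when the recomputed hash and the trailer disagree. The sketch below shows the check on synthetic bytes; the on-disk location of the index the CLI's source tracking uses varies by version, so any concrete path is an assumption you would need to confirm before applying this to a real file.

```python
import hashlib

def verify_git_index(data: bytes):
    """Recompute the trailing checksum of a git index buffer the way
    isomorphic-git does: SHA-1 over all bytes except the last 20,
    compared against the 20-byte trailer itself."""
    body, trailer = data[:-20], data[-20:]
    computed = hashlib.sha1(body).digest()
    return computed == trailer, computed.hex(), trailer.hex()

# Synthetic demo: a well-formed buffer, then the same buffer with one
# corrupted byte (these are NOT real index contents, just placeholders).
body = b"DIRC" + bytes(8) + b"fake-entries"
good = body + hashlib.sha1(body).digest()
bad = b"X" + good[1:]  # flip the first byte -> trailer no longer matches

print(verify_git_index(good)[0])  # True
print(verify_git_index(bad)[0])   # False
```

If a cached index file consistently fails this check with the same pair of hashes seen in the error, that would support the stale/corrupted-cache theory, and clearing the CLI's local source-tracking state would be a natural next experiment.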


Additional information

  • Memory anomaly: during this run, each sf process was consuming ~4 GB of RAM on this specific org. This does not happen with other orgs. We do not know if this is related.
  • No reproduction on pipeline: the issue only manifests at full-org scale. Small subsets retrieve fine.
  • Identical checksums across all failures: the expected and actual SHA1 values in the isomorphic-git error are literally the same string in every single failing invocation across the entire 3.6 GB log. This is not random corruption.
  • Org type: production sandbox with managed packages installed (multiple namespaces including NE__).
  • CLI version in use: 2.110.3 (upgrade blocked by #3493, "Using --flags-dir on (third-party) sf plugins does not work").

We are happy to provide additional log excerpts, the sf doctor output, or any other diagnostic information that might help narrow down the root cause. We understand this is a complex scenario and we appreciate your time.


System Information

{
  "architecture": "darwin-arm64",
  "cliVersion": "@salesforce/cli/2.123.1",
  "nodeVersion": "node-v24.13.1",
  "osVersion": "Darwin 25.2.0",
  "rootPath": "/opt/homebrew/lib/node_modules/@salesforce/cli",
  "shell": "zsh",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.2.35 (core)",
    "@oclif/plugin-commands 4.1.34 (core)",
    "@oclif/plugin-help 6.2.33 (core)",
    "@oclif/plugin-not-found 3.2.68 (core)",
    "@oclif/plugin-plugins 5.4.48 (core)",
    "@oclif/plugin-search 1.2.32 (core)",
    "@oclif/plugin-update 4.7.8 (core)",
    "@oclif/plugin-version 2.2.33 (core)",
    "@oclif/plugin-warn-if-update-available 3.1.48 (core)",
    "@oclif/plugin-which 3.2.40 (core)",
    "@salesforce/cli 2.110.3 (core)",
    "agent 1.24.13 (core)",
    "apex 3.8.3 (core)",
    "api 1.3.3 (core)",
    "auth 3.9.9 (core)",
    "data 4.0.58 (core)",
    "deploy-retrieve 3.23.3 (core)",
    "info 3.4.88 (core)",
    "limits 3.3.67 (core)",
    "marketplace 1.3.8 (core)",
    "org 5.9.32 (core)",
    "packaging 2.20.5 (core)",
    "schema 3.3.82 (core)",
    "settings 2.4.48 (core)",
    "sobject 1.4.73 (core)",
    "telemetry 3.6.58 (core)",
    "templates 56.3.65 (core)",
    "trust 3.7.113 (core)",
    "user 3.6.38 (core)",
    "apex-code-coverage-transformer 2.14.1 (user) published 120 days ago (Mon Oct 27 2025) (latest is 2.17.0)",
    "sfdx-plugin-source-read 1.5.6 (user) published 164 days ago (Sun Sep 14 2025)"
  ]
}

