MetadataTransferError: isomorphic-git "Invalid checksum in GitIndex buffer" on massive parallel retrieve — 1424/2114 packages fail consistently #3509

@Alfystar

Description

Summary

We are performing a large-scale metadata download from a production Salesforce org (what we call internally a "full-retrieve"), which splits all org metadata into batches of ~50 members and retrieves them in parallel using sf project retrieve start.

After a ~10-hour run, 1424 out of 2114 retrieve packages fail with the following error — consistently and reproducibly, even when the failed packages are re-run individually days later:

Error (MetadataTransferError): Metadata API request failed: An internal error caused this command to fail. isomorphic-git error:
Invalid checksum in GitIndex buffer: expected 9c5c572820b3a6b4497ccc765f9516316e9d1f07 but saw a792b2660a21e9d230fb3bba873862e8cbcb1e8f

Context and Background

  • We are intentionally running an older version of the CLI (2.110.3) because upgrading is currently blocked by open bug #3493 — "Using --flags-dir on (third-party) sf plugins does not work", which is still in progress. We are aware a newer version is available; updating is not currently feasible for us.

  • The analysis of this issue took several days because the problem does not reproduce on small metadata slices or in isolation; it only manifests at full-org scale. We could not pin it down in our CI/CD pipeline and had to run the full retrieve locally over approximately 10 hours of wall-clock time.

  • During the run, we observed that each sf process was consuming close to 4 GB of RAM (this does not happen on other orgs we work with). As a consequence, we limited concurrency to 3 parallel workers to avoid memory exhaustion. We also lowered the per-package timeout from the usual 30 minutes to 10 minutes to reduce total run time while still capturing failures.

  • The execution produced 3.6 GB of plain-text log output, which we had to parse with a custom Python script using mmap to avoid loading the entire file into memory.
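
For readers who want to reproduce the log analysis: one way to scan a multi-gigabyte log with mmap, without loading it into memory, is sketched below. This is illustrative only, not our production script; it merely counts occurrences of the error marker.

```python
import mmap
import os
import tempfile

# Literal substring from the failure message reported above.
ERROR_MARKER = b"Invalid checksum in GitIndex buffer"

def count_errors(log_path):
    """Count occurrences of the checksum error in a large log file,
    memory-mapping it instead of reading it into RAM."""
    with open(log_path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(ERROR_MARKER)
            while pos != -1:
                count += 1
                pos = mm.find(ERROR_MARKER, pos + 1)
            return count

# Tiny demo on a synthetic log (the real file was 3.6 GB):
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".log") as tmp:
    tmp.write(b"ok\n" + ERROR_MARKER + b": ...\nok\n" + ERROR_MARKER + b"\n")
    demo_path = tmp.name
print(count_errors(demo_path))  # 2
os.remove(demo_path)
```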


Steps To Reproduce

We cannot provide a public repository because the org is a private production environment. However, the pattern is:

  1. Enumerate all metadata members from the org (using sf org list metadata).
  2. Split them into batches of ~50 members per package.xml.
  3. Run sf project retrieve start for each batch in parallel (3 workers), with --wait 10 (10-minute timeout).
  4. Observe that the majority of batches fail with the isomorphic-git checksum error.
  5. Retry any single failed batch individually — the error persists deterministically.
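
Steps 1–3 can be sketched roughly as follows. This is a minimal illustration, not our actual tooling; the member names and batch size are placeholders, and the generated manifest deliberately mirrors the shape of the sample package.xml shown further down (no `<version>` element).

```python
def chunk(members, size=50):
    """Step 2: split the full member list into batches of ~50."""
    return [members[i:i + size] for i in range(0, len(members), size)]

def package_xml(type_name, members):
    """Render one batch as a package.xml manifest for
    `sf project retrieve start --manifest ...` (step 3)."""
    rows = "\n".join(f"        <members>{m}</members>" for m in members)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<Package xmlns="http://soap.sforce.com/2006/04/metadata">\n'
        "    <types>\n"
        f"{rows}\n"
        f"        <name>{type_name}</name>\n"
        "    </types>\n"
        "</Package>\n"
    )

# Example: 120 hypothetical ApexClass members -> 3 batches (50/50/20).
members = [f"Class_{i}" for i in range(120)]
batches = chunk(members)
print(len(batches))      # 3
print(len(batches[-1]))  # 20
# Each batch is written to its own package_ApexClass_<N>.xml and handed
# to one of the 3 parallel workers running `sf project retrieve start`.
```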

Minimal repro of a failing single-package retrieve (run after the full session):

sf project retrieve start \
  --target-org smartflow \
  --ignore-conflicts \
  --manifest fullRetrieve_Artifact_3/package_ApexClass_29.xml \
  --output-dir fullRetrieve_Artifact_3/retrievePack/dir_087_29 \
  --wait 10

Output:

 ✔ Preparing retrieve request 5ms
 ✔ Sending request to org 155ms
 ✘ Waiting for the org to respond 2.05s
 ◼ Done

 Status: In Progress
 Elapsed Time: 2.22s

Error (MetadataTransferError): Metadata API request failed: An internal error caused this command to fail. isomorphic-git error:
Invalid checksum in GitIndex buffer: expected 9c5c572820b3a6b4497ccc765f9516316e9d1f07 but saw a792b2660a21e9d230fb3bba873862e8cbcb1e8f

The package ApexClass_29 exclusively contains Apex classes from managed (installed) packages (namespace prefix NE__), e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
    <types>
        <members>NE__Bit2Win_Translation_Controller</members>
        <members>NE__Bit2winChangeOfferImplementation</members>
        <members>NE__Bit2winPostInstallUtilities</members>
        <members>NE__Bit2winSyncArchetypesService</members>
        <members>NE__Bit2winTranslationController</members>
        <members>NE__ChangePricingAdministrationExtension</members>
        <members>NE__ChangePricingAdministrationExtensionTest</members>
        <!-- ... more NE__* managed package classes ... -->
        <name>ApexClass</name>
    </types>
</Package>

We note this as a possible clue; however, the failure is not limited to managed-package components (see the statistics below).


Expected result

All sf project retrieve start invocations should complete successfully and download the requested metadata to the output directory, or fail with a clear, actionable error tied to a specific problematic member.


Actual result

1424 out of 2114 packages (67.4%) fail with the same isomorphic-git checksum error. The failure is deterministic: packages that fail during the full run also fail when retried individually days later.

Aggregate statistics from the full run

File size          : 3.6 GB of raw log output
Total packages     : 2114
  ✔  Successes     :  690  (32.6%)
  ✘  Failures      : 1424  (67.4%)
     ├ failed after all retries (definitive) : 1415
     └ eventually succeeded after ≥1 retry   :    9

Each failing package was retried up to 3 times by our automation before being marked as a definitive failure.
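
The retry policy our automation applied can be expressed as a small helper. The two stand-in tasks below are purely illustrative; they mirror the two observed behaviors (deterministic failure on every attempt vs. eventual success after a retry).

```python
def retry(task, attempts=3):
    """Run `task` up to `attempts` times; return (succeeded, tries_used)."""
    for i in range(1, attempts + 1):
        if task():
            return True, i
    return False, attempts

# Stand-in for the 1415 packages that fail deterministically:
always_fails = lambda: False
print(retry(always_fails))         # (False, 3)

# Stand-in for one of the 9 packages that eventually succeeded:
flaky = iter([False, True])
print(retry(lambda: next(flaky)))  # (True, 2)
```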

Top 28 failing packages by total elapsed time

The suffix _N in the metadata name indicates the batch split index (batches of 50 members each):

Metadata                    Retry 0    Retry 1    Retry 2     TOTAL
--------------------------------------------------------------------
Report_575                    5m 0s      5m 0s      5m 0s    15m 0s
Flow_6                       10m 0s      1m 0s      16.4s   11m 16s
CustomApplication_0           3m 0s      2m 0s      3m 0s     8m 0s
Report_541                    2m 0s      2m 0s      2m 0s     6m 0s
Flow_3                        5m 0s      25.1s      17.3s    5m 42s
Flow_10                       4m 0s      22.1s      19.1s    4m 41s
Flow_4                        4m 0s      19.7s      16.9s    4m 36s
Flow_8                        4m 0s      16.1s       7.9s    4m 23s
CustomObject_11               2m 0s      1m 0s      1m 0s     4m 0s
Flow_7                        3m 0s      20.5s      10.8s    3m 31s
Flow_5                        3m 0s      18.9s      12.1s    3m 31s
CustomObject_27               2m 0s      25.6s      34.5s    3m 0s
Report_585                    1m 0s      1m 0s      1m 0s     3m 0s
ExperienceBundle_0            1m 0s      1m 0s      1m 0s     3m 0s
CustomObject_2                1m 0s      1m 0s      58.6s    2m 58s
CustomObject_25               2m 0s      39.9s      18.6s    2m 58s
CustomObject_28               2m 0s      29.5s      19.2s    2m 48s
PermissionSet_3               53.4s      51.2s      1m 0s    2m 44s
CustomObject_31               2m 0s      20.5s      19.6s    2m 40s
CustomObject_17               2m 0s      20.8s      15.5s    2m 36s
CustomObject_7                1m 0s      40.5s      49.3s    2m 29s
CustomObject_26               1m 0s      37.7s      36.4s    2m 14s
CustomObject_16               55.8s      36.4s      28.3s     2m 0s
Report_568                    35.2s      37.9s      40.4s    1m 53s
Report_436                    35.1s      36.8s      38.3s    1m 50s
CustomObject_20               1m 0s      25.2s      21.9s    1m 47s
CustomObject_21               1m 0s      18.6s      27.1s    1m 45s
CustomObject_22               1m 0s      20.4s      22.8s    1m 43s

Observations on affected metadata types

The failures span virtually every metadata category retrieved from this org: Report, Flow, CustomObject, ApexClass, AuraDefinitionBundle, CustomMetadata, CustomApplication, EmailTemplate, ExperienceBundle, PermissionSet, Queue, ConversationMessageDefinition, and more.

Crucially, the same metadata type appears among both successes and failures: for example, Flow_1 may succeed while Flow_6 fails. This rules out any single metadata type as the root cause and strongly suggests a server-side or CLI-internal state issue rather than a problem with specific metadata content.

The checksum values in the error are always identical across all failures and all retries:

  • expected: 9c5c572820b3a6b4497ccc765f9516316e9d1f07
  • actual: a792b2660a21e9d230fb3bba873862e8cbcb1e8f

The stability of these two hashes across thousands of independent API calls (different packages, different timing, different retry attempts) strongly suggests this is a corruption or inconsistency in a cached/persisted Git index file on the CLI side, not a transient network or server issue.
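
If this hypothesis is right, the corruption should be directly verifiable on disk: a git index file ends with a 20-byte SHA-1 trailer computed over everything that precedes it, and isomorphic-git raises this error when the recomputed hash and the trailer disagree. The sketch below shows the check on synthetic bytes; the on-disk location of the index the CLI's source tracking uses varies by version, so any concrete path is an assumption you would need to confirm before applying this to a real file.

```python
import hashlib

def verify_git_index(data: bytes):
    """Recompute the trailing checksum of a git index buffer the way
    isomorphic-git does: SHA-1 over all bytes except the last 20,
    compared against the 20-byte trailer itself."""
    body, trailer = data[:-20], data[-20:]
    computed = hashlib.sha1(body).digest()
    return computed == trailer, computed.hex(), trailer.hex()

# Synthetic demo: a well-formed buffer, then the same buffer with one
# corrupted byte (these are NOT real index contents, just placeholders).
body = b"DIRC" + bytes(8) + b"fake-entries"
good = body + hashlib.sha1(body).digest()
bad = b"X" + good[1:]  # flip the first byte -> trailer no longer matches

print(verify_git_index(good)[0])  # True
print(verify_git_index(bad)[0])   # False
```

If a cached index file consistently fails this check with the same pair of hashes seen in the error, that would support the stale/corrupted-cache theory, and clearing the CLI's local source-tracking state would be a natural next experiment.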


Additional information

  • Memory anomaly: during this run, each sf process was consuming ~4 GB of RAM on this specific org. This does not happen with other orgs. We do not know if this is related.
  • No reproduction on pipeline: the issue only manifests at full-org scale. Small subsets retrieve fine.
  • Identical checksums across all failures: the expected and actual SHA1 values in the isomorphic-git error are literally the same string in every single failing invocation across the entire 3.6 GB log. This is not random corruption.
  • Org type: production sandbox with managed packages installed (multiple namespaces including NE__).
  • CLI version in use: 2.110.3 (upgrade blocked by #3493, "Using --flags-dir on (third-party) sf plugins does not work").

We are happy to provide additional log excerpts, the sf doctor output, or any other diagnostic information that might help narrow down the root cause. We understand this is a complex scenario and we appreciate your time.


System Information

{
  "architecture": "darwin-arm64",
  "cliVersion": "@salesforce/cli/2.123.1",
  "nodeVersion": "node-v24.13.1",
  "osVersion": "Darwin 25.2.0",
  "rootPath": "/opt/homebrew/lib/node_modules/@salesforce/cli",
  "shell": "zsh",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.2.35 (core)",
    "@oclif/plugin-commands 4.1.34 (core)",
    "@oclif/plugin-help 6.2.33 (core)",
    "@oclif/plugin-not-found 3.2.68 (core)",
    "@oclif/plugin-plugins 5.4.48 (core)",
    "@oclif/plugin-search 1.2.32 (core)",
    "@oclif/plugin-update 4.7.8 (core)",
    "@oclif/plugin-version 2.2.33 (core)",
    "@oclif/plugin-warn-if-update-available 3.1.48 (core)",
    "@oclif/plugin-which 3.2.40 (core)",
    "@salesforce/cli 2.110.3 (core)",
    "agent 1.24.13 (core)",
    "apex 3.8.3 (core)",
    "api 1.3.3 (core)",
    "auth 3.9.9 (core)",
    "data 4.0.58 (core)",
    "deploy-retrieve 3.23.3 (core)",
    "info 3.4.88 (core)",
    "limits 3.3.67 (core)",
    "marketplace 1.3.8 (core)",
    "org 5.9.32 (core)",
    "packaging 2.20.5 (core)",
    "schema 3.3.82 (core)",
    "settings 2.4.48 (core)",
    "sobject 1.4.73 (core)",
    "telemetry 3.6.58 (core)",
    "templates 56.3.65 (core)",
    "trust 3.7.113 (core)",
    "user 3.6.38 (core)",
    "apex-code-coverage-transformer 2.14.1 (user) published 120 days ago (Mon Oct 27 2025) (latest is 2.17.0)",
    "sfdx-plugin-source-read 1.5.6 (user) published 164 days ago (Sun Sep 14 2025)"
  ]
}

