Description
Summary
We are performing a large-scale metadata download from a production Salesforce org (what we call internally a "full-retrieve"), which splits all org metadata into batches of ~50 members and retrieves them in parallel using sf project retrieve start.
After a ~10-hour run, 1424 out of 2114 retrieve packages fail with the following error — consistently and reproducibly, even when the failed packages are re-run individually days later:
Error (MetadataTransferError): Metadata API request failed: An internal error caused this command to fail. isomorphic-git error:
Invalid checksum in GitIndex buffer: expected 9c5c572820b3a6b4497ccc765f9516316e9d1f07 but saw a792b2660a21e9d230fb3bba873862e8cbcb1e8f
Context and Background
- We are intentionally running an older version of the CLI (2.110.3) because upgrading is currently blocked by open bug #3493, "Using --flags-dir on (third-party) sf plugins does not work", which is still in progress. We are aware a newer version is available; updating is not currently feasible for us.
- The analysis of this issue required several days because the problem does not reproduce on small metadata slices or in isolation. It only manifests at full-org scale. We were unable to pinpoint it on our CI/CD pipeline and had to run the full retrieve locally over approximately 10 hours of wall-clock time.
- During the run, we observed that each sf process was consuming close to 4 GB of RAM (this does not happen on other orgs we work with). As a consequence, we limited concurrency to 3 parallel workers to avoid memory exhaustion. We also lowered the per-package timeout from the usual 30 minutes to 10 minutes to reduce total run time while still capturing failures.
- The execution produced 3.6 GB of plain-text log output, which we had to parse with a custom Python script using mmap to avoid loading the entire file into memory.
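For readers who want to run a similar scan, the following is a minimal sketch of an mmap-based log pass. It is not the script used for this report; the function name count_checksum_errors and the idea of counting (expected, saw) pairs are assumptions made for illustration. The regular expression is taken from the error text quoted in this report.

```python
import mmap
import re
from collections import Counter

# Pattern copied from the error message quoted in this report.
CHECKSUM_ERROR = re.compile(
    rb"Invalid checksum in GitIndex buffer: "
    rb"expected ([0-9a-f]{40}) but saw ([0-9a-f]{40})"
)


def count_checksum_errors(log_path: str) -> Counter:
    """Count (expected, saw) hash pairs without loading the log into memory.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    pairs: Counter = Counter()
    with open(log_path, "rb") as handle:
        # Length 0 maps the whole file; the OS pages it in on demand.
        # Note: mmap raises ValueError on an empty file.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for match in CHECKSUM_ERROR.finditer(view):
                expected = match.group(1).decode("ascii")
                saw = match.group(2).decode("ascii")
                pairs[(expected, saw)] += 1
    return pairs
```

If every failure carries the same two hashes, the returned counter has a single key, which is a quick way to confirm the "identical checksums" observation on a large log.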
Steps To Reproduce
We cannot provide a public repository because the org is a private production environment. However, the pattern is:
- Enumerate all metadata members from the org (using sf org list metadata).
- Split them into batches of ~50 members per package.xml.
- Run sf project retrieve start for each batch in parallel (3 workers), with --wait 10 (10-minute timeout).
- Observe that the majority of batches fail with the isomorphic-git checksum error.
- Retry any single failed batch individually; the error persists deterministically.
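The pattern above can be sketched roughly as follows. This is an illustrative outline, not our actual automation: the helper names (chunk, render_manifest, retrieve_command, run_all) are hypothetical, and the member list is assumed to have been collected beforehand. The CLI flags are the ones shown in the repro command in this report.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape


def chunk(members, size=50):
    """Split a list of member names into batches of at most `size`."""
    return [members[i:i + size] for i in range(0, len(members), size)]


def render_manifest(metadata_type, members):
    """Render a package.xml body for one metadata type and one batch."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<Package xmlns="http://soap.sforce.com/2006/04/metadata">',
        "    <types>",
    ]
    lines += [f"        <members>{escape(name)}</members>" for name in members]
    lines += [f"        <name>{escape(metadata_type)}</name>", "    </types>", "</Package>"]
    return "\n".join(lines) + "\n"


def retrieve_command(manifest_path, output_dir, target_org, wait_minutes=10):
    """Build the argument list for one retrieve, mirroring the repro command."""
    return [
        "sf", "project", "retrieve", "start",
        "--target-org", target_org,
        "--ignore-conflicts",
        "--manifest", manifest_path,
        "--output-dir", output_dir,
        "--wait", str(wait_minutes),
    ]


def run_all(commands, workers=3):
    """Run the commands with a small worker pool and return their exit codes."""
    def run_one(command):
        return subprocess.run(command, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

A thread pool is enough here because each worker only waits on a child process; the limit of 3 workers reflects the memory constraint described in the background section.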
Minimal repro of a failing single-package retrieve (run after the full session):
sf project retrieve start \
--target-org smartflow \
--ignore-conflicts \
--manifest fullRetrieve_Artifact_3/package_ApexClass_29.xml \
--output-dir fullRetrieve_Artifact_3/retrievePack/dir_087_29 \
--wait 10
Output:
✔ Preparing retrieve request 5ms
✔ Sending request to org 155ms
✘ Waiting for the org to respond 2.05s
◼ Done
Status: In Progress
Elapsed Time: 2.22s
Error (MetadataTransferError): Metadata API request failed: An internal error caused this command to fail. isomorphic-git error:
Invalid checksum in GitIndex buffer: expected 9c5c572820b3a6b4497ccc765f9516316e9d1f07 but saw a792b2660a21e9d230fb3bba873862e8cbcb1e8f
The package ApexClass_29 contains exclusively Apex classes from managed/installed packages (namespace prefix NE__), e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>NE__Bit2Win_Translation_Controller</members>
<members>NE__Bit2winChangeOfferImplementation</members>
<members>NE__Bit2winPostInstallUtilities</members>
<members>NE__Bit2winSyncArchetypesService</members>
<members>NE__Bit2winTranslationController</members>
<members>NE__ChangePricingAdministrationExtension</members>
<members>NE__ChangePricingAdministrationExtensionTest</members>
<!-- ... more NE__* managed package classes ... -->
<name>ApexClass</name>
</types>
</Package>
We note this as a possible clue; however, the failure is not limited to managed-package components (see statistics below).
Expected result
All sf project retrieve start invocations should complete successfully and download the requested metadata to the output directory, or fail with a clear, actionable error tied to a specific problematic member.
Actual result
1424 out of 2114 packages (67.4%) fail with the same isomorphic-git checksum error. The failure is deterministic: packages that fail during the full run also fail when retried individually days later.
Aggregate statistics from the full run
File size : 3.6 GB of raw log output
Total packages : 2114
✔ Successes : 690 (32.6%)
✘ Failures : 1424 (67.4%)
├ retry>=1, then definitively failed : 1415
└ retry>=1 only (succeeded eventually): 9
Each failing package was retried up to 3 times by our automation before being marked as a definitive failure.
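As a rough illustration of the retry policy and of how the percentages above follow from the counts, here is a small sketch. The names run_with_retries and percent are hypothetical, and reading "up to 3 times" as three attempts in total (Retry 0, 1, 2) is an assumption.

```python
from typing import Callable, Tuple


def run_with_retries(attempt: Callable[[int], bool], max_attempts: int = 3) -> Tuple[bool, int]:
    """Call attempt(n) for n = 0, 1, ... until it succeeds or attempts run out.

    Returns (succeeded, attempts_used). Assumes three attempts in total.
    """
    for n in range(max_attempts):
        if attempt(n):
            return True, n + 1
    return False, max_attempts


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Counts taken from the statistics in this report.
TOTAL, SUCCESSES, FAILURES = 2114, 690, 1424
assert SUCCESSES + FAILURES == TOTAL
```

With these counts, percent(SUCCESSES, TOTAL) and percent(FAILURES, TOTAL) reproduce the 32.6% and 67.4% figures quoted above.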
Top 28 failing packages by total elapsed time
The suffix _N in the metadata name indicates the batch split index (batches of 50 members each):
Metadata Retry 0 Retry 1 Retry 2 TOTAL
--------------------------------------------------------------------
Report_575 5m 0s 5m 0s 5m 0s 15m 0s
Flow_6 10m 0s 1m 0s 16.4s 11m 16s
CustomApplication_0 3m 0s 2m 0s 3m 0s 8m 0s
Report_541 2m 0s 2m 0s 2m 0s 6m 0s
Flow_3 5m 0s 25.1s 17.3s 5m 42s
Flow_10 4m 0s 22.1s 19.1s 4m 41s
Flow_4 4m 0s 19.7s 16.9s 4m 36s
Flow_8 4m 0s 16.1s 7.9s 4m 23s
CustomObject_11 2m 0s 1m 0s 1m 0s 4m 0s
Flow_7 3m 0s 20.5s 10.8s 3m 31s
Flow_5 3m 0s 18.9s 12.1s 3m 31s
CustomObject_27 2m 0s 25.6s 34.5s 3m 0s
Report_585 1m 0s 1m 0s 1m 0s 3m 0s
ExperienceBundle_0 1m 0s 1m 0s 1m 0s 3m 0s
CustomObject_2 1m 0s 1m 0s 58.6s 2m 58s
CustomObject_25 2m 0s 39.9s 18.6s 2m 58s
CustomObject_28 2m 0s 29.5s 19.2s 2m 48s
PermissionSet_3 53.4s 51.2s 1m 0s 2m 44s
CustomObject_31 2m 0s 20.5s 19.6s 2m 40s
CustomObject_17 2m 0s 20.8s 15.5s 2m 36s
CustomObject_7 1m 0s 40.5s 49.3s 2m 29s
CustomObject_26 1m 0s 37.7s 36.4s 2m 14s
CustomObject_16 55.8s 36.4s 28.3s 2m 0s
Report_568 35.2s 37.9s 40.4s 1m 53s
Report_436 35.1s 36.8s 38.3s 1m 50s
CustomObject_20 1m 0s 25.2s 21.9s 1m 47s
CustomObject_21 1m 0s 18.6s 27.1s 1m 45s
CustomObject_22 1m 0s 20.4s 22.8s 1m 43s
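To cross-check the TOTAL column, the durations can be converted to seconds and summed. The sketch below is an assumption about how such a check might look; to_seconds and row_total are hypothetical names. The displayed per-retry values are rounded, so a recomputed total can differ from the table by about a second (Flow_8 is one such row).

```python
import re

# Matches parts such as "5m", "0s" or "16.4s".
_PART = re.compile(r"(\d+(?:\.\d+)?)\s*([ms])")


def to_seconds(text: str) -> float:
    """Convert a duration like '10m 0s' or '16.4s' to seconds."""
    total = 0.0
    for value, unit in _PART.findall(text):
        total += float(value) * (60 if unit == "m" else 1)
    return total


def row_total(*durations: str) -> float:
    """Sum the per-retry durations of one table row, in seconds."""
    return sum(to_seconds(d) for d in durations)
```

For example, row_total("10m 0s", "1m 0s", "16.4s") gives roughly 676.4 seconds, which matches the 11m 16s shown for Flow_6.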
Observations on affected metadata types
The failures span virtually every metadata category retrieved from this org: Report, Flow, CustomObject, ApexClass, AuraDefinitionBundle, CustomMetadata, CustomApplication, EmailTemplate, ExperienceBundle, PermissionSet, Queue, ConversationMessageDefinition, and more.
Crucially, the same metadata type appears in both the success and failure tables — for example, Flow_1 may succeed while Flow_6 fails. This rules out any single metadata type being the root cause and strongly suggests a server-side or CLI-internal state issue rather than a problem with specific metadata content.
The checksum values in the error are always identical across all failures and all retries:
- expected: 9c5c572820b3a6b4497ccc765f9516316e9d1f07
- actual: a792b2660a21e9d230fb3bba873862e8cbcb1e8f
The stability of these two hashes across thousands of independent API calls (different packages, different timing, different retry attempts) strongly suggests this is a corruption or inconsistency in a cached/persisted Git index file on the CLI side, not a transient network or server issue.
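To test this hypothesis against a concrete file, one could check the trailer of a candidate index directly. In the Git index format, the file starts with the signature DIRC and ends with a 20-byte SHA-1 computed over all preceding bytes. The sketch below recomputes that hash and returns it alongside the stored one. Which index file the CLI actually reads is not established in this report, so the path is left to the caller, and check_index_trailer is a hypothetical name.

```python
import hashlib
from typing import Tuple


def check_index_trailer(data: bytes) -> Tuple[str, str]:
    """Return (stored, computed) SHA-1 hex digests for a Git index buffer.

    Assumes a SHA-1 index: a 12-byte header starting with b"DIRC" and a
    20-byte checksum over everything before it at the end of the file.
    """
    if len(data) < 32 or data[:4] != b"DIRC":
        raise ValueError("not a Git index buffer")
    stored = data[-20:].hex()
    computed = hashlib.sha1(data[:-20]).hexdigest()
    return stored, computed
```

Usage would be along the lines of check_index_trailer(pathlib.Path(candidate).read_bytes()), where candidate is whatever index file is under suspicion. If the two returned values match the pair of hashes in the error message, that file is the likely source of the failure.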
Additional information
- Memory anomaly: during this run, each sf process was consuming ~4 GB of RAM on this specific org. This does not happen with other orgs. We do not know if this is related.
- No reproduction on pipeline: the issue only manifests at full-org scale. Small subsets retrieve fine.
- Identical checksums across all failures: the expected and actual SHA1 values in the isomorphic-git error are literally the same string in every single failing invocation across the entire 3.6 GB log. This is not random corruption.
- Org type: production sandbox with managed packages installed (multiple namespaces including NE__).
- CLI version in use: 2.110.3 (upgrade blocked by #3493, "Using --flags-dir on (third-party) sf plugins does not work").
We are happy to provide additional log excerpts, the sf doctor output, or any other diagnostic information that might help narrow down the root cause. We understand this is a complex scenario and we appreciate your time.
System Information
- Shell: zsh on macOS
- CLI version: 2.110.3 (upgrade intentionally held back due to #3493, "Using --flags-dir on (third-party) sf plugins does not work")
{
"architecture": "darwin-arm64",
"cliVersion": "@salesforce/cli/2.123.1",
"nodeVersion": "node-v24.13.1",
"osVersion": "Darwin 25.2.0",
"rootPath": "/opt/homebrew/lib/node_modules/@salesforce/cli",
"shell": "zsh",
"pluginVersions": [
"@oclif/plugin-autocomplete 3.2.35 (core)",
"@oclif/plugin-commands 4.1.34 (core)",
"@oclif/plugin-help 6.2.33 (core)",
"@oclif/plugin-not-found 3.2.68 (core)",
"@oclif/plugin-plugins 5.4.48 (core)",
"@oclif/plugin-search 1.2.32 (core)",
"@oclif/plugin-update 4.7.8 (core)",
"@oclif/plugin-version 2.2.33 (core)",
"@oclif/plugin-warn-if-update-available 3.1.48 (core)",
"@oclif/plugin-which 3.2.40 (core)",
"@salesforce/cli 2.110.3 (core)",
"agent 1.24.13 (core)",
"apex 3.8.3 (core)",
"api 1.3.3 (core)",
"auth 3.9.9 (core)",
"data 4.0.58 (core)",
"deploy-retrieve 3.23.3 (core)",
"info 3.4.88 (core)",
"limits 3.3.67 (core)",
"marketplace 1.3.8 (core)",
"org 5.9.32 (core)",
"packaging 2.20.5 (core)",
"schema 3.3.82 (core)",
"settings 2.4.48 (core)",
"sobject 1.4.73 (core)",
"telemetry 3.6.58 (core)",
"templates 56.3.65 (core)",
"trust 3.7.113 (core)",
"user 3.6.38 (core)",
"apex-code-coverage-transformer 2.14.1 (user) published 120 days ago (Mon Oct 27 2025) (latest is 2.17.0)",
"sfdx-plugin-source-read 1.5.6 (user) published 164 days ago (Sun Sep 14 2025)"
]
}