Skip to content

perf(zero_trust_list): optimize large list performance with hash-base…#6901

Open
rexscaria wants to merge 12 commits intocloudflare:mainfrom
rexscaria:perf/zero-trust-list-items-optimization
Open

perf(zero_trust_list): optimize large list performance with hash-base…#6901
rexscaria wants to merge 12 commits intocloudflare:mainfrom
rexscaria:perf/zero-trust-list-items-optimization

Conversation

@rexscaria
Copy link
Copy Markdown
Contributor

@rexscaria rexscaria commented Mar 10, 2026

ZT Lists with 2000+ items were taking 70+ seconds to plan due to Terraform's expensive set comparison for nested objects. This optimization:

We have to use set for lists to make sure we don;t have duplicates. But set comparison is painfully slower than list comparison in terraform. The comparison is happening in the set internals library, same place as the performance bottleneck is observed. We do not want to make them list and introduce bugs. Instead we are suggesting a new modifyPlan with hash based comparison of the items.

  1. In Read(): Preserve state items instead of replacing with API items
  2. In ModifyPlan(): Compare items using fast SHA256 hash (~0.5ms for 5000 items) instead of Terraform's set diff (~70s for 2000 items)

Performance improvement: ~70,000x faster for no-change plans on large lists.

Trade-off: External drift detection on individual items is reduced, but Terraform config remains the source of truth.

Tests added. I can add an acceptance test ASAP

@rexscaria rexscaria requested a review from a team as a code owner March 10, 2026 22:28
…d comparison

Lists with 2000+ items were taking 70+ seconds to plan due to Terraform's
expensive set comparison for nested objects. This optimization:

1. In Read(): Preserve state items instead of replacing with API items
2. In ModifyPlan(): Compare items using fast SHA256 hash (~0.5ms for 5000
   items) instead of Terraform's set diff (~70s for 2000 items)

Performance improvement: ~70,000x faster for no-change plans on large lists.

Trade-off: External drift detection on individual items is reduced, but
Terraform config remains the source of truth.
@rexscaria rexscaria force-pushed the perf/zero-trust-list-items-optimization branch from c3e165a to 568b570 Compare March 10, 2026 22:35
Rex Scaria added 11 commits March 24, 2026 10:45
- Fix Read() to fetch items via GET /gateway/lists/{id}/items so external
  drift is detectable; removes the existingItems state-preservation workaround
- Fix items-fetch error handling to only remove resource from state on 404;
  transient errors (timeouts, 5xx, 429) now surface as proper diagnostics
- Fix Description field to use types.StringNull() when absent in API response,
  preventing perpetual diffs on ImportStateVerify
- Fix ModifyPlan to guard against null state (first create) in addition to
  null plan (destroy)
- Fix Delete to remove spurious resp.State.Set() call (no-op in framework)
- Fix all io.ReadAll errors silently discarded with _
- Fix ListExact/ListSizeExact -> SetExact/SetSizeExact in tests (items is a
  SetNestedAttribute; integer index paths don't apply)
- Fix TestAccCloudflareTeamsList_Update to use ConfigStateChecks + SetExact
  instead of broken items.0.value index-based checks
- Replace null-byte separator in computeItemsHash with length-prefixed
  encoding to eliminate field/entry boundary collision classes
- Add modify_plan_test.go with unit tests for all ModifyPlan paths
- Add buildPlan/buildNullPlan helpers; remove reuse of tfsdk.State for plans
- Add teamslistconfig2000items.tf dedicated test template
- Strengthen SameItems/ReorderedItems test assertions to verify plan raw
  equals state raw after suppression, not just non-nil
…elpers

Reduces repeated boilerplate across Create/Update/Read/ImportState without
changing any behaviour.
Move all hand-written logic (fetchItems, suppressItemsDiff, computeItemsHash,
helpers) into items.go so it is clearly separated from the generated resource.go.

resource.go now has two minimal touch points:
  - Read()       calls fetchItems() — one line
  - ModifyPlan() delegates to suppressItemsDiff() — one line

This makes the generated file easy to re-apply on future regenerations without
losing any of our custom behaviour.
An empty list (0 items) and a deleted list (404) both previously returned
(nil, nil) from fetchItems, causing Read() to call RemoveResource on any
list that had no items configured.

Fix: fetchItems now always returns a non-nil slice on success
(empty slice = list exists with 0 items, nil = 404/deleted).

Also:
- Move file-level godoc comment above package declaration
- Correct O(n²) complexity claim to O(n × cost-per-element)
- readBody now defers Body.Close() — intentional improvement over generator
cloudflare#1 computeItemsHash: null and empty string now hash differently. Previously
   types.String.ValueString() collapsed both to "", silently suppressing a
   diff when a user changed description from null to "". Now encoded with
   explicit type tags: null -> "null", string -> "s:<value>".

cloudflare#2 computeItemsHash: nil item now encodes as "nil" sentinel instead of ""
   (zero value), preventing collision with a real item whose value and
   description are both empty.

cloudflare#3 gatewayItemDescription: also check IsMissing() so absent JSON fields are
   stored as StringNull() rather than StringValue(""), avoiding perpetual
   plan diffs for configs that omit description.

cloudflare#4 readBody moved back to resource.go where it is used. Removed dead code
   and unjustified io/net/http imports from items.go.

cloudflare#5 suppressItemsDiff: replace resp.Plan.Set(ctx, plan) with
   resp.Plan.SetAttribute(ctx, path.Root("items"), ...) to surgically
   update only the items attribute without clobbering unknown computed
   values in other plan fields.

Update hash_test.go to reflect the corrected semantics: null != empty string,
nil item != empty-value item.
…nation assertion

cloudflare#1 suppressItemsDiff: bail out before hashing if any plan item contains an
   unknown Value or Description. Previously, unknown fields collapsed to ""
   or "null" via ValueString(), which could match state and suppress a real
   diff when items reference computed outputs from other resources.

cloudflare#2 gatewayItemDescription: remove dead IsMissing() check. IsNull() already
   covers missing fields (status <= null), so the extra check was unreachable
   and could mislead future readers about field semantics.

cloudflare#3 fetchItems: assert len(page.Result) <= 1 and return an error if violated,
   rather than silently dropping page.Result[1:] and corrupting state.
…nostics

cloudflare#1 computeItemsHash: null/unknown/empty now all have distinct type tags.
   Previously unknown collapsed to empty string (value) or "null" (desc),
   enabling false-positive diff suppression when items reference computed
   outputs. Encoding is now v:null/v:unknown/v(n):<val> and d:null/d:unknown/d(n):<val>.

cloudflare#2 suppressItemsDiff: use local diag.Diagnostics for Plan.Get/State.Get so
   deserialization failures in this best-effort hook do not surface as
   provider errors in the plan response.

cloudflare#3 suppressItemsDiff: add !IsKnown() guard so the function exits early when
   the plan or state is in an unknown state (resource ID not yet computed).

cloudflare#4 fetchItems: check page.JSON.Result.IsNull() and return an error when the
   API returns a null result field rather than treating it as 0 items.

Update hash_test.go: fix concatenation collision test to use new encoding
format; add unknown vs null tests for value and description.
…gination guard

- A null page.JSON.Result now correctly returns an empty items slice (list
  exists with 0 items) instead of an error. The Cloudflare API may return
  null for result on an empty list, which is valid.

- Removed the len(page.Result) > 1 guard. SinglePage[[]GatewayItem].Result
  is [][]GatewayItem where the outer slice reflects the API response
  structure, not pagination pages. The guard was semantically wrong and
  would false-error on valid responses. The inner loop now directly iterates
  page.Result[0] guarded by the null + length check.
…-element guard

cloudflare#1 fetchItems: iterate all outer Result batches instead of only Result[0].
   Silent data loss if API ever returns more than one outer element.

cloudflare#2 suppressItemsDiff: treat nil Items pointer as equivalent to empty slice
   so configs with items=[] and state with nil items are compared correctly.

cloudflare#3 suppressItemsDiff: filter nil elements from stateItems before SetAttribute
   to avoid potential panic in the framework's slice reflection.

cloudflare#4 fetchItems: remove redundant page.JSON.Result.IsNull() check — it is
   implied by len(page.Result) > 0 since a null field decodes to nil/empty.
…skipping in hash

cloudflare#1 suppressItemsDiff: early-return when both plan.Items and state.Items are
   nil (items not configured at all), preventing the function from writing an
   empty items=[] into the plan and causing perpetual diffs for configs that
   omit the items block entirely.

cloudflare#2 computeItemsHash: skip nil elements instead of encoding them as 'nil',
   matching the nil-stripping behavior in suppressItemsDiff. Previously the
   hash included nil tokens but the written plan value did not, creating a
   latent inconsistency in nil semantics between the two functions.
…Description

apijson.Field has four statuses: missing(0), null(1), invalid(2), valid(3).
IsNull() only covers status <= null, missing the invalid case where the field
is present in the JSON response but failed to decode.

When status == invalid, item.Description is the Go zero value "". Without
this fix, an invalid field is stored in state as "" instead of null, causing
a perpetual spurious diff that hash suppression cannot resolve.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant