
[MISC] Always fastcache.#2751

Open
hughperkins wants to merge 67 commits into Genesis-Embodied-AI:main from hughperkins:hp/always-fastcache

Conversation

@hughperkins
Collaborator

Description

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of the CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

- Collapse the `_kernel_set_gravity_field` / `_kernel_set_gravity_ndarray`
  pair into a single `_kernel_set_gravity(..., gravity: qd.Tensor)`.
  The `qd.Tensor` annotation routes Ndarray args through the ndarray
  feature path and Field args through the template path, so one kernel
  covers both backends with no runtime branch.

- Broaden the dump/load-checkpoint isinstance guards to also accept
  `qd.Tensor` wrappers. Today Genesis allocates bare impls via
  `qd.field` / `qd.ndarray` so no current call site changes behavior;
  this is a forward-compat superset for a future where `qd.tensor(...)`
  factory-allocated wrappers start showing up on these attrs.

- Update type annotations on `qd_to_python`/`qd_to_torch`/`qd_to_numpy`,
  `Solver.qpos`, and `array_class.V_ANNOTATION` accordingly.

Unchanged: `issubclass(data_type, qd.Field)` zero-copy branch in
`qd_to_python` (still bare-impl specific; wrapper-side zero-copy is
out of scope for this PR).
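The kernel-collapse idea above can be illustrated with a pure-Python analogy. The real routing happens inside Quadrants' `qd.Tensor` annotation machinery, which is not reproducible here; this sketch uses `functools.singledispatch` and hypothetical `FakeField`/`FakeNdarray` stand-ins purely to show the shape of the change: one entry point, with the argument's type picking the path instead of a runtime branch in the kernel body.

```python
import functools


class FakeField:          # illustrative stand-in for a qd.field impl
    def __init__(self, data):
        self.data = data


class FakeNdarray:        # illustrative stand-in for a qd.ndarray impl
    def __init__(self, data):
        self.data = data


# Before: two near-identical kernels, one per backend.
# After: one entry point; the argument's type selects the path at
# dispatch time, so there is no backend branch in the hot body.
@functools.singledispatch
def set_gravity(gravity):
    raise TypeError(f"unsupported backend arg: {type(gravity).__name__}")


@set_gravity.register
def _(gravity: FakeField):
    return ("field-path", gravity.data)


@set_gravity.register
def _(gravity: FakeNdarray):
    return ("ndarray-path", gravity.data)
```

This is only an analogy for the dispatch structure; Quadrants resolves the backend at kernel-instantiation time via the template/ndarray feature paths, not via `singledispatch`.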
Allocate the six Tier-1 constraint-state fields read on every
constraint by the linesearch inner loop — Jaref, jv, efc_D,
efc_frictionloss, diag, active — via a new V_TENSOR factory that
returns qd.Tensor wrappers around the same Field / Ndarray that the
existing V() allocator would have produced.

The wrapper is unwrapped back to the bare impl by Kernel.__call__
before the JIT cache key is computed (Quadrants stork-19), so the
compiled kernel code is identical to today and there is no per-call
overhead. Host-side state.Jaref[i_c, i_b] reads continue to
short-circuit through the wrapper's __getitem__ to impl[i_c, i_b]
under the identity layout this commit uses.

Phase 1 is plumbing only: identity layout, no behavior change
expected. Acceptance is full Genesis unit suite + bench_cluster_wandb
flat within ±1% on both gs.use_ndarray={True,False}. Phase 2 will
collapse the constraint_layout_transposed static-config flag onto
per-tensor layout= and is tracked in the design doc.

Bisection escape hatch: GS_TENSOR_BARE_TIER1=1 reverts V_TENSOR to
the legacy bare V(...) allocator at process start. Useful if a bench
regression turns up and we need to confirm the wrapper is the cause.
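The escape-hatch pattern described above can be sketched in a few lines. The allocator bodies are hypothetical placeholders (the real V_TENSOR / V allocators live in Genesis's array_class module); the point is that the flag is read once, at process start, so flipping the environment variable mid-run has no effect.

```python
import os


def make_allocator():
    # Read GS_TENSOR_BARE_TIER1 once, matching "reverts ... at process
    # start". The two lambdas are stand-ins for the real allocators.
    if os.environ.get("GS_TENSOR_BARE_TIER1") == "1":
        return lambda shape: ("bare", shape)     # legacy bare V(...) path
    return lambda shape: ("wrapped", shape)      # qd.Tensor wrapper path


alloc = make_allocator()
```

A bench bisection then only needs `GS_TENSOR_BARE_TIER1=1 python bench.py` to rule the wrapper in or out.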

Plan: perso_hugh/doc/genesis_tensor_migration.md
Quadrants PR Genesis-Embodied-AI#446 renamed @qd.kernel(gpu_graph=) -> @qd.kernel(graph=)
in early April. Genesis still passes the old name in
_kernel_solve_gpu_graph, which fails on every Quadrants version that
includes the rename. Update the call site to use the new name.

This is unrelated to the stork-20 Phase-1 wrapper migration; just
unblocks Genesis tests on the current Quadrants editable build.
# Conflicts:
#	genesis/engine/solvers/rigid/constraint/solver_breakdown.py
…-1 fields as qd.Tensor

Use qd.tensor(...) directly at the 6 Tier-1 allocation sites, gated by
a single _TENSOR_BACKEND module constant. Remove the V_TENSOR wrapper
and the GS_TENSOR_BARE_TIER1 bisection fallback (the 21% regression
diagnosed against the WandB baseline was an upstream Genesis perf gain
we hadn't merged, not a qd.Tensor regression — see
perso_hugh/doc/regression_2026apr23_stork_log.md).

Annotate the 6 Tier-1 qd.Tensor fields (active, diag, Jaref,
efc_frictionloss, efc_D, jv) as qd.Tensor in StructConstraintState.
Leave V_ANNOTATION in place everywhere else.

No functional change.
Replace every V(dtype=, shape=) allocation with qd.tensor(dtype, shape,
backend=_TENSOR_BACKEND), every V_VEC(...) with qd.Vector.tensor(...),
and every V_MAT(...) with qd.Matrix.tensor(...).

Replace all V_ANNOTATION kernel/func parameter annotations with qd.Tensor
across 13 files (solver, collider, abd, path_planning, etc.).

Remove the V, V_VEC, V_MAT, V_ANNOTATION definitions from array_class.py.
Only DATA_ORIENTED and _TENSOR_BACKEND remain as module-level helpers.

Summary:
- 431 V() → qd.tensor()
- 72 V_VEC() → qd.Vector.tensor()
- 3 V_MAT() → qd.Matrix.tensor()
- ~550 V_ANNOTATION → qd.Tensor (annotations)
- 1 array_class.V() in base_solver.py → qd.tensor()
qd.Tensor as a parameter annotation only works for top-level @qd.kernel
args (the template mapper handles unwrapping). For @qd.func parameters
called from within kernels, the AST transformer needs qd.template().

Reverts 37 @qd.func parameter annotations from qd.Tensor back to
qd.template() across 9 files. Keeps qd.Tensor for @qd.kernel params
(9 sites) and struct field annotations (~500 sites).
qd.Tensor as a struct field annotation in @qd.data_oriented classes
causes QuadrantsTypeError during AST compilation when the struct is
passed to @qd.func. The data_oriented mechanism needs runtime annotations
(qd.types.ndarray() or qd.template) to properly handle struct fields.

Introduce _STRUCT_FIELD_ANNOTATION (runtime: qd.types.ndarray() or
qd.template; TYPE_CHECKING: union type) and use it for all 501 struct
field annotations. Allocations remain as qd.tensor() and @qd.kernel
parameter annotations remain as qd.Tensor.
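The dual-annotation trick can be sketched with the standard `typing.TYPE_CHECKING` split. The runtime value below is a plain sentinel string; in Genesis it would be `qd.types.ndarray()` or `qd.template` depending on the backend, and the static side would be the `qd.Tensor` union. Class and field names mirror the description but the bodies are illustrative.

```python
import typing

# The type checker sees a precise static type; the runtime machinery
# (which inspects __annotations__) sees whatever value it needs.
if typing.TYPE_CHECKING:
    _STRUCT_FIELD_ANNOTATION = typing.Any          # static-analysis view
else:
    _STRUCT_FIELD_ANNOTATION = "runtime-annotation"  # what the framework inspects


class StructConstraintState:
    # Without `from __future__ import annotations`, these annotation
    # expressions are evaluated at class-definition time and stored in
    # __annotations__, so the framework sees the runtime sentinel.
    Jaref: _STRUCT_FIELD_ANNOTATION
    jv: _STRUCT_FIELD_ANNOTATION
```

The design choice here is that only one name has to change per field, while the static and runtime consumers each get the annotation they can handle.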
All struct field annotations now use qd.Tensor directly.
All struct classes now use @dataclasses.dataclass(frozen=True).
AutoInitMeta and BASE_METACLASS are removed — dataclasses
provides __init__, __eq__, __hash__ natively.

Enabled by Quadrants hp/tensor-stork-23-optC-v2 which adds
support for qd.Tensor struct fields and frozen-dataclass
template args in both FIELD and NDARRAY backends.
…nfig classes

StructColliderStaticConfig, StructGJKStaticConfig, and
StructRigidSimStaticConfig need both mutability (post-construction
assignment) and hashability (passed as kernel template args).
@qd.data_oriented provides hashability; _AutoInitMeta generates
__init__ from annotations. This matches their original decorators.
qd_to_python uses issubclass checks (e.g. qd.MatrixField) to determine
reshaping logic, but qd.Tensor is not a subclass of any Field/Ndarray
type. This caused the m=1 reshape for vector fields to be skipped,
producing transposed data and numerical mismatches in test_data_accessor.

Unwrap qd.Tensor to its raw impl at the entry of each function so all
downstream logic works exactly as it did with raw fields/ndarrays.
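The unwrap-at-entry fix can be sketched as follows. `TensorWrapper`, `FakeMatrixField`, and the function body are stand-ins, not the real Genesis/Quadrants classes; the real `qd_to_python` dispatches on `issubclass`/`isinstance` checks against Field/Ndarray types exactly as the sketch's branch does.

```python
class FakeMatrixField:
    # Stand-in for a qd matrix/vector field impl that the reshape
    # logic recognizes by type.
    pass


class TensorWrapper:
    # Stand-in for qd.Tensor wrapping a bare Field/Ndarray impl.
    def __init__(self, impl):
        self.impl = impl


def _unwrap(value):
    # Peel the wrapper so downstream type dispatch sees the raw impl,
    # exactly as it did before the migration.
    return value.impl if isinstance(value, TensorWrapper) else value


def qd_to_python_sketch(value):
    value = _unwrap(value)  # the one-line entry fix
    if isinstance(value, FakeMatrixField):
        return "matrix-reshape-path"   # e.g. the m=1 reshape branch
    return "plain-path"
```

Without the `_unwrap` call, a wrapped matrix field would silently fall through to the plain path, which is precisely the skipped-reshape bug described above.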
Remove temporary diagnostic prints added to trace test_data_accessor.
Root cause was qd.tensor() returning base Tensor instead of
VectorTensor for compound types, fixed in Quadrants.
All parameters that were previously annotated with array_class.V_ANNOTATION
and changed to qd.template() are now annotated with qd.Tensor instead,
which is the polymorphic annotation that accepts either backend.
qd_to_python already unwraps, and value.shape works on Tensor wrappers.
Keep dtype= and shape= keyword arguments to match the style on main
and reduce diff noise.
…fix blank line

- Inline _run_cycle() into test_backend_switching (only used once)
- Revert _AutoInitMeta back to AutoInitMeta (no visibility change needed)
- Restore blank line before return in AutoInitMeta.__new__
Remove the gs._initialized guard from array_class.py (no longer needed
since V/V_VEC/V_MAT are now functions that evaluate backend at runtime).
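Why the guard became unnecessary can be shown with a minimal sketch: a function that reads the backend at call time picks up whatever `init()` set, whereas a module-level constant would have frozen the value at import time. `_backend` and the function bodies are hypothetical stand-ins for the Genesis globals.

```python
_backend = None  # unknown at import time; set later by init()


def init(backend):
    global _backend
    _backend = backend


def V(shape):
    # Backend is evaluated at call time, so importing this module before
    # init() is safe as long as V() is only *called* afterwards.
    if _backend is None:
        raise RuntimeError("call init() first")
    return (_backend, shape)
```

Under this structure the old import-time guard has nothing left to protect.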

Add test_basic_sim_subprocess that runs a fresh genesis import + init +
build + step in a subprocess, for both field and ndarray backends.
Fastcache is hardcoded at module load via @qd.kernel(fastcache=...),
so it cannot be toggled between init/destroy cycles.
Fastcache silently falls back to normal compilation for unsupported
parameter types (e.g. qd.field), so it's safe to always enable it.
When using ndarray backend, fastcache kicks in; when using field
backend, it gracefully degrades with no error.

- Replace all @qd.kernel(fastcache=gs.use_fastcache) with fastcache=True
- Remove fastcache guard and GS_ENABLE_FASTCACHE env var from gs.init()
- Remove GS_ENABLE_FASTCACHE from conftest.py
- Remove enable_fastcache parametrization and FIXME from backend switching test
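The "silently falls back" behavior can be illustrated with a pure-Python analogy: a memoizing decorator that caches when its arguments support caching and quietly runs uncached when they do not. This mirrors the described behavior of `@qd.kernel(fastcache=True)`; it is not the Quadrants implementation, and hashability here stands in for "supported parameter types".

```python
import functools


def fastcache(fn):
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        try:
            if args in cache:          # fast path for supported args
                return cache[args]
            result = fn(*args)
            cache[args] = result
            return result
        except TypeError:
            # Unsupported (here: unhashable) argument types: silently
            # fall back to a normal, uncached call instead of erroring.
            return fn(*args)
    return wrapper
```

Because the fallback is silent and correct, the flag can be unconditionally True, which is exactly what lets the PR delete the GS_ENABLE_FASTCACHE plumbing.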
Fastcache is always on now, so remove the enable_fastcache parameter
from test_static and test_num_envs, and update expected cache behavior:
- src-ll-cache is always used for ndarray
- fe-ll-cache is never reached for ndarray (fastcache handles it)
- Remove GS_ENABLE_FASTCACHE env var from test_ndarray_no_compile
Fastcache is always on via @qd.kernel(fastcache=True) and silently
falls back for unsupported types. No need for a variable or log.
@hughperkins hughperkins changed the title Hp/always fastcache [MISC] Always fastcache. Apr 30, 2026
The `if gs.use_ndarray else qd.template()` conditionals are no longer
needed: qd.Tensor handles both field and ndarray backends when structs
are flattened into kernel args. The init guard is also unnecessary since
array_class is only imported after gs.init().
PLACEHOLDER is accessed as a global inside kernels (not passed as a
parameter), so it must be a raw field/template-injectable value, not a
qd.Tensor wrapper.
PLACEHOLDER was a dummy tensor passed to func_solve_mass_batch's out_bw
slot in forward mode. Accessing it as a module global inside pure
(fastcache=True) kernels triggered purity violations.

Fix: change out_bw annotation to qd.template(), pass None from forward-
mode callers. All out_bw accesses are guarded by qd.static(is_backward)
so None is never dereferenced. Backward callers still pass real tensors.
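The guarded-None pattern can be sketched in plain Python. The function name mirrors the description but the body is illustrative, and a plain bool approximates `qd.static(is_backward)` (which in Quadrants resolves the branch at compile time, so the None-touching code is not even compiled into forward kernels).

```python
def func_solve_mass_batch_sketch(x, is_backward, out_bw=None):
    result = x * 2.0      # stand-in for the actual mass solve
    if is_backward:       # in Quadrants: qd.static(is_backward)
        # Only reachable when a backward caller passed a real buffer,
        # so out_bw is never None here.
        out_bw.append(result)
    return result
```

Forward callers pass `out_bw=None` and never touch it; backward callers pass a real buffer, matching the fix described above.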
When a @qd.data_oriented class kernel passes self.struct_attr to a
@qd.func, the caller-side expansion fails because the value is an
instance (not a type). Fix by passing entities_info and rigid_global_info
as explicit kernel parameters so they go through the normal struct
flattening path.
@hughperkins
Collaborator Author

At this point:

  • benchmarks run
  • mostly look ok
  • except for anymal_uniform_kinematic
2026-04-30-1826-always-fastcache

Eliminates the CPU regression on anymal_uniform_kinematic by ensuring
struct parameters are skipped in the kernel launch_kernel loop (line 490)
rather than going through _recursive_set_args for per-field dispatch.

qd.template() works correctly with both field and ndarray backends.

Made-with: Cursor
@hughperkins
Collaborator Author

Updated benchmarks, with Quadrants main at this point (actually tested on a branch, but that has now been merged to main):

20260501-1517-non-field-plan-shortcut

@hughperkins hughperkins marked this pull request as ready for review May 2, 2026 08:35

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc2aa450e4


Comment on lines +981 to +982
entities_info=entities_info,
rigid_global_info=rigid_global_info,

P1: Thread resolve arguments into island CG gradient solve

When use_contact_island=True and the island solver takes the CG branch, _func_update_gradient calls self._solver.func_solve_mass_batch(...) with entities_info=entities_info and rigid_global_info=rigid_global_info, but those names are not defined in that method scope. This raises an undefined-name failure (also flagged as F821) and breaks the constraint solve path for that configuration; the values need to be passed into _func_update_gradient (or read from self) instead of referenced as free variables.


@hughperkins
Collaborator Author

Note: I manually verified that fastcache was working, on both Ubuntu and Mac.

But it would probably be good for at least one other person to verify this as well.

hughperkins and others added 2 commits May 2, 2026 07:21
…l chain

_func_update_gradient referenced bare `entities_info` and `rigid_global_info`
which don't exist in its local scope — they are parameters of the `resolve`
kernel but were never threaded through the intermediate @qd.func calls.

Co-authored-by: Cursor <[email protected]>
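The free-variable bug and its fix can be sketched in plain Python. Function names loosely mirror the description (`resolve`, `_func_update_gradient`, `func_solve_mass_batch`) but the bodies are illustrative, not the real Genesis call chain; the point is that the values must arrive as parameters at every level, not be read as names that only exist in the top-level kernel's scope.

```python
def func_solve_mass_batch(state, entities_info, rigid_global_info):
    # Leaf of the call chain; stand-in arithmetic only.
    return state + entities_info + rigid_global_info


def func_update_gradient(state, entities_info, rigid_global_info):
    # Fixed version: both values arrive as explicit parameters.
    # The buggy version omitted them from the signature and referenced
    # them as free variables, which raises NameError at runtime (and is
    # flagged statically as F821, undefined name).
    return func_solve_mass_batch(state, entities_info, rigid_global_info)


def resolve(entities_info, rigid_global_info):
    # Top-level "kernel": the only scope where the names originally
    # existed; the fix threads them down through every intermediate call.
    return func_update_gradient(1, entities_info, rigid_global_info)
```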
