
[MISC] Always fastcache.#2751

Open
hughperkins wants to merge 67 commits into Genesis-Embodied-AI:main from hughperkins:hp/always-fastcache

Conversation

@hughperkins
Collaborator

Description

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of the CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

- Collapse the `_kernel_set_gravity_field` / `_kernel_set_gravity_ndarray`
  pair into a single `_kernel_set_gravity(..., gravity: qd.Tensor)`.
  The `qd.Tensor` annotation routes Ndarray args through the ndarray
  feature path and Field args through the template path, so one kernel
  covers both backends with no runtime branch.

- Broaden the dump/load-checkpoint isinstance guards to also accept
  `qd.Tensor` wrappers. Today Genesis allocates bare impls via
  `qd.field` / `qd.ndarray` so no current call site changes behavior;
  this is a forward-compat superset for a future where `qd.tensor(...)`
  factory-allocated wrappers start showing up on these attrs.

- Update type annotations on `qd_to_python`/`qd_to_torch`/`qd_to_numpy`,
  `Solver.qpos`, and `array_class.V_ANNOTATION` accordingly.

Unchanged: `issubclass(data_type, qd.Field)` zero-copy branch in
`qd_to_python` (still bare-impl specific; wrapper-side zero-copy is
out of scope for this PR).
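The kernel-collapse idea above can be illustrated with a pure-Python analogy. The real routing happens inside Quadrants' `qd.Tensor` annotation machinery, which is not reproducible here; this sketch uses `functools.singledispatch` and hypothetical `FakeField`/`FakeNdarray` stand-ins purely to show the shape of the change: one entry point, with the argument's type picking the path instead of a runtime branch in the kernel body.

```python
import functools


class FakeField:          # illustrative stand-in for a qd.field impl
    def __init__(self, data):
        self.data = data


class FakeNdarray:        # illustrative stand-in for a qd.ndarray impl
    def __init__(self, data):
        self.data = data


# Before: two near-identical kernels, one per backend.
# After: one entry point; the argument's type selects the path at
# dispatch time, so there is no backend branch in the hot body.
@functools.singledispatch
def set_gravity(gravity):
    raise TypeError(f"unsupported backend arg: {type(gravity).__name__}")


@set_gravity.register
def _(gravity: FakeField):
    return ("field-path", gravity.data)


@set_gravity.register
def _(gravity: FakeNdarray):
    return ("ndarray-path", gravity.data)
```

This is only an analogy for the dispatch structure; Quadrants resolves the backend at kernel-instantiation time via the template/ndarray feature paths, not via `singledispatch`.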
Allocate the six Tier-1 constraint-state fields read on every
constraint by the linesearch inner loop — Jaref, jv, efc_D,
efc_frictionloss, diag, active — via a new V_TENSOR factory that
returns qd.Tensor wrappers around the same Field / Ndarray that the
existing V() allocator would have produced.

The wrapper is unwrapped back to the bare impl by Kernel.__call__
before the JIT cache key is computed (Quadrants stork-19), so the
compiled kernel code is identical to today and there is no per-call
overhead. Host-side state.Jaref[i_c, i_b] reads continue to
short-circuit through the wrapper's __getitem__ to impl[i_c, i_b]
under the identity layout this commit uses.

Phase 1 is plumbing only: identity layout, no behavior change
expected. Acceptance is full Genesis unit suite + bench_cluster_wandb
flat within ±1% on both gs.use_ndarray={True,False}. Phase 2 will
collapse the constraint_layout_transposed static-config flag onto
per-tensor layout= and is tracked in the design doc.

Bisection escape hatch: GS_TENSOR_BARE_TIER1=1 reverts V_TENSOR to
the legacy bare V(...) allocator at process start. Useful if a bench
regression turns up and we need to confirm the wrapper is the cause.
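The escape-hatch pattern described above can be sketched in a few lines. The allocator bodies are hypothetical placeholders (the real V_TENSOR / V allocators live in Genesis's array_class module); the point is that the flag is read once, at process start, so flipping the environment variable mid-run has no effect.

```python
import os


def make_allocator():
    # Read GS_TENSOR_BARE_TIER1 once, matching "reverts ... at process
    # start". The two lambdas are stand-ins for the real allocators.
    if os.environ.get("GS_TENSOR_BARE_TIER1") == "1":
        return lambda shape: ("bare", shape)     # legacy bare V(...) path
    return lambda shape: ("wrapped", shape)      # qd.Tensor wrapper path


alloc = make_allocator()
```

A bench bisection then only needs `GS_TENSOR_BARE_TIER1=1 python bench.py` to rule the wrapper in or out.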

Plan: perso_hugh/doc/genesis_tensor_migration.md
Quadrants PR Genesis-Embodied-AI#446 renamed @qd.kernel(gpu_graph=) -> @qd.kernel(graph=)
in early April. Genesis still passes the old name in
_kernel_solve_gpu_graph, which fails on every Quadrants version that
includes the rename. Update the call site to use the new name.

This is unrelated to the stork-20 Phase-1 wrapper migration; just
unblocks Genesis tests on the current Quadrants editable build.
# Conflicts:
#	genesis/engine/solvers/rigid/constraint/solver_breakdown.py
…-1 fields as qd.Tensor

Use qd.tensor(...) directly at the 6 Tier-1 allocation sites, gated by
a single _TENSOR_BACKEND module constant. Remove the V_TENSOR wrapper
and the GS_TENSOR_BARE_TIER1 bisection fallback (the 21% regression
diagnosed against the WandB baseline was an upstream Genesis perf gain
we hadn't merged, not a qd.Tensor regression — see
perso_hugh/doc/regression_2026apr23_stork_log.md).

Annotate the 6 Tier-1 qd.Tensor fields (active, diag, Jaref,
efc_frictionloss, efc_D, jv) as qd.Tensor in StructConstraintState.
Leave V_ANNOTATION in place everywhere else.

No functional change.
Replace every V(dtype=, shape=) allocation with qd.tensor(dtype, shape,
backend=_TENSOR_BACKEND), every V_VEC(...) with qd.Vector.tensor(...),
and every V_MAT(...) with qd.Matrix.tensor(...).

Replace all V_ANNOTATION kernel/func parameter annotations with qd.Tensor
across 13 files (solver, collider, abd, path_planning, etc.).

Remove the V, V_VEC, V_MAT, V_ANNOTATION definitions from array_class.py.
Only DATA_ORIENTED and _TENSOR_BACKEND remain as module-level helpers.

Summary:
- 431 V() → qd.tensor()
- 72 V_VEC() → qd.Vector.tensor()
- 3 V_MAT() → qd.Matrix.tensor()
- ~550 V_ANNOTATION → qd.Tensor (annotations)
- 1 array_class.V() in base_solver.py → qd.tensor()
qd.Tensor as a parameter annotation only works for top-level @qd.kernel
args (the template mapper handles unwrapping). For @qd.func parameters
called from within kernels, the AST transformer needs qd.template().

Reverts 37 @qd.func parameter annotations from qd.Tensor back to
qd.template() across 9 files. Keeps qd.Tensor for @qd.kernel params
(9 sites) and struct field annotations (~500 sites).
qd.Tensor as a struct field annotation in @qd.data_oriented classes
causes QuadrantsTypeError during AST compilation when the struct is
passed to @qd.func. The data_oriented mechanism needs runtime annotations
(qd.types.ndarray() or qd.template) to properly handle struct fields.

Introduce _STRUCT_FIELD_ANNOTATION (runtime: qd.types.ndarray() or
qd.template; TYPE_CHECKING: union type) and use it for all 501 struct
field annotations. Allocations remain as qd.tensor() and @qd.kernel
parameter annotations remain as qd.Tensor.
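The dual-annotation trick can be sketched with the standard `typing.TYPE_CHECKING` split. The runtime value below is a plain sentinel string; in Genesis it would be `qd.types.ndarray()` or `qd.template` depending on the backend, and the static side would be the `qd.Tensor` union. Class and field names mirror the description but the bodies are illustrative.

```python
import typing

# The type checker sees a precise static type; the runtime machinery
# (which inspects __annotations__) sees whatever value it needs.
if typing.TYPE_CHECKING:
    _STRUCT_FIELD_ANNOTATION = typing.Any          # static-analysis view
else:
    _STRUCT_FIELD_ANNOTATION = "runtime-annotation"  # what the framework inspects


class StructConstraintState:
    # Without `from __future__ import annotations`, these annotation
    # expressions are evaluated at class-definition time and stored in
    # __annotations__, so the framework sees the runtime sentinel.
    Jaref: _STRUCT_FIELD_ANNOTATION
    jv: _STRUCT_FIELD_ANNOTATION
```

The design choice here is that only one name has to change per field, while the static and runtime consumers each get the annotation they can handle.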
All struct field annotations now use qd.Tensor directly.
All struct classes now use @dataclasses.dataclass(frozen=True).
AutoInitMeta and BASE_METACLASS are removed — dataclasses
provides __init__, __eq__, __hash__ natively.

Enabled by Quadrants hp/tensor-stork-23-optC-v2 which adds
support for qd.Tensor struct fields and frozen-dataclass
template args in both FIELD and NDARRAY backends.
…nfig classes

StructColliderStaticConfig, StructGJKStaticConfig, and
StructRigidSimStaticConfig need both mutability (post-construction
assignment) and hashability (passed as kernel template args).
@qd.data_oriented provides hashability; _AutoInitMeta generates
__init__ from annotations. This matches their original decorators.
qd_to_python uses issubclass checks (e.g. qd.MatrixField) to determine
reshaping logic, but qd.Tensor is not a subclass of any Field/Ndarray
type. This caused the m=1 reshape for vector fields to be skipped,
producing transposed data and numerical mismatches in test_data_accessor.

Unwrap qd.Tensor to its raw impl at the entry of each function so all
downstream logic works exactly as it did with raw fields/ndarrays.
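The unwrap-at-entry fix can be sketched as follows. `TensorWrapper`, `FakeMatrixField`, and the function body are stand-ins, not the real Genesis/Quadrants classes; the real `qd_to_python` dispatches on `issubclass`/`isinstance` checks against Field/Ndarray types exactly as the sketch's branch does.

```python
class FakeMatrixField:
    # Stand-in for a qd matrix/vector field impl that the reshape
    # logic recognizes by type.
    pass


class TensorWrapper:
    # Stand-in for qd.Tensor wrapping a bare Field/Ndarray impl.
    def __init__(self, impl):
        self.impl = impl


def _unwrap(value):
    # Peel the wrapper so downstream type dispatch sees the raw impl,
    # exactly as it did before the migration.
    return value.impl if isinstance(value, TensorWrapper) else value


def qd_to_python_sketch(value):
    value = _unwrap(value)  # the one-line entry fix
    if isinstance(value, FakeMatrixField):
        return "matrix-reshape-path"   # e.g. the m=1 reshape branch
    return "plain-path"
```

Without the `_unwrap` call, a wrapped matrix field would silently fall through to the plain path, which is precisely the skipped-reshape bug described above.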
Remove temporary diagnostic prints added to trace test_data_accessor.
Root cause was qd.tensor() returning base Tensor instead of
VectorTensor for compound types, fixed in Quadrants.
All parameters that were previously annotated with array_class.V_ANNOTATION
and changed to qd.template() are now annotated with qd.Tensor instead,
which is the polymorphic annotation that accepts either backend.
qd_to_python already unwraps, and value.shape works on Tensor wrappers.
Keep dtype= and shape= keyword arguments to match the style on main
and reduce diff noise.
…fix blank line

- Inline _run_cycle() into test_backend_switching (only used once)
- Revert _AutoInitMeta back to AutoInitMeta (no visibility change needed)
- Restore blank line before return in AutoInitMeta.__new__
Remove the gs._initialized guard from array_class.py (no longer needed
since V/V_VEC/V_MAT are now functions that evaluate backend at runtime).
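Why the guard became unnecessary can be shown with a minimal sketch: a function that reads the backend at call time picks up whatever `init()` set, whereas a module-level constant would have frozen the value at import time. `_backend` and the function bodies are hypothetical stand-ins for the Genesis globals.

```python
_backend = None  # unknown at import time; set later by init()


def init(backend):
    global _backend
    _backend = backend


def V(shape):
    # Backend is evaluated at call time, so importing this module before
    # init() is safe as long as V() is only *called* afterwards.
    if _backend is None:
        raise RuntimeError("call init() first")
    return (_backend, shape)
```

Under this structure the old import-time guard has nothing left to protect.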

Add test_basic_sim_subprocess that runs a fresh genesis import + init +
build + step in a subprocess, for both field and ndarray backends.
Fastcache is hardcoded at module load via @qd.kernel(fastcache=...),
so it cannot be toggled between init/destroy cycles.
Fastcache silently falls back to normal compilation for unsupported
parameter types (e.g. qd.field), so it's safe to always enable it.
When using ndarray backend, fastcache kicks in; when using field
backend, it gracefully degrades with no error.

- Replace all @qd.kernel(fastcache=gs.use_fastcache) with fastcache=True
- Remove fastcache guard and GS_ENABLE_FASTCACHE env var from gs.init()
- Remove GS_ENABLE_FASTCACHE from conftest.py
- Remove enable_fastcache parametrization and FIXME from backend switching test
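The "silently falls back" behavior can be illustrated with a pure-Python analogy: a memoizing decorator that caches when its arguments support caching and quietly runs uncached when they do not. This mirrors the described behavior of `@qd.kernel(fastcache=True)`; it is not the Quadrants implementation, and hashability here stands in for "supported parameter types".

```python
import functools


def fastcache(fn):
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        try:
            if args in cache:          # fast path for supported args
                return cache[args]
            result = fn(*args)
            cache[args] = result
            return result
        except TypeError:
            # Unsupported (here: unhashable) argument types: silently
            # fall back to a normal, uncached call instead of erroring.
            return fn(*args)
    return wrapper
```

Because the fallback is silent and correct, the flag can be unconditionally True, which is exactly what lets the PR delete the GS_ENABLE_FASTCACHE plumbing.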
Fastcache is always on now, so remove the enable_fastcache parameter
from test_static and test_num_envs, and update expected cache behavior:
- src-ll-cache is always used for ndarray
- fe-ll-cache is never reached for ndarray (fastcache handles it)
- Remove GS_ENABLE_FASTCACHE env var from test_ndarray_no_compile
Fastcache is always on via @qd.kernel(fastcache=True) and silently
falls back for unsupported types. No need for a variable or log.
@hughperkins hughperkins changed the title Hp/always fastcache [MISC] Always fastcache. Apr 30, 2026
The `if gs.use_ndarray else qd.template()` conditionals are no longer
needed: qd.Tensor handles both field and ndarray backends when structs
are flattened into kernel args. The init guard is also unnecessary since
array_class is only imported after gs.init().
PLACEHOLDER is accessed as a global inside kernels (not passed as a
parameter), so it must be a raw field/template-injectable value, not a
qd.Tensor wrapper.
PLACEHOLDER was a dummy tensor passed to func_solve_mass_batch's out_bw
slot in forward mode. Accessing it as a module global inside pure
(fastcache=True) kernels triggered purity violations.

Fix: change out_bw annotation to qd.template(), pass None from forward-
mode callers. All out_bw accesses are guarded by qd.static(is_backward)
so None is never dereferenced. Backward callers still pass real tensors.
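The guarded-None pattern can be sketched in plain Python. The function name mirrors the description but the body is illustrative, and a plain bool approximates `qd.static(is_backward)` (which in Quadrants resolves the branch at compile time, so the None-touching code is not even compiled into forward kernels).

```python
def func_solve_mass_batch_sketch(x, is_backward, out_bw=None):
    result = x * 2.0      # stand-in for the actual mass solve
    if is_backward:       # in Quadrants: qd.static(is_backward)
        # Only reachable when a backward caller passed a real buffer,
        # so out_bw is never None here.
        out_bw.append(result)
    return result
```

Forward callers pass `out_bw=None` and never touch it; backward callers pass a real buffer, matching the fix described above.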
When a @qd.data_oriented class kernel passes self.struct_attr to a
@qd.func, the caller-side expansion fails because the value is an
instance (not a type). Fix by passing entities_info and rigid_global_info
as explicit kernel parameters so they go through the normal struct
flattening path.
@hughperkins
Collaborator Author

At this point:

  • benchmarks run
  • mostly look ok
  • except for anymal_uniform_kinematic
2026-04-30-1826-always-fastcache

Eliminates the CPU regression on anymal_uniform_kinematic by ensuring
struct parameters are skipped in the kernel launch_kernel loop (line 490)
rather than going through _recursive_set_args for per-field dispatch.

qd.template() works correctly with both field and ndarray backends.

Made-with: Cursor
@hughperkins
Collaborator Author

Updated benchmarks, with Quadrants main at this point (actually tested on a branch, but that has now been merged to main):

20260501-1517-non-field-plan-shortcut

@hughperkins hughperkins marked this pull request as ready for review May 2, 2026 08:35

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc2aa450e4


Comment on lines +981 to +982
entities_info=entities_info,
rigid_global_info=rigid_global_info,

P1: Thread resolve arguments into island CG gradient solve

When use_contact_island=True and the island solver takes the CG branch, _func_update_gradient calls self._solver.func_solve_mass_batch(...) with entities_info=entities_info and rigid_global_info=rigid_global_info, but those names are not defined in that method scope. This raises an undefined-name failure (also flagged as F821) and breaks the constraint solve path for that configuration; the values need to be passed into _func_update_gradient (or read from self) instead of referenced as free variables.


@hughperkins
Collaborator Author

Note: I manually verified that fastcache was working, on both Ubuntu and Mac.

But it would probably be good for at least one other person to verify this as well.

hughperkins and others added 2 commits May 2, 2026 07:21
…l chain

_func_update_gradient referenced bare `entities_info` and `rigid_global_info`
which don't exist in its local scope — they are parameters of the `resolve`
kernel but were never threaded through the intermediate @qd.func calls.

Co-authored-by: Cursor <[email protected]>
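The free-variable bug and its fix can be sketched in plain Python. Function names loosely mirror the description (`resolve`, `_func_update_gradient`, `func_solve_mass_batch`) but the bodies are illustrative, not the real Genesis call chain; the point is that the values must arrive as parameters at every level, not be read as names that only exist in the top-level kernel's scope.

```python
def func_solve_mass_batch(state, entities_info, rigid_global_info):
    # Leaf of the call chain; stand-in arithmetic only.
    return state + entities_info + rigid_global_info


def func_update_gradient(state, entities_info, rigid_global_info):
    # Fixed version: both values arrive as explicit parameters.
    # The buggy version omitted them from the signature and referenced
    # them as free variables, which raises NameError at runtime (and is
    # flagged statically as F821, undefined name).
    return func_solve_mass_batch(state, entities_info, rigid_global_info)


def resolve(entities_info, rigid_global_info):
    # Top-level "kernel": the only scope where the names originally
    # existed; the fix threads them down through every intermediate call.
    return func_update_gradient(1, entities_info, rigid_global_info)
```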
