Optimization: replace mat33 with wp.quat, move to quaternion math#902
Optimization: replace mat33 with wp.quat, move to quaternion math#902adenzler-nvidia wants to merge 116 commits intogoogle-deepmind:mainfrom
Conversation
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
|
Given the intention is to remove the xmat fields and replace them with xquat, really want to hear from @kevinzakka and @StafaH - can you both let us know whether this API change will be a big hassle on your end or not too bad? |
|
Downstream we would need to update the code, but it will be a straightforward task, and the performance gains will be worth it. I suggest moving forward with this change. Rendering is a memory heavy task so more compact representations + better memory load/store will have a large impact. |
|
Hi all, we only use |
|
Nice, thanks for the info. @StafaH does the batch renderer use the mujoco camera system? Should I include the change from cam_xmat to cam_xquat as well in this change? Otherwise we can also make it a follow-up, but maybe it's a good idea to batch potential API changes in the same PR. |
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
mujoco_warp/_src/types.py
Outdated
| collision_pairid: ids from broadphase (naconmax, 2) | ||
| collision_worldid: collision world ids from broadphase (naconmax,) | ||
| ncollision: collision count from broadphase (1,) | ||
| xiquat: Cartesian orientation of body inertia (nworld, nbody, 4) |
There was a problem hiding this comment.
nit: looks like shape info column is misaligned by one space
There was a problem hiding this comment.
thanks, fixed.
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
Signed-off-by: Alain Denzler <[email protected]>
|
Now I remember why I didn't change from mat to quat in that 1 place - box_edge failing due to numerical (rounding) issues because floating point math is different now. I adjusted 1 threshold and all the tests are ok now, but I'm not sure if I'm allowed to make that change. |
Signed-off-by: Alain Denzler <[email protected]>
|
@erikfrey should be good to go now! |
|
following some discussion with @yuvaltassa and @kbayes, we are thinking to hold off on merging this pr for now. additional considerations we have in mind:
|
|
Thanks for the update. I do understand the concerns, however I would like to provide a rebuttal:
Happy to discuss further and explain more. |
|
Alain, two questions:
As Taylor said and you agreed, once we have the aligned memory this should be even more pronounced, so we should probably get back to it once we have that feature. |
|
for 1 - I don't think this makes a lot of sense, as it will likely slow down things even more. It might be an option to have a separate kernel that does quat->mat conversion at the end, but in my opinion it's not a good idea to introduce multiple sources of truth into your API. We could make it optional though.
I do see the dilemma of having to change your API, only to reverse that decision potentially once warp gains new features/memory alignment. I don't mind shelving this until we have a better indication of what's going to come. Maybe I'll do a few experiments myself. I stand by my opinion above that memory alignment is not going to significantly change these numbers, and is going to be in favor of quaternions. But I'm happy to be proven wrong. |
|
After testing this change downstream, look like for small scenes (<20 geoms) there won't be any perf gain, but for large scenes there is a big speedup.
~15 geoms Branch: Main: Large scene > 100 geom: Branch: Main: I imported the changes into my fork here: https://github.com/StafaH/mujoco_warp/tree/dev/adenzler/matrices You can run the benchmark from that branch against the main render branch. Values might look different for others, this is from my 3080. The change itself is pretty straightforward. I think we can add in |
Sorry for the big PR. I hope this does not have too many downstream changes.
This is giving us decent speedups in the early parts of the pipeline.
Numbers on an RTX Pro 6000 Blackwell: