Skip to content

Comments

pre-computed and kind=auto dist visualizations protype#310

Open
OriolAbril wants to merge 3 commits intomainfrom
dist_compute_helper
Open

pre-computed and kind=auto dist visualizations protype#310
OriolAbril wants to merge 3 commits intomainfrom
dist_compute_helper

Conversation

@OriolAbril
Copy link
Member

@OriolAbril OriolAbril commented Jul 23, 2025

Rough idea shared on slack, I will update the description as the idea and implementation gets more clear.


This should close #299, close #40 and also close #149.


📚 Documentation preview 📚: https://arviz-plots--310.org.readthedocs.build/en/310/

@OriolAbril OriolAbril force-pushed the dist_compute_helper branch from 67070c5 to 8153402 Compare July 23, 2025 17:33
@aloctavodia
Copy link
Contributor

@OriolAbril, what is the status of this? Anything I could do to help?

@OriolAbril
Copy link
Member Author

I need to prioritize it over the rest of things. This one is probably more important but doesn't feel so urgent which is something I always struggle with


# point estimate text
if pet_kwargs is not False:
# TODO: handle multiple kinds
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did but I am still not sure if past me was a bit dumb or too smart for present me to understand.

My current reasoning. When we do the dim for dim in density.sel(plot_axis="y").dims if dim not in point.dims I don't think there is any scenario where kde_dim or hist_dim_{var_name} so that will already contain all the dimensions we want to reduce independently of the kind. The explicit addition should be redundant I think? But maybe I am missing some edge case that made us go for the explicit alternative?

Note: we can now have y values in either the "y" or "histogram" coordinate value, depending on the variable the values will be in one of them (and only one) and the other will be nan. With xarray ignoring nans by default, keeping any of these two coords that are present plus the dim filter I mention above we reduce plot_axis too (unless it were in point data which again I see no case where it could happen).

Comment on lines +139 to +142
active_dims : list
Dimensions that have either faceting or aesthetic mappings
active for that visual. Should not be reduced and should have
a groupby performed on them if computing summaries.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this variant because it is needed to know when we need a groupby as these happen on dimensions we are faceting on or have aesthetics mapped but also repeated coord values. In those cases we'll reduce that dimension through a groupby so we end up with that dimension still present but with unique coord values now.

def _compute_viz_for_subset(viz, data, var_names, active_dims, reduce_dims, kwargs):
func = {"dot": qds, "ecdf": ecdf, "hist": histogram, "kde": kde}[viz]
viz_out = xr.Dataset()
for var_name in var_names:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I loop over variables and work at the dataarray level because I was getting unwanted extra broadcastings when doing dataset groupbys. I am not sure if this is an actual limitation or if the way we handle that in https://github.com/arviz-devs/arviz-stats/blob/b3887678dbde60904b782c38456b70c48811a9b2/src/arviz_stats/utils.py#L300 is not idea.

That being said, while I started with a first block to evaluate which variables to compute which visualization for, sticking to dataraay only handling I think allows simplifying the functions and logic to loop over dataarray, check which visualization to use and then compute that immediately without so many intermediate variables.

Comment on lines +1067 to +1071
if fun_label in self._viz_dt.children:
for var_name, da in artist_dt.items():
self._viz_dt[fun_label][var_name] = da
else:
self._viz_dt[fun_label] = artist_dt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this so we can store all the visuals even if calling .map on the same visual for different subsets of variables.

@OriolAbril OriolAbril marked this pull request as ready for review February 18, 2026 19:22
@OriolAbril
Copy link
Member Author

Still needs credible_interval and point_estimates to take groupbys into account (otherwise some cases can end up as nonsensical results or concat errors - in which case try setting point_estimate_text to false and so on).

Should be ready to review but not yet merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add options to plot.density_kind? plot_ppc: groupby behaviour Allow precomputing

2 participants