pre-computed and kind=auto dist visualizations prototype #310
OriolAbril wants to merge 3 commits into main from
Conversation
@OriolAbril, what is the status of this? Anything I could do to help?
I need to prioritize it over the rest of things. This one is probably more important, but it doesn't feel as urgent, which is something I always struggle with.
```python
# point estimate text
if pet_kwargs is not False:
    # TODO: handle multiple kinds
```
I think I did, but I am still not sure whether past me was a bit dumb or too smart for present me to understand.

My current reasoning: when we do `dim for dim in density.sel(plot_axis="y").dims if dim not in point.dims`, I don't think there is any scenario where `kde_dim` or `hist_dim_{var_name}` would appear in `point.dims`, so the result will already contain all the dimensions we want to reduce, independently of the kind. The explicit addition should therefore be redundant, I think. But maybe I am missing some edge case that made us go for the explicit alternative?

Note: y values can now be in either the `"y"` or the `"histogram"` coordinate value; depending on the variable, the values will be in one of them (and only one) and the other will be NaN. Since xarray ignores NaNs by default, keeping whichever of these two coords is present, plus the dim filter I mention above, means we reduce `plot_axis` too (unless it were in the point data, which again I see no case where it could happen).
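A minimal sketch of the dim filter discussed above, with hypothetical dimension names (`hierarchy`, `kde_dim`) standing in for real ones: selecting `plot_axis="y"` drops the `plot_axis` dimension, and filtering out the point estimate's dims leaves only the density-specific dimensions to reduce.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# hypothetical density output: one extra dim from the KDE grid plus plot_axis
density = xr.DataArray(
    rng.random((2, 512, 2)),
    dims=("hierarchy", "kde_dim", "plot_axis"),
    coords={"plot_axis": ["x", "y"]},
)
# hypothetical point estimate: shares only the faceting dim
point = xr.DataArray(rng.random(2), dims=("hierarchy",))

# .sel on a scalar coord value drops plot_axis, so the filter yields
# only the density-specific dims not present in the point estimate
reduce_dims = [
    dim for dim in density.sel(plot_axis="y").dims if dim not in point.dims
]
print(reduce_dims)  # ['kde_dim']
```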
```python
active_dims : list
    Dimensions that have either faceting or aesthetic mappings
    active for that visual. Should not be reduced and should have
    a groupby performed on them if computing summaries.
```
I have added this variant because it is needed to know when a groupby is required: groupbys happen on dimensions we are faceting on or have aesthetics mapped to, but which also have repeated coord values. In those cases we reduce that dimension through a groupby, so we end up with the dimension still present but now with unique coord values.
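To illustrate the point about repeated coord values, here is a small self-contained example (with made-up data) of how an xarray groupby reduces over a dimension while keeping it, deduplicating its coordinate:

```python
import numpy as np
import xarray as xr

# a dim whose coordinate has repeated values (e.g. the same group twice)
da = xr.DataArray(
    np.arange(6.0),
    dims="group",
    coords={"group": ["a", "a", "b", "b", "c", "c"]},
)

# reducing through a groupby keeps the dimension,
# but its coordinate now has unique values
reduced = da.groupby("group").mean()
print(list(reduced.coords["group"].values))  # ['a', 'b', 'c']
print(float(reduced.sel(group="b")))  # 2.5
```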
```python
def _compute_viz_for_subset(viz, data, var_names, active_dims, reduce_dims, kwargs):
    func = {"dot": qds, "ecdf": ecdf, "hist": histogram, "kde": kde}[viz]
    viz_out = xr.Dataset()
    for var_name in var_names:
```
I loop over variables and work at the DataArray level because I was getting unwanted extra broadcasting when doing Dataset groupbys. I am not sure if this is an actual limitation or if the way we handle it in https://github.com/arviz-devs/arviz-stats/blob/b3887678dbde60904b782c38456b70c48811a9b2/src/arviz_stats/utils.py#L300 is not ideal.

That being said, while I started with a first block that evaluated which variables to compute which visualization for, sticking to DataArray-only handling I think allows simplifying the functions and logic: loop over DataArrays, check which visualization to use, and compute it immediately, without so many intermediate variables.
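A rough sketch of that per-variable loop (all names here are hypothetical stand-ins, not the actual arviz-plots functions): a dispatch dict picks the computation per kind, and each variable is handled as its own DataArray, so only its own dims are involved and no cross-variable broadcasting can occur.

```python
import numpy as np
import xarray as xr

# stand-in computations keyed by kind (the real ones would be kde, histogram, ...)
funcs = {
    "kde": lambda da, dim: da.mean(dim),
    "hist": lambda da, dim: da.median(dim),
}

ds = xr.Dataset(
    {
        "mu": ("draw", np.linspace(0, 1, 100)),
        "theta": (("draw", "school"), np.random.default_rng(0).normal(size=(100, 3))),
    }
)
# stand-in for the kind=auto per-variable choice
kind_per_var = {"mu": "kde", "theta": "hist"}

viz_out = xr.Dataset()
for var_name, da in ds.items():
    # each DataArray only carries its own dims, so reducing "draw"
    # never broadcasts mu against theta's "school" dim
    viz_out[var_name] = funcs[kind_per_var[var_name]](da, "draw")
print(list(viz_out.data_vars))  # ['mu', 'theta']
```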
```python
if fun_label in self._viz_dt.children:
    for var_name, da in artist_dt.items():
        self._viz_dt[fun_label][var_name] = da
else:
    self._viz_dt[fun_label] = artist_dt
```
I added this so we can store all the visuals even when calling `.map` on the same visual for different subsets of variables.
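The merge-or-assign logic above can be sketched with plain dicts standing in for the DataTree nodes (a simplification, since the real code stores DataArrays in `self._viz_dt`): if the visual's node already exists from an earlier `.map` call, the new variables are merged in instead of overwriting the whole node.

```python
# minimal sketch of the merge logic, with dicts in place of DataTree nodes
viz_dt = {}

def store(fun_label, artist_dict):
    if fun_label in viz_dt:
        # visual already present from an earlier .map call on other
        # variables: merge the new per-variable artists into the node
        viz_dt[fun_label].update(artist_dict)
    else:
        viz_dt[fun_label] = dict(artist_dict)

store("kde", {"mu": "line-artist-mu"})
store("kde", {"theta": "line-artist-theta"})
print(sorted(viz_dt["kde"]))  # ['mu', 'theta']
```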
Still needs credible_interval and point_estimates to take groupbys into account (otherwise some cases can end up with nonsensical results or concat errors; in those cases, try setting point_estimate_text to False, and so on). Should be ready to review but not yet to merge.
Rough idea shared on Slack; I will update the description as the idea and implementation get clearer.
This should close #299, close #40 and also close #149.
📚 Documentation preview 📚: https://arviz-plots--310.org.readthedocs.build/en/310/