50 graphviper changes needed for astroviper demonstrator #55
Pull request overview
This PR refactors interpolate_data_coords_onto_parallel_coords in graphviper to reduce overhead when building task-to-dataset selection mappings for the astroviper demonstrator use case, primarily by reordering loops and adding a faster “nearest” interpolation path.
Changes:
- Added helper utilities for chunk handling and a `np.searchsorted`-based nearest-index helper to avoid constructing SciPy interpolator objects in common cases.
- Reworked the interpolation loop structure to compute per-dimension constants once and avoid redundant work.
- Minor API/style cleanup around `ps_partition` and `None` handling.
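
To illustrate the `np.searchsorted` idea from the change list, here is a minimal sketch of a nearest-index lookup on a sorted coordinate array. It is not the PR's `_nearest_interp_indices`; the name `nearest_indices_sketch`, its arguments, and the tie-breaking rule (ties go to the lower index) are assumptions made for illustration only.

```python
import numpy as np


def nearest_indices_sketch(sorted_coords, targets):
    """Hypothetical helper: index of the nearest sorted_coords value per target.

    Assumes sorted_coords is 1-D, non-empty, and monotonically increasing.
    """
    sorted_coords = np.asarray(sorted_coords, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # With a single coordinate, every target maps to index 0.
    if sorted_coords.size == 1:
        return np.zeros(targets.shape, dtype=int)

    # Insertion point: index of the first coordinate >= target.
    right = np.searchsorted(sorted_coords, targets, side="left")
    # Clip so that both `right` and `right - 1` are valid indices,
    # which also handles targets outside the coordinate range.
    right = np.clip(right, 1, sorted_coords.size - 1)
    left = right - 1

    # Choose the closer neighbour; an exact tie keeps the left neighbour.
    choose_right = (targets - sorted_coords[left]) > (sorted_coords[right] - targets)
    return np.where(choose_right, right, left)


indices = nearest_indices_sketch([0.0, 1.0, 2.0, 4.0], [-1.0, 0.4, 0.6, 3.0, 10.0])
print(indices.tolist())  # [0, 0, 1, 2, 3]
```

Because the lookup is a single vectorised binary search, no interpolator object has to be built per call. Whether the helper in this PR clips out-of-range targets or breaks ties the same way is not stated in the overview.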
| elif interpolation_method == "nearest": | ||
| # Fast path: no interpolator object construction | ||
| interp_index = _nearest_interp_indices(coord_values, edges) | ||
| else: | ||
| # Fallback for non-nearest methods |
| # We redo this for every partition, because task number will have changed | ||
| iter_chunks_indices, parallel_dims = _make_iter_chunks_indices(parallel_coords) | ||
| for chunk_indices in iter_chunks_indices: | ||
| logger.debug(f"chunk_index: {task_id}, {chunk_indices}") |
| # ----------------------------------------------------------------------- | ||
| # Phase 2 — build node_task_data_mapping | ||
| # | ||
| # Largely unchanged from v1; parallel_dims is extracted once since it is | ||
| # the same for every partition. logger.debug uses %s formatting to skip | ||
| # string construction when debug logging is inactive. | ||
| # ----------------------------------------------------------------------- | ||
| node_task_data_mapping = {} | ||
| | ||
| task_id = 0 | ||
| for partition in partition_map.keys(): | ||
| # We redo this for every partition, because task number will have changed | ||
| iter_chunks_indices, parallel_dims = _make_iter_chunks_indices(parallel_coords) | ||
| for chunk_indices in iter_chunks_indices: |
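
The Phase 2 comment in the snippet above notes that `logger.debug` with `%s` formatting skips string construction when debug logging is inactive. The following standalone sketch shows that effect with the standard `logging` module; the `CountingStr` probe class and the logger name are hypothetical and exist only to count how often the argument is converted to a string.

```python
import logging


class CountingStr:
    """Hypothetical probe that counts how often it is converted to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "chunk"


logger = logging.getLogger("lazy_logging_sketch")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled

probe = CountingStr()

# %s style: the arguments are only formatted if the record is actually emitted,
# so with DEBUG disabled the probe is never converted to a string.
logger.debug("chunk_index: %s, %s", 0, probe)
lazy_calls = probe.calls

# f-string style: the string is built before logger.debug is even called.
logger.debug(f"chunk_index: {0}, {probe}")
eager_calls = probe.calls - lazy_calls

print(lazy_calls, eager_calls)  # 0 1
```

Inside a loop over every chunk of every partition, the deferred form avoids one string build per iteration whenever debug output is switched off.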
| # input_data: Union[Dict, xr.DataTree], | ||
| # interpolation_method: { | ||
| # "linear", | ||
| # "nearest", | ||
| # "nearest-up", | ||
| # "zero", | ||
| # "slinear", | ||
| # "quadratic", | ||
| # "cubic", | ||
| # "previous", | ||
| # "next", | ||
| # } = "nearest", | ||
| # assume_sorted: bool = True, | ||
| # ps_partition: Optional[ | ||
| # list[str] | ||
| # ] = None, # Current options are {'field_name', 'spectral_window_name'} | ||
| # ) -> Dict: | ||
| # """Interpolate data_coords onto parallel_coords to create the ``node_task_data_mapping``. For the case of string coordinates (for example antenna_name), only exact matching is performed. | ||
| | ||
| # Parameters | ||
| # ---------- | ||
| # parallel_coords : Dict | ||
| # The parallel coordinates determine the parallelism of the map graph. | ||
| # The keys in the parallel coordinates can be any combination of the dimension coordinates in the input data. | ||
| # See notes in docstring for structure. | ||
| # input_data : Union[Dict, ProcessingSet] | ||
| # Can either be a `ProcessingSet <https://github.com/casangi/xradio/blob/main/src/xradio/correlated_data/processing_set.py>`_ or a Dictionary of `xarray.Datasets <https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html>`_. Only coordinates are needed so no actual data is loaded into memory. | ||
| # interpolation_method : {"linear", "nearest", "nearest-up", "zero", "slinear", "quadratic", "cubic", "previous", "next",}, optional | ||
| # The kind of interpolation method to use as described in `Scipy documentation <https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html>`_ , by default ``nearest``. | ||
| # assume_sorted : bool, optional | ||
| # Are the data in parallel_coords and input_data monotonically increasing in value, by default True. |
| ) | ||
| | ||
| d = {} | ||
| # We loop over the Cartesian product of the keys |