disclaimer: prettified with ai
The sampler documentation should be cleaned up so it is easier to read and better aligned with the public API. In ChunkSampler, the parameter section should be reordered to follow the constructor flow more naturally, and several descriptions should be rewritten to better explain chunking, batching, masking, and RNG behavior. In DistributedRandomSampler, long parameter descriptions should be wrapped and tightened so the generated docs are easier to scan.
Proposed Changes
- Reorder the `ChunkSampler` parameter docs to match the way users read and configure the sampler.
- Clarify `chunk_size`, `preload_nchunks`, and `batch_size`, especially the relationship between them.
- Rewrite the `shuffle`, `drop_last`, `mask`, and `rng` descriptions to be more explicit and user-facing.
- Reformat long `DistributedRandomSampler` parameter descriptions for readability.
- Keep this as a documentation-only change with no functional behavior change.
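
To make the documented size relationship concrete, here is a minimal sketch of the constraints the new docstrings describe. `check_sampler_sizes` is a hypothetical helper written for this description, not part of annbatch, and the actual validation inside `ChunkSampler` may differ.

```python
def check_sampler_sizes(chunk_size: int, preload_nchunks: int, batch_size: int) -> int:
    """Hypothetical helper (not annbatch API): validate the documented size rules.

    Returns the number of batches obtained from one preloaded group of chunks.
    """
    preload_size = chunk_size * preload_nchunks
    if batch_size > preload_size:
        raise ValueError(
            f"batch_size={batch_size} exceeds chunk_size * preload_nchunks={preload_size}"
        )
    if preload_size % batch_size != 0:
        raise ValueError(
            f"chunk_size * preload_nchunks={preload_size} is not divisible by "
            f"batch_size={batch_size}"
        )
    return preload_size // batch_size


# 64 * 4 = 256 observations per preloaded group, split into batches of 32.
batches_per_load = check_sampler_sizes(chunk_size=64, preload_nchunks=4, batch_size=32)

# The mask example from the docstring: slice(100, 500) covers observations 100..499.
mask = slice(100, 500)
masked = range(1_000)[mask]
first_obs, last_obs, n_masked = masked[0], masked[-1], len(masked)
```

Under these assumptions, a combination such as `chunk_size=64`, `preload_nchunks=4`, `batch_size=48` would be rejected, because 256 is not divisible by 48.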
Focused Diff
diff --git a/src/annbatch/samplers/_chunk_sampler.py b/src/annbatch/samplers/_chunk_sampler.py
@@
- batch_size
- Number of observations per batch.
chunk_size
- Size of each chunk i.e. the range of each chunk yielded.
- mask
- A slice defining the observation range to sample from (start:stop).
- shuffle
- Whether to shuffle chunk and index order.
+ Number of contiguous observations per on-disk chunk.
preload_nchunks
- Number of chunks to load per iteration.
- drop_last
- Whether to drop the last incomplete batch.
- rng
- Random number generator for shuffling. Note that :func:`torch.manual_seed`
- has no effect on reproducibility here; pass a seeded
- :class:`numpy.random.Generator` to control randomness.
+ Number of chunks to group into each I/O request.
+ ``chunk_size * preload_nchunks`` must be divisible by
+ ``batch_size``.
+ batch_size
+ Number of observations per batch. Must not exceed
+ ``chunk_size * preload_nchunks``.
@@
+ shuffle
+ If ``True``, shuffle chunk order within each epoch.
+ drop_last
+ If ``True``, drop the final batch when it contains fewer than
+ ``batch_size`` observations.
+ mask
+ A ``slice`` restricting sampling to a sub-range of observations.
+ For example, ``slice(100, 500)`` limits sampling to observations
+ 100 through 499.
+ rng
+ A :class:`numpy.random.Generator` used for shuffling and
+ replacement draws. When ``None``, a new default generator is
+ created.
diff --git a/src/annbatch/samplers/_distributed_random_sampler.py b/src/annbatch/samplers/_distributed_random_sampler.py
@@
- Either a string naming a distributed backend (``"torch"`` or ``"jax"``),
- or a callable that returns ``(rank, world_size)``.
+ Either a string naming a distributed backend (``"torch"`` or
+ ``"jax"``), or a callable that returns ``(rank, world_size)``.
@@
- If *True*, round each rank's observation count down to a multiple of ``batch_size`` so that all workers (ranks) yield the same number of batches.
- Set to *False* to use the raw ``n_obs // world_size`` split, which may result in an uneven number of batches per worker.
+ If *True*, round each rank's observation count down to a
+ multiple of ``batch_size`` so that all workers (ranks) yield
+ the same number of batches.
+ Set to *False* to use the raw ``n_obs // world_size`` split,
+ which may result in an uneven number of batches per worker.
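
The rounding described in the last hunk can be illustrated with a short arithmetic sketch. The function name `per_rank_obs` and the flag name `round_to_batch` are hypothetical, since the hunk does not show the parameter name, and the real sampler may handle the remainder differently.

```python
def per_rank_obs(n_obs: int, world_size: int, batch_size: int, round_to_batch: bool) -> int:
    """Hypothetical illustration of the per-rank observation count.

    `round_to_batch` stands in for the documented boolean option; its real
    name is not shown in the diff.
    """
    per_rank = n_obs // world_size  # the raw split named in the docstring
    if round_to_batch:
        # Round down to a multiple of batch_size so every rank yields
        # the same number of full batches.
        per_rank -= per_rank % batch_size
    return per_rank


rounded = per_rank_obs(n_obs=1_000, world_size=3, batch_size=32, round_to_batch=True)
raw = per_rank_obs(n_obs=1_000, world_size=3, batch_size=32, round_to_batch=False)
```

With 1,000 observations on 3 ranks, the raw split is 333 per rank, and rounding down to a multiple of 32 leaves 320, that is, 10 full batches per rank.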