Feature request
Change how the mapping datasets in veomni/data/dataset.py introduce randomness when fetching an index beyond the length of the dataset
Motivation
When a dataset such as DynamicBatchingDataset is used on a mapping dataset without a DistributedSampler, data fetching may call get_item() on the mapping dataset with an index greater than the length of the dataset. For such an index, veomni/data/dataset.py introduces randomness by calling "random.shuffle" on the whole indices array. No seed controls this shuffle, so it is not deterministic: the same index can return different samples from one run to the next.
Your contribution
This can be addressed with a PR that changes how the mapping dataset is shuffled when the index exceeds the dataset length.
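One possible direction, sketched under stated assumptions: derive the shuffle from a dedicated, seeded `random.Random` instance instead of the global `random.shuffle`. In the sketch below, `SeededMappingDataset`, its `seed` parameter, and the idea of treating `index // len` as a "virtual epoch" are all hypothetical and are not VeOmni's actual code. An out-of-range index is mapped through a permutation that depends only on `(seed, epoch)`, so the same index always returns the same sample and the global RNG state is left untouched.

```python
import random
from typing import Any, Dict, List, Sequence


class SeededMappingDataset:
    """Hypothetical map-style dataset wrapper (not VeOmni's actual class).

    Indices >= len(data) are resolved through a permutation seeded by
    (seed + epoch), making out-of-range fetches reproducible.
    """

    def __init__(self, data: Sequence[Any], seed: int = 42) -> None:
        self.data = data
        self.seed = seed
        # Cache one permutation per virtual epoch (grows with epochs seen).
        self._perm_cache: Dict[int, List[int]] = {}

    def __len__(self) -> int:
        return len(self.data)

    def _permutation(self, epoch: int) -> List[int]:
        # A dedicated RNG per epoch: deterministic for a given seed and
        # independent of the global `random` module state.
        if epoch not in self._perm_cache:
            perm = list(range(len(self.data)))
            random.Random(self.seed + epoch).shuffle(perm)
            self._perm_cache[epoch] = perm
        return self._perm_cache[epoch]

    def __getitem__(self, index: int) -> Any:
        n = len(self.data)
        if index < n:
            return self.data[index]
        # Out-of-range index: split into (virtual epoch, position in epoch).
        epoch, pos = divmod(index, n)
        return self.data[self._permutation(epoch)[pos]]


if __name__ == "__main__":
    ds = SeededMappingDataset(list(range(10)), seed=0)
    print([ds[i] for i in range(10, 20)])  # same order on every run
```

Seeding with `seed + epoch` is the simplest choice; it means nearby `(seed, epoch)` pairs can share a permutation, which a real patch might avoid by mixing the two values differently. In distributed training the seed would also need to be identical (or deliberately rank-dependent) across workers.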