Hi, thanks for releasing RoboOmni and OmniAction.
I have two questions that came up while inspecting the dataset and trying to understand the training setup.
1. Timing / FPS consistency with the source OXE datasets
For subsets such as bridge_*, fractal_*, jaco_play_*, and ucsd_kitchen_*, does OmniAction keep the same frame rate / control frequency as the corresponding source OXE datasets, or were the trajectories temporally re-sampled during preprocessing?
I would especially like to confirm:
- whether OmniAction preserves the original timing of the source datasets
- whether any temporal downsampling or upsampling was applied
- whether the alignment between audio/context signals and robot trajectories assumes the original timing or a modified timing
- whether different subsets use different effective frame rates
This would be very helpful for interpreting action timing correctly and for reproducing training faithfully.
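For reference, one way to probe this from the data side is to compare per-episode step counts between an OmniAction subset and its source OXE dataset. This is a minimal sketch that assumes episodes can be matched by some shared ID and that their lengths are already loaded into dicts (`length_ratios`, `infer_resampling`, and `effective_hz` are hypothetical helpers, not part of any RoboOmni or OXE tooling). A median ratio near 1.0 would suggest the original timing was kept, and a ratio well below or above 1.0 would suggest down- or upsampling:

```python
from statistics import median


def length_ratios(omni_lengths: dict[str, int], oxe_lengths: dict[str, int]) -> dict[str, float]:
    """OmniAction / OXE step-count ratio for every episode ID present in both (hypothetical ID matching)."""
    return {
        ep: omni_lengths[ep] / oxe_lengths[ep]
        for ep in omni_lengths.keys() & oxe_lengths.keys()
        if oxe_lengths[ep] > 0
    }


def infer_resampling(ratios: dict[str, float], tol: float = 0.02) -> str:
    """Classify the median length ratio; tol is an arbitrary tolerance for 'unchanged'."""
    if not ratios:
        return "no overlapping episodes"
    r = median(ratios.values())
    if abs(r - 1.0) <= tol:
        return "original timing"
    return "downsampled" if r < 1.0 else "upsampled"


def effective_hz(source_hz: float, ratios: dict[str, float]) -> float:
    """Effective control frequency implied by the median ratio (ratios must be non-empty).

    source_hz is the source dataset's control frequency, supplied by the caller.
    """
    return source_hz * median(ratios.values())
```

Running this per subset (bridge_*, fractal_*, jaco_play_*, ucsd_kitchen_*) would also show whether different subsets ended up with different effective frame rates, though it cannot tell how the audio/context alignment was computed.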
2. jaco_play_events audio path issue
While inspecting jaco_play_events, I found that the speech_conv field contains paths such as ./events/jaco_play_247_1.wav and ./events/jaco_play_247_2.wav.
However, I could not find the corresponding jaco_play_* audio files in the public dataset repository under speech/events, or elsewhere under speech.
At the same time, some metadata JSON files under speech/events appear to contain internal or original-generation speech paths, so I am not sure whether:
- the jaco_play_* audio files are missing from the public release
- the files were renamed or reorganized
- or there is an additional preprocessing step needed to map speech_conv to the actual audio files used in training
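For context, the check I describe can be sketched roughly as follows. It assumes that speech_conv paths are relative to the speech/ directory and that the field is either a single string or a list of strings; both are assumptions about the schema, and `list_audio` / `find_missing_audio` are hypothetical helper names:

```python
from pathlib import Path, PurePosixPath


def normalize(p: str) -> str:
    """Drop redundant './' components, e.g. './events/x.wav' -> 'events/x.wav'."""
    return PurePosixPath(p).as_posix()


def list_audio(speech_root: str) -> set[str]:
    """All .wav files under speech_root, as POSIX paths relative to it."""
    root = Path(speech_root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*.wav")}


def find_missing_audio(records: list[dict], available: set[str]) -> list[str]:
    """Return speech_conv entries that do not resolve to a file in `available`.

    Assumes speech_conv is a str or a list of str, relative to the speech/ root.
    """
    missing = []
    for rec in records:
        paths = rec.get("speech_conv") or []
        if isinstance(paths, str):
            paths = [paths]
        for p in paths:
            if normalize(p) not in available:
                missing.append(p)
    return missing
```

Usage would be something like `find_missing_audio(jaco_records, list_audio("speech"))`, where `jaco_records` is the jaco_play_events metadata loaded as a list of dicts.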
Could you clarify the intended audio directory structure for training on OmniAction, especially for jaco_play_events?
Thanks a lot.