[WIP] Feat: Add video_calibration_mapper and video_split_by_frame_mapper #902
1van2ha0 wants to merge 4 commits into datajuicer:main
Conversation
Summary of Changes

Hello @1van2ha0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the video processing capabilities of the Data-Juicer framework by introducing two powerful new mappers. One mapper focuses on extracting crucial camera intrinsic data from video streams, which is vital for computer vision tasks requiring precise spatial understanding. The other provides flexible video segmentation, enabling users to break down long videos into manageable, overlapping or non-overlapping clips based on frames or duration. These additions enhance the framework's utility for multimodal data preparation, particularly for video-centric applications.
Code Review
This pull request introduces two new mappers, video_calibration_mapper and video_split_by_frame_mapper, along with associated demos and tests. The changes are generally well-structured, but there are a few critical issues and areas for improvement. A significant bug has been introduced in ray_dataset.py that breaks the handling of Mapper and Filter operators. Additionally, there are opportunities to improve resource management in video_calibration_mapper and error handling in video_split_by_frame_mapper. Please see the detailed comments for specific feedback.
```diff
  from data_juicer.core.data.schema import Schema
  from data_juicer.core.tracer import should_trace_op
- from data_juicer.ops import Deduplicator, Filter, Mapper, Pipeline
+ from data_juicer.ops import Deduplicator, Mapper, Pipeline
```
The Filter operator is removed from the imports, but its handling logic is still used (and needed) in _run_single_op. The Filter import should be re-added to fix the broken handling of Filter operators.
```diff
- from data_juicer.ops import Deduplicator, Mapper, Pipeline
+ from data_juicer.ops import Deduplicator, Filter, Mapper, Pipeline
```
```diff
      if tracer and should_trace_op(tracer, op._name) and original_process:
          op.process = original_process
- elif isinstance(op, Filter):
-     # Use cached_columns instead of self.data.columns() to avoid breaking pipeline
```
Removing elif isinstance(op, Filter): merges the logic for Filter operators into the Mapper operator block. This will cause an AttributeError when a Mapper op is processed, as Mapper does not have methods like compute_stats which are specific to Filter. The separate logic for Mapper and Filter should be restored.
```python
elif isinstance(op, Filter):
    # Use cached_columns instead of self.data.columns() to avoid breaking pipeline
```
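For reference, a minimal sketch of the separation this review asks for in ray_dataset.py (not the repository's exact _run_single_op code; the Ray calls and batch format here are assumptions):

```python
from data_juicer.ops import Filter, Mapper

def run_single_op_sketch(dataset, op):
    # Sketch only: keep Mapper and Filter on separate branches, since only
    # Filter defines compute_stats; merging the branches breaks Mapper ops.
    if isinstance(op, Mapper):
        # Mappers just transform batches with op.process
        return dataset.map_batches(op.process, batch_format="pyarrow")
    if isinstance(op, Filter):
        # Filters first attach stats via compute_stats, then drop samples
        # that op.process rejects
        dataset = dataset.map_batches(op.compute_stats, batch_format="pyarrow")
        return dataset.filter(op.process)
    return dataset
```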
```python
for i in range(1):
    for t, image, intrinsics, size_factor in stream:
        if droid is None:
            # Update args with actual image size
            self.droid_args.image_size = [image.shape[2], image.shape[3]]
            droid = Droid(self.droid_args)

        droid.track(t, image, intrinsics=intrinsics)
        sf = size_factor

    if droid is not None:
        # Terminate and get results
        # We need to pass the stream again for terminate?
        # demo.py: droid.terminate(image_stream(...))
        # It seems terminate does a final BA pass using the stream?
        # Let's recreate stream
        stream_second_pass = self._image_stream(video_path)
        traj_est, intr_est = droid.terminate(stream_second_pass)

        # Rescale intrinsics back to original resolution
        if sf:
            intr_est = intr_est.copy()
            intr_est[0:4:2] /= sf[0]
            intr_est[1:4:2] /= sf[1]

        intr_est_list = intr_est.tolist()

    # except Exception as e:
    #     # Log error or just skip
    #     print(f"Error processing video {video_path}: {e}")
    # finally:
    # Cleanup
    if droid:
        del droid
        torch.cuda.empty_cache()
```
The resource cleanup logic (e.g., del droid, torch.cuda.empty_cache()) is not guaranteed to run if an exception occurs during video processing. It's better to wrap the processing logic in a try...finally block to ensure resources are always released. The for i in range(1): loop is unnecessary and can be removed. Also, consider using the logger for error messages instead of print.
```python
try:
    for t, image, intrinsics, size_factor in stream:
        if droid is None:
            # Update args with actual image size
            self.droid_args.image_size = [image.shape[2], image.shape[3]]
            droid = Droid(self.droid_args)
        droid.track(t, image, intrinsics=intrinsics)
        sf = size_factor

    if droid is not None:
        # Terminate and get results
        stream_second_pass = self._image_stream(video_path)
        _, intr_est = droid.terminate(stream_second_pass)

        # Rescale intrinsics back to original resolution
        if sf:
            intr_est = intr_est.copy()
            intr_est[0:4:2] /= sf[0]
            intr_est[1:4:2] /= sf[1]

        intr_est_list = intr_est.tolist()
except Exception as e:
    if self.verbose:
        logger.warning(f'Error processing video {video_path}: {e}')
finally:
    # Cleanup
    if droid:
        del droid
        torch.cuda.empty_cache()
```

```python
keys = samples_after_split[0].keys()
res_samples = {}
for key in keys:
    res_samples[key] = [s[key] for s in samples_after_split]
return res_samples
```
If samples_after_split is empty, accessing samples_after_split[0] will raise an IndexError. This can happen if no samples are kept or generated. You should add a check to handle this case to prevent the operator from crashing.
```diff
+ if not samples_after_split:
+     return {key: [] for key in samples.keys()}
  keys = samples_after_split[0].keys()
  res_samples = {}
  for key in keys:
      res_samples[key] = [s[key] for s in samples_after_split]
  return res_samples
```
```python
# Default encoding args from user code
# We can allow overriding via ffmpeg_extra_args
if self.ffmpeg_extra_args:
    import shlex
```
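For context, a minimal sketch of how a user-supplied ffmpeg_extra_args string could be parsed with shlex and merged into the ffmpeg command (the default flags, attribute values, and command layout below are assumptions, not the PR's actual implementation):

```python
import shlex
import subprocess

# Sketch only: override default encoding args with a user-provided string.
default_encode_args = ["-c:v", "libx264", "-preset", "fast"]  # assumed defaults
ffmpeg_extra_args = "-c:v libx265 -crf 28"  # hypothetical user override

encode_args = shlex.split(ffmpeg_extra_args) if ffmpeg_extra_args else default_encode_args

cmd = [
    "ffmpeg", "-y",
    "-i", "input.mp4",       # source video
    "-ss", "0", "-t", "5",   # clip start time and duration in seconds
    *encode_args,            # encoding flags: defaults or user overrides
    "clip_0000.mp4",
]
subprocess.run(cmd, check=True)
```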
This PR introduces two new operators:
- video_calibration_mapper: estimates camera intrinsic parameters from video frames (built on DroidCalib)
- video_split_by_frame_mapper: splits videos into clips by frame count or duration, with optional overlap
Core implementation logic for both operators has been completed, but significant validation and polish work remains before this can be considered production-ready.
Completed
- Core processing logic for video_calibration_mapper and video_split_by_frame_mapper
- Shell script (demos/install_droidcalib.sh) created for DroidCalib in video_calibration_mapper

Pending Work
- video_calibration_mapper requires manually installing DroidCalib before use. The included shell script (demos/install_droidcalib.sh) has not been tested and may fail.

Next Steps
- Test the demos/install_droidcalib.sh script
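As a rough illustration of how the split mapper might be exercised once merged (the class name, constructor arguments, and batched interface below are assumptions inferred from the op name, not the PR's confirmed API):

```python
# Sketch only: hypothetical usage following the usual Data-Juicer mapper pattern.
# frame_num and overlap_frame_num are illustrative parameter names, not the PR's.
from data_juicer.ops.mapper import VideoSplitByFrameMapper  # name inferred from the op id

op = VideoSplitByFrameMapper(frame_num=300, overlap_frame_num=30)

# Batched sample layout: one list per field, matching the res_samples dict above.
samples = {
    "videos": [["demo.mp4"]],
    "text": ["<__dj__video> a sample caption"],
}
split_samples = op.process_batched(samples)
print(f"{len(split_samples['videos'])} output samples after splitting")
```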