
[WIP] Feat: Add video_calibration_mapper and video_split_by_frame_mapper #902

Open
1van2ha0 wants to merge 4 commits into datajuicer:main from 1van2ha0:feature/vlapipeline

Conversation


@1van2ha0 commented Feb 1, 2026

This PR introduces two new operators:

  • video_calibration_mapper: Extracts camera intrinsics from videos using DroidCalib.
  • video_split_by_frame_mapper: Splits videos into segments based on a specified frame count or duration (an illustrative usage sketch for both operators follows this list).
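
A hedged usage sketch of the two operators follows; the module paths and constructor arguments (e.g. frames_per_segment, overlap_frames) are illustrative assumptions rather than the final signatures from this PR:

    # Hypothetical usage sketch -- module paths and parameters are assumptions.
    from data_juicer.ops.mapper.video_calibration_mapper import VideoCalibrationMapper
    from data_juicer.ops.mapper.video_split_by_frame_mapper import VideoSplitByFrameMapper

    # Split videos into fixed-length, optionally overlapping segments.
    splitter = VideoSplitByFrameMapper(frames_per_segment=300, overlap_frames=30)

    # Estimate camera intrinsics with DroidCalib (DroidCalib must be installed
    # separately, e.g. via the provided install script).
    calibrator = VideoCalibrationMapper()

    # A Data-Juicer executor then applies each operator's process() to samples
    # whose 'videos' field points at the input video files.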

Core implementation logic for both operators has been completed, but significant validation and polish work remains before this can be considered production-ready.

Completed

  • Core logic implementation for both operators
  • Basic unit tests written for both operators
  • Minimal working demos
  • Shell installation script (demos/install_droidcalib.sh) created to install DroidCalib for video_calibration_mapper

Pending Work

| Item | Status | Details |
| --- | --- | --- |
| unittests | not validated | Tests are written but have not been executed; expected to fail. |
| demos | rough | Current demos are minimal stubs. |
| DroidCalib | blocker | Requires manual installation of DroidCalib before use. The included shell script (demos/install_droidcalib.sh) has not been tested and may fail. |

Next Steps

  • Test and fix demos/install_droidcalib.sh script
  • Verify unit tests pass in clean environment
  • Refine demos

@gemini-code-assist

Summary of Changes

Hello @1van2ha0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the video processing capabilities of the Data-Juicer framework by introducing two powerful new mappers. One mapper focuses on extracting crucial camera intrinsic data from video streams, which is vital for computer vision tasks requiring precise spatial understanding. The other provides flexible video segmentation, enabling users to break down long videos into manageable, overlapping or non-overlapping clips based on frames or duration. These additions enhance the framework's utility for multimodal data preparation, particularly for video-centric applications.

Highlights

  • New Video Calibration Mapper: Introduced VideoCalibrationMapper which extracts camera intrinsics from videos using the DroidCalib library. This operator dynamically loads DroidCalib and handles video processing to determine camera parameters.
  • New Video Split by Frame Mapper: Added VideoSplitByFrameMapper to segment videos based on a specified frame count or duration, with support for overlapping segments. It leverages FFmpeg for video splitting (a small sketch of the overlap arithmetic follows this list).
  • Dependency Management for DroidCalib: The VideoCalibrationMapper includes logic for ensuring DroidCalib is importable at runtime, and a dedicated shell script (demos/video_calibration/install_droidcalib.sh) is provided for its installation.
  • Work in Progress (WIP) Status: The core logic for both new operators is complete, but significant validation, testing, and polish work is still pending, as indicated by the '[WIP]' tag and the PR description.
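
To make the overlap behaviour concrete, below is a minimal sketch of the boundary arithmetic a frame-based splitter needs; it illustrates the concept only and is not code from this PR:

    # Overlap-aware segment boundaries (illustration only).
    def segment_bounds(total_frames: int, seg_len: int, overlap: int = 0):
        """Yield (start_frame, end_frame) pairs covering [0, total_frames)."""
        step = max(seg_len - overlap, 1)
        start = 0
        while start < total_frames:
            end = min(start + seg_len, total_frames)
            yield start, end
            if end == total_frames:
                break
            start += step

    # A 10-frame video cut into 4-frame segments with 1 frame of overlap:
    print(list(segment_bounds(10, 4, 1)))  # [(0, 4), (3, 7), (6, 10)]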



@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces two new mappers, video_calibration_mapper and video_split_by_frame_mapper, along with associated demos and tests. The changes are generally well-structured, but there are a few critical issues and areas for improvement. A significant bug has been introduced in ray_dataset.py that breaks the handling of Mapper and Filter operators. Additionally, there are opportunities to improve resource management in video_calibration_mapper and error handling in video_split_by_frame_mapper. Please see the detailed comments for specific feedback.

  from data_juicer.core.data.schema import Schema
  from data_juicer.core.tracer import should_trace_op
- from data_juicer.ops import Deduplicator, Filter, Mapper, Pipeline
+ from data_juicer.ops import Deduplicator, Mapper, Pipeline

critical

The Filter operator is removed from imports, but its logic is still used (and needed) in _run_single_op. This should be re-added to fix the broken logic for handling Filter operators.

Suggested change
- from data_juicer.ops import Deduplicator, Mapper, Pipeline
+ from data_juicer.ops import Deduplicator, Filter, Mapper, Pipeline

                if tracer and should_trace_op(tracer, op._name) and original_process:
                    op.process = original_process
-           elif isinstance(op, Filter):
-               # Use cached_columns instead of self.data.columns() to avoid breaking pipeline

critical

Removing elif isinstance(op, Filter): merges the logic for Filter operators into the Mapper operator block. This will cause an AttributeError when a Mapper op is processed, as Mapper does not have methods like compute_stats which are specific to Filter. The separate logic for Mapper and Filter should be restored.

            elif isinstance(op, Filter):
                # Use cached_columns instead of self.data.columns() to avoid breaking pipeline
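
For context, here is an illustrative sketch of why the two branches must stay separate; this is not the actual ray_dataset.py code, just the Mapper/Filter contract the comment above relies on (a Mapper only exposes process(), while a Filter exposes compute_stats() plus a boolean process()):

    from data_juicer.ops import Filter, Mapper

    # Illustration only -- not the real _run_single_op implementation.
    def run_op_on_sample(op, sample):
        if isinstance(op, Mapper):
            # Mappers transform the sample and return it.
            return op.process(sample)
        elif isinstance(op, Filter):
            # Filters first fill the stats field, then decide keep/drop.
            sample = op.compute_stats(sample)
            return sample if op.process(sample) else None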

Comment on lines +239 to +273
        for i in range(1):
            for t, image, intrinsics, size_factor in stream:
                if droid is None:
                    # Update args with actual image size
                    self.droid_args.image_size = [image.shape[2], image.shape[3]]
                    droid = Droid(self.droid_args)

                droid.track(t, image, intrinsics=intrinsics)
                sf = size_factor

            if droid is not None:
                # Terminate and get results
                # We need to pass the stream again for terminate?
                # demo.py: droid.terminate(image_stream(...))
                # It seems terminate does a final BA pass using the stream?
                # Let's recreate stream
                stream_second_pass = self._image_stream(video_path)
                traj_est, intr_est = droid.terminate(stream_second_pass)

                # Rescale intrinsics back to original resolution
                if sf:
                    intr_est = intr_est.copy()
                    intr_est[0:4:2] /= sf[0]
                    intr_est[1:4:2] /= sf[1]

                intr_est_list = intr_est.tolist()

        # except Exception as e:
        #     # Log error or just skip
        #     print(f"Error processing video {video_path}: {e}")
        # finally:
        # Cleanup
        if droid:
            del droid
        torch.cuda.empty_cache()

high

The resource cleanup logic (e.g., del droid, torch.cuda.empty_cache()) is not guaranteed to run if an exception occurs during video processing. It's better to wrap the processing logic in a try...finally block to ensure resources are always released. The for i in range(1): loop is unnecessary and can be removed. Also, consider using the logger for error messages instead of print.

        try:
            for t, image, intrinsics, size_factor in stream:
                if droid is None:
                    # Update args with actual image size
                    self.droid_args.image_size = [image.shape[2], image.shape[3]]
                    droid = Droid(self.droid_args)

                droid.track(t, image, intrinsics=intrinsics)
                sf = size_factor

            if droid is not None:
                # Terminate and get results
                stream_second_pass = self._image_stream(video_path)
                _, intr_est = droid.terminate(stream_second_pass)

                # Rescale intrinsics back to original resolution
                if sf:
                    intr_est = intr_est.copy()
                    intr_est[0:4:2] /= sf[0]
                    intr_est[1:4:2] /= sf[1]

                intr_est_list = intr_est.tolist()
        except Exception as e:
            if self.verbose:
                logger.warning(f'Error processing video {video_path}: {e}')
        finally:
            # Cleanup
            if droid:
                del droid
            torch.cuda.empty_cache()

Comment on lines +286 to +290
        keys = samples_after_split[0].keys()
        res_samples = {}
        for key in keys:
            res_samples[key] = [s[key] for s in samples_after_split]
        return res_samples

high

If samples_after_split is empty, accessing samples_after_split[0] will raise an IndexError. This can happen if no samples are kept or generated. You should add a check to handle this case to prevent the operator from crashing.

Suggested change
-         keys = samples_after_split[0].keys()
-         res_samples = {}
-         for key in keys:
-             res_samples[key] = [s[key] for s in samples_after_split]
-         return res_samples
+         if not samples_after_split:
+             return {key: [] for key in samples.keys()}
+         keys = samples_after_split[0].keys()
+         res_samples = {}
+         for key in keys:
+             res_samples[key] = [s[key] for s in samples_after_split]
+         return res_samples
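
As a side note, the row-to-column reassembly this snippet performs is shown in isolation below, with made-up sample data:

    # Pivot a list of per-row dicts back into a column-oriented batch.
    rows = [{"videos": ["seg_0.mp4"], "text": "a"},
            {"videos": ["seg_1.mp4"], "text": "b"}]
    batch = {key: [row[key] for row in rows] for key in rows[0].keys()}
    # -> {'videos': [['seg_0.mp4'], ['seg_1.mp4']], 'text': ['a', 'b']}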

        # Default encoding args from user code
        # We can allow overriding via ffmpeg_extra_args
        if self.ffmpeg_extra_args:
            import shlex

medium

Imports should be at the top of the file to follow standard Python style guides (PEP 8). Please move import shlex to the top-level imports.
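
For reference, shlex.split is what turns a user-supplied flag string into argv-style tokens; the flag values below are only an example:

    import shlex  # would live with the other top-level imports per PEP 8

    extra = shlex.split("-c:v libx264 -crf 23")
    # -> ['-c:v', 'libx264', '-crf', '23'], ready to append to the ffmpeg command list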
