perf(subtitles): add concurrent chunk processing for AI segmentation #1294
alexj11324 wants to merge 6 commits into mengxi-ream:main
Conversation
Process up to 3 subtitle chunks in parallel using Promise.all, with a collect-then-merge pattern to prevent race conditions on the shared processedFragments array. Uses exact raw-start Set filtering instead of time-range filtering to avoid cross-chunk data loss.
- Add MAX_CONCURRENT_SEGMENTS constant
- Export rebalanceToTargetRange for the AI success path (skip redundant processSubtitles since AI already handles semantic segmentation)
- Add onChunkProcessed callback to trigger translation ticks
- Add triggerTranslationTick to TranslationCoordinator
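The collect-then-merge pattern above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Chunk`/`ChunkResult` shapes and the stand-in `processChunk` body are assumptions; only the constant name and the batch/merge structure mirror the description.

```typescript
// Illustrative sketch of collect-then-merge with a concurrency cap.
const MAX_CONCURRENT_SEGMENTS = 3;

interface Chunk { start: number; lines: string[] }
interface ChunkResult { chunk: Chunk; fragments: string[] }

// Stand-in for the AI segmentation call (hypothetical body).
async function processChunk(chunk: Chunk): Promise<ChunkResult> {
  return { chunk, fragments: chunk.lines.map((l) => l.trim()) };
}

async function processBatch(chunks: Chunk[], merged: string[]): Promise<void> {
  // Take at most MAX_CONCURRENT_SEGMENTS chunks per batch.
  const batch = chunks.slice(0, MAX_CONCURRENT_SEGMENTS);
  // Collect: chunks run concurrently, but none of them touches shared state.
  const results = await Promise.all(batch.map(processChunk));
  // Merge: a single synchronous pass over the collected results, so no
  // interleaved async writes can race on the shared fragments array.
  for (const result of results) {
    merged.push(...result.fragments);
  }
}
```

The key design choice is that concurrency is confined to the pure `processChunk` calls; mutation of shared state happens only after `Promise.all` resolves, on one synchronous code path.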
Pull request overview
This PR improves AI subtitle segmentation throughput and correctness by processing multiple subtitle chunks concurrently, safely merging results to avoid races and cross-chunk fragment loss, and triggering translation work promptly as segmentation completes.
Changes:
- Process up to `MAX_CONCURRENT_SEGMENTS` chunks concurrently in `SegmentationPipeline`, using a collect-then-merge approach.
- Refine fragment merging to filter by exact raw-start keys and use `rebalanceToTargetRange` on AI output.
- Add `triggerTranslationTick()` to `TranslationCoordinator` and wire it via an `onChunkProcessed` callback.
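The exact raw-start filtering mentioned above can be sketched like this. The `Fragment` shape and `mergeFragments` signature here are assumptions for illustration; the point is the filtering strategy, not the project's real API.

```typescript
interface Fragment { rawStart: number; text: string }

// Remove only the fragments whose original (raw) start times belong to the
// chunk being replaced, rather than everything inside [chunkStart, chunkEnd].
// Neighboring fragments whose timestamps drifted into that range after
// rebalancing are left untouched, avoiding cross-chunk data loss.
function mergeFragments(
  existing: Fragment[],
  chunkRawStarts: Set<number>,
  replacement: Fragment[],
): Fragment[] {
  const kept = existing.filter((f) => !chunkRawStarts.has(f.rawStart));
  return [...kept, ...replacement].sort((a, b) => a.rawStart - b.rawStart);
}
```

A time-range filter would delete any fragment whose adjusted timestamp happens to fall inside the chunk's window; keying deletion on the immutable raw start times makes the merge insensitive to boundary shifts.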
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/utils/subtitles/processor/optimizer.ts | Exports `rebalanceToTargetRange` for reuse by the segmentation pipeline. |
| src/utils/constants/subtitles.ts | Introduces the `MAX_CONCURRENT_SEGMENTS` concurrency constant. |
| src/entrypoints/subtitles.content/universal-adapter.ts | Wires `onChunkProcessed` to translation ticking after the segmentation merge. |
| src/entrypoints/subtitles.content/translation-coordinator.ts | Adds the `triggerTranslationTick()` entrypoint for external tick triggering. |
| src/entrypoints/subtitles.content/segmentation-pipeline.ts | Implements concurrent chunk selection/processing and a synchronous merge of results. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2658387ef2
```ts
      this.mergeFragments(result.fragments, result.chunk)
    }
  }
  this.onChunkProcessed?.()
```
Skip post-batch translation tick after stop
When subtitles are turned off during an in-flight AI segmentation batch, TranslationCoordinator.stop() only detaches listeners, but runLoop() still invokes onChunkProcessed after Promise.all resolves. Since this callback is wired to triggerTranslationTick(), it can launch translateNearby() and enqueue translation work even after the user has disabled subtitles, causing unnecessary API usage and state updates after shutdown. Please guard this callback with the stop state (or clear the callback on stop) so stop() actually halts all translation activity.
Fixed in 25232a0. Both guards are now in place: this.stopped check after Promise.all in the pipeline, and stopped flag in TranslationCoordinator.triggerTranslationTick(). The callback is effectively a no-op after stop.
- Check `this.stopped` after `Promise.all` in the segmentation pipeline to skip the merge and callback when the pipeline was stopped mid-batch
- Add a `stopped` flag to TranslationCoordinator, checked in `triggerTranslationTick()`, to prevent translation after `stop()`
- Fall back to non-AI processing when config is null instead of returning null, which previously left chunks permanently marked as processed but never actually processed
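The `stopped`-flag guard described above can be sketched minimally. Class and member names are illustrative, not the project's actual code; the sketch shows only the guard pattern.

```typescript
// Minimal sketch of the stop guard: once stop() flips the flag, later
// tick requests (e.g. from a late onChunkProcessed callback) become no-ops.
class TranslationCoordinator {
  private stopped = false;
  public ticks = 0; // stand-in for "translation work launched"

  triggerTranslationTick(): void {
    if (this.stopped) return; // no translation activity after stop()
    this.ticks += 1;
  }

  stop(): void {
    this.stopped = true;
  }
}
```

Because the flag is checked inside the coordinator itself, the guard holds even if a caller retains a stale reference to the callback after shutdown.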
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/entrypoints/subtitles.content/segmentation-pipeline.ts">
<violation number="1" location="src/entrypoints/subtitles.content/segmentation-pipeline.ts:91">
P1: Stopping during an in-flight batch can drop processed chunk results because the new stop check breaks before merging while starts are already marked processed.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
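The rollback concern flagged above (claimed starts surviving a mid-batch stop) could be addressed along these lines. `segmentedRawStarts` comes from the review comment; the claim/rollback helpers here are assumptions, sketched only to show the bookkeeping.

```typescript
// Illustrative sketch: track which raw starts a batch has "claimed", and
// roll them back if the pipeline stops before the batch's results merge,
// so those chunks are eligible for reprocessing after the next start().
class SegmentationState {
  private segmentedRawStarts = new Set<number>();

  claim(starts: number[]): void {
    for (const s of starts) this.segmentedRawStarts.add(s);
  }

  rollback(starts: number[]): void {
    for (const s of starts) this.segmentedRawStarts.delete(s);
  }

  isClaimed(start: number): boolean {
    return this.segmentedRawStarts.has(start);
  }
}
```

The invariant to preserve is that a start stays in the set only if its fragments actually reached the merged output; anything claimed by an abandoned batch must be released.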
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 25232a0584
- `processChunk` now always returns `ChunkResult` (the config-null path falls back to `optimizeSubtitles`), so the `| null` union and the null check in the caller are unnecessary.
- When `stop()` is called during an in-flight `Promise.all` batch, the claimed fragments in `segmentedRawStarts` must be rolled back so they can be reprocessed on the next `start()`/restart. Without this, those chunks are permanently marked as processed but never merged into `processedFragments`.
- Wrap the callback in try-catch so a throwing callback does not kill the segmentation loop as an unhandled rejection.
- Add `console.warn` to surface error details (network failure, rate limit, malformed response) when AI segmentation fails and falls back to the simple optimizer.
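The callback-isolation suggestion above might look like this. The `safeNotify` helper and its name are hypothetical, not part of the PR; the sketch only shows containing a throwing listener so the loop keeps running.

```typescript
type Callback = () => void;

// Invoke an optional listener without letting its exceptions propagate
// into the segmentation loop; surface the failure via console.warn instead.
function safeNotify(cb: Callback | undefined): void {
  try {
    cb?.();
  } catch (err) {
    console.warn("onChunkProcessed callback failed:", err);
  }
}
```

This keeps the failure visible in logs while guaranteeing the async loop that calls it never dies from a listener's bug.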
Summary
- Process up to `MAX_CONCURRENT_SEGMENTS` chunks in parallel, reducing AI segmentation latency for long videos
- `processChunk` returns results, `Promise.all` collects them, then `mergeFragments` runs synchronously
- `mergeFragments` uses exact raw-start Set filtering instead of time-range `[chunkStart, chunkEnd]` filtering, preventing fragments from being incorrectly removed when `rebalanceToTargetRange` adjusts boundary timestamps
- AI output is rebalanced with `rebalanceToTargetRange` instead of the full `optimizeSubtitles`: the AI already performs semantic sentence segmentation, so only length rebalancing is needed
- Add `triggerTranslationTick()` to `TranslationCoordinator`, triggered via the `onChunkProcessed` callback after all chunks in a batch are merged

Supersedes #1264 (closed due to fork deletion). This is a clean reimplementation on latest `main` with both race condition fixes applied.

Test plan
- `pnpm type-check` passes
- `pnpm lint` passes
- `pnpm test`: 106 files, 964 tests all pass

Summary by cubic
Process up to 3 subtitle chunks in parallel to speed up segmentation for long videos. Fixes races, prevents cross‑chunk data loss, handles mid‑batch stops safely, and logs errors before falling back when needed.
New Features
- Process up to 3 chunks concurrently with `Promise.all`.
- Use `rebalanceToTargetRange` instead of `optimizeSubtitles`.
- `onChunkProcessed` now calls `triggerTranslationTick()` to start translation promptly.

Bug Fixes
- Collect-then-merge avoids races on the shared `processedFragments`.
- Skip merging and translation ticks after `stop()`, and fall back when config is missing (with warnings).

Written for commit 5953da6. Summary will update on new commits.