
perf(subtitles): add concurrent chunk processing for AI segmentation#1294

Open
alexj11324 wants to merge 6 commits into mengxi-ream:main from alexj11324:worktree-fix-pr-1264-race-condition

Conversation

alexj11324 commented Apr 8, 2026

Summary

  • Process up to 3 subtitle chunks in parallel (MAX_CONCURRENT_SEGMENTS), reducing AI segmentation latency for long videos
  • Fix race condition from the original PR #1264 (perf: Optimize subtitle segmentation with concurrent chunk processing): use a collect-then-merge pattern, where processChunk returns results, Promise.all collects them, and mergeFragments then runs synchronously
  • Fix cross-chunk data loss: mergeFragments uses exact raw-start Set filtering instead of time-range [chunkStart, chunkEnd] filtering, preventing fragments from being incorrectly removed when rebalanceToTargetRange adjusts boundary timestamps
  • AI success path uses rebalanceToTargetRange instead of full optimizeSubtitles — AI already performs semantic sentence segmentation, only length rebalancing is needed
  • Add triggerTranslationTick() to TranslationCoordinator, triggered via onChunkProcessed callback after all chunks in a batch are merged

Supersedes #1264 (closed due to fork deletion). This is a clean reimplementation on latest main with both race condition fixes applied.
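The collect-then-merge pattern described above can be sketched as follows. The Chunk/ChunkResult/Fragment shapes and the runBatch helper are illustrative assumptions for this sketch, not the repository's actual types:

```typescript
// Illustrative types; the real pipeline's shapes may differ.
interface Chunk { start: number, end: number }
interface Fragment { rawStart: number, text: string }
interface ChunkResult { chunk: Chunk, fragments: Fragment[] }

const MAX_CONCURRENT_SEGMENTS = 3

// Workers only *return* results; all mutation of shared state happens in
// the synchronous merge loop after Promise.all resolves, so merges can
// never interleave with each other or with in-flight processing.
async function runBatch(
  chunks: Chunk[],
  processChunk: (c: Chunk) => Promise<ChunkResult>,
  merge: (r: ChunkResult) => void,
): Promise<void> {
  for (let i = 0; i < chunks.length; i += MAX_CONCURRENT_SEGMENTS) {
    const batch = chunks.slice(i, i + MAX_CONCURRENT_SEGMENTS)
    const results = await Promise.all(batch.map(processChunk)) // collect
    for (const result of results)
      merge(result) // then merge, synchronously and in submission order
  }
}
```

Promise.all preserves input order, so merge order is deterministic regardless of which chunk's AI call finishes first.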

Test plan

  • pnpm type-check passes
  • pnpm lint passes
  • pnpm test — 106 files, 964 tests all pass
  • Manual test: seek around in a long video, verify segments near chunk boundaries display correctly
  • Manual test: verify translation kicks in promptly after segmentation completes

Summary by cubic

Process up to 3 subtitle chunks in parallel to speed up segmentation for long videos. Fixes races, prevents cross‑chunk data loss, handles mid‑batch stops safely, and logs errors before falling back when needed.

  • New Features

    • Concurrent chunk processing (up to 3) using Promise.all.
    • Lighter success path with rebalanceToTargetRange instead of optimizeSubtitles.
    • onChunkProcessed now calls triggerTranslationTick() to start translation promptly.
  • Bug Fixes

    • Collect‑then‑merge pattern removes the race on processedFragments.
    • Exact raw‑start filtering prevents cross‑chunk data loss.
    • Stop‑safe: roll back claimed starts, skip merge/callbacks, guard callback errors, block translation after stop(), and fall back when config is missing (with warnings).

Written for commit 5953da6. Summary will update on new commits.

Process up to 3 subtitle chunks in parallel using Promise.all,
with collect-then-merge pattern to prevent race conditions on
shared processedFragments array. Uses exact raw-start Set filtering
instead of time-range filtering to avoid cross-chunk data loss.

- Add MAX_CONCURRENT_SEGMENTS constant
- Export rebalanceToTargetRange for AI success path (skip redundant
  processSubtitles since AI already handles semantic segmentation)
- Add onChunkProcessed callback to trigger translation ticks
- Add triggerTranslationTick to TranslationCoordinator
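The exact raw-start Set filtering can be sketched like this. The Fragment fields are hypothetical for the sketch: rawStart stands for the original timestamp key and start for the possibly rebalanced display time.

```typescript
interface Fragment { rawStart: number, start: number, text: string }

// Remove only fragments whose original (raw) start was consumed by this
// chunk, instead of removing everything inside [chunkStart, chunkEnd].
// Rebalancing may shift display timestamps across a chunk boundary, so
// time-range filtering could delete fragments belonging to a neighbor.
function mergeFragments(
  existing: Fragment[],
  incoming: Fragment[],
): Fragment[] {
  const replaced = new Set(incoming.map(f => f.rawStart))
  const kept = existing.filter(f => !replaced.has(f.rawStart))
  return [...kept, ...incoming].sort((a, b) => a.start - b.start)
}
```

A boundary fragment whose display start drifted into the chunk's time range survives the merge as long as its raw start was not part of the chunk.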
Copilot AI review requested due to automatic review settings April 8, 2026 01:40
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 8, 2026
changeset-bot bot commented Apr 8, 2026

⚠️ No Changeset found

Latest commit: 5953da6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added perf contrib-trust:new PR author trust score is 0-29. needs-maintainer-review Contributor trust automation recommends maintainer review. labels Apr 8, 2026
github-actions bot commented Apr 8, 2026

Contributor trust score

9/100 — New contributor

This score estimates contributor familiarity with mengxi-ream/read-frog using public GitHub signals. It is advisory only and does not block merges automatically.

Score breakdown

  • Repo familiarity: 0/35 (signals: commits in repo, merged PRs, reviews)
  • Community standing: 4/25 (signals: account age, followers, repo role)
  • OSS influence: 3/20 (signals: stars on owned non-fork repositories)
  • PR track record: 2/20 (signals: merge rate across resolved PRs in this repo)

Signals used

  • Repo commits: 0 (author commits reachable from the repository default branch)
  • Repo PR history: merged 0, open 1, closed-unmerged 3
  • Repo reviews: 0
  • PR changed lines: 119 (+92 / -27)
  • Repo permission: read
  • Followers: 2
  • Account age: 18 months
  • Owned non-fork repos considered: max 1, total 2 (alexj11324/claude-skills (1), alexj11324/TSSPIM-Tsunami-and-Storm-Surge-Population-Impact-Model (1), alexj11324/AI-Scientist-v2 (0), alexj11324/BDI_LLM_Formal_Ver (0), alexj11324/11711-HW (0), alexj11324/DataStructure (0))

Policy

  • Low-score review threshold: < 30
  • Auto-close: score < 20 and changed lines > 1000
  • Policy version: v1.1

Updated automatically when the PR changes or when a maintainer reruns the workflow.

@mengxi-ream mengxi-ream requested a review from taiiiyang April 8, 2026 01:41
@dosubot dosubot bot added the app: browser extension Related to browser extension label Apr 8, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 5 files


Copilot AI left a comment


Pull request overview

This PR improves AI subtitle segmentation throughput and correctness by processing multiple subtitle chunks concurrently, safely merging results to avoid races and cross-chunk fragment loss, and triggering translation work promptly as segmentation completes.

Changes:

  • Process up to MAX_CONCURRENT_SEGMENTS chunks concurrently in SegmentationPipeline, using a collect-then-merge approach.
  • Refine fragment merging to filter by exact raw-start keys and use rebalanceToTargetRange on AI output.
  • Add triggerTranslationTick() to TranslationCoordinator and wire it via an onChunkProcessed callback.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/utils/subtitles/processor/optimizer.ts: Exports rebalanceToTargetRange for reuse by the segmentation pipeline.
  • src/utils/constants/subtitles.ts: Introduces the MAX_CONCURRENT_SEGMENTS concurrency constant.
  • src/entrypoints/subtitles.content/universal-adapter.ts: Wires onChunkProcessed to translation ticking after segmentation merge.
  • src/entrypoints/subtitles.content/translation-coordinator.ts: Adds the triggerTranslationTick() entrypoint for external tick triggering.
  • src/entrypoints/subtitles.content/segmentation-pipeline.ts: Implements concurrent chunk selection/processing and synchronous merge of results.


Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2658387ef2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

this.mergeFragments(result.fragments, result.chunk)
}
}
this.onChunkProcessed?.()

P2 Badge Skip post-batch translation tick after stop

When subtitles are turned off during an in-flight AI segmentation batch, TranslationCoordinator.stop() only detaches listeners, but runLoop() still invokes onChunkProcessed after Promise.all resolves. Since this callback is wired to triggerTranslationTick(), it can launch translateNearby() and enqueue translation work even after the user has disabled subtitles, causing unnecessary API usage and state updates after shutdown. Please guard this callback with the stop state (or clear the callback on stop) so stop() actually halts all translation activity.


alexj11324 (Author) replied:

Fixed in 25232a0. Both guards are now in place: this.stopped check after Promise.all in the pipeline, and stopped flag in TranslationCoordinator.triggerTranslationTick(). The callback is effectively a no-op after stop.

- Check `this.stopped` after `Promise.all` in segmentation pipeline
  to skip merge and callback when pipeline was stopped mid-batch
- Add `stopped` flag to TranslationCoordinator, checked in
  `triggerTranslationTick()` to prevent translation after stop()
- Fall back to non-AI processing when config is null instead of
  returning null, which previously left chunks permanently marked
  as processed but never actually processed
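The second guard from this fix can be sketched as a stopped flag in the coordinator. The class body and the ticks counter are illustrative stand-ins for the real TranslationCoordinator, which would start translateNearby() instead:

```typescript
class TranslationCoordinator {
  private stopped = false
  ticks = 0 // illustrative counter standing in for real translation work

  stop(): void {
    this.stopped = true
  }

  triggerTranslationTick(): void {
    if (this.stopped)
      return // guard: no translation work may start after stop()
    this.ticks++ // the real coordinator would kick off translateNearby()
  }
}
```

Combined with the this.stopped check after Promise.all in the pipeline, the onChunkProcessed callback becomes a no-op once subtitles are disabled.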
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Apr 8, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/entrypoints/subtitles.content/segmentation-pipeline.ts">

<violation number="1" location="src/entrypoints/subtitles.content/segmentation-pipeline.ts:91">
P1: Stopping during an in-flight batch can drop processed chunk results because the new stop check breaks before merging while starts are already marked processed.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25232a0584


  • processChunk now always returns ChunkResult (the config-null path falls back to optimizeSubtitles), so the | null union and the null check in the caller are unnecessary.
  • When stop() is called during an in-flight Promise.all batch, the claimed fragments in segmentedRawStarts must be rolled back so they can be reprocessed on the next start()/restart(). Without this, those chunks are permanently marked as processed but never merged into processedFragments.
  • Wrap the onChunkProcessed callback in try-catch so a throwing callback does not kill the segmentation loop as an unhandled rejection.
  • Add console.warn to surface error details (network failure, rate limit, malformed response) when AI segmentation fails and falls back to the simple optimizer.
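The try-catch suggestion can be sketched as a small guard helper. safeInvoke and its tag parameter are hypothetical names for this sketch, not code from the PR:

```typescript
// Guard an optional callback so a throwing subscriber cannot surface as
// an unhandled rejection inside the async segmentation loop.
function safeInvoke(cb: (() => void) | undefined, tag: string): void {
  try {
    cb?.()
  }
  catch (error) {
    // Surface the failure without aborting segmentation.
    console.warn(`[subtitles] ${tag} callback threw:`, error)
  }
}
```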

@taiiiyang taiiiyang left a comment


I took another pass locally and reviewed the updated flow again.
I don't see any remaining concrete issues now — the concurrency handling and stop/restart guards look good to me.

LGTM
