Skip to content

Conversation

@cubewhy
Copy link

@cubewhy cubewhy commented Feb 1, 2026

Type of Changes

  • ✨ New feature (feat)
  • πŸ› Bug fix (fix)
  • πŸ“ Documentation change (docs)
  • πŸ’„ UI/style change (style)
  • ♻️ Code refactoring (refactor)
  • ⚑ Performance improvement (perf)
  • βœ… Test related (test)
  • πŸ”§ Build or dependencies update (build)
  • πŸ”„ CI/CD related (ci)
  • 🌐 Internationalization (i18n)
  • 🧠 AI model related (ai)
  • πŸ”„ Revert a previous commit (revert)
  • πŸ“¦ Other changes that do not modify src or test files (chore)

Description

  • Introduce 300+ dom rules from @TianmuTNT's fork
  • Refactor original dom-rules.ts for loading rules from json (I have merged the exist rules into the json file)

Related Issue

None

How Has This Been Tested?

Verified via a manually smoke test

  • Added unit tests
  • Verified through manual testing

Screenshots

None

Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly if necessary
  • My code follows the code style of this project
  • My changes do not break existing functionality
  • If my code was generated by AI, I have proofread and improved it as necessary.

I think I cannot access the source code of page https://www.readfrog.app/zh/tutorial/code-contribution/custom-dom-rules
So I cannot modify the docs.

Additional Information


Summary by cubic

Added 360+ website-specific DOM exclusion and block-translation rules, loaded from a JSON config with wildcard URL matching. This improves translation accuracy across popular sites and makes rules easier to maintain.

  • New Features

    • Added JSON-based DOM rules (dontWalkIntoSelectors, forceBlockTranslationSelectors).
    • Implemented wildcard URL pattern matching (*, **) with protocol-optional support.
  • Refactors

    • dom-rules.ts now loads rules from JSON and exposes findMatchingSelectors.
    • Updated filter.ts to use dynamic selector lookup.
    • Enabled resolveJsonModule in tsconfig.

Written for commit 01b23f9. Summary will update on new commits.

@changeset-bot
Copy link

changeset-bot bot commented Feb 1, 2026

πŸ¦‹ Changeset detected

Latest commit: 01b23f9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@read-frog/extension Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 1, 2026
@dosubot
Copy link

dosubot bot commented Feb 1, 2026

Related Documentation

Checked 6 published document(s) in 1 knowledge base(s). No updates required.

How did I do? Any feedback?Β Β Join Discord

@cubewhy cubewhy changed the title Add 300+ dom rules feat(dom-rules): Add 300+ dom rules Feb 1, 2026
@github-actions github-actions bot added the feat label Feb 1, 2026
@dosubot dosubot bot added the app: website Related to website app label Feb 1, 2026
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 5 files

Prompt for AI agents (all issues)

Check if these issues are valid β€” if so, understand the root cause of each and fix them.


<file name="src/utils/constants/dom-rules.ts">

<violation number="1" location="src/utils/constants/dom-rules.ts:152">
P2: findMatchingSelectors returns empty results while domRules is still loading asynchronously, and call sites use it synchronously. Early DOM scans can skip domain-specific blocking rules until the JSON import finishes, changing behavior from the previous synchronous constants.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@claude
Copy link

claude bot commented Feb 1, 2026

PR Review: feat(dom-rules): Add 300+ dom rules

Overall Assessment

🟒 Good addition - Bringing in 360+ website-specific DOM rules from a fork is valuable for improving translation accuracy across popular sites. The refactor to JSON-based configuration makes rules easier to maintain.


Key Insights

Data structure: JSON config with two rule types (dontWalkIntoSelectors, forceBlockTranslationSelectors) mapped by URL patterns. Clean separation.

Complexity: The glob-to-regex pattern matching is the main logic addition. Implementation is straightforward but has edge cases.

Risk points:

  • Existing tests may break due to changed imports/exports
  • Performance impact of pattern matching on every DOM check
  • Some rules in JSON duplicate the same selector arrays (e.g., Twitter variants)

Issues to Address

1. Potential Breaking Change: YouTube rules removed

The original CUSTOM_DONT_WALK_INTO_ELEMENT_SELECTOR_MAP had YouTube-specific rules using imported constants:

`.${SUBTITLES_VIEW_CLASS}`,
`.${STATE_MESSAGE_CLASS}`,
`.${TRANSLATE_BUTTON_CLASS}`,

These dynamic selectors based on JS constants are NOT present in the JSON file. The YouTube entry in dom-rules.json only has:

"www.youtube.com": [".ytp-caption-segment"]

This is a regression - the extension's own subtitle-related elements won't be excluded from translation.

Suggestion: Either add the actual class values to the JSON, or create a mechanism to merge runtime constants with JSON rules for YouTube.

2. Performance: Pattern matching on every element

findMatchingSelectors() is called for every element check via isCustomDontWalkIntoElement() and isCustomForceBlockTranslation(). Each call:

  • Creates a new URL object
  • Iterates through all 300+ patterns when no exact match

Suggestion: Cache the matched selectors per URL at the start of page translation rather than re-computing per element.

3. Massive duplication in JSON

Twitter/X rules are duplicated 10+ times for different subdomains. Same for Stack Exchange sites.

Suggestion: Consider using wildcard patterns more aggressively:

"*.twitter.com": [...],
"*.x.com": [...],

Or introduce rule inheritance/aliases to reduce the 2200-line JSON file.

4. Missing tests for new pattern matching logic

globToRegex() and matchUrlPattern() are new untested functions with edge cases:

  • Protocol handling (https:// vs http:// vs none)
  • Wildcard at different positions (*.domain.com, domain.com/*, **)
  • URL with query strings and fragments

The existing tests in custom-dont-walk.test.ts and custom-force-block.test.ts use setHost() which only tests hostname matching, not the new URL pattern features.

5. Edge case bug in matchUrlPattern

if (!pattern.startsWith('http://') && !pattern.startsWith('https://')) {
  return regex.test(url.replace(/^https?:\/\//, ''))
}

If pattern doesn't start with protocol but URL does, it strips protocol from URL. But if pattern contains path like github.com/user/*, the generated regex expects the pattern format, not the stripped URL format. This could cause false negatives.


Minor Issues

  1. Comment inconsistency (line 2340-2341):
if (!domRules)
  // Not initialized yet
  return []

Comment on wrong line - should be above or inline.

  1. Empty selector handling: If dontWalkIntoElementSelectorList is empty, joining produces "" which when passed to element.matches("") throws an error. The check if (!dontWalkSelector) handles this, but worth noting.

  2. Some JSON patterns use full URLs with https://:

"https://www.reddit.com/r/*/comments/*/*": [...]

While others use just hostnames. This inconsistency works due to the pattern matching but reduces clarity.


What's Good

  • Clean interface design (DomRulesConfig)
  • Proper TypeScript typing with the JSON module
  • Sensible fallback behavior (empty array when no rules match)
  • Good changeset documentation

Recommendation

Worth merging with fixes:

  1. Add YouTube constants to JSON (or implement runtime merge)
  2. Add caching for pattern matching results
  3. Add unit tests for globToRegex and matchUrlPattern

The core value (300+ rules) is solid. The implementation needs minor hardening before merge.

@ananaBMaster
Copy link
Collaborator

can I ask where are these rules originally from?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app: website Related to website app feat size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants