feat(dom-rules): Add 300+ dom rules #931

cubewhy · 2026-02-01T12:13:01Z

Type of Changes

Description

Introduce 300+ DOM rules from @TianmuTNT's fork
Refactor the original dom-rules.ts to load rules from JSON (the existing rules have been merged into the JSON file)

Related Issue

None

How Has This Been Tested?

Verified via a manual smoke test

Added unit tests
Verified through manual testing

Screenshots

None

Checklist

I have tested these changes locally
I have updated the documentation accordingly if necessary
My code follows the code style of this project
My changes do not break existing functionality
If my code was generated by AI, I have proofread and improved it as necessary.

I don't think I have access to the source of the page https://www.readfrog.app/zh/tutorial/code-contribution/custom-dom-rules, so I cannot update the docs.

Additional Information

Summary by cubic

Added 360+ website-specific DOM exclusion and block-translation rules, loaded from a JSON config with wildcard URL matching. This improves translation accuracy across popular sites and makes rules easier to maintain.

New Features
- Added JSON-based DOM rules (dontWalkIntoSelectors, forceBlockTranslationSelectors).
- Implemented wildcard URL pattern matching (*, **) with protocol-optional support.
Refactors
- dom-rules.ts now loads rules from JSON and exposes findMatchingSelectors.
- Updated filter.ts to use dynamic selector lookup.
- Enabled resolveJsonModule in tsconfig.

Written for commit 01b23f9. Summary will update on new commits.

changeset-bot · 2026-02-01T12:13:05Z

🦋 Changeset detected

Latest commit: 01b23f9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@read-frog/extension	Minor

dosubot · 2026-02-01T12:13:15Z

Related Documentation

Checked 6 published document(s) in 1 knowledge base(s). No updates required.

cubic-dev-ai

1 issue found across 5 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/utils/constants/dom-rules.ts">

<violation number="1" location="src/utils/constants/dom-rules.ts:152">
P2: findMatchingSelectors returns empty results while domRules is still loading asynchronously, and call sites use it synchronously. Early DOM scans can skip domain-specific blocking rules until the JSON import finishes, changing behavior from the previous synchronous constants.</violation>
</file>

src/utils/constants/dom-rules.ts

claude · 2026-02-01T15:11:21Z

PR Review: feat(dom-rules): Add 300+ dom rules

Overall Assessment

🟢 Good addition - Bringing in 360+ website-specific DOM rules from a fork is valuable for improving translation accuracy across popular sites. The refactor to JSON-based configuration makes rules easier to maintain.

Key Insights

Data structure: JSON config with two rule types (dontWalkIntoSelectors, forceBlockTranslationSelectors) mapped by URL patterns. Clean separation.

Complexity: The glob-to-regex pattern matching is the main logic addition. Implementation is straightforward but has edge cases.

Risk points:

Existing tests may break due to changed imports/exports
Performance impact of pattern matching on every DOM check
Some rules in JSON duplicate the same selector arrays (e.g., Twitter variants)
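
The data structure described under Key Insights can be pictured roughly as follows. This is a minimal sketch, not the PR's actual file: the interface name `DomRulesConfig` and the two field names come from this thread, while the `example.com` entry, its selector, and the `lookupExact` helper are hypothetical and only cover exact-host lookups.

```typescript
// Assumed shape of the JSON config: two rule types, each mapping a URL
// pattern to a list of CSS selectors.
interface DomRulesConfig {
  dontWalkIntoSelectors: Record<string, string[]>
  forceBlockTranslationSelectors: Record<string, string[]>
}

const exampleRules: DomRulesConfig = {
  dontWalkIntoSelectors: {
    // Entry quoted in this review.
    'www.youtube.com': ['.ytp-caption-segment'],
  },
  forceBlockTranslationSelectors: {
    // Placeholder entry for illustration only.
    'example.com': ['.comment-body'],
  },
}

// Hypothetical helper: exact-host lookup, with an empty array as the fallback.
// Wildcard matching is deliberately left out of this sketch.
function lookupExact(rules: Record<string, string[]>, host: string): string[] {
  return rules[host] ?? []
}
```

Returning an empty array for unknown hosts mirrors the fallback behavior the review lists under What's Good.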

Issues to Address

1. Potential Breaking Change: YouTube rules removed

The original CUSTOM_DONT_WALK_INTO_ELEMENT_SELECTOR_MAP had YouTube-specific rules using imported constants:

`.${SUBTITLES_VIEW_CLASS}`,
`.${STATE_MESSAGE_CLASS}`,
`.${TRANSLATE_BUTTON_CLASS}`,

These dynamic selectors based on JS constants are NOT present in the JSON file. The YouTube entry in dom-rules.json only has:

"www.youtube.com": [".ytp-caption-segment"]

This is a regression - the extension's own subtitle-related elements won't be excluded from translation.

Suggestion: Either add the actual class values to the JSON, or create a mechanism to merge runtime constants with JSON rules for YouTube.
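
One way the runtime-merge option might look is sketched below. The constant values are placeholders (the real class names are not shown in this thread), and `mergeRules`, `jsonRules`, and `runtimeRules` are hypothetical names introduced for illustration.

```typescript
// Placeholder values: the real class names live in the extension's constants.
const SUBTITLES_VIEW_CLASS = 'example-subtitles-view'
const STATE_MESSAGE_CLASS = 'example-state-message'
const TRANSLATE_BUTTON_CLASS = 'example-translate-button'

type SelectorMap = Record<string, string[]>

// Stand-in for the dontWalkIntoSelectors section loaded from dom-rules.json.
const jsonRules: SelectorMap = {
  'www.youtube.com': ['.ytp-caption-segment'],
}

// Selectors that depend on JS constants and so cannot live in static JSON.
const runtimeRules: SelectorMap = {
  'www.youtube.com': [
    `.${SUBTITLES_VIEW_CLASS}`,
    `.${STATE_MESSAGE_CLASS}`,
    `.${TRANSLATE_BUTTON_CLASS}`,
  ],
}

// Merge per pattern, keeping JSON selectors first and dropping duplicates.
function mergeRules(base: SelectorMap, extra: SelectorMap): SelectorMap {
  const merged: SelectorMap = {}
  for (const [pattern, selectors] of Object.entries(base)) {
    merged[pattern] = [...selectors]
  }
  for (const [pattern, selectors] of Object.entries(extra)) {
    const existing = merged[pattern] ?? []
    merged[pattern] = Array.from(new Set([...existing, ...selectors]))
  }
  return merged
}

const mergedRules = mergeRules(jsonRules, runtimeRules)
```

Merging once at module load keeps the JSON file free of values that would silently drift if the constants were renamed.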

2. Performance: Pattern matching on every element

findMatchingSelectors() is called for every element check via isCustomDontWalkIntoElement() and isCustomForceBlockTranslation(). Each call:

Creates a new URL object
Iterates through all 300+ patterns when no exact match

Suggestion: Cache the matched selectors per URL at the start of page translation rather than re-computing per element.
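
The caching suggestion could be sketched as a small memoizing wrapper. The `Resolver` type stands in for `findMatchingSelectors`, whose exact signature is not shown in this thread; `memoizeByUrl` and the demo resolver are hypothetical.

```typescript
// Hypothetical resolver type: takes a page URL, returns the matching selectors.
type Resolver = (url: string) => string[]

// Wrap a resolver so each distinct URL is resolved only once.
function memoizeByUrl(resolve: Resolver): Resolver {
  const cache = new Map<string, string[]>()
  return (url) => {
    const hit = cache.get(url)
    if (hit !== undefined) return hit
    const selectors = resolve(url)
    cache.set(url, selectors)
    return selectors
  }
}

// Demo resolver that counts how often the expensive path runs.
let resolveCalls = 0
const slowResolve: Resolver = (url) => {
  resolveCalls += 1
  return url.indexOf('youtube.com') !== -1 ? ['.ytp-caption-segment'] : []
}

const cachedResolve = memoizeByUrl(slowResolve)
cachedResolve('https://www.youtube.com/watch')
cachedResolve('https://www.youtube.com/watch') // served from the cache
```

Keying the cache by URL rather than storing a single value means a single-page-app navigation to a new URL still gets a fresh lookup.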

3. Massive duplication in JSON

Twitter/X rules are duplicated 10+ times for different subdomains. Same for Stack Exchange sites.

Suggestion: Consider using wildcard patterns more aggressively:

"*.twitter.com": [...],
"*.x.com": [...],

Or introduce rule inheritance/aliases to reduce the 2200-line JSON file.
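
The alias idea might be sketched as a grouped format that is expanded into the flat map at load time. The `RuleGroup` shape, `expandGroups`, and the selector shown are all hypothetical and not taken from dom-rules.json.

```typescript
// Hypothetical compact format: one selector list shared by several patterns.
interface RuleGroup {
  patterns: string[]
  selectors: string[]
}

// Placeholder selector for illustration only.
const groups: RuleGroup[] = [
  {
    patterns: ['twitter.com', '*.twitter.com', 'x.com', '*.x.com'],
    selectors: ['.example-toolbar'],
  },
]

// Expand groups into the flat pattern -> selectors map a matcher would consume.
function expandGroups(input: RuleGroup[]): Record<string, string[]> {
  const flat: Record<string, string[]> = {}
  for (const group of input) {
    for (const pattern of group.patterns) {
      flat[pattern] = [...(flat[pattern] ?? []), ...group.selectors]
    }
  }
  return flat
}

const flatRules = expandGroups(groups)
```

With this layout the selector list is written once per site family, and the lookup code can keep working against the flat map.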

4. Missing tests for new pattern matching logic

globToRegex() and matchUrlPattern() are new untested functions with edge cases:

Protocol handling (https:// vs http:// vs none)
Wildcard at different positions (*.domain.com, domain.com/*, **)
URL with query strings and fragments

The existing tests in custom-dont-walk.test.ts and custom-force-block.test.ts use setHost() which only tests hostname matching, not the new URL pattern features.
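
A table-driven check for these cases could look like the sketch below. The function names match those in the review, but the bodies are assumptions, not the PR's code. In particular, the wildcard semantics are assumed (`*` stops at `/`, `**` does not), and `normalize` is a hypothetical helper that drops the protocol, query string, fragment, and trailing slash.

```typescript
// Assumed wildcard semantics (not confirmed by this thread):
//   "*"  matches any run of characters except "/"
//   "**" matches any run of characters, including "/"
function globToRegex(pattern: string): RegExp {
  let source = ''
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i)
    if (ch !== '*') {
      // Escape regex metacharacters so "." in a hostname stays literal.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    } else if (pattern.charAt(i + 1) === '*') {
      source += '.*'
      i++ // consume the second "*"
    } else {
      source += '[^/]*'
    }
  }
  return new RegExp(`^${source}$`)
}

// Reduce a URL or pattern to "host/path": no protocol, query, fragment, or trailing slash.
function normalize(input: string): string {
  return input
    .replace(/^https?:\/\//, '')
    .replace(/[?#].*$/, '')
    .replace(/\/$/, '')
}

function matchUrlPattern(pattern: string, url: string): boolean {
  return globToRegex(normalize(pattern)).test(normalize(url))
}

// [pattern, url, expected] triples covering the cases listed above.
const cases: Array<[string, string, boolean]> = [
  ['www.youtube.com', 'https://www.youtube.com/', true],
  ['*.twitter.com', 'http://mobile.twitter.com', true],
  ['*.twitter.com', 'https://twitter.com', false],
  ['github.com/*', 'https://github.com/user', true],
  ['github.com/*', 'https://github.com/user/repo', false],
  ['github.com/**', 'https://github.com/user/repo?tab=readme#top', true],
  ['https://www.reddit.com/r/*/comments/*/*', 'https://www.reddit.com/r/ts/comments/abc/title/', true],
  ['www.youtube.com', 'https://wwwxyoutube.com', false],
]

const failures = cases.filter(([pattern, url, expected]) => matchUrlPattern(pattern, url) !== expected)
```

Under these assumptions `*.twitter.com` does not match the bare `twitter.com`, so a rules file that relies on wildcards would need a separate entry for the apex domain.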

5. Edge case bug in matchUrlPattern

if (!pattern.startsWith('http://') && !pattern.startsWith('https://')) {
  return regex.test(url.replace(/^https?:\/\//, ''))
}

If the pattern doesn't start with a protocol but the URL does, the protocol is stripped from the URL. But if the pattern contains a path, such as github.com/user/*, the generated regex expects the pattern format, not the stripped URL format, which could cause false negatives.

Minor Issues

Comment inconsistency (line 2340-2341):

if (!domRules)
  // Not initialized yet
  return []

Comment on wrong line - should be above or inline.

Empty selector handling: If dontWalkIntoElementSelectorList is empty, joining produces "" which when passed to element.matches("") throws an error. The check if (!dontWalkSelector) handles this, but worth noting.
Some JSON patterns use full URLs with https://:

"https://www.reddit.com/r/*/comments/*/*": [...]

While others use just hostnames. This inconsistency works due to the pattern matching but reduces clarity.
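
The empty-selector note above can be made explicit with a small guard. This is a sketch under assumptions: `Matchable` is a hypothetical structural type so the code runs without a DOM (a real `Element` satisfies it), and `matchesAnySelector` is not a function from the PR.

```typescript
// Minimal structural type so the sketch runs without a DOM.
interface Matchable {
  matches: (selectors: string) => boolean
}

// Returns false for an empty list instead of calling matches(""), which throws.
function matchesAnySelector(element: Matchable, selectors: string[]): boolean {
  const joined = selectors.filter(s => s.trim() !== '').join(',')
  if (!joined) return false
  return element.matches(joined)
}

// Fake element that mimics the throw-on-empty behavior for the demo.
const fakeElement: Matchable = {
  matches: (selectors) => {
    if (selectors === '') throw new SyntaxError('empty selector')
    return selectors.split(',').indexOf('.ytp-caption-segment') !== -1
  },
}
```

Filtering blank entries before joining also protects against a stray empty string inside an otherwise valid selector list.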

What's Good

Clean interface design (DomRulesConfig)
Proper TypeScript typing with the JSON module
Sensible fallback behavior (empty array when no rules match)
Good changeset documentation

Recommendation

Worth merging with fixes:

Add YouTube constants to JSON (or implement runtime merge)
Add caching for pattern matching results
Add unit tests for globToRegex and matchUrlPattern

The core value (300+ rules) is solid. The implementation needs minor hardening before merge.

ananaBMaster · 2026-02-02T07:27:51Z

Can I ask where these rules originally came from?

TianmuTNT and others added 2 commits February 1, 2026 18:50

feat: add 360+ website-specific DOM exclusion rules

146f538

refactor: unify dom-rules.json and built-in rules

2a66606

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 1, 2026

cubewhy changed the title ~~Add 300+ dom rules~~ feat(dom-rules): Add 300+ dom rules Feb 1, 2026

github-actions bot added the feat label Feb 1, 2026

dosubot bot added the app: website Related to website app label Feb 1, 2026

cubic-dev-ai bot reviewed Feb 1, 2026

cubewhy added 2 commits February 1, 2026 20:25

fix: load dom-rules in sync code

f2cbf97

fix: remove the test ruleset

594088d

cubewhy added 2 commits February 2, 2026 11:23

chore: correct comment position

c99864c

fix: remove url protocol before match pattern

01b23f9
