Skip to content

refactor EmailValidator to use precompiled regex and simplify validation logic#7575

Open
fxrstor wants to merge 1 commit intobisq-network:masterfrom
fxrstor:master
Open

refactor EmailValidator to use precompiled regex and simplify validation logic#7575
fxrstor wants to merge 1 commit intobisq-network:masterfrom
fxrstor:master

Conversation

@fxrstor
Copy link
Copy Markdown

@fxrstor fxrstor commented Jan 10, 2026

Description

the previous implementation used ad-hoc split() and ad-hoc quote handling which made the logic hardto follow and errorprone. This change puts use to a precompiled pattern (faster, easier to read), enforces conservative but practical rules (no leading/trailing/consecutive dots in local part, label rules for domain, TLD sanity checks, punycode support), and sets practical length limits (trim + 6..254). The validator intentionally excludes extremely permissive RFC edge cases (quoted local parts, comments, etc.) that commonly break UIs and payment rails.

The validator becomes stricter only on exotic RFC constructs (quoted strings, nested comments). It accepts all commonly used addresses while removing flaky, UI-breaking cases.

Changes

  • Introduced SIMPLE_EMAIL_PATTERN (precompiled pattern) and replaced the previous multi-step string logic.
  • Added explicit TLD checks (min length 2, letters-only or punycode xn--).
  • Replaced instance invalidAddress with a constant INVALID_ADDRESS.
  • Cleaned up input trimming and length constraints.

Related issues

No related issues, just a general improvement.

Summary by CodeRabbit

  • Refactor
    • Optimized email validation logic for improved performance and maintainability while preserving existing validation behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@boring-cyborg
Copy link
Copy Markdown

boring-cyborg bot commented Jan 10, 2026

Thanks for opening this pull request!

Please check out our contributor checklist and check if Travis or Codacy found any issues with your PR. Also make sure your commits are signed, and that you applied Bisq's code style and formatting.

A maintainer will add an is:priority label to your PR if it is up for compensation. Please see our Bisq Q1 2020 Update post for more details.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 10, 2026

📝 Walkthrough

Walkthrough

Replaces ad‑hoc local-part/domain parsing in EmailValidator with a conservative precompiled regex, adds centralized null/trim and length checks, simplifies control flow, introduces punycode TLD handling and a static INVALID_ADDRESS constant, and removes legacy quote‑handling logic. (49 words)

Changes

Cohort / File(s) Summary
Email Validation Refactor
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
Replaced manual parsing with a precompiled regex (SIMPLE_EMAIL_PATTERN); introduced INVALID_ADDRESS constant; consolidated validation steps (null/trim, length 6–254, regex structure); added TLD presence/length checks and basic xn-- punycode handling; removed legacy quote-handling helper and simplified control flow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through strings and dots,
I nudged the quotes out of plots,
Now regex sings a tidy tune,
Punycode winks beneath the moon,
Validation snug as carrot spots.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: refactoring EmailValidator to use precompiled regex and simplify validation logic, which matches the core objective of the PR.
Description check ✅ Passed The description includes a clear explanation of changes, motivation for the refactor, technical details about the new pattern and rules, but does not explicitly reference any issue numbers as suggested by the template.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (1)

47-54: Local part length limit (64 chars) documented but not enforced.

The documentation at line 30 mentions "max 64 characters" for the local part, but this constraint is not validated. The regex has no length restriction on the local part, and only the total email length (254) is checked. Consider adding explicit validation:

int atIndex = email.indexOf('@');
if (atIndex > 64) return INVALID_ADDRESS;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2fe13d and c4e947a.

📒 Files selected for processing (1)
  • desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
🔇 Additional comments (2)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (2)

56-56: Static initialization of localized message may cause issues.

Res.get() is invoked at class-load time. If EmailValidator is loaded before the i18n system (Res) is fully initialized, this could return a wrong or null message. Additionally, if the locale changes at runtime, this cached message won't update.

Consider lazy evaluation by returning the validation result inline:

-    private static final ValidationResult INVALID_ADDRESS = new ValidationResult(false, Res.get("validation.email.invalidAddress"));
...
-        if (input == null) return INVALID_ADDRESS;
+        if (input == null) return new ValidationResult(false, Res.get("validation.email.invalidAddress"));

Alternatively, keep the constant for the boolean and fetch the message dynamically in a helper method.

Please verify whether Res is guaranteed to be initialized before EmailValidator is ever loaded in the application lifecycle.


58-96: Validation logic is sound and well-structured.

The validation flow correctly:

  • Handles null and whitespace
  • Enforces practical length bounds
  • Uses precompiled regex for structure validation
  • Performs additional TLD validation including punycode support

Minor observation: the checks at lines 72-73 are technically redundant since the regex already guarantees at least one dot in the domain part, but they serve as defensive programming and don't hurt.

@fxrstor fxrstor changed the title improve EmailValidator refactor EmailValidator to use precompiled regex and simplify validation logic Jan 10, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java:
- Around line 47-54: The regex in EmailValidator (SIMPLE_EMAIL_PATTERN) does not
enforce the 64-character max for the local-part; update the pattern so the
local-part is constrained to at most 64 characters (e.g. wrap the existing
local-part segment with a length-limiting group or use a lookahead like
(?=.{1,64}@) so the local part cannot exceed 64 chars) while keeping the
existing local-part label rules and preserving the overall total-length check
logic elsewhere; modify SIMPLE_EMAIL_PATTERN accordingly and run existing unit
tests for validate/email-related methods to ensure no other constraints are
broken.
- Around line 80-93: The TLD punycode check in EmailValidator is case-sensitive;
change the detection to be case-insensitive so "XN--" is treated the same as
"xn--" by checking tld using a case-insensitive test (e.g., compare a lowercased
tld or use a case-insensitive regionMatches) when evaluating the "xn--" prefix,
then keep the existing punycode sanity checks (length and allowed chars) using
the same tld variable; update the conditional that currently uses
tld.startsWith("xn--") to a case-insensitive equivalent and leave the subsequent
character validation logic unchanged.
🧹 Nitpick comments (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (1)

23-25: Drop Matcher local variable (and possibly the Matcher import).
You can inline .matcher(email).matches() as shown above; if Matcher isn’t otherwise used, the import can go too.

Also applies to: 67-70

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4e947a and b15c10c.

📒 Files selected for processing (1)
  • desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
🔇 Additional comments (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (1)

56-56: Verify new ValidationResult(true) matches project conventions/UX.
If callers rely on a specific “valid” message/value object (or InputValidator has a canonical OK result), this could be a subtle behavior change.

Also applies to: 95-95

Comment thread desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java:
- Around line 68-108: Add a new unit test class for EmailValidator that
exercises the public validate(String) method: create tests asserting valid
addresses (normal user@domain, addresses with subdomains, and punycode TLDs
starting with "xn--"), invalid addresses (missing or too-short/too-long TLD,
consecutive dots, local-part too long or zero-length, null input), and edge
boundaries (minimum total length 6, maximum total length 254, local-part lengths
at 1 and 64). Use the ValidationResult returned by validate(...) to assert
validity/invalidity and include cases that confirm the conservative behavior
(reject quoted/exotic RFC forms). Reference EmailValidator.validate and the
ValidationResult class in test names so reviewers can find coverage for the
refactored validator.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b15c10c and bdef0e2.

📒 Files selected for processing (1)
  • desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
🔇 Additional comments (11)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (11)

23-25: LGTM! Imports are appropriate for the regex-based refactoring.

The new imports support the precompiled pattern approach and locale-safe string operations.


27-45: Excellent documentation of the conservative validation approach.

The updated comments clearly explain the intentional trade-offs and constraints, which will help future maintainers understand why certain RFC-valid addresses are rejected.


52-56: Regex pattern is well-structured and correct for conservative email validation.

The pattern effectively prevents:

  • Leading, trailing, or consecutive dots in local part
  • Invalid label structures in domain (ensures labels start/end with alphanumeric)
  • Missing TLD (requires at least 2 domain labels)

The inclusion of underscore in the local part is a practical choice for real-world compatibility.


58-58: Good refactoring to use a static constant.

This eliminates repeated allocations and makes the code cleaner.


69-74: Early validation checks are correct and efficient.

The null check, trimming, and length boundaries (6-254) align with practical email address requirements and prevent unnecessary regex processing for obviously invalid inputs.


76-78: Local-part boundary checks are correct.

The check at > 64 correctly enforces the 64-character maximum for the local part (when @ is at index 64, the local part occupies indices 0-63, exactly 64 characters).


80-81: LGTM! Straightforward application of the precompiled pattern.


83-90: TLD extraction and length validation are correct.

The checks at lines 84-85 are redundant given the regex guarantees, but serve as defensive programming, which is acceptable. The 2-character minimum TLD requirement is appropriate for modern email validation.


92-99: Punycode TLD validation correctly supports internationalized domains.

The use of Locale.ROOT for case-insensitive comparison is correct and prevents locale-specific issues. The validation logic appropriately handles punycode TLDs (xn--) with proper length and character constraints.


100-105: ASCII TLD validation is correct.

Requiring letters-only for standard (non-punycode) TLDs aligns with the conservative validation approach and real-world TLD requirements.


107-107: LGTM! Success case handled correctly.

Comment on lines 68 to 108
public ValidationResult validate(String input) {
if (input == null || input.length() < 6 || input.length() > 100) // shortest address is [email protected], max length 100
return invalidAddress;
String[] subStrings;
String local, domain;

subStrings = input.split("@", -1);

if (subStrings.length == 1) // address does not contain '@'
return invalidAddress;
if (subStrings.length > 2) // multiple @'s included -> check for valid double quotes
if (!checkForValidQuotes(subStrings)) // around @'s -> "..@..@.." and concatenate local part
return invalidAddress;
local = subStrings[0];
domain = subStrings[subStrings.length - 1];

if (local.isEmpty())
return invalidAddress;

// local part cannot begin or end with '.'
if (local.startsWith(".") || local.endsWith("."))
return invalidAddress;

String[] splitDomain = domain.split("\\.", -1); // '.' is a regex in java and has to be escaped
String tld = splitDomain[splitDomain.length - 1];
if (splitDomain.length < 2)
return invalidAddress;

if (splitDomain[0] == null || splitDomain[0].isEmpty())
return invalidAddress;

// TLD length is at least two
if (tld.length() < 2)
return invalidAddress;

// TLD is letters only
for (int k = 0; k < tld.length(); k++)
if (!Character.isLetter(tld.charAt(k)))
return invalidAddress;
return new ValidationResult(true);
}

///////////////////////////////////////////////////////////////////////////////////////////
// Private methods
///////////////////////////////////////////////////////////////////////////////////////////

private boolean checkForValidQuotes(String[] subStrings) {
int length = subStrings.length - 2; // is index on last substring of local part

// check for odd number of double quotes before first and after last '@'
if ((subStrings[0].split("\"", -1).length % 2 == 1) || (subStrings[length].split("\"", -1).length % 2 == 1))
return false;
for (int k = 1; k < length; k++) {
if (subStrings[k].split("\"", -1).length % 2 == 0)
return false;
if (input == null) return INVALID_ADDRESS;

String email = input.trim();

// Practical length limits: shortest plausible [email protected] (6), longest allowed by spec 254
if (email.length() < 6 || email.length() > 254) return INVALID_ADDRESS;

// local-part must be 1..64 chars
int at = email.indexOf('@');
if (at <= 0 || at > 64) return INVALID_ADDRESS;

// Apply regex pattern
if (!SIMPLE_EMAIL_PATTERN.matcher(email).matches()) return INVALID_ADDRESS;

// Check TLD (last label after final dot)
int lastDot = email.lastIndexOf('.');
if (lastDot < 0 || lastDot == email.length() - 1) return INVALID_ADDRESS;

String tld = email.substring(lastDot + 1);

// TLD length must be at least 2 characters
if (tld.length() < 2) return INVALID_ADDRESS;

// allow punycode TLDs for IDN support
if (tld.toLowerCase(Locale.ROOT).startsWith("xn--")) {
// basic sanity check for punycode TLD
if (tld.length() > 63) return INVALID_ADDRESS;
for (int i = 4; i < tld.length(); i++) {
char c = tld.charAt(i);
if (!(Character.isLetterOrDigit(c) || c == '-')) return INVALID_ADDRESS;
}
} else {
// ASCII letters only
for (int i = 0; i < tld.length(); i++) {
if (!Character.isLetter(tld.charAt(i))) return INVALID_ADDRESS;
}
}

String patchLocal = "";
for (int k = 0; k <= length; k++) // remember: length is last index not array length
patchLocal = patchLocal.concat(subStrings[k]); // @'s are not reinstalled, since not needed for further checks
subStrings[0] = patchLocal;
return true;
return new ValidationResult(true);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find test files for EmailValidator
echo "=== Searching for EmailValidator tests ==="
find . -type f \( -name "*Test.java" -o -name "*Tests.java" -o -name "Test*.java" \) | xargs rg -l "EmailValidator" 2>/dev/null || echo "No test files found"

echo -e "\n=== Checking EmailValidator location and structure ==="
fd -e java EmailValidator

echo -e "\n=== Checking for usage of EmailValidator in codebase ==="
rg -n "EmailValidator" --type java -C 2 | head -50

echo -e "\n=== Looking for the SIMPLE_EMAIL_PATTERN definition ==="
rg -n "SIMPLE_EMAIL_PATTERN" desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java

Repository: bisq-network/bisq

Length of output: 5830


🏁 Script executed:

# Get the full file to understand the complete validation context
echo "=== Full EmailValidator file ==="
wc -l desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java

Repository: bisq-network/bisq

Length of output: 168


🏁 Script executed:

# Get the full EmailValidator file
echo "=== Full EmailValidator.java content ==="
cat -n desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java

echo -e "\n=== Checking for tests in payment form test files ==="
find desktop/src/test -name "*FormTest*.java" -o -name "*Form*Test.java" | head -10

echo -e "\n=== Looking for any test that might use EmailValidator indirectly ==="
rg -l "CashDepositForm|MoneyGramForm|PaxumForm|PayseraForm" desktop/src/test --type java

Repository: bisq-network/bisq

Length of output: 5367


🏁 Script executed:

# Check if there are integration or form validation tests
echo "=== Checking for validation-related tests ==="
ls -la desktop/src/test/java/bisq/desktop/util/validation/ 2>/dev/null || echo "Directory not found"

echo -e "\n=== All test files in validation directory ==="
find desktop/src/test -path "*/validation/*" -name "*Test*.java" 2>/dev/null

Repository: bisq-network/bisq

Length of output: 2377


Add test coverage for the refactored EmailValidator.

No test file exists for EmailValidator (unlike other validators in the same directory). Since this validator is actively used in payment forms (CashDepositForm, MoneyGramForm, PaxumForm, PayseraForm) to validate user email input, add a test suite covering:

  • Valid addresses: standard formats, subdomains, international domains with punycode (xn--)
  • Invalid addresses: missing/invalid TLD, consecutive dots, wrong local-part length, non-ASCII TLD characters
  • Edge cases: minimum valid length (6 chars), maximum length (254 chars), boundary local-part lengths

The implementation correctly enforces a conservative validation approach by design (intentionally excluding quoted strings and exotic RFC constructs); test coverage ensures this behavior is maintained and edge cases are handled correctly.

🤖 Prompt for AI Agents
In @desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
around lines 68 - 108, Add a new unit test class for EmailValidator that
exercises the public validate(String) method: create tests asserting valid
addresses (normal user@domain, addresses with subdomains, and punycode TLDs
starting with "xn--"), invalid addresses (missing or too-short/too-long TLD,
consecutive dots, local-part too long or zero-length, null input), and edge
boundaries (minimum total length 6, maximum total length 254, local-part lengths
at 1 and 64). Use the ValidationResult returned by validate(...) to assert
validity/invalidity and include cases that confirm the conservative behavior
(reject quoted/exotic RFC forms). Reference EmailValidator.validate and the
ValidationResult class in test names so reviewers can find coverage for the
refactored validator.

@HenrikJannsen
Copy link
Copy Markdown
Collaborator

Would you mind to add a test class as well?

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants