refactor EmailValidator to use precompiled regex and simplify validation logic#7575
refactor EmailValidator to use precompiled regex and simplify validation logic#7575fxrstor wants to merge 1 commit intobisq-network:masterfrom
Conversation
|
Thanks for opening this pull request! |
📝 WalkthroughWalkthroughReplaces ad‑hoc local-part/domain parsing in EmailValidator with a conservative precompiled regex, adds centralized null/trim and length checks, simplifies control flow, introduces punycode TLD handling and a static INVALID_ADDRESS constant, and removes legacy quote‑handling logic. (49 words) Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (1)
47-54: Local part length limit (64 chars) documented but not enforced.The documentation at line 30 mentions "max 64 characters" for the local part, but this constraint is not validated. The regex has no length restriction on the local part, and only the total email length (254) is checked. Consider adding explicit validation:
int atIndex = email.indexOf('@'); if (atIndex > 64) return INVALID_ADDRESS;
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
🔇 Additional comments (2)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (2)
56-56: Static initialization of localized message may cause issues.
Res.get()is invoked at class-load time. IfEmailValidatoris loaded before the i18n system (Res) is fully initialized, this could return a wrong or null message. Additionally, if the locale changes at runtime, this cached message won't update.Consider lazy evaluation by returning the validation result inline:
- private static final ValidationResult INVALID_ADDRESS = new ValidationResult(false, Res.get("validation.email.invalidAddress")); ... - if (input == null) return INVALID_ADDRESS; + if (input == null) return new ValidationResult(false, Res.get("validation.email.invalidAddress"));Alternatively, keep the constant for the boolean and fetch the message dynamically in a helper method.
Please verify whether
Resis guaranteed to be initialized beforeEmailValidatoris ever loaded in the application lifecycle.
58-96: Validation logic is sound and well-structured.The validation flow correctly:
- Handles null and whitespace
- Enforces practical length bounds
- Uses precompiled regex for structure validation
- Performs additional TLD validation including punycode support
Minor observation: the checks at lines 72-73 are technically redundant since the regex already guarantees at least one dot in the domain part, but they serve as defensive programming and don't hurt.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java:
- Around line 47-54: The regex in EmailValidator (SIMPLE_EMAIL_PATTERN) does not
enforce the 64-character max for the local-part; update the pattern so the
local-part is constrained to at most 64 characters (e.g. wrap the existing
local-part segment with a length-limiting group or use a lookahead like
(?=.{1,64}@) so the local part cannot exceed 64 chars) while keeping the
existing local-part label rules and preserving the overall total-length check
logic elsewhere; modify SIMPLE_EMAIL_PATTERN accordingly and run existing unit
tests for validate/email-related methods to ensure no other constraints are
broken.
- Around line 80-93: The TLD punycode check in EmailValidator is case-sensitive;
change the detection to be case-insensitive so "XN--" is treated the same as
"xn--" by checking tld using a case-insensitive test (e.g., compare a lowercased
tld or use a case-insensitive regionMatches) when evaluating the "xn--" prefix,
then keep the existing punycode sanity checks (length and allowed chars) using
the same tld variable; update the conditional that currently uses
tld.startsWith("xn--") to a case-insensitive equivalent and leave the subsequent
character validation logic unchanged.
🧹 Nitpick comments (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (1)
23-25: DropMatcherlocal variable (and possibly theMatcherimport).
You can inline.matcher(email).matches()as shown above; ifMatcherisn’t otherwise used, the import can go too.Also applies to: 67-70
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
🔇 Additional comments (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (1)
56-56: Verifynew ValidationResult(true)matches project conventions/UX.
If callers rely on a specific “valid” message/value object (orInputValidatorhas a canonical OK result), this could be a subtle behavior change.Also applies to: 95-95
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java:
- Around line 68-108: Add a new unit test class for EmailValidator that
exercises the public validate(String) method: create tests asserting valid
addresses (normal user@domain, addresses with subdomains, and punycode TLDs
starting with "xn--"), invalid addresses (missing or too-short/too-long TLD,
consecutive dots, local-part too long or zero-length, null input), and edge
boundaries (minimum total length 6, maximum total length 254, local-part lengths
at 1 and 64). Use the ValidationResult returned by validate(...) to assert
validity/invalidity and include cases that confirm the conservative behavior
(reject quoted/exotic RFC forms). Reference EmailValidator.validate and the
ValidationResult class in test names so reviewers can find coverage for the
refactored validator.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
🔇 Additional comments (11)
desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java (11)
23-25: LGTM! Imports are appropriate for the regex-based refactoring.The new imports support the precompiled pattern approach and locale-safe string operations.
27-45: Excellent documentation of the conservative validation approach.The updated comments clearly explain the intentional trade-offs and constraints, which will help future maintainers understand why certain RFC-valid addresses are rejected.
52-56: Regex pattern is well-structured and correct for conservative email validation.The pattern effectively prevents:
- Leading, trailing, or consecutive dots in local part
- Invalid label structures in domain (ensures labels start/end with alphanumeric)
- Missing TLD (requires at least 2 domain labels)
The inclusion of underscore in the local part is a practical choice for real-world compatibility.
58-58: Good refactoring to use a static constant.This eliminates repeated allocations and makes the code cleaner.
69-74: Early validation checks are correct and efficient.The null check, trimming, and length boundaries (6-254) align with practical email address requirements and prevent unnecessary regex processing for obviously invalid inputs.
76-78: Local-part boundary checks are correct.The check
at > 64correctly enforces the 64-character maximum for the local part (when @ is at index 64, the local part occupies indices 0-63, exactly 64 characters).
80-81: LGTM! Straightforward application of the precompiled pattern.
83-90: TLD extraction and length validation are correct.The checks at lines 84-85 are redundant given the regex guarantees, but serve as defensive programming, which is acceptable. The 2-character minimum TLD requirement is appropriate for modern email validation.
92-99: Punycode TLD validation correctly supports internationalized domains.The use of
Locale.ROOTfor case-insensitive comparison is correct and prevents locale-specific issues. The validation logic appropriately handles punycode TLDs (xn--) with proper length and character constraints.
100-105: ASCII TLD validation is correct.Requiring letters-only for standard (non-punycode) TLDs aligns with the conservative validation approach and real-world TLD requirements.
107-107: LGTM! Success case handled correctly.
| public ValidationResult validate(String input) { | ||
| if (input == null || input.length() < 6 || input.length() > 100) // shortest address is [email protected], max length 100 | ||
| return invalidAddress; | ||
| String[] subStrings; | ||
| String local, domain; | ||
|
|
||
| subStrings = input.split("@", -1); | ||
|
|
||
| if (subStrings.length == 1) // address does not contain '@' | ||
| return invalidAddress; | ||
| if (subStrings.length > 2) // multiple @'s included -> check for valid double quotes | ||
| if (!checkForValidQuotes(subStrings)) // around @'s -> "..@..@.." and concatenate local part | ||
| return invalidAddress; | ||
| local = subStrings[0]; | ||
| domain = subStrings[subStrings.length - 1]; | ||
|
|
||
| if (local.isEmpty()) | ||
| return invalidAddress; | ||
|
|
||
| // local part cannot begin or end with '.' | ||
| if (local.startsWith(".") || local.endsWith(".")) | ||
| return invalidAddress; | ||
|
|
||
| String[] splitDomain = domain.split("\\.", -1); // '.' is a regex in java and has to be escaped | ||
| String tld = splitDomain[splitDomain.length - 1]; | ||
| if (splitDomain.length < 2) | ||
| return invalidAddress; | ||
|
|
||
| if (splitDomain[0] == null || splitDomain[0].isEmpty()) | ||
| return invalidAddress; | ||
|
|
||
| // TLD length is at least two | ||
| if (tld.length() < 2) | ||
| return invalidAddress; | ||
|
|
||
| // TLD is letters only | ||
| for (int k = 0; k < tld.length(); k++) | ||
| if (!Character.isLetter(tld.charAt(k))) | ||
| return invalidAddress; | ||
| return new ValidationResult(true); | ||
| } | ||
|
|
||
| /////////////////////////////////////////////////////////////////////////////////////////// | ||
| // Private methods | ||
| /////////////////////////////////////////////////////////////////////////////////////////// | ||
|
|
||
| private boolean checkForValidQuotes(String[] subStrings) { | ||
| int length = subStrings.length - 2; // is index on last substring of local part | ||
|
|
||
| // check for odd number of double quotes before first and after last '@' | ||
| if ((subStrings[0].split("\"", -1).length % 2 == 1) || (subStrings[length].split("\"", -1).length % 2 == 1)) | ||
| return false; | ||
| for (int k = 1; k < length; k++) { | ||
| if (subStrings[k].split("\"", -1).length % 2 == 0) | ||
| return false; | ||
| if (input == null) return INVALID_ADDRESS; | ||
|
|
||
| String email = input.trim(); | ||
|
|
||
| // Practical length limits: shortest plausible [email protected] (6), longest allowed by spec 254 | ||
| if (email.length() < 6 || email.length() > 254) return INVALID_ADDRESS; | ||
|
|
||
| // local-part must be 1..64 chars | ||
| int at = email.indexOf('@'); | ||
| if (at <= 0 || at > 64) return INVALID_ADDRESS; | ||
|
|
||
| // Apply regex pattern | ||
| if (!SIMPLE_EMAIL_PATTERN.matcher(email).matches()) return INVALID_ADDRESS; | ||
|
|
||
| // Check TLD (last label after final dot) | ||
| int lastDot = email.lastIndexOf('.'); | ||
| if (lastDot < 0 || lastDot == email.length() - 1) return INVALID_ADDRESS; | ||
|
|
||
| String tld = email.substring(lastDot + 1); | ||
|
|
||
| // TLD length must be at least 2 characters | ||
| if (tld.length() < 2) return INVALID_ADDRESS; | ||
|
|
||
| // allow punycode TLDs for IDN support | ||
| if (tld.toLowerCase(Locale.ROOT).startsWith("xn--")) { | ||
| // basic sanity check for punycode TLD | ||
| if (tld.length() > 63) return INVALID_ADDRESS; | ||
| for (int i = 4; i < tld.length(); i++) { | ||
| char c = tld.charAt(i); | ||
| if (!(Character.isLetterOrDigit(c) || c == '-')) return INVALID_ADDRESS; | ||
| } | ||
| } else { | ||
| // ASCII letters only | ||
| for (int i = 0; i < tld.length(); i++) { | ||
| if (!Character.isLetter(tld.charAt(i))) return INVALID_ADDRESS; | ||
| } | ||
| } | ||
|
|
||
| String patchLocal = ""; | ||
| for (int k = 0; k <= length; k++) // remember: length is last index not array length | ||
| patchLocal = patchLocal.concat(subStrings[k]); // @'s are not reinstalled, since not needed for further checks | ||
| subStrings[0] = patchLocal; | ||
| return true; | ||
| return new ValidationResult(true); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find test files for EmailValidator
echo "=== Searching for EmailValidator tests ==="
find . -type f \( -name "*Test.java" -o -name "*Tests.java" -o -name "Test*.java" \) | xargs rg -l "EmailValidator" 2>/dev/null || echo "No test files found"
echo -e "\n=== Checking EmailValidator location and structure ==="
fd -e java EmailValidator
echo -e "\n=== Checking for usage of EmailValidator in codebase ==="
rg -n "EmailValidator" --type java -C 2 | head -50
echo -e "\n=== Looking for the SIMPLE_EMAIL_PATTERN definition ==="
rg -n "SIMPLE_EMAIL_PATTERN" desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.javaRepository: bisq-network/bisq
Length of output: 5830
🏁 Script executed:
# Get the full file to understand the complete validation context
echo "=== Full EmailValidator file ==="
wc -l desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.javaRepository: bisq-network/bisq
Length of output: 168
🏁 Script executed:
# Get the full EmailValidator file
echo "=== Full EmailValidator.java content ==="
cat -n desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
echo -e "\n=== Checking for tests in payment form test files ==="
find desktop/src/test -name "*FormTest*.java" -o -name "*Form*Test.java" | head -10
echo -e "\n=== Looking for any test that might use EmailValidator indirectly ==="
rg -l "CashDepositForm|MoneyGramForm|PaxumForm|PayseraForm" desktop/src/test --type javaRepository: bisq-network/bisq
Length of output: 5367
🏁 Script executed:
# Check if there are integration or form validation tests
echo "=== Checking for validation-related tests ==="
ls -la desktop/src/test/java/bisq/desktop/util/validation/ 2>/dev/null || echo "Directory not found"
echo -e "\n=== All test files in validation directory ==="
find desktop/src/test -path "*/validation/*" -name "*Test*.java" 2>/dev/nullRepository: bisq-network/bisq
Length of output: 2377
Add test coverage for the refactored EmailValidator.
No test file exists for EmailValidator (unlike other validators in the same directory). Since this validator is actively used in payment forms (CashDepositForm, MoneyGramForm, PaxumForm, PayseraForm) to validate user email input, add a test suite covering:
- Valid addresses: standard formats, subdomains, international domains with punycode (xn--)
- Invalid addresses: missing/invalid TLD, consecutive dots, wrong local-part length, non-ASCII TLD characters
- Edge cases: minimum valid length (6 chars), maximum length (254 chars), boundary local-part lengths
The implementation correctly enforces a conservative validation approach by design (intentionally excluding quoted strings and exotic RFC constructs); test coverage ensures this behavior is maintained and edge cases are handled correctly.
🤖 Prompt for AI Agents
In @desktop/src/main/java/bisq/desktop/util/validation/EmailValidator.java
around lines 68 - 108, Add a new unit test class for EmailValidator that
exercises the public validate(String) method: create tests asserting valid
addresses (normal user@domain, addresses with subdomains, and punycode TLDs
starting with "xn--"), invalid addresses (missing or too-short/too-long TLD,
consecutive dots, local-part too long or zero-length, null input), and edge
boundaries (minimum total length 6, maximum total length 254, local-part lengths
at 1 and 64). Use the ValidationResult returned by validate(...) to assert
validity/invalidity and include cases that confirm the conservative behavior
(reject quoted/exotic RFC forms). Reference EmailValidator.validate and the
ValidationResult class in test names so reviewers can find coverage for the
refactored validator.
|
Would you mind to add a test class as well? |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Description
the previous implementation used ad-hoc split() and ad-hoc quote handling which made the logic hardto follow and errorprone. This change puts use to a precompiled pattern (faster, easier to read), enforces conservative but practical rules (no leading/trailing/consecutive dots in local part, label rules for domain, TLD sanity checks, punycode support), and sets practical length limits (trim + 6..254). The validator intentionally excludes extremely permissive RFC edge cases (quoted local parts, comments, etc.) that commonly break UIs and payment rails.
The validator becomes stricter only on exotic RFC constructs (quoted strings, nested comments). It accepts all commonly used addresses while removing flaky, UI-breaking cases.
Changes
Related issues
No related issues, just a general improvement.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.