Conversation
…ation-for-sp1 chore: update sp1-verifier version to 5.0.0
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (2)
- Cargo.toml (3 hunks)
- package.json (1 hunks)
🔇 Additional comments (5)
package.json (2)
13-13: Clarify the pre-release version format. The version suffix `-9` is unconventional. Pre-release versions typically follow semantic versioning conventions like `0.4.66-rc.1`, `0.4.66-beta.9`, or `0.4.66-alpha.1`. Is `0.4.66-9` intentional, or should this be `0.4.66-rc.1` or similar?
4-12: Verify alignment between scripts and PR scope. The PR introduces a Vite-based browser test app, but the scripts section shows no new build or dev targets for it. Similarly, the AI summary mentions a "Vite-based browser test harness," yet no Vite-related devDependencies are visible in the provided snippet.
Verify that:
- Required devDependencies (Vite, build tools for the browser app) are declared elsewhere (e.g., workspace dependencies, root `package.json` in a monorepo).
- Build/dev scripts for the browser app are intentionally omitted or defined in a separate workspace package.
Cargo.toml (3)
3-3: Version bump rationale. The version has been bumped from 0.4.62-12 to 0.4.62-17 (a 5-patch increment in prerelease). Ensure this aligns with the scope of changes (new regex module, browser test app, circuit refactoring) and follows your project's versioning strategy.
20-20: Dual regex dependencies are justified: `regex` is a transitive dependency of `fancy-regex`.

Verification confirmed that `fancy_regex` is used (`use fancy_regex::Regex;` appears in `src/regex/extract.rs`), while no direct imports of the external `regex` crate were found; the search results showed only references to `crate::regex::` (a local module), not `use regex::`. fancy-regex uses a hybrid approach in which a backtracking virtual machine delegates to the regex crate for simple pattern matching, so the `regex` crate is pulled in as a transitive dependency through `fancy_regex`. This is intentional and necessary for fancy-regex's functionality. The dual dependency can be left as-is or clarified in a comment.
53-53: No API compatibility issues found with sp1-verifier 5.0.0. The code uses `Groth16Verifier::verify()` with a signature matching the v5.0.0 API, and the `GROTH16_VK_BYTES` constant is available. The single usage in `src/wasm.rs` is fully compatible. The `default-features = false` configuration is intentional and correct for WASM builds.
```diff
 serde = { version = "1.0.159", features = ["derive"] }
 zk-regex-apis = { git = "https://github.com/zkemail/zk-regex.git" }
-zk-regex-compiler = { git = "https://github.com/zkemail/zk-regex.git" }
+zk-regex-compiler = { git = "https://github.com/zkemail/zk-regex.git", rev = "35014c2bff69115bb92be2dc2c9b006fa69a6649" }
```
🧩 Analysis chain
Verify zk-regex-compiler revision is valid and compatible.
The dependency is pinned to a specific commit hash. Ensure this revision exists, is reachable, and doesn't introduce security or stability issues.
🏁 Script executed:
```shell
#!/bin/bash
# Verify the zk-regex-compiler rev is valid and the git repository is accessible
git ls-remote https://github.com/zkemail/zk-regex.git 35014c2bff69115bb92be2dc2c9b006fa69a6649
```
Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Alternative verification: fetch repository info and check for commit
# Try with --heads and --tags to see all refs
git ls-remote --heads --tags https://github.com/zkemail/zk-regex.git | head -50
```
Length of output: 3471
🌐 Web query:
zkemail zk-regex commit 35014c2bff69115bb92be2dc2c9b006fa69a6649
💡 Result:
I couldn't find that exact commit hash in public zk-email/zk-regex sources. I can search GitHub (or the specific repo) for that commit and return the commit message / diff — do you want me to do that? (If yes, tell me which repo or allow me to search the zkemail org.) [1][2]
Sources:
[1] zkemail GitHub org.
[2] zk-regex documentation.
🏁 Script executed:
```shell
#!/bin/bash
# Direct attempt to fetch the specific commit
git fetch --depth=1 https://github.com/zkemail/zk-regex.git 35014c2bff69115bb92be2dc2c9b006fa69a6649 2>&1 || echo "Fetch failed or commit not found"
```
Length of output: 101
Fix zk-regex-compiler revision: commit does not exist in repository.
The pinned commit `35014c2bff69115bb92be2dc2c9b006fa69a6649` is not found in the zkemail/zk-regex repository. Cargo will fail to resolve this dependency during the build. Use a valid commit hash or tag (e.g., tag `2.1.0`, which is publicly available).
🤖 Prompt for AI Agents
In Cargo.toml around line 16, the dependency zk-regex-compiler pins a
non-existent commit hash; replace the rev value with a valid reference (either a
real commit hash from the zkemail/zk-regex repo or a published tag such as
"2.1.0") by updating the dependency to use rev = "<valid-hash>" or tag = "2.1.0"
(or use git = ... and branch = "<branch>" if preferred), then run cargo update
or cargo fetch and verify cargo build succeeds.
…neration chore: updated zk-regex version
[REG-590][REG-588] fix: generate partial body hash when the body verification is enabled
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parse_email.rs (1)
261-293: Align fallback logic between `get_command()` and `get_command_idxes()`. There's an inconsistency between these two methods: `get_command()` (lines 261-282) has a fallback strategy: it first tries `canonicalized_body` (line 266), and if that fails, falls back to `cleaned_body` (line 272). `get_command_idxes()` (lines 285-293) only searches in `cleaned_body` (line 290), with no fallback. This inconsistency can lead to bugs:
- If the command exists in `canonicalized_body` but not `cleaned_body`, `get_command()` succeeds while `get_command_idxes()` fails.
- If the command exists in both bodies at different positions, the indices returned by `get_command_idxes()` won't match the substring returned by `get_command()`.

Consider one of these solutions:

Option 1: Make `get_command_idxes()` match the fallback logic:

```rust
pub fn get_command_idxes(&self, ignore_body_hash_check: bool) -> Result<(usize, usize)> {
    let regex_config = serde_json::from_str(include_str!("../regexes/command.json"))?;
    if ignore_body_hash_check {
        Ok((0, 0))
    } else {
        match extract_substr_idxes(&self.canonicalized_body, &regex_config, None, false) {
            Ok(idxes) => Ok(idxes[0]),
            Err(_) => {
                let idxes = extract_substr_idxes(&self.cleaned_body, &regex_config, None, false)?[0];
                Ok(idxes)
            }
        }
    }
}
```

Option 2: Simplify both methods to only use `cleaned_body` (if the `canonicalized_body` fallback is unnecessary).
♻️ Duplicate comments (4)
browser_test/src/noirProver.ts (1)
270-282: Critical: Hex decoding corrupts bytes with leading zeros. The `replace(/^0+/, "")` on line 272 strips all leading zeros, corrupting bytes like `0x0a` (newline) into `"a"` or `0x00` into `""`, producing odd-length hex strings that cannot decode properly. Control characters and null bytes in public outputs will be silently mangled. Apply this diff to fix the decoding:

```diff
 function toUtf8(hex: string): string {
-  // Remove '0x' prefix and leading zeros
-  const cleanHex = hex.slice(2).replace(/^0+/, "");
+  // Remove '0x' prefix
+  let cleanHex = hex.startsWith("0x") ? hex.slice(2) : hex;
+
+  // Ensure even length by prepending '0' if needed
+  if (cleanHex.length % 2 !== 0) {
+    cleanHex = "0" + cleanHex;
+  }

   // Convert the hex to a Uint8Array
   const bytes = new Uint8Array(cleanHex.length / 2);
   for (let i = 0; i < cleanHex.length; i += 2) {
     bytes[i / 2] = parseInt(cleanHex.substring(i, i + 2), 16);
   }

   // Use TextDecoder to convert to UTF-8
   return new TextDecoder().decode(bytes);
 }
```

src/circuit/noir/mod.rs (3)
26-33: Security: Reconsider defaulting body hash check to skip. Defaulting `ignore_body_hash_check` to `true` bypasses a critical email authentication mechanism, potentially allowing body tampering to go undetected. Consider defaulting to `false` to enforce verification by default. Apply this diff:

```diff
-    ParsedEmail::new_from_raw_email(email, params.ignore_body_hash_check.unwrap_or(true))
+    ParsedEmail::new_from_raw_email(email, params.ignore_body_hash_check.unwrap_or(false))
```
158-164: Replace unwrap with error propagation. Line 162 uses `unwrap()` on `precomputed_sha`, which will panic if it's `None`. Since the function returns `Result`, propagate the error properly. Apply this diff:

```diff
     // Always generate partial_body_hash when body verification is enabled
     if params.ignore_body_hash_check.is_some_and(|x| !x) {
         noir_circuit_input.partial_body_real_length = Some(parsed_email.canonicalized_body.len());
-        let partial_hash = u8_to_u32(email_circuit_inputs.precomputed_sha.unwrap().as_slice())?;
+        let partial_hash = u8_to_u32(
+            email_circuit_inputs
+                .precomputed_sha
+                .as_ref()
+                .ok_or_else(|| anyhow::anyhow!("Precomputed SHA is missing"))?
+                .as_slice(),
+        )?;
         noir_circuit_input.partial_body_hash = Some(partial_hash);
     }
```
290-295: Add null checks before unwrapping body references. Lines 292 and 294 use `unwrap()` without verification. If `remove_soft_line_breaks` is enabled but `decoded_body` is `None`, or if body processing was skipped, these will panic. Apply this diff:

```diff
         HaystackLocation::Body => {
             let body = if params.remove_soft_line_breaks {
-                noir_circuit_input.decoded_body.as_ref().unwrap()
+                noir_circuit_input.decoded_body.as_ref().ok_or_else(|| {
+                    anyhow::anyhow!("Decoded body is missing when remove_soft_line_breaks is enabled")
+                })?
             } else {
-                noir_circuit_input.body.as_ref().unwrap()
+                noir_circuit_input.body.as_ref().ok_or_else(|| {
+                    anyhow::anyhow!("Body is missing when required for regex matching")
+                })?
             };
```
🧹 Nitpick comments (2)
browser_test/src/noirProver.ts (1)
73-80: Consider using nullish coalescing for legacy field fallback. While the current `||` operator works for truthy values, using nullish coalescing (`??`) would be more precise for handling `null`/`undefined` specifically, avoiding issues with falsy-but-valid values like empty strings. Apply this diff:

```diff
 parts: dr.parts.map((p) => ({
   // @ts-ignore
   is_public: p.isPublic || !!p.is_public,
   // @ts-ignore
-  regex_def: p.regexDef || p.regex_def,
+  regex_def: p.regexDef ?? p.regex_def,
   // @ts-ignore
-  ...((p.isPublic || p.is_public) && { maxLength: p.maxLength || p.max_length || 64 }),
+  ...((p.isPublic ?? p.is_public) && { maxLength: p.maxLength ?? p.max_length ?? 64 }),
 })),
```

src/circuit/noir/mod.rs (1)
174-180: Address the TODO: handle missing selector gracefully. The TODO comment indicates this will fail if the selector is not found in the body (e.g., when the selector doesn't account for soft line breaks). Consider adding explicit error handling or validation to provide a clearer error message.
Would you like me to help implement proper error handling for the missing selector case?
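One possible shape for the graceful handling, sketched with a hypothetical `find_selector` helper (not the crate's actual API):

```rust
// Illustrative sketch: fail with a descriptive error when the SHA-precompute
// selector is absent from the body, instead of an opaque downstream failure.
// `find_selector` is a hypothetical helper, not the crate's actual API.
fn find_selector(body: &str, selector: &str) -> Result<usize, String> {
    body.find(selector).ok_or_else(|| {
        format!(
            "selector {:?} not found in body; if the body uses quoted-printable \
             soft line breaks, the selector may need to account for them",
            selector
        )
    })
}

fn main() {
    // A present selector yields its byte offset.
    assert_eq!(find_selector("hello world", "world"), Ok(6));
    // A selector split by a soft line break ("=\r\n") is not found as-is,
    // and the error message points the caller at the likely cause.
    assert!(find_selector("hello=\r\nworld", "helloworld").is_err());
}
```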
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
`browser_test/bun.lock` is excluded by `!**/*.lock`; `yarn.lock` is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (8)
- browser_test/package.json (1 hunks)
- browser_test/src/noirProver.ts (1 hunks)
- package.json (1 hunks)
- src/circuit/mod.rs (1 hunks)
- src/circuit/noir/mod.rs (1 hunks)
- src/circuit/noir/structs.rs (1 hunks)
- src/cryptos.rs (10 hunks)
- src/parse_email.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- package.json
- browser_test/package.json
🧰 Additional context used
🧬 Code graph analysis (6)
browser_test/src/noirProver.ts (1)
- src/wasm.rs (2): `parseEmail` (51-77), `generateNoirCircuitInputsWithRegexesAndExternalInputs` (605-652)

src/circuit/noir/mod.rs (5)
- src/parse_email.rs (2): `remove_quoted_printable_soft_breaks` (323-345), `new_from_raw_email` (49-75)
- src/converters.rs (2): `string_to_circom_bigint_bytes` (463-483), `vec_u8_to_bigint` (265-269)
- src/circuit/circom.rs (2): `compute_signal_length` (577-579), `clone` (122-139)
- src/circuit/mod.rs (2): `generate_circuit_inputs` (134-192), `new` (98-113)
- src/circuit/noir/utils.rs (6): `bn_to_limb_str_array` (101-134), `bn_to_redc_limb_str_array` (161-164), `get_header_sequence` (195-228), `u8_to_u32` (304-317), `get_address_header_sequence` (248-290), `trim_sha256_padding` (319-350)

src/circuit/mod.rs (4)
- src/cryptos.rs (5): `generate_partial_sha` (494-557), `sha256_pad` (426-462), `new` (59-61), `new` (260-262), `new` (335-343)
- src/parse_email.rs (1): `remove_quoted_printable_soft_breaks` (323-345)
- src/converters.rs (1): `to_circom_bigint_bytes` (254-256)
- src/circuit/utils.rs (1): `get_adjusted_selector` (66-87)

src/cryptos.rs (3)
- src/regex/padding.rs (1): `pad_string` (18-22)
- src/circuit/mod.rs (1): `new` (98-113)
- src/converters.rs (2): `hex_to_field` (20-67), `field_to_hex` (76-79)

src/parse_email.rs (2)
- src/regex/patterns.rs (8): `extract_body_hash_idxes` (106-108), `extract_email_addr_idxes` (81-83), `extract_email_domain_idxes` (86-88), `extract_from_addr_idxes` (91-93), `extract_message_id_idxes` (116-118), `extract_subject_all_idxes` (101-103), `extract_timestamp_idxes` (111-113), `extract_to_addr_idxes` (96-98)
- src/regex/extract.rs (1): `extract_substr_idxes` (188-201)

src/circuit/noir/structs.rs (2)
- src/cryptos.rs (1): `deserialize` (183-226)
- src/circuit/circom.rs (1): `deserialize` (155-161)
🔇 Additional comments (25)
src/parse_email.rs (3)
6-10: LGTM! The refactoring from external `zk_regex_apis` to the local `crate::regex` module is clean and aligns with the PR's goal of introducing a new internal regex runtime.

209-210: LGTM! The API call updates correctly pass `None` for the new `nfa_graph` parameter, which aligns with the updated `extract_substr_idxes` signature. Also applies to: 214-214, 227-228, 231-231

340-342: LGTM! The refactoring to use `resize()` for padding is cleaner and more idiomatic than manual loop-based padding. Using `usize::MAX` as a sentinel value for padded positions is well-documented in the function comments and aligns with the new padding strategy.

src/cryptos.rs (8)
3-3: LGTM! Good refactoring to use centralized padding utility. The import of `pad_string` from the new regex module improves code reusability and aligns with the PR's objective to consolidate padding logic.
111-127: Excellent security improvement! Validation prevents truncation collisions. The change to return `Result<Self>` with explicit length validation is a critical security enhancement. By rejecting email addresses exceeding 256 bytes upfront, this prevents scenarios where different email addresses could produce identical padded representations due to silent truncation. The error handling is clear and the documentation properly explains the security rationale.
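The collision risk prevented by this check can be sketched in a few lines; the names `pad_to_256_lossy`, `pad_to_256_checked`, and `MAX_EMAIL_ADDR_BYTES` below are illustrative stand-ins, not the crate's actual API:

```rust
// Illustrative sketch (not the crate's actual API): why silent truncation
// during fixed-size padding can make two distinct inputs collide, and how
// an upfront length check prevents it.

const MAX_EMAIL_ADDR_BYTES: usize = 256;

// Unsafe variant: silently truncates, then zero-pads to the fixed width.
fn pad_to_256_lossy(addr: &str) -> Vec<u8> {
    let mut bytes: Vec<u8> = addr.bytes().take(MAX_EMAIL_ADDR_BYTES).collect();
    bytes.resize(MAX_EMAIL_ADDR_BYTES, 0);
    bytes
}

// Checked variant: rejects over-long inputs instead of truncating.
fn pad_to_256_checked(addr: &str) -> Result<Vec<u8>, String> {
    if addr.len() > MAX_EMAIL_ADDR_BYTES {
        return Err(format!("email address exceeds {} bytes", MAX_EMAIL_ADDR_BYTES));
    }
    let mut bytes = addr.as_bytes().to_vec();
    bytes.resize(MAX_EMAIL_ADDR_BYTES, 0);
    Ok(bytes)
}

fn main() {
    // Two different 300-byte "addresses" that agree on the first 256 bytes.
    let a = format!("{}A@example.com", "x".repeat(287));
    let b = format!("{}B@example.com", "x".repeat(287));
    assert_ne!(a, b);
    // Lossy padding maps both to the same 256-byte representation: a collision.
    assert_eq!(pad_to_256_lossy(&a), pad_to_256_lossy(&b));
    // The checked variant refuses both inputs instead.
    assert!(pad_to_256_checked(&a).is_err());
    assert!(pad_to_256_checked(&b).is_err());
}
```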
505-505: Good improvement! Error propagation instead of panicking. Replacing `.unwrap()` with the `?` operator for both `Regex::new()` and `String::from_utf8()` provides better error handling by propagating errors to the caller rather than panicking at runtime. Also applies to: 515-515
800-816: LGTM! Proper error propagation throughout the function. The function signature change to return `Result<String>` is consistent with the updated `PaddedEmailAddr::from_email_addr` signature. All error propagation using the `?` operator is correct, and the function now properly handles errors from:
- Email address validation
- Hex-to-field conversions
- Account salt generation

Note: This is a breaking change to the public API.

833-833: Minor style improvement. Using `.or_else()` for Option chaining is slightly more idiomatic than `.unwrap_or_else()` in this context, though both work correctly here.
601-697: Excellent test coverage for the new validation logic! The test suite comprehensively covers:
- Valid email address padding and verification
- Boundary condition (exactly 256 bytes, which should pass)
- Over-limit rejection (257 bytes, which should fail)
- Collision prevention security scenarios
- Long but valid emails under the limit
- Integration with `calculate_account_salt` for both success and error paths

The tests are well-structured with clear assertions and descriptive error messages.
38-38: DKIM API endpoint is operational and returns valid data. The new endpoint `https://archive.zk.email/api/key/domain` successfully returns DKIM key records in the expected JSON format with proper structure (domain, selector, firstSeenAt, lastSeenAt, value fields). The change is verified as live and compatible.
532-532: Verify circuit-level SHA padding handling with the circuit implementation team. The comment update at line 549 asserts that "the circuit handles SHA padding", allowing the remaining body to be padded to `max_remaining_body_length` with zeros rather than requiring it to be a multiple of 64 bytes. However, the verification script did not locate the circuit's actual SHA padding implementation (e.g., `partial_sha256_var_end`) or corresponding tests that validate this behavior with variable-length inputs. Action required: Confirm with the circuit development team that the circuit implementation correctly handles variable-length remaining body inputs (i.e., not strictly multiples of 64) before this change is merged. The codebase reference alone is insufficient to verify this assumption.
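For context, the standard SHA-256 message padding this discussion revolves around can be sketched as follows; this is a generic illustration, not the crate's `sha256_pad`:

```rust
// Illustrative sketch of standard SHA-256 message padding: append 0x80,
// then zeros until the length is 56 mod 64, then the bit length as a
// big-endian u64. The result is always a multiple of 64 bytes.
fn sha256_pad(mut msg: Vec<u8>) -> Vec<u8> {
    let bit_len = (msg.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0x00);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());
    msg
}

fn main() {
    // A 3-byte message pads out to one 64-byte block.
    assert_eq!(sha256_pad(b"abc".to_vec()).len(), 64);
    // A 60-byte message no longer fits in one block after padding.
    assert_eq!(sha256_pad(vec![0u8; 60]).len(), 128);
    // The comment under review additionally allows zero-padding this result
    // up to max_remaining_body_length, relying on the circuit to locate the
    // true SHA padding boundary; that is the behavior needing confirmation.
}
```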
src/circuit/mod.rs (3)
1-17: LGTM! Clean module organization. The module structure and imports are well-organized, with appropriate re-exports for public APIs.

19-75: LGTM! Well-designed data structures. The separation between public API structs (`CircuitParams`, `CircuitOptions`) and internal implementation structs (`CircuitInput`) provides good encapsulation. Serde annotations are correctly applied for JSON serialization.

77-114: LGTM! Secure defaults applied. The constructor correctly applies sensible defaults, including defaulting `ignore_body_hash_check` to `false` to enforce body hash verification by default.

browser_test/src/noirProver.ts (4)
1-15: LGTM! Clean SDK initialization. The imports and SDK configuration are well-structured, with helpful commented alternatives for different environments.

85-147: LGTM! Robust circuit execution flow. The Noir circuit execution is well-structured with appropriate error handling and data transformations (Map-to-object conversion for circuit inputs).

167-178: LGTM! Clean email fetching with error handling. The function properly handles fetch errors and returns the email content as text.

180-198: LGTM! Proper validation of required external inputs. The function correctly ensures all required external inputs are provided and augments them with maxLength constraints.
src/circuit/noir/structs.rs (3)
4-11: LGTM! Clean custom deserializer. The empty-string-to-None deserializer is correctly implemented using `filter`, providing clean handling of empty optional strings.

13-62: LGTM! Well-designed data structures. The struct definitions are clean and appropriate for Noir circuit inputs, with good use of `Option` for conditional fields and the custom deserializer for `sha_precompute_selector` to handle empty strings elegantly.

64-77: LGTM! Complete type definitions. The `HaystackLocation` enum and `RegexInput` struct provide clear, type-safe interfaces for regex processing in Noir circuits.

src/circuit/noir/mod.rs (4)
1-24: LGTM! Clean module organization. Module structure, imports, and the `MODULUS_BITS` constant are well-organized.

35-105: LGTM! Clean circuit input construction. The conversion of parsed email data into circuit parameters and the initialization of `NoirCircuitInputs` are well-structured.

223-277: LGTM! Clean integration of external inputs. The function properly constructs circuit parameters and processes external inputs with appropriate padding and serialization.

306-443: LGTM! Robust regex processing and test coverage. The regex input processing correctly handles capture groups with proper error checks, and the test module provides good coverage of the circuit input generation flow.
```rust
let result = generate_partial_sha(
    body_for_sha,
    body_original_len,
    adjusted_selector,
    params.max_body_length,
);

// Use match to handle the result and convert any error into an anyhow::Error
let (precomputed_sha, body_remaining_padded, body_remaining_length) = match result {
    Ok((sha, remaining, len)) => (sha, remaining, len),
    Err(e) => panic!("Failed to generate partial SHA: {:?}", e),
};
```
Replace panic with error propagation.
The function panics on generate_partial_sha failure, but it already returns Result<CircuitInput>. Convert the error and propagate it using ? instead of panicking.
Apply this diff:
```diff
- // Use match to handle the result and convert any error into an anyhow::Error
- let (precomputed_sha, body_remaining_padded, body_remaining_length) = match result {
-     Ok((sha, remaining, len)) => (sha, remaining, len),
-     Err(e) => panic!("Failed to generate partial SHA: {:?}", e),
- };
+ // Convert the error into an anyhow::Error and propagate it
+ let (precomputed_sha, body_remaining_padded, body_remaining_length) = result
+     .map_err(|e| anyhow::anyhow!("Failed to generate partial SHA: {:?}", e))?;
```

🤖 Prompt for AI Agents
In src/circuit/mod.rs around lines 172 to 183, the code currently panics on a failed generate_partial_sha call; replace the match/panic with direct error propagation. Call generate_partial_sha(...) and use the `?` operator to propagate the error (if needed, map the error into anyhow::Error with `.map_err(|e| anyhow::anyhow!(...))` before applying `?`), then destructure the returned tuple into (precomputed_sha, body_remaining_padded, body_remaining_length).
Support both legacy and new circuit input formats by:
- Adding optional fields (max_match_length, max_haystack_length, regex_graph_json) with aliases for old field names
- Implementing legacy path that generates {name}RegexIdx format for old circuits
- Using NFAGraph-based generation only when regex_graph_json is present
- Adding helper methods to detect circuit version and calculate defaults
- Fixing typo: removeSoftLinesBreaks -> removeSoftLineBreaks in tests
This ensures circuits compiled before the new compiler integration continue to work while supporting the new format.
Rutefig/fix proof gen old circuits
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/circuit/circom.rs (2)
630-637: Use `u64` (not `i64`) for `capture_group_start_indices` JSON numbers. Here `capture_group_start_indices` elements are cast to `i64` while related fields (`capture_group_ids`, `capture_group_starts`) use `u64`, and indices are inherently non-negative. This inconsistency is unnecessary and can be surprising when comparing Circom vs Noir outputs. Align this with the other fields by casting to `u64`:

```diff
- .map(|s| serde_json::Value::Number((*s as i64).into()))
+ .map(|s| serde_json::Value::Number((*s as u64).into()))
```
283-324: Silent fallback to index `0` can hide parsing failures for code/timestamp/command. `code_idx`, `timestamp_idx`, and `command_idx` all map `Err(_)` to `0`, and the body-search fallback also forces `0` when `padded_cleaned_body` is `None`. This silently suppresses errors and makes it impossible to distinguish "not found / failed to parse" from "found at position 0", which can mis-wire circuit behavior. Consider instead:
- Propagating the errors (`?`) if these indices are required, or
- Using `Option<usize>` in `EmailCircuitInput` and encoding the optionality explicitly at the circuit boundary, or
- If `0` is truly a safe sentinel, documenting that invariant clearly and ensuring the circuits never treat index 0 as a valid match location.
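The `Option<usize>` alternative can be sketched as follows; `EmailCircuitInput` here is a stripped-down stand-in for the real struct, and `extract_code_idx` is a hypothetical extractor:

```rust
// Illustrative sketch: encode "index not found" as None instead of the
// sentinel 0, so a genuine match at position 0 stays distinguishable.
// `extract_code_idx` and the struct shape are hypothetical stand-ins.

struct EmailCircuitInput {
    code_idx: Option<usize>,
}

// A hypothetical extractor that fails when the pattern is absent.
fn extract_code_idx(body: &str) -> Result<usize, String> {
    body.find("code:").ok_or_else(|| "code marker not found".to_string())
}

fn build_input(body: &str) -> EmailCircuitInput {
    EmailCircuitInput {
        // Err becomes None explicitly; no information is lost.
        code_idx: extract_code_idx(body).ok(),
    }
}

fn main() {
    // A match at position 0 is Some(0), not an ambiguous sentinel.
    assert_eq!(build_input("code: 1234").code_idx, Some(0));
    // A missing match is None, never confused with position 0.
    assert_eq!(build_input("no marker here").code_idx, None);
    // At the circuit boundary, the caller decides how to encode None.
    let wire_value = build_input("no marker here").code_idx.unwrap_or(0);
    assert_eq!(wire_value, 0);
}
```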
🧹 Nitpick comments (4)
src/lib.rs (1)
9-10: Consider whether `regex` needs to be a public module. You're already re-exporting the curated regex surface at the crate root; if external consumers don't need to reach into `regex::*` internals, you could make this `mod regex;` instead and keep the internal layout private. If consumers do rely on submodules, the current `pub` is fine.

src/circuit/circom.rs (3)
41-49: `EmailCircuitParams.remove_soft_line_breaks` is currently unused. The `remove_soft_line_breaks` flag is exposed on `EmailCircuitParams` but never read anywhere in this module, so toggling it has no effect. Either wire it into the email input generation (e.g., how the body is cleaned/padded) or drop the field to avoid a misleading public API.
569-581: Dead `DecomposedRegexConfig` construction in NFAGraph path. In the NFAGraph branch you build and populate `decomposed_regex_config`, but never actually use it; only `gen_circuit_inputs` with `NFAGraph` is called afterward. This is dead code that adds confusion and may trigger unused-variable warnings. You can safely remove `decomposed_regex_config` (and the `VecDeque` dependency if unused elsewhere), or, if it's intended for a future shared Circom/Noir path, add a brief comment to mark it as intentionally unused for now.
703-706: `compute_signal_length` is correct but lacks tests and can be simplified. The formula for `compute_signal_length` matches the intended "ceil(max_length / 31)", but there's a TODO and no tests here. Edge cases like `max_length = 0` or exact multiples of 31 are easy to regress. Consider both (a) adding a small unit test table (e.g., 0, 1, 30, 31, 32) and (b) simplifying the implementation to `(max_length + 30) / 31` for readability. I can sketch the tests if helpful. Also applies to: 708-708
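A minimal sketch of the simplified formula and the suggested test table, assuming the function's contract is exactly ceil(max_length / 31):

```rust
// Illustrative sketch of the suggested simplification: signal length is the
// number of 31-byte field-element chunks needed to hold max_length bytes,
// i.e. ceil(max_length / 31), written in integer arithmetic.
fn compute_signal_length(max_length: usize) -> usize {
    (max_length + 30) / 31
}

fn main() {
    // The edge-case table suggested in the review, plus exact multiples of 31.
    let cases = [(0, 0), (1, 1), (30, 1), (31, 1), (32, 2), (62, 2), (63, 3)];
    for (max_length, expected) in cases {
        assert_eq!(compute_signal_length(max_length), expected);
    }
}
```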
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- Cargo.toml (3 hunks)
- package.json (1 hunks)
- src/circuit/circom.rs (1 hunks)
- src/lib.rs (2 hunks)
- ts_tests/circuit_input.test.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- package.json
🧰 Additional context used
🧬 Code graph analysis (2)
src/lib.rs (3)
src/regex/patterns.rs (8)
extract_body_hash_idxes(106-108)extract_email_addr_idxes(81-83)extract_email_domain_idxes(86-88)extract_from_addr_idxes(91-93)extract_message_id_idxes(116-118)extract_subject_all_idxes(101-103)extract_timestamp_idxes(111-113)extract_to_addr_idxes(96-98)src/regex/extract.rs (2)
extract_substr(374-386)extract_substr_idxes(188-201)src/regex/padding.rs (2)
pad_bytes(32-36)pad_string(18-22)
src/circuit/circom.rs (6)
src/converters.rs (4)
field_to_hex(76-79)hex_to_u256(302-308)string_to_circom_bigint_bytes(463-483)vec_u8_to_bigint(265-269)src/parse_email.rs (3)
find_index_in_body(360-372)remove_quoted_printable_soft_breaks(323-345)new_from_raw_email(49-75)src/circuit/noir/utils.rs (1)
trim_sha256_padding(319-350)src/circuit/mod.rs (2)
generate_circuit_inputs(134-192)new(98-113)src/cryptos.rs (5)
from(273-275)new(59-61)new(260-262)new(335-343)from_email_addr(111-128)src/regex/extract.rs (1)
extract_substr_idxes(188-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test
🔇 Additional comments (5)
ts_tests/circuit_input.test.ts (1)
34-34: LGTM! Typo fix applied consistently. The property name correction from `removeSoftLinesBreaks` to `removeSoftLineBreaks` (removing the extra 's') is applied consistently across all three test cases. This aligns with the updated WASM API mentioned in the PR objectives. Also applies to: 79-79, 153-153
src/lib.rs (2)
11-11: WASM cfg gate for the `wasm` module looks good. The `#[cfg(target_arch = "wasm32")]` around the `wasm` module keeps the native build surface clean while exposing the WASM helpers only where relevant; no issues here.
23-30: Re-export surface verified; no breakage from the new `regex` module re-exports. All re-exported items (`extract_body_hash_idxes`, `extract_email_addr_idxes`, `extract_email_domain_idxes`, `extract_from_addr_idxes`, `extract_message_id_idxes`, `extract_subject_all_idxes`, `extract_substr`, `extract_substr_idxes`, `extract_timestamp_idxes`, `extract_to_addr_idxes`, `pad_bytes`, `pad_string`) are defined in the regex module (`src/regex/patterns.rs` and `src/regex/extract.rs`) and actively used in `src/parse_email.rs`, `src/regex/wasm.rs`, and `src/circuit/circom.rs`. No `extract_substrs` (plural) helper exists anywhere in the codebase; code uses the singular `extract_substr` throughout.

Cargo.toml (2)
53-53: Verify the sp1-verifier 5.0.0 upgrade compiles and tests pass. v5.0.0 was released as a fix for a Plonky3 library issue affecting SP1 zkVM security, specifically a FRI size check vulnerability. However:

- No direct code usage detected: the codebase declares `sp1-verifier` as a dependency but contains no direct Rust imports or usages of it.
- Security fix, not breaking changes: the release notes indicate this was primarily a security fix rather than breaking API changes.
- Incomplete verification: without visible code usage of `sp1-verifier` or documented API breaking changes, it cannot be definitively confirmed whether the upgrade required code modifications.

Verify that the sp1-verifier 5.0.0 upgrade compiles successfully and that all tests pass. Confirm the dependency works as intended with the current codebase.
16-16: zk-regex-compiler revision is valid and reachable. The commit hash `a2ce67ff8ae667bc7846cf8920d7d866f2f02d4c` has been verified as valid and accessible in the zkemail/zk-regex repository. The sp1-verifier upgrade from 4.0.0 to 5.0.0 shows no breaking changes in the public API based on the v5.0.0 release notes.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/circuit/circom.rs (2)
237-330: Silent 0 defaults for indices still risk hiding extraction failures
`code_idx`, `timestamp_idx`, and `command_idx` default to `0` on any extraction error (lines 290–307) and are later overwritten with `find_index_in_body` results when `precomputed_sha` is present (lines 315–329). This keeps the previous behavior but still makes it hard to distinguish "not found / error" from a genuine index at 0.

Consider one of:

- Propagating the errors instead of mapping `Err(_) => 0`, or
- Changing these fields to `Option<usize>` in `EmailCircuitInput` and using `None` on extraction failure, or
- At minimum, documenting clearly that `0` is a sentinel for "missing" and never a valid match position in your circuits.

Example (propagate errors):
```diff
-        let mut code_idx = match parsed_email.get_invitation_code_idxes(
+        let mut code_idx = parsed_email
+            .get_invitation_code_idxes(
                 params
                     .as_ref()
                     .map_or(false, |p| p.ignore_body_hash_check.unwrap_or(false)),
-        ) {
-            Ok(indexes) => indexes.0,
-            Err(_) => 0,
-        };
-        let timestamp_idx = match parsed_email.get_timestamp_idxes() {
-            Ok(indexes) => indexes.0,
-            Err(_) => 0,
-        };
-        let mut command_idx =
-            match parsed_email.get_command_idxes(circuit_input_params.ignore_body_hash_check) {
-                Ok(indexes) => indexes.0,
-                Err(_) => 0,
-            };
+            )?
+            .0;
+        let timestamp_idx = parsed_email.get_timestamp_idxes()?.0;
+        let mut command_idx =
+            parsed_email.get_command_idxes(circuit_input_params.ignore_body_hash_check)?.0;
```

This issue was already raised in a previous review and still applies.
618-656: Use `u64` consistently for capture group indices instead of `i64`

All other Circom capture-group-related arrays here (`capture_group_ids`, `capture_group_starts`) are serialized as `u64`, and the Noir side uses `u64` for `capture_group_start_indices` as well. This one cast to `i64` is inconsistent and can be confusing:

```rust
.map(|s| serde_json::Value::Number((*s as i64).into()))
```

Switching to `u64` keeps all index-like fields non-negative and aligned:

```diff
- .map(|s| serde_json::Value::Number((*s as i64).into()))
+ .map(|s| serde_json::Value::Number((*s as u64).into()))
```

This was identified in an earlier review and still applies.
🧹 Nitpick comments (4)
package.json (1)
13-13: Clarify the pre-release version format. The version "0.4.66-15" uses a non-standard pre-release suffix. Semantic versioning typically expects formats like `0.4.66-alpha.15`, `0.4.66-beta.15`, or `0.4.66-rc.15` to convey meaning.

Please clarify what "-15" represents (e.g., build number, draft iteration) and consider adopting a standard pre-release identifier for clarity.
src/regex/types.rs (1)
30-71: Clarify behavior when `max_bytes_per_part` length doesn't match `parts`

`LegacyDecomposedRegexConfig::to_new_config` silently defaults to `256` when `max_bytes_per_part` is `None` or too short, which can surprise callers if they miscount parts.

Consider either asserting or documenting this behavior explicitly, e.g.:
```diff
 impl LegacyDecomposedRegexConfig {
     /// Convert old config to new config with optional max_bytes per part
     pub fn to_new_config(&self, max_bytes_per_part: Option<Vec<usize>>) -> DecomposedRegexConfig {
+        if let Some(ref v) = max_bytes_per_part {
+            debug_assert!(
+                v.len() >= self.parts.len(),
+                "max_bytes_per_part shorter than parts; missing entries default to 256",
+            );
+        }
         let parts = self
             .parts
             .iter()
             .enumerate()
             .map(|(i, part)| {
                 let max_bytes = max_bytes_per_part.as_ref().and_then(|v| v.get(i).copied());
                 part.to_new_part(max_bytes)
             })
             .collect();
```

This keeps runtime behavior but makes the implicit default less surprising.
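The per-part defaulting being discussed can be illustrated in isolation. A minimal sketch (the function name `resolve_max_bytes` is hypothetical; the real logic lives inside `to_new_config` and `to_new_part`):

```rust
/// Mirrors the defaulting behavior described above: when
/// `max_bytes_per_part` is `None`, or shorter than the parts list,
/// the missing entries silently fall back to 256.
fn resolve_max_bytes(max_bytes_per_part: Option<&Vec<usize>>, part_index: usize) -> usize {
    max_bytes_per_part
        .and_then(|v| v.get(part_index).copied()) // out-of-range index yields None
        .unwrap_or(256) // the implicit default the review flags as surprising
}
```

This makes the "too short" case concrete: `resolve_max_bytes(Some(&vec![100]), 1)` returns 256, not an error, which is exactly the silent fallback the `debug_assert!` suggestion would surface.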
src/circuit/circom.rs (2)
675-691: Minor readability tweak for external input value handling

The call to `string_to_circom_bigint_bytes` adds an extra reference:

```rust
string_to_circom_bigint_bytes(&external_input.value.as_deref().unwrap_or(""))?;
```

Since `as_deref().unwrap_or("")` already yields `&str`, you can simplify to:

```diff
-        let mut value =
-            string_to_circom_bigint_bytes(&external_input.value.as_deref().unwrap_or(""))?;
+        let mut value =
+            string_to_circom_bigint_bytes(external_input.value.as_deref().unwrap_or(""))?;
```

No functional change, just slightly clearer.
719-723: `compute_signal_length` logic is correct; consider adding tests

The calculation `max_length / 31 + (if max_length % 31 != 0 { 1 } else { 0 })` correctly computes the number of 31-byte limbs, including the `max_length == 0` case. Given the TODO, adding small table-driven tests (e.g., 0, 1, 31, 32, 62, 63) would lock this in.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- package.json (1 hunks)
- src/circuit/circom.rs (1 hunks)
- src/circuit/mod.rs (1 hunks)
- src/cryptos.rs (9 hunks)
- src/regex/types.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/circuit/mod.rs
🧰 Additional context used
🧬 Code graph analysis (2)
src/circuit/circom.rs (5)
- src/circuit/mod.rs (3): generate_circuit_inputs_old (194-256), generate_circuit_inputs (134-192), new (98-113)
- src/converters.rs (4): field_to_hex (76-79), hex_to_u256 (302-308), string_to_circom_bigint_bytes (463-483), vec_u8_to_bigint (265-269)
- src/parse_email.rs (3): find_index_in_body (360-372), remove_quoted_printable_soft_breaks (323-345), new_from_raw_email (49-75)
- src/circuit/noir/utils.rs (1): trim_sha256_padding (319-350)
- src/regex/extract.rs (1): extract_substr_idxes (188-201)

src/cryptos.rs (2)
- src/regex/padding.rs (1): pad_string (18-22)
- src/converters.rs (2): hex_to_field (20-67), field_to_hex (76-79)
🔇 Additional comments (14)
package.json (1)
4-12: Verify npm scripts cover new browser_test app.

The PR introduces a browser_test Vite app. The current build scripts (`wasm:build`, `wasm:postbuild`, `build`) appear focused on Rust WASM compilation. Confirm that:

- The browser_test app has its own separate build/dev scripts (e.g., in a nested `package.json`), or
- New npm scripts are needed to support browser_test development/build.
src/cryptos.rs (8)
3-3: LGTM! The new import for `pad_string` from the regex module is correctly used in the refactored `from_email_addr` method (line 123) and aligns with the PR's goal of consolidating padding utilities.
111-128: LGTM! Excellent security improvement. The addition of length validation before padding prevents potential collision attacks where different email addresses exceeding 256 bytes could produce identical padded representations. The error handling is clear and the documentation thoroughly explains the security rationale.
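The validate-then-pad flow praised here can be sketched in a few lines. This is an illustrative stand-in, not the crate's `PaddedEmailAddr` (the 256-byte limit is taken from the review text; the function name and error type are hypothetical):

```rust
const MAX_EMAIL_ADDR_BYTES: usize = 256; // limit cited in the review

/// Reject oversized input *before* padding, so two distinct addresses
/// longer than the limit can never collapse to the same padded value.
fn pad_email_addr(addr: &str) -> Result<Vec<u8>, String> {
    let bytes = addr.as_bytes();
    if bytes.len() > MAX_EMAIL_ADDR_BYTES {
        return Err(format!(
            "email address too long: {} > {} bytes",
            bytes.len(),
            MAX_EMAIL_ADDR_BYTES
        ));
    }
    let mut padded = bytes.to_vec();
    padded.resize(MAX_EMAIL_ADDR_BYTES, 0); // zero-pad to fixed width
    Ok(padded)
}
```

Without the length check, any scheme that truncates or wraps over-long input could map two different addresses to one padded representation, which is the collision risk the review calls out.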
559-630: LGTM! Good backward compatibility approach. The addition of `generate_partial_sha_old` maintains support for the older Circom compiler that requires external padding. The 64-byte alignment check (lines 615-620) is correctly preserved for this compatibility path, while the new implementation delegates padding to the circuit.
674-770: LGTM! Excellent test coverage. The new tests comprehensively validate the email address length validation and padding behavior:

- Valid inputs and boundary cases (lines 674-702)
- Error handling for overflow (lines 704-718)
- Security properties preventing collisions (lines 720-730)
- Integration with `calculate_account_salt` (lines 746-770)

The tests are well-structured and provide strong confidence in the new validation logic.
872-890: LGTM! Proper error propagation implemented. The function signature change to return `Result<String>` correctly propagates errors from:

- `PaddedEmailAddr::from_email_addr` (line 875)
- `hex_to_field` conversions (lines 879, 881)
- `AccountSalt::new` (line 886)

This provides better error visibility to callers and aligns with Rust best practices.
903-906: LGTM! Good robustness improvement. Adding a fallback to try the lowercase "from" header improves compatibility with email systems that may use different header casing. The `or_else` pattern ensures no behavioral change when the standard "From" header is present.
38-38: New DKIM API endpoint verified and functional. The endpoint change from `/api/key` to `/api/key/domain` has been confirmed. The new endpoint returns HTTP 200 with a valid JSON array format where each element contains the required "value" field with DKIM records. The existing parsing logic at lines 1038-1041 (which uses `.as_array()?.iter().filter_map(|record| record.get("value"))`) correctly handles this response format. Query parameters remain unchanged. No breaking changes detected.
494-557: Verify circuit handling of non-64-byte-aligned SHA padding with test evidence.

The Consensys Diligence audit of zkemail.nr (December 2024) identified that there are no tests for `partial_sha256_var_end` when the message size is not a multiple of block size (64 bytes). This directly validates your concern about removing the 64-byte alignment check in `generate_partial_sha`.

The code shifts SHA padding responsibility from Rust (`generate_partial_sha_old` checked alignment explicitly) to the circuit's `partial_sha256_var_end`, which is designed to hash additional parts of the message or complete the entire hash. Critical gap: no evidence in the current codebase that non-aligned inputs are tested or that recent audit findings have been addressed.

Before merging, verify:
- Tests exist covering remaining bodies with lengths not divisible by 64
- The circuit implementation has been updated to handle the documented audit gaps
- Hash output matches standard implementations for all edge cases
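The edge cases above (bodies with lengths not divisible by 64) follow directly from standard SHA-256 padding: a message is followed by one 0x80 byte, zero fill, and an 8-byte bit-length field, rounded up to a 64-byte block. A small reference formula with the table-driven checks the review asks for (standalone sketch, not the circuit code):

```rust
/// Total length after standard SHA-256 padding (FIPS 180-4):
/// message + 0x80 byte + zero fill + 8-byte length, rounded up to
/// a multiple of 64. Equivalently: floor((len + 8) / 64 + 1) * 64.
fn sha256_padded_len(msg_len: usize) -> usize {
    ((msg_len + 8) / 64 + 1) * 64
}
```

The boundary at 55/56 bytes is the classic one: a 55-byte message still fits in a single block (55 + 1 + 8 = 64), while 56 bytes forces a second block. Any circuit completing a partial hash has to reproduce exactly this layout for non-aligned remainders.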
src/regex/types.rs (1)
6-28: Error type and result alias are well-structured

The `ExtractionError` variants and the `ExtractionResult<T>` alias give clear, future-proof error reporting for regex extraction; no changes needed here.

src/circuit/circom.rs (4)
61-220: Serde wrappers and legacy RegexPart handling look correct

The `SerializableRegexPart`, `RegexPartInput`, and `regex_parts_serde` module cleanly support both new enum-based parts and legacy configs while preserving JSON compatibility. The custom `DecomposedRegex` `Debug`/`Clone` implementations are also sound.
371-390: Claim circuit input generation is straightforward and correct

`generate_claim_input` cleanly uses `PaddedEmailAddr::from_email_addr` and propagates validation via `?`, and the serialized shape of `ClaimCircuitInput` is simple and consistent.
397-451: Legacy regex circuit input generation matches old format

`generate_legacy_circuit_inputs` correctly reconstructs `{name}RegexIdx` and `{name}RegexIdxN` fields from `extract_substr_idxes`, preserving the legacy shape for old Circom circuits. No changes needed here.
581-604: NFAGraph path construction and `regex_graph_json` handling look correct

The NFAGraph-based path correctly guards with `has_nfa_graph()`, so the `.unwrap()` on `regex_graph_json` is safe, and the use of `get_max_haystack_length`/`max_match_length` cleanly wires through to `gen_circuit_inputs`.
| let email_circuit_inputs; | ||
| if decomposed_regexes[0].has_nfa_graph() { | ||
| email_circuit_inputs = generate_circuit_inputs(circuit_input_params.clone())?; | ||
| } else { | ||
| // TODO: remove all this temperory code for supporting old one | ||
| email_circuit_inputs = generate_circuit_inputs_old(circuit_input_params.clone())?; | ||
| } |
Potential panic when decomposed_regexes is empty
generate_circuit_inputs_with_decomposed_regexes_and_external_inputs indexes decomposed_regexes[0] to decide between generate_circuit_inputs and generate_circuit_inputs_old. If the caller passes an empty vector (e.g., only external inputs), this will panic.
Using .first() avoids this and lets you define a safe default (e.g., new compiler by default):
- let email_circuit_inputs;
- if decomposed_regexes[0].has_nfa_graph() {
- email_circuit_inputs = generate_circuit_inputs(circuit_input_params.clone())?;
- } else {
- // TODO: remove all this temperory code for supporting old one
- email_circuit_inputs = generate_circuit_inputs_old(circuit_input_params.clone())?;
- }
+ let email_circuit_inputs = if decomposed_regexes
+ .first()
+ .map(|r| r.has_nfa_graph())
+ .unwrap_or(true)
+ {
+ generate_circuit_inputs(circuit_input_params.clone())?
+ } else {
+ // TODO: remove all this temporary code for supporting old one
+ generate_circuit_inputs_old(circuit_input_params.clone())?
+ };This keeps behavior for non-empty inputs while making the empty case safe.
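The `.first()` guard can be exercised in isolation. A reduced sketch where the NFA-graph check is stubbed as a plain bool (the real code calls `has_nfa_graph()` on `DecomposedRegex` values):

```rust
/// The empty-vector guard suggested above: pick the new compiler path
/// by default when there is no first regex to inspect, instead of
/// panicking on `decomposed_regexes[0]`.
fn use_new_compiler(has_nfa_graph_flags: &[bool]) -> bool {
    has_nfa_graph_flags
        .first()       // None for an empty slice, no panic
        .copied()
        .unwrap_or(true) // safe default: new compiler
}
```

Indexing with `[0]` would panic on the empty slice, while `.first()` turns the same lookup into an `Option` that can carry an explicit default.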
🤖 Prompt for AI Agents
In src/circuit/circom.rs around lines 503 to 509, the code unsafely indexes
decomposed_regexes[0] and can panic when the vector is empty; change the logic
to use decomposed_regexes.first() (or pattern-match) and, if Some(first) call
first.has_nfa_graph() to choose between generate_circuit_inputs and
generate_circuit_inputs_old, otherwise treat the empty case as the safe default
(use generate_circuit_inputs) so the function no longer panics on empty input.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/cryptos.rs (2)
559-559: Minor: Grammatical issue in comment. The comment reads "Just supported for older circom compiler" but should be "Only supported for older circom compiler" or "Supported only for older circom compiler" for better clarity.
Apply this diff:
```diff
-// TODO : Just supported for older circom compiler, since it's circuit doesn't had the padding inside, we padded them externally
+// TODO: Only supported for older circom compiler, since its circuit doesn't have padding inside, so we pad externally
```
920-929: Consider adding a clarifying comment for the `.last()` usage.

The use of `.captures_iter(from_domain).last()` cleverly handles cases where the From header might contain multiple `@` symbols (e.g., in display names like `"Contact @ Company <user@domain.com>"`). A brief comment explaining this would improve code clarity.

```rust
// Extract domain from From header
let from_domain = from_headers[0].as_str();
let from_re = Regex::new(r"@([^>\s]+)").unwrap();
// Use .last() to get the domain from the actual email address, not from any @ in the display name
let from_domain = from_re
    .captures_iter(from_domain)
    .last()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- package.json (1 hunks)
- src/cryptos.rs (10 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cryptos.rs (2)
- src/regex/padding.rs (1): pad_string (18-22)
- src/converters.rs (2): hex_to_field (20-67), field_to_hex (76-79)
🔇 Additional comments (8)
package.json (1)
13-13: Verify the version format is appropriate for npm publishing.

The version `0.4.66-16` uses a non-standard format for npm (typically semantic versioning or standard pre-release identifiers like `0.4.66-alpha.16` or `0.4.66-rc.16` are preferred). Given that this package has public npm publish scripts, confirm that this format is intentional and won't cause issues when published to the registry.

src/cryptos.rs (7)
111-128: LGTM! Excellent security improvement. The email length validation prevents collision attacks where different email addresses exceeding 256 bytes could produce identical padded representations. The error handling is clear and comprehensive.
494-557: LGTM! Well-structured partial SHA generation. The function properly handles regex compilation errors, UTF-8 conversion errors, and validates the remaining body length. The padding logic is correct, and the comment at line 549 helpfully explains why 64-byte alignment checking is unnecessary (circuit handles it).
873-890: LGTM! Proper error propagation throughout. The function correctly propagates errors from `PaddedEmailAddr::from_email_addr`, `hex_to_field`, and `AccountSalt::new` using the `?` operator. The Result return type is appropriate for these validation steps.
903-906: LGTM! Good defensive handling of header case variations. The fallback to lowercase "from" header improves robustness across different mail servers that may use different header capitalization.
674-744: LGTM! Comprehensive test coverage for email validation. The tests thoroughly cover valid cases, boundary conditions (exactly 256 bytes), error cases (exceeding max length), and security concerns (collision prevention). Well-structured test suite.
746-770: LGTM! Good test coverage for account salt calculation. The tests verify both successful salt calculation with valid inputs and proper error propagation when the email exceeds maximum length. The use of a properly formatted 32-byte hex account code is appropriate.
38-38: Add integration test for the DKIM API endpoint in CI. The `fetch_public_keys` function depends on `https://archive.zk.email/api/key/domain` for DKIM verification, but the existing test skips in CI environments. Either add an integration test with a mock endpoint/fixture, or document the runtime requirement and expected response schema.
Add rsa_key_bits parameter to allow specifying RSA key size (1024 or 2048). When not specified, key size is auto-detected from the public key length.
Change RSA key size auto-detection to explicitly match supported sizes (1024-bit and 2048-bit) and return an error for unsupported sizes. Previously, any key larger than 128 bytes would silently default to 2048-bit parameters, which would cause incorrect behavior for 4096-bit keys. Now unsupported key sizes return a clear error message.
…024-for-noir-in-our-registry feat: add support for 1024-bit RSA keys in Noir circuits
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/circuit/noir/mod.rs`:
- Around line 37-54: When computing key_bits in src/circuit/noir/mod.rs,
explicitly validate params.rsa_key_bits when it is Some(bits) to only accept
supported sizes (1024 or 2048); if bits is not one of those values return the
same error used for auto-detect (mentioning the unsupported size in bits and
bytes). Update the match arm for params.rsa_key_bits to check the provided bits
and return an anyhow::anyhow error on invalid values so callers passing 4096
cannot proceed, while keeping the existing auto-detection using
parsed_email.public_key.len() unchanged.
- Around line 274-288: The loop processing external_inputs can produce
value_as_byte_strings longer than the computed signal_length, which will yield
invalid circuit input sizes; inside the loop after computing signal_length and
before padding, check if value_as_byte_strings.len() > signal_length and fail
fast by returning an Err with a clear message (include identifying info such as
external_input.name or external_input.max_length and actual length) from the
enclosing function (the block that calls string_to_circom_bigint_bytes and
compute_signal_length), or—if truncation is desired—explicitly truncate
value_as_byte_strings to signal_length using truncate(signal_length) so the JSON
array never exceeds the circuit's expected length; update callers/tests
accordingly.
♻️ Duplicate comments (5)
src/circuit/noir/mod.rs (3)
31-34: Default body-hash behavior is still insecure and inconsistent.
Line 33 defaults to skipping body-hash verification, and later `is_some_and` treats `None` as "don't populate body", even though circuit inputs were generated with defaults. Please unify defaults and reuse a resolved boolean.

🛠️ Suggested fix

```diff
-    let parsed_email =
-        ParsedEmail::new_from_raw_email(email, params.ignore_body_hash_check.unwrap_or(true))
-            .await?;
+    let ignore_body_hash_check = params.ignore_body_hash_check.unwrap_or(false);
+    let parsed_email =
+        ParsedEmail::new_from_raw_email(email, ignore_body_hash_check).await?;
```

```diff
-    if params.ignore_body_hash_check.is_some_and(|x| !x) {
+    if !ignore_body_hash_check {
```
179-185: Avoid `unwrap()` on `precomputed_sha`.

Line 183 can panic despite this function returning `Result`. Propagate an error instead.

🛠️ Suggested fix

```diff
-    let partial_hash = u8_to_u32(email_circuit_inputs.precomputed_sha.unwrap().as_slice())?;
+    let partial_hash = u8_to_u32(
+        email_circuit_inputs
+            .precomputed_sha
+            .as_ref()
+            .ok_or_else(|| anyhow::anyhow!("Precomputed SHA is missing"))?
+            .as_slice()
+    )?;
```
313-317: Handle missing body/decoded_body safely.
These `unwrap()` calls will panic if body generation was skipped (e.g., `ignore_body_hash_check` true).

🛠️ Suggested fix

```diff
 HaystackLocation::Body => {
     let body = if params.remove_soft_line_breaks {
-        noir_circuit_input.decoded_body.as_ref().unwrap()
+        noir_circuit_input.decoded_body.as_ref()
+            .ok_or_else(|| anyhow::anyhow!("Decoded body is missing when remove_soft_line_breaks is enabled"))?
     } else {
-        noir_circuit_input.body.as_ref().unwrap()
+        noir_circuit_input.body.as_ref()
+            .ok_or_else(|| anyhow::anyhow!("Body is missing when required for regex matching"))?
     };
```

src/circuit/mod.rs (2)
171-184: Propagate partial-SHA errors instead of panicking.
This function returns `Result`, but lines 181-183 panic on failure.

🛠️ Suggested fix

```diff
-    let (precomputed_sha, body_remaining_padded, body_remaining_length) = match result {
-        Ok((sha, remaining, len)) => (sha, remaining, len),
-        Err(e) => panic!("Failed to generate partial SHA: {:?}", e),
-    };
+    let (precomputed_sha, body_remaining_padded, body_remaining_length) = result
+        .map_err(|e| anyhow::anyhow!("Failed to generate partial SHA: {:?}", e))?;
```
235-248: Same panic issue in legacy path.
`generate_circuit_inputs_old` should also propagate errors instead of panicking.

🛠️ Suggested fix

```diff
-    let (precomputed_sha, body_remaining_padded, body_remaining_length) = match result {
-        Ok((sha, remaining, len)) => (sha, remaining, len),
-        Err(e) => panic!("Failed to generate partial SHA: {:?}", e),
-    };
+    let (precomputed_sha, body_remaining_padded, body_remaining_length) = result
+        .map_err(|e| anyhow::anyhow!("Failed to generate partial SHA: {:?}", e))?;
```
| // Public key bytes: 128 bytes = 1024 bits, 256 bytes = 2048 bits | ||
| let key_bits = match params.rsa_key_bits { | ||
| Some(bits) => bits, | ||
| None => { | ||
| let key_bytes = parsed_email.public_key.len(); | ||
| match key_bytes { | ||
| 128 => 1024, | ||
| 256 => 2048, | ||
| _ => { | ||
| return Err(anyhow::anyhow!( | ||
| "Unsupported RSA key size: {} bytes ({} bits). Only 1024-bit and 2048-bit keys are supported.", | ||
| key_bytes, | ||
| key_bytes * 8 | ||
| )); | ||
| } | ||
| } | ||
| } | ||
| }; |
Validate explicitly provided rsa_key_bits.
Right now only auto-detect enforces supported sizes. If a caller passes 4096, it will slip through and likely produce incompatible limb sizes.
🛠️ Suggested fix
- let key_bits = match params.rsa_key_bits {
- Some(bits) => bits,
+ let key_bits = match params.rsa_key_bits {
+ Some(bits @ 1024 | bits @ 2048) => bits,
+ Some(bits) => {
+ return Err(anyhow::anyhow!(
+ "Unsupported RSA key size: {} bits. Only 1024-bit and 2048-bit keys are supported.",
+ bits
+ ));
+ }
None => {
let key_bytes = parsed_email.public_key.len();
match key_bytes {
128 => 1024,
256 => 2048,
_ => {🤖 Prompt for AI Agents
In `@src/circuit/noir/mod.rs` around lines 37 - 54, When computing key_bits in
src/circuit/noir/mod.rs, explicitly validate params.rsa_key_bits when it is
Some(bits) to only accept supported sizes (1024 or 2048); if bits is not one of
those values return the same error used for auto-detect (mentioning the
unsupported size in bits and bytes). Update the match arm for
params.rsa_key_bits to check the provided bits and return an anyhow::anyhow
error on invalid values so callers passing 4096 cannot proceed, while keeping
the existing auto-detection using parsed_email.public_key.len() unchanged.
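The validation the fix asks for can be sketched as a pure function. The name `resolve_key_bits` and the `Result<usize, String>` signature are hypothetical; the real code works on `params.rsa_key_bits` and `parsed_email.public_key.len()` with `anyhow` errors:

```rust
/// Accept an explicit key size only if it is a supported value;
/// otherwise auto-detect from the public-key byte length. An explicit
/// 4096 now fails instead of slipping through to incompatible limbs.
fn resolve_key_bits(explicit: Option<usize>, public_key_len: usize) -> Result<usize, String> {
    match explicit {
        Some(bits) if bits == 1024 || bits == 2048 => Ok(bits),
        Some(bits) => Err(format!(
            "Unsupported RSA key size: {} bits. Only 1024-bit and 2048-bit keys are supported.",
            bits
        )),
        None => match public_key_len {
            128 => Ok(1024), // 128 bytes = 1024 bits
            256 => Ok(2048), // 256 bytes = 2048 bits
            other => Err(format!(
                "Unsupported RSA key size: {} bytes ({} bits).",
                other,
                other * 8
            )),
        },
    }
}
```

Both arms now funnel unsupported sizes into the same error shape, which is the symmetry the review comment is after.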
| // Process external inputs and add them to the circuit inputs | ||
| for external_input in external_inputs { | ||
| // Use the existing utility function to convert the string to a Vec<String> of byte values | ||
| let mut value_as_byte_strings = | ||
| string_to_circom_bigint_bytes(&external_input.value.as_deref().unwrap_or(""))?; | ||
|
|
||
| let signal_length = compute_signal_length(external_input.max_length); | ||
|
|
||
| // Pad the Vec<String> with "0" strings if it's shorter than the target length | ||
| if value_as_byte_strings.len() < signal_length { | ||
| value_as_byte_strings.extend( | ||
| std::iter::repeat("0".to_string()) | ||
| .take(signal_length - value_as_byte_strings.len()), | ||
| ); | ||
| } |
Guard against oversized external inputs.
If value_as_byte_strings.len() > signal_length, the generated JSON array will exceed circuit expectations. Fail fast or truncate intentionally.
🛠️ Suggested fix
let signal_length = compute_signal_length(external_input.max_length);
+ if value_as_byte_strings.len() > signal_length {
+ return Err(anyhow::anyhow!(
+ "External input '{}' exceeds max length ({} > {})",
+ external_input.name,
+ value_as_byte_strings.len(),
+ signal_length
+ ));
+ }
+
// Pad the Vec<String> with "0" strings if it's shorter than the target length
if value_as_byte_strings.len() < signal_length {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Process external inputs and add them to the circuit inputs | |
| for external_input in external_inputs { | |
| // Use the existing utility function to convert the string to a Vec<String> of byte values | |
| let mut value_as_byte_strings = | |
| string_to_circom_bigint_bytes(&external_input.value.as_deref().unwrap_or(""))?; | |
| let signal_length = compute_signal_length(external_input.max_length); | |
| // Pad the Vec<String> with "0" strings if it's shorter than the target length | |
| if value_as_byte_strings.len() < signal_length { | |
| value_as_byte_strings.extend( | |
| std::iter::repeat("0".to_string()) | |
| .take(signal_length - value_as_byte_strings.len()), | |
| ); | |
| } | |
| // Process external inputs and add them to the circuit inputs | |
| for external_input in external_inputs { | |
| // Use the existing utility function to convert the string to a Vec<String> of byte values | |
| let mut value_as_byte_strings = | |
| string_to_circom_bigint_bytes(&external_input.value.as_deref().unwrap_or(""))?; | |
| let signal_length = compute_signal_length(external_input.max_length); | |
| if value_as_byte_strings.len() > signal_length { | |
| return Err(anyhow::anyhow!( | |
| "External input '{}' exceeds max length ({} > {})", | |
| external_input.name, | |
| value_as_byte_strings.len(), | |
| signal_length | |
| )); | |
| } | |
| // Pad the Vec<String> with "0" strings if it's shorter than the target length | |
| if value_as_byte_strings.len() < signal_length { | |
| value_as_byte_strings.extend( | |
| std::iter::repeat("0".to_string()) | |
| .take(signal_length - value_as_byte_strings.len()), | |
| ); | |
| } |
🤖 Prompt for AI Agents
In `@src/circuit/noir/mod.rs` around lines 274 - 288, The loop processing
external_inputs can produce value_as_byte_strings longer than the computed
signal_length, which will yield invalid circuit input sizes; inside the loop
after computing signal_length and before padding, check if
value_as_byte_strings.len() > signal_length and fail fast by returning an Err
with a clear message (include identifying info such as external_input.name or
external_input.max_length and actual length) from the enclosing function (the
block that calls string_to_circom_bigint_bytes and compute_signal_length), or—if
truncation is desired—explicitly truncate value_as_byte_strings to signal_length
using truncate(signal_length) so the JSON array never exceeds the circuit's
expected length; update callers/tests accordingly.
Note
Introduce a new regex runtime and split circuit code into circom/noir modules, update WASM/SDK integration, and add a browser proving app with stricter email validation and DKIM endpoint update.
- `regex` module (extraction, padding, patterns, WASM) using `zk_regex_compiler`; remove old inline regex logic and types.
- `circom` and `noir` modules; add Noir input generation and limb/REDc utilities; consolidate helpers.
- `PaddedEmailAddr` (returns Result); update DKIM key endpoint; improve padding/sha utilities.
- `browser_test` (Vite setup, worker, provers for Noir/Circom, assets) for in-browser proving.

Written by Cursor Bugbot for commit bcf50a2. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.