[ZIPT Benchmark] Z3 c3 branch — 2026-03-28 #9149
Closed
This discussion has been marked as outdated by ZIPT String Solver Benchmark. A newer discussion is available at Discussion #9214.
Date: 2026-03-28
Branch: c3 (commit `ebd35bc`)
Benchmark set: QF_S (200 randomly selected from 22,172 files in `tests/QF_S.tar.zst`)
Timeout: seq `-T:5` + outer 7 s; nseq `-T:5` + outer 12 s; ZIPT `-t:5000` + outer 12 s
Z3 build: Debug (CMake, ninja), v4.17.0, commit ebd35bc
ZIPT: `parikh` branch, built against `Microsoft.Z3.dll` (net8.0)

Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 1
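The disagreement count above follows from a simple rule: a file is flagged when at least two solvers return conflicting definite answers. A minimal sketch of that check, assuming verdicts are collected as a plain dictionary per file (the function name and data layout are illustrative, not taken from the benchmark workflow):

```python
# Hypothetical helper: the dict layout and names are assumptions for illustration.
DEFINITE = {"sat", "unsat"}


def has_soundness_disagreement(verdicts):
    """Return True when two solvers give conflicting sat/unsat answers.

    `verdicts` maps a solver name to its answer. Anything other than
    "sat" or "unsat" (unknown, timeout, crash) is ignored, because it
    cannot contradict a definite answer.
    """
    definite = {v for v in verdicts.values() if v in DEFINITE}
    return len(definite) > 1


# The single disagreement reported in this run:
coffee_can = {"seq": "unsat", "nseq": "unsat", "ZIPT": "sat"}
print(has_soundness_disagreement(coffee_can))
```

A timeout or crash on one solver never counts as a disagreement under this rule, which is why crashes are reported separately.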
Key observations:
- nseq crashes on `not-contains` and `diseq` benchmarks (6 cases), emitting a DOT debug graph instead of a verdict

Notable Issues
Soundness Disagreements (Critical)
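- `coffee-can_lstar_non_incre_equiv_init_0_0.smt2`: seq=unsat, nseq=unsat, ZIPT=sat
  - The benchmark carries no expected verdict (`:status unknown`)
  - seq and nseq agree on `unsat`; ZIPT's `sat` answer is almost certainly a soundness bug in ZIPT
  - Regex operators involved: `re.*`, `re.++`, `re.union`, `re.comp`

Crashes / Bugs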
nseq-only crashes (6 files), all `not-contains` and `diseq` benchmarks:
- `not-contains-1-5-5-133.smt2`, `not-contains-1-4-5-135.smt2`, `not-contains-1-4-5-121.smt2`, `not-contains-1-5-6-125.smt2`
- `diseq-None-5-6-106.smt2`, `diseq-1-5-6-106.smt2`

nseq emits a DOT-format debug graph (`digraph G {`) to stdout instead of `sat`/`unsat` when it encounters string disequality constraints. A debug output path is being triggered in place of a solver verdict, which points to an incomplete or debugging-only code path in the nseq solver for these problem types.

seq and nseq both produce non-standard output (3 files):
- `slog_stranger_3304_sink.smt2`, `slog_stranger_1530_sink.smt2`, `instance08332.smt2`

Both seq and nseq output something other than sat/unsat/unknown on these files (likely unsupported string constructs or internal assertion failures during trace).
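Both crash categories above come down to inspecting what a solver printed to stdout. The following is a minimal sketch of such a classifier, assuming the first non-empty line decides the category; the function name and the labels `debug-graph`, `non-standard`, and `no-output` are hypothetical and not taken from the benchmark workflow:

```python
# Hypothetical classifier: labels and the "first non-empty line" rule are assumptions.
VERDICTS = ("sat", "unsat", "unknown")


def classify_output(stdout):
    """Map raw solver stdout to a verdict or a failure category."""
    for line in stdout.splitlines():
        token = line.strip()
        if not token:
            continue  # skip leading blank lines
        if token in VERDICTS:
            return token
        if token.startswith("digraph"):
            # A DOT graph such as "digraph G {" was printed instead of a verdict.
            return "debug-graph"
        return "non-standard"
    return "no-output"


print(classify_output("digraph G {\n  a -> b;\n}"))
print(classify_output("unsat\n"))
```

Keeping `debug-graph` distinct from `non-standard` separates the nseq-only failures from the files where both Z3 solvers print something unexpected.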
ZIPT crashes (23 events, 22 unique files):
- `pcp_*`, `unsolved_pcp_*` (Post Correspondence Problem), and `benchmark_0xxx` families

Slow Benchmarks (outer-killed > 8 s)
- `diseq-None-5-6-106.smt2` (zipt: 12.016 s outer-killed)
- `wildcard-matching-regex-30.smt2` (zipt: 12.009 s outer-killed)
- `diseq-1-5-6-106.smt2` (zipt: 12.017 s outer-killed)

Trace Analysis: seq-fast / nseq-slow Hypotheses
No files met the strict criterion (seq < 1.0 s AND nseq > 3× seq AND nseq > 0.5 s) in this 200-file sample.
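The strict criterion can be written as a single predicate over the two wall-clock times. This is a sketch with a hypothetical function name, using only the thresholds stated above:

```python
def is_seq_fast_nseq_slow(seq_s, nseq_s):
    """Strict criterion: seq < 1.0 s AND nseq > 3x seq AND nseq > 0.5 s."""
    return seq_s < 1.0 and nseq_s > 3 * seq_s and nseq_s > 0.5


# Timings reported for instance14567.smt2 (seq=3.917 s, nseq=2.632 s):
print(is_seq_fast_nseq_slow(3.917, 2.632))  # False: seq is not under 1.0 s
```

The `nseq > 0.5 s` clause keeps very fast files from qualifying on ratio alone: a file with seq=0.01 s and nseq=0.05 s is 5× slower under nseq but is still excluded.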
The dominant pattern was the opposite: nseq consistently outperformed seq. In many cases where seq ran close to or hit its 5 s timeout, nseq finished in under 0.1 s. For example:
- `instance15640`: seq=3.984 s vs nseq=0.045 s (88× faster)
- `instance04470`: seq=5.018 s (timeout) vs nseq=0.062 s
- `slog_stranger_4749_sink`: seq=5.009 s (timeout) vs nseq=0.043 s

This reflects nseq's Nielsen-graph + Parikh constraint architecture providing tighter early termination compared to seq's SMT-based sequence rewriting calculus, which can generate large numbers of intermediate lemmas before concluding.
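The speedup figures for these examples can be recomputed from the reported timings. In the sketch below, the helper name is hypothetical and the 5 s limit is taken from the `-T:5` setting; when seq timed out, its true runtime is unknown, so the ratio is only a lower bound:

```python
SEQ_TIMEOUT_S = 5.0  # inner timeout from the -T:5 setting


def speedup(seq_s, nseq_s):
    """Return (ratio, is_lower_bound) for nseq relative to seq."""
    timed_out = seq_s >= SEQ_TIMEOUT_S
    return seq_s / nseq_s, timed_out


timings = {
    "instance15640": (3.984, 0.045),
    "instance04470": (5.018, 0.062),
    "slog_stranger_4749_sink": (5.009, 0.043),
}

for name, (seq_s, nseq_s) in timings.items():
    ratio, lower_bound = speedup(seq_s, nseq_s)
    prefix = ">= " if lower_bound else ""
    print(f"{name}: {prefix}{ratio:.1f}x")
```

For `instance15640` the ratio is about 88.5, which matches the 88× quoted above.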
The one notable case where both solvers were slow is `instance14567.smt2` (seq=3.917 s, nseq=2.632 s, ZIPT=0.329 s), suggesting ZIPT's arithmetic/length constraint propagation is most effective on that instance.

Per-File Results (200 benchmarks)
- instance09421.smt2
- unsolved_pcp_instance_221.smt2
- instance11040.smt2
- instance05427.smt2
- query7313.smt2
- instance15599.smt2
- instance00980.smt2
- slog_stranger_1662_sink.smt2
- benchmark_0181.smt2
- instance12848.smt2
- instance15064.smt2
- instance14113.smt2
- slog_stranger_2087_sink.smt2
- instance05920.smt2
- instance02335.smt2
- instance15640.smt2
- unsolved_pcp_instance_146.smt2
- instance07354.smt2
- instance11494.smt2
- instance15774.smt2
- unsolved_pcp_instance_111.smt2
- not-contains-1-5-5-133.smt2
- 03_track_176.smt2
- slog_stranger_1407_sink.smt2
- instance05507.smt2
- instance02933.smt2
- instance05340.smt2
- 04_track_177.smt2
- pcp_instance_402.smt2
- instance06612.smt2
- query5196.smt2
- unsolved_pcp_instance_217.smt2
- benchmark_0424.smt2
- instance09835.smt2
- instance05566.smt2
- instance02380.smt2
- instance14478.smt2
- instance05831.smt2
- instance13449.smt2
- instance02409.smt2
- instance06258.smt2
- instance07891.smt2
- instance00905.smt2
- instance15772.smt2
- instance05191.smt2
- instance11816.smt2
- instance05164.smt2
- instance10954.smt2
- instance12741.smt2
- instance06256.smt2
- 04_track_60.smt2
- unsolved_pcp_instance_437.smt2
- instance14691.smt2
- instance03496.smt2
- instance07039.smt2
- instance07863.smt2
- pcp_instance_491.smt2
- unsolved_pcp_instance_356.smt2
- instance06488.smt2
- instance02576.smt2
- instance05555.smt2
- instance00418.smt2
- instance01579.smt2
- pcp_instance_15.smt2
- instance08228.smt2
- instance10441.smt2
- slog_stranger_3304_sink.smt2
- Lehmann-Rabin_sat_non_incre_equiv_trans_15_0.smt2
- slog_stranger_4552_sink.smt2
- instance06866.smt2
- 03_track_170.smt2
- instance01580.smt2
- instance06755.smt2
- pcp_instance_478.smt2
- instance15610.smt2
- instance14918.smt2
- instance07315.smt2
- instance14871.smt2
- slog_stranger_4234_sink.smt2
- instance06482.smt2
- instance05104.smt2
- instance13133.smt2
- slog_stranger_4749_sink.smt2
- slog_stranger_2525_sink.smt2
- instance04448.smt2
- instance00510.smt2
- instance10454.smt2
- instance15698.smt2
- slog_stranger_1559_sink.smt2
- instance04606.smt2
- query5997.smt2
- instance08332.smt2
- instance01649.smt2
- instance07156.smt2
- instance07015.smt2
- instance08027.smt2
- instance14689.smt2
- pcp_instance_136.smt2
- instance07200.smt2
- instance15965.smt2
- eqdist_lstar_non_incre_equiv_trans_0_22.smt2
- instance00970.smt2
- instance02430.smt2
- benchmark_0031.smt2
- instance07146.smt2
- instance04511.smt2
- instance13606.smt2
- pcp_instance_451.smt2
- instance10542.smt2
- instance04659.smt2
- instance06195.smt2
- instance00339.smt2
- slog_stranger_642_sink.smt2
- instance04470.smt2
- instance04479.smt2
- unsolved_pcp_instance_387.smt2
- benchmark_0488.smt2
- not-contains-1-4-5-135.smt2
- instance12253.smt2
- instance13373.smt2
- instance06310.smt2
- instance05896.smt2
- instance09391.smt2
- instance15678.smt2
- slog_stranger_753_sink.smt2
- slog_stranger_202_sink.smt2
- instance01817.smt2
- benchmark_0406.smt2
- diseq-None-5-6-106.smt2
- instance07978.smt2
- not-contains-1-4-5-121.smt2
- instance00749.smt2
- instance03293.smt2
- 01_track_4.smt2
- instance09474.smt2
- two_token_pass_lstar_non_incre_equiv_bad_0_1.smt2
- instance09671.smt2
- slog_stranger_5046_sink.smt2
- instance05133.smt2
- wildcard-matching-regex-30.smt2
- slog_stranger_1530_sink.smt2
- instance00340.smt2
- slog_stranger_2228_sink.smt2
- diseq-1-5-6-106.smt2
- instance09949.smt2
- instance05757.smt2
- instance03809.smt2
- instance02989.smt2
- instance07465.smt2
- instance13735.smt2
- slog_stranger_2174_sink.smt2
- instance07655.smt2
- instance11913.smt2
- slog_stranger_1562_sink.smt2
- benchmark_0082.smt2
- instance05339.smt2
- instance13000.smt2
- query6109.smt2
- instance07417.smt2
- benchmark_0145.smt2
- slog_stranger_3072_sink.smt2
- instance00461.smt2
- instance09105.smt2
- slog_stranger_1604_sink.smt2
- instance11019.smt2
- slog_stranger_2119_sink.smt2
- instance01329.smt2
- instance07709.smt2
- slog_stranger_4597_sink.smt2
- 03_track_34.smt2
- coffee-can_lstar_non_incre_equiv_init_0_0.smt2
- instance11261.smt2
- slog_stranger_5032_sink.smt2
- benchmark_0406.smt2
- instance14837.smt2
- instance03570.smt2
- instance06516.smt2
- unsolved_pcp_instance_181.smt2
- instance04518.smt2
- instance11386.smt2
- instance10627.smt2
- slog_stranger_2475_sink.smt2
- instance01542.smt2
- instance03846.smt2
- instance02004.smt2
- instance14567.smt2
- pcp_instance_226.smt2
- instance13782.smt2
- instance03029.smt2
- instance06065.smt2
- instance10834.smt2
- slog_stranger_5416_sink.smt2
- instance00299.smt2
- instance04520.smt2
- not-contains-1-5-6-125.smt2
- instance11056.smt2
- instance08238.smt2
- Lehmann-Rabin_lstar_non_incre_equiv_bad_0_1.smt2
- instance02951.smt2
- instance06924.smt2

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.