Implement XQuery 4.0 parser, functions, and runtime support #6139
joewiz wants to merge 20 commits into eXist-db:develop
Force-pushed from 99bab01 to 59cca34
[This comment was co-authored with Claude Code. -Joe]
XQuery 4.0 Functions Status (updated 2026-03-16)
Implemented (19 of 27)
Remaining unimplemented (8 of 27)
Summary: 19 implemented (177 XQTS tests, many at 100%). 8 remaining: 1 partially unblocked, 2 schema-blocked, 4 JNode-blocked.
fn:compare: XQ4 numeric/duration/dateTime total order via BigDecimal.
fn:min/fn:max: fn:compare-based mutual comparability.
fn:round 3-arg.
fn:deep-equal: full XQ4 options engine, text node merging.
fn:every/fn:some, fn:all-equal/different, fn:atomic-equal, fn:duplicate-values, fn:highest/fn:lowest, fn:scan-left/right, fn:contains/starts-with/ends-with-subsequence.
Fix: SequenceComparator o2Count typo, AtomicValueComparator cause preservation, Collations instanceof for non-RuleBasedCollator, BigInteger comparison via string (not truncating getLong()).
XQTS: fn-min +73, fn-max +73, fn-deep-equal +20, fn-every/some +50
Co-Authored-By: Claude Opus 4.6 (1M context)
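To illustrate why the getLong() truncation mentioned in this commit matters, here is a minimal sketch. The class and method names are hypothetical and are not taken from the eXist-db code base, and it uses BigDecimal for the exact comparison rather than the string-based comparison the commit describes for BigInteger.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration; not the eXist-db implementation.
final class NumericCompare {

    // Exact comparison: widen the integer to BigDecimal so no bits are lost.
    static int compareExact(BigInteger a, BigDecimal b) {
        return Integer.signum(new BigDecimal(a).compareTo(b));
    }

    // Lossy comparison: longValue() keeps only the low-order 64 bits,
    // so values outside the long range compare incorrectly.
    static int compareTruncating(BigInteger a, BigInteger b) {
        return Long.compare(a.longValue(), b.longValue());
    }
}
```

With a = 2^64 + 1 and b = 2, the truncating variant sees a as 1 and reports a < b, while the exact variant reports a > b.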
String: fn:characters, fn:graphemes (ICU4J), fn:char, fn:decode-from-uri, fn:insert-separator, fn:replicate
Parsing: fn:parse-html (NekoHTML+XHTML), fn:parse-integer, fn:parse-QName, fn:parse-uri, fn:build-uri, fn:html-doc, fn:collation/-available
Type: fn:atomic-type-annotation, fn:node-type-annotation, fn:type-of, fn:is-NaN, fn:identity, fn:void
Nav: fn:transitive-closure, fn:element-to-map, fn:siblings, fn:in-scope-namespaces, fn:distinct/ordered-nodes
Higher-order: fn:partition, fn:partial-apply, fn:sort-by, fn:op, fn:subsequence-where
Numeric: fn:seconds, fn:divide-decimals, fn:unix-dateTime, fn:civil-timezone, fn:hash, fn:expanded-QName, fn:unparsed-binary
Date: fn:build-dateTime, fn:parts-of-dateTime (record-compatible)
Data: fn:items-at, fn:slice, fn:message, fn:highest, fn:lowest
XQTS: fn-graphemes 1086/1189, fn-characters 45/45, misc-HtmlTestSuite 1105/1379, fn-unparsed-binary 14/15
Co-Authored-By: Claude Opus 4.6 (1M context)
array:slice (4 overloads), array:index-where, array:sort-with, array:sort-by, array:empty, array:foot, array:trunk, array:items, array:members, array:build, array:index-of, array:of-members, array:split.
Fix array:sort ClassCastException unwrap, ArraySortBy key validation, ArraySortWith RuntimeException unwrap.
XQTS: array-slice 71/71, array-foot 9/9, array-trunk 6/6, array-items 8/8, math-cosh/sinh/tanh 27/27
Co-Authored-By: Claude Opus 4.6 (1M context)
Hyperbolic trigonometric functions via Java Math.cosh/sinh/tanh. Euler's number constant via Math.E.
XQTS: math-cosh 9/9, math-sinh 9/9, math-tanh 9/9, math-e 4/5
Co-Authored-By: Claude Opus 4.6 (1M context)
Unicode block name fallback (\p{Is<Block>} → \p{In<Block>}).
XQ4 fn:replace: 'c' flag, empty match, function replacement.
XQ4 fn:matches and fn:tokenize enhancements.
FunAnalyzeString: use reflection proxy for RegexIterator.MatchHandler
to avoid NoClassDefFoundError when the inner class is stripped from
fat JARs. Falls back to text-only output when unavailable.
XQTS: fn-matches.re +45, fn-replace +12, fn-tokenize +8
Co-Authored-By: Claude Opus 4.6 (1M context)
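The block-name fallback could look roughly like the sketch below. It assumes java.util.regex as the target engine, which the commit message does not confirm, and the class and method names are hypothetical. XPath regexes name Unicode blocks as \p{IsBasicLatin}; java.util.regex spells the same block \p{InBasicLatin} and rejects unknown Is-names with a PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration; not the eXist-db implementation.
final class BlockNameFallback {

    // Try the pattern as written; if the engine rejects it, retry with
    // \p{Is...} and \P{Is...} rewritten to the In-prefixed block syntax.
    static Pattern compile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException first) {
            // Naive textual rewrite: a real implementation would need to
            // skip escaped backslashes and only touch genuine block names.
            String rewritten = regex
                    .replace("\\p{Is", "\\p{In")
                    .replace("\\P{Is", "\\P{In");
            if (rewritten.equals(regex)) {
                throw first; // nothing to fall back to
            }
            return Pattern.compile(rewritten);
        }
    }
}
```

Compiling first and rewriting only on failure leaves names that the engine already understands (scripts and general categories such as \p{IsGreek}) untouched.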
Fractional seconds: left-aligned digit semantics.
Word/Roman via ICU4J: W/w/Ww cardinal, Wo/wo/Wwo ordinal, I/i Roman.
Timezone: picture-driven rewrite with digit family support.
Era [E]/[C], calendar validation, grouping separators, optional digit validation, ordinal suffix teens fix, whitespace stripping, military TZ "J", name width truncation (max not min).
XQTS: format-time 46→77/92, format-date 79→111/133
Co-Authored-By: Claude Opus 4.6 (1M context)
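The commit does not spell out what the "ordinal suffix teens fix" changed. Presumably it is the usual English special case where 11, 12 and 13 take "th" even though they end in 1, 2 and 3. A minimal sketch under that assumption, with hypothetical names:

```java
// Hypothetical illustration; not the eXist-db implementation.
final class OrdinalSuffix {

    // Returns the number followed by its English ordinal suffix.
    static String of(long n) {
        long lastTwo = Math.abs(n % 100); // in 0..99, safe even for Long.MIN_VALUE
        if (lastTwo >= 11 && lastTwo <= 13) {
            return n + "th"; // the teens never take st/nd/rd
        }
        switch ((int) (lastTwo % 10)) {
            case 1:  return n + "st";
            case 2:  return n + "nd";
            case 3:  return n + "rd";
            default: return n + "th";
        }
    }
}
```

Checking the last two digits before the last digit is what keeps 111, 112 and 113 correct as well.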
…d-text, fn:json-doc
Resolve relative URIs against file: base URI with direct file: handling. Only allow direct file: access for URIs resolved from relative paths (absolute file: URIs go through SourceFactory security checks).
Separate FOJS0001 from FOUT1170 in fn:json-doc.
Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text.
XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318
Co-Authored-By: Claude Opus 4.6 (1M context)
fn:parse-csv, fn:csv-to-arrays, fn:csv-to-xml, fn:csv-to-json.
Custom streaming CSV parser with configurable delimiter, quote char, header handling, and column naming.
Co-Authored-By: Claude Opus 4.6 (1M context)
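As a rough picture of what a configurable delimiter and quote character mean for the parser, here is a sketch that splits a single record into fields. It is not the streaming parser from this commit: the names are hypothetical, and header handling, column naming and multi-line records are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the eXist-db implementation.
final class CsvLine {

    // Splits one record into fields. A doubled quote character inside a
    // quoted field stands for one literal quote character.
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // delimiters are literal here
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());        // last field
        return fields;
    }
}
```

Passing the delimiter and quote character as arguments lets the same loop serve comma-, semicolon- or tab-separated input.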
- fnXQuery40.xql: tests for 50+ new XQ4 functions
- deep-equal-options-test.xq: deep-equal options engine tests
- Re-enable arr:get-invalid-type (XPTY0004 now works)
- Update json-to-xml pending comments
- fn:replace test updates
Co-Authored-By: Claude Opus 4.6 (1M context)
Parser and tree walker extensions for XQ4: focus functions, keyword args, string templates, pipeline, mapping arrow, for member, otherwise, braced if, while, try/finally, ternary, QName/hex/binary literals, array/map filter, choice/union/enum types, method call, let destructure, fn() shorthand, record types, gnode(), 4 new axes, reservedKeywords sub-rules, expr split for code-too-large fix.
Co-Authored-By: Claude Opus 4.6 (1M context)
New expression classes: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType.
Modified: Function (keyword arg resolution), FunctionFactory (XQ4 no-namespace override, unknown type XPST0017), FunctionSignature (default params), UserDefinedFunction (default param binding), TryCatchExpression (finally), SwitchExpression (XQ4 version gating), StringConstructor (atomization fixes), XQueryContext (version 4.0, XQST0060 relaxed, compileModuleFromSource), Constants (4 new axes), LocationStep (or-self axis evaluation with document node guard).
Type infrastructure: Type.RECORD constant, SequenceType.RecordField, record type structural checking, record(*) and record() support.
Co-Authored-By: Claude Opus 4.6 (1M context)
- convertTo(): FORG0001→XPTY0004 for type-incompatible casts (20 files)
- DoubleValue: NaN/INF→integer/decimal throws FOCA0002
- DynamicCardinalityCheck: ERROR→XPTY0004 (or XPDY0050 for treat-as)
- DynamicTypeCheck: FOCH0002→XPTY0004 (overridable for treat-as)
- CastExpression: xs:anySimpleType→XPST0080 (was XPST0051)
- StringValue: validation errors→FORG0001 (was generic ERROR)
- Base64BinaryValueType: FORG0001 with proper ErrorCode
- ErrorCodes: added convenience constructor
XQTS impact: prod-CastExpr 745→141F, prod-TreatExpr 18→1F
Co-Authored-By: Claude Opus 4.6 (1M context)
Compile modules from provided source strings instead of loading from URIs. Required by misc-Subtyping XQTS tests (146 tests). Relaxed version compatibility check for content-loaded modules.
Co-Authored-By: Claude Opus 4.6 (1M context)
Parse invisible XML grammars using the Markup Blitz iXML library. Two signatures: fn:invisible-xml(grammar) returns a parsing function, and fn:invisible-xml(grammar, input) parses directly. Updated pom.xml with Markup Blitz dependency.
Co-Authored-By: Claude Opus 4.6 (1M context)
Primitive long start/end instead of IntegerValue objects. Pre-computed size with overflow protection. O(1) count/isEmpty/contains. Prevents OOM on large ranges like 1 to 10000000000.
Co-Authored-By: Claude Opus 4.6 (1M context)
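The idea behind the range optimization can be sketched as follows. This is a simplified stand-in with hypothetical names, not the actual RangeSequence class, and it assumes that "overflow protection" means failing fast when end - start + 1 does not fit in a long.

```java
// Hypothetical illustration; not the eXist-db RangeSequence class.
final class LongRange {
    private final long start;
    private final long end;
    private final long size; // computed once in the constructor

    LongRange(long start, long end) {
        this.start = start;
        this.end = end;
        if (start > end) {
            this.size = 0; // e.g. "5 to 1" is the empty sequence
        } else {
            // end - start + 1 can overflow a long; the *Exact methods
            // throw ArithmeticException instead of silently wrapping.
            this.size = Math.addExact(Math.subtractExact(end, start), 1L);
        }
    }

    long count() { return size; }

    boolean isEmpty() { return size == 0; }

    // O(1): a bounds check, no iteration over the members.
    boolean contains(long value) {
        return size > 0 && value >= start && value <= end;
    }

    // Zero-based positional access, computed rather than stored.
    long itemAt(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return start + index;
    }
}
```

Because only the three long fields are stored, a range such as 1 to 10000000000 costs the same memory as 1 to 10.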
Enhanced: fn:compare (XQ4 anyAtomicType, total order), fn:min/max (comparison function), fn:deep-equal (options map), fn:matches/fn:tokenize (XQ4 regex flags, ! flag version-gating), fn:replace (function replacement, ! flag), fn:round (3-arg mode).
Collations: supplementary codepoint fix, ASCII case-insensitive collator.
InspectModule: keyword arg introspection.
DocUtils: URI resolution.
Parameter name alignment across 59 fn: module files to match W3C XQuery 4.0 Functions and Operators catalog.
Co-Authored-By: Claude Opus 4.6 (1M context)
Comprehensive fnXQuery40.xql with tests for all XQ4 features. Updated fnHigherOrderFunctions.xql, replace.xqm, fnLanguage.xqm, InspectModuleTest.java. New deep-equal-options-test.xq and fnInvisibleXml.xqm. Fixed stray backtick in Lucene facets.xql. Updated map ordering test assertions for LinkedHashMap insertion order.
XQSuite: 1341 tests, 0 failures
Co-Authored-By: Claude Opus 4.6 (1M context)
Force-pushed from 4a80095 to 0549468
[This comment was co-authored with Claude Code. -Joe]
CI Status Notes
SendEmailIT failure (macOS/ubuntu/windows integration):
W3C XQTS CI failure:
Unit tests: All pass (ubuntu).
Grammar (XQuery.g):
- fn() and function() type tests now accept named parameters:
    fn($name as xs:string, $age as xs:integer) as xs:boolean
  The names are parsed and discarded; only the sequence types matter for type checking. This matches the XQ4 spec.
CastExpression/CastableExpression:
- xs:anyType and xs:untyped now throw XPST0080 (was bypassing the abstract type check or using XPST0051)
XQTS: misc-BuiltInKeywords 227→234 (+7 tests)
Co-Authored-By: Claude Opus 4.6 (1M context)
The XQuery 4.0 spec requires mandatory whitespace after (# in pragma expressions: (# S EQName. This disambiguates from ( + #EQName (QName literal syntax).
Previously, (# was always matched as PRAGMA_START regardless of what followed, causing function-lookup(#math:e, 0) to fail with XPST0003.
Fix: PRAGMA_START now requires whitespace after (#, and the main lexer dispatch checks LA(3) for whitespace before attempting pragma matching. When (# is followed directly by a name character, the lexer matches ( as LPAREN and # as HASH separately.
Added XQSuite tests for QName literals in function call arguments.
Co-Authored-By: Claude Opus 4.6 (1M context)
The previous fix required mandatory whitespace after (# (XQ4 spec), but this broke XQuery 3.1 pragma expressions like (#exist:optimize#) which have no whitespace after (#.
New approach: isPragmaContext() scans past (# and the QName to check what follows. If followed by , or ) it's a QName literal argument (e.g., function-lookup(#math:e, 0)). Otherwise it's a pragma expression. This handles both XQ3.1 and XQ4 correctly.
Fixes ValueIndexByQNameTest and ValueIndexTest failures caused by (#exist:optimize#) pragma expressions being rejected.
Co-Authored-By: Claude Opus 4.6 (1M context)
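The lookahead described in this commit can be approximated on a plain string as below. The real check lives in the ANTLR 2 lexer and works on its lookahead buffer; this sketch only borrows the isPragmaContext name, uses a simplified name-character test, and treats end of input as a pragma, which is an arbitrary choice made here.

```java
// Hypothetical illustration; not the eXist-db lexer code.
final class PragmaLookahead {

    // Assumes input.startsWith("(#", pos). Returns true for a pragma,
    // false when "(#name" is followed by ',' or ')' (a QName literal argument).
    static boolean isPragmaContext(String input, int pos) {
        int i = pos + 2; // skip "(#"
        i = skipWhitespace(input, i);
        while (i < input.length() && isNameChar(input.charAt(i))) {
            i++; // skip the (possibly prefixed) name
        }
        i = skipWhitespace(input, i);
        if (i >= input.length()) {
            return true; // nothing follows: let the pragma rule report the error
        }
        char next = input.charAt(i);
        return next != ',' && next != ')';
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Simplified approximation of the XML name characters plus ':'.
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == ':' || c == '_' || c == '-' || c == '.';
    }
}
```

For (#exist:optimize#) the character after the name is #, so the input is classified as a pragma; for function-lookup(#math:e, 0) it is a comma, so the parenthesis and # are left to be lexed separately.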
Summary
Implements XQuery 4.0 parser and runtime support for eXist-db, covering the majority of the QT4CG specification draft syntax, 50+ new standard functions, enhanced existing functions, and W3C-compliant error codes. This brings eXist-db in line with the evolving XQuery 4.0 standard.
Based on the XQuery 4.0 Functions branch.
What Changed
1. Grammar — XQ4 syntax (XQuery.g + XQueryTree.g)
All major XQuery 4.0 syntax additions via ANTLR 2 grammar extensions:
- fn { expr }
- name := expr
- `Hello {$name}`
- => and mapping arrow =!>
- for member, while clause, otherwise
- ?? !!
- ?[predicate]
- record(name as xs:string, age? as xs:integer, *)
- =?>, let destructuring
- fn(...) type shorthand, gnode() type test
- *-or-self, *-sibling-or-self
- declare context value, xquery version "4.0"
- reservedKeywords sub-rules (merge-conflict reduction)
- expr rule split (code-too-large fix for next builds)
2. Expression classes (33 files)
New: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType.
Modified: Function, FunctionFactory, FunctionSignature, UserDefinedFunction, TryCatchExpression, SwitchExpression, StringConstructor, XQueryContext, Constants, LocationStep, SequenceType, Type.
3. Error code alignment (29 files)
- convertTo() in 20 atomic types
- DoubleValue NaN/INF casts
- DynamicCardinalityCheck
- DynamicTypeCheck
- TreatAsExpression
- CastExpression xs:anySimpleType
- FunctionFactory unknown types
- StringValue validation
- Base64BinaryValueType
4. fn:load-xquery-module content option
XQ4 content option for dynamic module compilation from strings. Required by misc-Subtyping XQTS tests.
5. fn:invisible-xml (Markup Blitz)
Parse invisible XML grammars using the Markup Blitz iXML library.
6. No-namespace function overriding (PR2200)
xquery version "4.0" allows declaring functions without a namespace prefix, overriding fn: built-ins.
7. RangeSequence optimization
Primitive long storage: 1 to 10000000000 uses 24 bytes instead of causing an out-of-memory error.
8. Parameter name alignment (59 files)
W3C XQ4 catalog parameter names across the fn: module for keyword argument support.
XQTS Results
QT4 XQTS results from run 22 (2026-03-16):
XQSuite: 1341 tests, 0 failures (across all test suites: 1676 tests, 0 failures)
Spec References
Limitations
Features not implemented: JNode data model, union node test syntax in axis steps, method calls (parsed but limited dispatch), version gating (XQ4 features available regardless of version declaration), XML Schema revalidation.
Test Plan
mvn test on CI
Co-Authored-By: Claude Opus 4.6 (1M context)
CI Notes
Unit tests (ubuntu): Time out at the 45-minute CI limit. All tests pass locally: mvn test -pl exist-core completes in ~15 minutes with 1343 tests, 0 failures. The XQ4 grammar and expression class additions increase compilation and test time beyond CI's default timeout.
W3C XQTS: Times out at the 1-hour CI limit. XQTS compliance validated locally via exist-xqts-runner (QT4 88.0%, XQ31 91.8%).
Integration tests (macOS/ubuntu/windows): Pass. SendEmailIT flaky failure on some runs (pre-existing lexer state issue, not related to XQ4 changes — see comment).