I built a graph engine in CUE. Here's where it shines and where it breaks. #4288
quicue started this conversation in Show and tell
I've spent the last several months building a dependency graph engine
entirely in CUE. Not configuration management. Not Kubernetes. Typed
DAGs where unification does the analysis — critical path scheduling,
gap analysis, compliance validation, provenance tracing, access
policies, risk scoring, and about 70 other patterns.
It's published as a CUE module (`apercue.ca@v0`), has 5 worked
examples, a project scaffolder, and CI that validates everything. I
submitted use cases to a few W3C Community Groups — two merged
into KG-Construct so far. I wanted to share what I've learned pushing CUE into this
territory — what works beautifully, and where I hit real walls.
## Start here: a recipe as a dependency graph

Beef bourguignon. 17 steps — ingredients, prep, cooking — all typed
nodes with `depends_on` edges. Run it and you get:
Critical path: 205 minutes. The braise dominates (150 min). Prep steps
have up to 172 minutes of slack — so you can dice onions anytime in the
first 2.5 hours. All computed from the dependency structure at eval time.
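For concreteness, the node shape might look like this (a minimal sketch: `_steps` and `depends_on` come from the post; the step names, the `minutes` field, and the `@type` values are illustrative, not the full 17-step recipe):

```cue
package recipe

// Every step is a typed node with depends_on edges.
_steps: [Name=string]: {
	name:    Name
	minutes: int
	depends_on: [...string]
	"@type": [string]: true
}

_steps: {
	dice_onions: {minutes: 10, depends_on: [], "@type": {Prep: true}}
	sear_beef:   {minutes: 15, depends_on: [], "@type": {Cook: true}}
	braise: {
		minutes: 150
		depends_on: ["sear_beef", "dice_onions"]
		"@type": {Cook: true}
	}
	serve: {minutes: 5, depends_on: ["braise"], "@type": {Serve: true}}
}
```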
Same graph also answers: "are all ingredients present?" (gap analysis),
"do cook steps actually depend on something?" (compliance), "what's the
topology?" (layer grouping). One `_steps` struct, many projections.

## What the engine actually does
The core is a `#Graph` pattern. You give it typed resources with
`depends_on` edges, and it computes the topology: ancestor sets, depth,
layers. Then ~63 projection patterns consume that graph. Each takes
`Graph: #AnalyzableGraph` as input and produces a different analysis:

- `#CriticalPath`
- `#ComplianceCheck`
- `#GapAnalysis`
- `#ProvenanceTrace`
- `#ODRLPolicy`
- `#ValidationCredential`
- `#DCATCatalog`
- `#SinglePointsOfFailure`
- `#BlastRadius`
- `#ImpactQuery`
- `#GraphDiff`
- `#DriftReport`
- `#FederatedMerge`
- `#CycleDetector`
- `#MermaidDiagram`
- `#GraphvizDiagram`

Plus scheduling patterns, risk scoring, bootstrap planning, lifecycle
phases, type validation, schema alignment... 63 definitions across 13
files, about 4,000 lines of CUE total (core packages).
## Key CUE patterns

### Struct-as-set for types

Resources use structs for type membership instead of lists. Membership
is a field-presence check: `resource["@type"]["Seasoning"] != _|_`.
Unifying two type sets (`{Produce: true} & {Seasoning: true}`) just
works, and type binding is set intersection.
This is the foundation. Every pattern dispatches on `@type` field
presence. A resource with `{Dataset: true, Governed: true}` matches
a data catalog pattern (serves Dataset) AND a policy pattern (serves
Governed) simultaneously.
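A tiny self-contained illustration of the dispatch (resource names invented):

```cue
package types

resources: {
	carrot: "@type": {Produce: true, Seasoning: true}
	thyme:  "@type": {Seasoning: true}
	beef:   "@type": {Protein: true}
}

// Comprehension dispatching on field presence: a resource is picked
// up by every pattern whose type key it carries.
seasonings: {
	for name, r in resources if r["@type"]["Seasoning"] != _|_ {
		(name): true
	}
}
```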
### Transitive closure via recursive struct merge

This is the most CUE-specific thing in the project. Each node
accumulates its parents' ancestors through struct unification.
`[_]: true` constrains all values, duplicates unify cleanly
(`true & true = true`), and the result is that every node knows its
full transitive ancestry.
This is what makes impact analysis, critical path, and gap analysis
work — they're all cheap comprehensions over precomputed ancestor sets.
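Reconstructed from that description (a sketch, not the module's exact code), the diamond case looks like:

```cue
package graph

nodes: [string]: depends_on: [...string]
nodes: {
	d: depends_on: []
	a: depends_on: ["d"]
	b: depends_on: ["d"]
	c: depends_on: ["a", "b"]
}

// [_]: true constrains all values; duplicate entries unify cleanly.
_ancestors: [string]: [_]: true
_ancestors: {
	for name, n in nodes {
		(name): {
			for p in n.depends_on {
				(p): true
				// Recursive struct merge: pull in the parent's
				// ancestor set (recomputed per path, see below).
				_ancestors[p]
			}
		}
	}
}
```

Here `c` ends up with `{a: true, b: true, d: true}`, with `d` reached through both paths.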
### Comprehensions as projections

Every analysis pattern follows the same shape: comprehend over the
graph, emit a struct. Swap the body, get a different output. The
`#AnalyzableGraph` interface is what makes this composable — both
`#Graph` (full computation) and `#GraphLite` (with precomputed
topology) satisfy it, so all 75 patterns work with either.
### Charter system

A `#Charter` declares what a project needs to be complete.
`#GapAnalysis` unifies the charter against the actual graph and
reports which gates are satisfied, which resources are missing, and
which types aren't covered. If you `cue vet` a project that doesn't
satisfy its charter, it fails. Project completeness is a type check.
## Where CUE breaks
This is the part I think CUE contributors will care about most.
### No memoization on recursive struct references

The `_ancestors` computation? Beautiful on trees. Exponential on dense
diamond DAGs. If C depends on A and B, and both depend on D, then D's
ancestors get recomputed through both paths. Graph shape matters —
wide topologies (depth ~5-10) handle 60+ nodes natively, but dense
diamonds hit a wall around 35-40.
Workaround: precompute externally. A Python `toposort.py` does the
expensive parts and CUE consumes the result; `#GraphLite` skips
recursion entirely. It works, but it means the engine has a Python
dependency for full transitive closure on large dense graphs.
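The split, sketched with illustrative field names: `#GraphLite` accepts the closure as data instead of computing it:

```cue
package graph

// The expensive transitive closure is injected (e.g. as JSON emitted
// by toposort.py and unified in at eval time), not computed.
#GraphLite: {
	nodes: [string]: depends_on: [...string]
	ancestors: [string]: [_]: true
	...
}
```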
Most patterns — validation, depth, grouping, scheduling — scale to
1000+ nodes because they don't need transitive closure; only the full
`_ancestors` computation hits the shape-dependent boundary described
above. I run a real datacenter topology (69 nodes, wide tree) without
precomputation.
Question for CUE contributors: is there a path toward memoized
evaluation of recursive struct references? Even opt-in memoization
would eliminate the Python dependency entirely.
### Both `if` branches always evaluate

Both branches run regardless of which condition holds, so you can't
short-circuit the expensive path when precomputed data exists. I had
to create separate `#Graph` (with recursion) and `#GraphLite`
(without) to avoid paying for both.

### Comprehension-level vs body-level `if`

A comprehension-level `if` filters: non-matching elements are dropped
entirely. A body-level `if` doesn't remove the element; it yields an
empty struct for non-matches. The second form bit me many times. Once
you internalize this it's fine, but it's a real gotcha.
## The ecosystem

The graph engine (`apercue.ca`) is the generic layer. Other projects
import it as a CUE module.
Each downstream project declares domain-specific resources and types.
The patterns, projections, and analysis all come from the shared module.
## Try it

The 52-node governance example (`gc-llm-governance/`) is the stress
test — it uses precomputed topology and produces output across 8
different analysis dimensions.
### Import it

The module is published as `apercue.ca@v0`. The `scaffold.sh` tool
generates a starter project with the graph, charter, and compliance
patterns already wired up.
## External validation

The graph output happens to be valid JSON-LD — the engine maps CUE
field names to standard vocabulary terms via a `@context`. I submitted
use cases to a few W3C Community Groups working on knowledge graph
construction and data governance — one accepted so far.
I mention this not for the W3C angle but because it validates that
the CUE output is structurally conformant to external specifications.
The patterns produce real, standards-compliant data — not toy output.
github.com/quicue/apercue — Apache 2.0,
4,000+ lines of CUE, 75 pattern definitions across 20 files, 5 examples, CI, scaffolder.
Happy to dig into any of the patterns, performance workarounds, the
module publishing setup, or how the charter system works.