feat: add NER entity extraction and knowledge graph to docsaf #10
Merged
Add GliNER2-based named entity recognition to the docsaf documentation sync tool via termite's /api/recognize endpoint. Extracted entities are stored as both document metadata (for faceted keyword search) and as nodes in a knowledge graph index (for entity-based document traversal).

Key changes:
- New --ner-model, --ner-label, --ner-threshold, --ner-batch-size flags on prepare and sync commands
- Entity extraction via termite client with batched NER inference
- Normalized entity keys (entity:<label>:<name>) with Unicode support
- Graph index with field-based "mentions_entity" edge type that automatically creates edges from documents to referenced entities
- Auto-detection of entity records in load command for graph index setup
- Entity schema added to schemas.yaml
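To make the entity:<label>:<name> key format concrete, here is a minimal sketch of one way such keys could be normalized. The function names and the normalization rules (lowercasing, keeping Unicode letters, digits, and combining marks, collapsing everything else into a single hyphen) are assumptions for illustration and may differ from what docsaf actually does.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// entityKey builds a key of the form entity:<label>:<name>.
// Hypothetical helper: the real docsaf normalization may differ.
func entityKey(label, name string) string {
	return "entity:" + normalize(label) + ":" + normalize(name)
}

// normalize lowercases s, keeps Unicode letters, digits, and combining
// marks, and collapses any run of other characters into one hyphen.
// Leading and trailing separators are dropped.
func normalize(s string) string {
	var b strings.Builder
	pendingSep := false
	for _, r := range strings.ToLower(strings.TrimSpace(s)) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) || unicode.IsMark(r) {
			if pendingSep && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingSep = false
			b.WriteRune(r)
			continue
		}
		pendingSep = true
	}
	return b.String()
}

func main() {
	fmt.Println(entityKey("Technology", "Raft Consensus"))
	fmt.Println(entityKey("concept", "Schrödinger équation"))
}
```

Because the same mention always maps to the same key under a scheme like this, several documents that refer to one entity would end up pointing at a single graph node.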
The test only severed the original leader→follower link, so a leader election after the follower restart could let the third node (now leader) replicate to the follower through an uncut link, making the require.Error convergence assertion flaky. Cut links from both peers to the follower and drop the hasSnapshotTransferEvent assertion that assumed the original leader would be the snapshot sender.
The pkg/generating module was introduced in 076d39b but missing from the e2e go.mod replace block and the Makefile GO_SUBMODULES list, causing e2e builds to fail with "unknown revision" errors.
- Split monolithic `antfly` job into `unit` (build + tests) and `e2e` (postgres + e2e tests) jobs that run in parallel with `sim-validate`
- Remove ollama install, model pull, and server management; all tests that use ollama are already skipped in CI via env var guards
- Remove GONOPROXY/GOPRIVATE config (antfly is now public)
- Remove unnecessary `go clean -modcache` steps
- Remove zstd install (Go uses bundled DataDog/zstd via CGO)
- Combine sim-validate and unit into a single job to avoid redundant runner spin-up
- Fix EmbeddingsIndexConfig.Equal to compare all fields (DistanceMetric, Sparse, ChunkSize, MinWeight, TopK, Chunker) instead of only a subset
- Normalize empty DistanceMetric to l2_squared default so omitempty round-trips don't cause false mismatches
- Add tests for EmbeddingsIndexConfig.Equal and IndexConfig.Equal
- Remove unused ai import from openapi.go
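The Equal fix described above can be illustrated with a small sketch. The field names and the l2_squared default come from the commit message; the field types, the `effectiveMetric` helper, and the struct layout are assumptions, so treat this as an approximation of the idea and not the actual antfly code.

```go
package main

import "fmt"

// EmbeddingsIndexConfig mirrors the field names listed in the commit
// message. The types are assumed for illustration only.
type EmbeddingsIndexConfig struct {
	DistanceMetric string
	Sparse         bool
	ChunkSize      int
	MinWeight      float64
	TopK           int
	Chunker        string
}

const defaultDistanceMetric = "l2_squared"

// effectiveMetric (hypothetical helper) maps an empty metric to the
// default, so a value dropped by omitempty compares equal to the default.
func effectiveMetric(m string) string {
	if m == "" {
		return defaultDistanceMetric
	}
	return m
}

// Equal compares every field instead of only a subset.
func (c EmbeddingsIndexConfig) Equal(o EmbeddingsIndexConfig) bool {
	return effectiveMetric(c.DistanceMetric) == effectiveMetric(o.DistanceMetric) &&
		c.Sparse == o.Sparse &&
		c.ChunkSize == o.ChunkSize &&
		c.MinWeight == o.MinWeight &&
		c.TopK == o.TopK &&
		c.Chunker == o.Chunker
}

func main() {
	stored := EmbeddingsIndexConfig{DistanceMetric: "l2_squared", ChunkSize: 512}
	roundTripped := EmbeddingsIndexConfig{ChunkSize: 512} // metric lost via omitempty
	fmt.Println("same after round-trip:", stored.Equal(roundTripped))

	roundTripped.TopK = 10 // a real difference must still be detected
	fmt.Println("same after TopK change:", stored.Equal(roundTripped))
}
```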
…ng, and CI

Resolve merge conflicts and consolidate changes from origin/main:
- Resolve conflicts in CI workflow and retrieval agent generator chain logic
- Migrate resolveEffectiveGeneratorChain to use pkg/generating instead of lib/ai for ResolveGeneratorOrChain and GetDefaultChain (matching the broader refactor)
- Add resolveProvider/resolveProviderName helpers for consistent provider resolution
- Refactor GenerationError to pointer type with Unwrap support, preserving the original error chain through Cause field for proper errors.As/errors.Is usage
- Add asGenerationErrorResponse and emitStreamError helpers to reduce repetitive error classification and streaming boilerplate in retrieval agent endpoints
- Move git hook from pre-commit to pre-push, rewriting it to diff pushed commits rather than the staging area
- Update CONTRIBUTING.md to reflect pre-push hook and add dependency hygiene docs
- Fix retrieval_agent_test.go to use generating.Get/SetDefaultChain (not lib/ai)
- Clean up redundant import alias (generating "..." where package already matches)
- Fix import ordering in lib/scraping/scraping.go
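The GenerationError refactor mentioned above follows a common Go pattern: a pointer-receiver error type with an Unwrap method that exposes its Cause. A minimal sketch is below. Only the type name, the Cause field, and Unwrap are taken from the commit message; the Kind field, `errUpstreamTimeout`, and the helpers `wrapGeneration`, `asGenerationError`, and `isUpstreamTimeout` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errUpstreamTimeout is a stand-in sentinel error for this sketch.
var errUpstreamTimeout = errors.New("upstream timeout")

// GenerationError keeps the original error in Cause.
// Kind is an assumed classification field.
type GenerationError struct {
	Kind  string
	Cause error
}

func (e *GenerationError) Error() string {
	if e.Cause == nil {
		return "generation failed: " + e.Kind
	}
	return "generation failed: " + e.Kind + ": " + e.Cause.Error()
}

// Unwrap lets errors.Is and errors.As walk through to Cause.
func (e *GenerationError) Unwrap() error { return e.Cause }

// wrapGeneration adds one more layer of context on top of the typed error.
func wrapGeneration(kind string, cause error) error {
	return fmt.Errorf("retrieval agent: %w", &GenerationError{Kind: kind, Cause: cause})
}

// asGenerationError extracts the typed error from anywhere in the chain.
func asGenerationError(err error) (*GenerationError, bool) {
	var genErr *GenerationError
	ok := errors.As(err, &genErr)
	return genErr, ok
}

// isUpstreamTimeout reports whether the sentinel is anywhere in the chain.
func isUpstreamTimeout(err error) bool {
	return errors.Is(err, errUpstreamTimeout)
}

func main() {
	err := wrapGeneration("timeout", errUpstreamTimeout)
	if genErr, ok := asGenerationError(err); ok {
		fmt.Println("kind:", genErr.Kind)
	}
	fmt.Println("caused by upstream timeout:", isUpstreamTimeout(err))
}
```

With a pointer type, `errors.As` targets `*GenerationError` consistently, and callers can still match the underlying cause with `errors.Is` without knowing how many wrappers sit in between.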
kin-openapi v0.134.0 breaks oapi-codegen v2.5.1 (MappingRef type change) and panics in InternalizeRefs on schemas with empty refs. oasdiff/yaml v0.0.1 breaks kin-openapi v0.133.0 (OriginOpt API change). go-yit 20250909 pulls in go.yaml.in/yaml/v4 which breaks yaml-jsonpath. Pin all three via replace directives in root go.mod and downgrade across all sub-modules to restore a working dependency set.
Summary
- Entity extraction via termite's `/api/recognize` endpoint
- `mentions_entity` edge type that automatically creates edges from doc sections to referenced entity nodes

New flags (prepare/sync)
- `--ner-model`: Termite recognizer model (e.g., `fastino/gliner2-base-v1`)
- `--ner-label`: Zero-shot entity labels (repeatable)
- `--ner-threshold`: Confidence threshold (default 0.5)
- `--ner-batch-size`: Texts per NER batch (default 32)
- `--termite-url`: Termite API URL (default `http://localhost:8088`)

Usage
Test plan
- `cd examples/docsaf && GOWORK=off go build ./...`
- `docsaf sync --dir ./docs --table docs --create-table`
- `docsaf sync --dir ./docs --table docs --create-table --ner-model fastino/gliner2-base-v1 --ner-label technology --ner-label concept`
- `docsaf prepare --dir ./docs --ner-model fastino/gliner2-base-v1 --ner-label technology && jq 'to_entries[] | select(.value._type == "entity")' docs.json`
- knowledge graph index for `mentions_entity` edges
- `--ner-model` without `--ner-label` gives a clear error
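The flag behavior exercised by the test plan (a repeatable `--ner-label`, the documented defaults, and an error when `--ner-model` is given without labels) can be sketched with Go's standard `flag` package. This is an assumption-laden illustration: docsaf may use a different CLI library, and `nerOptions`, `labelList`, `parseNEROptions`, and the error text are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"strings"
)

// labelList implements flag.Value so that --ner-label can be repeated.
type labelList []string

func (l *labelList) String() string     { return strings.Join(*l, ",") }
func (l *labelList) Set(v string) error { *l = append(*l, v); return nil }

// nerOptions is a hypothetical container for the NER-related flags.
type nerOptions struct {
	Model      string
	Labels     labelList
	Threshold  float64
	BatchSize  int
	TermiteURL string
}

// parseNEROptions parses args and rejects a model without any labels.
func parseNEROptions(args []string) (*nerOptions, error) {
	opts := &nerOptions{}
	fs := flag.NewFlagSet("docsaf", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.StringVar(&opts.Model, "ner-model", "", "Termite recognizer model")
	fs.Var(&opts.Labels, "ner-label", "zero-shot entity label (repeatable)")
	fs.Float64Var(&opts.Threshold, "ner-threshold", 0.5, "confidence threshold")
	fs.IntVar(&opts.BatchSize, "ner-batch-size", 32, "texts per NER batch")
	fs.StringVar(&opts.TermiteURL, "termite-url", "http://localhost:8088", "Termite API URL")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if opts.Model != "" && len(opts.Labels) == 0 {
		return nil, errors.New("--ner-model requires at least one --ner-label")
	}
	return opts, nil
}

func main() {
	opts, err := parseNEROptions([]string{
		"--ner-model", "fastino/gliner2-base-v1",
		"--ner-label", "technology",
		"--ner-label", "concept",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.Model, []string(opts.Labels), opts.Threshold, opts.BatchSize)

	_, err = parseNEROptions([]string{"--ner-model", "fastino/gliner2-base-v1"})
	fmt.Println("error:", err)
}
```

Validating the flag combination at parse time means the error surfaces before any files are read or any request is sent to termite.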