feat(host): add trie-based host matching for scalability #122
Draft
LaurenceJJones wants to merge 2 commits into main
Implement a reverse domain trie for efficient host pattern matching, designed to scale for MSSP deployments with hundreds/thousands of hosts.

Changes:
- Add domainTrie data structure with O(m) lookup complexity
- Hybrid approach: trie for simple patterns, filepath.Match fallback for complex
- Priority system ensures most-specific-first matching behavior
- Comprehensive tests and benchmarks

Benchmark results (4 mixed lookups per iteration):

| Hosts | Slice (old) | Trie (new) | Speedup |
|---|---|---|---|
| 10 | 4,901 ns | 432 ns | 11x faster |
| 100 | 53,221 ns | 419 ns | 127x faster |
| 1,000 | 414,463 ns | 428 ns | 968x faster |
| 10,000 | 3,835,689 ns | 453 ns | 8,468x faster |

Note: For small deployments (1-4 hosts), the existing cache provides sufficient performance. The trie optimization primarily benefits large-scale MSSP deployments.
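For readers unfamiliar with the idea, here is a minimal sketch of a reverse domain trie. It is not the code in pkg/host/trie.go: the names `node`, `reverseLabels`, `insert`, and `lookup` are hypothetical, values are plain strings, and only exact and leading `*.` patterns are handled. It assumes `*.example.com` should match any host ending in `.example.com` but not `example.com` itself, and that stored values are non-empty.

```go
package main

import "strings"

// node is one label of a reversed domain, e.g. "com" -> "example" -> "api".
// An empty string means "no value stored here" (an assumption of this sketch).
type node struct {
	children map[string]*node
	exact    string // value for an exact pattern ending at this node
	wildcard string // value for a "*.<suffix>" pattern whose suffix ends here
}

func newNode() *node { return &node{children: map[string]*node{}} }

// reverseLabels turns "api.example.com" into ["com", "example", "api"].
func reverseLabels(host string) []string {
	labels := strings.Split(strings.ToLower(host), ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return labels
}

// insert registers an exact pattern or a leading-wildcard pattern ("*.example.com").
func (root *node) insert(pattern, value string) {
	wild := strings.HasPrefix(pattern, "*.")
	if wild {
		pattern = pattern[2:]
	}
	cur := root
	for _, label := range reverseLabels(pattern) {
		next, ok := cur.children[label]
		if !ok {
			next = newNode()
			cur.children[label] = next
		}
		cur = next
	}
	if wild {
		cur.wildcard = value
	} else {
		cur.exact = value
	}
}

// lookup visits at most one node per label of host, so its cost depends on the
// number of labels (m), not on how many patterns are stored.
func (root *node) lookup(host string) (string, bool) {
	labels := reverseLabels(host)
	cur := root
	best, found := "", false
	for i, label := range labels {
		next, ok := cur.children[label]
		if !ok {
			break
		}
		cur = next
		last := i == len(labels)-1
		if last && cur.exact != "" {
			return cur.exact, true // an exact match beats any wildcard
		}
		if !last && cur.wildcard != "" {
			best, found = cur.wildcard, true // deeper wildcards overwrite shallower ones
		}
	}
	return best, found
}
```

With `*.example.com` and `api.example.com` both inserted, a lookup for `api.example.com` returns the exact entry, while `www.example.com` falls back to the wildcard entry.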
5ed07f5 to d274a85
Pull request overview
This PR implements a reverse domain trie for efficient host pattern matching, replacing the O(n) linear search with O(m) trie-based lookup where m is the domain depth. The optimization is designed to scale for large deployments with hundreds or thousands of host configurations.
- Introduces a hybrid matching system: trie for simple patterns (exact, prefix/suffix wildcards), filepath.Match fallback for complex patterns (middle/embedded wildcards)
- Implements a priority-based system to ensure most-specific-first matching regardless of insertion order
- Maintains backward compatibility with existing API and behavior
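The split between trie-friendly and complex patterns described above could look roughly like the following. This is a sketch under assumptions, not the PR's classifier: `patternKind`, `classify`, and `matchComplex` are hypothetical names, and the exact rules in pkg/host/trie.go may differ. Only `filepath.Match` is a real standard-library call.

```go
package main

import (
	"path/filepath"
	"strings"
)

// patternKind is a hypothetical classification used only in this sketch.
type patternKind int

const (
	kindExact          patternKind = iota // "api.example.com"
	kindPrefixWildcard                    // "*.example.com"
	kindSuffixWildcard                    // "example.*"
	kindComplex                           // anything else containing glob syntax
)

// globChars are the characters that carry special meaning for filepath.Match.
const globChars = "*?[\\"

// classify decides whether a pattern can live in the trie or needs the fallback.
func classify(pattern string) patternKind {
	switch {
	case !strings.ContainsAny(pattern, globChars):
		return kindExact
	case strings.HasPrefix(pattern, "*.") && !strings.ContainsAny(pattern[2:], globChars):
		return kindPrefixWildcard
	case strings.HasSuffix(pattern, ".*") && !strings.ContainsAny(pattern[:len(pattern)-2], globChars):
		return kindSuffixWildcard
	default:
		return kindComplex
	}
}

// matchComplex is the linear fallback: it tries each complex pattern in order
// and returns the first one that matches. Malformed patterns are skipped.
func matchComplex(patterns []string, host string) (string, bool) {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, host); err == nil && ok {
			return p, true
		}
	}
	return "", false
}
```

Only patterns classified as complex stay on the linear path, so the fallback cost grows with the number of complex patterns rather than with the total number of hosts.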
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/host/trie.go | New reverse domain trie implementation with priority-based matching, pattern classification, and efficient O(m) lookup |
| pkg/host/root.go | Integration of trie into Manager struct, updated MatchFirstHost to use trie, modified addHost/removeHost to manage trie and complexPatterns |
| pkg/host/root_test.go | Comprehensive integration tests covering single/multiple hosts, priority ordering, wildcards, caching, and removal |
| pkg/host/benchmark_test.go | Performance benchmarks comparing slice-based vs trie-based matching at various scales (10 to 10,000 hosts) |
| pkg/host/TRIE_IMPLEMENTATION.md | Technical documentation explaining the trie structure, matching algorithm, priority system, and pattern classification |
- Fix exactMatchFound logic in trie findMatches
- Clarify removeHost comments for complex patterns
- Fix race condition: use sync.Map for thread-safe cache access
- Add proper type assertion check for cache retrieval
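The last two fixes in that commit can be illustrated with a small sketch. The `Host` and `hostCache` types and their methods are hypothetical stand-ins, not the types in pkg/host/root.go. The sketch assumes the cache maps a request host name to a `*Host`. Evicting an entry of an unexpected type is a choice made here for illustration; the PR may handle that case differently.

```go
package main

import "sync"

// Host is a hypothetical stand-in for the configured host type.
type Host struct {
	Pattern string
}

// hostCache wraps sync.Map so concurrent readers and writers need no extra locking.
type hostCache struct {
	m sync.Map // host name (string) -> *Host
}

// put stores the matched host for a request host name.
func (c *hostCache) put(name string, h *Host) {
	c.m.Store(name, h)
}

// get returns the cached host, checking the type assertion instead of
// assuming every stored value is a *Host.
func (c *hostCache) get(name string) (*Host, bool) {
	v, ok := c.m.Load(name)
	if !ok {
		return nil, false
	}
	h, ok := v.(*Host)
	if !ok {
		// Unexpected value type: drop the entry and report a miss rather than panic.
		c.m.Delete(name)
		return nil, false
	}
	return h, true
}
```

The two-value form `v.(*Host)` is what turns a bad cache entry into a miss; the single-value form would panic on a type mismatch.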
Note for team: keeping this as a draft until it is needed.