Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses WAL free-list corruption auto-healing (Issue #1940) and concurrently introduces vector/BSON vector support, query enhancements (multi-ordering, grouping), shared-mutex hardening, and a new ReproRunner/CI workflow setup.
Changes:
- Add auto-heal regression test for legacy corrupt WAL free list and adjust disk/WAL flushing & test hooks.
- Introduce
BsonType.Vectorend-to-end (serialization, JSON output, expression operator/function, client vector APIs, vector index surface tests). - Add infrastructure upgrades: shared mutex naming/factory, query ordering/grouping API changes, GitVersion build plumbing, and ReproRunner tooling.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| LiteDB/Engine/Engine/Delete.cs | Deletes vector index entries when deleting documents. |
| LiteDB/Engine/Disk/Serializer/BufferWriter.cs | Adds vector write helpers; refactors string writers into partials. |
| LiteDB/Engine/Disk/Serializer/BufferWriter.NetStd.cs | Provides string-writing implementations for BufferWriter. |
| LiteDB/Engine/Disk/Serializer/BufferReader.cs | Adds vector read helpers; refactors string readers into partial type. |
| LiteDB/Engine/Disk/Serializer/BufferReader.NetStd.cs | Provides string-reading implementations for BufferReader. |
| LiteDB/Engine/Disk/DiskService.cs | Enables testing hook in more builds; flushes WAL writes. |
| LiteDB/Engine/Disk/DiskReader.cs | Enables testing hook in more builds. |
| LiteDB/Document/Json/JsonWriter.cs | Serializes BsonType.Vector as a JSON array. |
| LiteDB/Document/Expression/Parser/BsonExpressionType.cs | Adds VectorSim expression type. |
| LiteDB/Document/Expression/Parser/BsonExpressionParser.cs | Adds VECTOR_SIM function/operator parsing and scalar-result support. |
| LiteDB/Document/Expression/Parser/BsonExpressionOperators.cs | Fixes collation equality in IN; exposes VECTOR_SIM. |
| LiteDB/Document/Expression/Parser/BsonExpressionFunctions.cs | Adds function binding for VECTOR_SIM. |
| LiteDB/Document/Expression/Methods/Vector.cs | Implements cosine-distance computation for vectors/arrays. |
| LiteDB/Document/BsonVector.cs | Introduces BsonVector wrapper type. |
| LiteDB/Document/BsonValue.cs | Adds vector storage/accessors, implicit conversions, compare/size logic. |
| LiteDB/Document/BsonType.cs | Adds Vector BSON type code (100). |
| LiteDB/Document/BsonArray.cs | Adds copy constructor. |
| LiteDB/Client/Vector/VectorIndexOptions.cs | Adds vector index options model. |
| LiteDB/Client/Vector/VectorDistanceMetric.cs | Adds vector distance metric enum. |
| LiteDB/Client/Vector/LiteRepositoryVectorExtensions.cs | Adds repository extension surface for vector indexes. |
| LiteDB/Client/Vector/LiteQueryableVectorExtensions.cs | Adds query extensions for vector search (near/top-k). |
| LiteDB/Client/Vector/LiteCollectionVectorExtensions.cs | Adds collection extension surface for vector indexes. |
| LiteDB/Client/Structures/QueryAny.cs | Adds EndsWith/Contains helpers for ANY queries. |
| LiteDB/Client/Structures/Query.cs | Refactors All ordering; adds EndsWith helper and LIKE formatting tweaks. |
| LiteDB/Client/SqlParser/Commands/Select.cs | Supports multiple ORDER BY terms in SQL select parsing. |
| LiteDB/Client/Shared/SharedMutexNameFactory.cs | Adds robust, length-safe mutex name generation strategies. |
| LiteDB/Client/Shared/SharedMutexFactory.cs | Adds Windows ACL-aware global mutex creation and platform fallbacks. |
| LiteDB/Client/Shared/SharedEngine.cs | Switches shared mutex construction to new factory; adds vector index surface. |
| LiteDB/Client/Mapper/MemberMapper.cs | Tracks DbRef collection name and aligns IsDbRef behavior. |
| LiteDB/Client/Mapper/Linq/TypeResolver/MemoryExtensionsResolver.cs | Adds LINQ resolver for MemoryExtensions.Contains to ANY syntax. |
| LiteDB/Client/Mapper/Linq/TypeResolver/GroupingResolver.cs | Adds grouping-aware resolver mappings (Key/Count/etc). |
| LiteDB/Client/Mapper/BsonRefId.cs | Adds helper type for DbRef id assignment in expression trees. |
| LiteDB/Client/Mapper/BsonMapper.cs | Refactors DbRef registration and type discriminator serialization. |
| LiteDB/Client/Mapper/BsonMapper.Serialize.cs | Refactors serialize helpers; adds SerializeTypeName. |
| LiteDB/Client/Mapper/BsonMapper.Grouping.cs | Adds (de)serialization support for grouping results. |
| LiteDB/Client/Mapper/BsonMapper.Deserialize.cs | Improves dictionary key deserialization via TypeConverter. |
| LiteDB/Client/Database/LiteRepository.cs | Adds internal vector index entry points and obsolete shims. |
| LiteDB/Client/Database/LiteGrouping.cs | Adds concrete grouping materialization type. |
| LiteDB/Client/Database/LiteDatabase.cs | Forces eager checkpointing for stream-backed DB without log stream; restores on dispose. |
| LiteDB/Client/Database/ILiteQueryable.cs | Adds ThenBy/ThenByDescending and grouping API; adjusts Select return types. |
| LiteDB/Client/Database/Collections/Index.cs | Adds vector index creation API and obsolete shims; toggles multi-key conversion. |
| LiteDB.Tests/Utils/Models/Zip.cs | Adds payload field for synthetic rebuild sizing. |
| LiteDB.Tests/Utils/IsExternalInit.cs | Adds IsExternalInit shim for non-NETCOREAPP targets. |
| LiteDB.Tests/Utils/Faker.cs | Adjusts bool random helper implementation. |
| LiteDB.Tests/Utils/DatabaseFactory.cs | Centralizes DB creation for tests across in-memory/disk targets. |
| LiteDB.Tests/Utils/CpuBoundFactAttribute.cs | Adds CPU-gated xUnit attribute. |
| LiteDB.Tests/Shared/SharedDemoDatabaseCollection.cs | Adds shared DB fixture for non-parallel test scenarios. |
| LiteDB.Tests/Query/Where_Tests.cs | Adds timeouts/logging for query tests and async wrappers. |
| LiteDB.Tests/Query/VectorExtensionSurface_Tests.cs | Adds tests for vector query/index surface integration. |
| LiteDB.Tests/Query/Select_Tests.cs | Uses shared DB factory in select tests. |
| LiteDB.Tests/Query/QueryApi_Tests.cs | Adds Query API tests for StartsWith/EndsWith/Contains. |
| LiteDB.Tests/Query/Data/PersonQueryData.cs | Uses shared DB factory for query data setup. |
| LiteDB.Tests/Mapper/Mapper_Tests.cs | Uses shared DB factory in mapper tests. |
| LiteDB.Tests/Mapper/LinqBsonExpression_Tests.cs | Adds DbRef member-init tests using BsonRefId. |
| LiteDB.Tests/LiteDB.Tests.csproj | Multitargets tests, adds resources & testing constant, updates package refs. |
| LiteDB.Tests/Issues/Pull2468_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/IssueCheckpointFlush_Tests.cs | Adds regression tests for stream checkpoint/flush behavior. |
| LiteDB.Tests/Issues/Issue546_Tests.cs | Adds regression test for dictionary key types. |
| LiteDB.Tests/Issues/Issue2570_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue2534_Tests.cs | Serializes test execution via shared fixture. |
| LiteDB.Tests/Issues/Issue2494_Tests.cs | Uses shared DB factory for disk upgrade regression. |
| LiteDB.Tests/Issues/Issue2471_Test.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue2458_Tests.cs | Uses shared DB factory; fixes stream write for older TFMs. |
| LiteDB.Tests/Issues/Issue2298_Tests.cs | Adds TFM gating for System.Text.Json scenario; uses shared DB factory. |
| LiteDB.Tests/Issues/Issue2265_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue2129_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue2127_Tests.cs | Uses shared DB factory; replaces Contains with IndexOf for compatibility. |
| LiteDB.Tests/Issues/Issue1940_Tests.cs | Adds auto-heal regression test for corrupt WAL free list. |
| LiteDB.Tests/Issues/Issue1865_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue1860_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue1838_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue1701_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue1695_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Issues/Issue1651_Tests.cs | Uses shared DB factory for issue regression tests. |
| LiteDB.Tests/Internals/Sort_Tests.cs | Updates sort service invocation for multi-order signature; adjusts dataset sizing. |
| LiteDB.Tests/Internals/ExtendedLength_Tests.cs | Uses shared DB factory for internal tests. |
| LiteDB.Tests/Internals/Document_Tests.cs | Fixes Count usage for document CopyTo sizing. |
| LiteDB.Tests/Internals/BufferWriter_Tests.cs | Adds multi-segment terminator regression tests for CString/spec strings. |
| LiteDB.Tests/Engine/UserVersion_Tests.cs | Uses shared DB factory for disk tests. |
| LiteDB.Tests/Engine/Recursion_Tests.cs | Serializes execution via shared fixture. |
| LiteDB.Tests/Engine/Rebuild_Tests.cs | Refactors rebuild tests to deterministic synthetic dataset + payload. |
| LiteDB.Tests/Engine/Rebuild_Crash_Tests.cs | Improves fault injection determinism and test diagnostics. |
| LiteDB.Tests/Engine/Index_Tests.cs | Uses shared DB factory for index tests and in-memory connection strings. |
| LiteDB.Tests/Engine/Collation_Tests.cs | Updates query ordering construction to new multi-order API. |
| LiteDB.Tests/Document/Json_Tests.cs | Adds JSON vector output test. |
| LiteDB.Tests/Document/Implicit_Tests.cs | Disambiguates BsonValue references in tests. |
| LiteDB.Tests/Document/Decimal_Tests.cs | Disambiguates BsonValue references in tests. |
| LiteDB.Tests/Database/Writing_While_Reading_Test.cs | Uses shared DB factory for disk tests. |
| LiteDB.Tests/Database/Upgrade_Tests.cs | Uses shared DB factory for upgrade tests; removes unused using. |
| LiteDB.Tests/Database/Storage_Tests.cs | Uses shared DB factory for storage tests. |
| LiteDB.Tests/Database/Snapshot_Upgrade_Tests.cs | Uses shared DB factory for snapshot tests. |
| LiteDB.Tests/Database/Site_Tests.cs | Uses shared DB factory for sample tests. |
| LiteDB.Tests/Database/Query_Min_Max_Tests.cs | Uses shared DB factory for query tests. |
| LiteDB.Tests/Database/NonIdPoco_Tests.cs | Uses shared DB factory for DB tests. |
| LiteDB.Tests/Database/MultiKey_Mapper_Tests.cs | Uses shared DB factory; expands ANY LIKE helpers coverage. |
| LiteDB.Tests/Database/IndexSortAndFilter_Tests.cs | Uses shared DB factory; removes temp file management. |
| LiteDB.Tests/Database/FindAll_Tests.cs | Uses shared DB factory for disk tests. |
| LiteDB.Tests/Database/Document_Size_Tests.cs | Uses shared DB factory for DB tests. |
| LiteDB.Tests/Database/Delete_By_Name_Tests.cs | Uses shared DB factory for DB tests. |
| LiteDB.Tests/Database/DeleteMany_Tests.cs | Uses shared DB factory for DB tests. |
| LiteDB.Tests/Database/Database_Pragmas_Tests.cs | Uses shared DB factory for DB tests. |
| LiteDB.Tests/Database/Create_Database_Tests.cs | Uses shared DB factory for disk tests. |
| LiteDB.Tests/Database/AutoId_Tests.cs | Uses shared DB factory for DB tests. |
| LiteDB.Tests.SharedMutexHarness/SharedMutexHarness.csproj | Adds cross-process harness project for shared mutex validation. |
| LiteDB.Tests.SharedMutexHarness/README.md | Documents how to run the shared mutex harness. |
| LiteDB.Stress/Test/TestExecution.cs | Makes stress threads background threads. |
| LiteDB.Stress/LiteDB.Stress.csproj | Normalizes target framework string. |
| LiteDB.Shell/LiteDB.Shell.csproj | Updates shell to net8.0 and removes hard-coded version metadata. |
| LiteDB.ReproRunner/Repros/Issue_2614_DiskServiceDispose/repro.json | Adds repro manifest for Issue 2614. |
| LiteDB.ReproRunner/Repros/Issue_2614_DiskServiceDispose/README.md | Documents Issue 2614 repro. |
| LiteDB.ReproRunner/Repros/Issue_2614_DiskServiceDispose/Issue_2614_DiskServiceDispose.csproj | Adds Issue 2614 repro project. |
| LiteDB.ReproRunner/Repros/Issue_2586_RollbackTransaction/repro.json | Adds repro manifest for Issue 2586. |
| LiteDB.ReproRunner/Repros/Issue_2586_RollbackTransaction/README.md | Documents Issue 2586 repro. |
| LiteDB.ReproRunner/Repros/Issue_2586_RollbackTransaction/Issue_2586_RollbackTransaction.csproj | Adds Issue 2586 repro project. |
| LiteDB.ReproRunner/Repros/Issue_2561_TransactionMonitor/repro.json | Adds repro manifest for Issue 2561. |
| LiteDB.ReproRunner/Repros/Issue_2561_TransactionMonitor/README.md | Documents Issue 2561 repro. |
| LiteDB.ReproRunner/Repros/Issue_2561_TransactionMonitor/Issue_2561_TransactionMonitor.csproj | Adds Issue 2561 repro project. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/ReproContext.cs | Adds environment-based repro execution context. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/ReproConfigurationReporter.cs | Adds config handshake reporter utilities. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/Messaging/ReproJsonOptions.cs | Centralizes JSON serializer options for repro messaging. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/Messaging/ReproHostMessageTypes.cs | Defines structured host message type constants. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/Messaging/ReproHostLogLevel.cs | Adds log severity enum for structured messaging. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/Messaging/ReproHostConfigurationPayload.cs | Adds configuration payload model for structured messaging. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Shared/LiteDB.ReproRunner.Shared.csproj | Adds shared repro runner library project. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Properties/AssemblyInfo.cs | Makes CLI internals visible to runner tests. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Program.cs | Adds CLI entry point and DI bootstrap. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/ReproVariantOutcomeExpectations.cs | Adds expected outcome structure for repro variants. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/ReproState.cs | Adds repro state enum (red/green/flaky). |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/ReproOutcomeKind.cs | Adds outcome kind enum (repro/no-repro/hard-fail). |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/ReproOutcomeExpectation.cs | Adds expected-outcome model. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/ReproOsConstraints.cs | Adds OS constraint model for runner selection. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/ManifestValidationResult.cs | Adds manifest validation error collector. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Manifests/DiscoveredRepro.cs | Adds discovered repro model with validation results. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/LiteDB.ReproRunner.Cli.csproj | Adds repro runner CLI project. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Infrastructure/TypeRegistrar.cs | Bridges Spectre DI to Microsoft DI container. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Infrastructure/ReproRootLocator.cs | Resolves repro root directory with overrides/discovery. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Execution/RunReportModels.cs | Adds report models for run results. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/ValidateCommandSettings.cs | Adds validate command settings. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/ValidateCommand.cs | Adds validate command to check manifests. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/ShowCommandSettings.cs | Adds show command settings. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/ShowCommand.cs | Adds show command to display a manifest. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/RunCommandSettings.cs | Adds run command settings and validation. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/RootCommandSettings.cs | Adds shared --root option base class. |
| LiteDB.ReproRunner/LiteDB.ReproRunner.Cli/Commands/ListCommandSettings.cs | Adds list command settings. |
| LiteDB.ReproRunner.Tests/ReproOutcomeEvaluatorTests.cs | Adds unit tests for outcome evaluation logic. |
| LiteDB.ReproRunner.Tests/ReproHostClientTests.cs | Adds unit tests for structured host client messaging. |
| LiteDB.ReproRunner.Tests/ReproExecutorTests.cs | Adds unit tests for executor structured output parsing. |
| LiteDB.ReproRunner.Tests/ManifestValidatorTests.cs | Adds unit tests for manifest validation behaviors. |
| LiteDB.ReproRunner.Tests/LiteDB.ReproRunner.Tests.csproj | Adds test project for repro runner components. |
| LiteDB.ReproRunner.Tests/CliApplicationTests.cs | Adds tests for validate command exit codes/output. |
| LiteDB.Demo.Tools.VectorSearch/Utilities/VectorMath.cs | Adds cosine similarity/distance helpers for demo tool. |
| LiteDB.Demo.Tools.VectorSearch/Readme.md | Documents vector search demo CLI usage. |
| LiteDB.Demo.Tools.VectorSearch/Program.cs | Adds vector demo tool entry point. |
| LiteDB.Demo.Tools.VectorSearch/Models/IndexedDocumentChunk.cs | Adds demo model for chunked documents + embeddings. |
| LiteDB.Demo.Tools.VectorSearch/Models/IndexedDocument.cs | Adds demo model for documents + embeddings. |
| LiteDB.Demo.Tools.VectorSearch/LiteDB.Demo.Tools.VectorSearch.csproj | Adds demo project dependencies and target framework. |
| LiteDB.Demo.Tools.VectorSearch/Embedding/IEmbeddingService.cs | Adds embedding service interface contract for demo. |
| LiteDB.Demo.Tools.VectorSearch/Configuration/GeminiEmbeddingOptions.cs | Adds Gemini/Vertex embedding configuration model. |
| LiteDB.Demo.Tools.VectorSearch/Commands/VectorSearchCommandSettings.cs | Adds shared CLI settings for ingest/search commands. |
| LiteDB.Benchmarks/Models/Generators/FileMetaGenerator.cs | Populates synthetic vectors in benchmark data. |
| LiteDB.Benchmarks/Models/FileMetaBase.cs | Adds vector field to benchmark model. |
| LiteDB.Benchmarks/LiteDB.Benchmarks.csproj | Updates benchmarks to net8.0 and latest language version. |
| LiteDB.Benchmarks/Benchmarks/Queries/QueryWithVectorSimilarity.cs | Adds benchmarks comparing vector indexed vs unindexed queries. |
| GitVersion.yml | Adds GitVersion configuration for semver computation. |
| Directory.Build.props | Enables GitVersion MSBuild integration and version property wiring. |
| ConsoleApp1/ConsoleApp1.csproj | Allows switching between local project and package reference. |
| AGENTS.md | Adds repository guidelines documentation. |
| .github/workflows/tag-version.yml | Adds workflow to create annotated version tags. |
| .github/workflows/publish-prerelease.yml | Adds prerelease publishing pipeline using GitVersion. |
| .github/workflows/ci.yml | Adds CI workflow using reusable pipelines + reprorunner. |
| .github/os-matrix.json | Adds OS matrix configuration for workflows. |
| .config/dotnet-tools.json | Adds GitVersion dotnet tool configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
VECTOR_SIM reads like it returns a similarity score, but the implementation returns a cosine distance (1 - similarity). Consider renaming the method/operator to reflect distance (e.g., VECTOR_DIST / COSINE_DISTANCE) or change the return value to be the similarity (dot / (|a||b|)) and keep distance as a separate operator/function to avoid API confusion.
There was a problem hiding this comment.
VECTOR_SIM reads like it returns a similarity score, but the implementation returns a cosine distance (1 - similarity). Consider renaming the method/operator to reflect distance (e.g., VECTOR_DIST / COSINE_DISTANCE) or change the return value to be the similarity (dot / (|a||b|)) and keep distance as a separate operator/function to avoid API confusion.
| var similarity = dot / (Math.Sqrt(magQ) * Math.Sqrt(magC)); | |
| return double.IsNaN(similarity) ? BsonValue.Null : similarity; |
There was a problem hiding this comment.
If BsonValue already defines a virtual Clone() (common for BSON value types), this method currently hides it rather than overriding, which can break cloning via base-class references. Change this to an override (and keep the deep-clone behavior) so polymorphic clone calls correctly clone vector values.
| public override BsonValue Clone() |
There was a problem hiding this comment.
member.DbRefCollectionName can be null when users set member.IsDbRef = true via custom mapping without also providing the referenced collection name. This will serialize \"$ref\": null (or potentially throw later) and produce invalid DbRef data. Add an explicit validation here (or earlier during mapping) that throws a clear NotSupportedException when DbRefCollectionName is missing, so the failure mode is deterministic and actionable.
There was a problem hiding this comment.
This test writes to a hard-coded on-disk file (demo.db) in the working directory, which can cause flakiness and leave artifacts behind (especially under parallel test runners / multi-targeting). Use a temp file (e.g., TempFile) or DatabaseFactory.Create() with an in-memory connection string to keep tests isolated and deterministic.
| using LiteDatabase dataBase = new(":memory:"); |
There was a problem hiding this comment.
This documentation is now inconsistent with the build changes in this PR: the repo introduces GitVersion (GitVersion.yml, Directory.Build.props, dotnet tool config, workflows) rather than MinVer. Update this section to reflect GitVersion as the version source so contributors don't follow outdated release instructions.
| Semantic versions are derived by GitVersion from Git history and the `GitVersion.yml` configuration; create annotated tags like `v6.0.0` on the main branch and let GitVersion flow those into builds rather than editing project files manually. Before tagging, ensure Release builds are clean, pack outputs land in `artifacts_temp/`, and update any shell or benchmark usage notes affected by the change. Update this guide whenever you discover repository practices worth sharing. |
💡 Codex ReviewLiteDB/LiteDB/Engine/Services/VectorIndexService.cs Lines 94 to 96 in 0dc995c In
ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds auto-healing for legacy corrupted free-empty page lists in LiteDB databases running in WAL mode. Recovery methods detect and repair corrupted page allocation metadata during database open, with corresponding unit tests validating the repair process using a pre-corrupted dataset. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip CodeRabbit can generate a title for your PR based on the changes.Add |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
LiteDB/Engine/LiteEngine.cs (1)
143-147: This compatibility scan is now on the startup hot path for every WAL open.
HealCorruptedFreeEmptyPageList()walks the entire free-empty chain beforeOpen()completes, so healthy databases with large free lists now pay O(n) extra page reads on every startup. Since this is a legacy-corruption fix, consider gating it to affected files/versions or deferring validation until allocation actually touches the list.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@LiteDB/Engine/LiteEngine.cs` around lines 143 - 147, The call to HealCorruptedFreeEmptyPageList() in LiteEngine.Open startup is causing an O(n) scan on every WAL open; change the startup path to avoid always walking the free-empty chain by gating or deferring the call: detect legacy/affected databases (e.g. check header version/flags exposed by _header or _walIndex.RestoreIndex) and call HealCorruptedFreeEmptyPageList() only when those indicators show possible legacy corruption, or remove it from the Open hot path and invoke it lazily from allocation routines that traverse the free-empty list (so HealCorruptedFreeEmptyPageList() runs only when allocation touches the list). Ensure you update the logic around the call site in LiteEngine.Open (where _walIndex.RestoreIndex and this.HealCorruptedFreeEmptyPageList are invoked) to implement the chosen gating/deferment approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@LiteDB.Tests/Issues/Issue1940_Tests.cs`:
- Around line 23-25: After extracting the ZIP with ZipFile.ExtractToDirectory,
add explicit assertions that the expected fixture files were actually extracted:
check File.Exists(Path.Combine(tempDirectory, "issue1940.db")) and
File.Exists(Path.Combine(tempDirectory, "issue1940-log.db")) (and optionally
that their lengths > 0) immediately after the ExtractToDirectory call so
LiteEngine.Open / new LiteDatabase(databasePath) will run against the real
WAL/db files; apply the same existence checks to the other extraction sites
referenced around lines 40-43 to prevent regressions in ZIP layout.
In `@LiteDB/Engine/Engine/Recovery.cs`:
- Around line 33-49: The scan currently calls ReadLatestPage(...) outside the
try/finally so any read or decode exception escapes Open() instead of being
treated as corruption; move the try (and its finally that releases page.Buffer)
to wrap the ReadLatestPage(...) invocation so that exceptions thrown by
ReadLatestPage or subsequent page parsing are caught and handled by calling
RepairFreeEmptyPageList(current, page.PageType) (or the appropriate overload)
and then rethrow or return as the existing logic expects; apply the same change
to the other loop block referenced (the similar code at lines handling the
second scan) so both page-read/page-decode failures are routed through
RepairFreeEmptyPageList(...) and buffer.Release() still runs in the finally.
---
Nitpick comments:
In `@LiteDB/Engine/LiteEngine.cs`:
- Around line 143-147: The call to HealCorruptedFreeEmptyPageList() in
LiteEngine.Open startup is causing an O(n) scan on every WAL open; change the
startup path to avoid always walking the free-empty chain by gating or deferring
the call: detect legacy/affected databases (e.g. check header version/flags
exposed by _header or _walIndex.RestoreIndex) and call
HealCorruptedFreeEmptyPageList() only when those indicators show possible legacy
corruption, or remove it from the Open hot path and invoke it lazily from
allocation routines that traverse the free-empty list (so
HealCorruptedFreeEmptyPageList() runs only when allocation touches the list).
Ensure you update the logic around the call site in LiteEngine.Open (where
_walIndex.RestoreIndex and this.HealCorruptedFreeEmptyPageList are invoked) to
implement the chosen gating/deferment approach.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f599c9f3-9790-4421-b529-49a0f2449493
⛔ Files ignored due to path filters (1)
LiteDB.Tests/Resources/Issue1940_CorruptFreeEmptyList.zipis excluded by!**/*.zip
📒 Files selected for processing (4)
LiteDB.Tests/Issues/Issue1940_Tests.csLiteDB.Tests/LiteDB.Tests.csprojLiteDB/Engine/Engine/Recovery.csLiteDB/Engine/LiteEngine.cs
| ZipFile.ExtractToDirectory( | ||
| Path.Combine(AppContext.BaseDirectory, "Resources", "Issue1940_CorruptFreeEmptyList.zip"), | ||
| tempDirectory); |
There was a problem hiding this comment.
Assert that the corrupt fixture was really extracted.
LiteEngine.Open() only enters the new heal path when the WAL exists, but this test never proves that issue1940.db and issue1940-log.db came out of the archive. If the zip layout regresses, new LiteDatabase(databasePath) just creates a clean database and this still passes.
Add explicit fixture checks right after extraction
ZipFile.ExtractToDirectory(
Path.Combine(AppContext.BaseDirectory, "Resources", "Issue1940_CorruptFreeEmptyList.zip"),
tempDirectory);
+
+ File.Exists(databasePath).Should().BeTrue("the corrupt datafile fixture must be present");
+ File.Exists(logPath).Should().BeTrue("the auto-heal path only runs when the WAL fixture is present");
+ new FileInfo(logPath).Length.Should().BeGreaterThan(0, "the fixture should start with a non-empty WAL");Also applies to: 40-43
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@LiteDB.Tests/Issues/Issue1940_Tests.cs` around lines 23 - 25, After
extracting the ZIP with ZipFile.ExtractToDirectory, add explicit assertions that
the expected fixture files were actually extracted: check
File.Exists(Path.Combine(tempDirectory, "issue1940.db")) and
File.Exists(Path.Combine(tempDirectory, "issue1940-log.db")) (and optionally
that their lengths > 0) immediately after the ExtractToDirectory call so
LiteEngine.Open / new LiteDatabase(databasePath) will run against the real
WAL/db files; apply the same existence checks to the other extraction sites
referenced around lines 40-43 to prevent regressions in ZIP layout.
| var page = this.ReadLatestPage(current, reader); | ||
|
|
||
| try | ||
| { | ||
| if (page.PageType != PageType.Empty) | ||
| { | ||
| this.RepairFreeEmptyPageList(current, page.PageType); | ||
| return; | ||
| } | ||
|
|
||
| current = page.NextPageID; | ||
| } | ||
| finally | ||
| { | ||
| page.Buffer.Release(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Unreadable pages still bypass the heal path.
Because the try begins after ReadLatestPage(current, reader) returns, any read/parse failure from the corrupt chain still escapes Open() instead of resetting the free list. Please treat page-read/page-decode failures during this scan as corruption and route them through RepairFreeEmptyPageList(...) too.
Also applies to: 52-61
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@LiteDB/Engine/Engine/Recovery.cs` around lines 33 - 49, The scan currently
calls ReadLatestPage(...) outside the try/finally so any read or decode
exception escapes Open() instead of being treated as corruption; move the try
(and its finally that releases page.Buffer) to wrap the ReadLatestPage(...)
invocation so that exceptions thrown by ReadLatestPage or subsequent page
parsing are caught and handled by calling RepairFreeEmptyPageList(current,
page.PageType) (or the appropriate overload) and then rethrow or return as the
existing logic expects; apply the same change to the other loop block referenced
(the similar code at lines handling the second scan) so both
page-read/page-decode failures are routed through RepairFreeEmptyPageList(...)
and buffer.Release() still runs in the finally.
Fixes #1940 (comment)
Summary by CodeRabbit
Bug Fixes
Tests