Feature/Vector Search & Chatbot by rogerb831 · Pull Request #17 · lukasbach/pensieve

rogerb831 · 2025-09-13T23:02:02Z

✨ Features Added

🔍 SQLite Vector Search

Persistent vector storage using SQLite database
Vector similarity search using embeddings
FTS5 text search for keyword matching
Hybrid search combining vector and text results
Automatic transcript indexing during post-processing
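The description does not show how similarity is computed. Below is a minimal sketch of brute-force vector search, assuming embeddings are stored as plain number arrays and cosine similarity is the metric. `StoredChunk` and `vectorSearch` are hypothetical names, not the PR's API.

```typescript
// Hypothetical shape of an indexed transcript chunk; not the PR's actual schema.
interface StoredChunk {
  recordingId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every chunk against the query embedding, keep the best k.
function vectorSearch(
  query: number[],
  chunks: StoredChunk[],
  k: number,
): { chunk: StoredChunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan costs one dot product per stored chunk per query. That is usually fine for a personal meeting archive, though a dedicated vector index would be the next step at larger scale.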

Conversational RAG Chat

New "Chat" tab in the UI
LLM-powered conversational responses using LangChain
Context-aware answers using retrieved transcript chunks
Integration with existing LLM settings (Ollama/OpenAI)
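A sketch of how retrieved chunks might be turned into a grounded prompt for the chat model, assuming each result carries the chunk text and a start time in seconds. `RetrievedChunk`, `formatTimestamp`, and `buildRagPrompt` are hypothetical; the PR's actual LangChain prompt template is not shown.

```typescript
// Hypothetical retrieved chunk; the PR's real result type may differ.
interface RetrievedChunk {
  recordingId: string;
  text: string;
  startSeconds: number;
}

// Formats seconds as mm:ss for human-readable citations.
function formatTimestamp(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
}

// Builds one prompt string: numbered sources first, then the user question.
function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  const sources = chunks
    .map((c, i) => `[${i + 1}] (${c.recordingId} @ ${formatTimestamp(c.startSeconds)}) ${c.text}`)
    .join("\n");
  return [
    "Answer the question using only the transcript excerpts below.",
    "Cite sources by their [number]. If the excerpts do not contain the answer, say so.",
    "",
    "Excerpts:",
    sources || "(no excerpts found)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the excerpts gives the model a stable handle to cite, which the UI can map back to a recording and timestamp.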

🎛️ Transcript Filtering

Searchable dropdown to filter chat context by specific transcripts
Visual indicators showing currently selected transcript
Seamless switching between filtered and unfiltered modes
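Restricting the chat context can be sketched as an optional recording filter applied before ranking, assuming each scored chunk is tagged with its `recordingId`. Passing `undefined` corresponds to the unfiltered mode. `ScoredChunk` and `topKForRecording` are hypothetical helpers; the PR passes a recording filter parameter to its vector search calls, but that signature is not shown.

```typescript
// Hypothetical scored chunk produced by a similarity search.
interface ScoredChunk {
  recordingId: string;
  text: string;
  score: number;
}

// Filter first, then rank: post-filtering a global top-k could return nothing
// for a recording whose chunks never make the global cut.
function topKForRecording(chunks: ScoredChunk[], k: number, recordingId?: string): ScoredChunk[] {
  const candidates =
    recordingId === undefined ? chunks : chunks.filter((c) => c.recordingId === recordingId);
  return [...candidates].sort((a, b) => b.score - a.score).slice(0, k);
}
```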

🛠️ Technical Implementation

Vector Store: SQLite with FTS5 for hybrid search
Embeddings: Uses existing LLM embedding models
RAG Pipeline: LangChain for document retrieval and response generation
UI: React components with Radix UI for transcript filtering
Integration: Seamlessly integrated into existing post-processing pipeline
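The description does not say how vector and FTS5 results are merged. One common option is reciprocal rank fusion (RRF), sketched below under the assumption that both searches return chunk IDs in ranked order. The constant `k = 60` is the value conventionally used for RRF, not something taken from the PR, and `reciprocalRankFusion` is a hypothetical helper.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) for every ID it contains.
// Ranks are 1-based. IDs appearing in several lists accumulate score from each.
function reciprocalRankFusion(rankedLists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```

Fusing by rank avoids having to normalize cosine similarities against FTS5 `bm25()` scores, which live on a different scale (and where lower values indicate better matches).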

📁 Files Changed

src/main/domain/vector-search.ts - Core vector search implementation
src/renderer/chat/chat-screen.tsx - Chat UI with transcript filtering
src/main/ipc/history-api.ts - API endpoints for vector search and chat
forge.config.ts & vite.base.config.ts - Native module bundling for SQLite
package.json - Added sqlite3 dependency

🧪 Testing

✅ Vector search working with semantic similarity
✅ Conversational chat generating natural responses
✅ Transcript filtering restricting context appropriately
✅ SQLite database persistence across app restarts
✅ Integration with existing LLM settings

🎉 Benefits

Free & Scalable: Uses a local SQLite database, no external services required
Persistent: Data survives app restarts
Conversational: Natural AI responses vs. just search results
Focused: Can filter to specific meeting transcripts
Integrated: Works with existing LLM infrastructure

This enhancement transforms Pensieve from a simple transcript viewer into an intelligent meeting assistant that can answer questions about your conversations!

@lukasbach Please consider for merge.

- Add SQLite-based vector search with persistent storage
- Implement conversational chat using LLM with RAG (Retrieval-Augmented Generation)
- Add Chat tab with semantic search and conversational responses
- Configure Electron Forge for SQLite native module bundling
- Add vector search to post-processing pipeline
- Merge vector-search-sqlite.ts into vector-search.ts for cleaner architecture
- Remove old JSON-based vector storage files
- Clean up unused dependencies (rxdb, fake-indexeddb)

Features:
- Vector similarity search using embeddings
- FTS5 text search for keyword matching
- Hybrid search combining vector and text results
- Conversational AI responses using LangChain
- Persistent SQLite database for vector storage
- Automatic transcript indexing during post-processing
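Indexing a transcript for retrieval typically starts by splitting it into overlapping chunks before embedding. A sketch assuming a word-based window; `chunkTranscript` is hypothetical, the `chunkSize` and `overlap` defaults are illustrative, and the PR's actual chunking strategy is not described.

```typescript
// Splits text into windows of `chunkSize` words, each sharing `overlap` words with the previous one.
function chunkTranscript(text: string, chunkSize = 200, overlap = 40): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once this window reached the end, so no trailing chunk is pure overlap.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.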

- Add searchable transcript dropdown to restrict chat context
- Implement TextField-based search with dropdown results
- Add visual indicator showing currently selected transcript
- Update vector search calls to use recording filter parameter
- Add clear filter functionality to return to all transcripts
- Improve chat UX with focused, transcript-specific responses

Features:
- Search transcripts by name or ID
- Filter chat context to specific recordings
- Visual feedback with green badge for active filter
- Seamless switching between filtered and unfiltered modes

- Fix vector database path to use cross-platform user data directory
- Use app.getPath('userData') for packaged apps, fall back to process.cwd() for development
- Fix forge.config.ts ignore patterns to include .vite build directory
- Ensure vector database persists in proper location across all platforms
- Resolve sqlite3 module loading issues in packaged apps
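The path logic in this commit can be sketched as a pure function so it is testable without launching Electron. In the app, `isPackaged` and `userDataDir` would come from Electron's `app.isPackaged` and `app.getPath('userData')`. The function name `resolveVectorDbPath` and the `vector-store.db` filename are hypothetical.

```typescript
import * as path from "node:path";

// Packaged builds store data in the per-user application data directory;
// development builds fall back to the working directory, as the commit describes.
function resolveVectorDbPath(
  isPackaged: boolean,
  userDataDir: string,
  cwd: string = process.cwd(),
): string {
  const baseDir = isPackaged ? userDataDir : cwd;
  return path.join(baseDir, "vector-store.db");
}

// In the main process (not executed here):
// import { app } from "electron";
// const dbPath = resolveVectorDbPath(app.isPackaged, app.getPath("userData"));
```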

lukasbach

Really like the idea! Unfortunately I get an error when trying to build and run the app locally with yarn start: An unhandled rejection has occurred inside Forge: Error: Could not detect abi for version 37.2.4 and runtime electron. Updating "node-abi" might help solve this issue if it is a new release of electron.
The error doesn't happen for me on the current main branch, and running yarn add node-abi@latest doesn't seem to fix it, any idea why?

lukasbach · 2025-09-21T09:27:06Z

src/main/ipc/history-api.ts

+
+  // Conversational chat functions
+  generateConversationalResponse: async (query: string, searchResults: any[]) => {
+    const { getChatModel, getEmbeddings } = await import("../domain/llm");


Can you change these to be normal top-level imports?

lukasbach · 2025-09-21T09:33:32Z

src/main/ipc/history-api.ts

+  getVectorStoreStats: async () => vectorSearch.getVectorStoreStats(),
+
+  // Conversational chat functions
+  generateConversationalResponse: async (query: string, searchResults: any[]) => {


I guess the type for searchResults would be VectorSearchResult[]?
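A sketch of what the typed signature could look like. The fields of `VectorSearchResult` below are hypothetical, since the PR's type definition is not shown here, and `summarizeSources` is an illustrative implementation rather than the PR's `generateConversationalResponse`.

```typescript
// Hypothetical fields; the real VectorSearchResult is defined in the PR's types file.
interface VectorSearchResult {
  recordingId: string;
  text: string;
  score: number;
}

// Typing the parameter lets the compiler reject malformed results at the IPC boundary
// instead of failing later inside the prompt-building code.
type GenerateConversationalResponse = (
  query: string,
  searchResults: VectorSearchResult[],
) => Promise<string>;

// Illustrative implementation that only touches typed fields (no `any` access).
const summarizeSources: GenerateConversationalResponse = async (query, searchResults) => {
  const ids = Array.from(new Set(searchResults.map((r) => r.recordingId)));
  return `${query}: ${searchResults.length} excerpt(s) from ${ids.length} recording(s)`;
};
```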

lukasbach · 2025-09-21T09:38:43Z

src/main/domain/vector-search.ts

+    }
+  }
+
+  private async loadSqliteVec() {


Please don't add unfinished code, if this is a plan for implementation in the future, it should come in a future PR, but we shouldn't have stubs of unfinished concepts in the code.

lukasbach · 2025-09-21T09:39:21Z

src/main/domain/vector-search.ts

+    if (!this.db) return;
+
+    return new Promise<void>((resolve, reject) => {
+      this.db!.serialize(() => {


Please use checks instead of non-null assertions, like calling if (!this.db) return; within the promise.
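A sketch of the suggested pattern: capture `this.db` in a local constant inside the executor and check it, instead of asserting with `!`. `DbLike` is a hypothetical minimal interface standing in for the sqlite3 `Database` (whose `serialize` and `run` methods use this callback style), and `VectorStoreSketch`, `clear`, and the `chunks` table are illustrative names. Note that a bare `return` in the executor would leave the promise pending forever, so the sketch resolves explicitly.

```typescript
// Hypothetical minimal subset of the sqlite3 Database API used here.
interface DbLike {
  serialize(callback: () => void): void;
  run(sql: string, callback: (err: Error | null) => void): void;
}

class VectorStoreSketch {
  private db: DbLike | null;

  constructor(db: DbLike | null) {
    this.db = db;
  }

  // Resolves immediately when there is no database; otherwise runs the statement.
  async clear(): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      // The const local is narrowed by the check, so callbacks need no `this.db!`.
      const { db } = this;
      if (!db) {
        resolve();
        return;
      }
      db.serialize(() => {
        db.run("DELETE FROM chunks", (err) => (err ? reject(err) : resolve()));
      });
    });
  }
}
```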

lukasbach · 2025-09-21T09:41:26Z

src/main/domain/vector-search.ts

+    });
+  }
+
+  async addTranscript(recordingId: string) {


Maybe run invalidateUiKeys within this and other methods that change underlying data, to allow the UI to refetch what has changed.
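A sketch of the suggested pattern, where every method that writes data notifies the UI afterwards. `invalidateUiKeys` exists in the project according to this review, but its signature is not shown, so it is modeled here as an injected callback taking query keys; `IndexedRecordings` and the key strings are hypothetical.

```typescript
// Hypothetical signature; the project's real invalidateUiKeys may differ.
type InvalidateUiKeys = (...keys: string[]) => void;

class IndexedRecordings {
  private readonly ids = new Set<string>();
  private readonly invalidate: InvalidateUiKeys;

  constructor(invalidate: InvalidateUiKeys) {
    this.invalidate = invalidate;
  }

  // Mutations invalidate only after the write succeeded and only if something changed,
  // so the renderer refetches exactly what is stale.
  async addTranscript(recordingId: string): Promise<void> {
    this.ids.add(recordingId);
    this.invalidate("vectorStoreStats", `recording:${recordingId}`);
  }

  async removeTranscript(recordingId: string): Promise<void> {
    if (this.ids.delete(recordingId)) {
      this.invalidate("vectorStoreStats", `recording:${recordingId}`);
    }
  }

  get size(): number {
    return this.ids.size;
  }
}
```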

Sync with main

This reverts commit f03b2b7.

- Convert dynamic imports to top-level imports in history-api.ts
- Fix TypeScript types: change any[] to VectorSearchResult[]
- Remove unfinished loadSqliteVec() stub code
- Replace non-null assertions with proper null checks
- Add invalidateUiKeys() calls for data modification methods
- Move VectorSearchResult type to main types file for better organization

All PR feedback from lukasbach#17 has been addressed.

- Add progress callback parameter to addTranscriptToVectorStore()
- Update embedding loop to report progress after each chunk (0-100%)
- Integrate progress tracking into post-processing pipeline
- Add 'Building search index' step to progress card UI
- Users now see real-time progress during vector embedding

This makes the vector search indexing process visible and provides clear feedback on how many transcript chunks have been processed.

rogerb831 · 2025-10-10T17:34:28Z

Address PR feedback: fix imports, types, and add UI invalidation

Convert dynamic imports to top-level imports in history-api.ts
Fix TypeScript types: change any[] to VectorSearchResult[]
Remove unfinished loadSqliteVec() stub code
Replace non-null assertions with proper null checks
Add invalidateUiKeys() calls for data modification methods
Move VectorSearchResult type to main types file for better organization

Add progress tracking for vector embedding step

Add progress callback parameter to addTranscriptToVectorStore()
Update embedding loop to report progress after each chunk (0-100%)
Integrate progress tracking into post-processing pipeline
Add 'Building search index' step to progress card UI
Users now see real-time progress during vector embedding
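The progress reporting can be sketched as an optional callback invoked after each embedded chunk with a percentage from 0 to 100. `embedAllChunks`, `embedChunk`, and `onProgress` are hypothetical; the real `addTranscriptToVectorStore()` signature is not shown. Chunks are embedded sequentially on purpose, which keeps progress monotonic and avoids flooding a local model with parallel requests.

```typescript
type ProgressCallback = (percent: number) => void;

// Embeds chunks one at a time and reports rounded progress after each one.
// An empty input reports 100 once so the UI step still completes.
async function embedAllChunks(
  chunks: string[],
  embedChunk: (text: string) => Promise<number[]>,
  onProgress?: ProgressCallback,
): Promise<number[][]> {
  const vectors: number[][] = [];
  if (chunks.length === 0) {
    onProgress?.(100);
    return vectors;
  }
  for (let i = 0; i < chunks.length; i++) {
    vectors.push(await embedChunk(chunks[i]));
    onProgress?.(Math.round(((i + 1) / chunks.length) * 100));
  }
  return vectors;
}
```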

@lukasbach the failures you experienced were likely from the lack of lock files. This should be corrected. Testing good on my end. Please consider for merge.

- Set microphone recording checkbox to be checked by default when starting a new recording
- Added placeholder microphone in initial state to ensure checkbox is checked
- Updated useEffect to replace placeholder with actual default microphone when available
- Improves user experience by reducing steps needed to start recording with microphone

- Add defaultRecordScreenAudio and defaultRecordMicrophone settings
- Connect recorder state to settings for persistence across app restarts
- Remove restrictions preventing unchecking both recording options
- Initialize recorder with user's saved preferences on startup
- Both checkboxes are now fully toggleable and state persists

lukasbach · 2025-10-10T23:07:25Z

If I try to run yarn install on this PR locally, I get the same error as in the CI pipeline for this PR (https://github.com/lukasbach/pensieve/actions/runs/18413849571/job/52472695256?pr=17):

➤ YN0001: @electron/node-gyp@git+https://github.com/electron/node-gyp.git#06b29aafb7708acef8b3669835c8a7857ebc92d2: Failed listing refs
➤ YN0001:   Repository URL: ssh://git@github.com/electron/node-gyp.git
➤ YN0001:   Fatal Error: Could not read from remote repository.
➤ YN0001:   Exit Code: 128

I think you might be using an older version of yarn; the changes to the yarn lockfile suggest that the PR migrates the lockfile to an older lockfile version. The repo pins its yarn and Node.js versions with Volta. You can use Volta to switch to the pinned versions automatically, or manually install the versions configured for this repo. Unfortunately, even if I revert the lockfile changes and reinstall with yarn, I still get the same node-abi error.

- Add 'Chat' tab to meeting screen next to transcript/summary/notes
- Create RecordingChat component with recording-scoped vector search
- Implement chat interface with source citations and jump-to functionality
- Enable focused Q&A about specific recordings with transcript context

- Fix @electron/node-gyp git dependency URL from ssh to https in yarn.lock
- Rebuild yarn.lock using upstream as foundation with vector search deps
- Add ESM compatibility resolutions (p-map, is-fullwidth-code-point, slice-ansi, string-width, node-abi)
- Configure rebuildConfig to skip native module rebuilding (avoids node-abi Electron 37.2.4 compatibility issue)
- Add Vite server wait fallback to prevent blank screen on startup
- Add rebuild:sqlite3 script for manual native module rebuilding

- Replace console.log/error with log.info/error in src/main.ts and postprocess.ts (upstream code)
- Fix TypeScript consistent-return errors in vector-search.ts
- Fix return-await errors in async methods
- Fix promise executor return value error in windows.ts
- Escape quotes and apostrophes in JSX (recording-chat.tsx)
- Fix import formatting and class member spacing in vector-search.ts
- Remove unused getSettings import
- Fix comment formatting in forge.config.ts

All critical linting errors resolved. Remaining warnings (dependency cycles, console statements) are acceptable.

- Fix Badge color type error in recording-chat.tsx (use style prop instead)
- Fix 'this.changes' access in vector-search.ts (use regular function for sqlite3 callback)
- Fix 'possibly null' error in vector-search.ts (capture db reference and add null check)
- Auto-fix prettier formatting issues

All TypeScript errors resolved. Typecheck now passes.

…ing when WAV is missing

- Add fallback to use MP3 file when WAV file doesn't exist in doWhisperStep
- Skip WAV creation if file already exists in doWavStep
- Skip MP3 creation if file already exists in doMp3Step
- Only remove WAV file if it was actually used (not MP3 fallback)

This fixes post-processing failures when WAV files have been removed but MP3 files are still available for transcription.
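The fallback can be sketched as a small selection function that also reports whether the WAV should be cleaned up afterwards. `selectWhisperInput` and `WhisperInput` are hypothetical, the `exists` parameter defaults to `fs.existsSync` and is injectable for testing, and the actual `doWhisperStep` code is not shown in the PR.

```typescript
import * as fs from "node:fs";

interface WhisperInput {
  inputPath: string;
  // Only a WAV that was actually used as input should be deleted afterwards.
  removeAfterUse: boolean;
}

// Prefer the WAV file; fall back to the MP3 when the WAV has already been removed.
function selectWhisperInput(
  wavPath: string,
  mp3Path: string,
  exists: (p: string) => boolean = fs.existsSync,
): WhisperInput {
  if (exists(wavPath)) return { inputPath: wavPath, removeAfterUse: true };
  if (exists(mp3Path)) return { inputPath: mp3Path, removeAfterUse: false };
  throw new Error(`Neither ${wavPath} nor ${mp3Path} exists`);
}
```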

rogerb831 · 2025-11-05T21:28:55Z

@lukasbach This is now passing the pipeline. Please test and merge if you see fit.

- Add macOS to verify.yml workflow matrix to test builds on both Windows and macOS
- Enable macOS in publish.yml workflow matrix to publish macOS distributables
- Both workflows now run on windows-latest and macos-latest

This enables building and testing macOS DMG and ZIP distributables in CI.

- Add packageAfterPrune hook to remove problematic .bin symlinks that break ASAR packaging
- Recursively remove all .bin directories and symlinks from node_modules
- Add architecture matrix to workflows for building both arm64 and x64 macOS distributables
- Windows builds unchanged (no architecture specified)
- macOS builds now test both arm64 and x64 architectures

The symlink removal is safe as .bin directories are only needed for development (npm/yarn scripts), not at runtime.
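The cleanup can be sketched as a recursive walk that deletes every entry named `.bin`. `removeBinDirs` is a hypothetical helper; in Forge it would presumably be called from the `packageAfterPrune` hook with the pruned build path, and that wiring is not shown here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively deletes every `.bin` directory or symlink below `dir` and returns
// the number removed. Symlinked directories are not followed during the walk.
function removeBinDirs(dir: string): number {
  let removed = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.name === ".bin") {
      // For a symlink, rmSync removes the link itself, not its target.
      fs.rmSync(fullPath, { recursive: true, force: true });
      removed++;
    } else if (entry.isDirectory()) {
      removed += removeBinDirs(fullPath);
    }
  }
  return removed;
}
```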

- Change x64 macOS builds to use macos-15-intel runner (Intel hardware)
- macos-latest runners are ARM64 and cannot natively build x64

- Update forge.config.ts to read repository owner and name from environment variables
  - Owner: Uses GITHUB_REPOSITORY_OWNER or parses GITHUB_REPOSITORY
  - Name: Parses GITHUB_REPOSITORY to extract repo name
  - Falls back to hardcoded 'lukasbach/pensieve' if env vars not set
- Update publish.yml workflow to pass repository info as environment variables
  - Sets GITHUB_REPOSITORY_OWNER from github.repository_owner
  - Sets GITHUB_REPOSITORY from github.repository
- Allows publishing releases to any fork that runs the workflow
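The environment-based lookup can be sketched as below, taking the environment as a parameter for testability. `resolvePublishRepo` is a hypothetical name; `GITHUB_REPOSITORY` (in `owner/name` form) and `GITHUB_REPOSITORY_OWNER` are the variables the commit describes, which GitHub Actions also provides by default.

```typescript
interface RepoTarget {
  owner: string;
  name: string;
}

// Resolves the GitHub repository to publish to, falling back to upstream.
function resolvePublishRepo(
  env: Record<string, string | undefined> = process.env,
): RepoTarget {
  const [repoOwner, repoName] = (env.GITHUB_REPOSITORY ?? "").split("/");
  const owner = env.GITHUB_REPOSITORY_OWNER || repoOwner || "lukasbach";
  const name = repoName || "pensieve";
  return { owner, name };
}
```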

- Move version bump to dedicated 'bump-version' job that runs first
- Add 'needs: bump-version' to publish job so it waits for version bump
- Prevents conflicts when multiple matrix jobs try to bump version simultaneously
- Each publish job will checkout the code with the updated version already committed

- Split workflow into three jobs: build, bump-version, publish
- Build job runs all platforms in parallel, uploads distributables as artifacts
- Bump-version job only runs if all builds succeed (needs: build)
- Publish job downloads pre-built distributables and publishes them
- Version is only incremented if builds succeed, preventing version bumps on failures
- Uses artifact_suffix in matrix for consistent artifact naming

- Add condition to bump-version job: only runs if github.repository == 'lukasbach/pensieve'
- Update publish job to depend on both build and bump-version
- Publish job runs if bump-version succeeded or was skipped (for forks)
- Allows forks to publish releases without version bumps
- Version bumps should only happen in upstream CI

- Remove build-first approach that separated build and publish jobs
- Keep fork-friendly logic from bca725a (skip version bump for non-upstream)
- Restore simpler workflow: bump-version -> publish (builds and publishes)
- Version bump happens before builds, ensuring correct version in distributables


- Generate .icns file for macOS using iconutil
- Create all required macOS icon sizes (16x16 through 1024x1024 with @2x variants)
- Update packagerConfig to use base icon path for automatic platform detection
- Update MakerDMG to use .icns format
- Rename the Windows ICO file to icon.ico for standard naming
- Generate both .ico and .icns formats for cross-platform compatibility

Toggleable recorder checkboxes

Fix/mac distributable

rogerb831 added 3 commits September 13, 2025 18:53

Initial vector search

db2cffb

github-project-automation bot added this to Pensieve Backlog Sep 13, 2025

github-project-automation bot moved this to Backlog in Pensieve Backlog Sep 13, 2025

rogerb831 changed the title ~~Feature/vector search~~ Feature/Vector Search & Chatbot Sep 13, 2025

rogerb831 added 2 commits September 13, 2025 19:13

Add package-lock.json to .gitignore

f03b2b7

lukasbach reviewed Sep 21, 2025


rogerb831 and others added 5 commits October 10, 2025 11:57

Merge pull request #1 from lukasbach/main

929cbc4


Revert "Add package-lock.json to .gitignore"

a6fe749


Config fixes

752ec99

rogerb831 added 2 commits October 10, 2025 14:45

rogerb831 added 5 commits October 14, 2025 14:10

rogerb831 added 6 commits November 5, 2025 16:36

Fix CI: use macos-15-intel for x64 builds and fix console statements

d2c608d


Fix linting error: format repository name on single line

77ddd08

rogerb831 and others added 10 commits November 5, 2025 18:13

Fix Prettier formatting errors in forge.config.ts

b71b9d9

Fix Prettier formatting errors in recorder state.ts

d4c1d1a

Merge pull request #2 from rogerb831/toggleable-recorder-checkboxes

8f462e5


Merge pull request #3 from rogerb831/fix/mac-distributable

958d55c


Merge branch 'lukasbach:main' into feature/vector-search

1b8198f

Merge branch 'main' into feature/vector-search

8855e38


Feature/Vector Search & Chatbot #17

rogerb831 wants to merge 33 commits into lukasbach:main from rogerb831:feature/vector-search


2 participants
