Add resume upload and PDF text extraction functionality #1
Merged
- Add user_resume table migration (010) to store PDF files as BLOB
- Create UserResume model with file metadata and formatted size display
- Create ResumeRepository for CRUD operations on user resume
- Add resume section to SettingsView with upload/replace/delete functionality
- Validate PDF files using magic bytes and enforce 10MB size limit
- Display upload status with progress indicator and success/error messages
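The magic-byte and size checks listed above could look roughly like the sketch below. It is not the code from this PR: `ResumeValidationError`, `validateResume`, `maxResumeBytes`, and `pdfMagic` are hypothetical names, and the sketch assumes "10MB" means 10 × 1024 × 1024 bytes and that a PDF is recognized by the leading bytes `%PDF-`.

```swift
import Foundation

// Hypothetical error type for illustration; the PR's ResumeError cases are not shown here.
enum ResumeValidationError: Error, Equatable {
    case tooLarge(byteCount: Int)
    case notPDF
}

// Assumption: the 10MB limit is interpreted as 10 MiB.
let maxResumeBytes = 10 * 1024 * 1024

// ASCII bytes for "%PDF-", the header that starts a PDF file.
let pdfMagic: [UInt8] = [0x25, 0x50, 0x44, 0x46, 0x2D]

/// Throws if `data` exceeds the size limit or does not start with the PDF header.
func validateResume(_ data: Data) throws {
    guard data.count <= maxResumeBytes else {
        throw ResumeValidationError.tooLarge(byteCount: data.count)
    }
    guard data.starts(with: pdfMagic) else {
        throw ResumeValidationError.notPDF
    }
}
```

Checking the leading bytes rather than the file extension means a renamed non-PDF file is rejected before it reaches storage. The size check runs first so an oversized file is reported as too large regardless of its content.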
- Add UserResume model tests for formattedFileSize property
- Add ResumeRepository tests for CRUD operations (save, get, update, delete)
- Add PDF validation tests for magic bytes verification
- Add file size validation tests for 10MB limit enforcement
- Add ResumeError tests for error descriptions
- Add Zoni package dependency for PDF processing
- Add database migrations for extracted_text column and resume_chunks table
- Create ResumeTextService using Zoni PDFLoader and SentenceChunker
- Update UserResume model with extraction status and extracted text fields
- Add ResumeChunk model for storing text chunks
- Extend ResumeRepository with text extraction and chunk operations
- Update SettingsView to show extraction status and trigger extraction
- Add comprehensive tests for new functionality

Text extraction features:
- Automatic extraction after PDF upload
- Sentence-based chunking (500 char target, 50-1000 range)
- Status tracking (pending/processing/completed/failed)
- Retry button for failed extractions
- Chunk count display in settings
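To make the chunking parameters concrete, here is a minimal sketch of one way to pack sentences into chunks with a 500-character target and a 50-1000 range. It is an assumption about the behavior, not the Zoni SentenceChunker or the PR's implementation: `ChunkLimits` and `packSentences` are hypothetical names, and the exact rules (flush once the target is reached, hard-split anything over the maximum, merge only a short trailing remainder) are one plausible interpretation.

```swift
import Foundation

// Hypothetical limits mirroring the numbers in the commit message.
struct ChunkLimits {
    var target = 500
    var minimum = 50
    var maximum = 1000
}

/// Greedily packs already-split sentences into chunks.
/// Assumed rules: a chunk is closed once it reaches `target`; no chunk exceeds
/// `maximum`; a trailing remainder shorter than `minimum` is merged into the
/// previous chunk when that still fits under `maximum`.
func packSentences(_ sentences: [String], limits: ChunkLimits = ChunkLimits()) -> [String] {
    var chunks: [String] = []
    var current = ""

    for sentence in sentences {
        let trimmed = sentence.trimmingCharacters(in: .whitespacesAndNewlines)
        if trimmed.isEmpty { continue }

        let candidate = current.isEmpty ? trimmed : current + " " + trimmed
        if candidate.count > limits.maximum, !current.isEmpty {
            // Adding this sentence would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = trimmed
        } else {
            current = candidate
        }

        // A single sentence longer than the maximum is split at the character limit.
        while current.count > limits.maximum {
            chunks.append(String(current.prefix(limits.maximum)))
            current = String(current.dropFirst(limits.maximum))
        }

        if current.count >= limits.target {
            chunks.append(current)
            current = ""
        }
    }

    if !current.isEmpty {
        if current.count < limits.minimum,
           let last = chunks.last,
           last.count + 1 + current.count <= limits.maximum {
            chunks[chunks.count - 1] = last + " " + current
        } else {
            chunks.append(current)
        }
    }
    return chunks
}
```

Note that in this sketch the minimum is only enforced for the final remainder; a short chunk can still appear mid-stream when the next sentence is very long.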
- Build macOS app on macos-14 runner with Xcode 15.4
- Resolve Swift Package dependencies
- Run unit tests
- Trigger on pushes to main and claude/* branches
- Trigger on PRs to main
Change SSH URLs to HTTPS URLs for package dependencies:
- html2md: git@github.com -> https://github.com
- SwiftAgents: git@github.com -> https://github.com

SSH URLs require authentication keys not available in CI environments.
- Use macos-15 runner for latest Xcode support
- Auto-detect latest available Xcode version
- Add project info listing step
- Use clonedSourcePackagesDirPath for SPM cache
- Add build log artifact upload on failure
- Improve error detection and logging
Zoni doesn't have releases yet, so use branch-based dependency instead of version-based.
…action
- Remove Zoni package dependency from Xcode project
- Rewrite ResumeTextService to use PDFKit for PDF text extraction
- Use NaturalLanguage NLTokenizer for sentence-based text chunking
- This eliminates the external dependency that was causing CI build failures
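For readers unfamiliar with the two Apple frameworks named above, the following sketch shows how PDFKit and NLTokenizer are typically used for this kind of task. It is not the PR's ResumeTextService: `extractText(fromPDF:)` and `splitIntoSentences` are hypothetical helper names, and the sketch only runs on Apple platforms where PDFKit and NaturalLanguage are available.

```swift
import Foundation
import NaturalLanguage
import PDFKit

/// Returns the text of all pages joined by newlines, or nil if `data`
/// cannot be opened as a PDF. Image-only PDFs yield an empty string.
func extractText(fromPDF data: Data) -> String? {
    guard let document = PDFDocument(data: data) else { return nil }
    var pages: [String] = []
    for index in 0..<document.pageCount {
        // `page(at:)` and `string` are both optional; skip pages without text.
        if let text = document.page(at: index)?.string, !text.isEmpty {
            pages.append(text)
        }
    }
    return pages.joined(separator: "\n")
}

/// Splits text into sentences using the system sentence tokenizer.
func splitIntoSentences(_ text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text
    var sentences: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        sentences.append(String(text[range]))
        return true  // keep enumerating
    }
    return sentences
}
```

The sentences returned this way can keep trailing whitespace, so a caller that groups them into chunks would likely trim each one first.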
Summary
This PR adds comprehensive resume management capabilities to JobScout, allowing users to upload PDF resumes, extract text content, and store the data locally for use in job applications.
Key Changes
Database & Models
- 010_AddUserResume: Creates user_resume table for storing PDF files and metadata
- 011_AddResumeExtractedText: Adds text extraction status tracking columns
- 012_AddResumeChunks: Creates resume_chunks table for storing chunked text segments
- UserResume model with extraction status tracking (pending/processing/completed/failed)
- ResumeChunk model for managing text chunks with metadata
New Services & Repositories
UI Updates
Infrastructure
build-and-test.yml)
Implementation Details