Add resume upload and PDF text extraction functionality#1

Merged
iliasaz merged 9 commits into main from claude/resume-upload-feature-olzhi
Jan 25, 2026

@iliasaz iliasaz commented Jan 25, 2026

Summary

This PR adds comprehensive resume management capabilities to JobScout, allowing users to upload PDF resumes, extract text content, and store the data locally for use in job applications.

Key Changes

Database & Models

  • Added three new database migrations:
    • 010_AddUserResume: Creates user_resume table for storing PDF files and metadata
    • 011_AddResumeExtractedText: Adds text extraction status tracking columns
    • 012_AddResumeChunks: Creates resume_chunks table for storing chunked text segments
  • Created UserResume model with extraction status tracking (pending/processing/completed/failed)
  • Created ResumeChunk model for managing text chunks with metadata
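As a rough illustration of the models listed above, the following sketch shows one way the extraction status and chunk metadata could be represented. Only the names UserResume, ResumeChunk, and formattedFileSize come from this PR; every other type, property, and the use of ByteCountFormatter are assumptions made for the example, not the merged code.

```swift
import Foundation

// Hypothetical enum covering the four states named in the PR description.
enum ExtractionStatus: String, Codable {
    case pending, processing, completed, failed
}

// Assumed shape of the resume record; column and property names are illustrative.
struct UserResume {
    var fileName: String
    var pdfData: Data
    var uploadedAt: Date
    var extractionStatus: ExtractionStatus = .pending
    var extractedText: String? = nil
    var extractionError: String? = nil

    // Size of the stored PDF in bytes.
    var fileSize: Int { pdfData.count }

    // Human-readable size for display; the exact string depends on the locale.
    var formattedFileSize: String {
        ByteCountFormatter.string(fromByteCount: Int64(fileSize), countStyle: .file)
    }
}

// Assumed shape of a chunk with the word and character counts the PR mentions.
struct ResumeChunk {
    let index: Int
    let content: String

    var characterCount: Int { content.count }
    var wordCount: Int {
        content.split(whereSeparator: { $0.isWhitespace }).count
    }
}
```

Storing the status as a string-backed enum keeps the database column readable and lets an unknown value fail decoding instead of silently mapping to a default.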

New Services & Repositories

  • ResumeRepository: Actor-based repository for CRUD operations on resumes and chunks
    • Supports saving, updating, and deleting resumes (one resume at a time)
    • Manages text extraction status and error tracking
    • Handles chunk storage and retrieval with transaction support
  • ResumeTextService: Actor-based service for PDF text extraction and intelligent chunking
    • Uses PDFKit for PDF parsing
    • Implements sentence-based chunking with configurable size constraints (50-1000 characters, target 500)
    • Uses NaturalLanguage framework for sentence tokenization
    • Provides word count and character count metadata for each chunk
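The chunking behaviour described above can be sketched roughly as follows. This is one plausible reading of "sentence-based chunking with a 500-character target and a 50-1000 range", not the algorithm in ResumeTextService: the names ChunkingConfig, splitSentences, and chunkSentences are hypothetical, and the greedy packing and tail-merge rules are assumptions. NLTokenizer is used where the NaturalLanguage framework is available; the fallback splitter exists only so the sketch compiles elsewhere.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

// Hypothetical configuration mirroring the limits quoted in the PR.
struct ChunkingConfig {
    var minSize = 50
    var targetSize = 500
    var maxSize = 1000
}

// Splits text into trimmed, non-empty sentences.
func splitSentences(_ text: String) -> [String] {
    #if canImport(NaturalLanguage)
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text
    let pieces = tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .map { String(text[$0]) }
    #else
    // Crude fallback: cut after '.', '!' or '?'. Much less accurate than NLTokenizer.
    var pieces: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if character == "." || character == "!" || character == "?" {
            pieces.append(current)
            current = ""
        }
    }
    pieces.append(current)
    #endif
    return pieces
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
}

// Greedily packs whole sentences into chunks close to the target size.
// A single sentence longer than maxSize is kept intact in this sketch.
func chunkSentences(_ sentences: [String],
                    config: ChunkingConfig = ChunkingConfig()) -> [String] {
    var chunks: [String] = []
    var current = ""
    for sentence in sentences {
        let candidate = current.isEmpty ? sentence : current + " " + sentence
        if !current.isEmpty && candidate.count > config.targetSize {
            // Adding this sentence would overshoot the target: close the chunk.
            chunks.append(current)
            current = sentence
        } else {
            current = candidate
        }
    }
    if !current.isEmpty {
        // Fold an undersized tail into the previous chunk when it still fits.
        if current.count < config.minSize, let last = chunks.last,
           last.count + 1 + current.count <= config.maxSize {
            chunks[chunks.count - 1] = last + " " + current
        } else {
            chunks.append(current)
        }
    }
    return chunks
}
```

A caller would typically write `chunkSentences(splitSentences(extractedText))`. Cutting only at sentence boundaries means chunk sizes are approximate, which is why the description gives a target and a range instead of a fixed length.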

UI Updates

  • Enhanced SettingsView with new Resume section:
    • Displays the current resume with file name, size, and upload date
    • Shows text extraction status with visual indicators
    • Supports uploading, replacing, and deleting the resume
    • Retry extraction button for failed extractions
    • File picker limited to PDF files with 10 MB size limit
    • Real-time status updates during upload and extraction
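The PDF-only and 10 MB restrictions on uploads could be enforced with a check along these lines. This is a minimal sketch under stated assumptions: ResumeValidationError, maxResumeBytes, validateResume, and isAcceptableResume are hypothetical names, and treating "10 MB" as 10 × 1024 × 1024 bytes is a guess about the intended limit. The only fixed fact used is that PDF files start with the ASCII bytes `%PDF-`.

```swift
import Foundation

// Hypothetical error type for rejected uploads.
enum ResumeValidationError: Error, Equatable {
    case notAPDF
    case tooLarge(bytes: Int)
}

// Assumption: the 10 MB limit is interpreted as 10 MiB.
let maxResumeBytes = 10 * 1024 * 1024

// Throws if the data does not look like a PDF or exceeds the size limit.
func validateResume(_ data: Data) throws {
    // "%PDF-" in ASCII; a file extension alone is not trusted.
    let magic: [UInt8] = [0x25, 0x50, 0x44, 0x46, 0x2D]
    guard data.count >= magic.count,
          data.prefix(magic.count).elementsEqual(magic) else {
        throw ResumeValidationError.notAPDF
    }
    guard data.count <= maxResumeBytes else {
        throw ResumeValidationError.tooLarge(bytes: data.count)
    }
}

// Convenience wrapper for callers that only need a yes/no answer.
func isAcceptableResume(_ data: Data) -> Bool {
    do {
        try validateResume(data)
        return true
    } catch {
        return false
    }
}
```

Checking the leading bytes catches renamed files that the file picker's type filter would let through, and doing it before any parsing keeps malformed input away from the extraction step.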

Infrastructure

  • Added GitHub Actions CI/CD workflow (build-and-test.yml)
    • Runs on macOS 15 with automatic Xcode version selection
    • Resolves Swift Package dependencies
    • Builds and tests the project with detailed error logging
    • Uploads build logs on failure for debugging
  • Updated package repository URLs from SSH to HTTPS for better CI/CD compatibility

Implementation Details

  • Single Resume Model: Only one resume is stored at a time; uploading a new resume replaces the previous one
  • Async/Await Architecture: All database and service operations use Swift's actor model for thread safety
  • Intelligent Chunking: Text is chunked by sentences to preserve semantic boundaries while maintaining size constraints
  • Status Tracking: Extraction status is tracked with error messages for failed operations
  • Transaction Safety: Combined operations (save text + chunks) execute atomically
  • File Validation: PDFs are validated by magic bytes before processing
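To make the single-resume and atomic-update points above concrete, here is a small in-memory stand-in for an actor-based repository. It is not the ResumeRepository from this PR: the real one is backed by database tables and transactions, while InMemoryResumeStore, StoredResume, ExtractionState, and StoreError are hypothetical names invented for this sketch.

```swift
import Foundation

// Hypothetical status type; the failed case carries the error message.
enum ExtractionState: Equatable, Sendable {
    case pending, processing, completed
    case failed(message: String)
}

// Hypothetical record combining the resume, its text, and its chunks.
struct StoredResume: Sendable {
    var fileName: String
    var data: Data
    var state: ExtractionState = .pending
    var extractedText: String? = nil
    var chunks: [String] = []
}

enum StoreError: Error { case noResume, emptyText }

// The actor serializes access, so callers never observe a half-applied update.
actor InMemoryResumeStore {
    private var resume: StoredResume?

    // Saving always replaces whatever was stored before (single-resume model).
    func save(fileName: String, data: Data) {
        resume = StoredResume(fileName: fileName, data: data)
    }

    func current() -> StoredResume? { resume }

    // Text, chunks, and status change together or not at all: all checks run
    // on a local copy, which is committed with one assignment.
    func saveExtraction(text: String, chunks: [String]) throws {
        guard var updated = resume else { throw StoreError.noResume }
        guard !text.isEmpty else { throw StoreError.emptyText }
        updated.extractedText = text
        updated.chunks = chunks
        updated.state = .completed
        resume = updated
    }

    func markFailed(_ message: String) {
        resume?.state = .failed(message: message)
    }

    func delete() { resume = nil }
}
```

From outside the actor every call is awaited, for example `try await store.saveExtraction(text: text, chunks: chunks)`. With a real database the same guarantee would come from wrapping the writes in one transaction.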

- Add user_resume table migration (010) to store PDF files as BLOB
- Create UserResume model with file metadata and formatted size display
- Create ResumeRepository for CRUD operations on user resume
- Add resume section to SettingsView with upload/replace/delete functionality
- Validate PDF files using magic bytes and enforce 10MB size limit
- Display upload status with progress indicator and success/error messages

- Add UserResume model tests for formattedFileSize property
- Add ResumeRepository tests for CRUD operations (save, get, update, delete)
- Add PDF validation tests for magic bytes verification
- Add file size validation tests for 10MB limit enforcement
- Add ResumeError tests for error descriptions

- Add Zoni package dependency for PDF processing
- Add database migrations for extracted_text column and resume_chunks table
- Create ResumeTextService using Zoni PDFLoader and SentenceChunker
- Update UserResume model with extraction status and extracted text fields
- Add ResumeChunk model for storing text chunks
- Extend ResumeRepository with text extraction and chunk operations
- Update SettingsView to show extraction status and trigger extraction
- Add comprehensive tests for new functionality

Text extraction features:
- Automatic extraction after PDF upload
- Sentence-based chunking (500 char target, 50-1000 range)
- Status tracking (pending/processing/completed/failed)
- Retry button for failed extractions
- Chunk count display in settings

- Build macOS app on macos-14 runner with Xcode 15.4
- Resolve Swift Package dependencies
- Run unit tests
- Trigger on pushes to main and claude/* branches
- Trigger on PRs to main

Change SSH URLs to HTTPS URLs for package dependencies:
- html2md: git@github.com -> https://github.com
- SwiftAgents: git@github.com -> https://github.com

SSH URLs require authentication keys not available in CI environments.

- Use macos-15 runner for latest Xcode support
- Auto-detect latest available Xcode version
- Add project info listing step
- Use clonedSourcePackagesDirPath for SPM cache
- Add build log artifact upload on failure
- Improve error detection and logging

Zoni doesn't have releases yet, so use branch-based dependency
instead of version-based.

…action

- Remove Zoni package dependency from Xcode project
- Rewrite ResumeTextService to use PDFKit for PDF text extraction
- Use NaturalLanguage NLTokenizer for sentence-based text chunking
- This eliminates the external dependency that was causing CI build failures

@iliasaz iliasaz merged commit b077b71 into main Jan 25, 2026
0 of 2 checks passed
@iliasaz iliasaz deleted the claude/resume-upload-feature-olzhi branch January 25, 2026 22:13