Add resume upload and PDF text extraction functionality by iliasaz · Pull Request #1 · iliasaz/JobScout

iliasaz · 2026-01-25T21:39:15Z

Summary

This PR adds resume management to JobScout: users can upload a PDF resume, have its text extracted, and store the result locally for use in job applications.

Key Changes

Database & Models

Added three new database migrations:
- 010_AddUserResume: Creates user_resume table for storing PDF files and metadata
- 011_AddResumeExtractedText: Adds text extraction status tracking columns
- 012_AddResumeChunks: Creates resume_chunks table for storing chunked text segments
Created UserResume model with extraction status tracking (pending/processing/completed/failed)
Created ResumeChunk model for managing text chunks with metadata
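
As a rough illustration of the models listed above, a minimal sketch might look like the following. Only the four status values and the `formattedFileSize` property come from this PR; every other field name is an assumption for illustration, and the actual models likely carry database-specific details not shown here.

```swift
import Foundation

// The four cases mirror the statuses named in the PR description.
enum ExtractionStatus: String, Codable {
    case pending, processing, completed, failed
}

// Hypothetical field names; the real UserResume model may differ.
struct UserResume {
    var fileName: String
    var fileSize: Int64                 // size in bytes
    var uploadedAt: Date
    var extractionStatus: ExtractionStatus = .pending
    var extractionError: String? = nil  // set when extraction fails
    var extractedText: String? = nil

    // Human-readable size; the exact string depends on the formatter and locale.
    var formattedFileSize: String {
        ByteCountFormatter.string(fromByteCount: fileSize, countStyle: .file)
    }
}

// Hypothetical chunk model with the per-chunk metadata described in the PR.
struct ResumeChunk {
    var index: Int
    var text: String
    var characterCount: Int { text.count }
    var wordCount: Int { text.split(whereSeparator: { $0.isWhitespace }).count }
}
```

A newly uploaded resume would start in `.pending` and move through `.processing` to either `.completed` or `.failed`.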

New Services & Repositories

ResumeRepository: Actor-based repository for CRUD operations on resumes and chunks
- Supports saving, updating, and deleting resumes (one resume at a time)
- Manages text extraction status and error tracking
- Handles chunk storage and retrieval with transaction support
ResumeTextService: Actor-based service for PDF text extraction and intelligent chunking
- Uses PDFKit for PDF parsing
- Implements sentence-based chunking with configurable size constraints (50-1000 characters, target 500)
- Uses NaturalLanguage framework for sentence tokenization
- Provides word count and character count metadata for each chunk
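
The sentence-based chunking described above could be sketched roughly as follows. This is not the PR's implementation: it assumes the sentences have already been produced by a tokenizer (the PR uses NLTokenizer for that step), and `ChunkLimits` and `chunkSentences` are hypothetical names. Only the 50/500/1000 character figures come from the PR.

```swift
import Foundation

// Size constraints in characters, taken from the PR description.
struct ChunkLimits {
    var minSize = 50
    var targetSize = 500
    var maxSize = 1000
}

// Greedily packs whole sentences into chunks so that no sentence is split.
// A single sentence longer than maxSize is emitted as its own oversized chunk.
func chunkSentences(_ sentences: [String], limits: ChunkLimits = ChunkLimits()) -> [String] {
    var chunks: [String] = []
    var current = ""

    for raw in sentences {
        let sentence = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        if sentence.isEmpty { continue }

        let candidate = current.isEmpty ? sentence : current + " " + sentence
        // Close the current chunk if adding this sentence would exceed the hard
        // maximum, or if the chunk has already reached the target size.
        if !current.isEmpty &&
            (candidate.count > limits.maxSize || current.count >= limits.targetSize) {
            chunks.append(current)
            current = sentence
        } else {
            current = candidate
        }
    }

    if !current.isEmpty {
        // Merge a trailing fragment below the minimum into the previous chunk
        // when it fits; otherwise keep it as a short final chunk.
        if current.count < limits.minSize, let last = chunks.last,
           last.count + 1 + current.count <= limits.maxSize {
            chunks[chunks.count - 1] = last + " " + current
        } else {
            chunks.append(current)
        }
    }
    return chunks
}
```

Packing whole sentences keeps each chunk semantically self-contained, at the cost of chunk sizes that only approximate the target.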

UI Updates

Enhanced SettingsView with new Resume section:
- Displays the current resume with file name, size, and upload date
- Shows text extraction status with visual indicators
- Upload, replace, and delete resume functionality
- Retry extraction button for failed extractions
- File picker limited to PDF files with 10 MB size limit
- Real-time status updates during upload and extraction

Infrastructure

Added GitHub Actions CI/CD workflow (build-and-test.yml)
- Runs on macOS 15 with automatic Xcode version selection
- Resolves Swift Package dependencies
- Builds and tests the project with detailed error logging
- Uploads build logs on failure for debugging
Updated package repository URLs from SSH to HTTPS for better CI/CD compatibility

Implementation Details

Single Resume Model: Only one resume is stored at a time; uploading a new resume replaces the previous one
Async/Await Architecture: All database and service operations use Swift's actor model for thread safety
Intelligent Chunking: Text is chunked by sentences to preserve semantic boundaries while maintaining size constraints
Status Tracking: Extraction status is tracked with error messages for failed operations
Transaction Safety: Combined operations (save text + chunks) execute atomically
File Validation: PDFs are validated by magic bytes before processing
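
To make the single-resume and atomic-update semantics above concrete, here is a small in-memory stand-in. It is only a sketch: the PR's ResumeRepository is database-backed and uses transactions, whereas `InMemoryResumeStore` and `StoredResume` are hypothetical names that rely on actor isolation alone.

```swift
import Foundation

// Hypothetical value type holding one resume with its extraction results.
struct StoredResume: Sendable {
    var fileName: String
    var data: Data
    var extractedText: String? = nil
    var chunks: [String] = []
}

// The actor serializes access, so callers never observe a half-applied update.
actor InMemoryResumeStore {
    private var resume: StoredResume?

    // Saving replaces any previous resume, including its text and chunks.
    func save(fileName: String, data: Data) {
        resume = StoredResume(fileName: fileName, data: data)
    }

    func current() -> StoredResume? { resume }

    // Text and chunks are written in one step; returns false if no resume exists.
    func saveExtraction(text: String, chunks: [String]) -> Bool {
        guard var updated = resume else { return false }
        updated.extractedText = text
        updated.chunks = chunks
        resume = updated
        return true
    }

    func delete() { resume = nil }
}
```

Calls from outside the actor are made with `await`, for example `await store.save(fileName: "cv.pdf", data: pdfData)`.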

- Add user_resume table migration (010) to store PDF files as BLOB
- Create UserResume model with file metadata and formatted size display
- Create ResumeRepository for CRUD operations on user resume
- Add resume section to SettingsView with upload/replace/delete functionality
- Validate PDF files using magic bytes and enforce 10MB size limit
- Display upload status with progress indicator and success/error messages
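
The magic-byte and size checks mentioned in this commit might look roughly like the sketch below. The function and error names are hypothetical (the PR has its own ResumeError type, whose cases are not shown here), and the limit assumes "10MB" means 10 × 1024 × 1024 bytes.

```swift
import Foundation

// Hypothetical error type for illustration only.
enum ResumeValidationError: Error, Equatable {
    case notAPDF
    case tooLarge(bytes: Int)
}

// Assumption: the 10 MB limit is interpreted as 10 MiB.
let maxResumeSize = 10 * 1024 * 1024

// Rejects oversized files first, then checks that the data starts with "%PDF-".
func validateResume(_ data: Data) throws {
    guard data.count <= maxResumeSize else {
        throw ResumeValidationError.tooLarge(bytes: data.count)
    }
    let magic: [UInt8] = [0x25, 0x50, 0x44, 0x46, 0x2D]  // ASCII "%PDF-"
    guard data.starts(with: magic) else {
        throw ResumeValidationError.notAPDF
    }
}
```

Checking the leading bytes rather than the file extension guards against renamed files, although it does not prove the rest of the file is a well-formed PDF.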

- Add UserResume model tests for formattedFileSize property
- Add ResumeRepository tests for CRUD operations (save, get, update, delete)
- Add PDF validation tests for magic bytes verification
- Add file size validation tests for 10MB limit enforcement
- Add ResumeError tests for error descriptions

- Add Zoni package dependency for PDF processing
- Add database migrations for extracted_text column and resume_chunks table
- Create ResumeTextService using Zoni PDFLoader and SentenceChunker
- Update UserResume model with extraction status and extracted text fields
- Add ResumeChunk model for storing text chunks
- Extend ResumeRepository with text extraction and chunk operations
- Update SettingsView to show extraction status and trigger extraction
- Add comprehensive tests for new functionality

Text extraction features:
- Automatic extraction after PDF upload
- Sentence-based chunking (500 char target, 50-1000 range)
- Status tracking (pending/processing/completed/failed)
- Retry button for failed extractions
- Chunk count display in settings

- Build macOS app on macos-14 runner with Xcode 15.4
- Resolve Swift Package dependencies
- Run unit tests
- Trigger on pushes to main and claude/* branches
- Trigger on PRs to main

Change SSH URLs to HTTPS URLs for package dependencies:
- html2md: git@github.com -> https://github.com
- SwiftAgents: git@github.com -> https://github.com

SSH URLs require authentication keys not available in CI environments.

- Use macos-15 runner for latest Xcode support
- Auto-detect latest available Xcode version
- Add project info listing step
- Use clonedSourcePackagesDirPath for SPM cache
- Add build log artifact upload on failure
- Improve error detection and logging

Zoni doesn't have releases yet, so use branch-based dependency instead of version-based.

…action
- Remove Zoni package dependency from Xcode project
- Rewrite ResumeTextService to use PDFKit for PDF text extraction
- Use NaturalLanguage NLTokenizer for sentence-based text chunking
- This eliminates the external dependency that was causing CI build failures

claude added 9 commits January 11, 2026 07:13

Add GitHub Actions workflow for build and test

c8b3cc0


Fix package URLs for GitHub Actions compatibility

6ae821c


Update Zoni dependency to use main branch

29d766d


Add verbose error output to CI workflow

b8d69a4

iliasaz merged commit b077b71 into main Jan 25, 2026
0 of 2 checks passed

iliasaz deleted the claude/resume-upload-feature-olzhi branch January 25, 2026 22:13

iliasaz merged 9 commits into main from claude/resume-upload-feature-olzhi
