Commit 50a8540

Merge branch 'master' into feat/ssa-ass-precise-positioning
2 parents 9ad9155 + 82daa7f commit 50a8540

53 files changed, with 2257 additions and 85 deletions.

.github/workflows/release.yml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ jobs:
 run: Compress-Archive -Path ./installer/* -DestinationPath ./CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}_win_portable.zip
 working-directory: ./windows
 - name: Build installer
-run: wix build -ext WixToolset.UI.wixext -d "AppVersion=${{ steps.get_version.outputs.VERSION }}" -o CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}.msi installer.wxs CustomUI.wxs
+run: wix build -arch x64 -ext WixToolset.UI.wixext -d "AppVersion=${{ steps.get_version.outputs.VERSION }}" -o CCExtractor.${{ steps.get_version.outputs.DISPLAY_VERSION }}.msi installer.wxs CustomUI.wxs
 working-directory: ./windows
 - name: Upload as asset
 uses: AButler/upload-release-assets@v3.0

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,7 @@ CVS
 mac/ccextractor
 linux/ccextractor
 linux/depend
+linux/build_scan/
 windows/x86_64-pc-windows-msvc/**
 windows/Debug/**
 windows/Debug-OCR/**
@@ -28,6 +29,7 @@ windows/Debug-Full/**
 windows/x64/**
 windows/ccextractor.VC.db
 build/
+build_*/
 
 ####
 # Python

OpenBSD/Makefile

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ MAINTAINER = Marc Espie <espie@openbsd.org>
 CATEGORIES = multimedia
 COMMENT = closed caption subtitles extractor
 HOMEPAGE = https://ccextractor.org
-V = 0.96
+V = 0.96.3
 DISTFILES = ccextractor.${V:S/.//}-src.zip
 MASTER_SITES = ${MASTER_SITE_SOURCEFORGE:=ccextractor/}
 DISTNAME = ccextractor-$V
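
Review note on the hunk above: `DISTFILES` is derived from `V` through the `${V:S/.//}` modifier, so the version bump also changes the expected archive name. The sketch below mimics that substitution in Python, assuming the modifier removes only the first literal dot because no `g` flag is given. The helper names `strip_first_dot` and `distfile_name` are hypothetical and exist only for illustration; whether an archive with the resulting name is actually published is not something this diff shows.

```python
def strip_first_dot(version):
    """Approximate ${V:S/.//}: drop the first literal '.' only.

    Assumption: without the 'g' flag, the make modifier replaces just
    the first occurrence, which str.replace(..., 1) imitates here.
    """
    return version.replace(".", "", 1)


def distfile_name(version):
    """Build the name that DISTFILES would expand to for this version."""
    return "ccextractor." + strip_first_dot(version) + "-src.zip"


if __name__ == "__main__":
    for version in ("0.96", "0.96.3"):
        print(version, "->", distfile_name(version))
```

Under that assumption, the old value expands to `ccextractor.096-src.zip` and the new one to `ccextractor.096.3-src.zip`, which may be worth checking against the published source archive.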

README.md

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@
 
 # CCExtractor
 
-<a href="https://travis-ci.org/CCExtractor/ccextractor"><img src="https://raw.githubusercontent.com/CCExtractor/ccextractor-org-media/master/static/macOS-build-badge-logo.png" width="20"></a> [![Build Status](https://travis-ci.org/CCExtractor/ccextractor.svg?branch=master)](https://travis-ci.org/CCExtractor/ccextractor)
 [![Sample-Platform Build Status Windows](https://sampleplatform.ccextractor.org/static/img/status/build-windows.svg?maxAge=1800)](https://sampleplatform.ccextractor.org/test/master/windows)
 [![Sample-Platform Build Status Linux](https://sampleplatform.ccextractor.org/static/img/status/build-linux.svg?maxAge=1800)](https://sampleplatform.ccextractor.org/test/master/linux)
 [![SourceForge](https://img.shields.io/badge/SourceForge%20downloads-213k%2Ftotal-brightgreen.svg)](https://sourceforge.net/projects/ccextractor/)

docs/CHANGES.TXT

Lines changed: 21 additions & 1 deletion
@@ -1,6 +1,26 @@
+Unreleased
+----------
+- New: Added ASS/SSA \pos-based positioning for CEA-608 captions when layout
+
+0.96.3 (2025-12-29)
+-------------------
+- New: VOBSUB subtitle extraction with OCR support for MP4 files
+- New: VOBSUB subtitle extraction support for MKV/Matroska files
+- New: Native SCC (Scenarist Closed Caption) input file support - CCExtractor can now read SCC files
+- New: Configurable frame rate (--scc-framerate) and styled PAC codes for SCC output
+- Fix: Apply --delay option to DVB/bitmap subtitles (previously only worked with text-based subtitles)
+- Fix: 200ms timing offset in MOV/MP4 caption extraction
+- Fix: utf8proc include path for system library builds
+- Fix: Use fixed-width integer types in MP4 bswap functions for better portability
+- Fix: Guard ocr_text access with ENABLE_OCR preprocessor check
+- Fix: Preserve FFmpeg libs when building with -system-libs -hardsubx
+- Build: Add vobsub_decoder to Windows and autoconf build systems
+- Build: Add winget and Chocolatey packaging workflows for Windows distribution
+- Docs: Add VOBSUB extraction documentation and subtile-ocr Dockerfile
+
 0.96.2 (2025-12-26)
 -------------------
-- New: Added ASS/SSA \pos-based positioning for CEA-608 captions when layout
+- Fix: Resolve utf8proc header include path when building against system libraries on Linux.
 - Rebundle Windows version to include required runtime files to process hardcoded subtitles
 (hardcodex mode).
 - New: Add optional -system-libs flag to Linux build script for package manager compatibility

docs/README.md

Lines changed: 8 additions & 0 deletions
@@ -26,6 +26,14 @@ Running ccextractor without parameters shows the help screen. Usage is
 trivial - you just need to pass the input file and (optionally) some
 details about the input and output files.
 
+Example:
+
+ccextractor input_video.ts
+
+This command extracts subtitles from the input video file and generates a subtitle output file
+(such as .srt) in the same directory.
+
+
 
 ## Languages
 Usually English captions are transmitted in line 21 field 1 data,

docs/VOBSUB.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
+# VOBSUB Subtitle Extraction from MKV Files
+
+CCExtractor supports extracting VOBSUB (S_VOBSUB) subtitles from Matroska (MKV) containers. VOBSUB is an image-based subtitle format originally from DVD video.
+
+## Overview
+
+VOBSUB subtitles consist of two files:
+- `.idx` - Index file containing metadata, palette, and timestamp/position entries
+- `.sub` - Binary file containing the actual subtitle bitmap data in MPEG Program Stream format
+
+## Basic Usage
+
+```bash
+ccextractor movie.mkv
+```
+
+This will extract all VOBSUB tracks and create paired `.idx` and `.sub` files:
+- `movie_eng.idx` + `movie_eng.sub` (first English track)
+- `movie_eng_1.idx` + `movie_eng_1.sub` (second English track, if present)
+- etc.
+
+## Converting VOBSUB to SRT (Text)
+
+Since VOBSUB subtitles are images, you need OCR (Optical Character Recognition) to convert them to text-based formats like SRT.
+
+### Using subtile-ocr (Recommended)
+
+[subtile-ocr](https://github.com/gwen-lg/subtile-ocr) is an actively maintained Rust tool that provides accurate OCR conversion.
+
+#### Option 1: Docker (Easiest)
+
+We provide a Dockerfile that builds subtile-ocr with all dependencies:
+
+```bash
+# Build the Docker image (one-time)
+cd tools/vobsubocr
+docker build -t subtile-ocr .
+
+# Extract VOBSUB from MKV
+ccextractor movie.mkv
+
+# Convert to SRT using OCR
+docker run --rm -v $(pwd):/data subtile-ocr -l eng -o /data/movie_eng.srt /data/movie_eng.idx
+```
+
+#### Option 2: Install subtile-ocr Natively
+
+If you have Rust and Tesseract development libraries installed:
+
+```bash
+# Install dependencies (Ubuntu/Debian)
+sudo apt-get install libleptonica-dev libtesseract-dev tesseract-ocr tesseract-ocr-eng
+
+# Install subtile-ocr
+cargo install --git https://github.com/gwen-lg/subtile-ocr
+
+# Convert
+subtile-ocr -l eng -o movie_eng.srt movie_eng.idx
+```
+
+### subtile-ocr Options
+
+| Option | Description |
+|--------|-------------|
+| `-l, --lang <LANG>` | Tesseract language code (required). Examples: `eng`, `fra`, `deu`, `chi_sim` |
+| `-o, --output <FILE>` | Output SRT file (stdout if not specified) |
+| `-t, --threshold <0.0-1.0>` | Binarization threshold (default: 0.6) |
+| `-d, --dpi <DPI>` | Image DPI for OCR (default: 150) |
+| `--dump` | Save processed subtitle images as PNG files |
+
+### Language Codes
+
+Install additional Tesseract language packs as needed:
+
+```bash
+# Examples
+sudo apt-get install tesseract-ocr-fra # French
+sudo apt-get install tesseract-ocr-deu # German
+sudo apt-get install tesseract-ocr-spa # Spanish
+sudo apt-get install tesseract-ocr-chi-sim # Simplified Chinese
+```
+
+## Technical Details
+
+### .idx File Format
+
+The index file contains:
+1. Header with metadata (size, palette, alignment settings)
+2. Language identifier line
+3. Timestamp entries with file positions
+
+Example:
+```
+# VobSub index file, v7 (do not modify this line!)
+size: 720x576
+palette: 000000, 828282, ...
+
+id: eng, index: 0
+timestamp: 00:01:12:920, filepos: 000000000
+timestamp: 00:01:18:640, filepos: 000000800
+...
+```
+
+### .sub File Format
+
+The binary file contains MPEG Program Stream packets:
+- Each subtitle is wrapped in a PS Pack header (14 bytes) + PES header (15 bytes)
+- Subtitles are aligned to 2048-byte boundaries
+- Contains raw SPU (SubPicture Unit) bitmap data
+
+## Troubleshooting
+
+### Empty output files
+- Ensure the MKV file actually contains VOBSUB tracks (check with `mediainfo` or `ffprobe`)
+- CCExtractor will report "No VOBSUB subtitles to write" if the track is empty
+
+### OCR quality issues
+- Try adjusting the `-t` threshold parameter
+- Ensure the correct language pack is installed
+- Use `--dump` to inspect the processed images
+
+### Docker permission issues
+- The output files may be owned by root; use `sudo chown` to fix ownership
+- Or run Docker with `--user $(id -u):$(id -g)`
+
+## See Also
+
+- [OCR.md](OCR.md) - General OCR support in CCExtractor
+- [subtile-ocr GitHub](https://github.com/gwen-lg/subtile-ocr) - OCR tool documentation
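
Review note on docs/VOBSUB.md: the "Technical Details" section describes `.idx` timestamp entries and states that subtitles in the `.sub` file are aligned to 2048-byte boundaries. The sketch below shows one way to read those entries and check the alignment. It is not CCExtractor code. The names `ENTRY_RE`, `SECTOR_SIZE` and `parse_idx_entries` are hypothetical, and treating `filepos` as hexadecimal is an assumption, chosen because the documented sample value `000000800` then equals 2048.

```python
import re

# Matches entries such as "timestamp: 00:01:12:920, filepos: 000000800".
ENTRY_RE = re.compile(
    r"^timestamp:\s*(\d+):(\d{2}):(\d{2}):(\d{3}),"
    r"\s*filepos:\s*([0-9A-Fa-f]+)$"
)

SECTOR_SIZE = 2048  # alignment stated in the .sub format description


def parse_idx_entries(text):
    """Return (start_ms, byte_offset) tuples for each timestamp line."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match is None:
            continue  # header, palette, language, or comment line
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
        start_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
        # Assumption: filepos is a hexadecimal offset into the .sub file.
        offset = int(match.group(5), 16)
        entries.append((start_ms, offset))
    return entries


SAMPLE = """\
# VobSub index file, v7 (do not modify this line!)
size: 720x576
palette: 000000, 828282

id: eng, index: 0
timestamp: 00:01:12:920, filepos: 000000000
timestamp: 00:01:18:640, filepos: 000000800
"""

if __name__ == "__main__":
    for start_ms, offset in parse_idx_entries(SAMPLE):
        aligned = offset % SECTOR_SIZE == 0
        print(f"{start_ms} ms at byte {offset} (aligned: {aligned})")
```

A check like this could serve as a quick sanity test on generated `.idx` files, since a misaligned offset would suggest the index and the `.sub` data are out of step.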

linux/Makefile.am

Lines changed: 2 additions & 0 deletions
@@ -151,6 +151,8 @@ ccextractor_SOURCES = \
 ../src/lib_ccx/list.h \
 ../src/lib_ccx/matroska.c \
 ../src/lib_ccx/matroska.h \
+../src/lib_ccx/vobsub_decoder.c \
+../src/lib_ccx/vobsub_decoder.h \
 ../src/lib_ccx/mp4.c \
 ../src/lib_ccx/myth.c \
 ../src/lib_ccx/networking.c \

linux/build

Lines changed: 1 addition & 8 deletions
@@ -65,13 +65,6 @@ if [ "$USE_SYSTEM_LIBS" = true ]; then
 
 PKG_CFLAGS="$(pkg-config --cflags libpng zlib freetype2 libutf8proc)"
 PKG_LIBS="$(pkg-config --libs libpng zlib freetype2 libutf8proc)"
-
-UTF8PROC_COMPAT=""
-if [ ! -d /usr/include/utf8proc ] && [ -f /usr/include/utf8proc.h ]; then
-mkdir -p ./utf8proc_compat/utf8proc
-ln -sf /usr/include/utf8proc.h ./utf8proc_compat/utf8proc/utf8proc.h
-UTF8PROC_COMPAT="-I./utf8proc_compat"
-fi
 fi
 
 BLD_FLAGS="$BLD_FLAGS -std=gnu99 -Wno-write-strings -Wno-pointer-sign -D_FILE_OFFSET_BITS=64 -DVERSION_FILE_PRESENT -DENABLE_OCR -DGPAC_DISABLE_VTT -DGPAC_DISABLE_OD_DUMP -DGPAC_DISABLE_REMOTERY -DNO_GZIP"
@@ -140,7 +133,7 @@ if [ "$USE_SYSTEM_LIBS" = true ]; then
 GPAC_CFLAGS="$(pkg-config --cflags --silence-errors gpac)"
 
 BLD_INCLUDE="-I../src -I../src/lib_ccx -I../src/lib_ccx/zvbi -I../src/thirdparty/lib_hash \
-$UTF8PROC_COMPAT $PKG_CFLAGS $LEPTONICA_CFLAGS $TESSERACT_CFLAGS $GPAC_CFLAGS"
+$PKG_CFLAGS $LEPTONICA_CFLAGS $TESSERACT_CFLAGS $GPAC_CFLAGS"
 
 BLD_SOURCES="../src/ccextractor.c $SRC_CCX $SRC_HASH"
 # Preserve FFmpeg libraries if -hardsubx was specified

linux/configure.ac

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.
 
 AC_PREREQ([2.71])
-AC_INIT([CCExtractor], [0.96], [carlos@ccextractor.org])
+AC_INIT([CCExtractor], [0.96.3], [carlos@ccextractor.org])
 AC_CONFIG_AUX_DIR([build-conf])
 AC_CONFIG_SRCDIR([../src/ccextractor.c])
 AM_INIT_AUTOMAKE([foreign subdir-objects])
