🔍 Bug Summary
Analyzing multi-page PDFs with a local LLM fails because the document content is too large and is not truncated.
📖 Description
With larger PDFs (roughly 10+ pages) the analysis fails reproducibly.
Paperless-AI runs in an LXC (on Proxmox) created from https://community-scripts.github.io/ProxmoxVE/scripts?id=paperless-ai
🔄 Steps to Reproduce
Install the LXC via the community script.
Link it to Paperless-ngx.
Use Qwen3-4B-GGUF (served by a Lemonade server).
Import a multi-page PDF into Paperless-ngx.
Trigger Paperless-AI to start processing, or wait for the cron job.
Open the Paperless-AI logs (/opt/paperless-ai/logs/logs.txt).
Wait for the ERROR (for me it appears within a minute); the error is not shown in the UI, only in the logs.
✅ Expected Behavior
Even larger documents should be analyzed without problems.
❌ Actual Behavior
Large documents cannot be analyzed with a local LLM at all.
🏷️ Paperless-AI Version
3.0.9
📜 Docker Logs
logs from /opt/paperless-ai/logs/logs.txt:
[2026-01-25T10:55:56.976Z] [INFO] [DEBUG] Found own user ID: 4
[2026-01-25T10:55:57.251Z] [INFO] [DEBUG] Fetched page 1, got 35 tags. [DEBUG] Total so far: 35
[2026-01-25T10:55:57.258Z] [INFO] [DEBUG] Found own user ID: 4
[2026-01-25T10:55:57.261Z] [INFO] [DEBUG] Fetched page 1, got 100 documents. [DEBUG] Total so far: 100
[2026-01-25T10:55:57.590Z] [INFO] [DEBUG] Fetched page 2, got 73 documents. [DEBUG] Total so far: 173
[2026-01-25T10:55:57.691Z] [INFO] [DEBUG] Finished fetching. Found 173 documents.
[2026-01-25T10:55:57.764Z] [INFO] [DEBUG] Document 182 rights for AI User - processed
[2026-01-25T10:55:57.885Z] [INFO] Thumbnail not cached, fetching from Paperless
[2026-01-25T10:55:57.919Z] [INFO] [DEBUG] Using character-based token estimation for model: Qwen3-4B-GGUF
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] Token calculation - Prompt: 605, Reserved: 2605, Available: 125395
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] Use existing data: yes, Restrictions applied based on useExistingData setting
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] External API data: none
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] Using character-based truncation for model: Qwen3-4B-GGUF
[2026-01-25T10:56:51.409Z] [INFO] [DEBUG] [25.01.26, 11:55] Custom OpenAI request sent
[2026-01-25T10:56:51.409Z] [INFO] [DEBUG] [25.01.26, 11:55] Total tokens: 4089
[2026-01-25T10:56:51.410Z] [ERROR] Failed to parse JSON response: SyntaxError: Unexpected token '*', "**Warranty"... is not valid JSON
at JSON.parse (<anonymous>)
at CustomOpenAIService.analyzeDocument (/opt/paperless-ai/services/customService.js:219:31)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async processDocument (/opt/paperless-ai/routes/setup.js:1603:16)
at async /opt/paperless-ai/routes/setup.js:1527:28
[2026-01-25T10:56:51.411Z] [ERROR] Failed to analyze document: Error: Invalid JSON response from API
at CustomOpenAIService.analyzeDocument (/opt/paperless-ai/services/customService.js:226:15)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async processDocument (/opt/paperless-ai/routes/setup.js:1603:16)
at async /opt/paperless-ai/routes/setup.js:1527:28
[2026-01-25T10:56:51.411Z] [INFO] Repsonse from AI service: {
document: { tags: [], correspondent: null },
metrics: null,
error: 'Invalid JSON response from API'
}
[2026-01-25T10:56:51.411Z] [ERROR] [ERROR] processing document 182: Error: [ERROR] Document analysis failed: Invalid JSON response from API
at processDocument (/opt/paperless-ai/routes/setup.js:1607:11)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async /opt/paperless-ai/routes/setup.js:1527:28
[2026-01-25T10:56:51.411Z] [INFO] [INFO] Task completed
📜 Paperless-ngx Logs
🖼️ Screenshots of your settings page
🖥️ Desktop Environment
Windows
💻 OS Version
Win 11
🌐 Browser
Firefox
🔢 Browser Version
No response
🌐 Mobile Browser
No response
📝 Additional Information
📌 Extra Notes
My approach to work around this: add a new variable to /opt/paperless-ai/data/.env:
CONTENT_MAX_LENGTH=200
Then adapt /opt/paperless-ai/services/serviceUtils.js by inserting the following lines at line 105:
if (process.env.CONTENT_MAX_LENGTH) {
  // The env var is a string, so parse it before using it as a length.
  const maxLength = parseInt(process.env.CONTENT_MAX_LENGTH, 10);
  console.log('[DEBUG] Truncating content to max length (CONTENT_MAX_LENGTH):', maxLength);
  const truncatedText = text.substring(0, maxLength);
  // Try to break at a word boundary if possible; keep the hard cut if there is no space.
  const lastSpaceIndex = truncatedText.lastIndexOf(' ');
  return lastSpaceIndex > 0 ? truncatedText.substring(0, lastSpaceIndex) : truncatedText;
}
console.log('[DEBUG] CONTENT_MAX_LENGTH not defined, going ahead');
It would be nice to have a way to set CONTENT_MAX_LENGTH in the UI, because every time I change the configuration I have to add it again manually... And of course it would be best to have the underlying bug fixed ;-)
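For reference, here is the same workaround expressed as a self-contained helper. This is only a minimal sketch of the idea: the function name truncateContent and the usage example are mine, not part of the existing serviceUtils.js API.

// Sketch of the workaround as a standalone, testable helper.
// truncateContent is a hypothetical name, not an existing function in Paperless-AI.
function truncateContent(text, maxLength) {
  // Leave the text untouched if no valid limit is set or it already fits.
  if (!Number.isFinite(maxLength) || maxLength <= 0 || text.length <= maxLength) {
    return text;
  }
  const truncated = text.substring(0, maxLength);
  // Prefer to cut at the last word boundary inside the limit.
  const lastSpace = truncated.lastIndexOf(' ');
  return lastSpace > 0 ? truncated.substring(0, lastSpace) : truncated;
}

// Usage example with a dummy text and a hard cap of 40 characters:
const sample = 'This is a long OCR text that should be capped before prompting the model.';
console.log(truncateContent(sample, 40)); // -> 'This is a long OCR text that should be'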