
[Bug]: Using local LLM to import larger documents fails #836

@papst01

Description


🔍 Bug Summary

Importing multi-page PDFs with a local LLM fails because the document content is too large and is not truncated.

📖 Description

With larger PDFs (roughly 10+ pages) the analysis fails reproducibly.
Paperless-AI runs in an LXC (on Proxmox) installed from https://community-scripts.github.io/ProxmoxVE/scripts?id=paperless-ai

🔄 Steps to Reproduce

Install the LXC via the community script.
Link it to Paperless-ngx.
Use Qwen3-4B-GGUF (served by a Lemonade server).
Import a multi-page PDF into Paperless-ngx.
Trigger Paperless-AI to start processing, or wait for the cron job.
Open the Paperless-AI logs (/opt/paperless-ai/logs/logs.txt).
Wait for the ERROR (for me, within a minute) - it does not appear in the UI, only in the logs.

✅ Expected Behavior

Even larger documents are analyzed without problems.

❌ Actual Behavior

Large documents cannot be analyzed with a local LLM at all.

🏷️ Paperless-AI Version

3.0.9

📜 Docker Logs

Logs from /opt/paperless-ai/logs/logs.txt:
[2026-01-25T10:55:56.976Z] [INFO] [DEBUG] Found own user ID: 4
[2026-01-25T10:55:57.251Z] [INFO] [DEBUG] Fetched page 1, got 35 tags. [DEBUG] Total so far: 35
[2026-01-25T10:55:57.258Z] [INFO] [DEBUG] Found own user ID: 4
[2026-01-25T10:55:57.261Z] [INFO] [DEBUG] Fetched page 1, got 100 documents. [DEBUG] Total so far: 100
[2026-01-25T10:55:57.590Z] [INFO] [DEBUG] Fetched page 2, got 73 documents. [DEBUG] Total so far: 173
[2026-01-25T10:55:57.691Z] [INFO] [DEBUG] Finished fetching. Found 173 documents.
[2026-01-25T10:55:57.764Z] [INFO] [DEBUG] Document 182 rights for AI User - processed
[2026-01-25T10:55:57.885Z] [INFO] Thumbnail not cached, fetching from Paperless
[2026-01-25T10:55:57.919Z] [INFO] [DEBUG] Using character-based token estimation for model: Qwen3-4B-GGUF
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] Token calculation - Prompt: 605, Reserved: 2605, Available: 125395
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] Use existing data: yes, Restrictions applied based on useExistingData setting
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] External API data: none
[2026-01-25T10:55:57.920Z] [INFO] [DEBUG] Using character-based truncation for model: Qwen3-4B-GGUF
[2026-01-25T10:56:51.409Z] [INFO] [DEBUG] [25.01.26, 11:55] Custom OpenAI request sent
[2026-01-25T10:56:51.409Z] [INFO] [DEBUG] [25.01.26, 11:55] Total tokens: 4089
[2026-01-25T10:56:51.410Z] [ERROR] Failed to parse JSON response: SyntaxError: Unexpected token '*', "**Warranty"... is not valid JSON
    at JSON.parse (<anonymous>)
    at CustomOpenAIService.analyzeDocument (/opt/paperless-ai/services/customService.js:219:31)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async processDocument (/opt/paperless-ai/routes/setup.js:1603:16)
    at async /opt/paperless-ai/routes/setup.js:1527:28
[2026-01-25T10:56:51.411Z] [ERROR] Failed to analyze document: Error: Invalid JSON response from API
    at CustomOpenAIService.analyzeDocument (/opt/paperless-ai/services/customService.js:226:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async processDocument (/opt/paperless-ai/routes/setup.js:1603:16)
    at async /opt/paperless-ai/routes/setup.js:1527:28
[2026-01-25T10:56:51.411Z] [INFO] Repsonse from AI service: {
  document: { tags: [], correspondent: null },
  metrics: null,
  error: 'Invalid JSON response from API'
}
[2026-01-25T10:56:51.411Z] [ERROR] [ERROR] processing document 182: Error: [ERROR] Document analysis failed: Invalid JSON response from API
    at processDocument (/opt/paperless-ai/routes/setup.js:1607:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async /opt/paperless-ai/routes/setup.js:1527:28
[2026-01-25T10:56:51.411Z] [INFO] [INFO] Task completed
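For context on the failure mode: the model replied with Markdown prose (`**Warranty`...) instead of JSON, so `JSON.parse` in customService.js throws. Independent of the truncation problem, a defensive parser could try to salvage an embedded JSON object before giving up. This is a hypothetical sketch (the helper name `extractJson` is mine, not part of the Paperless-AI codebase):

```javascript
// Hypothetical helper: extract the first JSON object from a model reply
// that may be wrapped in Markdown prose or code fences.
function extractJson(reply) {
  // Fast path: the reply is already valid JSON.
  try {
    return JSON.parse(reply);
  } catch (_) { /* fall through */ }

  // Otherwise, look for the outermost {...} span and try to parse that.
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start !== -1 && end > start) {
    try {
      return JSON.parse(reply.slice(start, end + 1));
    } catch (_) { /* fall through */ }
  }
  return null; // caller decides how to handle an unparseable reply
}
```

This would not fix the root cause (the oversized prompt), but it would make replies like the one in the log above recoverable when the model wraps valid JSON in commentary.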

📜 Paperless-ngx Logs

Not relevant.

🖼️ Screenshots of your settings page

(screenshot attached; not reproduced here)

🖥️ Desktop Environment

Windows

💻 OS Version

Win 11

🌐 Browser

Firefox

🔢 Browser Version

No response

🌐 Mobile Browser

No response

📝 Additional Information

  • I have checked existing issues and this is not a duplicate
  • I have tried debugging this issue on my own
  • I can provide a fix and submit a PR
  • I am sure that this problem is affecting everyone, not only me
  • I have provided all required information above

📌 Extra Notes

My approach to fixing this: add a new variable in /opt/paperless-ai/data/.env:
CONTENT_MAX_LENGTH=200

Then in /opt/paperless-ai/services/serviceUtils.js I added the following lines at line 105:

        if (process.env.CONTENT_MAX_LENGTH) {
           // Environment variables are strings, so parse the limit explicitly
           const maxLength = parseInt(process.env.CONTENT_MAX_LENGTH, 10);
           console.log('[DEBUG] Truncating content to max length (CONTENT_MAX_LENGTH):', maxLength);
           const truncatedText = text.substring(0, maxLength);

           // Try to break at a word boundary if possible;
           // if there is no space, keep the hard cut instead of returning an empty string
           const lastSpaceIndex = truncatedText.lastIndexOf(' ');
           return lastSpaceIndex > 0 ? truncatedText.substring(0, lastSpaceIndex) : truncatedText;
        }
        console.log('[DEBUG] CONTENT_MAX_LENGTH not defined, going ahead');

It would be nice to have a way to set CONTENT_MAX_LENGTH in the UI, because every time I change the config I have to add it again manually... And of course to have the bug fixed ;-)
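The truncation logic above can be exercised standalone. A self-contained sketch (the function name `truncateContent` is mine; the real serviceUtils.js code differs):

```javascript
// Hypothetical standalone version of the proposed truncation logic.
// CONTENT_MAX_LENGTH is read from the environment and parsed as an integer;
// a second parameter allows overriding it for testing.
function truncateContent(text, maxLength = parseInt(process.env.CONTENT_MAX_LENGTH || '0', 10)) {
  if (!maxLength || text.length <= maxLength) {
    return text; // no limit configured, or the text already fits
  }
  const truncated = text.substring(0, maxLength);
  // Prefer breaking at the last word boundary inside the limit
  const lastSpace = truncated.lastIndexOf(' ');
  return lastSpace > 0 ? truncated.substring(0, lastSpace) : truncated;
}
```

One design note: a character limit is only a rough proxy for tokens. Since the log shows Paperless-AI already using "character-based token estimation", wiring CONTENT_MAX_LENGTH into that existing calculation (rather than a fixed cut) would adapt the limit per model.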

Metadata

Labels: bug (Something isn't working)
