feat: Update Qwen3 to Qwen3.5 #181

Draft
r-dh wants to merge 2 commits into main from rd-qwen3.5

r-dh commented Mar 6, 2026

Summary

  • Upgrade default local LLM from Qwen 3 (8B/4B) to Qwen 3.5 (9B/4B)
  • Bump llama-cpp-python optional dep from >=0.3.9 to >=0.3.16
  • Fix a division-by-zero error in _limit_chunkspans when the context budget is exhausted

Blocker

Depends on abetlen/llama-cpp-python#2133, which adds Qwen 3.5 GDN (Gated Delta Network) support to llama-cpp-python. That PR also fixes a prefix-caching bug affecting all hybrid architecture models. Until it is merged and released as >=0.3.16, this PR cannot be merged.

Known issue

test_self_query fails: Qwen 3.5 returns {'topic': ['Physics']} instead of {} for an off-topic query ("What is the price of a Bugatti Chiron?"). The model applies a metadata filter even when the query is unrelated to the dataset. This is a behavioral regression compared to Qwen 3 and should be addressed separately, either by tuning the self-query prompt or by accepting the looser behavior.

Test notes

  • Tests use n_ctx=6144 instead of the default 8192 because the GDN model plus the embedding model together exceed Metal GPU memory at 8192
  • All other non-slow tests pass (31 passed, 1 pre-existing OpenAI API failure unrelated to this change)
  • All slow function-calling tests pass (16 passed, 2 skipped for PostgreSQL)
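The reduced context size from the notes above is passed at model load time. This snippet uses the real llama-cpp-python Llama constructor, but the model path is a placeholder and whether the test suite wires n_ctx exactly this way is an assumption:

```python
from llama_cpp import Llama

# Configuration sketch: load the Qwen 3.5 GGUF with a smaller context window
# so it fits alongside the embedding model in Metal GPU memory.
llm = Llama(
    model_path="path/to/qwen3.5-9b.gguf",  # placeholder path
    n_ctx=6144,        # reduced from the default 8192 (see test notes)
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal on macOS)
)
```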
