Checklist
Bug Description
The OpenAI-compatible adapter has a hardcoded max tokens cap of 4096, and I don't understand why:

lmms-eval/lmms_eval/models/chat/openai.py, line 180 (commit d6cc2b5):

```python
max_new_tokens = min(request_gen_kwargs.get("max_new_tokens", 1024), 4096)
```

The chat-completion request hard-caps max tokens at 4096, but many thinking models need output lengths far beyond this. There is also no comment explaining why the user-supplied max_new_tokens parameter is overridden.
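One possible fix, as a minimal sketch: respect the user's value and make the cap configurable and visible. The function name and the environment variable below are hypothetical illustrations, not part of lmms-eval's actual API.

```python
import logging
import os

logger = logging.getLogger(__name__)

def resolve_max_new_tokens(request_gen_kwargs: dict) -> int:
    """Respect the user's max_new_tokens; clamp only to an overridable cap."""
    default = 1024
    # Hypothetical env var: the point is that the cap should be configurable
    # (or removed for local backends) rather than a silent hardcoded 4096.
    cap = int(os.environ.get("LMMS_EVAL_MAX_NEW_TOKENS_CAP", "4096"))
    requested = int(request_gen_kwargs.get("max_new_tokens", default))
    if requested > cap:
        # Warn instead of silently clamping, so the override shows up in logs.
        logger.warning("max_new_tokens=%d exceeds cap %d; clamping", requested, cap)
        return cap
    return requested

# Example usage: clamped to 4096 unless the cap is raised via the env var.
print(resolve_max_new_tokens({"max_new_tokens": 32768}))
```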
Here are some other known issues:
- The chat OpenAI adapter receives ctx but ignores it, which silently drops the task description/few-shot context for text tasks.
- Sample logs can show input that differs from the actual backend request, which is misleading for debugging.
- --gen_kwargs is accepted globally, but the OpenAI adapter forwards only part of it.
- The hard 4096 cap is inappropriate for OpenAI-compatible local backends unless clearly documented/configurable.
- The regex filter crash is a straightforward robustness bug (see the sketch after this list).
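For the regex filter, a defensive pattern like the following avoids crashing when no match is found. Since the issue doesn't include the exact traceback, the function name, pattern, and fallback behavior here are assumptions for illustration, not the filter's actual code.

```python
import re

def extract_answer(text: str, pattern: str = r"answer is\s*(.+)", fallback: str = "") -> str:
    """Return the first capture group, or a fallback instead of crashing.

    A bare re.search(pattern, text).group(1) raises AttributeError when the
    pattern does not match; checking for None keeps the filter robust.
    """
    match = re.search(pattern, text)
    return match.group(1).strip() if match else fallback

# Example usage:
print(extract_answer("The answer is 42"))  # -> "42"
print(extract_answer("no match here"))     # -> "" (instead of AttributeError)
```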
Steps to Reproduce
Error Message / Traceback
Environment
NA
Additional Context
No response