Description
Currently, the vLLM server is started with --dtype=bfloat16 hardcoded in multiple places. We need to implement logic that automatically selects the appropriate dtype based on the model's native precision.
Proposed Logic
If model is FP16:
    --dtype=bfloat16
Else:
    --dtype=auto
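The rule above can be sketched as a small Python helper. This is only an illustration: the function names `select_dtype` and `dtype_flag` are hypothetical, and it assumes the model's precision is already available as a string such as "FP16", "float16", or "torch.float16".

```python
def select_dtype(native_precision: str) -> str:
    """Return the value for the --dtype flag according to the proposed rule."""
    # Normalise common spellings: "FP16", "float16", "torch.float16", "half".
    normalized = native_precision.strip().lower().removeprefix("torch.")
    fp16_aliases = {"fp16", "float16", "half"}
    # FP16 models keep the current bfloat16 behavior; everything else uses auto.
    return "bfloat16" if normalized in fp16_aliases else "auto"


def dtype_flag(native_precision: str) -> str:
    """Build the full command-line flag, e.g. '--dtype=auto'."""
    return f"--dtype={select_dtype(native_precision)}"
```

Keeping the decision in one function means the playbooks and workload definitions could all call the same logic instead of each carrying its own hardcoded value.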
Implementation Locations
The dtype parameter is currently hardcoded in:
- Ansible playbooks:
  - automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml (lines 89, 117)
  - automation/test-execution/ansible/start-vllm-server.yml (lines 58, 84)
- Workload configurations:
  - automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml (multiple workload definitions)
Required Changes
- Add model precision detection logic to determine if a model is FP16
- Update vLLM server startup tasks to conditionally set dtype based on model precision
- Update workload configurations to support dynamic dtype selection
- Update any hardcoded --dtype=bfloat16 references to use the new logic
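One possible way to detect model precision is to read the `torch_dtype` field that Hugging Face checkpoints commonly declare in `config.json`. The sketch below assumes the model has been downloaded to a local directory and that this field is present; some checkpoints omit it or use a different key, in which case the helper returns `None`. The function names are hypothetical and are not part of the existing automation.

```python
import json
from pathlib import Path
from typing import Optional


def detect_native_precision(model_dir: str) -> Optional[str]:
    """Return the dtype declared in <model_dir>/config.json, or None if unknown."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    # Assumption: the checkpoint records its weight dtype under "torch_dtype".
    value = config.get("torch_dtype")
    return value if isinstance(value, str) else None


def is_fp16_model(model_dir: str) -> bool:
    """True only when the checkpoint explicitly declares float16 weights."""
    return detect_native_precision(model_dir) == "float16"
```

Treating an unknown precision as "not FP16" means such models would fall through to --dtype=auto under the proposed logic. Whether that is the desired default is a design choice for the implementation.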
Documentation
Documentation has been updated to reflect this change:
- docs/methodology/testing-phases.md: All three phases now document the conditional dtype logic
- models/llm-models/model-matrix.yaml: Common parameters updated
- automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml: Comments updated
Benefits
- Improved model compatibility across different precision types
- Better performance for non-FP16 models by using native precision
- Maintains current bfloat16 behavior for FP16 models
Related